Commit Graph

88960 Commits

Author SHA1 Message Date
Eric Dumazet
9d18562a22 fq_codel: add batch ability to fq_codel_drop()
In presence of inelastic flows and stress, we can call
fq_codel_drop() for every packet entering fq_codel qdisc.

fq_codel_drop() is quite expensive, as it does a linear scan
of 4 KB of memory to find a fat flow.
Once found, it drops the oldest packet of this flow.

Instead of dropping a single packet, try to drop 50% of the backlog
of this fat flow, with a configurable limit of 64 packets per round.

TCA_FQ_CODEL_DROP_BATCH_SIZE is the new attribute to make this
limit configurable.

With this strategy the 4 KB search is amortized to a single cache line
per drop [1], so fq_codel_drop() no longer appears at the top of kernel
profile in presence of few inelastic flows.

[1] Assuming a 64byte cache line, and 1024 buckets

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dave Taht <dave.taht@gmail.com>
Cc: Jonathan Morton <chromatix99@gmail.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Dave Taht
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-03 12:47:09 -04:00
Bjorn Helgaas
58f8b094e9 Merge branches 'pci/host-armada', 'pci/host-designware', 'pci/host-hv', 'pci/host-imx6', 'pci/host-keystone', 'pci/host-mvebu', 'pci/host-rcar', 'pci/host-thunder' and 'pci/host-vmd' into next
* pci/host-armada:
  PCI: armada: Add driver for Marvell Armada 7K/8K PCIe controller
  dt-bindings: pci: add DT binding for Marvell Armada 7K/8K PCIe controller

* pci/host-designware:
  PCI: designware: Remove incorrect RC memory base/limit configuration
  PCI: designware: Move Root Complex setup code to dw_pcie_setup_rc()

* pci/host-hv:
  PCI: hv: Report resources release after stopping the bus

* pci/host-imx6:
  ARM: dts: imx6qp: Specify imx6qp version of PCIe core
  PCI: imx6: Implement reset sequence for i.MX6+
  PCI: imx6: Use enum instead of bool for variant indicator
  PCI: imx6: Add DT property for link gen, default to Gen1
  PCI: imx6: Add reset-gpio-active-high boolean property to DT
  ARM: dts: imx6: Fix PCIe reset GPIO polarity on Toradex Apalis Ixora
  PCI: imx6: Add initial imx6sx support
  PCI: imx6: Factor out ref clock enable
  Revert "PCI: imx6: Add support for active-low reset GPIO"

* pci/host-keystone:
  PCI: keystone: Remove unnecessary goto statement
  PCI: keystone: Add error IRQ handler

* pci/host-mvebu:
  PCI: mvebu: Use SET_NOIRQ_SYSTEM_SLEEP_PM_OPS for mvebu_pcie_pm_ops
  PCI: mvebu: Constify mvebu_pcie_pm_ops structure

* pci/host-rcar:
  PCI: rcar: Select PCI_MSI_IRQ_DOMAIN

* pci/host-thunder:
  PCI: thunder: Don't clobber read-only bits in bridge config registers

* pci/host-vmd:
  PCI: Remove return values from pcie_port_platform_notify() and relatives
  PCI/ACPI: Allow all PCIe services on non-ACPI host bridges
2016-05-03 11:42:30 -05:00
Keith Busch
26e5157133 PCI: Add Downstream Port Containment driver
Add driver for the PCI Express Downstream Port Containment extended
capability.  DPC is an optional capability to contain uncorrectable errors
below a port.

For more information on DPC, please see PCI Express Base Specification
Revision 4, section 7.31, or view the PCI-SIG DPC ECN here:

  https://pcisig.com/sites/default/files/specification_documents/ECN_DPC_2012-02-09_finalized.pdf

When a DPC event is triggered, the hardware disables downstream links, so
the DPC driver schedules removal for all devices below this port.  This may
happen concurrently with a PCIe hotplug driver if enabled.  When all
downstream devices are removed and the link state transitions to disabled,
the DPC driver clears the DPC status and interrupt bits so the link may
retrain for a newly connected device.

[bhelgaas: clear (not set) DPC_CTL bits on remove, whitespace cleanup]
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Lukas Wunner <lukas@wunner.de>
2016-05-03 10:39:24 -05:00
Keith Busch
10126ac14d PCI: Add Downstream Port Containment portdrv service type
Add the Downstream Port Containment (PCIE_PORT_SERVICE_DPC) portdrv service
type, available if the device has the DPC extended capability.

[bhelgaas: split to separate patch, changelog]
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2016-05-03 10:35:49 -05:00
Boris Brezillon
e39c0df1be pwm: Introduce the pwm_args concept
Currently the PWM core mixes the current PWM state with the per-platform
reference config (specified through the PWM lookup table, DT definition
or directly hardcoded in PWM drivers).

Create a struct pwm_args to store this reference configuration, so that
PWM users can differentiate between the current and reference
configurations.

Patch all places where pwm->args should be initialized. We keep the
pwm_set_polarity/period() calls until all PWM users are patched to use
pwm_args instead of pwm_get_period/polarity().

Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
[thierry.reding@gmail.com: reword kerneldoc comments]
Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
2016-05-03 13:44:37 +02:00
Julien Grall
a53d892dfb clocksource: arm_arch_timer: Remove arch_timer_get_timecounter
The only call of arch_timer_get_timecounter (in KVM) has been removed.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-05-03 12:54:21 +02:00
Julien Grall
503a62862e KVM: arm/arm64: vgic: Rely on the GIC driver to parse the firmware tables
Currently, the firmware tables are parsed 2 times: once in the GIC
drivers, the other time when initializing the vGIC. It means code
duplication and make more tedious to add the support for another
firmware table (like ACPI).

Use the recently introduced helper gic_get_kvm_info() to get
information about the virtual GIC.

With this change, the virtual GIC becomes agnostic to the firmware
table and KVM will be able to initialize the vGIC on ACPI.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-05-03 12:54:21 +02:00
Julien Grall
1839e57696 irqchip/gic-v3: Parse and export virtual GIC information
Fill up the recently introduced gic_kvm_info with the hardware
information used for virtualization.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jason Cooper <jason@lakedaemon.net>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-05-03 12:54:21 +02:00
Julien Grall
502d6df11a irqchip/gic-v2: Parse and export virtual GIC information
For now, the firmware tables are parsed 2 times: once in the GIC
drivers, the other timer when initializing the vGIC. It means code
duplication and make more tedious to add the support for another
firmware table (like ACPI).

Introduce a new structure and set of helpers to get/set the virtual GIC
information. Also fill up the structure for GICv2.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-05-03 12:54:21 +02:00
Julien Grall
d9b5e41591 clocksource: arm_arch_timer: Extend arch_timer_kvm_info to get the virtual IRQ
Currently, the firmware table is parsed by the virtual timer code in
order to retrieve the virtual timer interrupt. However, this is already
done by the arch timer driver.

To avoid code duplication, extend arch_timer_kvm_info to get the virtual
IRQ.

Note that the KVM code will be modified in a subsequent patch.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-05-03 12:54:21 +02:00
Julien Grall
b4d6ce9776 clocksource: arm_arch_timer: Gather KVM specific information in a structure
Introduce a structure which are filled up by the arch timer driver and
used by the virtual timer in KVM.

The first member of this structure will be the timecounter. More members
will be added later.

A stub for the new helper isn't introduced because KVM requires the arch
timer for both ARM64 and ARM32.

The function arch_timer_get_timecounter is kept for the time being and
will be dropped in a subsequent patch.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-05-03 12:54:20 +02:00
Stas Sergeev
2a74213838 signals/sigaltstack: Implement SS_AUTODISARM flag
This patch implements the SS_AUTODISARM flag that can be OR-ed with
SS_ONSTACK when forming ss_flags.

When this flag is set, sigaltstack will be disabled when entering
the signal handler; more precisely, after saving sas to uc_stack.
When leaving the signal handler, the sigaltstack is restored by
uc_stack.

When this flag is used, it is safe to switch from sighandler with
swapcontext(). Without this flag, the subsequent signal will corrupt
the state of the switched-away sighandler.

To detect the support of this functionality, one can do:

  err = sigaltstack(SS_DISABLE | SS_AUTODISARM);
  if (err && errno == EINVAL)
	unsupported();

Signed-off-by: Stas Sergeev <stsp@list.ru>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Cc: Amanieu d'Antras <amanieu@gmail.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Heinrich Schuchardt <xypron.glpk@gmx.de>
Cc: Jason Low <jason.low2@hp.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Moore <pmoore@redhat.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Shuah Khan <shuahkh@osg.samsung.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vladimir Davydov <vdavydov@parallels.com>
Cc: linux-api@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/1460665206-13646-4-git-send-email-stsp@list.ru
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-03 08:37:59 +02:00
Stas Sergeev
407bc16ad1 signals/sigaltstack: Prepare to add new SS_xxx flags
This patch adds SS_FLAG_BITS - the mask that splits sigaltstack
mode values and bit-flags. Since there is no bit-flags yet, the
mask is defined to 0. The flags are added by subsequent patches.
With every new flag, the mask should have the appropriate bit cleared.

This makes sure if some flag is tried on a kernel that doesn't
support it, the -EINVAL error will be returned, because such a
flag will be treated as an invalid mode rather than the bit-flag.

That way the existence of the particular features can be probed
at run-time.

This change was suggested by Andy Lutomirski:

  https://lkml.org/lkml/2016/3/6/158

Signed-off-by: Stas Sergeev <stsp@list.ru>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Amanieu d'Antras <amanieu@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Shuah Khan <shuahkh@osg.samsung.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vladimir Davydov <vdavydov@parallels.com>
Cc: linux-api@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/1460665206-13646-3-git-send-email-stsp@list.ru
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-03 08:37:59 +02:00
Eric Dumazet
9580bf2edb net: relax expensive skb_unclone() in iptunnel_handle_offloads()
Locally generated TCP GSO packets having to go through a GRE/SIT/IPIP
tunnel have to go through an expensive skb_unclone()

Reallocating skb->head is a lot of work.

Test should really check if a 'real clone' of the packet was done.

TCP does not care if the original gso_type is changed while the packet
travels in the stack.

This adds skb_header_unclone() which is a variant of skb_clone()
using skb_header_cloned() check instead of skb_cloned().

This variant can probably be used from other points.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-03 00:22:19 -04:00
Florian Westphal
c0ef079ca7 netdevice: shrink size of struct netdev_queue
- trans_timeout is incremented when tx queue timed out (tx watchdog).
- tx_maxrate is set via sysfs

Moving tx_maxrate to read-mostly part shrinks the struct by 64 bytes.
While at it, also move trans_timeout (it is out-of-place in the
'write-mostly' part).

Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-02 22:51:41 -04:00
Nikolay Aleksandrov
a60c090361 bridge: netlink: export per-vlan stats
Add a new LINK_XSTATS_TYPE_BRIDGE attribute and implement the
RTM_GETSTATS callbacks for IFLA_STATS_LINK_XSTATS (fill_linkxstats and
get_linkxstats_size) in order to export the per-vlan stats.
The paddings were added because soon these fields will be needed for
per-port per-vlan stats (or something else if someone beats me to it) so
avoiding at least a few more netlink attributes.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-02 22:27:06 -04:00
Nikolay Aleksandrov
6dada9b10a bridge: vlan: learn to count
Add support for per-VLAN Tx/Rx statistics. Every global vlan context gets
allocated a per-cpu stats which is then set in each per-port vlan context
for quick access. The br_allowed_ingress() common function is used to
account for Rx packets and the br_handle_vlan() common function is used
to account for Tx packets. Stats accounting is performed only if the
bridge-wide vlan_stats_enabled option is set either via sysfs or netlink.
A struct hole between vlan_enabled and vlan_proto is used for the new
option so it is in the same cache line. Currently it is binary (on/off)
but it is intentionally restricted to exactly 0 and 1 since other values
will be used in the future for different purposes (e.g. per-port stats).

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-02 22:27:06 -04:00
Nikolay Aleksandrov
97a47facf3 net: rtnetlink: add linkxstats callbacks and attribute
Add callbacks to calculate the size and fill link extended statistics
which can be split into multiple messages and are dumped via the new
rtnl stats API (RTM_GETSTATS) with the IFLA_STATS_LINK_XSTATS attribute.
Also add that attribute to the idx mask check since it is expected to
be able to save state and resume dumping (e.g. future bridge per-vlan
stats will be dumped via this attribute and callbacks).
Each link type should nest its private attributes under the per-link type
attribute. This allows to have any number of separated private attributes
and to avoid one call to get the dev link type.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-02 22:27:06 -04:00
Chanwoo Choi
996133119f PM / devfreq: Add new passive governor
This patch adds the new passive governor for DEVFREQ framework. The following
governors are already present and used for DVFS (Dynamic Voltage and Frequency
Scaling) drivers. The following governors are independently used for one device
driver which don't give the influence to other device drviers and also don't
receive the effect from other device drivers.
- ondemand / performance / powersave / userspace

The passive governor depends on operation of parent driver with specific
governos extremely and is not able to decide the new frequency by oneself.
According to the decided new frequency of parent driver with governor,
the passive governor uses it to decide the appropriate frequency for own
device driver. The passive governor must need the following information
from device tree:
- the source clock and OPP tables
- the instance of parent device

For exameple,
there are one more devfreq device drivers which need to change their source
clock according to their utilization on runtime. But, they share the same
power line (e.g., regulator). So, specific device driver is operated as parent
with ondemand governor and then the rest device driver with passive governor
is influenced by parent device.

Suggested-by: Myungjoo Ham <myungjoo.ham@samsung.com>
Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
[tjakobi: Reported RCU locking issue and cw00.choi fix it]
Reported-by: Tobias Jakobi <tjakobi@math.uni-bielefeld.de>
[linux.amoon: Reported possible recursive locking and cw00.choi fix it]
Reported-by: Anand Moon <linux.amoon@gmail.com>
Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
Acked-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
2016-05-03 11:20:07 +09:00
Chanwoo Choi
0fe3a66410 PM / devfreq: Add new DEVFREQ_TRANSITION_NOTIFIER notifier
This patch adds the new DEVFREQ_TRANSITION_NOTIFIER notifier to send
the notification when the frequency of device is changed.
This notifier has two state as following:
- DEVFREQ_PRECHANGE  : Notify it before chaning the frequency of device
- DEVFREQ_POSTCHANGE : Notify it after changed the frequency of device

And this patch adds the resourced-managed function to release the resource
automatically when error happen.

Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
[m.reichl and linux.amoon: Tested it on exynos4412-odroidu3 board]
Tested-by: Markus Reichl <m.reichl@fivetechno.de>
Tested-by: Anand Moon <linux.amoon@gmail.com>
Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
Acked-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
2016-05-03 11:20:07 +09:00
Chanwoo Choi
8f510aeb22 PM / devfreq: Add devfreq_get_devfreq_by_phandle()
This patch adds the new devfreq_get_devfreq_by_phandle() OF helper function
which can find the instance of devfreq device by using phandle ("devfreq").

Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
[m.reichl and linux.amoon: Tested it on exynos4412-odroidu3 board]
Tested-by: Markus Reichl <m.reichl@fivetechno.de>
Tested-by: Anand Moon <linux.amoon@gmail.com>
Acked-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
2016-05-03 11:20:06 +09:00
Steven Rostedt (Red Hat)
dcb0b5575d tracing: Remove TRACE_EVENT_FL_USE_CALL_FILTER logic
Nothing sets TRACE_EVENT_FL_USE_CALL_FILTER anymore. Remove it.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-05-02 21:30:04 -04:00
Stephen Boyd
5bc7532497 Merge tag 'tegra-for-4.7-clk' of git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux into clk-next
Pull tegra clk driver changes from Thierry Reding:

This set of changes contains a bunch of cleanups and minor fixes along
with some new clocks, mainly on Tegra210, in preparation for supporting
DisplayPort and HDMI 2.0.

* tag 'tegra-for-4.7-clk' of git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux:
  clk: tegra: dfll: Reformat CVB frequency table
  clk: tegra: dfll: Properly clean up on failure and removal
  clk: tegra: dfll: Make code more comprehensible
  clk: tegra: dfll: Reference CVB table instead of copying data
  clk: tegra: dfll: Update kerneldoc
  clk: tegra: Fix PLL_U post divider and initial rate on Tegra30
  clk: tegra: Initialize PLL_C to sane rate on Tegra30
  clk: tegra: Fix pllre Tegra210 and add pll_re_out1
  clk: tegra: Add sor_safe clock
  clk: tegra: dpaux and dpaux1 are fixed factor clocks
  clk: tegra: Add dpaux1 clock
  clk: tegra: Use correct parent for dpaux clock
  clk: tegra: Add fixed factor peripheral clock type
  clk: tegra: Special-case mipi-cal parent on Tegra114
  clk: tegra: Remove trailing blank line
  clk: tegra: Constify peripheral clock registers
  clk: tegra: Add interface to enable hardware control of SATA/XUSB PLLs
2016-05-02 16:53:02 -07:00
Al Viro
df889b3631 Merge branch 'for-linus' into work.lookups 2016-05-02 19:49:46 -04:00
Al Viro
6192269444 introduce a parallel variant of ->iterate()
New method: ->iterate_shared().  Same arguments as in ->iterate(),
called with the directory locked only shared.  Once all filesystems
switch, the old one will be gone.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-02 19:49:29 -04:00
Al Viro
63b6df1413 give readdir(2)/getdents(2)/etc. uniform exclusion with lseek()
same as read() on regular files has, and for the same reason.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-02 19:49:28 -04:00
Al Viro
9902af79c0 parallel lookups: actual switch to rwsem
ta-da!

The main issue is the lack of down_write_killable(), so the places
like readdir.c switched to plain inode_lock(); once killable
variants of rwsem primitives appear, that'll be dealt with.

lockdep side also might need more work

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-02 19:49:28 -04:00
Al Viro
d9171b9345 parallel lookups machinery, part 4 (and last)
If we *do* run into an in-lookup match, we need to wait for it to
cease being in-lookup.  Fortunately, we do have unused space in
in-lookup dentries - d_lru is never looked at until it stops being
in-lookup.

So we can stash a pointer to wait_queue_head from stack frame of
the caller of ->lookup().  Some precautions are needed while
waiting, but it's not that hard - we do hold a reference to dentry
we are waiting for, so it can't go away.  If it's found to be
in-lookup the wait_queue_head is still alive and will remain so
at least while ->d_lock is held.  Moreover, the condition we
are waiting for becomes true at the same point where everything
on that wq gets woken up, so we can just add ourselves to the
queue once.

d_alloc_parallel() gets a pointer to wait_queue_head_t from its
caller; lookup_slow() adjusted, d_add_ci() taught to use
d_alloc_parallel() if the dentry passed to it happens to be
in-lookup one (i.e. if it's been called from the parallel lookup).

That's pretty much it - all that remains is to switch ->i_mutex
to rwsem and have lookup_slow() take it shared.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-02 19:49:27 -04:00
Al Viro
94bdd655ca parallel lookups machinery, part 3
We will need to be able to check if there is an in-lookup
dentry with matching parent/name.  Right now it's impossible,
but as soon as start locking directories shared such beasts
will appear.

Add a secondary hash for locating those.  Hash chains go through
the same space where d_alias will be once it's not in-lookup anymore.
Search is done under the same bitlock we use for modifications -
with the primary hash we can rely on d_rehash() into the wrong
chain being the worst that could happen, but here the pointers are
buggered once it's removed from the chain.  On the other hand,
the chains are not going to be long and normally we'll end up
adding to the chain anyway.  That allows us to avoid bothering with
->d_lock when doing the comparisons - everything is stable until
removed from chain.

New helper: d_alloc_parallel().  Right now it allocates, verifies
that no hashed and in-lookup matches exist and adds to in-lookup
hash.

Returns ERR_PTR() for error, hashed match (in the unlikely case it's
been found) or new dentry.  In-lookup matches trigger BUG() for
now; that will change in the next commit when we introduce waiting
for ongoing lookup to finish.  Note that in-lookup matches won't be
possible until we actually go for shared locking.

lookup_slow() switched to use of d_alloc_parallel().

Again, these commits are separated only for making it easier to
review.  All this machinery will start doing something useful only
when we go for shared locking; it's just that the combination is
too large for my taste.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-02 19:49:27 -04:00
Al Viro
84e710da2a parallel lookups machinery, part 2
We'll need to verify that there's neither a hashed nor in-lookup
dentry with desired parent/name before adding to in-lookup set.

One possible solution would be to hold the parent's ->d_lock through
both checks, but while the in-lookup set is relatively small at any
time, dcache is not.  And holding the parent's ->d_lock through
something like __d_lookup_rcu() would suck too badly.

So we leave the parent's ->d_lock alone, which means that we watch
out for the following scenario:
	* we verify that there's no hashed match
	* existing in-lookup match gets hashed by another process
	* we verify that there's no in-lookup matches and decide
that everything's fine.

Solution: per-directory kinda-sorta seqlock, bumped around the times
we hash something that used to be in-lookup or move (and hash)
something in place of in-lookup.  Then the above would turn into
	* read the counter
	* do dcache lookup
	* if no matches found, check for in-lookup matches
	* if there had been none of those either, check if the
counter has changed; repeat if it has.

The "kinda-sorta" part is due to the fact that we don't have much spare
space in inode.  There is a spare word (shared with i_bdev/i_cdev/i_pipe),
so the counter part is not a problem, but spinlock is a different story.

We could use the parent's ->d_lock, and it would be less painful in
terms of contention, for __d_add() it would be rather inconvenient to
grab; we could do that (using lock_parent()), but...

Fortunately, we can get serialization on the counter itself, and it
might be a good idea in general; we can use cmpxchg() in a loop to
get from even to odd and smp_store_release() from odd to even.

This commit adds the counter and updating logics; the readers will be
added in the next commit.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-02 19:49:26 -04:00
Al Viro
85c7f81041 beginning of transition to parallel lookups - marking in-lookup dentries
marked as such when (would be) parallel lookup is about to pass them
to actual ->lookup(); unmarked when
	* __d_add() is about to make it hashed, positive or not.
	* __d_move() (from d_splice_alias(), directly or via
__d_unalias()) puts a preexisting dentry in its place
	* in caller of ->lookup() if it has escaped all of the
above.  Bug (WARN_ON, actually) if it reaches the final dput()
or d_instantiate() while still marked such.

As the result, we are guaranteed that for as long as the flag is
set, dentry will
	* remain negative unhashed with positive refcount
	* never have its ->d_alias looked at
	* never have its ->d_lru looked at
	* never have its ->d_parent and ->d_name changed

Right now we have at most one such for any given parent directory.
With parallel lookups that restriction will weaken to
	* only exist when parent is locked shared
	* at most one with given (parent,name) pair (comparison of
names is according to ->d_compare())
	* only exist when there's no hashed dentry with the same
(parent,name)

Transition will take the next several commits; unfortunately, we'll
only be able to switch to rwsem at the end of this series.  The
reason for not making it a single patch is to simplify review.

New primitives: d_in_lookup() (a predicate checking if dentry is in
the in-lookup state) and d_lookup_done() (tells the system that
we are done with lookup and if it's still marked as in-lookup, it
should cease to be such).

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-02 19:47:51 -04:00
Al Viro
84695ffee7 Merge getxattr prototype change into work.lookups
The rest of work.xattr stuff isn't needed for this branch
2016-05-02 19:45:47 -04:00
Stephen Boyd
5569aedf1d Merge tag 'v4.7-rockchip-clk3' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip into clk-next
Pull rockchip clk updates from Heiko Stuebner:

A spelling fix and a bunch of rk3399 clock fixes.

* tag 'v4.7-rockchip-clk3' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip:
  clk: rockchip: fix the rk3399 cifout clock
  clk: rockchip: drop unnecessary CLK_IGNORE_UNUSED flags from rk3399
  clk: rockchip: add some frequencies on the rk3399 PLL table
  clk: rockchip: assign more necessary rk3399 clock ids
  clk: rockchip: export some necessary rk3399 clock ids
  clk: rockchip: rename rga clock-id on rk3399
  clk: rockchip: add general gpu soft-reset on rk3399
  clk: rockchip: fix the gate bit for i2c4 and i2c8 on rk3399
  clk: rockchip: fix of spelling mistake on unsuccessful in pll clock type
2016-05-02 16:43:03 -07:00
Tom Herbert
79ecb90e65 ipv6: Generic tunnel cleanup
A few generic changes to generalize tunnels in IPv6:
  - Export ip6_tnl_change_mtu so that it can be called by ip6_gre
  - Add tun_hlen to ip6_tnl structure.

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-02 19:23:32 -04:00
Tom Herbert
182a352d2d gre: Create common functions for transmit
Create common functions for both IPv4 and IPv6 GRE in transmit. These
are put into gre.h.

Common functions are for:
  - GRE checksum calculation. Move gre_checksum to gre.h.
  - Building a GRE header. Move GRE build_header and rename
    gre_build_header.

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-02 19:23:31 -04:00
Tom Herbert
8eb30be035 ipv6: Create ip6_tnl_xmit
This patch renames ip6_tnl_xmit2 to ip6_tnl_xmit and exports it. Other
users like GRE will be able to call this. The original ip6_tnl_xmit
function is renamed to ip6_tnl_start_xmit (this is an ndo_start_xmit
function).

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-02 19:23:31 -04:00
Tom Herbert
95f5c64c3c gre: Move utility functions to common headers
Several of the GRE functions defined in net/ipv4/ip_gre.c are usable
for IPv6 GRE implementation (that is they are protocol agnostic).

These include:
  - GRE flag handling functions are move to gre.h
  - GRE build_header is moved to gre.h and renamed gre_build_header
  - parse_gre_header is moved to gre_demux.c and renamed gre_parse_header
  - iptunnel_pull_header is taken out of gre_parse_header. This is now
    done by caller. The header length is returned from gre_parse_header
    in an int* argument.

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-02 19:23:31 -04:00
Tom Herbert
0d3c703a9d ipv6: Cleanup IPv6 tunnel receive path
Some basic changes to make IPv6 tunnel receive path look more like
IPv4 path:
  - Make ip6_tnl_rcv non-static so that GREv6 and others can call it
  - Make ip6_tnl_rcv look like ip_tunnel_rcv
  - Switch to gro_cells_receive
  - Make ip6_tnl_rcv non-static and export it

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-02 19:23:31 -04:00
Eric Dumazet
d41a69f1d3 tcp: make tcp_sendmsg() aware of socket backlog
Large sendmsg()/write() hold socket lock for the duration of the call,
unless sk->sk_sndbuf limit is hit. This is bad because incoming packets
are parked into socket backlog for a long time.
Critical decisions like fast retransmit might be delayed.
Receivers have to maintain a big out of order queue with additional cpu
overhead, and also possible stalls in TX once windows are full.

Bidirectional flows are particularly hurt since the backlog can become
quite big if the copy from user space triggers IO (page faults)

Some applications learnt to use sendmsg() (or sendmmsg()) with small
chunks to avoid this issue.

Kernel should know better, right ?

Add a generic sk_flush_backlog() helper and use it right
before a new skb is allocated. Typically we put 64KB of payload
per skb (unless MSG_EOR is requested) and checking socket backlog
every 64KB gives good results.

As a matter of fact, tests with TSO/GSO disabled give very nice
results, as we manage to keep a small write queue and smaller
perceived rtt.

Note that sk_flush_backlog() maintains socket ownership,
so is not equivalent to a {release_sock(sk); lock_sock(sk);},
to ensure implicit atomicity rules that sendmsg() was
giving to (possibly buggy) applications.

In this simple implementation, I chose to not call tcp_release_cb(),
but we might consider this later.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-02 17:02:26 -04:00
Linus Torvalds
689de1d6ca Minimal fix-up of bad hashing behavior of hash_64()
This is a fairly minimal fixup to the horribly bad behavior of hash_64()
with certain input patterns.

In particular, because the multiplicative value used for the 64-bit hash
was intentionally bit-sparse (so that the multiply could be done with
shifts and adds on architectures without hardware multipliers), some
bits did not get spread out very much.  In particular, certain fairly
common bit ranges in the input (roughly bits 12-20: commonly with the
most information in them when you hash things like byte offsets in files
or memory that have block factors that mean that the low bits are often
zero) would not necessarily show up much in the result.

There's a bigger patch-series brewing to fix up things more completely,
but this is the fairly minimal fix for the 64-bit hashing problem.  It
simply picks a much better constant multiplier, spreading the bits out a
lot better.

NOTE! For 32-bit architectures, the bad old hash_64() remains the same
for now, since 64-bit multiplies are expensive.  The bigger hashing
cleanup will replace the 32-bit case with something better.

The new constants were picked by George Spelvin who wrote that bigger
cleanup series.  I just picked out the constants and part of the comment
from that series.

Cc: stable@vger.kernel.org
Cc: George Spelvin <linux@horizon.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-02 13:01:51 -07:00
Andrey Smirnov
4d31c6109a PCI: imx6: Implement reset sequence for i.MX6+
I.MX6+ has a dedicated bit for resetting PCIe core, which should be used
instead of a regular reset sequence since using the latter will hang the
SoC.

This commit is based on c34068d48273e24d392d9a49a38be807954420ed from
http://git.freescale.com/git/cgit.cgi/imx/linux-2.6-imx.git

Tested-by: Gary Bisson <gary.bisson@boundarydevices.com>
Signed-off-by: Andrey Smirnov <andrew.smirnov@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Fabio Estevam <fabio.estevam@nxp.com>
2016-05-02 14:33:17 -05:00
Linus Torvalds
9c5d1bc2b7 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) MODULE_FIRMWARE firmware string not correct for iwlwifi 8000 chips,
    from Sara Sharon.

 2) Fix SKB size checks in batman-adv stack on receive, from Sven
    Eckelmann.

 3) Leak fix on mac80211 interface add error paths, from Johannes Berg.

 4) Cannot invoke napi_disable() with BH disabled in myri10ge driver,
    fix from Stanislaw Gruszka.

 5) Fix sign extension problem when computing feature masks in
    net_gso_ok(), from Marcelo Ricardo Leitner.

 6) lan78xx driver doesn't count packets and packet lengths in its
    statistics properly, fix from Woojung Huh.

 7) Fix the buffer allocation sizes in pegasus USB driver, from Petko
    Manolov.

 8) Fix refcount overflows in bpf, from Alexei Starovoitov.

 9) Unified dst cache handling introduced a preempt warning in
    ip_tunnel, fix by resetting rather then setting the cached route.
    From Paolo Abeni.

10) Listener hash collision test fix in soreuseport, from Craig Gallak

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (47 commits)
  gre: do not pull header in ICMP error processing
  net: Implement net_dbg_ratelimited() for CONFIG_DYNAMIC_DEBUG case
  tipc: only process unicast on intended node
  cxgb3: fix out of bounds read
  net/smscx5xx: use the device tree for mac address
  soreuseport: Fix TCP listener hash collision
  net: l2tp: fix reversed udp6 checksum flags
  ip_tunnel: fix preempt warning in ip tunnel creation/updating
  samples/bpf: fix trace_output example
  bpf: fix check_map_func_compatibility logic
  bpf: fix refcnt overflow
  drivers: net: cpsw: use of_phy_connect() in fixed-link case
  dt: cpsw: phy-handle, phy_id, and fixed-link are mutually exclusive
  drivers: net: cpsw: don't ignore phy-mode if phy-handle is used
  drivers: net: cpsw: fix segfault in case of bad phy-handle
  drivers: net: cpsw: fix parsing of phy-handle DT property in dual_emac config
  MAINTAINERS: net: Change maintainer for GRETH 10/100/1G Ethernet MAC device driver
  gre: reject GUE and FOU in collect metadata mode
  pegasus: fixes reported packet length
  pegasus: fixes URB buffer allocation size;
  ...
2016-05-02 09:40:42 -07:00
William Breathitt Gray
d9a9c6172d isa: Implement the max_num_isa_dev macro
max_num_isa_dev is a macro to determine the maximum possible number of
ISA devices which may be registered in the I/O port address space given
the address extent of the ISA devices.

The highest base address possible for an ISA device is 0x3FF; this
results in 1024 possible base addresses. Dividing the number of possible
base addresses by the address extent taken by each device results in the
maximum number of devices on a system.

Signed-off-by: William Breathitt Gray <vilhelm.gray@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-05-02 09:32:04 -07:00
William Breathitt Gray
339e6e31d2 isa: Implement the module_isa_driver macro
The module_isa_driver macro is a helper macro for ISA drivers which do
not do anything special in module init/exit. This eliminates a lot of
boilerplate code. Each module may only use this macro once, and calling
it replaces module_init and module_exit.

Signed-off-by: William Breathitt Gray <vilhelm.gray@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-05-02 09:32:04 -07:00
Christoph Hellwig
38f2525533 block: add __blkdev_issue_discard
This is a version of blkdev_issue_discard which doesn't wait for
the I/O to complete, but instead allows the caller to submit
the final bio and/or chain it to others.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
Signed-off-by: Sagi Grimberg <sagig@grimberg.me>
Reviewed-by: Ming Lei <tom.leiming@gmail.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-05-02 09:19:46 -06:00
Wang Sheng-Hui
a5b714ad39 NVMe: correct comment for offset enum of controller registers in nvme.h
Section 3.1 gives the comment for the offset of controller registers
in the specification 1.2a.

Some are mis-copied in the header file nvme.h. Correct them.

Signed-off-by: Wang Sheng-Hui <shhuiw@foxmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-05-02 09:13:35 -06:00
Maarten Lankhorst
b837ba0ad9 drm/atomic: Rename drm_atomic_async_commit to nonblocking.
Another step in renaming async to nonblocking for atomic commit.

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1461679905-30177-3-git-send-email-maarten.lankhorst@linux.intel.com
2016-05-02 16:36:03 +02:00
Maarten Lankhorst
286dbb8d5d drm/atomic: Rename async parameter to nonblocking.
This is the first step of renaming async commit to nonblocking commit.
The flag passed by userspace is NONBLOCKING, and async has a different
meaning for page flips, where it means as soon as possible.

Fixing up comments in drm core is done manually, to make sure I didn't
miss anything.

For drivers, the following cocci script is used to rename bool async to bool
nonblock:
@@
identifier I =~ "^async";
identifier func;
@@
func(..., bool
- I
+ nonblock
, ...)
{
<...
- I
+ nonblock
...>
}
@@
identifier func;
type T;
identifier I =~ "^async";
@@
T func(..., bool
- I
+ nonblock
, ...);

Thanks to Tvrtko Ursulin for the cocci script.

Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1461679905-30177-2-git-send-email-maarten.lankhorst@linux.intel.com
2016-05-02 16:35:49 +02:00
Noralf Trønnes
199c77179c drm/fb-cma-helper: Add fb_deferred_io support
This adds fbdev deferred io support if CONFIG_FB_DEFERRED_IO is enabled.
The driver has to provide a (struct drm_framebuffer_funcs *)->dirty()
callback to get notification of fbdev framebuffer changes.
If the dirty() hook is set, then fb_deferred_io is set up automatically
by the helper.

Two functions have been added so that the driver can provide a dirty()
function:
- drm_fbdev_cma_init_with_funcs()
  This makes it possible for the driver to provided a custom
  (struct drm_fb_helper_funcs *)->fb_probe() function.
- drm_fbdev_cma_create_with_funcs()
  This is used by the .fb_probe hook to set a driver provided
  (struct drm_framebuffer_funcs *)->dirty() function.

Cc: laurent.pinchart@ideasonboard.com
Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1461856717-6476-6-git-send-email-noralf@tronnes.org
2016-05-02 16:25:08 +02:00
Noralf Trønnes
ba0263340a fbdev: fb_defio: Export fb_deferred_io_mmap
Export fb_deferred_io_mmap so drivers can change vma->vm_page_prot.
When the framebuffer memory is allocated using dma_alloc_writecombine()
instead of vmalloc(), I get cache syncing problems on ARM.
This solves it:

static int drm_fbdev_cma_deferred_io_mmap(struct fb_info *info,
					  struct vm_area_struct *vma)
{
	fb_deferred_io_mmap(info, vma);
	vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);

	return 0;
}

Could this have been done in the core?
Drivers that don't set (struct fb_ops *)->fb_mmap, gets a call to
fb_pgprotect() at the end of the default fb_mmap implementation
(drivers/video/fbdev/core/fbmem.c). This is an architecture specific
function that on many platforms uses pgprot_writecombine(), but not on
all. And looking at some of the fb_mmap implementations, some of them
sets vm_page_prot to nocache for instance, so I think the safest bet is
to do this in the driver and not in the fbdev core. And we can't call
fb_pgprotect() from fb_deferred_io_mmap() either because we don't have
access to the file pointer that powerpc needs.

Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
Acked-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1461856717-6476-5-git-send-email-noralf@tronnes.org
2016-05-02 16:24:49 +02:00