Commit Graph

48594 Commits

Author SHA1 Message Date
Lorenzo Pieralisi
d2a7926d42 PCI: Add pci_scan_root_bus_msi()
Add a pci_scan_root_bus_msi() interface so an arch can specify the MSI
controller up front.  This removes the need for a pcibios callback to set
the MSI controller later.

This is not exported because I'd like to replace the variety of "scan root
bus" interfaces with a single, more extensible interface that can handle
the MSI controller, domain, pci_ops, resources, etc.  I hope this interface
is temporary.
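
A rough sketch of how a host bridge driver might call the new entry
point, assuming the signature simply adds the MSI controller to the
usual pci_scan_root_bus() arguments; the local names (pdev, my_pci_ops,
sysdata, resources, msi_chip) are illustrative:

    struct pci_bus *bus;

    /* Hand the MSI controller over at scan time instead of patching
     * it in later from a pcibios callback. */
    bus = pci_scan_root_bus_msi(&pdev->dev, 0, &my_pci_ops, sysdata,
                                &resources, msi_chip);
    if (!bus)
            return -ENODEV;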

[bhelgaas: changelog, split into separate patch]
Suggested-by: Russell King <linux@arm.linux.org.uk>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Jingoo Han <jingoohan1@gmail.com>
2015-08-20 12:02:49 -05:00
Ingo Molnar
40a2ea1bd9 Merge branch 'perf/urgent' into perf/core, to pick up fixes before adding more changes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-20 11:48:56 +02:00
Tomi Valkeinen
6080a89357 fbdev: fix cea_modes array size
CEA defines 64 modes, indexed from 1 to 64. modedb has a cea_modes
array, which contains 64 entries. However, the code uses the CEA indices
directly, i.e. the first mode is at cea_modes[1]. This means the array
is one too short.

This does not cause references to uninitialized memory as the code in
fbmon only allows indexes up to 63, and the cea_modes does not contain
an entry for the mode 64 so it could not be used in any case.

However, the code contains a check 'if (idx > ARRAY_SIZE(cea_modes))',
and while that check is a no-op as at that point idx cannot be >= 63, it
upsets static checkers.

Fix this by increasing the cea_modes array size to 65 and changing the
code to allow mode 64.
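
A minimal sketch of the sizing rule described above; the table entries
are elided and cea_idx_valid() is a hypothetical helper, not code from
modedb.c:

    #include <linux/fb.h>
    #include <linux/kernel.h>

    /* CEA mode indices run from 1 to 64, so the table needs 65 slots;
     * slot 0 is simply unused. */
    static const struct fb_videomode cea_modes[65];

    static bool cea_idx_valid(unsigned int idx)
    {
            return idx >= 1 && idx < ARRAY_SIZE(cea_modes);
    }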

Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
2015-08-20 10:20:11 +03:00
Geert Uytterhoeven
7dfffb9541 dmaengine: Stricter legacy checking in dma_request_slave_channel_compat()
dma_request_slave_channel_compat() is meant for drivers that support
both DT and legacy platform device based probing: if DT channel DMA
setup fails, it will fall back to platform data based DMA channel setup,
using hardcoded DMA channel IDs and a filter function.

However, if the DTS doesn't provide a "dmas" property for the device,
the fallback is also used. If the legacy filter function is not
hardcoded in the DMA slave driver, but comes from platform data, it will
be NULL. Then dma_request_slave_channel_compat() will succeed
incorrectly, and return a DMA channel, as a NULL legacy filter function
actually means "all channels are OK", not "do not match".

Later, when trying to use that DMA channel, it will fail with:

    rcar-dmac e6700000.dma-controller: rcar_dmac_prep_slave_sg: bad parameter: len=1, id=-22

To fix this, ensure that both the filter function and the DMA channel ID
are not NULL before using the legacy fallback.

Note that some DMA slave drivers can handle this failure, and will fall
back to PIO.

See also commit 056f6c8702 ("dmaengine: shdma: Make dummy
shdma_chan_filter() always return false"), which fixed the same issue
for the case where shdma_chan_filter() is hardcoded in a DMA slave
driver.
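
The stricter check boils down to roughly the following shape of the
compat helper (a sketch; the argument names follow the existing
dmaengine API):

    static inline struct dma_chan *dma_request_slave_channel_compat(
                    const dma_cap_mask_t mask, dma_filter_fn fn,
                    void *fn_param, struct device *dev, const char *name)
    {
            struct dma_chan *chan;

            chan = dma_request_slave_channel(dev, name);
            if (chan)
                    return chan;

            /* Refuse the legacy fallback unless both the filter function
             * and the channel ID from platform data are present. */
            if (!fn || !fn_param)
                    return NULL;

            return __dma_request_channel(&mask, fn, fn_param);
    }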

Suggested-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
2015-08-20 12:01:03 +05:30
Trond Myklebust
2a606188c5 NFSv4: Enable delegated opens even when reboot recovery is pending
Unlike the previous attempt, this takes into account the fact that
we may be calling it from the recovery thread itself. Detect this
by looking at what kind of open we're doing, and checking the state
of the NFS_DELEGATION_NEED_RECLAIM flag if it turns out we're doing a
reboot reclaim-type open.

Cc: Olga Kornievskaia <aglo@umich.edu>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-19 23:01:54 -05:00
Grygorii Strashko
b7560de198 genirq: Introduce irq_chip_set_type_parent() helper
This helper is required for irq chips which do not implement an
irq_set_type callback and need to call down the irq domain hierarchy
for the actual trigger type change.

This helper is required to fix further wreckage caused by the
conversion of TI OMAP to hierarchical irq domains and is therefore
tagged for stable.
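
The helper essentially just forwards the request to the parent domain;
roughly:

    int irq_chip_set_type_parent(struct irq_data *data, unsigned int type)
    {
            data = data->parent_data;

            if (data->chip->irq_set_type)
                    return data->chip->irq_set_type(data, type);

            return -ENOSYS;
    }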

[ tglx: Massaged changelog ]

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Cc: <linux@arm.linux.org.uk>
Cc: <nsekhar@ti.com>
Cc: <jason@lakedaemon.net>
Cc: <balbi@ti.com>
Cc: <linux-arm-kernel@lists.infradead.org>
Cc: <tony@atomide.com>
Cc: <marc.zyngier@arm.com>
Cc: stable@vger.kernel.org # 4.1
Link: http://lkml.kernel.org/r/1439554830-19502-3-git-send-email-grygorii.strashko@ti.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-08-20 00:25:25 +02:00
Keith Busch
03100aada9 block: Replace SG_GAPS with new queue limits mask
The SG_GAPS queue flag caused checks for bio vector alignment against
PAGE_SIZE, but the device may have different constraints. This patch
adds a queue limit so a driver with such constraints can set it to
allow requests that would otherwise have been unnecessarily split. The
new gaps check
takes the request_queue as a parameter to simplify the logic around
invoking this function.

This new limit makes the queue flag redundant, so remove it and all
its usage. Device-mappers will inherit the correct settings through
blk_stack_limits().
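
Assuming the new limit is exposed through a blk_queue_virt_boundary()
style setter, as in the mainline version of this change, a driver whose
SG elements must not cross a 4K boundary would express it roughly as:

    /* Require each bio vector to stay within a 4K-aligned window;
     * the merge code compares vectors against queue_virt_boundary(q)
     * instead of the old SG_GAPS flag. */
    blk_queue_virt_boundary(q, PAGE_SIZE - 1);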

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-08-19 14:26:02 -07:00
Tejun Heo
69d7fde590 blkcg: use CGROUP_WEIGHT_* scale for io.weight on the unified hierarchy
cgroup is trying to make the interface consistent across different
controllers.  For weight-based resource control, the knob should have
the range [1, 10000] and default to 100.  This patch updates
cfq-iosched so that the weight range conforms.  The internal
calculations have enough range and the widening of the weight range
shouldn't cause any problem.

* blkcg_policy->cpd_bind_fn() is added.  If present, this is invoked
  when blkcg is attached to a hierarchy.

* cfq_cpd_init() is updated to use the new default value on the
  unified hierarchy.

* cfq_cpd_bind() callback is implemented to clear per-blkg configs and
  apply the default config matching the hierarchy type.

* cfqd->root_group->[leaf_]weight initialization in cfq_init_queue()
  is moved into !CONFIG_CFQ_GROUP_IOSCHED block.  cfq_cpd_bind() is
  now responsible for initializing the initial weights when blkcg is
  enabled.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Arianna Avanzini <avanzini.arianna@gmail.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-08-18 15:49:36 -07:00
Tejun Heo
2ee867dcfa blkcg: implement interface for the unified hierarchy
blkcg interface grew to be the biggest of all controllers and
unfortunately most inconsistent too.  The interface files are
inconsistent with a number of close duplicates.  Some files have
recursive variants while others don't.  There's distinction between
normal and leaf weights which isn't intuitive and there are a lot of
stat knobs which don't make much sense outside of debugging and expose
too much implementation details to userland.

In the unified hierarchy, everything is always hierarchical and
internal nodes can't have tasks, rendering moot the two structural
issues twisting the current interface.  The interface has to be updated
in a significant way anyway, and this is a good chance to revamp it as
a whole.  This patch implements the blkcg interface for the unified
hierarchy.

* (from a previous patch) blkcg is identified by "io" instead of
  "blkio" on the unified hierarchy.  Given that the whole interface is
  updated anyway, the rename shouldn't carry noticeable conversion
  overhead.

* The original interface, which consisted of 27 files, is replaced
  with the following three files.

  io.stat	: per-blkcg stats
  io.weight	: per-cgroup and per-cgroup-queue weight settings
  io.max	: per-cgroup-queue bps and iops max limits

Documentation/cgroups/unified-hierarchy.txt updated accordingly.

v2: blkcg_policy->dfl_cftypes wasn't removed on
    blkcg_policy_unregister() corrupting the cftypes list.  Fixed.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-08-18 15:49:35 -07:00
Tejun Heo
dd165eb3bb blkcg: misc preparations for unified hierarchy interface
* Export blkg_dev_name()

* Drop unnecessary @cft from __cfq_set_weight().

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-08-18 15:49:18 -07:00
Tejun Heo
36aa9e5f59 blkcg: move body parsing from blkg_conf_prep() to its callers
Currently, blkg_conf_prep() expects input to be of the following form

 MAJ:MIN NUM

and reads the NUM part into blkg_conf_ctx->v.  This is quite
restrictive and gets in the way in implementing blkcg interface for
the unified hierarchy.  This patch updates blkg_conf_prep() so that it
expects

 MAJ:MIN BODY_STR

where BODY_STR is an arbitrary string.  blkg_conf_ctx->v is replaced
with ->body which is a char pointer pointing to the start of BODY_STR.
Parsing of the body is moved to blkg_conf_prep()'s callers.

To allow using, for example, strsep() on blkg_conf_ctx->body, it is a
non-const pointer, and to accommodate that, const is dropped from
@input too.

This doesn't cause any behavior changes.
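
A sketch of what a caller looks like after the change; the buffer name
and the parsing details are illustrative, only ctx.body and the
prep/finish pairing come from the description above:

    struct blkg_conf_ctx ctx;
    u64 v;
    int ret;

    ret = blkg_conf_prep(blkcg, pol, buf, &ctx);
    if (ret)
            return ret;

    /* ctx.body points at BODY_STR; the caller parses it itself now. */
    ret = kstrtoull(strsep(&ctx.body, " "), 0, &v);

    blkg_conf_finish(&ctx);
    return ret;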

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-08-18 15:49:18 -07:00
Tejun Heo
880f50e228 blkcg: mark existing cftypes as legacy
blkcg is about to grow interface for the unified hierarchy.  Add
legacy to existing cftypes.

* blkcg_policy->cftypes -> blkcg_policy->legacy_cftypes
* blk-cgroup.c:blkcg_files -> blkcg_legacy_files
* cfq-iosched.c:cfq_blkcg_files -> cfq_blkcg_legacy_files
* blk-throttle.c:throtl_files -> throtl_legacy_files

Pure renames.  No functional change.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-08-18 15:49:18 -07:00
Tejun Heo
c165b3e3c7 blkcg: rename subsystem name from blkio to io
blkio interface has become messy over time and is currently the
largest.  In addition to the inconsistent naming scheme, it has
multiple stat files which report more or less the same thing, a number
of debug stat files which expose internal details which shouldn't have
been part of the public interface in the first place, recursive and
non-recursive stats and leaf and non-leaf knobs.

Both recursive vs. non-recursive and leaf vs. non-leaf distinctions
don't make any sense on the unified hierarchy as only leaf cgroups can
contain processes.  cgroups is going through a major interface
revision with the unified hierarchy involving significant fundamental
usage changes and given that a significant portion of the interface
doesn't make sense anymore, it's a good time to reorganize the
interface.

As the first step, this patch renames the external visible subsystem
name from "blkio" to "io".  This is more concise, matches the other
two major subsystem names, "cpu" and "memory", and better suited as
blkcg will be involved in anything writeback related too whether an
actual block device is involved or not.

As the subsystem legacy_name is set to "blkio", the only userland
visible change outside the unified hierarchy is that blkcg is reported
as "io" instead of "blkio" in the subsystem initialized message during
boot.  On the unified hierarchy, blkcg now appears as "io".

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: cgroups@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-08-18 15:49:18 -07:00
Tejun Heo
77ea733884 blkcg: move io_service_bytes and io_serviced stats into blkcg_gq
Currently, both cfq-iosched and blk-throttle keep track of
io_service_bytes and io_serviced stats.  While keeping track of them
separately may be useful during development, it doesn't make much
sense otherwise.  Also, blk-throttle was counting bios as IOs while
cfq-iosched counted requests, which is more confusing than informative.

This patch adds ->stat_bytes and ->stat_ios to blkg (blkcg_gq),
removes the counterparts from cfq-iosched and blk-throttle, and lets
them print from the common blkg counters.  The common counters are
incremented during bio issue in blkcg_bio_issue_check().

The outputs are still filtered by whether the policy has
blkg_policy_data on a given blkg, so cfq's output won't show up if it
has never been used for a given blkg.  The only times when the outputs
would differ significantly are when policies are attached on the fly
or elevators are switched back and forth.  Those are quite exceptional
operations and I don't think they warrant keeping separate counters.

v3: Update blkio-controller.txt accordingly.

v2: Account IOs during bio issues instead of request completions so
    that bio-based drivers can be handled the same way.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-08-18 15:49:17 -07:00
Tejun Heo
f12c74cab1 blkcg: make blkg_[rw]stat_recursive_sum() to be able to index into blkcg_gq
Currently, blkg_[rw]stat_recursive_sum() assume that the target
counter is located in pd (blkg_policy_data); however, some counters
are planned to be moved to blkg (blkcg_gq).

This patch updates blkg_[rw]stat_recursive_sum() to take blkg and
blkg_policy pointers instead of pd.  If policy is NULL, it indexes
into blkg.  If non-NULL, into the blkg's pd of the policy.

The existing usages are updated to maintain the current behaviors.
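
With the new arguments, callers look roughly like this (a sketch
assuming the signature takes the blkg, the policy and a counter offset;
stat_bytes is the blkcg_gq counter added elsewhere in this series):

    struct blkg_rwstat sum;

    /* counter embedded directly in blkcg_gq: pass a NULL policy */
    sum = blkg_rwstat_recursive_sum(blkg, NULL,
                                    offsetof(struct blkcg_gq, stat_bytes));

    /* counter living in a policy's per-blkg data */
    sum = blkg_rwstat_recursive_sum(blkg, &blkcg_policy_cfq, off);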

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-08-18 15:49:17 -07:00
Tejun Heo
24bdb8ef06 blkcg: make blkcg_[rw]stat per-cpu
blkcg_[rw]stat are used as stat counters for blkcg policies.  They
aren't per-cpu by themselves, and blk-throttle makes them per-cpu by
wrapping around them.  This patch makes blkcg_[rw]stat per-cpu and
drops the ad-hoc per-cpu wrapping in blk-throttle.

* blkg_[rw]stat->cnt is replaced with cpu_cnt which is struct
  percpu_counter.  This makes syncp unnecessary as remote accesses are
  handled by percpu_counter itself.

* blkg_[rw]stat_init() can now fail due to percpu allocation failure
  and thus are updated to return int.

* percpu_counters need explicit freeing.  blkg_[rw]stat_exit() added.

* As blkg_rwstat->cpu_cnt[] can't be read directly anymore, reading
  and summing results are stored in ->aux_cnt[] instead.

* Custom per-cpu stat implementation in blk-throttle is removed.

This makes all blkcg stat counters per-cpu without complicating policy
implementations.
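
Roughly what the init/exit pair looks like once the counters wrap
percpu_counter (a sketch based on the description above):

    static inline int blkg_stat_init(struct blkg_stat *stat, gfp_t gfp)
    {
            /* can fail now, since per-cpu storage must be allocated */
            return percpu_counter_init(&stat->cpu_cnt, 0, gfp);
    }

    static inline void blkg_stat_exit(struct blkg_stat *stat)
    {
            percpu_counter_destroy(&stat->cpu_cnt);
    }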

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-08-18 15:49:17 -07:00
Tejun Heo
e6269c4454 blkcg: add blkg_[rw]stat->aux_cnt and replace cfq_group->dead_stats with it
cgroup stats are local to each cgroup and don't propagate to
ancestors by default.  When recursive stats are necessary, the sum is
calculated over all the descendants.  This initially was for backward
compatibility to support both group-local and recursive stats, but this
mode of operation makes general sense as stat updates are much hotter
than reporting those stats.

This however ends up losing recursive stats when a child is removed.
To work around this, cfq-iosched adds its stats to its parent
cfq_group->dead_stats which is summed up together when calculating
recursive stats.

It's planned that the core stats will be moved to blkcg_gq, so we want
to move the mechanism for keeping track of the stats of dead children
from cfq to blkcg core.  This patch adds blkg_[rw]stat->aux_cnt which
are atomic64_t's keeping track of auxiliary counts which are excluded
when reading local counts but included for recursive.

blkg_[rw]stat_merge() which were used by cfq to implement dead_stats
are replaced by blkg_[rw]stat_add_aux(), and cfq now forwards stats of
a dead cgroup to the aux counts of parent->stats instead of separate
->dead_stats.

This will also help making blkg_[rw]stats per-cpu.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-08-18 15:49:17 -07:00
Tejun Heo
ae11889636 blkcg: consolidate blkg creation in blkcg_bio_issue_check()
blkg (blkcg_gq) currently is created by blkcg policies invoking
blkg_lookup_create() which ends up repeating about the same code in
different policies.  Theoretically, this can avoid the overhead of
looking and/or creating blkg's if blkcg is enabled but no policy is in
use; however, the cost of blkg lookup / creation is very low
especially if only the root blkcg is in use which is highly likely if
no blkcg policy is in active use - it boils down to a single very
predictable conditional and surrounding RCU protection.

This patch consolidates blkg creation to a new function
blkcg_bio_issue_check() which is called during bio issue from
generic_make_request_checks().  blkcg_bio_issue_check() is now the
only function which tries to create missing blkg's.  The subsequent
policy and request_list operations just perform blkg_lookup() and, if
it is missing, fall back to the root.

* blk_get_rl() no longer tries to create blkg.  It uses blkg_lookup()
  instead of blkg_lookup_create().

* blk_throtl_bio() is now called from blkcg_bio_issue_check() with rcu
  read locked and blkg already looked up.  Both throtl_lookup_tg() and
  throtl_lookup_create_tg() are dropped.

* cfq is similarly updated.  cfq_lookup_create_cfqg() is replaced with
  cfq_lookup_cfqg(), which uses blkg_lookup().

This consolidates blkg handling and avoids unnecessary blkg creation
retries under memory pressure.  In addition, this provides a common
bio entry point into blkcg where things like common accounting can be
performed.
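
Structurally, the new entry point looks something like the following
(a simplified sketch; error handling and stat accounting are trimmed):

    static inline bool blkcg_bio_issue_check(struct request_queue *q,
                                             struct bio *bio)
    {
            struct blkcg *blkcg;
            struct blkcg_gq *blkg;
            bool throtl;

            rcu_read_lock();
            blkcg = bio_blkcg(bio);

            blkg = blkg_lookup(blkcg, q);
            if (unlikely(!blkg)) {
                    spin_lock_irq(q->queue_lock);
                    blkg = blkg_lookup_create(blkcg, q);
                    if (IS_ERR(blkg))
                            blkg = NULL;    /* fall back to the root */
                    spin_unlock_irq(q->queue_lock);
            }

            throtl = blk_throtl_bio(q, blkg, bio);
            rcu_read_unlock();

            return !throtl;
    }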

v2: Build fixes for !CONFIG_CFQ_GROUP_IOSCHED and
    !CONFIG_BLK_DEV_THROTTLING.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Arianna Avanzini <avanzini.arianna@gmail.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-08-18 15:49:17 -07:00
Tejun Heo
85b6bc9db6 blkcg: move root blkg lookup optimization from throtl_lookup_tg() to __blkg_lookup()
Currently, both throttle and cfq policies implement their own root
blkg (blkcg_gq) lookup fast path.  This patch moves root blkg
optimization from throtl_lookup_tg() to __blkg_lookup().  cfq-iosched
currently doesn't use blkg_lookup() but will be converted and drop the
optimization too.
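
The moved optimization amounts to a short-circuit at the top of
__blkg_lookup(), roughly:

    /* the root blkg always exists, so the root cgroup can skip the
     * hint check and the radix tree walk entirely */
    if (blkcg == &blkcg_root)
            return q->root_blkg;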

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Arianna Avanzini <avanzini.arianna@gmail.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-08-18 15:49:17 -07:00
Tejun Heo
24f290466f blkcg: inline [__]blkg_lookup()
blkg_lookup() checks whether the target queue is bypassing and, if
not, calls __blkg_lookup(), which first checks the lookup hint and then
performs a radix tree walk.  The operations up to hint checking are
trivial and there are many users of this function.  This patch inlines
blkg_lookup() and the fast path part of __blkg_lookup().  The radix
tree lookup and hint update are now in blkg_lookup_slowpath().

This will help consolidate blkg handling by making it easier to move
the root blkcg short-circuit into the inlined lookup fast path.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Arianna Avanzini <avanzini.arianna@gmail.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-08-18 15:49:17 -07:00
Tejun Heo
e4a9bde958 blkcg: replace blkcg_policy->cpd_size with ->cpd_alloc/free_fn() methods
Each active policy has a cpd (blkcg_policy_data) on each blkcg.  The
cpd's were allocated by blkcg core and each policy could request to
allocate extra space at the end by setting blkcg_policy->cpd_size
larger than the size of cpd.

This is a bit unusual but blkg (blkcg_gq) policy data used to be
handled this way too so it made sense to be consistent; however, blkg
policy data switched to alloc/free callbacks.

This patch makes similar changes to cpd handling.
blkcg_policy->cpd_alloc/free_fn() are added to replace ->cpd_size.  As
cpd allocation is now done from policy side, it can simply allocate a
larger area which embeds cpd at the beginning.

As ->cpd_alloc_fn() may be able to perform all necessary
initializations, this patch makes ->cpd_init_fn() optional.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Arianna Avanzini <avanzini.arianna@gmail.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-08-18 15:49:17 -07:00
Tejun Heo
814376483e blkcg: minor updates around blkcg_policy_data
* Rename blkcg->pd[] to blkcg->cpd[] so that cpd is consistently used
  for blkcg_policy_data.

* Make blkcg_policy->cpd_init_fn() take blkcg_policy_data instead of
  blkcg.  This makes it consistent with blkg_policy_data methods and
  to-be-added cpd alloc/free methods.

* blkcg_policy_data->blkcg and cpd_to_blkcg() added so that
  cpd_init_fn() can determine the associated blkcg from
  blkcg_policy_data.

v2: blkcg_policy_data->blkcg initializations were missing.  Added.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Arianna Avanzini <avanzini.arianna@gmail.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-08-18 15:49:17 -07:00
Tejun Heo
a9520cd6f2 blkcg: make blkcg_policy methods take a pointer to blkcg_policy_data
The newly added ->pd_alloc_fn() and ->pd_free_fn() deal with pd
(blkg_policy_data) while the older ones use blkg (blkcg_gq).  Using
blkg doesn't make sense for ->pd_alloc_fn(), after allocation pd can
always be mapped to blkg, and these are policy-specific methods, so it
makes sense to converge on pd.

This patch makes all methods deal with pd instead of blkg.  Most
conversions are trivial.  In blk-cgroup.c, a couple method invocation
sites now test whether pd exists instead of policy state for
consistency.  This shouldn't cause any behavioral differences.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-08-18 15:49:17 -07:00
Tejun Heo
b2ce2643cc blk-throttle: clean up blkg_policy_data alloc/init/exit/free methods
With the recent addition of alloc and free methods, things became
messier.  This patch reorganizes them as follows.

* ->pd_alloc_fn()

  Responsible for allocation and static initializations - the ones
  which can be done independent of where the pd might be attached.

* ->pd_init_fn()

  Initializations which require the knowledge of where the pd is
  attached.

* ->pd_free_fn()

  The counterpart of pd_alloc_fn().  Static de-init and freeing.

This leaves ->pd_exit_fn() without any users.  Removed.

While at it, collapse the one-liner function throtl_pd_exit(), which
has only one user, into its user.
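
For blk-throttle the resulting split looks roughly like this in the
policy descriptor (other members omitted; the field names follow the
methods described above):

    static struct blkcg_policy blkcg_policy_throtl = {
            .pd_alloc_fn    = throtl_pd_alloc,  /* allocation + static init */
            .pd_init_fn     = throtl_pd_init,   /* needs the attachment point */
            .pd_free_fn     = throtl_pd_free,   /* undoes pd_alloc_fn() */
    };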

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-08-18 15:49:17 -07:00
Tejun Heo
001bea73e7 blkcg: replace blkcg_policy->pd_size with ->pd_alloc/free_fn() methods
A blkg (blkcg_gq) represents the relationship between a cgroup and
request_queue.  Each active policy has a pd (blkg_policy_data) on each
blkg.  The pd's were allocated by blkcg core and each policy could
request to allocate extra space at the end by setting
blkcg_policy->pd_size larger than the size of pd.

This is a bit unusual but was done this way mostly to simplify error
handling and all the existing use cases could be handled this way;
however, this is becoming too restrictive now that percpu memory can
be allocated without blocking.

This introduces two new mandatory blkcg_policy methods - pd_alloc_fn()
and pd_free_fn() - which are used to allocate and release pd for a
given policy.  As pd allocation is now done from policy side, it can
simply allocate a larger area which embeds pd at the beginning.  This
change makes ->pd_size pointless.  Removed.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-08-18 15:49:16 -07:00
Tejun Heo
4c55f4f9ad blkcg: restructure blkg_policy_data allocation in blkcg_activate_policy()
When a policy gets activated, it needs to allocate and install its
policy data on all existing blkg's (blkcg_gq's).  Because blkg
iteration is protected by a spinlock, it currently counts the total
number of blkg's in the system, allocates the matching number of
policy data on a list and installs them during a single iteration.

This can be simplified by using speculative GFP_NOWAIT allocations
while iterating and falling back to a preallocated policy data on
failure.  If the preallocated one has already been consumed, it
releases the lock, preallocates with GFP_KERNEL and then restarts the
iteration.  This can be a bit more expensive than before but policy
activation is a very cold path and shouldn't matter.
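
The allocation pattern described above looks roughly like this inside
blkcg_activate_policy() (a simplified sketch):

    struct blkg_policy_data *pd_prealloc = NULL;
    struct blkcg_gq *blkg;

pd_prealloc:
    if (!pd_prealloc) {
            pd_prealloc = pol->pd_alloc_fn(GFP_KERNEL, q->node);
            if (!pd_prealloc)
                    return -ENOMEM;
    }

    spin_lock_irq(q->queue_lock);
    list_for_each_entry(blkg, &q->blkg_list, q_node) {
            struct blkg_policy_data *pd;

            pd = pol->pd_alloc_fn(GFP_NOWAIT, q->node);
            if (!pd)
                    swap(pd, pd_prealloc);      /* consume the spare */
            if (!pd) {                          /* spare is gone too */
                    spin_unlock_irq(q->queue_lock);
                    goto pd_prealloc;           /* refill and restart */
            }
            blkg->pd[pol->plid] = pd;
    }
    spin_unlock_irq(q->queue_lock);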

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-08-18 15:49:16 -07:00
Tejun Heo
401efbf835 blkcg: remove unnecessary request_list->blkg NULL test in blk_put_rl()
Since ec13b1d6f0 ("blkcg: always create the blkcg_gq for the root
blkcg"), a request_list always has its blkg associated.  Drop
unnecessary rl->blkg NULL test from blk_put_rl().

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-08-18 15:49:16 -07:00
Tejun Heo
9acee9c551 kernfs: implement kernfs_path_len()
Add a function to determine the path length of a kernfs node.  For
now this will be used by the writeback tracepoint updates.
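
Typical use is sizing a buffer before generating the path (a sketch;
the surrounding tracepoint plumbing is omitted):

    size_t len = kernfs_path_len(kn);
    char *buf = kmalloc(len + 1, GFP_KERNEL);

    if (buf)
            kernfs_path(kn, buf, len + 1);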

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-08-18 15:49:15 -07:00
Tejun Heo
1ed8d48c57 writeback: bdi_for_each_wb() iteration is memcg ID based not blkcg
wb's (bdi_writeback's) are currently keyed by memcg ID; however, in an
earlier implementation, wb's were keyed by blkcg ID.
bdi_for_each_wb() walks bdi->cgwb_tree in the ascending ID order and
allows iterations to start from an arbitrary ID which is used to
interrupt and resume iterations.

Unfortunately, while changing wb to be keyed by memcg ID instead of
blkcg, bdi_for_each_wb() was missed and is still assuming that wb's
are keyed by blkcg ID.  This doesn't affect iterations which don't get
interrupted but bdi_split_work_to_wbs() makes use of iteration
resuming on allocation failures and thus may incorrectly skip or
repeat wb's.

Fix it by changing bdi_for_each_wb() to take memcg IDs instead of
blkcg IDs and updating bdi_split_work_to_wbs() accordingly.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-08-18 15:49:15 -07:00
Jens Axboe
11743ee047 Merge branch 'for-4.3-unified-base' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup into for-4.3/blkcg
2015-08-18 15:49:11 -07:00
Tejun Heo
3e1d2eed39 cgroup: introduce cgroup_subsys->legacy_name
This allows cgroup subsystems to use a different name on the unified
hierarchy.  cgroup_subsys->name is used on the unified hierarchy,
->legacy_name elsewhere.  If ->legacy_name is not explicitly set, it's
automatically set to ->name and the userland visible behavior remains
unchanged.

v2: Make parse_cgroupfs_options() only consider ->legacy_name as mount
    options are used only on legacy hierarchies.  Suggested by Li
    Zefan.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: cgroups@vger.kernel.org
2015-08-18 13:58:16 -07:00
Olof Johansson
a005bc6f69 Merge tag 'renesas-clk-for-v4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas into next/drivers
Renesas ARM Based SoC CPG/MSTP Clock Driver Updates for v4.3

* Add Clock Domain support to the Clock Pulse Generator
  (CPG) Module Stop (MSTP) Clocks driver using the generic PM Domain.

* tag 'renesas-clk-for-v4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas:
  clk: shmobile: rz: Add CPG/MSTP Clock Domain support
  clk: shmobile: rcar-gen2: Add CPG/MSTP Clock Domain support
  clk: shmobile: r8a7779: Add CPG/MSTP Clock Domain support
  clk: shmobile: r8a7778: Add CPG/MSTP Clock Domain support
  clk: shmobile: Add CPG/MSTP Clock Domain support

Signed-off-by: Olof Johansson <olof@lixom.net>
2015-08-18 13:30:18 -07:00
Jeff Moyer
d2be537c3b block: bump BLK_DEF_MAX_SECTORS to 2560
A value of 2560 (1280k) will accommodate a 10-data-disk stripe
write with chunk size 128k.  In the testing I've done using
iozone, fio, and aio-stress across a number of different storage
devices, a value of 1280 does not show a big performance
difference from 512, but will hopefully help software RAID
setups using SATA disks, as reported by Christoph.

NOTE: drivers/block/aoe/aoeblk.c sets its own max_hw_sectors_kb to
BLK_DEF_MAX_SECTORS.  So, this patch essentially changes aoeblk to
use a larger maximum sector size, and I did not test this.
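
For reference, the arithmetic behind the numbers: 2560 sectors * 512
bytes = 1280 KiB, which is exactly one full stripe of 10 data disks *
128 KiB chunks.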

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-08-18 13:21:15 -07:00
Jeff Moyer
30e2bc08b2 Revert "block: remove artifical max_hw_sectors cap"
This reverts commit 34b48db66e.
That commit caused performance regressions for streaming I/O
workloads on a number of different storage devices, from
SATA disks to external RAID arrays.  It also managed to
trip up some buggy firmware in at least one drive, causing
data corruption.

The next patch will bump the default max_sectors_kb value to
1280, which will accommodate a 10-data-disk stripe write
with chunk size 128k.  In the testing I've done using iozone,
fio, and aio-stress, a value of 1280 does not show a big
performance difference from 512.  This will hopefully still
help the software RAID setup that Christoph saw the original
performance gains with while still not regressing other
storage configurations.

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-08-18 13:21:13 -07:00
Olof Johansson
207b504a63 Merge tag 'imx-soc-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux into next/soc
The i.MX SoC changes for 4.3:
 - Add i.MX6 Ultralite SoC support, which is the newest addition to
   i.MX6 family.  It integrates a single Cortex-A7 core and a power
   management module that reduces the complexity of external power
   supply and simplifies power sequencing.
 - Change SNVS RTC driver to use syscon interface for register access,
   and add SNVS power key driver support.
 - Add a second clock for mxc rtc driver, and support device tree probe
   for the driver.
 - Add FEC MAC reference clock and phy fixup initialization for i.MX6UL
   platform.

* tag 'imx-soc-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux:
  rtc: snvs: select option REGMAP_MMIO
  ARM: imx6ul: add fec MAC refrence clock and phy fixup init
  ARM: imx6ul: add fec bits to GPR syscon definition
  rtc: mxc: add support of device tree
  dt-binding: document the binding for mxc rtc
  rtc: mxc: use a second rtc clock
  input: snvs_pwrkey: use "wakeup-source" as deivce tree property name
  Document: devicetree: input: imx: i.mx snvs power device tree bindings
  input: keyboard: imx: add snvs power key driver
  Document: dt: fsl: snvs: change support syscon
  rtc: snvs: use syscon to access register
  ARM: imx: add low-level debug support for i.mx6ul
  ARM: imx: add i.mx6ul msl support

Signed-off-by: Olof Johansson <olof@lixom.net>
2015-08-18 13:10:05 -07:00
Jon Derrick
81f03fedcc NVMe: Add nvme subsystem reset IOCTL
Controllers can perform optional subsystem resets as introduced in NVMe
1.1. This patch adds an IOCTL to trigger the subsystem reset by writing
"NVMe" to the NSSR register.

Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-08-18 11:56:13 -06:00
Keith Busch
dfbac8c7ac NVMe: Add nvme subsystem reset support
Controllers part of an NVMe subsystem may be reset by any other controller
in the subsystem. If the device is capable of subsystem resets, this
patch adds detection for such events and performs appropriate controller
initialization upon subsystem reset detection.

The register bit is an RW1C type, so the driver needs to write a 1 to the
status bit to clear the subsystem reset occurred bit during initialization.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-08-18 11:56:11 -06:00
Greg Kroah-Hartman
475908bc4d Revert "usb: interface authorization: Declare authorized attribute"
This reverts commit 484ebaedec as the
signed-off-by address is invalid.

Cc: Stefan Koch <stefan.koch10@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-08-18 09:59:12 -07:00
Greg Kroah-Hartman
12e1a6a0f1 Revert "usb: interface authorization: Introduces the default interface authorization"
This reverts commit 1d958bef45 as the
signed-off-by address is invalid.

Cc: Stefan Koch <stefan.koch10@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-08-18 09:58:45 -07:00
Greg Kroah-Hartman
a1b93ab715 Revert "usb: interface authorization: Use a flag for the default device authorization"
This reverts commit 3cf1fc8065 as the
signed-off-by address is invalid.

Cc: Stefan Koch <stefan.koch10@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-08-18 09:56:04 -07:00
Ingo Molnar
a5dd192496 Merge branch 'x86/urgent' into x86/asm to fix up conflicts and to pick up fixes
Conflicts:
	arch/x86/entry/entry_64_compat.S
	arch/x86/math-emu/get_address.c

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-18 09:39:47 +02:00
Linus Walleij
74f4e0cc61 bcma: switch GPIO portions to use GPIOLIB_IRQCHIP
This switches the BCMA GPIO driver to use GPIOLIB_IRQCHIP to
handle its interrupts instead of rolling its own copy of the
irqdomain handling etc.

Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2015-08-18 09:08:47 +03:00
Kalle Valo
15f6d96ded Merge tag 'mac80211-next-for-davem-2015-08-14' mac80211-next.git
iwlwifi needs new mac80211 patches so merge mac80211-next.git to
wireless-drivers-next.git.
2015-08-18 08:44:22 +03:00
Dave Chinner
c7f5408493 inode: rename i_wb_list to i_io_list
There's a small consistency problem between the inode and writeback
naming. Writeback calls the "for IO" inode queues b_io and
b_more_io, but the inode calls these the "writeback list" or
i_wb_list. This makes it hard to add a new "under writeback" list to
the inode, or call it an "under IO" list on the bdi because either
way we'll have writeback on IO and IO on writeback and it'll just be
confusing. I'm getting confused just writing this!

So, rename the inode "for IO" list variable to i_io_list so we can
add a new "writeback list" in a subsequent patch.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: Dave Chinner <dchinner@redhat.com>
2015-08-17 23:38:10 -04:00
Linus Torvalds
4e7fca0d0a Merge branch 'for-4.2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata
Pull libata fixes from Tejun Heo:
 "Three minor device-specific fixes and revert of NCQ autosense added
  during this -rc1.

  It turned out that NCQ autosense as currently implemented interferes
  with the usual error handling behavior.  It will be revisited in the
  near future"

* 'for-4.2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
  ata: ahci_brcmstb: Fix misuse of IS_ENABLED
  sata_sx4: Check return code from pdc20621_i2c_read()
  Revert "libata: Implement NCQ autosense"
  Revert "libata: Implement support for sense data reporting"
  Revert "libata-eh: Set 'information' field for autosense"
  ata: ahci_brcmstb: Fix warnings with CONFIG_PM_SLEEP=n
2015-08-17 16:20:45 -07:00
David Ahern
808d28c468 net: Fix docbook warning for IFF_VRF_MASTER enum
kbuild test robot reported:
tree:   git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git master
head:   d52736e24f
commit: 4e3c89920c [751/762] net: Introduce VRF related flags and helpers
reproduce: make htmldocs

>> Warning(include/linux/netdevice.h:1293): Enum value 'IFF_VRF_MASTER' not described in enum 'netdev_priv_flags'

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-17 15:56:35 -07:00
David Ahern
2f52bdcf6b net: Updates to netif_index_is_vrf
As Eric noted, netif_index_is_vrf is not called with rcu_read_lock held,
so wrap the dev_get_by_index_rcu() call in rcu_read_lock and unlock.

If VRF is not enabled or oif is 0 skip the device lookup. In both cases
index cannot be the VRF master.
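
After the update the helper looks roughly like this (a sketch, assuming
the existing netif_is_vrf() IFF_VRF_MASTER test):

    static inline bool netif_index_is_vrf(struct net *net, int ifindex)
    {
            bool rc = false;

    #if IS_ENABLED(CONFIG_NET_VRF)
            struct net_device *dev;

            if (ifindex == 0)
                    return false;

            rcu_read_lock();
            dev = dev_get_by_index_rcu(net, ifindex);
            if (dev)
                    rc = netif_is_vrf(dev);
            rcu_read_unlock();
    #endif
            return rc;
    }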

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-17 15:55:15 -07:00
Achiad Shochat
3c2d18ef22 net/mlx5e: Support ethtool get/set_pauseparam
Only rx/tx pause settings.
Autoneg setting is currently not supported.

Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-17 15:51:36 -07:00
Achiad Shochat
6fa1bcab6b net/mlx5e: Ethtool link speed setting fixes
- Port speed settings are applied by the device only upon
  port admin status transition from DOWN to UP.
  So we enforce this transition regardless of the port's
  current operation state (which may be occasionally DOWN if
  for example the network cable is disconnected).
- Fix the PORT_UP/DOWN device interface enum
- Set the local_port bit in the device PAOS register
- EXPORT the PAOS (Port Administrative and Operational Status)
  register set/query access functions.

Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-17 15:51:35 -07:00
Dave Chinner
e97fedb9ef sync: serialise per-superblock sync operations
When competing sync(2) calls walk the same filesystem, they need to
walk the list of inodes on the superblock to find all the inodes
that we need to wait for IO completion on. However, when multiple
wait_sb_inodes() calls do this at the same time, they contend on the
inode_sb_list_lock and the contention causes system wide
slowdowns. In effect, concurrent sync(2) calls can take longer and
burn more CPU than if they were serialised.

Stop the worst of the contention by adding a per-sb mutex to wrap
around wait_sb_inodes() so that we only execute one sync(2) IO
completion walk per superblock at a time and hence avoid
contention being triggered by concurrent sync(2) calls.
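
The serialisation itself is just a mutex wrapped around the walk,
roughly as below (a sketch; s_sync_lock is the field name used by the
mainline version of this patch, and the per-inode waiting is elided):

    mutex_lock(&sb->s_sync_lock);   /* one wait_sb_inodes() walk per sb */
    spin_lock(&inode_sb_list_lock);
    /* walk sb->s_inodes, waiting for writeback on each dirty inode */
    spin_unlock(&inode_sb_list_lock);
    mutex_unlock(&sb->s_sync_lock);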

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: Dave Chinner <dchinner@redhat.com>
2015-08-17 18:39:47 -04:00