Commit Graph

54535 Commits

Author SHA1 Message Date
Ingo Molnar
c930b2c0de sched/core: Convert ___assert_task_state() link time assert to BUILD_BUG_ON()
The length of TASK_STATE_TO_CHAR_STR was still checked using the old
link-time manual error method - convert it to BUILD_BUG_ON(). This
has a couple of advantages:

 - it's more obvious what's going on

 - it reduces the size and complexity of <linux/sched.h>

 - BUILD_BUG_ON() will fail during compilation, with a clearer
   error message than the link time assert.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:23 +01:00
Ingo Molnar
9ccd27cc2e sched/headers: Make all include/linux/sched/*.h headers build standalone
Make each header self-sufficient, so that it can be built successfully
both in an allnoconfig and allyesconfig kernel.

Also standardize the naming of their header guards.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:23 +01:00
Johannes Berg
eb1e011a14 average: change to declare precision, not factor
Declaring the factor is counter-intuitive, and people are prone
to using small(-ish) values even when that makes no sense.

Change the DECLARE_EWMA() macro to take the fractional precision,
in bits, rather than a factor, and update all users.

While at it, add some more documentation.

Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-03-02 08:32:46 +01:00
Linus Torvalds
4977ab6e92 Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull objtool relocation fixes from Ingo Molnar:
 "Two fixes related to the module loading regression introduced by the
  recent objtool changes"

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  objtool, modules: Discard objtool annotation sections for modules
  objtool, compiler.h: Fix __unreachable section relocation size
2017-03-01 17:04:50 -08:00
Linus Torvalds
8f03cf50bc Merge tag 'nfs-for-4.11-1' of git://git.linux-nfs.org/projects/anna/linux-nfs
Pull NFS client updates from Anna Schumaker:
 "Highlights include:

  Stable bugfixes:
   - NFSv4: Fix memory and state leak in _nfs4_open_and_get_state
   - xprtrdma: Fix Read chunk padding
   - xprtrdma: Per-connection pad optimization
   - xprtrdma: Disable pad optimization by default
   - xprtrdma: Reduce required number of send SGEs
   - nlm: Ensure callback code also checks that the files match
   - pNFS/flexfiles: If the layout is invalid, it must be updated before
     retrying
   - NFSv4: Fix reboot recovery in copy offload
   - Revert "NFSv4.1: Handle NFS4ERR_BADSESSION/NFS4ERR_DEADSESSION
     replies to OP_SEQUENCE"
   - NFSv4: fix getacl head length estimation
   - NFSv4: fix getacl ERANGE for sum ACL buffer sizes

  Features:
   - Add and use dprintk_cont macros
   - Various cleanups to NFS v4.x to reduce code duplication and
     complexity
   - Remove unused cr_magic related code
   - Improvements to sunrpc "read from buffer" code
   - Clean up sunrpc timeout code and allow changing TCP timeout
     parameters
   - Remove duplicate mw_list management code in xprtrdma
   - Add generic functions for encoding and decoding xdr streams

  Bugfixes:
   - Clean up nfs_show_mountd_netid
   - Make layoutreturn_ops static and use NULL instead of 0 to fix
     sparse warnings
   - Properly handle -ERESTARTSYS in nfs_rename()
   - Check if register_shrinker() failed during rpcauth_init()
   - Properly clean up procfs/pipefs entries
   - Various NFS over RDMA related fixes
   - Silence unititialized variable warning in sunrpc"

* tag 'nfs-for-4.11-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (64 commits)
  NFSv4: fix getacl ERANGE for some ACL buffer sizes
  NFSv4: fix getacl head length estimation
  Revert "NFSv4.1: Handle NFS4ERR_BADSESSION/NFS4ERR_DEADSESSION replies to OP_SEQUENCE"
  NFSv4: Fix reboot recovery in copy offload
  pNFS/flexfiles: If the layout is invalid, it must be updated before retrying
  NFSv4: Clean up owner/group attribute decode
  SUNRPC: Add a helper function xdr_stream_decode_string_dup()
  NFSv4: Remove bogus "struct nfs_client" argument from decode_ace()
  NFSv4: Fix the underestimation of delegation XDR space reservation
  NFSv4: Replace callback string decode function with a generic
  NFSv4: Replace the open coded decode_opaque_inline() with the new generic
  NFSv4: Replace ad-hoc xdr encode/decode helpers with xdr_stream_* generics
  SUNRPC: Add generic helpers for xdr_stream encode/decode
  sunrpc: silence uninitialized variable warning
  nlm: Ensure callback code also checks that the files match
  sunrpc: Allow xprt->ops->timer method to sleep
  xprtrdma: Refactor management of mw_list field
  xprtrdma: Handle stale connection rejection
  xprtrdma: Properly recover FRWRs with in-flight FASTREG WRs
  xprtrdma: Shrink send SGEs array
  ...
2017-03-01 16:10:30 -08:00
Linus Torvalds
25c4e6c3f0 Merge tag 'for-f2fs-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
 "This round introduces several interesting features such as on-disk NAT
  bitmaps, IO alignment, and a discard thread. And it includes a couple
  of major bug fixes as below.

  Enhancements:

   - introduce on-disk bitmaps to avoid scanning NAT blocks when getting
     free nids

   - support IO alignment to prepare open-channel SSD integration in
     future

   - introduce a discard thread to avoid long latency during checkpoint
     and fstrim

   - use SSR for warm node and enable inline_xattr by default

   - introduce in-memory bitmaps to check FS consistency for debugging

   - improve write_begin by avoiding needless read IO

  Bug fixes:

   - fix broken zone_reset behavior for SMR drive

   - fix wrong victim selection policy during GC

   - fix missing behavior when preparing discard commands

   - fix bugs in atomic write support and fiemap

   - workaround to handle multiple f2fs_add_link calls having same name

  ... and it includes a bunch of clean-up patches as well"

* tag 'for-f2fs-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (97 commits)
  f2fs: avoid to flush nat journal entries
  f2fs: avoid to issue redundant discard commands
  f2fs: fix a plint compile warning
  f2fs: add f2fs_drop_inode tracepoint
  f2fs: Fix zoned block device support
  f2fs: remove redundant set_page_dirty()
  f2fs: fix to enlarge size of write_io_dummy mempool
  f2fs: fix memory leak of write_io_dummy mempool during umount
  f2fs: fix to update F2FS_{CP_}WB_DATA count correctly
  f2fs: use MAX_FREE_NIDS for the free nids target
  f2fs: introduce free nid bitmap
  f2fs: new helper cur_cp_crc() getting crc in f2fs_checkpoint
  f2fs: update the comment of default nr_pages to skipping
  f2fs: drop the duplicate pval in f2fs_getxattr
  f2fs: Don't update the xattr data that same as the exist
  f2fs: kill __is_extent_same
  f2fs: avoid bggc->fggc when enough free segments are avaliable after cp
  f2fs: select target segment with closer temperature in SSR mode
  f2fs: show simple call stack in fault injection message
  f2fs: no need lock_op in f2fs_write_inline_data
  ...
2017-03-01 15:55:04 -08:00
David Howells
0837e49ab3 KEYS: Differentiate uses of rcu_dereference_key() and user_key_payload()
rcu_dereference_key() and user_key_payload() are currently being used in
two different, incompatible ways:

 (1) As a wrapper to rcu_dereference() - when only the RCU read lock used
     to protect the key.

 (2) As a wrapper to rcu_dereference_protected() - when the key semaphor is
     used to protect the key and the may be being modified.

Fix this by splitting both of the key wrappers to produce:

 (1) RCU accessors for keys when caller has the key semaphore locked:

	dereference_key_locked()
	user_key_payload_locked()

 (2) RCU accessors for keys when caller holds the RCU read lock:

	dereference_key_rcu()
	user_key_payload_rcu()

This should fix following warning in the NFS idmapper

  ===============================
  [ INFO: suspicious RCU usage. ]
  4.10.0 #1 Tainted: G        W
  -------------------------------
  ./include/keys/user-type.h:53 suspicious rcu_dereference_protected() usage!
  other info that might help us debug this:
  rcu_scheduler_active = 2, debug_locks = 0
  1 lock held by mount.nfs/5987:
    #0:  (rcu_read_lock){......}, at: [<d000000002527abc>] nfs_idmap_get_key+0x15c/0x420 [nfsv4]
  stack backtrace:
  CPU: 1 PID: 5987 Comm: mount.nfs Tainted: G        W       4.10.0 #1
  Call Trace:
    dump_stack+0xe8/0x154 (unreliable)
    lockdep_rcu_suspicious+0x140/0x190
    nfs_idmap_get_key+0x380/0x420 [nfsv4]
    nfs_map_name_to_uid+0x2a0/0x3b0 [nfsv4]
    decode_getfattr_attrs+0xfac/0x16b0 [nfsv4]
    decode_getfattr_generic.constprop.106+0xbc/0x150 [nfsv4]
    nfs4_xdr_dec_lookup_root+0xac/0xb0 [nfsv4]
    rpcauth_unwrap_resp+0xe8/0x140 [sunrpc]
    call_decode+0x29c/0x910 [sunrpc]
    __rpc_execute+0x140/0x8f0 [sunrpc]
    rpc_run_task+0x170/0x200 [sunrpc]
    nfs4_call_sync_sequence+0x68/0xa0 [nfsv4]
    _nfs4_lookup_root.isra.44+0xd0/0xf0 [nfsv4]
    nfs4_lookup_root+0xe0/0x350 [nfsv4]
    nfs4_lookup_root_sec+0x70/0xa0 [nfsv4]
    nfs4_find_root_sec+0xc4/0x100 [nfsv4]
    nfs4_proc_get_rootfh+0x5c/0xf0 [nfsv4]
    nfs4_get_rootfh+0x6c/0x190 [nfsv4]
    nfs4_server_common_setup+0xc4/0x260 [nfsv4]
    nfs4_create_server+0x278/0x3c0 [nfsv4]
    nfs4_remote_mount+0x50/0xb0 [nfsv4]
    mount_fs+0x74/0x210
    vfs_kern_mount+0x78/0x220
    nfs_do_root_mount+0xb0/0x140 [nfsv4]
    nfs4_try_mount+0x60/0x100 [nfsv4]
    nfs_fs_mount+0x5ec/0xda0 [nfs]
    mount_fs+0x74/0x210
    vfs_kern_mount+0x78/0x220
    do_mount+0x254/0xf70
    SyS_mount+0x94/0x100
    system_call+0x38/0xe0

Reported-by: Jan Stancek <jstancek@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Jan Stancek <jstancek@redhat.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
2017-03-02 10:09:00 +11:00
Josh Poimboeuf
e390f9a968 objtool, modules: Discard objtool annotation sections for modules
The '__unreachable' and '__func_stack_frame_non_standard' sections are
only used at compile time.  They're discarded for vmlinux but they
should also be discarded for modules.

Since this is a recurring pattern, prefix the section names with
".discard.".  It's a nice convention and vmlinux.lds.h already discards
such sections.

Also remove the 'a' (allocatable) flag from the __unreachable section
since it doesn't make sense for a discarded section.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Jessica Yu <jeyu@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: d1091c7fa3 ("objtool: Improve detection of BUG() and other dead ends")
Link: http://lkml.kernel.org/r/20170301180444.lhd53c5tibc4ns77@treble
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-01 20:32:25 +01:00
Linus Torvalds
544a068f1f Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux
Pull thermal management updates from Zhang Rui:

 - add thermal driver for R-Car Gen3 thermal sensors.

 - add thermal driver for ZTE' zx2967 family thermal sensors.

 - convert thermal ID allocation from IDR to IDA.

 - fix a possible NULL dereference in imx thermal driver.

 - fix a ti-soc-thermal driver dependency issue so that critical thermal
   control is still available when CPU_THERMAL is not defined.

 - update binding information for QorIQ thermal driver.

 - a couple of cleanups in thermal core, intel_powerclamp, exynos,
   dra752-thermal, mtk-thermal driver.

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux:
  powerpc/mpc85xx: Update TMU device tree node for T1023/T1024
  powerpc/mpc85xx: Update TMU device tree node for T1040/T1042
  dt-bindings: Update QorIQ TMU thermal bindings
  thermal: mtk_thermal: Staticise a number of data variables
  thermal: arm: dra752: Remove all TSHUT related definitions
  thermal: arm: dra752: Remove TSHUT configuration
  thermal: ti-soc-thermal: Remove CPU_THERMAL Dependency from TI_THERMAL
  thermal: imx: Fix possible NULL dereference.
  thermal: exynos: Remove parsing unused samsung,tmu_cal_mode property
  thermal: zx2967: add thermal driver for ZTE's zx2967 family
  thermal: use cpumask_var_t for on-stack cpu masks
  dt: bindings: add documentation for zx2967 family thermal sensor
  thermal/intel_powerclamp: Remove set-but-not-used variables
  thermal: rcar_gen3_thermal: Add R-Car Gen3 thermal driver
  thermal: rcar_gen3_thermal: Document the R-Car Gen3
  thermal: convert devfreq_cooling to use an IDA
  thermal: convert cpu_cooling to use an IDA
  thermal: convert clock cooling to use an IDA
  thermal core: convert ID allocation to IDA
2017-03-01 09:54:32 -08:00
Eric Dumazet
39e6c8208d net: solve a NAPI race
While playing with mlx4 hardware timestamping of RX packets, I found
that some packets were received by TCP stack with a ~200 ms delay...

Since the timestamp was provided by the NIC, and my probe was added
in tcp_v4_rcv() while in BH handler, I was confident it was not
a sender issue, or a drop in the network.

This would happen with a very low probability, but hurting RPC
workloads.

A NAPI driver normally arms the IRQ after the napi_complete_done(),
after NAPI_STATE_SCHED is cleared, so that the hard irq handler can grab
it.

Problem is that if another point in the stack grabs NAPI_STATE_SCHED bit
while IRQ are not disabled, we might have later an IRQ firing and
finding this bit set, right before napi_complete_done() clears it.

This can happen with busy polling users, or if gro_flush_timeout is
used. But some other uses of napi_schedule() in drivers can cause this
as well.

thread 1                                 thread 2 (could be on same cpu, or not)

// busy polling or napi_watchdog()
napi_schedule();
...
napi->poll()

device polling:
read 2 packets from ring buffer
                                          Additional 3rd packet is
available.
                                          device hard irq

                                          // does nothing because
NAPI_STATE_SCHED bit is owned by thread 1
                                          napi_schedule();

napi_complete_done(napi, 2);
rearm_irq();

Note that rearm_irq() will not force the device to send an additional
IRQ for the packet it already signaled (3rd packet in my example)

This patch adds a new NAPI_STATE_MISSED bit, that napi_schedule_prep()
can set if it could not grab NAPI_STATE_SCHED

Then napi_complete_done() properly reschedules the napi to make sure
we do not miss something.

Since we manipulate multiple bits at once, use cmpxchg() like in
sk_busy_loop() to provide proper transactions.

In v2, I changed napi_watchdog() to use a relaxed variant of
napi_schedule_prep() : No need to set NAPI_STATE_MISSED from this point.

In v3, I added more details in the changelog and clears
NAPI_STATE_MISSED in busy_poll_stop()

In v4, I added the ideas given by Alexander Duyck in v3 review

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Alexander Duyck <alexander.duyck@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-01 09:50:58 -08:00
Dan Carpenter
b2d0fe3547 net/mlx4: && vs & typo
Bitwise & was obviously intended here.

Fixes: 745d8ae462 ("net/mlx4: Spoofcheck and zero MAC can't coexist")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-01 09:50:58 -08:00
Linus Torvalds
545b2820c4 Merge tag 'pwm/for-4.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm
Pull pwm updates from Thierry Reding:
 "This set contains mostly fixes to existing drivers as well as cleanup
  of code that's not been in active use for a while"

* tag 'pwm/for-4.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm: (27 commits)
  acpi: lpss: call pwm_add_table() for BSW PWM device
  pwm: Try to load modules during pwm_get()
  pwm: Don't hold pwm_lookup_lock longer than necessary
  pwm: Make the PWM_POLARITY flag in DTB optional
  pwm: Print error messages with pr_err() instead of pr_debug()
  pwm: imx: Add polarity inversion support to i.MX's PWMv2
  pwm: imx: doc: Update imx-pwm.txt documentation entry
  pwm: imx: Remove redundant i.MX PWMv2 code
  pwm: imx: Provide atomic PWM support for i.MX PWMv2
  pwm: imx: Move PWMv2 wait for fifo slot code to a separate function
  pwm: imx: Move PWMv2 software reset code to a separate function
  pwm: imx: Rewrite v1 code to facilitate switch to atomic PWM
  pwm: imx: Add separate set of PWM ops for v1 and v2
  pwm: imx: Remove ipg clock and enable per clock when required
  pwm: lpss: Add Intel Gemini Lake PCI ID
  pwm: lpss: Do not export board infos for different PWM types
  pwm: lpss: Avoid reconfiguring while UPDATE bit is still enabled
  pwm: lpss: Switch to new atomic API
  pwm: lpss: Allow duty cycle to be 0
  pwm: lpss: Avoid potential overflow of base_unit
  ...
2017-03-01 09:46:02 -08:00
Elena Reshetova
e3736c3eb3 kvm: convert kvm.users_count from atomic_t to refcount_t
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Windsor <dwindsor@gmail.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-03-01 17:03:21 +01:00
Josh Poimboeuf
55149d0653 objtool, compiler.h: Fix __unreachable section relocation size
Linus reported the following commit broke module loading on his laptop:

  d1091c7fa3 ("objtool: Improve detection of BUG() and other dead ends")

It showed errors like the following:

  module: overflow in relocation type 10 val ffffffffc02afc81
  module: 'nvme' likely not compiled with -mcmodel=kernel

The problem is that the __unreachable section addresses are stored using
the '.long' asm directive, which isn't big enough for .text section
kernel addresses.  Use relative addresses instead:

  ".long %c0b - .\t\n"

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: d1091c7fa3 ("objtool: Improve detection of BUG() and other dead ends")
Link: http://lkml.kernel.org/r/20170301060504.oltm3iws6fmubnom@treble
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-01 14:20:18 +01:00
Dan Williams
86ef58a4e3 nfit, libnvdimm: fix interleave set cookie calculation
The interleave-set cookie is a sum that sanity checks the composition of
an interleave set has not changed from when the namespace was initially
created.  The checksum is calculated by sorting the DIMMs by their
location in the interleave-set. The comparison for the sort must be
64-bit wide, not byte-by-byte as performed by memcmp() in the broken
case.

Fix the implementation to accept correct cookie values in addition to
the Linux "memcmp" order cookies, but only allow correct cookies to be
generated going forward. It does mean that namespaces created by
third-party-tooling, or created by newer kernels with this fix, will not
validate on older kernels. However, there are a couple mitigating
conditions:

    1/ platforms with namespace-label capable NVDIMMs are not widely
       available.

    2/ interleave-sets with a single-dimm are by definition not affected
       (nothing to sort). This covers the QEMU-KVM NVDIMM emulation case.

The cookie stored in the namespace label will be fixed by any write the
namespace label, the most straightforward way to achieve this is to
write to the "alt_name" attribute of a namespace in sysfs.

Cc: <stable@vger.kernel.org>
Fixes: eaf961536e ("libnvdimm, nfit: add interleave-set state-tracking infrastructure")
Reported-by: Nicholas Moulin <nicholas.w.moulin@linux.intel.com>
Tested-by: Nicholas Moulin <nicholas.w.moulin@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-03-01 00:49:42 -08:00
Linus Torvalds
2d6be4abf5 Merge tag 'for-linus-4.11' of git://git.code.sf.net/p/openipmi/linux-ipmi
Pull IPMI updates from Corey Minyard:
 "This is a few small fixes to the main IPMI driver, make some things
  const, fix typos, etc.

  The last patch came in about a week ago, but IMHO it's best to go in
  now. It is not for the main driver, it's for the bt-bmc driver, which
  runs on the managment controller side, not on the host side, so the
  scope is limited and the change is necessary"

* tag 'for-linus-4.11' of git://git.code.sf.net/p/openipmi/linux-ipmi:
  ipmi: bt-bmc: Use a regmap for register access
  char: ipmi: constify ipmi_smi_handlers structures
  acpi:ipmi: Make IPMI user handler const
  ipmi: make ipmi_usr_hndl const
  Documentation: Fix a typo in IPMI.txt.
2017-02-28 21:06:30 -08:00
Linus Torvalds
cf393195c3 Merge branch 'idr-4.11' of git://git.infradead.org/users/willy/linux-dax
Pull IDR rewrite from Matthew Wilcox:
 "The most significant part of the following is the patch to rewrite the
  IDR & IDA to be clients of the radix tree. But there's much more,
  including an enhancement of the IDA to be significantly more space
  efficient, an IDR & IDA test suite, some improvements to the IDR API
  (and driver changes to take advantage of those improvements), several
  improvements to the radix tree test suite and RCU annotations.

  The IDR & IDA rewrite had a good spin in linux-next and Andrew's tree
  for most of the last cycle. Coupled with the IDR test suite, I feel
  pretty confident that any remaining bugs are quite hard to hit. 0-day
  did a great job of watching my git tree and pointing out problems; as
  it hit them, I added new test-cases to be sure not to be caught the
  same way twice"

Willy goes on to expand a bit on the IDR rewrite rationale:
 "The radix tree and the IDR use very similar data structures.

  Merging the two codebases lets us share the memory allocation pools,
  and results in a net deletion of 500 lines of code. It also opens up
  the possibility of exposing more of the features of the radix tree to
  users of the IDR (and I have some interesting patches along those
  lines waiting for 4.12)

  It also shrinks the size of the 'struct idr' from 40 bytes to 24 which
  will shrink a fair few data structures that embed an IDR"

* 'idr-4.11' of git://git.infradead.org/users/willy/linux-dax: (32 commits)
  radix tree test suite: Add config option for map shift
  idr: Add missing __rcu annotations
  radix-tree: Fix __rcu annotations
  radix-tree: Add rcu_dereference and rcu_assign_pointer calls
  radix tree test suite: Run iteration tests for longer
  radix tree test suite: Fix split/join memory leaks
  radix tree test suite: Fix leaks in regression2.c
  radix tree test suite: Fix leaky tests
  radix tree test suite: Enable address sanitizer
  radix_tree_iter_resume: Fix out of bounds error
  radix-tree: Store a pointer to the root in each node
  radix-tree: Chain preallocated nodes through ->parent
  radix tree test suite: Dial down verbosity with -v
  radix tree test suite: Introduce kmalloc_verbose
  idr: Return the deleted entry from idr_remove
  radix tree test suite: Build separate binaries for some tests
  ida: Use exceptional entries for small IDAs
  ida: Move ida_bitmap to a percpu variable
  Reimplement IDR and IDA using the radix tree
  radix-tree: Add radix_tree_iter_delete
  ...
2017-02-28 20:29:41 -08:00
Linus Torvalds
8313064c2e Merge tag 'nfsd-4.11' of git://linux-nfs.org/~bfields/linux
Pull nfsd updates from Bruce Fields:
 "The nfsd update this round is mainly a lot of miscellaneous cleanups
  and bugfixes.

  A couple changes could theoretically break working setups on upgrade.
  I don't expect complaints in practice, but they seem worth calling out
  just in case:

   - NFS security labels are now off by default; a new security_label
     export flag reenables it per export. But, having them on by default
     is a disaster, as it generally only makes sense if all your clients
     and servers have similar enough selinux policies. Thanks to Jason
     Tibbitts for pointing this out.

   - NFSv4/UDP support is off. It was never really supported, and the
     spec explicitly forbids it. We only ever left it on out of
     laziness; thanks to Jeff Layton for finally fixing that"

* tag 'nfsd-4.11' of git://linux-nfs.org/~bfields/linux: (34 commits)
  nfsd: Fix display of the version string
  nfsd: fix configuration of supported minor versions
  sunrpc: don't register UDP port with rpcbind when version needs congestion control
  nfs/nfsd/sunrpc: enforce transport requirements for NFSv4
  sunrpc: flag transports as having congestion control
  sunrpc: turn bitfield flags in svc_version into bools
  nfsd: remove superfluous KERN_INFO
  nfsd: special case truncates some more
  nfsd: minor nfsd_setattr cleanup
  NFSD: Reserve adequate space for LOCKT operation
  NFSD: Get response size before operation for all RPCs
  nfsd/callback: Drop a useless data copy when comparing sessionid
  nfsd/callback: skip the callback tag
  nfsd/callback: Cleanup callback cred on shutdown
  nfsd/idmap: return nfserr_inval for 0-length names
  SUNRPC/Cache: Always treat the invalid cache as unexpired
  SUNRPC: Drop all entries from cache_detail when cache_purge()
  svcrdma: Poll CQs in "workqueue" mode
  svcrdma: Combine list fields in struct svc_rdma_op_ctxt
  svcrdma: Remove unused sc_dto_q field
  ...
2017-02-28 15:39:09 -08:00
Linus Torvalds
b2deee2dc0 Merge tag 'ceph-for-4.11-rc1' of git://github.com/ceph/ceph-client
Pull ceph updates from Ilya Dryomov:
 "This time around we have:

   - support for rbd data-pool feature, which enables rbd images on
     erasure-coded pools (myself). CEPH_PG_MAX_SIZE has been bumped to
     allow erasure-coded profiles with k+m up to 32.

   - a patch for ceph_d_revalidate() performance regression introduced
     in 4.9, along with some cleanups in the area (Jeff Layton)

   - a set of fixes for unsafe ->d_parent accesses in CephFS (Jeff
     Layton)

   - buffered reads are now processed in rsize windows instead of rasize
     windows (Andreas Gerstmayr). The new default for rsize mount option
     is 64M.

   - ack vs commit distinction is gone, greatly simplifying ->fsync()
     and MOSDOpReply handling code (myself)

  ... also a few filesystem bug fixes from Zheng, a CRUSH sync up (CRUSH
  computations are still serialized though) and several minor fixes and
  cleanups all over"

* tag 'ceph-for-4.11-rc1' of git://github.com/ceph/ceph-client: (52 commits)
  libceph, rbd, ceph: WRITE | ONDISK -> WRITE
  libceph: get rid of ack vs commit
  ceph: remove special ack vs commit behavior
  ceph: tidy some white space in get_nonsnap_parent()
  crush: fix dprintk compilation
  crush: do is_out test only if we do not collide
  ceph: remove req from unsafe list when unregistering it
  rbd: constify device_type structure
  rbd: kill obj_request->object_name and rbd_segment_name_cache
  rbd: store and use obj_request->object_no
  rbd: RBD_V{1,2}_DATA_FORMAT macros
  rbd: factor out __rbd_osd_req_create()
  rbd: set offset and length outside of rbd_obj_request_create()
  rbd: support for data-pool feature
  rbd: introduce rbd_init_layout()
  rbd: use rbd_obj_bytes() more
  rbd: remove now unused rbd_obj_request_wait() and helpers
  rbd: switch rbd_obj_method_sync() to ceph_osdc_call()
  libceph: pass reply buffer length through ceph_osdc_call()
  rbd: do away with obj_request in rbd_obj_read_sync()
  ...
2017-02-28 15:36:09 -08:00
Linus Torvalds
74efe07bc3 Merge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fixes from Ingo Molnar:
 "The main change is the uninlining of large refcount_t APIs, plus a
  header dependency fix.

  Note that the uninlining allowed us to enable the underflow/overflow
  warnings unconditionally and remove the debug Kconfig switch: this
  might trigger new warnings in buggy code and turn
  crashes/use-after-free bugs into less harmful memory leaks"

* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  locking/refcounts: Add missing kernel.h header to have UINT_MAX defined
  locking/refcounts: Out-of-line everything
2017-02-28 10:44:16 -08:00
Linus Torvalds
e72e58faa7 Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull objtool fixes from Ingo Molnar:
 "A handful of objtool fixes related to unreachable code, plus a build
  fix for out of tree modules"

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  objtool: Enclose contents of unreachable() macro in a block
  objtool: Prevent GCC from merging annotate_unreachable()
  objtool: Improve detection of BUG() and other dead ends
  objtool: Fix CONFIG_STACK_VALIDATION=y warning for out-of-tree modules
2017-02-28 10:15:59 -08:00
Linus Torvalds
86292b33d4 Merge branch 'akpm' (patches from Andrew)
Merge yet more updates from Andrew Morton:

 - a few MM remainders

 - misc things

 - autofs updates

 - signals

 - affs updates

 - ipc

 - nilfs2

 - spelling.txt updates

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (78 commits)
  mm, x86: fix HIGHMEM64 && PARAVIRT build config for native_pud_clear()
  mm: add arch-independent testcases for RODATA
  hfs: atomically read inode size
  mm: clarify mm_struct.mm_{users,count} documentation
  mm: use mmget_not_zero() helper
  mm: add new mmget() helper
  mm: add new mmgrab() helper
  checkpatch: warn when formats use %Z and suggest %z
  lib/vsprintf.c: remove %Z support
  scripts/spelling.txt: add some typo-words
  scripts/spelling.txt: add "followings" pattern and fix typo instances
  scripts/spelling.txt: add "therfore" pattern and fix typo instances
  scripts/spelling.txt: add "overwriten" pattern and fix typo instances
  scripts/spelling.txt: add "overwritting" pattern and fix typo instances
  scripts/spelling.txt: add "deintialize(d)" pattern and fix typo instances
  scripts/spelling.txt: add "disassocation" pattern and fix typo instances
  scripts/spelling.txt: add "omited" pattern and fix typo instances
  scripts/spelling.txt: add "explictely" pattern and fix typo instances
  scripts/spelling.txt: add "applys" pattern and fix typo instances
  scripts/spelling.txt: add "configuartion" pattern and fix typo instances
  ...
2017-02-27 23:09:29 -08:00
Josh Poimboeuf
4e4636cf98 objtool: Enclose contents of unreachable() macro in a block
Guenter Roeck reported a boot failure in mips64.  It was bisected to the
following commit:

  d1091c7fa3 ("objtool: Improve detection of BUG() and other dead ends")

The unreachable() macro was formerly only composed of a single
statement.  The above commit added a second statement, but neglected to
enclose the statements in a block.

Suggested-by: Guenter Roeck <linux@roeck-us.net>
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: d1091c7fa3 ("objtool: Improve detection of BUG() and other dead ends")
Link: http://lkml.kernel.org/r/20170228042116.glmwmwiohcix7o4a@treble
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-02-28 07:47:26 +01:00
Linus Torvalds
12dfdfedbf Merge branch 'for-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue update from Tejun Heo:
 "Just one patch to silence clang's warnings"

* 'for-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: avoid clang warning
2017-02-27 22:12:20 -08:00
Linus Torvalds
f7878dc3a9 Merge branch 'for-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:
 "Several noteworthy changes.

   - Parav's rdma controller is finally merged. It is very straight
     forward and can limit the abosolute numbers of common rdma
     constructs used by different cgroups.

   - kernel/cgroup.c got too chubby and disorganized. Created
     kernel/cgroup/ subdirectory and moved all cgroup related files
     under kernel/ there and reorganized the core code. This hurts for
     backporting patches but was long overdue.

   - cgroup v2 process listing reimplemented so that it no longer
     depends on allocating a buffer large enough to cache the entire
     result to sort and uniq the output. v2 has always mangled the sort
     order to ensure that users don't depend on the sorted output, so
     this shouldn't surprise anybody. This makes the pid listing
     functions use the same iterators that are used internally, which
     have to have the same iterating capabilities anyway.

   - perf cgroup filtering now works automatically on cgroup v2. This
     patch was posted a long time ago but somehow fell through the
     cracks.

   - misc fixes asnd documentation updates"

* 'for-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (27 commits)
  kernfs: fix locking around kernfs_ops->release() callback
  cgroup: drop the matching uid requirement on migration for cgroup v2
  cgroup, perf_event: make perf_event controller work on cgroup2 hierarchy
  cgroup: misc cleanups
  cgroup: call subsys->*attach() only for subsystems which are actually affected by migration
  cgroup: track migration context in cgroup_mgctx
  cgroup: cosmetic update to cgroup_taskset_add()
  rdmacg: Fixed uninitialized current resource usage
  cgroup: Add missing cgroup-v2 PID controller documentation.
  rdmacg: Added documentation for rdmacg
  IB/core: added support to use rdma cgroup controller
  rdmacg: Added rdma cgroup controller
  cgroup: fix a comment typo
  cgroup: fix RCU related sparse warnings
  cgroup: move namespace code to kernel/cgroup/namespace.c
  cgroup: rename functions for consistency
  cgroup: move v1 mount functions to kernel/cgroup/cgroup-v1.c
  cgroup: separate out cgroup1_kf_syscall_ops
  cgroup: refactor mount path and clearly distinguish v1 and v2 paths
  cgroup: move cgroup v1 specific code to kernel/cgroup/cgroup-v1.c
  ...
2017-02-27 21:41:08 -08:00
Linus Torvalds
5782fd14aa Merge tag 'rtc-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux
Pull RTC updates from Alexandre Belloni:
 "Subsystem:
   - constify rtc_class_ops structures

 New driver:
   - STM32

 Drivers:
   - armada38x: fix errata, Armada 7K/8K support
   - ds3232: fix wakeup support
   - gemini: DT support
   - m48t86: huge cleanup and platform_data removal
   - mcp795: alarm support
   - sun6i: proper oscillator handling
   - tegra: proper clock handling
   - tps65910: calibration support"

* tag 'rtc-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux: (44 commits)
  rtc: ds3232: Call device_init_wakeup before device_register
  rtc: pcf2127: bulk read only date and time registers.
  rtc: armada38x: Add support for Armada 7K/8K
  rtc: armada38x: Prepare driver to manage different versions
  rtc: ds3232: Add regmap max_register definition.
  rtc: ds3232: Cleanup whitespace around register and bit definitions.
  rtc: m48t86: remove unused platform_data
  ARM: Orion5x: ts78xx: allow rtc-m48t86 to manage it's own resources
  ARM: Orion5x: ts78xx: remove RTC detection
  ARM: ep93xx: ts72xx: allow rtc-m48t86 to manage its own resources
  rtc: m48t86: verify that the RTC is actually present
  rtc: m48t86: add NVRAM support
  rtc: m48t86: allow driver to manage its resources
  rtc: m48t86: shorten register name defines
  bindings: rtc: correct wrong reference in required properties
  rtc: sun6i: Fix return value check in sun6i_rtc_clk_init()
  rtc: sun6i: extend test coverage
  rtc: sun6i: Fix compatibility with old DT binding
  rtc: snvs: add a missing write sync
  rtc: bq32000: add support to enable disable the trickle charge FET bypass
  ...
2017-02-27 19:59:21 -08:00
Jinbum Park
2959a5f726 mm: add arch-independent testcases for RODATA
This patch makes arch-independent testcases for RODATA.  Both x86 and
x86_64 already have testcases for RODATA, But they are arch-specific
because using inline assembly directly.

And cacheflush.h is not a suitable location for rodata-test related
things.  Since they were in cacheflush.h, If someone change the state of
CONFIG_DEBUG_RODATA_TEST, It cause overhead of kernel build.

To solve the above issues, write arch-independent testcases and move it
to shared location.

[jinb.park7@gmail.com: fix config dependency]
  Link: http://lkml.kernel.org/r/20170209131625.GA16954@pjb1027-Latitude-E5410
Link: http://lkml.kernel.org/r/20170129105436.GA9303@pjb1027-Latitude-E5410
Signed-off-by: Jinbum Park <jinb.park7@gmail.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Valentin Rothberg <valentinrothberg@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-27 18:43:48 -08:00
Vegard Nossum
b279ddc338 mm: clarify mm_struct.mm_{users,count} documentation
Clarify documentation relating to mm_users and mm_count, and switch to
kernel-doc syntax.

Link: http://lkml.kernel.org/r/20161218123229.22952-4-vegard.nossum@oracle.com
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-27 18:43:48 -08:00
Vegard Nossum
3fce371bfa mm: add new mmget() helper
Apart from adding the helper function itself, the rest of the kernel is
converted mechanically using:

  git grep -l 'atomic_inc.*mm_users' | xargs sed -i 's/atomic_inc(&\(.*\)->mm_users);/mmget\(\1\);/'
  git grep -l 'atomic_inc.*mm_users' | xargs sed -i 's/atomic_inc(&\(.*\)\.mm_users);/mmget\(\&\1\);/'

This is needed for a later patch that hooks into the helper, but might
be a worthwhile cleanup on its own.

(Michal Hocko provided most of the kerneldoc comment.)

Link: http://lkml.kernel.org/r/20161218123229.22952-2-vegard.nossum@oracle.com
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-27 18:43:48 -08:00
Vegard Nossum
f1f1007644 mm: add new mmgrab() helper
Apart from adding the helper function itself, the rest of the kernel is
converted mechanically using:

  git grep -l 'atomic_inc.*mm_count' | xargs sed -i 's/atomic_inc(&\(.*\)->mm_count);/mmgrab\(\1\);/'
  git grep -l 'atomic_inc.*mm_count' | xargs sed -i 's/atomic_inc(&\(.*\)\.mm_count);/mmgrab\(\&\1\);/'

This is needed for a later patch that hooks into the helper, but might
be a worthwhile cleanup on its own.

(Michal Hocko provided most of the kerneldoc comment.)

Link: http://lkml.kernel.org/r/20161218123229.22952-1-vegard.nossum@oracle.com
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-27 18:43:48 -08:00
Masahiro Yamada
4091fb95b5 scripts/spelling.txt: add "followings" pattern and fix typo instances
Fix typos and add the following to the scripts/spelling.txt:

  followings||following

While we are here, add a missing colon in the boilerplate in DT binding
documents.  The "you SoC" in allwinner,sunxi-pinctrl.txt was fixed as
well.

I reworded "as the followings:" to "as follows:" for
drivers/usb/gadget/udc/renesas_usb3.c.

Link: http://lkml.kernel.org/r/1481573103-11329-32-git-send-email-yamada.masahiro@socionext.com
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-27 18:43:47 -08:00
Masahiro Yamada
8ab102d60a scripts/spelling.txt: add "partiton" pattern and fix typo instances
Fix typos and add the following to the scripts/spelling.txt:

  partiton||partition

Link: http://lkml.kernel.org/r/1481573103-11329-7-git-send-email-yamada.masahiro@socionext.com
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-27 18:43:46 -08:00
Masahiro Yamada
03440c4e5e scripts/spelling.txt: add "an union" pattern and fix typo instances
Fix typos and add the following to the scripts/spelling.txt:

  an union||a union

Link: http://lkml.kernel.org/r/1481573103-11329-5-git-send-email-yamada.masahiro@socionext.com
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-27 18:43:46 -08:00
Fabian Frederick
93407472a2 fs: add i_blocksize()
Replace all 1 << inode->i_blkbits and (1 << inode->i_blkbits) in fs
branch.

This patch also fixes multiple checkpatch warnings: WARNING: Prefer
'unsigned int' to bare use of 'unsigned'

Thanks to Andrew Morton for suggesting more appropriate function instead
of macro.

[geliangtang@gmail.com: truncate: use i_blocksize()]
  Link: http://lkml.kernel.org/r/9c8b2cd83c8f5653805d43debde9fa8817e02fc4.1484895804.git.geliangtang@gmail.com
Link: http://lkml.kernel.org/r/1481319905-10126-1-git-send-email-fabf@skynet.be
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-27 18:43:46 -08:00
Manfred Spraul
9de5ab8a2e ipc/sem: add hysteresis
sysv sem has two lock modes: One with per-semaphore locks, one lock mode
with a single global lock for the whole array.  When switching from the
per-semaphore locks to the global lock, all per-semaphore locks must be
scanned for ongoing operations.

The patch adds a hysteresis for switching from the global lock to the
per semaphore locks.  This reduces how often the per-semaphore locks
must be scanned.

Compared to the initial patch, this is a simplified solution: Setting
USE_GLOBAL_LOCK_HYSTERESIS to 1 restores the current behavior.

In theory, a workload with exactly 10 simple sops and then one complex
op now scales a bit worse, but this is pure theory: If there is
concurrency, the it won't be exactly 10:1:10:1:10:1:...  If there is no
concurrency, then there is no need for scalability.

Link: http://lkml.kernel.org/r/1476851896-3590-3-git-send-email-manfred@colorfullife.com
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: <1vier1@web.de>
Cc: kernel test robot <xiaolong.ye@intel.com>
Cc: <felixh@informatik.uni-bremen.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-27 18:43:46 -08:00
Tetsuo Handa
e3b5a342ab include/linux/pid.h: use for_each_thread() in do_each_pid_thread()
while_each_pid_thread() is using while_each_thread(), which is unsafe
under RCU lock according to commit 0c740d0afc ("introduce
for_each_thread() to replace the buggy while_each_thread()").  Use
for_each_thread() in do_each_pid_thread() which is safe under RCU lock.

Link: http://lkml.kernel.org/r/201702011947.DBD56740.OMVHOLOtSJFFFQ@I-love.SAKURA.ne.jp
Link: http://lkml.kernel.org/r/1486041779-4401-2-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-27 18:43:45 -08:00
Stas Sergeev
441398d378 sigaltstack: support SS_AUTODISARM for CONFIG_COMPAT
Currently SS_AUTODISARM is not supported in compatibility mode, but does
not return -EINVAL either.  This makes dosemu built with -m32 on x86_64
to crash.  Also the kernel's sigaltstack selftest fails if compiled with
-m32.

This patch adds the needed support.

Link: http://lkml.kernel.org/r/20170205101213.8163-2-stsp@list.ru
Signed-off-by: Stas Sergeev <stsp@users.sourceforge.net>
Cc: Milosz Tanski <milosz@adfin.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
Cc: Waiman Long <Waiman.Long@hpe.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dmitry Safonov <dsafonov@virtuozzo.com>
Cc: Wang Xiaoqiang <wangxq10@lzu.edu.cn>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-27 18:43:45 -08:00
Luis R. Rodriguez
7d134b2ce6 kprobes: move kprobe declarations to asm-generic/kprobes.h
Often all is needed is these small helpers, instead of compiler.h or a
full kprobes.h.  This is important for asm helpers, in fact even some
asm/kprobes.h make use of these helpers...  instead just keep a generic
asm file with helpers useful for asm code with the least amount of
clutter as possible.

Likewise we need now to also address what to do about this file for both
when architectures have CONFIG_HAVE_KPROBES, and when they do not.  Then
for when architectures have CONFIG_HAVE_KPROBES but have disabled
CONFIG_KPROBES.

Right now most asm/kprobes.h do not have guards against CONFIG_KPROBES,
this means most architecture code cannot include asm/kprobes.h safely.
Correct this and add guards for architectures missing them.
Additionally provide architectures that not have kprobes support with
the default asm-generic solution.  This lets us force asm/kprobes.h on
the header include/linux/kprobes.h always, but most importantly we can
now safely include just asm/kprobes.h on architecture code without
bringing the full kitchen sink of header files.

Two architectures already provided a guard against CONFIG_KPROBES on its
kprobes.h: sh, arch.  The rest of the architectures needed gaurds added.
We avoid including any not-needed headers on asm/kprobes.h unless
kprobes have been enabled.

In a subsequent atomic change we can try now to remove compiler.h from
include/linux/kprobes.h.

During this sweep I've also identified a few architectures defining a
common macro needed for both kprobes and ftrace, that of the definition
of the breakput instruction up.  Some refer to this as
BREAKPOINT_INSTRUCTION.  This must be kept outside of the #ifdef
CONFIG_KPROBES guard.

[mcgrof@kernel.org: fix arm64 build]
  Link: http://lkml.kernel.org/r/CAB=NE6X1WMByuARS4mZ1g9+W=LuVBnMDnh_5zyN0CLADaVh=Jw@mail.gmail.com
[sfr@canb.auug.org.au: fixup for kprobes declarations moving]
  Link: http://lkml.kernel.org/r/20170214165933.13ebd4f4@canb.auug.org.au
Link: http://lkml.kernel.org/r/20170203233139.32682-1-mcgrof@kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-27 18:43:45 -08:00
Linus Torvalds
79b17ea740 Merge tag 'trace-v4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing updates from Steven Rostedt:
 "This release has no new tracing features, just clean ups, minor fixes
  and small optimizations"

* tag 'trace-v4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (25 commits)
  tracing: Remove outdated ring buffer comment
  tracing/probes: Fix a warning message to show correct maximum length
  tracing: Fix return value check in trace_benchmark_reg()
  tracing: Use modern function declaration
  jump_label: Reduce the size of struct static_key
  tracing/probe: Show subsystem name in messages
  tracing/hwlat: Update old comment about migration
  timers: Make flags output in the timer_start tracepoint useful
  tracing: Have traceprobe_probes_write() not access userspace unnecessarily
  tracing: Have COMM event filter key be treated as a string
  ftrace: Have set_graph_function handle multiple functions in one write
  ftrace: Do not hold references of ftrace_graph_{notrace_}hash out of graph_lock
  tracing: Reset parser->buffer to allow multiple "puts"
  ftrace: Have set_graph_functions handle write with RDWR
  ftrace: Reset fgd->hash in ftrace_graph_write()
  ftrace: Replace (void *)1 with a meaningful macro name FTRACE_GRAPH_EMPTY
  ftrace: Create a slight optimization on searching the ftrace_hash
  tracing: Add ftrace_hash_key() helper function
  ftrace: Convert graph filter to use hash tables
  ftrace: Expose ftrace_hash_empty and ftrace_lookup_ip
  ...
2017-02-27 13:26:17 -08:00
Christoph Hellwig
0d9f0a52c8 virtio_scsi: use virtio IRQ affinity
Use automatic IRQ affinity assignment in the virtio layer if available,
and build the blk-mq queues based on it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-02-27 20:54:06 +02:00
Christoph Hellwig
73473427bb blk-mq: provide a default queue mapping for virtio device
Similar to the PCI version, just calling into virtio instead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-02-27 20:54:05 +02:00
Christoph Hellwig
bbaba47956 virtio: provide a method to get the IRQ affinity mask for a virtqueue
This basically passed up the pci_irq_get_affinity information through
virtio through an optional get_vq_affinity method.  It is only implemented
by the PCI backend for now, and only when we use per-virtqueue IRQs.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-02-27 20:54:05 +02:00
Christoph Hellwig
fb5e31d970 virtio: allow drivers to request IRQ affinity when creating VQs
Add a struct irq_affinity pointer to the find_vqs methods, which if set
is used to tell the PCI layer to create the MSI-X vectors for our I/O
virtqueues with the proper affinity from the start.  Compared to after
the fact affinity hints this gives us an instantly working setup and
allows to allocate the irq descritors node-local and avoid interconnect
traffic.  Last but not least this will allow blk-mq queues are created
based on the interrupt affinity for storage drivers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-02-27 20:54:04 +02:00
Chao Yu
4ac912427c f2fs: introduce free nid bitmap
In scenario of intensively node allocation, free nids will be ran out
soon, then it needs to stop to load free nids by traversing NAT blocks,
in worse case, if NAT blocks does not be cached in memory, it generates
IOs which slows down our foreground operations.

In order to speed up node allocation, in this patch we introduce a new
free_nid_bitmap array, so there is an bitmap table for each NAT block,
Once the NAT block is loaded, related bitmap cache will be switched on,
and bitmap will be set during traversing nat entries in NAT block, later
we can query and update nid usage status in memory completely.

With such implementation, I expect performance of node allocation can be
improved in the long-term after filesystem image is mounted.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-27 10:07:47 -08:00
Jaegeuk Kim
22ad0b6ab4 f2fs: add bitmaps for empty or full NAT blocks
This patches adds bitmaps to represent empty or full NAT blocks containing
free nid entries.

If we can find valid crc|cp_ver in the last block of checkpoint pack, we'll
use these bitmaps when building free nids. In order to avoid checkpointing
burden, up-to-date bitmaps will be flushed only during umount time. So,
normally we can get this gain, but when power-cut happens, we rely on fsck.f2fs
which recovers this bitmap again.

After this patch, we build free nids from nid #0 at mount time to make more
full NAT blocks, but in runtime, we check empty NAT blocks to load free nids
without loading any NAT pages from disk.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-27 09:59:54 -08:00
Michael S. Tsirkin
51be7a9a26 virtio_mmio: expose header to userspace
It's handy for userspace emulators like QEMU.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-02-27 16:31:23 +02:00
Rafael J. Wysocki
6dbf5cea05 cpuidle: menu: Avoid taking spinlock for accessing QoS values
After commit 9908859aca (cpuidle/menu: add per CPU PM QoS resume
latency consideration) the cpuidle menu governor calls
dev_pm_qos_read_value() on CPU devices to read the current resume
latency QoS constraint values for them.  That function takes a spinlock
to prevent the device's power.qos pointer from becoming NULL during
the access which is a problem for the RT patchset where spinlocks are
converted into mutexes and the idle loop stops working.

However, it is not even necessary for the menu governor to take
that spinlock, because the power.qos pointer accessed under it
cannot be modified during the access anyway.

For this reason, introduce a "raw" routine for accessing device
QoS resume latency constraints without locking and use it in the
menu governor.

Fixes: 9908859aca (cpuidle/menu: add per CPU PM QoS resume latency consideration)
Acked-by: Alex Shi <alex.shi@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-02-27 15:07:38 +01:00
Linus Torvalds
e5d56efc97 Merge tag 'watchdog-for-linus-v4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
Pull watchdog updates from Guenter Roeck:
 "Wim asked me to handle the watchdog pull request this time around.

  Key changes:

   - New drivers: Cortina Gemini, ZTE's zx2967 family, NIC7018

   - Convert to use device managed functions: ebc-c384_wdt, tegra_wdt,
     da9063_wdt, da9062_wdt, da9055_wdt, da9052_wdt, bcm2835_wdt,
     mena21_wdt, wm831x_wdt, digicolor_wdt, intel-mid_wdt, meson_wdt,
     sunxi_wdt, aspeed_wdt, coh901327_wdt, iTCO_wdt

   - Use watchdog core to install restart handler: tangox, dw_wdt,
     bcm2835_wdt, asm9260_wdt, bcm47xx_wdt

   - Convert ts72xx_wdt driver to watchdog core

   - Let core handle heartbeat in ep93xx_wdt driver

   - Enable COMPILE_TEST where possible

   - Various other improvements"

* tag 'watchdog-for-linus-v4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging: (54 commits)
  watchdog: s3c2410: Add prefix to local function
  watchdog: s3c2410: Select MFD_SYSCON on all Exynos platforms
  watchdog: s3c2410: Use dev_dbg instead of pr_info
  watchdog: s3c2410: Fix infinite interrupt in soft mode
  watchdog: s3c2410: Remove confusing CONFIG prefix from local defines
  watchdog: softdog: make pretimeout support a compile option
  watchdog: zx2967: add watchdog controller driver for ZTE's zx2967 family
  dt: bindings: add documentation for zx2967 family watchdog controller
  watchdog: sama5d4: Implement resume hook
  watchdog: sama5d4: Cache MR instead of a partial config
  watchdog: ts72xx_wdt: convert driver to watchdog core
  watchdog: ep93xx_wdt: cleanup and let the core handle the heartbeat
  watchdog: RDC321X_WDT always depends on PCI
  watchdog: add driver for Cortina Gemini watchdog
  watchdog: add DT bindings for Cortina Gemini
  watchdog: constify watchdog_ops structures
  watchdog: Introduce watchdog_stop_on_unregister helper
  watchdog: ebc-c384_wdt: Utilize devm_ functions in driver probe callback
  watchdog: tegra_wdt: Convert to use device managed functions
  watchdog: da9063_wdt: Convert to use device managed functions
  ...
2017-02-26 13:19:17 -08:00
Linus Torvalds
5d8a00eee2 Merge tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
Pull clk updates from Stephen Boyd:
 "The usual collection of new drivers, non-critical fixes, and updates
  to existing clk drivers. The bulk of the work is on Allwinner and
  Rockchip SoCs, but there's also an Intel Atom driver in here too.

  New Drivers:
   - Tegra BPMP firmware
   - Hisilicon hi3660 SoCs
   - Rockchip rk3328 SoCs
   - Intel Atom PMC
   - STM32F746
   - IDT VersaClock 5P49V5923 and 5P49V5933
   - Marvell mv98dx3236 SoCs
   - Allwinner V3s SoCs

  Removed Drivers:
   - Samsung Exynos4415 SoCs

  Updates:
   - Migrate ABx500 to OF
   - Qualcomm IPQ4019 CPU clks and general PLL support
   - Qualcomm MSM8974 RPM
   - Rockchip non-critical fixes and clk id additions
   - Samsung Exynos4412 CPUs
   - Socionext UniPhier NAND and eMMC support
   - ZTE zx296718 i2s and other audio clks
   - Renesas CAN and MSIOF clks for R-Car M3-W
   - Renesas resets for R-Car Gen2 and Gen3 and RZ/G1
   - TI CDCE913, CDCE937, and CDCE949 clk generators
   - Marvell Armada ap806 CPU frequencies
   - STM32F4* I2S/SAI support
   - Broadcom BCM2835 DSI support
   - Allwinner sun5i and A80 conversion to new style clk bindings"

* tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: (130 commits)
  clk: renesas: mstp: ensure register writes complete
  clk: qcom: Do not drop device node twice
  clk: mvebu: adjust clock handling for the CP110 system controller
  clk: mvebu: Expand mv98dx3236-core-clock support
  clk: zte: add i2s clocks for zx296718
  clk: sunxi-ng: sun9i-a80: Fix wrong pointer passed to PTR_ERR()
  clk: sunxi-ng: select SUNXI_CCU_MULT for sun5i
  clk: sunxi-ng: Check kzalloc() for errors and cleanup error path
  clk: tegra: Add BPMP clock driver
  clk: uniphier: add eMMC clock for LD11 and LD20 SoCs
  clk: uniphier: add NAND clock for all UniPhier SoCs
  ARM: dts: sun9i: Switch to new clock bindings
  clk: sunxi-ng: Add A80 Display Engine CCU
  clk: sunxi-ng: Add A80 USB CCU
  clk: sunxi-ng: Add A80 CCU
  clk: sunxi-ng: Support separately grouped PLL lock status register
  clk: sunxi-ng: mux: Get closest parent rate possible with CLK_SET_RATE_PARENT
  clk: sunxi-ng: mux: honor CLK_SET_RATE_NO_REPARENT flag
  clk: sunxi-ng: mux: Fix determine_rate for mux clocks with pre-dividers
  clk: qcom: SDHCI enablement on Nexus 5X / 6P
  ...
2017-02-25 14:28:06 -08:00
Linus Torvalds
7067739df2 Merge branch 'i2c/for-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c updates from Wolfram Sang:
 "I2C has for you two new drivers (Tegra BPMP and STM32F4), interrupt
  support for pca954x muxes, and a bunch of driver bugfixes and
  improvements. Nothing really special this cycle.

  A few commits have been added to my tree just recently. Those are the
  Tegra BPMP driver and a few straightforward bugfixes or cleanups which
  I prefer to have upstream rather soonish. The rest had proper
  linux-next exposure"

* 'i2c/for-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: (25 commits)
  i2c: thunderx: Replace pci_enable_msix()
  i2c: exynos5: fix arbitration lost handling
  i2c: exynos5: disable fifo-almost-empty irq signal when necessary
  i2c: at91: ensure state is restored after suspending
  i2c: bcm2835: Avoid possible NULL ptr dereference
  i2c: Add Tegra BPMP I2C proxy driver
  dt-bindings: Add Tegra186 BPMP I2C binding
  misc: eeprom: at24: use device_property_*() functions instead of of_get_property()
  i2c: mux: pca954x: Add interrupt controller support
  dt: bindings: i2c-mux-pca954x: Add documentation for interrupt controller
  i2c: mux: pca954x: Add missing pca9542 definition to chip_desc
  i2c: riic: correctly finish transfers
  i2c: i801: Add support for Intel Gemini Lake
  i2c: mux: pca9541: Export OF device ID table as module aliases
  i2c: mux: pca954x: Export OF device ID table as module aliases
  i2c: mux: mlxcpld: remove unused including <linux/version.h>
  i2c: busses: constify i2c_algorithm structures
  i2c: i2c-mux-gpio: rename i2c-gpio-mux to i2c-mux-gpio
  i2c: sh_mobile: document support for r8a7796 (R-Car M3-W)
  i2c: i2c-cros-ec-tunnel: Reduce logging noise
  ...
2017-02-25 14:21:18 -08:00