Commit Graph

663177 Commits

Author SHA1 Message Date
Lendacky, Thomas
402168b4c2 amd-xgbe: Stop the PHY before releasing interrupts
Some configurations require the use of the hardware's MDIO support to
communicate with external PHYs. The MDIO commands indicate completion
through the device interrupt. When bringing down the device the interrupts
were released before stopping the external PHY, resulting in MDIO command
timeouts. Move the stopping of the PHY to before the releasing of the
interrupts.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-02 12:57:53 -08:00
Radim Krčmář
16ce771b93 Merge branch 'kvm-ppc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into next
Two bug fixes for HV KVM on POWER9 machines.
2017-03-02 21:53:25 +01:00
Alban Bedel
9aea7779b7 drivers: net: xgene: Fix crash on DT systems
On DT systems the driver require a clock, but the probe just print a
warning and continue, leading to a crash when resetting the device.
To fix this crash and properly handle probe deferals only ignore the
missing clock if DT isn't used or if the clock doesn't exist.

Signed-off-by: Alban Bedel <alban.bedel@avionic-design.de>
Acked-by: Iyappan Subramanian <isubramanian@apm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-02 12:47:23 -08:00
Linus Torvalds
4f1f2b8f08 Merge tag 'watchdog-for-linus-v4.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
Pull more watchdog updates from Guenter Roeck:

 - fix fallout from enabling COMPILE_TEST

 - fix gcc-4.3 build of kempld watchdog driver

 - use hrtimer in softdog

* tag 'watchdog-for-linus-v4.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
  watchdog: retu: restore MFD dependency
  watchdog: db8500: add back prmcu dependency
  watchdog: kempld: fix gcc-4.3 build
  watchdog: softdog: fire watchdog even if softirqs do not get to run
  watchdog: kempld: revert to full dependency
  watchdog: bcm2835: add CONFIG_OF dependency
  watchdog: sp805: add back AMBA dependency
  watchdog: menf21bmc: add I2C dependency
  watchdog: geode: restore hard CS5535_MFGPT dependency
  watchdog: wm831x watchdog really needs mfd
2017-03-02 12:45:46 -08:00
WANG Cong
e3330039ea ipv6: check for ip6_null_entry in __ip6_del_rt_siblings()
Andrey reported a NULL pointer deref bug in ipv6_route_ioctl()
-> ip6_route_del() -> __ip6_del_rt_siblings() code path. This is
because ip6_null_entry is returned in this path since ip6_null_entry
is kinda default for a ipv6 route table root node. Quote from
David Ahern:

 ip6_null_entry is the root of all ipv6 fib tables making it integrated
 into the table ...

We should ignore any attempt of trying to delete it, like we do in
__ip6_del_rt() path and several others.

Reported-by: Andrey Konovalov <andreyknvl@google.com>
Fixes: 0ae8133586 ("net: ipv6: Allow shorthand delete of all nexthops in multipath route")
Cc: David Ahern <dsa@cumulusnetworks.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-02 12:43:47 -08:00
Linus Torvalds
474c90156c give up on gcc ilog2() constant optimizations
gcc-7 has an "optimization" pass that completely screws up, and
generates the code expansion for the (impossible) case of calling
ilog2() with a zero constant, even when the code gcc compiles does not
actually have a zero constant.

And we try to generate a compile-time error for anybody doing ilog2() on
a constant where that doesn't make sense (be it zero or negative).  So
now gcc7 will fail the build due to our sanity checking, because it
created that constant-zero case that didn't actually exist in the source
code.

There's a whole long discussion on the kernel mailing about how to work
around this gcc bug.  The gcc people themselevs have discussed their
"feature" in

   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72785

but it's all water under the bridge, because while it looked at one
point like it would be solved by the time gcc7 was released, that was
not to be.

So now we have to deal with this compiler braindamage.

And the only simple approach seems to be to just delete the code that
tries to warn about bad uses of ilog2().

So now "ilog2()" will just return 0 not just for the value 1, but for
any non-positive value too.

It's not like I can recall anybody having ever actually tried to use
this function on any invalid value, but maybe the sanity check just
meant that such code never made it out in public.

Reported-by: Laura Abbott <labbott@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>,
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-03-02 12:17:22 -08:00
Linus Walleij
3131d970f0 ARM: ux500: resume the second core properly
The pen hold/release scheme was copied over to Ux500 from the ARM
reference designs like most of these at the time. It is not needed
at all, and was mostly removed in commit c00def71ef
"ARM: ux500: simplify secondary CPU boot".

However on the suspend/resume path and hot plug/unplug of CPUs,
the .cpu_die() callback was still waiting for the pen to be
released which made it spin forever and the second core never come
back online after suspend/resume.

Fix this by simply replacing the strange custom .cpu_die() with
a oneline wfi() just like e.g. the qcom platform does. This fixes
the issue and makes the second core come up properly after
suspend/resume.

As a side effect, this rids us of the completely surplus local
setup.h and hotplug.c files, and we just compile this into platsmp.c
with everything else SMP.

Cc: stable@vger.kernel.org
Fixes: c00def71ef ("ARM: ux500: simplify secondary CPU boot")
Cc: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2017-03-02 17:53:17 +01:00
Arnd Bergmann
d4b80d9aac Merge branch 'next/late' with mainline
* next/late: (25 commits)
  arm64: dts: exynos: Add regulators for Vbus and Vbus-Boost
  arm64: dts: exynos: Add USB 3.0 controller node for Exynos7
  arm64: dts: exynos: Use macros for pinctrl configuration on Exynos7
  pinctrl: dt-bindings: samsung: Add Exynos7 specific pinctrl macro definitions
  arm64: dts: exynos: Add initial configuration for DISP clocks for TM2/TM2e
  ARM64: dts: meson-gxbb-p200: add ADC laddered keys
  ARM64: dts: meson: meson-gx: add the SAR ADC
  ARM64: dts: meson-gxl: add the pwm_ao_b pin
  ARM64: dts: meson-gx: add the missing pwm_AO_ab node
  clk: gxbb: fix CLKID_ETH defined twice
  clk: samsung: exynos5433: Add data for 250MHz and 278MHz PLL rates
  clk: samsung: exynos5433: Add IDs for PHYCLK_MIPIDPHY0_* clocks
  ARM64: dts: meson-gxl: rename Nexbox A95x for consistency
  clk: gxbb: add the SAR ADC clocks and expose them
  dt-bindings: amlogic: Add WeTek boards
  ARM64: dts: meson-gxbb: Add support for WeTek Hub and Play
  dt-bindings: vendor-prefix: Add wetek vendor prefix
  ARM64: dts: meson-gxm: Rename q200 and q201 DT files for consistency
  ARM64: dts: meson-gx: Add HDMI HPD/DDC pinctrl nodes
  ARM64: dts: meson-gxbb-vega-s95: Add LED
  ...

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2017-03-02 17:52:44 +01:00
Janosch Frank
2e4d88009f KVM: s390: Fix guest migration for huge guests resulting in panic
While we can technically not run huge page guests right now, we can
setup a guest with huge pages. Trying to migrate it will trigger a
VM_BUG_ON and, if the kernel is not configured to panic on a BUG, it
will happily try to work on non-existing page table entries.

With this patch, we always return "dirty" if we encounter a large page
when migrating. This at least fixes the immediate problem until we
have proper handling for both kind of pages.

Fixes: 15f36eb ("KVM: s390: Add proper dirty bitmap support to S390 kvm.")
Cc: <stable@vger.kernel.org> # 3.16+

Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-03-02 17:17:16 +01:00
Heiko Carstens
7afbeb6df2 s390/ipl: always use load normal for CCW-type re-IPL
commit 14890678687c ("s390/ipl: use load normal for LPAR re-ipl")
missed to convert one code path to use load normal semantics for
re-IPL. Convert the missing code path as well.

Fixes: 14890678687c ("s390/ipl: use load normal for LPAR re-ipl")
Reported-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Acked-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-03-02 17:17:15 +01:00
Jan Kara
a5a79d0001 block: Initialize bd_bdi on inode initialization
So far we initialized bd_bdi only in bdget(). That is fine for normal
bdev inodes however for the special case of the root inode of
blockdev_superblock that function is never called and thus bd_bdi is
left uninitialized. As a result bdev_evict_inode() may oops doing
bdi_put(root->bd_bdi) on that inode as can be seen when doing:

mount -t bdev none /mnt

Fix the problem by initializing bd_bdi when first allocating the inode
and then reinitializing bd_bdi in bdev_evict_inode().

Thanks to syzkaller team for finding the problem.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Fixes: b1d2dc5659 ("block: Make blk_get_backing_dev_info() safe without open bdev")
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-03-02 08:56:59 -07:00
Omar Sandoval
e02898b423 loop: fix LO_FLAGS_PARTSCAN hang
loop_reread_partitions() needs to do I/O, but we just froze the queue,
so we end up waiting forever. This can easily be reproduced with losetup
-P. Fix it by moving the reread to after we unfreeze the queue.

Fixes: ecdd09597a ("block/loop: fix race between I/O and set_status")
Reported-by: Tejun Heo <tj@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Ming Lei <tom.leiming@gmail.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-03-02 08:56:59 -07:00
Keith Busch
302ad8cc09 nvme: Complete all stuck requests
If the nvme driver is shutting down its controller, the drievr will not
start the queues up again, preventing blk-mq's hot CPU notifier from
making forward progress.

To fix that, this patch starts a request_queue freeze when the driver
resets a controller so no new requests may enter. The driver will wait
for frozen after IO queues are restarted to ensure the queue reference
can be reinitialized when nvme requests to unfreeze the queues.

If the driver is doing a safe shutdown, the driver will wait for the
controller to successfully complete all inflight requests so that we
don't unnecessarily fail them. Once the controller has been disabled,
the queues will be restarted to force remaining entered requests to end
in failure so that blk-mq's hot cpu notifier may progress.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-03-02 08:56:59 -07:00
Keith Busch
f91328c40a blk-mq: Provide freeze queue timeout
A driver may wish to take corrective action if queued requests do not
complete within a set time.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-03-02 08:56:04 -07:00
Keith Busch
6bae363ee3 blk-mq: Export blk_mq_freeze_queue_wait
Drivers can start a freeze, so this provides a way to wait for frozen.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-03-02 08:56:04 -07:00
Josef Bacik
6a8a215465 nbd: stop leaking sockets
This was introduced in the multi-connection patch, we've been leaking
socket's ever since.

Fixes: 9561a7a ("nbd: add multi-connection support")
cc: stable@vger.kernel.org
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-03-02 08:56:04 -07:00
Omar Sandoval
562bef4259 blk-mq: move update of tags->rqs to __blk_mq_alloc_request()
No functional difference, it just makes a little more sense to update
the tag map where we actually allocate the tag.

Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Tested-by: Sagi Grimberg <sagi@grimberg.me>
2017-03-02 08:56:04 -07:00
Omar Sandoval
5974839899 blk-mq: kill blk_mq_set_alloc_data()
Nothing is using it anymore.

Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Tested-by: Sagi Grimberg <sagi@grimberg.me>
2017-03-02 08:56:04 -07:00
Omar Sandoval
6d2809d51a blk-mq: make blk_mq_alloc_request_hctx() allocate a scheduler request
blk_mq_alloc_request_hctx() allocates a driver request directly, unlike
its blk_mq_alloc_request() counterpart. It also crashes because it
doesn't update the tags->rqs map.

Fix it by making it allocate a scheduler request.

Reported-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Tested-by: Sagi Grimberg <sagi@grimberg.me>
2017-03-02 08:56:04 -07:00
Sagi Grimberg
415b806de5 blk-mq-sched: Allocate sched reserved tags as specified in the original queue tagset
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>

Modified by me to also check at driver tag allocation time if the
original request was reserved, so we can be sure to allocate a
properly reserved tag at that point in time, too.

Signed-off-by: Jens Axboe <axboe@fb.com>
2017-03-02 08:56:04 -07:00
Shaohua Li
d3af3ecdc6 nvme: allocate nvme_queue in correct node
nvme_queue is per-cpu queue (mostly). Allocating it in node where blk-mq
will use it.

Signed-off-by: Shaohua Li <shli@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-03-02 08:56:04 -07:00
Shaohua Li
27ddb68990 PCI: add an API to get node from vector
Next patch will use the API to get the node from vector for nvme device

Signed-off-by: Shaohua Li <shli@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-03-02 08:56:04 -07:00
Shaohua Li
59f082e464 blk-mq: allocate blk_mq_tags and requests in correct node
blk_mq_tags/requests of specific hardware queue are mostly used in
specific cpus, which might not be in the same numa node as disk. For
example, a nvme card is in node 0. half hardware queue will be used by
node 0, the other node 1.

Signed-off-by: Shaohua Li <shli@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-03-02 08:56:04 -07:00
Shuah Khan
e53aff45c4 selftests: lib.mk Fix individual test builds
In commit a8ba798bc8 ("selftests: enable O and KBUILD_OUTPUT"), added
support to generate compile targets in a user specified directory. OUTPUT
variable controls the location which is undefined when tests are built in
the test directory or with "make -C tools/testing/selftests/x86".

make -C tools/testing/selftests/x86/
make: Entering directory '/lkml/linux_4.11/tools/testing/selftests/x86'
Makefile:44: warning: overriding recipe for target 'clean'
../lib.mk:51: warning: ignoring old recipe for target 'clean'
gcc -m64 -o /single_step_syscall_64 -O2 -g -std=gnu99 -pthread -Wall  single_step_syscall.c -lrt -ldl
/usr/bin/ld: cannot open output file /single_step_syscall_64: Permission denied
collect2: error: ld returned 1 exit status
Makefile:50: recipe for target '/single_step_syscall_64' failed
make: *** [/single_step_syscall_64] Error 1
make: Leaving directory '/lkml/linux_4.11/tools/testing/selftests/x86'

Same failure with "cd tools/testing/selftests/x86/;make" run.

Fix this with a change to lib.mk to define OUTPUT to be the pwd when
MAKELEVEL is 0. This covers both cases mentioned above.

Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Shuah Khan <shuahkh@osg.samsung.com>
2017-03-02 07:53:01 -07:00
Al Viro
653a7746fa Merge remote-tracking branch 'ovl/for-viro' into for-linus
Overlayfs-related series from Miklos and Amir
2017-03-02 06:41:22 -05:00
Al Viro
f6c99aad4d Merge branch 'work.namei' into for-linus 2017-03-02 06:41:12 -05:00
Peter Zijlstra
0695d7dc1d orangefs: Use RCU for destroy_inode
freeing of inodes must be RCU-delayed on all filesystems

Cc: stable@vger.kernel.org
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-03-02 06:40:36 -05:00
Paulo Flabiano Smorigo
5839f555fa crypto: vmx - Use skcipher for xts fallback
Cc: stable@vger.kernel.org #4.10
Signed-off-by: Paulo Flabiano Smorigo <pfsmorigo@linux.vnet.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-03-02 18:57:31 +08:00
Paulo Flabiano Smorigo
c96d0a1c47 crypto: vmx - Use skcipher for cbc fallback
Cc: stable@vger.kernel.org #4.10
Signed-off-by: Paulo Flabiano Smorigo <pfsmorigo@linux.vnet.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-03-02 18:57:30 +08:00
Waldemar Rymarkiewicz
1657b8f84e ath10k: search SMBIOS for OEM board file extension
Board Data File (BDF) is loaded upon driver boot-up procedure. The right
board data file is identified, among others, by device and sybsystem ids.

The problem, however, can occur when the (default) board data file cannot
fulfill with the vendor requirements and it is necessary to use a different
board data file.

To solve the issue QCA uses SMBIOS type 0xF8 to store Board Data File Name
Extension to specify the extension/variant name. The driver will take the
extension suffix into consideration and will load the right (non-default)
board data file if necessary.

If it is unnecessary to use extension board data file, please leave the
SMBIOS field blank and default configuration will be used.

Example:
If a default board data file for a specific board is identified by a string
      "bus=pci,vendor=168c,device=003e,subsystem-vendor=1028,
       subsystem-device=0310"
then the OEM specific data file, if used, could be identified by variant
suffix:
      "bus=pci,vendor=168c,device=003e,subsystem-vendor=1028,
       subsystem-device=0310,variant=DE_1AB"

If board data file name extension is set but board-2.bin does not contain
board data file for the variant, the driver will fallback to the default
board data file not to break backward compatibility.

This was first applied in commit f2593cb1b2 ("ath10k: Search SMBIOS for OEM
board file extension") but later reverted in commit 005c3490e9 ("Revert
"ath10k: Search SMBIOS for OEM board file extension"". This patch is now
otherwise the same as commit f2593cb1b2 except the regression fixed.

Signed-off-by: Waldemar Rymarkiewicz <ext.waldemar.rymarkiewicz@tieto.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2017-03-02 10:48:41 +02:00
Thomas Gleixner
bb1a2c2616 x86/hpet: Prevent might sleep splat on resume
Sergey reported a might sleep warning triggered from the hpet resume
path. It's caused by the call to disable_irq() from interrupt disabled
context.

The problem with the low level resume code is that it is not accounted as a
special system_state like we do during the boot process. Calling the same
code during system boot would not trigger the warning. That's inconsistent
at best.

In this particular case it's trivial to replace the disable_irq() with
disable_hardirq() because this particular code path is solely used from
system resume and the involved hpet interrupts can never be force threaded.


Reported-and-tested-by: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1703012108460.3684@nanos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-03-02 09:33:47 +01:00
Boqun Feng
857811a371 locking/ww_mutex: Adjust the lock number for stress test
Because there are only 12 bits in held_lock::references, so we only
support 4095 nested lock held in the same time, adjust the lock number
for ww_mutex stress test to kill one lockdep splat:

  [ ] [ BUG: bad unlock balance detected! ]
  [ ] kworker/u2:0/5 is trying to release lock (ww_class_mutex) at:
  [ ] ww_mutex_unlock()
  [ ] but there are no more locks to release!
  ...

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nicolai Hähnle <Nicolai.Haehnle@amd.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170301150138.hdixnmafzfsox7nn@tardis.cn.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 09:00:39 +01:00
Peter Zijlstra
7fb4a2cea6 locking/lockdep: Add nest_lock integrity test
Boqun reported that hlock->references can overflow. Add a debug test
for that to generate a clear error when this happens.

Without this, lockdep is likely to report a mysterious failure on
unlock.

Reported-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nicolai Hähnle <Nicolai.Haehnle@amd.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 09:00:38 +01:00
Chris Wilson
2b232e0c3b locking/ww_mutex: Replace cpu_relax() with cond_resched() for tests
When busy-spinning on a ww_mutex_trylock(), we depend upon the other
thread advancing and releasing the lock. This can not happen on a single
CPU unless we relinquish it:

  [ ] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [kworker/0:1:18]
  ...
  [ ] Call Trace:
  [ ]  mutex_trylock()
  [ ]  test_mutex_work+0x31/0x56
  [ ]  process_one_work+0x1b4/0x2f9
  [ ]  worker_thread+0x1b0/0x27c
  [ ]  kthread+0xd1/0xd3
  [ ]  ret_from_fork+0x19/0x30

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: f2a5fec173 ("locking/ww_mutex: Begin kselftests for ww_mutex")
Link: http://lkml.kernel.org/r/20170228094011.2595-1-chris@chris-wilson.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 09:00:38 +01:00
Ingo Molnar
e2679aecf1 Merge branch 'linus' into locking/urgent, to pick up fixes 2017-03-02 08:51:32 +01:00
Peter Zijlstra
f94c8d1169 sched/clock, x86/tsc: Rework the x86 'unstable' sched_clock() interface
Wanpeng Li reported that since the following commit:

  acb04058de ("sched/clock: Fix hotplug crash")

... KVM always runs with unstable sched-clock even though KVM's
kvm_clock _is_ stable.

The problem is that we've tied clear_sched_clock_stable() to the TSC
state, and overlooked that sched_clock() is a paravirt function.

Solve this by doing two things:

 - tie the sched_clock() stable state more clearly to the TSC stable
   state for the normal (!paravirt) case.

 - only call clear_sched_clock_stable() when we mark TSC unstable
   when we use native_sched_clock().

The first means we can actually run with stable sched_clock in more
situations then before, which is good. And since commit:

  12907fbb1a ("sched/clock, clocksource: Add optional cs::mark_unstable() method")

... this should be reliable. Since any detection of TSC fail now results
in marking the TSC unstable.

Reported-by: Wanpeng Li <kernellwp@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Fixes: acb04058de ("sched/clock: Fix hotplug crash")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:50:49 +01:00
Peter Zijlstra
0ba87bb27d sched/core: Fix pick_next_task() for RT,DL
Pavan noticed that the following commit:

  49ee576809 ("sched/core: Optimize pick_next_task() for idle_sched_class")

... broke RT,DL balancing by robbing them of the opportinty to do new-'idle'
balancing when their last runnable task (on that runqueue) goes away.

Reported-by: Pavan Kondeti <pkondeti@codeaurora.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Fixes: 49ee576809 ("sched/core: Optimize pick_next_task() for idle_sched_class")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:50:17 +01:00
Peter Zijlstra
4c77b18cf8 sched/fair: Make select_idle_cpu() more aggressive
Kitsunyan reported desktop latency issues on his Celeron 887 because
of commit:

  1b568f0aab ("sched/core: Optimize SCHED_SMT")

... even though his CPU doesn't do SMT.

The effect of running the SMT code on a !SMT part is basically a more
aggressive select_idle_cpu(). Removing the avg condition fixed things
for him.

I also know FB likes this test gone, even though other workloads like
having it.

For now, take it out by default, until we get a better idea.

Reported-by: kitsunyan <kitsunyan@inbox.ru>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Chris Mason <clm@fb.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:50:17 +01:00
Ingo Molnar
de8f1c7731 sched/headers: Move autogroup APIs into <linux/sched/autogroup.h>
Further reduce the size of sched.h.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:43 +01:00
Ingo Molnar
dea38c74cb sched/headers: Move loadavg related definitions from <linux/sched.h> to <linux/sched/loadavg.h>
Move these bits to <linux/sched/loadavg.h>, to reduce the size and
complexity of <linux/sched.h>.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:43 +01:00
Ingo Molnar
e2d1e2aec5 sched/headers: Move various ABI definitions to <uapi/linux/sched/types.h>
Move scheduler ABI types (struct sched_attr, struct sched_param, etc.) into
the new UAPI header.

This further reduces the size and complexity of <linux/sched.h>.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:42 +01:00
Ingo Molnar
47913d4ebd sched/headers, delayacct: Move the 'struct task_delay_info' definition from <linux/sched.h> to <linux/delayacct.h>
The 'struct task_delay_info' definition does not have to be in sched.h,
because task_struct only has a pointer to it.

So move it to <linux/delayacct.h> to reduce the size of <linux/sched.h>.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:42 +01:00
Ingo Molnar
5689810360 sched/headers: Move scheduler clock interfaces to <linux/sched/clock.h>
Move the sched_clock interfaces into a separate header file, to reduce
the size of sched.h.

Include <linux/sched/clock.h> in all files that made use of one of the
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:42 +01:00
Ingo Molnar
eb61baf698 sched/headers: Move the wake-queue types and interfaces from sched.h into <linux/sched/wake_q.h>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:42 +01:00
Ingo Molnar
5dbe91de59 sched/headers: Move idle polling methods to <linux/sched/idle.h>
Further reduce the size of <linux/sched.h> by moving these APIs:

	tsk_is_polling()
	__current_set_polling()
	current_set_polling_and_test()
	__current_clr_polling()
	current_clr_polling_and_test()
	current_clr_polling()

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:41 +01:00
Ingo Molnar
4437722b04 sched/headers: Move the wake_up_if_idle() prototype to <linux/sched/idle.h>
No need to clutter <linux/sched.h> with this rarely used prototype.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:41 +01:00
Ingo Molnar
b768917d2c sched/headers: Move the 'cpu_idle_type' enum from <linux/sched.h> to <linux/sched/idle.h>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:41 +01:00
Ingo Molnar
a60b9eda67 sched/headers: Move scheduler topology interfaces to <linux/sched/topology.h>
The vast majority of sched.h users does not require the topology types and
interfaces, so split them out into <linux/sched/topology.h>.

This reduces the size of linux/sched.h by ~6%.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:41 +01:00
Ingo Molnar
b69339ba10 sched/headers: Prepare to remove spurious <linux/sched.h> inclusion dependencies
In the following patches we are going to remove various headers
from sched.h and other headers that sched.h includes.

To make those patches build cleanly prepare the scene by adding
dependencies to various files that learned to rely on those
to-be-removed dependencies.

These changes all make sense standalone: they add a header for
a data type that a particular .c or .h file is using.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:41 +01:00
Ingo Molnar
50d34394ce sched/headers: Prepare to remove the <linux/magic.h> include from <linux/sched/task_stack.h>
Update files that depend on the magic.h inclusion.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:40 +01:00