Commit Graph

72909 Commits

Author SHA1 Message Date
Ahmed S. Darwish
932e463652 seqlock: Implement raw_seqcount_begin() in terms of raw_read_seqcount()
raw_seqcount_begin() has the same code as raw_read_seqcount(), with the
exception of masking the sequence counter's LSB before returning it to
the caller.

Note, raw_seqcount_begin() masks the counter's LSB before returning it
to the caller so that read_seqcount_retry() can fail if the counter is
odd -- without the overhead of an extra branching instruction.

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200720155530.1173732-7-a.darwish@linutronix.de
2020-07-29 16:14:24 +02:00
Ahmed S. Darwish
89b88845e0 seqlock: Add kernel-doc for seqcount_t and seqlock_t APIs
seqlock.h is now included by kernel's RST documentation, but a small
number of the the exported seqlock.h functions are kernel-doc annotated.

Add kernel-doc for all seqlock.h exported APIs.

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200720155530.1173732-6-a.darwish@linutronix.de
2020-07-29 16:14:23 +02:00
Ahmed S. Darwish
f4a27cbcec seqlock: Reorder seqcount_t and seqlock_t API definitions
The seqlock.h seqcount_t and seqlock_t API definitions are presented in
the chronological order of their development rather than the order that
makes most sense to readers. This makes it hard to follow and understand
the header file code.

Group and reorder all of the exported seqlock.h functions according to
their function.

First, group together the seqcount_t standard read path functions:

    - __read_seqcount_begin()
    - raw_read_seqcount_begin()
    - read_seqcount_begin()

since each function is implemented exactly in terms of the one above
it. Then, group the special-case seqcount_t readers on their own as:

    - raw_read_seqcount()
    - raw_seqcount_begin()

since the only difference between the two functions is that the second
one masks the sequence counter LSB while the first one does not. Note
that raw_seqcount_begin() can actually be implemented in terms of
raw_read_seqcount(), which will be done in a follow-up commit.

Then, group the seqcount_t write path functions, instead of injecting
unrelated seqcount_t latch functions between them, and order them as:

    - raw_write_seqcount_begin()
    - raw_write_seqcount_end()
    - write_seqcount_begin_nested()
    - write_seqcount_begin()
    - write_seqcount_end()
    - raw_write_seqcount_barrier()
    - write_seqcount_invalidate()

which is the expected natural order. This also isolates the seqcount_t
latch functions into their own area, at the end of the sequence counters
section, and before jumping to the next one: sequential locks
(seqlock_t).

Do a similar grouping and reordering for seqlock_t "locking" readers vs.
the "conditionally locking or lockless" ones.

No implementation code was changed in any of the reordering above.

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200720155530.1173732-5-a.darwish@linutronix.de
2020-07-29 16:14:23 +02:00
Ahmed S. Darwish
d3b35b87f4 seqlock: seqcount_t latch: End read sections with read_seqcount_retry()
The seqcount_t latch reader example at the raw_write_seqcount_latch()
kernel-doc comment ends the latch read section with a manual smp memory
barrier and sequence counter comparison.

This is technically correct, but it is suboptimal: read_seqcount_retry()
already contains the same logic of an smp memory barrier and sequence
counter comparison.

End the latch read critical section example with read_seqcount_retry().

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200720155530.1173732-4-a.darwish@linutronix.de
2020-07-29 16:14:23 +02:00
Ahmed S. Darwish
15cbe67bbd seqlock: Properly format kernel-doc code samples
Align the code samples and note sections inside kernel-doc comments with
tabs. This way they can be properly parsed and rendered by Sphinx. It
also makes the code samples easier to read from text editors.

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200720155530.1173732-3-a.darwish@linutronix.de
2020-07-29 16:14:23 +02:00
Ahmed S. Darwish
0d24f65e93 Documentation: locking: Describe seqlock design and usage
Proper documentation for the design and usage of sequence counters and
sequential locks does not exist. Complete the seqlock.h documentation as
follows:

  - Divide all documentation on a seqcount_t vs. seqlock_t basis. The
    description for both mechanisms was intermingled, which is incorrect
    since the usage constrains for each type are vastly different.

  - Add an introductory paragraph describing the internal design of, and
    rationale for, sequence counters.

  - Document seqcount_t writer non-preemptibility requirement, which was
    not previously documented anywhere, and provide a clear rationale.

  - Provide template code for seqcount_t and seqlock_t initialization
    and reader/writer critical sections.

  - Recommend using seqlock_t by default. It implicitly handles the
    serialization and non-preemptibility requirements of writers.

At seqlock.h:

  - Remove references to brlocks as they've long been removed from the
    kernel.

  - Remove references to gcc-3.x since the kernel's minimum supported
    gcc version is 4.9.

References: 0f6ed63b17 ("no need to keep brlock macros anymore...")
References: 6ec4476ac8 ("Raise gcc version requirement to 4.9")
Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200720155530.1173732-2-a.darwish@linutronix.de
2020-07-29 16:14:22 +02:00
Peter Zijlstra
f05d67179d Merge branch 'locking/header' 2020-07-29 16:14:21 +02:00
Herbert Xu
7ca8cf5347 locking/atomic: Move ATOMIC_INIT into linux/types.h
This patch moves ATOMIC_INIT from asm/atomic.h into linux/types.h.
This allows users of atomic_t to use ATOMIC_INIT without having to
include atomic.h as that way may lead to header loops.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Waiman Long <longman@redhat.com>
Link: https://lkml.kernel.org/r/20200729123105.GB7047@gondor.apana.org.au
2020-07-29 16:14:18 +02:00
Herbert Xu
e885d5d947 lockdep: Move list.h inclusion into lockdep.h
Currently lockdep_types.h includes list.h without actually using any
of its macros or functions.  All it needs are the type definitions
which were moved into types.h long ago.  This potentially causes
inclusion loops because both are included by many core header
files.

This patch moves the list.h inclusion into lockdep.h.  Note that
we could probably remove it completely but that could potentially
result in compile failures should any end users not include list.h
directly and also be unlucky enough to not get list.h via some other
header file.

Reported-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Petr Mladek <pmladek@suse.com>
Link: https://lkml.kernel.org/r/20200716063649.GA23065@gondor.apana.org.au
2020-07-28 10:45:46 +02:00
Ingo Molnar
c84d53051f Merge tag 'v5.8-rc6' into locking/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-07-25 21:49:36 +02:00
Linus Torvalds
43768f7ce0 Merge tag 'sched-urgent-2020-07-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into master
Pull scheduler fixes from Thomas Gleixner:
 "A set of scheduler fixes:

   - Plug a load average accounting race which was introduced with a
     recent optimization casing load average to show bogus numbers.

   - Fix the rseq CPU id initialization for new tasks. sched_fork() does
     not update the rseq CPU id so the id is the stale id of the parent
     task, which can cause user space data corruption.

   - Handle a 0 return value of task_h_load() correctly in the load
     balancer, which does not decrease imbalance and therefore pulls
     until the maximum number of loops is reached, which might be all
     tasks just created by a fork bomb"

* tag 'sched-urgent-2020-07-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/fair: handle case of task_h_load() returning 0
  sched: Fix unreliable rseq cpu_id for new tasks
  sched: Fix loadavg accounting race
2020-07-19 11:55:24 -07:00
Linus Torvalds
8c18fc6344 Merge tag 'dma-mapping-5.8-6' of git://git.infradead.org/users/hch/dma-mapping into master
Pull dma-mapping fixes from Christoph Hellwig:
 "Ensure we always have fully addressable memory in the dma coherent
  pool (Nicolas Saenz Julienne)"

* tag 'dma-mapping-5.8-6' of git://git.infradead.org/users/hch/dma-mapping:
  dma-pool: do not allocate pool memory from CMA
  dma-pool: make sure atomic pool suits device
  dma-pool: introduce dma_guess_pool()
  dma-pool: get rid of dma_in_atomic_pool()
  dma-direct: provide function to check physical memory area validity
2020-07-19 11:28:23 -07:00
Linus Torvalds
0dd68a34ec Merge tag 'fuse-fixes-5.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse into master
Pull fuse fixes from Miklos Szeredi:

 - two regressions in this cycle caused by the conversion of writepage
   list to an rb_tree

 - two regressions in v5.4 cause by the conversion to the new mount API

 - saner behavior of fsconfig(2) for the reconfigure case

 - an ancient issue with FS_IOC_{GET,SET}FLAGS ioctls

* tag 'fuse-fixes-5.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  fuse: Fix parameter for FS_IOC_{GET,SET}FLAGS
  fuse: don't ignore errors from fuse_writepages_fill()
  fuse: clean up condition for writepage sending
  fuse: reject options on reconfigure via fsconfig(2)
  fuse: ignore 'data' argument of mount(..., MS_REMOUNT)
  fuse: use ->reconfigure() instead of ->remount_fs()
  fuse: fix warning in tree_insert() and clean up writepage insertion
  fuse: move rb_erase() before tree_insert()
2020-07-17 10:36:19 -07:00
Linus Torvalds
8882572675 Merge tag 'drm-fixes-2020-07-17-1' of git://anongit.freedesktop.org/drm/drm into master
Pull drm fixes from Dave Airlie:
 "Weekly fixes pull, big bigger than I'd normally like, but they are
  fairly scattered and small individually.

  The vmwgfx one is a black screen regression, otherwise the largest is
  an MST encoder fix for amdgpu which results in a WARN in some cases,
  and a scattering of i915 fixes.

  I'm tracking two regressions at the moment that hopefully we get
  nailed down this week for rc7.

  dma-buf:
   - sleeping atomic fix

  amdgpu:
   - Fix a race condition with KIQ
   - Preemption fix
   - Fix handling of fake MST encoders
   - OLED panel fix
   - Handle allocation failure in stream construction
   - Renoir SMC fix
   - SDMA 5.x fix

  i915:
   - FBC w/a stride fix
   - Fix use-after-free fix on module reload
   - Ignore irq enabling on the virtual engines to fix device sleep
   - Use GTT when saving/restoring engine GPR
   - Fix selftest sort function

  vmwgfx:
   - black screen fix

  aspeed:
   - fbcon init warn fix"

* tag 'drm-fixes-2020-07-17-1' of git://anongit.freedesktop.org/drm/drm:
  drm/amdgpu/sdma5: fix wptr overwritten in ->get_wptr()
  drm/amdgpu/powerplay: Modify SMC message name for setting power profile mode
  drm/amd/display: handle failed allocation during stream construction
  drm/amd/display: OLED panel backlight adjust not work with external display connected
  drm/amdgpu/display: create fake mst encoders ahead of time (v4)
  drm/amdgpu: fix preemption unit test
  drm/amdgpu/gfx10: fix race condition for kiq
  drm/i915: Recalculate FBC w/a stride when needed
  drm/i915: Move cec_notifier to intel_hdmi_connector_unregister, v2.
  drm/i915/gt: Only swap to a random sibling once upon creation
  drm/i915/gt: Ignore irq enabling on the virtual engines
  drm/i915/perf: Use GTT when saving/restoring engine GPR
  drm/i915/selftests: Fix compare functions provided for sorting
  drm/vmwgfx: fix update of display surface when resolution changes
  dmabuf: use spinlock to access dmabuf->name
  drm/aspeed: Call drm_fbdev_generic_setup after drm_dev_register
2020-07-16 21:39:51 -07:00
Alexey Dobriyan
a9232dc560 rwsem: fix commas in initialisation
Leading comma prevents arbitrary reordering of initialisation clauses.
The whole point of C99 initialisation is to allow any such reordering.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200711145954.GA1178171@localhost.localdomain
2020-07-16 23:19:51 +02:00
Herbert Xu
5be542e945 lockdep: Move list.h inclusion into lockdep.h
Currently lockdep_types.h includes list.h without actually using any
of its macros or functions.  All it needs are the type definitions
which were moved into types.h long ago.  This potentially causes
inclusion loops because both are included by many core header
files.

This patch moves the list.h inclusion into lockdep.h.  Note that
we could probably remove it completely but that could potentially
result in compile failures should any end users not include list.h
directly and also be unlucky enough to not get list.h via some other
header file.

Reported-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Petr Mladek <pmladek@suse.com>
Link: https://lkml.kernel.org/r/20200716063649.GA23065@gondor.apana.org.au
2020-07-16 23:19:51 +02:00
Linus Torvalds
50ad1c2155 Merge tag 'driver-core-5.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core into master
Pull driver core fixes from Greg KH:
 "Here are 3 driver core fixes for 5.8-rc6.

  They resolve some issues found with the deferred probe code for some
  types of devices on some embedded systems. They have been tested a
  bunch and have been in linux-next for a while with no reported issues"

* tag 'driver-core-5.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  driver core: Avoid deferred probe due to fw_devlink_pause/resume()
  driver core: Rename dev_links_info.defer_sync to defer_hook
  driver core: Don't do deferred probe in parallel with kernel_init thread
2020-07-16 11:20:23 -07:00
Linus Torvalds
7531ee3147 Merge tag 'tty-5.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty into master
Pull tty/serial driver fixes from Greg KH:
 :Here are some small tty and serial driver fixes for 5.8-rc6.

  The largest set of patches in here is a revert of the sysrq changes
  that went into 5.8-rc1 but turned out to cause a noticable overhead
  and cpu usage.

  Other than that, there's a few small serial driver fixes to resolve
  reported issues, and finally resolving the spinlock init problem on
  many serial driver consoles.

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'tty-5.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  serial: core: Initialise spin lock before use in uart_configure_port()
  serial: mxs-auart: add missed iounmap() in probe failure and remove
  serial: sh-sci: Initialize spinlock for uart console
  Revert "tty: xilinx_uartps: Fix missing id assignment to the console"
  serial: core: drop redundant sysrq checks
  serial: core: fix sysrq overhead regression
  Revert "serial: core: Refactor uart_unlock_and_check_sysrq()"
  tty/serial: fix serial_core.c kernel-doc warnings
  tty: serial: cpm_uart: Fix behaviour for non existing GPIOs
2020-07-16 11:10:27 -07:00
Dave Airlie
03d54ef0a1 Merge tag 'drm-misc-fixes-2020-07-15' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
* aspeed: setup fbdev console after registering device; avoids warning
   and stacktrace in dmesg log
 * dmabuf: protect dmabuf->name with a spinlock; avoids sleeping in
   atomic context

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20200715171756.GA18606@linux-uq9g
2020-07-16 10:09:59 +10:00
Nicolas Saenz Julienne
567f6a6eba dma-direct: provide function to check physical memory area validity
dma_coherent_ok() checks if a physical memory area fits a device's DMA
constraints.

Signed-off-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-14 15:42:24 +02:00
Miklos Szeredi
b330966f79 fuse: reject options on reconfigure via fsconfig(2)
Previous patch changed handling of remount/reconfigure to ignore all
options, including those that are unknown to the fuse kernel fs.  This was
done for backward compatibility, but this likely only affects the old
mount(2) API.

The new fsconfig(2) based reconfiguration could possibly be improved.  This
would make the new API less of a drop in replacement for the old, OTOH this
is a good chance to get rid of some weirdnesses in the old API.

Several other behaviors might make sense:

 1) unknown options are rejected, known options are ignored

 2) unknown options are rejected, known options are rejected if the value
 is changed, allowed otherwise

 3) all options are rejected

Prior to the backward compatibility fix to ignore all options all known
options were accepted (1), even if they change the value of a mount
parameter; fuse_reconfigure() does not look at the config values set by
fuse_parse_param().

To fix that we'd need to verify that the value provided is the same as set
in the initial configuration (2).  The major drawback is that this is much
more complex than just rejecting all attempts at changing options (3);
i.e. all options signify initial configuration values and don't make sense
on reconfigure.

This patch opts for (3) with the rationale that no mount options are
reconfigurable in fuse.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-07-14 14:45:41 +02:00
Linus Torvalds
e9919e11e2 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input fixes from Dmitry Torokhov:
 "A few quirks for the Elan touchpad driver, another Thinkpad is being
  switched over from PS/2 to native RMI4 interface, and we gave a brand
  new SW_MACHINE_COVER switch definition"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: elan_i2c - add more hardware ID for Lenovo laptops
  Input: i8042 - add Lenovo XiaoXin Air 12 to i8042 nomux list
  Revert "Input: elants_i2c - report resolution information for touch major"
  Input: elan_i2c - only increment wakeup count on touch
  Input: synaptics - enable InterTouch for ThinkPad X1E 1st gen
  ARM: dts: n900: remove mmc1 card detect gpio
  Input: add `SW_MACHINE_COVER`
2020-07-13 18:31:15 -07:00
Linus Torvalds
9901a6bd15 Merge tag 'riscv-for-linus-5.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:
 "I have a few KGDB-related fixes. They're mostly fixes for build
  warnings, but there's also:

   - Support for the qSupported and qXfer packets, which are necessary
     to pass around GDB XML information which we need for the RISC-V GDB
     port to fully function.

   - Users can now select STRICT_KERNEL_RWX instead of forcing it on"

* tag 'riscv-for-linus-5.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  riscv: Avoid kgdb.h including gdb_xml.h to solve unused-const-variable warning
  kgdb: Move the extern declaration kgdb_has_hit_break() to generic kgdb.h
  riscv: Fix "no previous prototype" compile warning in kgdb.c file
  riscv: enable the Kconfig prompt of STRICT_KERNEL_RWX
  kgdb: enable arch to support XML packet.
2020-07-11 19:22:46 -07:00
Linus Torvalds
5a764898af Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from David Miller:

 1) Restore previous behavior of CAP_SYS_ADMIN wrt loading networking
    BPF programs, from Maciej Żenczykowski.

 2) Fix dropped broadcasts in mac80211 code, from Seevalamuthu
    Mariappan.

 3) Slay memory leak in nl80211 bss color attribute parsing code, from
    Luca Coelho.

 4) Get route from skb properly in ip_route_use_hint(), from Miaohe Lin.

 5) Don't allow anything other than ARPHRD_ETHER in llc code, from Eric
    Dumazet.

 6) xsk code dips too deeply into DMA mapping implementation internals.
    Add dma_need_sync and use it. From Christoph Hellwig

 7) Enforce power-of-2 for BPF ringbuf sizes. From Andrii Nakryiko.

 8) Check for disallowed attributes when loading flow dissector BPF
    programs. From Lorenz Bauer.

 9) Correct packet injection to L3 tunnel devices via AF_PACKET, from
    Jason A. Donenfeld.

10) Don't advertise checksum offload on ipa devices that don't support
    it. From Alex Elder.

11) Resolve several issues in TCP MD5 signature support. Missing memory
    barriers, bogus options emitted when using syncookies, and failure
    to allow md5 key changes in established states. All from Eric
    Dumazet.

12) Fix interface leak in hsr code, from Taehee Yoo.

13) VF reset fixes in hns3 driver, from Huazhong Tan.

14) Make loopback work again with ipv6 anycast, from David Ahern.

15) Fix TX starvation under high load in fec driver, from Tobias
    Waldekranz.

16) MLD2 payload lengths not checked properly in bridge multicast code,
    from Linus Lüssing.

17) Packet scheduler code that wants to find the inner protocol
    currently only works for one level of VLAN encapsulation. Allow
    Q-in-Q situations to work properly here, from Toke
    Høiland-Jørgensen.

18) Fix route leak in l2tp, from Xin Long.

19) Resolve conflict between the sk->sk_user_data usage of bpf reuseport
    support and various protocols. From Martin KaFai Lau.

20) Fix socket cgroup v2 reference counting in some situations, from
    Cong Wang.

21) Cure memory leak in mlx5 connection tracking offload support, from
    Eli Britstein.

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (146 commits)
  mlxsw: pci: Fix use-after-free in case of failed devlink reload
  mlxsw: spectrum_router: Remove inappropriate usage of WARN_ON()
  net: macb: fix call to pm_runtime in the suspend/resume functions
  net: macb: fix macb_suspend() by removing call to netif_carrier_off()
  net: macb: fix macb_get/set_wol() when moving to phylink
  net: macb: mark device wake capable when "magic-packet" property present
  net: macb: fix wakeup test in runtime suspend/resume routines
  bnxt_en: fix NULL dereference in case SR-IOV configuration fails
  libbpf: Fix libbpf hashmap on (I)LP32 architectures
  net/mlx5e: CT: Fix memory leak in cleanup
  net/mlx5e: Fix port buffers cell size value
  net/mlx5e: Fix 50G per lane indication
  net/mlx5e: Fix CPU mapping after function reload to avoid aRFS RX crash
  net/mlx5e: Fix VXLAN configuration restore after function reload
  net/mlx5e: Fix usage of rcu-protected pointer
  net/mxl5e: Verify that rpriv is not NULL
  net/mlx5: E-Switch, Fix vlan or qos setting in legacy mode
  net/mlx5: Fix eeprom support for SFP module
  cgroup: Fix sock_cgroup_data on big-endian.
  selftests: bpf: Fix detach from sockmap tests
  ...
2020-07-10 18:16:22 -07:00
David S. Miller
ca68d5637a Merge tag 'mlx5-fixes-2020-07-02' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:

====================
mlx5 fixes 2020-07-02

This series introduces some fixes to mlx5 driver.

V1->v2:
 - Drop "ip -s" patch and mirred device hold reference patch.
 - Will revise them in a later submission.

Please pull and let me know if there is any problem.

For -stable v5.2
 ('net/mlx5: Fix eeprom support for SFP module')

For -stable v5.4
 ('net/mlx5e: Fix 50G per lane indication')

For -stable v5.5
 ('net/mlx5e: Fix CPU mapping after function reload to avoid aRFS RX crash')
 ('net/mlx5e: Fix VXLAN configuration restore after function reload')

For -stable v5.7
 ('net/mlx5e: CT: Fix memory leak in cleanup')
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-10 14:02:01 -07:00
Linus Torvalds
b1b11d0063 Merge tag 'cleanup-kernel_read_write' of git://git.infradead.org/users/hch/misc
Pull in-kernel read and write op cleanups from Christoph Hellwig:
 "Cleanup in-kernel read and write operations

  Reshuffle the (__)kernel_read and (__)kernel_write helpers, and ensure
  all users of in-kernel file I/O use them if they don't use iov_iter
  based methods already.

  The new WARN_ONs in combination with syzcaller already found a missing
  input validation in 9p. The fix should be on your way through the
  maintainer ASAP".

[ This is prep-work for the real changes coming 5.9 ]

* tag 'cleanup-kernel_read_write' of git://git.infradead.org/users/hch/misc:
  fs: remove __vfs_read
  fs: implement kernel_read using __kernel_read
  integrity/ima: switch to using __kernel_read
  fs: add a __kernel_read helper
  fs: remove __vfs_write
  fs: implement kernel_write using __kernel_write
  fs: check FMODE_WRITE in __kernel_write
  fs: unexport __kernel_write
  bpfilter: switch to kernel_write
  autofs: switch to kernel_write
  cachefiles: switch to kernel_write
2020-07-10 09:45:15 -07:00
Linus Torvalds
1bfde03742 Merge tag 'dma-mapping-5.8-5' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping fixes from Christoph Hellwig:

 - add a warning when the atomic pool is depleted (David Rientjes)

 - protect the parameters of the new scatterlist helper macros (Marek
   Szyprowski )

* tag 'dma-mapping-5.8-5' of git://git.infradead.org/users/hch/dma-mapping:
  scatterlist: protect parameters of the sg_table related macros
  dma-mapping: warn when coherent pool is depleted
2020-07-10 09:36:03 -07:00
Linus Torvalds
d02b0478c1 Merge tag 'gfs2-v5.8-rc4.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2
Pull gfs2 fixes from Andreas Gruenbacher:
 "Fix gfs2 readahead deadlocks by adding a IOCB_NOIO flag that allows
  gfs2 to use the generic fiel read iterator functions without having to
  worry about being called back while holding locks".

* tag 'gfs2-v5.8-rc4.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
  gfs2: Rework read and page fault locking
  fs: Add IOCB_NOIO flag for generic_file_read_iter
2020-07-10 08:53:21 -07:00
Saravana Kannan
2451e74647 driver core: Avoid deferred probe due to fw_devlink_pause/resume()
With the earlier patch in this series, all devices that deferred probe
due to fw_devlink_pause() will have their probes delayed till the
deferred probe thread is kicked off during late_initcall. This will also
affect all their consumers.

This delayed probing in unnecessary. So this patch just keeps track of
the devices that had their probe deferred due to fw_devlink_pause() and
attempts to probe them once during fw_devlink_resume().

Fixes: 716a7a2596 ("driver core: fw_devlink: Add support for batching fwnode parsing")
Signed-off-by: Saravana Kannan <saravanak@google.com>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/r/20200701194259.3337652-4-saravanak@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-07-10 15:20:38 +02:00
Saravana Kannan
ec7bd78498 driver core: Rename dev_links_info.defer_sync to defer_hook
The defer_sync field is used as a hook to add the device to the
deferred_sync list. Rename it so that it's more meaningful for the next
patch that'll also use this field as a hook to a deferred_fw_devlink
list.

Signed-off-by: Saravana Kannan <saravanak@google.com>
Reviewed-by: Rafael J. Wysocki <rafael@kernel.org>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/r/20200701194259.3337652-3-saravanak@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-07-10 15:20:38 +02:00
Charan Teja Kalla
6348dd291e dmabuf: use spinlock to access dmabuf->name
There exists a sleep-while-atomic bug while accessing the dmabuf->name
under mutex in the dmabuffs_dname(). This is caused from the SELinux
permissions checks on a process where it tries to validate the inherited
files from fork() by traversing them through iterate_fd() (which
traverse files under spin_lock) and call
match_file(security/selinux/hooks.c) where the permission checks happen.
This audit information is logged using dump_common_audit_data() where it
calls d_path() to get the file path name. If the file check happen on
the dmabuf's fd, then it ends up in ->dmabuffs_dname() and use mutex to
access dmabuf->name. The flow will be like below:
flush_unauthorized_files()
  iterate_fd()
    spin_lock() --> Start of the atomic section.
      match_file()
        file_has_perm()
          avc_has_perm()
            avc_audit()
              slow_avc_audit()
	        common_lsm_audit()
		  dump_common_audit_data()
		    audit_log_d_path()
		      d_path()
                        dmabuffs_dname()
                          mutex_lock()--> Sleep while atomic.

Call trace captured (on 4.19 kernels) is below:
___might_sleep+0x204/0x208
__might_sleep+0x50/0x88
__mutex_lock_common+0x5c/0x1068
__mutex_lock_common+0x5c/0x1068
mutex_lock_nested+0x40/0x50
dmabuffs_dname+0xa0/0x170
d_path+0x84/0x290
audit_log_d_path+0x74/0x130
common_lsm_audit+0x334/0x6e8
slow_avc_audit+0xb8/0xf8
avc_has_perm+0x154/0x218
file_has_perm+0x70/0x180
match_file+0x60/0x78
iterate_fd+0x128/0x168
selinux_bprm_committing_creds+0x178/0x248
security_bprm_committing_creds+0x30/0x48
install_exec_creds+0x1c/0x68
load_elf_binary+0x3a4/0x14e0
search_binary_handler+0xb0/0x1e0

So, use spinlock to access dmabuf->name to avoid sleep-while-atomic.

Cc: <stable@vger.kernel.org> [5.3+]
Signed-off-by: Charan Teja Kalla <charante@codeaurora.org>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Acked-by: Christian König <christian.koenig@amd.com>
 [sumits: added comment to spinlock_t definition to avoid warning]
Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/a83e7f0d-4e54-9848-4b58-e1acdbe06735@codeaurora.org
2020-07-10 15:39:29 +05:30
Peter Zijlstra
f9ad4a5f3f lockdep: Remove lockdep_hardirq{s_enabled,_context}() argument
Now that the macros use per-cpu data, we no longer need the argument.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20200623083721.571835311@infradead.org
2020-07-10 12:00:02 +02:00
Peter Zijlstra
a21ee6055c lockdep: Change hardirq{s_enabled,_context} to per-cpu variables
Currently all IRQ-tracking state is in task_struct, this means that
task_struct needs to be defined before we use it.

Especially for lockdep_assert_irq*() this can lead to header-hell.

Move the hardirq state into per-cpu variables to avoid the task_struct
dependency.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20200623083721.512673481@infradead.org
2020-07-10 12:00:02 +02:00
Peter Zijlstra
ba1f2b2eaa x86/entry: Fix NMI vs IRQ state tracking
While the nmi_enter() users did
trace_hardirqs_{off_prepare,on_finish}() there was no matching
lockdep_hardirqs_*() calls to complete the picture.

Introduce idtentry_{enter,exit}_nmi() to enable proper IRQ state
tracking across the NMIs.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20200623083721.216740948@infradead.org
2020-07-10 12:00:01 +02:00
Peter Zijlstra
e1bcad609f Merge branch 'tip/x86/entry' 2020-07-10 11:59:59 +02:00
Vincent Chen
def0aa218e kgdb: Move the extern declaration kgdb_has_hit_break() to generic kgdb.h
Currently, only riscv kgdb.c uses the kgdb_has_hit_break() to identify
the kgdb breakpoint. It causes other architectures will encounter the "no
previous prototype" warnings if the compile option has W=1. Moving the
declaration of extern kgdb_has_hit_break() from risc-v kgdb.h to generic
kgdb.h to avoid generating these warnings.

Signed-off-by: Vincent Chen <vincent.chen@sifive.com>
Acked-by: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2020-07-09 20:12:19 -07:00
Vincent Chen
8c080d3a97 kgdb: enable arch to support XML packet.
The XML packet could be supported by required architecture if the
architecture defines CONFIG_HAVE_ARCH_KGDB_QXFER_PKT and implement its own
kgdb_arch_handle_qxfer_pkt(). Except for the kgdb_arch_handle_qxfer_pkt(),
the architecture also needs to record the feature supported by gdb stub
into the kgdb_arch_gdb_stub_feature, and these features will be reported
to host gdb when gdb stub receives the qSupported packet.

Signed-off-by: Vincent Chen <vincent.chen@sifive.com>
Acked-by: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2020-07-09 20:09:28 -07:00
Eran Ben Elisha
88b3d5c90e net/mlx5e: Fix port buffers cell size value
Device unit for port buffers size, xoff_threshold and xon_threshold is
cells. Fix a bug in driver where cell unit size was hard-coded to
128 bytes. This hard-coded value is buggy, as it is wrong for some hardware
versions.

Driver to read cell size from SBCAM register and translate bytes to cell
units accordingly.

In order to fix the bug, this patch exposes SBCAM (Shared buffer
capabilities mask) layout and defines.

If SBCAM.cap_cell_size is valid, use it for all bytes to cells
calculations. If not valid, fallback to 128.

Cell size do not change on the fly per device. Instead of issuing SBCAM
access reg command every time such translation is needed, cache it in
mlx5e_dcbx as part of mlx5e_dcbnl_initialize(). Pass dcbx.port_buff_cell_sz
as a param to every function that needs bytes to cells translation.

While fixing the bug, move MLX5E_BUFFER_CELL_SHIFT macro to
en_dcbnl.c, as it is only used by that file.

Fixes: 0696d60853 ("net/mlx5e: Receive buffer configuration")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-07-09 19:27:07 -07:00
Cong Wang
14b032b8f8 cgroup: Fix sock_cgroup_data on big-endian.
In order for no_refcnt and is_data to be the lowest order two
bits in the 'val' we have to pad out the bitfield of the u8.

Fixes: ad0f75e5f5 ("cgroup: fix cgroup_sk_alloc() for sk_clone_lock()")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-09 16:28:44 -07:00
Linus Torvalds
ce69fb3b39 Merge tag 'kallsyms_show_value-v5.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull kallsyms fix from Kees Cook:
 "Refactor kallsyms_show_value() users for correct cred.

  I'm not delighted by the timing of getting these changes to you, but
  it does fix a handful of kernel address exposures, and no one has
  screamed yet at the patches.

  Several users of kallsyms_show_value() were performing checks not
  during "open". Refactor everything needed to gain proper checks
  against file->f_cred for modules, kprobes, and bpf"

* tag 'kallsyms_show_value-v5.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  selftests: kmod: Add module address visibility test
  bpf: Check correct cred for CAP_SYSLOG in bpf_dump_raw_ok()
  kprobes: Do not expose probe addresses to non-CAP_SYSLOG
  module: Do not expose section addresses to non-CAP_SYSLOG
  module: Refactor section attr into bin attribute
  kallsyms: Refactor kallsyms_show_value() to take cred
2020-07-09 13:09:30 -07:00
Dave Wang
a50ca29523 Input: elan_i2c - add more hardware ID for Lenovo laptops
This adds more hardware IDs for Elan touchpads found in various Lenovo
laptops.

Signed-off-by: Dave Wang <dave.wang@emc.com.tw>
Link: https://lore.kernel.org/r/000201d5a8bd$9fead3f0$dfc07bd0$@emc.com.tw
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2020-07-08 22:31:51 -07:00
Kees Cook
6396026045 bpf: Check correct cred for CAP_SYSLOG in bpf_dump_raw_ok()
When evaluating access control over kallsyms visibility, credentials at
open() time need to be used, not the "current" creds (though in BPF's
case, this has likely always been the same). Plumb access to associated
file->f_cred down through bpf_dump_raw_ok() and its callers now that
kallsysm_show_value() has been refactored to take struct cred.

Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: bpf@vger.kernel.org
Cc: stable@vger.kernel.org
Fixes: 7105e828c0 ("bpf: allow for correlation of maps and helpers in dump")
Signed-off-by: Kees Cook <keescook@chromium.org>
2020-07-08 16:01:21 -07:00
Kees Cook
160251842c kallsyms: Refactor kallsyms_show_value() to take cred
In order to perform future tests against the cred saved during open(),
switch kallsyms_show_value() to operate on a cred, and have all current
callers pass current_cred(). This makes it very obvious where callers
are checking the wrong credential in their "read" contexts. These will
be fixed in the coming patches.

Additionally switch return value to bool, since it is always used as a
direct permission check, not a 0-on-success, negative-on-error style
function return.

Cc: stable@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
2020-07-08 15:59:57 -07:00
Linus Torvalds
6ec4476ac8 Raise gcc version requirement to 4.9
I realize that we fairly recently raised it to 4.8, but the fact is, 4.9
is a much better minimum version to target.

We have a number of workarounds for actual bugs in pre-4.9 gcc versions
(including things like internal compiler errors on ARM), but we also
have some syntactic workarounds for lacking features.

In particular, raising the minimum to 4.9 means that we can now just
assume _Generic() exists, which is likely the much better replacement
for a lot of very convoluted built-time magic with conditionals on
sizeof and/or __builtin_choose_expr() with same_type() etc.

Using _Generic also means that you will need to have a very recent
version of 'sparse', but thats easy to build yourself, and much less of
a hassle than some old gcc version can be.

The latest (in a long string) of reasons for minimum compiler version
upgrades was commit 5435f73d5c ("efi/x86: Fix build with gcc 4").

Ard points out that RHEL 7 uses gcc-4.8, but the people who stay back on
old RHEL versions persumably also don't build their own kernels anyway.
And maybe they should cross-built or just have a little side affair with
a newer compiler?

Acked-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-07-08 10:48:35 -07:00
Peter Zijlstra
dbfb089d36 sched: Fix loadavg accounting race
The recent commit:

  c6e7bd7afa ("sched/core: Optimize ttwu() spinning on p->on_cpu")

moved these lines in ttwu():

	p->sched_contributes_to_load = !!task_contributes_to_load(p);
	p->state = TASK_WAKING;

up before:

	smp_cond_load_acquire(&p->on_cpu, !VAL);

into the 'p->on_rq == 0' block, with the thinking that once we hit
schedule() the current task cannot change it's ->state anymore. And
while this is true, it is both incorrect and flawed.

It is incorrect in that we need at least an ACQUIRE on 'p->on_rq == 0'
to avoid weak hardware from re-ordering things for us. This can fairly
easily be achieved by relying on the control-dependency already in
place.

The second problem, which makes the flaw in the original argument, is
that while schedule() will not change prev->state, it will read it a
number of times (arguably too many times since it's marked volatile).
The previous condition 'p->on_cpu == 0' was sufficient because that
indicates schedule() has completed, and will no longer read
prev->state. So now the trick is to make this same true for the (much)
earlier 'prev->on_rq == 0' case.

Furthermore, in order to make the ordering stick, the 'prev->on_rq = 0'
assignment needs to he a RELEASE, but adding additional ordering to
schedule() is an unwelcome proposition at the best of times, doubly so
for mere accounting.

Luckily we can push the prev->state load up before rq->lock, with the
only caveat that we then have to re-read the state after. However, we
know that if it changed, we no longer have to worry about the blocking
path. This gives us the required ordering, if we block, we did the
prev->state load before an (effective) smp_mb() and the p->on_rq store
needs not change.

With this we end up with the effective ordering:

	LOAD p->state           LOAD-ACQUIRE p->on_rq == 0
	MB
	STORE p->on_rq, 0       STORE p->state, TASK_WAKING

which ensures the TASK_WAKING store happens after the prev->state
load, and all is well again.

Fixes: c6e7bd7afa ("sched/core: Optimize ttwu() spinning on p->on_cpu")
Reported-by: Dave Jones <davej@codemonkey.org.uk>
Reported-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Dave Jones <davej@codemonkey.org.uk>
Tested-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Link: https://lkml.kernel.org/r/20200707102957.GN117543@hirez.programming.kicks-ass.net
2020-07-08 11:38:49 +02:00
Christoph Hellwig
775802c057 fs: remove __vfs_read
Fold it into the two callers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-08 08:27:57 +02:00
Christoph Hellwig
61a707c543 fs: add a __kernel_read helper
This is the counterpart to __kernel_write, and skip the rw_verify_area
call compared to kernel_read.

Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-08 08:27:56 +02:00
Toke Høiland-Jørgensen
469aceddfa vlan: consolidate VLAN parsing code and limit max parsing depth
Toshiaki pointed out that we now have two very similar functions to extract
the L3 protocol number in the presence of VLAN tags. And Daniel pointed out
that the unbounded parsing loop makes it possible for maliciously crafted
packets to loop through potentially hundreds of tags.

Fix both of these issues by consolidating the two parsing functions and
limiting the VLAN tag parsing to a max depth of 8 tags. As part of this,
switch over __vlan_get_protocol() to use skb_header_pointer() instead of
pskb_may_pull(), to avoid the possible side effects of the latter and keep
the skb pointer 'const' through all the parsing functions.

v2:
- Use limit of 8 tags instead of 32 (matching XMIT_RECURSION_LIMIT)

Reported-by: Toshiaki Makita <toshiaki.makita1@gmail.com>
Reported-by: Daniel Borkmann <daniel@iogearbox.net>
Fixes: d7bf2ebebc ("sched: consistently handle layer3 header accesses in the presence of VLANs")
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-07 15:48:38 -07:00
Andreas Gruenbacher
41da51bce3 fs: Add IOCB_NOIO flag for generic_file_read_iter
Add an IOCB_NOIO flag that indicates to generic_file_read_iter that it
shouldn't trigger any filesystem I/O for the actual request or for
readahead.  This allows to do tentative reads out of the page cache as
some filesystems allow, and to take the appropriate locks and retry the
reads only if the requested pages are not cached.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-07-07 23:40:08 +02:00
Cong Wang
ad0f75e5f5 cgroup: fix cgroup_sk_alloc() for sk_clone_lock()
When we clone a socket in sk_clone_lock(), its sk_cgrp_data is
copied, so the cgroup refcnt must be taken too. And, unlike the
sk_alloc() path, sock_update_netprioidx() is not called here.
Therefore, it is safe and necessary to grab the cgroup refcnt
even when cgroup_sk_alloc is disabled.

sk_clone_lock() is in BH context anyway, the in_interrupt()
would terminate this function if called there. And for sk_alloc()
skcd->val is always zero. So it's safe to factor out the code
to make it more readable.

The global variable 'cgroup_sk_alloc_disabled' is used to determine
whether to take these reference counts. It is impossible to make
the reference counting correct unless we save this bit of information
in skcd->val. So, add a new bit there to record whether the socket
has already taken the reference counts. This obviously relies on
kmalloc() to align cgroup pointers to at least 4 bytes,
ARCH_KMALLOC_MINALIGN is certainly larger than that.

This bug seems to be introduced since the beginning, commit
d979a39d72 ("cgroup: duplicate cgroup reference when cloning sockets")
tried to fix it but not compeletely. It seems not easy to trigger until
the recent commit 090e28b229
("netprio_cgroup: Fix unlimited memory leak of v2 cgroups") was merged.

Fixes: bd1060a1d6 ("sock, cgroup: add sock->sk_cgroup")
Reported-by: Cameron Berkenpas <cam@neo-zeon.de>
Reported-by: Peter Geis <pgwipeout@gmail.com>
Reported-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Reported-by: Daniël Sonck <dsonck92@gmail.com>
Reported-by: Zhang Qiang <qiang.zhang@windriver.com>
Tested-by: Cameron Berkenpas <cam@neo-zeon.de>
Tested-by: Peter Geis <pgwipeout@gmail.com>
Tested-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Zefan Li <lizefan@huawei.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Roman Gushchin <guro@fb.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-07 13:34:11 -07:00