While commit 097b9146c0e2 ("net: fix up truesize of cloned
skb in skb_prepare_for_shift()") fixed immediate issues found
when KFENCE was enabled/tested, there are still similar issues,
when tcp_trim_head() hits KFENCE while the master skb
is cloned.
This happens under heavy networking TX workloads,
when the TX completion might be delayed after incoming ACK.
This patch fixes the WARNING in sk_stream_kill_queues
when sk->sk_mem_queued/sk->sk_forward_alloc are not zero.
Fixes: d3fb45f370d9 ("mm, kfence: insert KFENCE hooks for SLAB")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Marco Elver <elver@google.com>
Link: https://lore.kernel.org/r/20211102004555.1359210-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
(cherry picked from commit c4777efa751d293e369aec464ce6875e957be255)
Bug: 187129171
Signed-off-by: Connor O'Brien <connoro@google.com>
Change-Id: I5e456705bd01396c05c79009aeba36e00829e037
Changes in 5.10.68
drm/bridge: lt9611: Fix handling of 4k panels
btrfs: fix upper limit for max_inline for page size 64K
io_uring: ensure symmetry in handling iter types in loop_rw_iter()
xen: reset legacy rtc flag for PV domU
bnx2x: Fix enabling network interfaces without VFs
arm64/sve: Use correct size when reinitialising SVE state
PM: base: power: don't try to use non-existing RTC for storing data
PCI: Add AMD GPU multi-function power dependencies
drm/amd/amdgpu: Increase HWIP_MAX_INSTANCE to 10
drm/etnaviv: return context from etnaviv_iommu_context_get
drm/etnaviv: put submit prev MMU context when it exists
drm/etnaviv: stop abusing mmu_context as FE running marker
drm/etnaviv: keep MMU context across runtime suspend/resume
drm/etnaviv: exec and MMU state is lost when resetting the GPU
drm/etnaviv: fix MMU context leak on GPU reset
drm/etnaviv: reference MMU context when setting up hardware state
drm/etnaviv: add missing MMU context put when reaping MMU mapping
s390/sclp: fix Secure-IPL facility detection
x86/pat: Pass valid address to sanitize_phys()
x86/mm: Fix kern_addr_valid() to cope with existing but not present entries
tipc: fix an use-after-free issue in tipc_recvmsg
ethtool: Fix rxnfc copy to user buffer overflow
net/{mlx5|nfp|bnxt}: Remove unnecessary RTNL lock assert
net-caif: avoid user-triggerable WARN_ON(1)
ptp: dp83640: don't define PAGE0
dccp: don't duplicate ccid when cloning dccp sock
net/l2tp: Fix reference count leak in l2tp_udp_recv_core
r6040: Restore MDIO clock frequency after MAC reset
tipc: increase timeout in tipc_sk_enqueue()
drm/rockchip: cdn-dp-core: Make cdn_dp_core_resume __maybe_unused
perf machine: Initialize srcline string member in add_location struct
net/mlx5: FWTrace, cancel work on alloc pd error flow
net/mlx5: Fix potential sleeping in atomic context
nvme-tcp: fix io_work priority inversion
events: Reuse value read using READ_ONCE instead of re-reading it
net: ipa: initialize all filter table slots
gen_compile_commands: fix missing 'sys' package
vhost_net: fix OoB on sendmsg() failure.
net/af_unix: fix a data-race in unix_dgram_poll
net: dsa: destroy the phylink instance on any error in dsa_slave_phy_setup
x86/uaccess: Fix 32-bit __get_user_asm_u64() when CC_HAS_ASM_GOTO_OUTPUT=y
tcp: fix tp->undo_retrans accounting in tcp_sacktag_one()
selftest: net: fix typo in altname test
qed: Handle management FW error
udp_tunnel: Fix udp_tunnel_nic work-queue type
dt-bindings: arm: Fix Toradex compatible typo
ibmvnic: check failover_pending in login response
KVM: PPC: Book3S HV: Tolerate treclaim. in fake-suspend mode changing registers
bnxt_en: make bnxt_free_skbs() safe to call after bnxt_free_mem()
net: hns3: pad the short tunnel frame before sending to hardware
net: hns3: change affinity_mask to numa node range
net: hns3: disable mac in flr process
net: hns3: fix the timing issue of VF clearing interrupt sources
mm/memory_hotplug: use "unsigned long" for PFN in zone_for_pfn_range()
dt-bindings: mtd: gpmc: Fix the ECC bytes vs. OOB bytes equation
mfd: db8500-prcmu: Adjust map to reality
PCI: Add ACS quirks for NXP LX2xx0 and LX2xx2 platforms
fuse: fix use after free in fuse_read_interrupt()
PCI: tegra194: Fix handling BME_CHGED event
PCI: tegra194: Fix MSI-X programming
PCI: tegra: Fix OF node reference leak
mfd: Don't use irq_create_mapping() to resolve a mapping
PCI: rcar: Fix runtime PM imbalance in rcar_pcie_ep_probe()
tracing/probes: Reject events which have the same name of existing one
PCI: cadence: Use bitfield for *quirk_retrain_flag* instead of bool
PCI: cadence: Add quirk flag to set minimum delay in LTSSM Detect.Quiet state
PCI: j721e: Add PCIe support for J7200
PCI: j721e: Add PCIe support for AM64
PCI: Add ACS quirks for Cavium multi-function devices
watchdog: Start watchdog in watchdog_set_last_hw_keepalive only if appropriate
octeontx2-af: Add additional register check to rvu_poll_reg()
Set fc_nlinfo in nh_create_ipv4, nh_create_ipv6
net: usb: cdc_mbim: avoid altsetting toggling for Telit LN920
block, bfq: honor already-setup queue merges
PCI: ibmphp: Fix double unmap of io_mem
ethtool: Fix an error code in cxgb2.c
NTB: Fix an error code in ntb_msit_probe()
NTB: perf: Fix an error code in perf_setup_inbuf()
s390/bpf: Fix optimizing out zero-extensions
s390/bpf: Fix 64-bit subtraction of the -0x80000000 constant
s390/bpf: Fix branch shortening during codegen pass
mfd: axp20x: Update AXP288 volatile ranges
backlight: ktd253: Stabilize backlight
PCI: of: Don't fail devm_pci_alloc_host_bridge() on missing 'ranges'
PCI: iproc: Fix BCMA probe resource handling
netfilter: Fix fall-through warnings for Clang
netfilter: nft_ct: protect nft_ct_pcpu_template_refcnt with mutex
KVM: arm64: Restrict IPA size to maximum 48 bits on 4K and 16K page size
PCI: Fix pci_dev_str_match_path() alloc while atomic bug
mfd: tqmx86: Clear GPIO IRQ resource when no IRQ is set
tracing/boot: Fix a hist trigger dependency for boot time tracing
mtd: mtdconcat: Judge callback existence based on the master
mtd: mtdconcat: Check _read, _write callbacks existence before assignment
KVM: arm64: Fix read-side race on updates to vcpu reset state
KVM: arm64: Handle PSCI resets before userspace touches vCPU state
PCI: Sync __pci_register_driver() stub for CONFIG_PCI=n
mtd: rawnand: cafe: Fix a resource leak in the error handling path of 'cafe_nand_probe()'
ARC: export clear_user_page() for modules
perf unwind: Do not overwrite FEATURE_CHECK_LDFLAGS-libunwind-{x86,aarch64}
perf bench inject-buildid: Handle writen() errors
gpio: mpc8xxx: Fix a resources leak in the error handling path of 'mpc8xxx_probe()'
gpio: mpc8xxx: Use 'devm_gpiochip_add_data()' to simplify the code and avoid a leak
net: dsa: tag_rtl4_a: Fix egress tags
selftests: mptcp: clean tmp files in simult_flows
net: hso: add failure handler for add_net_device
net: dsa: b53: Fix calculating number of switch ports
net: dsa: b53: Set correct number of ports in the DSA struct
netfilter: socket: icmp6: fix use-after-scope
fq_codel: reject silly quantum parameters
qlcnic: Remove redundant unlock in qlcnic_pinit_from_rom
ip_gre: validate csum_start only on pull
net: dsa: b53: Fix IMP port setup on BCM5301x
bnxt_en: fix stored FW_PSID version masks
bnxt_en: Fix asic.rev in devlink dev info command
bnxt_en: log firmware debug notifications
bnxt_en: Consolidate firmware reset event logging.
bnxt_en: Convert to use netif_level() helpers.
bnxt_en: Improve logging of error recovery settings information.
bnxt_en: Fix possible unintended driver initiated error recovery
mfd: lpc_sch: Partially revert "Add support for Intel Quark X1000"
mfd: lpc_sch: Rename GPIOBASE to prevent build error
net: renesas: sh_eth: Fix freeing wrong tx descriptor
x86/mce: Avoid infinite loop for copy from user recovery
bnxt_en: Fix error recovery regression
net: dsa: bcm_sf2: Fix array overrun in bcm_sf2_num_active_ports()
Linux 5.10.68
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I542f48f8de516dcabce91d3d399583483aba0da7
commit 04f08eb44b5011493d77b602fdec29ff0f5c6cd5 upstream.
syzbot reported another data-race in af_unix [1]
Lets change __skb_insert() to use WRITE_ONCE() when changing
skb head qlen.
Also, change unix_dgram_poll() to use lockless version
of unix_recvq_full()
It is verry possible we can switch all/most unix_recvq_full()
to the lockless version, this will be done in a future kernel version.
[1] HEAD commit: 8596e589b787732c8346f0482919e83cc9362db1
BUG: KCSAN: data-race in skb_queue_tail / unix_dgram_poll
write to 0xffff88814eeb24e0 of 4 bytes by task 25815 on cpu 0:
__skb_insert include/linux/skbuff.h:1938 [inline]
__skb_queue_before include/linux/skbuff.h:2043 [inline]
__skb_queue_tail include/linux/skbuff.h:2076 [inline]
skb_queue_tail+0x80/0xa0 net/core/skbuff.c:3264
unix_dgram_sendmsg+0xff2/0x1600 net/unix/af_unix.c:1850
sock_sendmsg_nosec net/socket.c:703 [inline]
sock_sendmsg net/socket.c:723 [inline]
____sys_sendmsg+0x360/0x4d0 net/socket.c:2392
___sys_sendmsg net/socket.c:2446 [inline]
__sys_sendmmsg+0x315/0x4b0 net/socket.c:2532
__do_sys_sendmmsg net/socket.c:2561 [inline]
__se_sys_sendmmsg net/socket.c:2558 [inline]
__x64_sys_sendmmsg+0x53/0x60 net/socket.c:2558
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3d/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
read to 0xffff88814eeb24e0 of 4 bytes by task 25834 on cpu 1:
skb_queue_len include/linux/skbuff.h:1869 [inline]
unix_recvq_full net/unix/af_unix.c:194 [inline]
unix_dgram_poll+0x2bc/0x3e0 net/unix/af_unix.c:2777
sock_poll+0x23e/0x260 net/socket.c:1288
vfs_poll include/linux/poll.h:90 [inline]
ep_item_poll fs/eventpoll.c:846 [inline]
ep_send_events fs/eventpoll.c:1683 [inline]
ep_poll fs/eventpoll.c:1798 [inline]
do_epoll_wait+0x6ad/0xf00 fs/eventpoll.c:2226
__do_sys_epoll_wait fs/eventpoll.c:2238 [inline]
__se_sys_epoll_wait fs/eventpoll.c:2233 [inline]
__x64_sys_epoll_wait+0xf6/0x120 fs/eventpoll.c:2233
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3d/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
value changed: 0x0000001b -> 0x00000001
Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 25834 Comm: syz-executor.1 Tainted: G W 5.14.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Fixes: 86b18aaa2b ("skbuff: fix a data race in skb_queue_len()")
Cc: Qian Cai <cai@lca.pw>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Changes in 5.10.64
igmp: Add ip_mc_list lock in ip_check_mc_rcu
USB: serial: mos7720: improve OOM-handling in read_mos_reg()
net: ll_temac: Remove left-over debug message
mm/page_alloc: speed up the iteration of max_order
net: kcov: don't select SKB_EXTENSIONS when there is no NET
serial: 8250: 8250_omap: Fix unused variable warning
net: linux/skbuff.h: combine SKB_EXTENSIONS + KCOV handling
tty: drop termiox user definitions
Revert "r8169: avoid link-up interrupt issue on RTL8106e if user enables ASPM"
x86/events/amd/iommu: Fix invalid Perf result due to IOMMU PMC power-gating
blk-mq: fix kernel panic during iterating over flush request
blk-mq: fix is_flush_rq
netfilter: nftables: avoid potential overflows on 32bit arches
netfilter: nf_tables: initialize set before expression setup
netfilter: nftables: clone set element expression template
blk-mq: clearing flush request reference in tags->rqs[]
ALSA: usb-audio: Add registration quirk for JBL Quantum 800
usb: host: xhci-rcar: Don't reload firmware after the completion
usb: gadget: tegra-xudc: fix the wrong mult value for HS isoc or intr
usb: mtu3: restore HS function when set SS/SSP
usb: mtu3: use @mult for HS isoc or intr
usb: mtu3: fix the wrong HS mult value
xhci: fix even more unsafe memory usage in xhci tracing
xhci: fix unsafe memory usage in xhci tracing
x86/reboot: Limit Dell Optiplex 990 quirk to early BIOS versions
PCI: Call Max Payload Size-related fixup quirks early
Linux 5.10.64
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I2269075a6d5eb6121b6e42a28d4f3fd0c252695c
commit 97f53a08cba128a724ebbbf34778d3553d559816 upstream.
The previous Kconfig patch led to some other build errors as
reported by the 0day bot and my own overnight build testing.
These are all in <linux/skbuff.h> when KCOV is enabled but
SKB_EXTENSIONS is not enabled, so fix those by combining those conditions
in the header file.
Fixes: 6370cc3bbd8a ("net: add kcov handle to skb extensions")
Fixes: 85ce50d337d1 ("net: kcov: don't select SKB_EXTENSIONS when there is no NET")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: kernel test robot <lkp@intel.com>
Cc: Aleksandr Nogikh <nogikh@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Acked-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20201116212108.32465-1-rdunlap@infradead.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Changes in 5.10.54
igc: Fix use-after-free error during reset
igb: Fix use-after-free error during reset
igc: change default return of igc_read_phy_reg()
ixgbe: Fix an error handling path in 'ixgbe_probe()'
igc: Fix an error handling path in 'igc_probe()'
igb: Fix an error handling path in 'igb_probe()'
fm10k: Fix an error handling path in 'fm10k_probe()'
e1000e: Fix an error handling path in 'e1000_probe()'
iavf: Fix an error handling path in 'iavf_probe()'
igb: Check if num of q_vectors is smaller than max before array access
igb: Fix position of assignment to *ring
gve: Fix an error handling path in 'gve_probe()'
net: add kcov handle to skb extensions
bonding: fix suspicious RCU usage in bond_ipsec_add_sa()
bonding: fix null dereference in bond_ipsec_add_sa()
ixgbevf: use xso.real_dev instead of xso.dev in callback functions of struct xfrmdev_ops
bonding: fix suspicious RCU usage in bond_ipsec_del_sa()
bonding: disallow setting nested bonding + ipsec offload
bonding: Add struct bond_ipesc to manage SA
bonding: fix suspicious RCU usage in bond_ipsec_offload_ok()
bonding: fix incorrect return value of bond_ipsec_offload_ok()
ipv6: fix 'disable_policy' for fwd packets
stmmac: platform: Fix signedness bug in stmmac_probe_config_dt()
selftests: icmp_redirect: remove from checking for IPv6 route get
selftests: icmp_redirect: IPv6 PMTU info should be cleared after redirect
pwm: sprd: Ensure configuring period and duty_cycle isn't wrongly skipped
cxgb4: fix IRQ free race during driver unload
mptcp: fix warning in __skb_flow_dissect() when do syn cookie for subflow join
nvme-pci: do not call nvme_dev_remove_admin from nvme_remove
KVM: x86/pmu: Clear anythread deprecated bit when 0xa leaf is unsupported on the SVM
perf inject: Fix dso->nsinfo refcounting
perf map: Fix dso->nsinfo refcounting
perf probe: Fix dso->nsinfo refcounting
perf env: Fix sibling_dies memory leak
perf test session_topology: Delete session->evlist
perf test event_update: Fix memory leak of evlist
perf dso: Fix memory leak in dso__new_map()
perf test maps__merge_in: Fix memory leak of maps
perf env: Fix memory leak of cpu_pmu_caps
perf report: Free generated help strings for sort option
perf script: Fix memory 'threads' and 'cpus' leaks on exit
perf lzma: Close lzma stream on exit
perf probe-file: Delete namelist in del_events() on the error path
perf data: Close all files in close_dir()
perf sched: Fix record failure when CONFIG_SCHEDSTATS is not set
ASoC: wm_adsp: Correct wm_coeff_tlv_get handling
spi: imx: add a check for speed_hz before calculating the clock
spi: stm32: fixes pm_runtime calls in probe/remove
regulator: hi6421: Use correct variable type for regmap api val argument
regulator: hi6421: Fix getting wrong drvdata
spi: mediatek: fix fifo rx mode
ASoC: rt5631: Fix regcache sync errors on resume
bpf, test: fix NULL pointer dereference on invalid expected_attach_type
bpf: Fix tail_call_reachable rejection for interpreter when jit failed
xdp, net: Fix use-after-free in bpf_xdp_link_release
timers: Fix get_next_timer_interrupt() with no timers pending
liquidio: Fix unintentional sign extension issue on left shift of u16
s390/bpf: Perform r1 range checking before accessing jit->seen_reg[r1]
bpf, sockmap: Fix potential memory leak on unlikely error case
bpf, sockmap, tcp: sk_prot needs inuse_idx set for proc stats
bpf, sockmap, udp: sk_prot needs inuse_idx set for proc stats
bpftool: Check malloc return value in mount_bpffs_for_pin
net: fix uninit-value in caif_seqpkt_sendmsg
usb: hso: fix error handling code of hso_create_net_device
dma-mapping: handle vmalloc addresses in dma_common_{mmap,get_sgtable}
efi/tpm: Differentiate missing and invalid final event log table.
net: decnet: Fix sleeping inside in af_decnet
KVM: PPC: Book3S: Fix CONFIG_TRANSACTIONAL_MEM=n crash
KVM: PPC: Fix kvm_arch_vcpu_ioctl vcpu_load leak
net: sched: fix memory leak in tcindex_partial_destroy_work
sctp: trim optlen when it's a huge value in sctp_setsockopt
netrom: Decrease sock refcount when sock timers expire
scsi: iscsi: Fix iface sysfs attr detection
scsi: target: Fix protect handling in WRITE SAME(32)
spi: cadence: Correct initialisation of runtime PM again
ACPI: Kconfig: Fix table override from built-in initrd
bnxt_en: don't disable an already disabled PCI device
bnxt_en: Refresh RoCE capabilities in bnxt_ulp_probe()
bnxt_en: Add missing check for BNXT_STATE_ABORT_ERR in bnxt_fw_rset_task()
bnxt_en: Validate vlan protocol ID on RX packets
bnxt_en: Check abort error state in bnxt_half_open_nic()
net: hisilicon: rename CACHE_LINE_MASK to avoid redefinition
net/tcp_fastopen: fix data races around tfo_active_disable_stamp
ALSA: hda: intel-dsp-cfg: add missing ElkhartLake PCI ID
net: hns3: fix possible mismatches resp of mailbox
net: hns3: fix rx VLAN offload state inconsistent issue
spi: spi-bcm2835: Fix deadlock
net/sched: act_skbmod: Skip non-Ethernet packets
ipv6: fix another slab-out-of-bounds in fib6_nh_flush_exceptions
ceph: don't WARN if we're still opening a session to an MDS
nvme-pci: don't WARN_ON in nvme_reset_work if ctrl.state is not RESETTING
Revert "USB: quirks: ignore remote wake-up on Fibocom L850-GL LTE modem"
afs: Fix tracepoint string placement with built-in AFS
r8169: Avoid duplicate sysfs entry creation error
nvme: set the PRACT bit when using Write Zeroes with T10 PI
sctp: update active_key for asoc when old key is being replaced
tcp: disable TFO blackhole logic by default
net: dsa: sja1105: make VID 4095 a bridge VLAN too
net: sched: cls_api: Fix the the wrong parameter
drm/panel: raspberrypi-touchscreen: Prevent double-free
cifs: only write 64kb at a time when fallocating a small region of a file
cifs: fix fallocate when trying to allocate a hole.
proc: Avoid mixing integer types in mem_rw()
mmc: core: Don't allocate IDA for OF aliases
s390/ftrace: fix ftrace_update_ftrace_func implementation
s390/boot: fix use of expolines in the DMA code
ALSA: usb-audio: Add missing proc text entry for BESPOKEN type
ALSA: usb-audio: Add registration quirk for JBL Quantum headsets
ALSA: sb: Fix potential ABBA deadlock in CSP driver
ALSA: hda/realtek: Fix pop noise and 2 Front Mic issues on a machine
ALSA: hdmi: Expose all pins on MSI MS-7C94 board
ALSA: pcm: Call substream ack() method upon compat mmap commit
ALSA: pcm: Fix mmap capability check
Revert "usb: renesas-xhci: Fix handling of unknown ROM state"
usb: xhci: avoid renesas_usb_fw.mem when it's unusable
xhci: Fix lost USB 2 remote wake
KVM: PPC: Book3S: Fix H_RTAS rets buffer overflow
KVM: PPC: Book3S HV Nested: Sanitise H_ENTER_NESTED TM state
usb: hub: Disable USB 3 device initiated lpm if exit latency is too high
usb: hub: Fix link power management max exit latency (MEL) calculations
USB: usb-storage: Add LaCie Rugged USB3-FW to IGNORE_UAS
usb: max-3421: Prevent corruption of freed memory
usb: renesas_usbhs: Fix superfluous irqs happen after usb_pkt_pop()
USB: serial: option: add support for u-blox LARA-R6 family
USB: serial: cp210x: fix comments for GE CS1000
USB: serial: cp210x: add ID for CEL EM3588 USB ZigBee stick
usb: gadget: Fix Unbalanced pm_runtime_enable in tegra_xudc_probe
usb: dwc2: gadget: Fix GOUTNAK flow for Slave mode.
usb: dwc2: gadget: Fix sending zero length packet in DDMA mode.
usb: typec: stusb160x: register role switch before interrupt registration
firmware/efi: Tell memblock about EFI iomem reservations
tracepoints: Update static_call before tp_funcs when adding a tracepoint
tracing/histogram: Rename "cpu" to "common_cpu"
tracing: Fix bug in rb_per_cpu_empty() that might cause deadloop.
tracing: Synthetic event field_pos is an index not a boolean
btrfs: check for missing device in btrfs_trim_fs
media: ngene: Fix out-of-bounds bug in ngene_command_config_free_buf()
ixgbe: Fix packet corruption due to missing DMA sync
bus: mhi: core: Validate channel ID when processing command completions
posix-cpu-timers: Fix rearm racing against process tick
selftest: use mmap instead of posix_memalign to allocate memory
io_uring: explicitly count entries for poll reqs
io_uring: remove double poll entry on arm failure
userfaultfd: do not untag user pointers
memblock: make for_each_mem_range() traverse MEMBLOCK_HOTPLUG regions
hugetlbfs: fix mount mode command line processing
rbd: don't hold lock_rwsem while running_list is being drained
rbd: always kick acquire on "acquired" and "released" notifications
misc: eeprom: at24: Always append device id even if label property is set.
nds32: fix up stack guard gap
driver core: Prevent warning when removing a device link from unregistered consumer
drm: Return -ENOTTY for non-drm ioctls
drm/amdgpu: update golden setting for sienna_cichlid
net: dsa: mv88e6xxx: enable SerDes RX stats for Topaz
net: dsa: mv88e6xxx: enable SerDes PCS register dump via ethtool -d on Topaz
PCI: Mark AMD Navi14 GPU ATS as broken
bonding: fix build issue
skbuff: Release nfct refcount on napi stolen or re-used skbs
Documentation: Fix intiramfs script name
perf inject: Close inject.output on exit
usb: ehci: Prevent missed ehci interrupts with edge-triggered MSI
drm/i915/gvt: Clear d3_entered on elsp cmd submission.
sfc: ensure correct number of XDP queues
xhci: add xhci_get_virt_ep() helper
skbuff: Fix build with SKB extensions disabled
Linux 5.10.54
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ifd2823b47ab1544cd1f168b138624ffe060a471e
[ Upstream commit 6370cc3bbd8a0f9bf975b013781243ab147876c6 ]
Remote KCOV coverage collection enables coverage-guided fuzzing of the
code that is not reachable during normal system call execution. It is
especially helpful for fuzzing networking subsystems, where it is
common to perform packet handling in separate work queues even for the
packets that originated directly from the user space.
Enable coverage-guided frame injection by adding kcov remote handle to
skb extensions. Default initialization in __alloc_skb and
__build_skb_around ensures that no socket buffer that was generated
during a system call will be missed.
Code that is of interest and that performs packet processing should be
annotated with kcov_remote_start()/kcov_remote_stop().
An alternative approach is to determine kcov_handle solely on the
basis of the device/interface that received the specific socket
buffer. However, in this case it would be impossible to distinguish
between packets that originated during normal background network
processes or were intentionally injected from the user space.
Signed-off-by: Aleksandr Nogikh <nogikh@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Some vendors want to add things to 'struct skb_shared_info', so give
them an array to place their data.
Bug: 171013716
Signed-off-by: Vignesh Saravanaperumal <vignesh1.s@samsung.com>
Change-Id: Ia0024e3e8de89f4ef335fa26208ec6c45abafb22
Try to mitigate potential future driver core api changes by adding a
padding to a lot of different networking structures:
struct ipv6_devconf
struct proto_ops
struct header_ops
struct napi_struct
struct netdev_queue
struct netdev_rx_queue
struct xfrmdev_ops
struct net_device_ops
struct net_device
struct packet_type
struct sk_buff
struct tlsdev_ops
Based on a change made to the RHEL/CENTOS 8 kernel.
Bug: 151154716
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I590f004754dbc8beafa40e71cac70a0938c38b4a
Pull networking updates from Jakub Kicinski:
- Add redirect_neigh() BPF packet redirect helper, allowing to limit
stack traversal in common container configs and improving TCP
back-pressure.
Daniel reports ~10Gbps => ~15Gbps single stream TCP performance gain.
- Expand netlink policy support and improve policy export to user
space. (Ge)netlink core performs request validation according to
declared policies. Expand the expressiveness of those policies
(min/max length and bitmasks). Allow dumping policies for particular
commands. This is used for feature discovery by user space (instead
of kernel version parsing or trial and error).
- Support IGMPv3/MLDv2 multicast listener discovery protocols in
bridge.
- Allow more than 255 IPv4 multicast interfaces.
- Add support for Type of Service (ToS) reflection in SYN/SYN-ACK
packets of TCPv6.
- In Multi-patch TCP (MPTCP) support concurrent transmission of data on
multiple subflows in a load balancing scenario. Enhance advertising
addresses via the RM_ADDR/ADD_ADDR options.
- Support SMC-Dv2 version of SMC, which enables multi-subnet
deployments.
- Allow more calls to same peer in RxRPC.
- Support two new Controller Area Network (CAN) protocols - CAN-FD and
ISO 15765-2:2016.
- Add xfrm/IPsec compat layer, solving the 32bit user space on 64bit
kernel problem.
- Add TC actions for implementing MPLS L2 VPNs.
- Improve nexthop code - e.g. handle various corner cases when nexthop
objects are removed from groups better, skip unnecessary
notifications and make it easier to offload nexthops into HW by
converting to a blocking notifier.
- Support adding and consuming TCP header options by BPF programs,
opening the doors for easy experimental and deployment-specific TCP
option use.
- Reorganize TCP congestion control (CC) initialization to simplify
life of TCP CC implemented in BPF.
- Add support for shipping BPF programs with the kernel and loading
them early on boot via the User Mode Driver mechanism, hence reusing
all the user space infra we have.
- Support sleepable BPF programs, initially targeting LSM and tracing.
- Add bpf_d_path() helper for returning full path for given 'struct
path'.
- Make bpf_tail_call compatible with bpf-to-bpf calls.
- Allow BPF programs to call map_update_elem on sockmaps.
- Add BPF Type Format (BTF) support for type and enum discovery, as
well as support for using BTF within the kernel itself (current use
is for pretty printing structures).
- Support listing and getting information about bpf_links via the bpf
syscall.
- Enhance kernel interfaces around NIC firmware update. Allow
specifying overwrite mask to control if settings etc. are reset
during update; report expected max time operation may take to users;
support firmware activation without machine reboot incl. limits of
how much impact reset may have (e.g. dropping link or not).
- Extend ethtool configuration interface to report IEEE-standard
counters, to limit the need for per-vendor logic in user space.
- Adopt or extend devlink use for debug, monitoring, fw update in many
drivers (dsa loop, ice, ionic, sja1105, qed, mlxsw, mv88e6xxx,
dpaa2-eth).
- In mlxsw expose critical and emergency SFP module temperature alarms.
Refactor port buffer handling to make the defaults more suitable and
support setting these values explicitly via the DCBNL interface.
- Add XDP support for Intel's igb driver.
- Support offloading TC flower classification and filtering rules to
mscc_ocelot switches.
- Add PTP support for Marvell Octeontx2 and PP2.2 hardware, as well as
fixed interval period pulse generator and one-step timestamping in
dpaa-eth.
- Add support for various auth offloads in WiFi APs, e.g. SAE (WPA3)
offload.
- Add Lynx PHY/PCS MDIO module, and convert various drivers which have
this HW to use it. Convert mvpp2 to split PCS.
- Support Marvell Prestera 98DX3255 24-port switch ASICs, as well as
7-port Mediatek MT7531 IP.
- Add initial support for QCA6390 and IPQ6018 in ath11k WiFi driver,
and wcn3680 support in wcn36xx.
- Improve performance for packets which don't require much offloads on
recent Mellanox NICs by 20% by making multiple packets share a
descriptor entry.
- Move chelsio inline crypto drivers (for TLS and IPsec) from the
crypto subtree to drivers/net. Move MDIO drivers out of the phy
directory.
- Clean up a lot of W=1 warnings, reportedly the actively developed
subsections of networking drivers should now build W=1 warning free.
- Make sure drivers don't use in_interrupt() to dynamically adapt their
code. Convert tasklets to use new tasklet_setup API (sadly this
conversion is not yet complete).
* tag 'net-next-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2583 commits)
Revert "bpfilter: Fix build error with CONFIG_BPFILTER_UMH"
net, sockmap: Don't call bpf_prog_put() on NULL pointer
bpf, selftest: Fix flaky tcp_hdr_options test when adding addr to lo
bpf, sockmap: Add locking annotations to iterator
netfilter: nftables: allow re-computing sctp CRC-32C in 'payload' statements
net: fix pos incrementment in ipv6_route_seq_next
net/smc: fix invalid return code in smcd_new_buf_create()
net/smc: fix valid DMBE buffer sizes
net/smc: fix use-after-free of delayed events
bpfilter: Fix build error with CONFIG_BPFILTER_UMH
cxgb4/ch_ipsec: Replace the module name to ch_ipsec from chcr
net: sched: Fix suspicious RCU usage while accessing tcf_tunnel_info
bpf: Fix register equivalence tracking.
rxrpc: Fix loss of final ack on shutdown
rxrpc: Fix bundle counting for exclusive connections
netfilter: restore NF_INET_NUMHOOKS
ibmveth: Identify ingress large send packets.
ibmveth: Switch order of ibmveth_helper calls.
cxgb4: handle 4-tuple PEDIT to NAT mode translation
selftests: Add VRF route leaking tests
...
Pull copy_and_csum cleanups from Al Viro:
"Saner calling conventions for csum_and_copy_..._user() and friends"
[ Removing 800+ lines of code and cleaning stuff up is good - Linus ]
* 'work.csum_and_copy' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
ppc: propagate the calling conventions change down to csum_partial_copy_generic()
amd64: switch csum_partial_copy_generic() to new calling conventions
sparc64: propagate the calling convention changes down to __csum_partial_copy_...()
xtensa: propagate the calling conventions change down into csum_partial_copy_generic()
mips: propagate the calling convention change down into __csum_partial_copy_..._user()
mips: __csum_partial_copy_kernel() has no users left
mips: csum_and_copy_{to,from}_user() are never called under KERNEL_DS
sparc32: propagate the calling conventions change down to __csum_partial_copy_sparc_generic()
i386: propagate the calling conventions change down to csum_partial_copy_generic()
sh: propage the calling conventions change down to csum_partial_copy_generic()
m68k: get rid of zeroing destination on error in csum_and_copy_from_user()
arm: propagate the calling convention changes down to csum_partial_copy_from_user()
alpha: propagate the calling convention changes down to csum_partial_copy.c helpers
saner calling conventions for csum_and_copy_..._user()
csum_and_copy_..._user(): pass 0xffffffff instead of 0 as initial sum
csum_partial_copy_nocheck(): drop the last argument
unify generic instances of csum_partial_copy_nocheck()
icmp_push_reply(): reorder adding the checksum up
skb_copy_and_csum_bits(): don't bother with the last argument
Implement TCA_VLAN_ACT_POP_ETH and TCA_VLAN_ACT_PUSH_ETH, to
respectively pop and push a base Ethernet header at the beginning of a
frame.
POP_ETH is just a matter of pulling ETH_HLEN bytes. VLAN tags, if any,
must be stripped before calling POP_ETH.
PUSH_ETH is restricted to skbs with no mac_header, and only the MAC
addresses can be configured. The Ethertype is automatically set from
skb->protocol. These restrictions ensure that all skb's fields remain
consistent, so that this action can't confuse other part of the
networking stack (like GSO).
Since openvswitch already had these actions, consolidate the code in
skbuff.c (like for vlan and mpls push/pop).
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a redirect_neigh() helper as redirect() drop-in replacement
for the xmit side. Main idea for the helper is to be very similar
in semantics to the latter just that the skb gets injected into
the neighboring subsystem in order to let the stack do the work
it knows best anyway to populate the L2 addresses of the packet
and then hand over to dev_queue_xmit() as redirect() does.
This solves two bigger items: i) skbs don't need to go up to the
stack on the host facing veth ingress side for traffic egressing
the container to achieve the same for populating L2 which also
has the huge advantage that ii) the skb->sk won't get orphaned in
ip_rcv_core() when entering the IP routing layer on the host stack.
Given that skb->sk neither gets orphaned when crossing the netns
as per 9c4c325252 ("skbuff: preserve sock reference when scrubbing
the skb.") the helper can then push the skbs directly to the phys
device where FQ scheduler can do its work and TCP stack gets proper
backpressure given we hold on to skb->sk as long as skb is still
residing in queues.
With the helper used in BPF data path to then push the skb to the
phys device, I observed a stable/consistent TCP_STREAM improvement
on veth devices for traffic going container -> host -> host ->
container from ~10Gbps to ~15Gbps for a single stream in my test
environment.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: David Ahern <dsahern@gmail.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Cc: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/bpf/f207de81629e1724899b73b8112e0013be782d35.1601477936.git.daniel@iogearbox.net
skb_put_padto() and __skb_put_padto() callers
must check return values or risk use-after-free.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking fixes from David Miller:
1) Use netif_rx_ni() when necessary in batman-adv stack, from Jussi
Kivilinna.
2) Fix loss of RTT samples in rxrpc, from David Howells.
3) Memory leak in hns_nic_dev_probe(), from Dignhao Liu.
4) ravb module cannot be unloaded, fix from Yuusuke Ashizuka.
5) We disable BH for too lokng in sctp_get_port_local(), add a
cond_resched() here as well, from Xin Long.
6) Fix memory leak in st95hf_in_send_cmd, from Dinghao Liu.
7) Out of bound access in bpf_raw_tp_link_fill_link_info(), from
Yonghong Song.
8) Missing of_node_put() in mt7530 DSA driver, from Sumera
Priyadarsini.
9) Fix crash in bnxt_fw_reset_task(), from Michael Chan.
10) Fix geneve tunnel checksumming bug in hns3, from Yi Li.
11) Memory leak in rxkad_verify_response, from Dinghao Liu.
12) In tipc, don't use smp_processor_id() in preemptible context. From
Tuong Lien.
13) Fix signedness issue in mlx4 memory allocation, from Shung-Hsi Yu.
14) Missing clk_disable_prepare() in gemini driver, from Dan Carpenter.
15) Fix ABI mismatch between driver and firmware in nfp, from Louis
Peens.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (110 commits)
net/smc: fix sock refcounting in case of termination
net/smc: reset sndbuf_desc if freed
net/smc: set rx_off for SMCR explicitly
net/smc: fix toleration of fake add_link messages
tg3: Fix soft lockup when tg3_reset_task() fails.
doc: net: dsa: Fix typo in config code sample
net: dp83867: Fix WoL SecureOn password
nfp: flower: fix ABI mismatch between driver and firmware
tipc: fix shutdown() of connectionless socket
ipv6: Fix sysctl max for fib_multipath_hash_policy
drivers/net/wan/hdlc: Change the default of hard_header_len to 0
net: gemini: Fix another missing clk_disable_unprepare() in probe
net: bcmgenet: fix mask check in bcmgenet_validate_flow()
amd-xgbe: Add support for new port mode
net: usb: dm9601: Add USB ID of Keenetic Plus DSL
vhost: fix typo in error message
net: ethernet: mlx4: Fix memory allocation in mlx4_buddy_init()
pktgen: fix error message with wrong function name
net: ethernet: ti: am65-cpsw: fix rmii 100Mbit link mode
cxgb4: fix thermal zone device registration
...
Fix some comments, including wrong function name, duplicated word and so
on.
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The function consume_skb is only meaningful when tracing is enabled.
This patch makes it conditional on CONFIG_TRACEPOINTS.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull misc vfs updates from Al Viro:
"No common topic whatsoever in those, sorry"
* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fs: define inode flags using bit numbers
iov_iter: Move unnecessary inclusion of crypto/hash.h
dlmfs: clean up dlmfs_file_{read,write}() a bit
When openvswitch conntrack offload with act_ct action. Fragment packets
defrag in the ingress tc act_ct action and miss the next chain. Then the
packet pass to the openvswitch datapath without the mru. The over
mtu packet will be dropped in output action in openvswitch for over mtu.
"kernel: net2: dropped over-mtu packet: 1528 > 1500"
This patch add mru in the tc_skb_ext for adefrag and miss next chain
situation. And also add mru in the qdisc_skb_cb. The act_ct set the mru
to the qdisc_skb_cb when the packet defrag. And When the chain miss,
The mru is set to tc_skb_ext which can be got by ovs datapath.
Fixes: b57dc7c13e ("net/sched: Introduce action ct")
Signed-off-by: wenxu <wenxu@ucloud.cn>
Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Retreive a hash value from the SKB and store it
in the dissector key for future matching.
Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The header file linux/uio.h includes crypto/hash.h which pulls in
most of the Crypto API. Since linux/uio.h is used throughout the
kernel this means that every tiny bit of change to the Crypto API
causes the entire kernel to get rebuilt.
This patch fixes this by moving it into lib/iov_iter.c instead
where it is actually used.
This patch also fixes the ifdef to use CRYPTO_HASH instead of just
CRYPTO which does not guarantee the existence of ahash.
Unfortunately a number of drivers were relying on linux/uio.h to
provide access to linux/slab.h. This patch adds inclusions of
linux/slab.h as detected by build failures.
Also skbuff.h was relying on this to provide a declaration for
ahash_request. This patch adds a forward declaration instead.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Lorenz recently reported:
In our TC classifier cls_redirect [0], we use the following sequence of
helper calls to decapsulate a GUE (basically IP + UDP + custom header)
encapsulated packet:
bpf_skb_adjust_room(skb, -encap_len, BPF_ADJ_ROOM_MAC, BPF_F_ADJ_ROOM_FIXED_GSO)
bpf_redirect(skb->ifindex, BPF_F_INGRESS)
It seems like some checksums of the inner headers are not validated in
this case. For example, a TCP SYN packet with invalid TCP checksum is
still accepted by the network stack and elicits a SYN ACK. [...]
That is, we receive the following packet from the driver:
| ETH | IP | UDP | GUE | IP | TCP |
skb->ip_summed == CHECKSUM_UNNECESSARY
ip_summed is CHECKSUM_UNNECESSARY because our NICs do rx checksum offloading.
On this packet we run skb_adjust_room_mac(-encap_len), and get the following:
| ETH | IP | TCP |
skb->ip_summed == CHECKSUM_UNNECESSARY
Note that ip_summed is still CHECKSUM_UNNECESSARY. After bpf_redirect()'ing
into the ingress, we end up in tcp_v4_rcv(). There, skb_checksum_init() is
turned into a no-op due to CHECKSUM_UNNECESSARY.
The bpf_skb_adjust_room() helper is not aware of protocol specifics. Internally,
it handles the CHECKSUM_COMPLETE case via skb_postpull_rcsum(), but that does
not cover CHECKSUM_UNNECESSARY. In this case skb->csum_level of the original
skb prior to bpf_skb_adjust_room() call was 0, that is, covering UDP. Right now
there is no way to adjust the skb->csum_level. NICs that have checksum offload
disabled (CHECKSUM_NONE) or that support CHECKSUM_COMPLETE are not affected.
Use a safe default for CHECKSUM_UNNECESSARY by resetting to CHECKSUM_NONE and
add a flag to the helper called BPF_F_ADJ_ROOM_NO_CSUM_RESET that allows users
from opting out. Opting out is useful for the case where we don't remove/add
full protocol headers, or for the case where a user wants to adjust the csum
level manually e.g. through bpf_csum_level() helper that is added in subsequent
patch.
The bpf_skb_proto_{4_to_6,6_to_4}() for NAT64/46 translation from the BPF
bpf_skb_change_proto() helper uses bpf_skb_net_hdr_{push,pop}() pair internally
as well but doesn't change layers, only transitions between v4 to v6 and vice
versa, therefore no adoption is required there.
[0] https://lore.kernel.org/bpf/20200424185556.7358-1-lmb@cloudflare.com/
Fixes: 2be7e212d5 ("bpf: add bpf_skb_adjust_room helper")
Reported-by: Lorenz Bauer <lmb@cloudflare.com>
Reported-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Link: https://lore.kernel.org/bpf/CACAyw9-uU_52esMd1JjuA80fRPHJv5vsSg8GnfW3t_qDU4aVKQ@mail.gmail.com/
Link: https://lore.kernel.org/bpf/11a90472e7cce83e76ddbfce81fdfce7bfc68808.1591108731.git.daniel@iogearbox.net
In order to:
(1) attach more than one BPF program type to netns, or
(2) support attaching BPF programs to netns with bpf_link, or
(3) support multi-prog attach points for netns
we will need to keep more state per netns than a single pointer like we
have now for BPF flow dissector program.
Prepare for the above by extracting netns_bpf that is part of struct net,
for storing all state related to BPF programs attached to netns.
Turn flow dissector callbacks for querying/attaching/detaching a program
into generic ones that operate on netns_bpf. Next patch will move the
generic callbacks into their own module.
This is similar to how it is organized for cgroup with cgroup_bpf.
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Cc: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20200531082846.2117903-3-jakub@cloudflare.com
mptcp calls this from the transmit side, from process context.
Allow a sleeping allocation instead of unconditional GFP_ATOMIC.
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:
struct foo {
int stuff;
struct boo array[];
};
By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.
Also, notice that, dynamic memory allocations won't be affected by
this change:
"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]
This issue was found with the help of Coccinelle.
[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 7649773293 ("cxgb3/l2t: Fix undefined behaviour")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Fixed the punctuation and some typos.
Improved some sentences with minor changes.
No change of semantics or code.
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The SKB_SGO_CB_OFFSET should be SKB_GSO_CB_OFFSET which means the
offset of the GSO in skb cb. This patch fixes the typo.
Fixes: 9207f9d45b ("net: preserve IP control block during GSO segmentation")
Signed-off-by: Cambda Zhu <cambda@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Overlapping header include additions in macsec.c
A bug fix in 'net' overlapping with the removal of 'version'
string in ena_netdev.c
Overlapping test additions in selftests Makefile
Overlapping PCI ID table adjustments in iwlwifi driver.
Signed-off-by: David S. Miller <davem@davemloft.net>
net/netfilter/nft_fwd_netdev.c: In function ‘nft_fwd_netdev_eval’:
net/netfilter/nft_fwd_netdev.c:32:10: error: ‘struct sk_buff’ has no member named ‘tc_redirected’
pkt->skb->tc_redirected = 1;
^~
net/netfilter/nft_fwd_netdev.c:33:10: error: ‘struct sk_buff’ has no member named ‘tc_from_ingress’
pkt->skb->tc_from_ingress = 1;
^~
To avoid a direct dependency with tc actions from netfilter, wrap the
redirect bits around CONFIG_NET_REDIRECT and move helpers to
include/linux/skbuff.h. Turn on this toggle from the ifb driver, the
only existing client of these bits in the tree.
This patch adds skb_set_redirected() that sets on the redirected bit
on the skbuff, it specifies if the packet was redirect from ingress
and resets the timestamp (timestamp reset was originally missing in the
netfilter bugfix).
Fixes: bcfabee1af ("netfilter: nft_fwd_netdev: allow to redirect to ifb via ingress")
Reported-by: noreply@ellerman.id.au
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The only users for such argument are the UDP protocol and the UNIX
socket family. We can safely reclaim the accounted memory directly
from the UDP code and, after the previous patch, we can do scm
stats accounting outside the datagram helpers.
Overall this cleans up a bit some datagram-related helpers, and
avoids an indirect call per packet in the UDP receive path.
v1 -> v2:
- call scm_stat_del() only when not peeking - Kirill
- fix build issue with CONFIG_INET_ESPINTCP
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix all kernel-doc warnings in <linux/skbuff.h>.
Fixes these warnings:
../include/linux/skbuff.h:890: warning: Function parameter or member 'list' not described in 'sk_buff'
../include/linux/skbuff.h:890: warning: Function parameter or member 'dev_scratch' not described in 'sk_buff'
../include/linux/skbuff.h:890: warning: Function parameter or member 'ip_defrag_offset' not described in 'sk_buff'
../include/linux/skbuff.h:890: warning: Function parameter or member 'skb_mstamp_ns' not described in 'sk_buff'
../include/linux/skbuff.h:890: warning: Function parameter or member '__cloned_offset' not described in 'sk_buff'
../include/linux/skbuff.h:890: warning: Function parameter or member 'head_frag' not described in 'sk_buff'
../include/linux/skbuff.h:890: warning: Function parameter or member '__pkt_type_offset' not described in 'sk_buff'
../include/linux/skbuff.h:890: warning: Function parameter or member 'encapsulation' not described in 'sk_buff'
../include/linux/skbuff.h:890: warning: Function parameter or member 'encap_hdr_csum' not described in 'sk_buff'
../include/linux/skbuff.h:890: warning: Function parameter or member 'csum_valid' not described in 'sk_buff'
../include/linux/skbuff.h:890: warning: Function parameter or member '__pkt_vlan_present_offset' not described in 'sk_buff'
../include/linux/skbuff.h:890: warning: Function parameter or member 'vlan_present' not described in 'sk_buff'
../include/linux/skbuff.h:890: warning: Function parameter or member 'csum_complete_sw' not described in 'sk_buff'
../include/linux/skbuff.h:890: warning: Function parameter or member 'csum_level' not described in 'sk_buff'
../include/linux/skbuff.h:890: warning: Function parameter or member 'inner_protocol_type' not described in 'sk_buff'
../include/linux/skbuff.h:890: warning: Function parameter or member 'remcsum_offload' not described in 'sk_buff'
../include/linux/skbuff.h:890: warning: Function parameter or member 'sender_cpu' not described in 'sk_buff'
../include/linux/skbuff.h:890: warning: Function parameter or member 'reserved_tailroom' not described in 'sk_buff'
../include/linux/skbuff.h:890: warning: Function parameter or member 'inner_ipproto' not described in 'sk_buff'
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
sk_buff.qlen can be accessed concurrently as noticed by KCSAN,
BUG: KCSAN: data-race in __skb_try_recv_from_queue / unix_dgram_sendmsg
read to 0xffff8a1b1d8a81c0 of 4 bytes by task 5371 on cpu 96:
unix_dgram_sendmsg+0x9a9/0xb70 include/linux/skbuff.h:1821
net/unix/af_unix.c:1761
____sys_sendmsg+0x33e/0x370
___sys_sendmsg+0xa6/0xf0
__sys_sendmsg+0x69/0xf0
__x64_sys_sendmsg+0x51/0x70
do_syscall_64+0x91/0xb47
entry_SYSCALL_64_after_hwframe+0x49/0xbe
write to 0xffff8a1b1d8a81c0 of 4 bytes by task 1 on cpu 99:
__skb_try_recv_from_queue+0x327/0x410 include/linux/skbuff.h:2029
__skb_try_recv_datagram+0xbe/0x220
unix_dgram_recvmsg+0xee/0x850
____sys_recvmsg+0x1fb/0x210
___sys_recvmsg+0xa2/0xf0
__sys_recvmsg+0x66/0xf0
__x64_sys_recvmsg+0x51/0x70
do_syscall_64+0x91/0xb47
entry_SYSCALL_64_after_hwframe+0x49/0xbe
Since only the read is operating as lockless, it could introduce a logic
bug in unix_recvq_full() due to the load tearing. Fix it by adding
a lockless variant of skb_queue_len() and unix_recvq_full() where
READ_ONCE() is on the read while WRITE_ONCE() is on the write similar to
the commit d7d16a8935 ("net: add skb_queue_empty_lockless()").
Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the core functions to chain/unchain
GSO skbs at the frag_list pointer. This also adds
a new GSO type SKB_GSO_FRAGLIST and a is_flist
flag to napi_gro_cb which indicates that this
flow will be GROed by fraglist chaining.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds new Fraglist GRO/GSO feature flags. They will be used
to configure fraglist GRO/GSO what will be implemented with some
followup paches.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Steffen Klassert says:
====================
pull request (net-next): ipsec-next 2020-01-21
1) Add support for TCP encapsulation of IKE and ESP messages,
as defined by RFC 8229. Patchset from Sabrina Dubroca.
Please note that there is a merge conflict in:
net/unix/af_unix.c
between commit:
3c32da19a8 ("unix: Show number of pending scm files of receive queue in fdinfo")
from the net-next tree and commit:
b50b0580d2 ("net: add queue argument to __skb_wait_for_more_packets and __skb_{,try_}recv_datagram")
from the ipsec-next tree.
The conflict can be solved as done in linux-next.
Please pull or let me know if there are problems.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This worked before, because we made all callers name their next pointer
"next". But in trying to be more "drop-in" ready, the silliness here is
revealed. This commit fixes the problem by making the macro argument and
the member use different names.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently we can allocate the extension only after the skb,
this change allows the user to do the opposite, will simplify
allocation failure handling from MPTCP.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As part of the continual effort to remove direct usage of skb->next and
skb->prev, this patch adds a helper for iterating through the
singly-linked variant of skb lists, which are used for lists of GSO
packet. The name "skb_list_..." has been chosen to match the existing
function, "kfree_skb_list, which also operates on these singly-linked
lists, and the "..._walk_safe" part is the same idiom as elsewhere in
the kernel.
This patch removes the helper from wireguard and puts it into
linux/skbuff.h, while making it a bit more robust for general usage. In
particular, parenthesis are added around the macro argument usage, and it
now accounts for trying to iterate through an already-null skb pointer,
which will simply run the iteration zero times. This latter enhancement
means it can be used to replace both do { ... } while and while (...)
open-coded idioms.
This should take care of these three possible usages, which match all
current methods of iterations.
skb_list_walk_safe(segs, skb, next) { ... }
skb_list_walk_safe(skb, skb, next) { ... }
skb_list_walk_safe(segs, skb, segs) { ... }
Gcc appears to generate efficient code for each of these.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The skb_mpls_push was not updating ethertype of an ethernet packet if
the packet was originally received from a non ARPHRD_ETHER device.
In the below OVS data path flow, since the device corresponding to
port 7 is an l3 device (ARPHRD_NONE) the skb_mpls_push function does
not update the ethertype of the packet even though the previous
push_eth action had added an ethernet header to the packet.
recirc_id(0),in_port(7),eth_type(0x0800),ipv4(tos=0/0xfc,ttl=64,frag=no),
actions:push_eth(src=00:00:00:00:00:00,dst=00:00:00:00:00:00),
push_mpls(label=13,tc=0,ttl=64,bos=1,eth_type=0x8847),4
Fixes: 8822e270d6 ("net: core: move push MPLS functionality from OvS to core helper")
Signed-off-by: Martin Varghese <martin.varghese@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The skb_mpls_pop was not updating ethertype of an ethernet packet if the
packet was originally received from a non ARPHRD_ETHER device.
In the below OVS data path flow, since the device corresponding to port 7
is an l3 device (ARPHRD_NONE) the skb_mpls_pop function does not update
the ethertype of the packet even though the previous push_eth action had
added an ethernet header to the packet.
recirc_id(0),in_port(7),eth_type(0x8847),
mpls(label=12/0xfffff,tc=0/0,ttl=0/0x0,bos=1/1),
actions:push_eth(src=00:00:00:00:00:00,dst=00:00:00:00:00:00),
pop_mpls(eth_type=0x800),4
Fixes: ed246cee09 ("net: core: move pop MPLS functionality from OvS to core helper")
Signed-off-by: Martin Varghese <martin.varghese@nokia.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull y2038 cleanups from Arnd Bergmann:
"y2038 syscall implementation cleanups
This is a series of cleanups for the y2038 work, mostly intended for
namespace cleaning: the kernel defines the traditional time_t, timeval
and timespec types that often lead to y2038-unsafe code. Even though
the unsafe usage is mostly gone from the kernel, having the types and
associated functions around means that we can still grow new users,
and that we may be missing conversions to safe types that actually
matter.
There are still a number of driver specific patches needed to get the
last users of these types removed, those have been submitted to the
respective maintainers"
Link: https://lore.kernel.org/lkml/20191108210236.1296047-1-arnd@arndb.de/
* tag 'y2038-cleanups-5.5' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground: (26 commits)
y2038: alarm: fix half-second cut-off
y2038: ipc: fix x32 ABI breakage
y2038: fix typo in powerpc vdso "LOPART"
y2038: allow disabling time32 system calls
y2038: itimer: change implementation to timespec64
y2038: move itimer reset into itimer.c
y2038: use compat_{get,set}_itimer on alpha
y2038: itimer: compat handling to itimer.c
y2038: time: avoid timespec usage in settimeofday()
y2038: timerfd: Use timespec64 internally
y2038: elfcore: Use __kernel_old_timeval for process times
y2038: make ns_to_compat_timeval use __kernel_old_timeval
y2038: socket: use __kernel_old_timespec instead of timespec
y2038: socket: remove timespec reference in timestamping
y2038: syscalls: change remaining timeval to __kernel_old_timeval
y2038: rusage: use __kernel_old_timeval
y2038: uapi: change __kernel_time_t to __kernel_old_time_t
y2038: stat: avoid 'time_t' in 'struct stat'
y2038: ipc: remove __kernel_time_t reference from headers
y2038: vdso: powerpc: avoid timespec references
...
Minor conflict in drivers/s390/net/qeth_l2_main.c, kept the lock
from commit c8183f5489 ("s390/qeth: fix potential deadlock on
workqueue flush"), removed the code which was removed by commit
9897d583b0 ("s390/qeth: consolidate some duplicated HW cmd code").
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Once udp stack has set the UDP_SKB_IS_STATELESS flag, later skb free
assumes all skb head state has been dropped already.
This will leak the extension memory in case the skb has extensions other
than the ipsec secpath, e.g. bridge nf data.
To fix this, set the UDP_SKB_IS_STATELESS flag only if we don't have
extensions or if the extension space can be free'd.
Fixes: 895b5c9f20 ("netfilter: drop bridge nf reset from nf_reset")
Cc: Paolo Abeni <pabeni@redhat.com>
Reported-by: Byron Stanoszek <gandalf@winds.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The 'timespec' type definition and helpers like ktime_to_timespec()
or timespec64_to_timespec() should no longer be used in the kernel so
we can remove them and avoid introducing y2038 issues in new code.
Change the socket code that needs to pass a timespec to user space for
backward compatibility to use __kernel_old_timespec instead. This type
has the same layout but with a clearer defined name.
Slightly reformat tcp_recv_timestamp() for consistency after the removal
of timespec64_to_timespec().
Acked-by: Deepa Dinamani <deepa.kernel@gmail.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
skb_peek_tail() can be used without protection of a lock,
as spotted by KCSAN [1]
In order to avoid load-stearing, add a READ_ONCE()
Note that the corresponding WRITE_ONCE() are already there.
[1]
BUG: KCSAN: data-race in sk_wait_data / skb_queue_tail
read to 0xffff8880b36a4118 of 8 bytes by task 20426 on cpu 1:
skb_peek_tail include/linux/skbuff.h:1784 [inline]
sk_wait_data+0x15b/0x250 net/core/sock.c:2477
kcm_wait_data+0x112/0x1f0 net/kcm/kcmsock.c:1103
kcm_recvmsg+0xac/0x320 net/kcm/kcmsock.c:1130
sock_recvmsg_nosec net/socket.c:871 [inline]
sock_recvmsg net/socket.c:889 [inline]
sock_recvmsg+0x92/0xb0 net/socket.c:885
___sys_recvmsg+0x1a0/0x3e0 net/socket.c:2480
do_recvmmsg+0x19a/0x5c0 net/socket.c:2601
__sys_recvmmsg+0x1ef/0x200 net/socket.c:2680
__do_sys_recvmmsg net/socket.c:2703 [inline]
__se_sys_recvmmsg net/socket.c:2696 [inline]
__x64_sys_recvmmsg+0x89/0xb0 net/socket.c:2696
do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x44/0xa9
write to 0xffff8880b36a4118 of 8 bytes by task 451 on cpu 0:
__skb_insert include/linux/skbuff.h:1852 [inline]
__skb_queue_before include/linux/skbuff.h:1958 [inline]
__skb_queue_tail include/linux/skbuff.h:1991 [inline]
skb_queue_tail+0x7e/0xc0 net/core/skbuff.c:3145
kcm_queue_rcv_skb+0x202/0x310 net/kcm/kcmsock.c:206
kcm_rcv_strparser+0x74/0x4b0 net/kcm/kcmsock.c:370
__strp_recv+0x348/0xf50 net/strparser/strparser.c:309
strp_recv+0x84/0xa0 net/strparser/strparser.c:343
tcp_read_sock+0x174/0x5c0 net/ipv4/tcp.c:1639
strp_read_sock+0xd4/0x140 net/strparser/strparser.c:366
do_strp_work net/strparser/strparser.c:414 [inline]
strp_work+0x9a/0xe0 net/strparser/strparser.c:423
process_one_work+0x3d4/0x890 kernel/workqueue.c:2269
worker_thread+0xa0/0x800 kernel/workqueue.c:2415
kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352
Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 451 Comm: kworker/u4:3 Not tainted 5.4.0-rc3+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: kstrp strp_work
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>