09bd9e940e98acad0c854f75b31fc79a93cde981
213 Commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
![]() |
8b444656fa |
Merge 5.10.56 into android12-5.10-lts
Changes in 5.10.56 selftest: fix build error in tools/testing/selftests/vm/userfaultfd.c io_uring: fix null-ptr-deref in io_sq_offload_start() x86/asm: Ensure asm/proto.h can be included stand-alone pipe: make pipe writes always wake up readers btrfs: fix rw device counting in __btrfs_free_extra_devids btrfs: mark compressed range uptodate only if all bio succeed Revert "ACPI: resources: Add checks for ACPI IRQ override" ACPI: DPTF: Fix reading of attributes x86/kvm: fix vcpu-id indexed array sizes KVM: add missing compat KVM_CLEAR_DIRTY_LOG ocfs2: fix zero out valid data ocfs2: issue zeroout to EOF blocks can: j1939: j1939_xtp_rx_dat_one(): fix rxtimer value between consecutive TP.DT to 750ms can: raw: raw_setsockopt(): fix raw_rcv panic for sock UAF can: peak_usb: pcan_usb_handle_bus_evt(): fix reading rxerr/txerr values can: mcba_usb_start(): add missing urb->transfer_dma initialization can: usb_8dev: fix memory leak can: ems_usb: fix memory leak can: esd_usb2: fix memory leak alpha: register early reserved memory in memblock HID: wacom: Re-enable touch by default for Cintiq 24HDT / 27QHDT NIU: fix incorrect error return, missed in previous revert drm/amd/display: ensure dentist display clock update finished in DCN20 drm/amdgpu: Avoid printing of stack contents on firmware load error drm/amdgpu: Fix resource leak on probe error path blk-iocost: fix operation ordering in iocg_wake_fn() nfc: nfcsim: fix use after free during module unload cfg80211: Fix possible memory leak in function cfg80211_bss_update RDMA/bnxt_re: Fix stats counters bpf: Fix OOB read when printing XDP link fdinfo mac80211: fix enabling 4-address mode on a sta vif after assoc netfilter: conntrack: adjust stop timestamp to real expiry value netfilter: nft_nat: allow to specify layer 4 protocol NAT only i40e: Fix logic of disabling queues i40e: Fix firmware LLDP agent related warning i40e: Fix queue-to-TC mapping on Tx i40e: Fix log TC creation failure when max num of queues is exceeded tipc: fix implicit-connect for SYN+ tipc: fix sleeping in tipc accept routine net: Set true network header for ECN decapsulation net: qrtr: fix memory leaks ionic: remove intr coalesce update from napi ionic: fix up dim accounting for tx and rx ionic: count csum_none when offload enabled tipc: do not write skb_shinfo frags when doing decrytion octeontx2-pf: Fix interface down flag on error mlx4: Fix missing error code in mlx4_load_one() KVM: x86: Check the right feature bit for MSR_KVM_ASYNC_PF_ACK access net: llc: fix skb_over_panic drm/msm/dpu: Fix sm8250_mdp register length drm/msm/dp: Initialize the INTF_CONFIG register skmsg: Make sk_psock_destroy() static net/mlx5: Fix flow table chaining net/mlx5e: Fix nullptr in mlx5e_hairpin_get_mdev() sctp: fix return value check in __sctp_rcv_asconf_lookup tulip: windbond-840: Fix missing pci_disable_device() in probe and remove sis900: Fix missing pci_disable_device() in probe and remove can: hi311x: fix a signedness bug in hi3110_cmd() bpf: Introduce BPF nospec instruction for mitigating Spectre v4 bpf: Fix leakage due to insufficient speculative store bypass mitigation bpf: Remove superfluous aux sanitation on subprog rejection bpf: verifier: Allocate idmap scratch in verifier env bpf: Fix pointer arithmetic mask tightening under state pruning SMB3: fix readpage for large swap cache powerpc/pseries: Fix regression while building external modules Revert "perf map: Fix dso->nsinfo refcounting" i40e: Add additional info to PHY type error can: j1939: j1939_session_deactivate(): clarify lifetime of session object Linux 5.10.56 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: Ib3c9244afb7ee5d6ee8d3235efe8956898f486c4 |
||
![]() |
bea9e2fd18 |
bpf: Introduce BPF nospec instruction for mitigating Spectre v4
[ Upstream commit f5e81d1117501546b7be050c5fbafa6efd2c722c ] In case of JITs, each of the JIT backends compiles the BPF nospec instruction /either/ to a machine instruction which emits a speculation barrier /or/ to /no/ machine instruction in case the underlying architecture is not affected by Speculative Store Bypass or has different mitigations in place already. This covers both x86 and (implicitly) arm64: In case of x86, we use 'lfence' instruction for mitigation. In case of arm64, we rely on the firmware mitigation as controlled via the ssbd kernel parameter. Whenever the mitigation is enabled, it works for all of the kernel code with no need to provide any additional instructions here (hence only comment in arm64 JIT). Other archs can follow as needed. The BPF nospec instruction is specifically targeting Spectre v4 since i) we don't use a serialization barrier for the Spectre v1 case, and ii) mitigation instructions for v1 and v4 might be different on some archs. The BPF nospec is required for a future commit, where the BPF verifier does annotate intermediate BPF programs with speculation barriers. Co-developed-by: Piotr Krysiuk <piotras@gmail.com> Co-developed-by: Benedict Schlueter <benedict.schlueter@rub.de> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Piotr Krysiuk <piotras@gmail.com> Signed-off-by: Benedict Schlueter <benedict.schlueter@rub.de> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
![]() |
67e686fc73 |
Revert "bpf: Track subprog poke descriptors correctly and fix use-after-free"
This reverts commit
|
||
![]() |
afe9ed0e13 |
Merge 5.10.53 into android12-5.10-lts
Changes in 5.10.53 ARM: dts: gemini: rename mdio to the right name ARM: dts: gemini: add device_type on pci ARM: dts: rockchip: Fix thermal sensor cells o rk322x ARM: dts: rockchip: fix pinctrl sleep nodename for rk3036-kylin and rk3288 arm64: dts: rockchip: fix pinctrl sleep nodename for rk3399.dtsi ARM: dts: rockchip: Fix the timer clocks order ARM: dts: rockchip: Fix IOMMU nodes properties on rk322x ARM: dts: rockchip: Fix power-controller node names for rk3066a ARM: dts: rockchip: Fix power-controller node names for rk3188 ARM: dts: rockchip: Fix power-controller node names for rk3288 arm64: dts: rockchip: Fix power-controller node names for px30 arm64: dts: rockchip: Fix power-controller node names for rk3328 arm64: dts: rockchip: Fix power-controller node names for rk3399 reset: ti-syscon: fix to_ti_syscon_reset_data macro ARM: brcmstb: dts: fix NAND nodes names ARM: Cygnus: dts: fix NAND nodes names ARM: NSP: dts: fix NAND nodes names ARM: dts: BCM63xx: Fix NAND nodes names ARM: dts: Hurricane 2: Fix NAND nodes names ARM: dts: imx6: phyFLEX: Fix UART hardware flow control ARM: imx: pm-imx5: Fix references to imx5_cpu_suspend_info arm64: dts: rockchip: fix regulator-gpio states array ARM: dts: ux500: Fix interrupt cells ARM: dts: ux500: Rename gpio-controller node ARM: dts: ux500: Fix orientation of accelerometer ARM: dts: imx6dl-riotboard: configure PHY clock and set proper EEE value rtc: mxc_v2: add missing MODULE_DEVICE_TABLE kbuild: sink stdout from cmd for silent build ARM: dts: am57xx-cl-som-am57x: fix ti,no-reset-on-init flag for gpios ARM: dts: am437x-gp-evm: fix ti,no-reset-on-init flag for gpios ARM: dts: am335x: fix ti,no-reset-on-init flag for gpios ARM: dts: OMAP2+: Replace underscores in sub-mailbox node names arm64: dts: ti: k3-am654x/j721e/j7200-common-proc-board: Fix MCU_RGMII1_TXC direction ARM: tegra: wm8903: Fix polarity of headphones-detection GPIO in device-trees ARM: tegra: nexus7: Correct 3v3 regulator GPIO of PM269 variant arm64: dts: qcom: sc7180: Move rmtfs memory region ARM: dts: stm32: Remove extra size-cells on dhcom-pdk2 ARM: dts: stm32: Fix touchscreen node on dhcom-pdk2 ARM: dts: stm32: fix stm32mp157c-odyssey card detect pin ARM: dts: stm32: fix gpio-keys node on STM32 MCU boards ARM: dts: stm32: fix RCC node name on stm32f429 MCU ARM: dts: stm32: fix timer nodes on STM32 MCU to prevent warnings memory: tegra: Fix compilation warnings on 64bit platforms firmware: arm_scmi: Add SMCCC discovery dependency in Kconfig firmware: arm_scmi: Fix the build when CONFIG_MAILBOX is not selected ARM: dts: bcm283x: Fix up MMC node names ARM: dts: bcm283x: Fix up GPIO LED node names arm64: dts: juno: Update SCPI nodes as per the YAML schema ARM: dts: rockchip: fix supply properties in io-domains nodes ARM: dts: stm32: fix i2c node name on stm32f746 to prevent warnings ARM: dts: stm32: move stmmac axi config in ethernet node on stm32mp15 ARM: dts: stm32: fix the Odyssey SoM eMMC VQMMC supply ARM: dts: stm32: Drop unused linux,wakeup from touchscreen node on DHCOM SoM ARM: dts: stm32: Rename spi-flash/mx66l51235l@N to flash@N on DHCOM SoM ARM: dts: stm32: fix stpmic node for stm32mp1 boards ARM: OMAP2+: Block suspend for am3 and am4 if PM is not configured soc/tegra: fuse: Fix Tegra234-only builds firmware: tegra: bpmp: Fix Tegra234-only builds arm64: dts: ls208xa: remove bus-num from dspi node arm64: dts: imx8mq: assign PCIe clocks thermal/core: Correct function name thermal_zone_device_unregister() thermal/drivers/rcar_gen3_thermal: Do not shadow rcar_gen3_ths_tj_1 thermal/drivers/imx_sc: Add missing of_node_put for loop iteration thermal/drivers/sprd: Add missing of_node_put for loop iteration kbuild: mkcompile_h: consider timestamp if KBUILD_BUILD_TIMESTAMP is set arch/arm64/boot/dts/marvell: fix NAND partitioning scheme rtc: max77686: Do not enforce (incorrect) interrupt trigger type scsi: aic7xxx: Fix unintentional sign extension issue on left shift of u8 scsi: libsas: Add LUN number check in .slave_alloc callback scsi: libfc: Fix array index out of bound exception scsi: qedf: Add check to synchronize abort and flush sched/fair: Fix CFS bandwidth hrtimer expiry type perf/x86/intel/uncore: Clean up error handling path of iio mapping thermal/core/thermal_of: Stop zone device before unregistering it s390/traps: do not test MONITOR CALL without CONFIG_BUG s390: introduce proper type handling call_on_stack() macro cifs: prevent NULL deref in cifs_compose_mount_options() firmware: turris-mox-rwtm: add marvell,armada-3700-rwtm-firmware compatible string arm64: dts: marvell: armada-37xx: move firmware node to generic dtsi file Revert "swap: fix do_swap_page() race with swapoff" f2fs: Show casefolding support only when supported mm/thp: simplify copying of huge zero page pmd when fork mm/userfaultfd: fix uffd-wp special cases for fork() mm/page_alloc: fix memory map initialization for descending nodes usb: cdns3: Enable TDL_CHK only for OUT ep net: bcmgenet: ensure EXT_ENERGY_DET_MASK is clear net: dsa: mv88e6xxx: enable .port_set_policy() on Topaz net: dsa: mv88e6xxx: use correct .stats_set_histogram() on Topaz net: dsa: mv88e6xxx: enable .rmu_disable() on Topaz net: dsa: mv88e6xxx: enable devlink ATU hash param for Topaz net: ipv6: fix return value of ip6_skb_dst_mtu netfilter: ctnetlink: suspicious RCU usage in ctnetlink_dump_helpinfo net/sched: act_ct: fix err check for nf_conntrack_confirm vmxnet3: fix cksum offload issues for tunnels with non-default udp ports net/sched: act_ct: remove and free nf_table callbacks net: bridge: sync fdb to new unicast-filtering ports net: netdevsim: use xso.real_dev instead of xso.dev in callback functions of struct xfrmdev_ops net: bcmgenet: Ensure all TX/RX queues DMAs are disabled net: ip_tunnel: fix mtu calculation for ETHER tunnel devices net: moxa: fix UAF in moxart_mac_probe net: qcom/emac: fix UAF in emac_remove net: ti: fix UAF in tlan_remove_one net: send SYNACK packet with accepted fwmark net: validate lwtstate->data before returning from skb_tunnel_info() Revert "mm/shmem: fix shmem_swapin() race with swapoff" net: dsa: properly check for the bridge_leave methods in dsa_switch_bridge_leave() net: fddi: fix UAF in fza_probe dma-buf/sync_file: Don't leak fences on merge failure kbuild: do not suppress Kconfig prompts for silent build ARM: dts: aspeed: Fix AST2600 machines line names ARM: dts: tacoma: Add phase corrections for eMMC tcp: consistently disable header prediction for mptcp tcp: annotate data races around tp->mtu_info tcp: fix tcp_init_transfer() to not reset icsk_ca_initialized ipv6: tcp: drop silly ICMPv6 packet too big messages tcp: call sk_wmem_schedule before sk_mem_charge in zerocopy path tools: bpf: Fix error in 'make -C tools/ bpf_install' bpftool: Properly close va_list 'ap' by va_end() on error bpf: Track subprog poke descriptors correctly and fix use-after-free perf test bpf: Free obj_buf drm/panel: nt35510: Do not fail if DSI read fails udp: annotate data races around unix_sk(sk)->gso_size Linux 5.10.53 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: Iac8fe9cd2abb2d8dd993967205a97c89f01f3647 |
||
![]() |
a9f36bf361 |
bpf: Track subprog poke descriptors correctly and fix use-after-free
commit f263a81451c12da5a342d90572e317e611846f2c upstream. Subprograms are calling map_poke_track(), but on program release there is no hook to call map_poke_untrack(). However, on program release, the aux memory (and poke descriptor table) is freed even though we still have a reference to it in the element list of the map aux data. When we run map_poke_run(), we then end up accessing free'd memory, triggering KASAN in prog_array_map_poke_run(): [...] [ 402.824689] BUG: KASAN: use-after-free in prog_array_map_poke_run+0xc2/0x34e [ 402.824698] Read of size 4 at addr ffff8881905a7940 by task hubble-fgs/4337 [ 402.824705] CPU: 1 PID: 4337 Comm: hubble-fgs Tainted: G I 5.12.0+ #399 [ 402.824715] Call Trace: [ 402.824719] dump_stack+0x93/0xc2 [ 402.824727] print_address_description.constprop.0+0x1a/0x140 [ 402.824736] ? prog_array_map_poke_run+0xc2/0x34e [ 402.824740] ? prog_array_map_poke_run+0xc2/0x34e [ 402.824744] kasan_report.cold+0x7c/0xd8 [ 402.824752] ? prog_array_map_poke_run+0xc2/0x34e [ 402.824757] prog_array_map_poke_run+0xc2/0x34e [ 402.824765] bpf_fd_array_map_update_elem+0x124/0x1a0 [...] The elements concerned are walked as follows: for (i = 0; i < elem->aux->size_poke_tab; i++) { poke = &elem->aux->poke_tab[i]; [...] The access to size_poke_tab is a 4 byte read, verified by checking offsets in the KASAN dump: [ 402.825004] The buggy address belongs to the object at ffff8881905a7800 which belongs to the cache kmalloc-1k of size 1024 [ 402.825008] The buggy address is located 320 bytes inside of 1024-byte region [ffff8881905a7800, ffff8881905a7c00) The pahole output of bpf_prog_aux: struct bpf_prog_aux { [...] /* --- cacheline 5 boundary (320 bytes) --- */ u32 size_poke_tab; /* 320 4 */ [...] In general, subprograms do not necessarily manage their own data structures. For example, BTF func_info and linfo are just pointers to the main program structure. This allows reference counting and cleanup to be done on the latter which simplifies their management a bit. The aux->poke_tab struct, however, did not follow this logic. The initial proposed fix for this use-after-free bug further embedded poke data tracking into the subprogram with proper reference counting. However, Daniel and Alexei questioned why we were treating these objects special; I agree, its unnecessary. The fix here removes the per subprogram poke table allocation and map tracking and instead simply points the aux->poke_tab pointer at the main programs poke table. This way, map tracking is simplified to the main program and we do not need to manage them per subprogram. This also means, bpf_prog_free_deferred(), which unwinds the program reference counting and kfrees objects, needs to ensure that we don't try to double free the poke_tab when free'ing the subprog structures. This is easily solved by NULL'ing the poke_tab pointer. The second detail is to ensure that per subprogram JIT logic only does fixups on poke_tab[] entries it owns. To do this, we add a pointer in the poke structure to point at the subprogram value so JITs can easily check while walking the poke_tab structure if the current entry belongs to the current program. The aux pointer is stable and therefore suitable for such comparison. On the jit_subprogs() error path, we omit cleaning up the poke->aux field because these are only ever referenced from the JIT side, but on error we will never make it to the JIT, so its fine to leave them dangling. Removing these pointers would complicate the error path for no reason. However, we do need to untrack all poke descriptors from the main program as otherwise they could race with the freeing of JIT memory from the subprograms. Lastly, |
||
![]() |
8db62be3c3 |
Merge 5.10.51 into android12-5.10-lts
Changes in 5.10.51 drm/mxsfb: Don't select DRM_KMS_FB_HELPER drm/zte: Don't select DRM_KMS_FB_HELPER drm/ast: Fixed CVE for DP501 drm/amd/display: fix HDCP reset sequence on reinitialize drm/amd/amdgpu/sriov disable all ip hw status by default drm/vc4: fix argument ordering in vc4_crtc_get_margins() drm/bridge: nwl-dsi: Force a full modeset when crtc_state->active is changed to be true net: pch_gbe: Use proper accessors to BE data in pch_ptp_match() drm/amd/display: fix use_max_lb flag for 420 pixel formats clk: renesas: rcar-usb2-clock-sel: Fix error handling in .probe() hugetlb: clear huge pte during flush function on mips platform atm: iphase: fix possible use-after-free in ia_module_exit() mISDN: fix possible use-after-free in HFC_cleanup() atm: nicstar: Fix possible use-after-free in nicstar_cleanup() net: Treat __napi_schedule_irqoff() as __napi_schedule() on PREEMPT_RT drm/mediatek: Fix PM reference leak in mtk_crtc_ddp_hw_init() net: mdio: ipq8064: add regmap config to disable REGCACHE drm/bridge: lt9611: Add missing MODULE_DEVICE_TABLE reiserfs: add check for invalid 1st journal block drm/virtio: Fix double free on probe failure net: mdio: provide shim implementation of devm_of_mdiobus_register net/sched: cls_api: increase max_reclassify_loop pinctrl: equilibrium: Add missing MODULE_DEVICE_TABLE drm/scheduler: Fix hang when sched_entity released drm/sched: Avoid data corruptions udf: Fix NULL pointer dereference in udf_symlink function drm/vc4: Fix clock source for VEC PixelValve on BCM2711 drm/vc4: hdmi: Fix PM reference leak in vc4_hdmi_encoder_pre_crtc_co() e100: handle eeprom as little endian igb: handle vlan types with checker enabled igb: fix assignment on big endian machines drm/bridge: cdns: Fix PM reference leak in cdns_dsi_transfer() clk: renesas: r8a77995: Add ZA2 clock net/mlx5e: IPsec/rep_tc: Fix rep_tc_update_skb drops IPsec packet net/mlx5: Fix lag port remapping logic drm: rockchip: add missing registers for RK3188 drm: rockchip: add missing registers for RK3066 net: stmmac: the XPCS obscures a potential "PHY not found" error RDMA/rtrs: Change MAX_SESS_QUEUE_DEPTH clk: tegra: Fix refcounting of gate clocks clk: tegra: Ensure that PLLU configuration is applied properly drm: bridge: cdns-mhdp8546: Fix PM reference leak in virtio-net: Add validation for used length ipv6: use prandom_u32() for ID generation MIPS: cpu-probe: Fix FPU detection on Ingenic JZ4760(B) MIPS: ingenic: Select CPU_SUPPORTS_CPUFREQ && MIPS_EXTERNAL_TIMER drm/amd/display: Avoid HDCP over-read and corruption drm/amdgpu: remove unsafe optimization to drop preamble ib net: tcp better handling of reordering then loss cases RDMA/cxgb4: Fix missing error code in create_qp() dm space maps: don't reset space map allocation cursor when committing dm writecache: don't split bios when overwriting contiguous cache content dm: Fix dm_accept_partial_bio() relative to zone management commands net: bridge: mrp: Update ring transitions. pinctrl: mcp23s08: fix race condition in irq handler ice: set the value of global config lock timeout longer ice: fix clang warning regarding deadcode.DeadStores virtio_net: Remove BUG() to avoid machine dead net: mscc: ocelot: check return value after calling platform_get_resource() net: bcmgenet: check return value after calling platform_get_resource() net: mvpp2: check return value after calling platform_get_resource() net: micrel: check return value after calling platform_get_resource() net: moxa: Use devm_platform_get_and_ioremap_resource() drm/amd/display: Fix DCN 3.01 DSCCLK validation drm/amd/display: Update scaling settings on modeset drm/amd/display: Release MST resources on switch from MST to SST drm/amd/display: Set DISPCLK_MAX_ERRDET_CYCLES to 7 drm/amd/display: Fix off-by-one error in DML net: phy: realtek: add delay to fix RXC generation issue selftests: Clean forgotten resources as part of cleanup() net: sgi: ioc3-eth: check return value after calling platform_get_resource() drm/amdkfd: use allowed domain for vmbo validation fjes: check return value after calling platform_get_resource() selinux: use __GFP_NOWARN with GFP_NOWAIT in the AVC r8169: avoid link-up interrupt issue on RTL8106e if user enables ASPM drm/amd/display: Verify Gamma & Degamma LUT sizes in amdgpu_dm_atomic_check xfrm: Fix error reporting in xfrm_state_construct. dm writecache: commit just one block, not a full page wlcore/wl12xx: Fix wl12xx get_mac error if device is in ELP wl1251: Fix possible buffer overflow in wl1251_cmd_scan cw1200: add missing MODULE_DEVICE_TABLE drm/amdkfd: fix circular locking on get_wave_state drm/amdkfd: Fix circular lock in nocpsch path bpf: Fix up register-based shifts in interpreter to silence KUBSAN ice: fix incorrect payload indicator on PTYPE ice: mark PTYPE 2 as reserved mt76: mt7615: fix fixed-rate tx status reporting net: fix mistake path for netdev_features_strings net: ipa: Add missing of_node_put() in ipa_firmware_load() net: sched: fix error return code in tcf_del_walker() io_uring: fix false WARN_ONCE drm/amdgpu: fix bad address translation for sienna_cichlid drm/amdkfd: Walk through list with dqm lock hold mt76: mt7915: fix IEEE80211_HE_PHY_CAP7_MAX_NC for station mode rtl8xxxu: Fix device info for RTL8192EU devices MIPS: add PMD table accounting into MIPS'pmd_alloc_one net: fec: add ndo_select_queue to fix TX bandwidth fluctuations atm: nicstar: use 'dma_free_coherent' instead of 'kfree' atm: nicstar: register the interrupt handler in the right place vsock: notify server to shutdown when client has pending signal RDMA/rxe: Don't overwrite errno from ib_umem_get() iwlwifi: mvm: don't change band on bound PHY contexts iwlwifi: mvm: fix error print when session protection ends iwlwifi: pcie: free IML DMA memory allocation iwlwifi: pcie: fix context info freeing sfc: avoid double pci_remove of VFs sfc: error code if SRIOV cannot be disabled wireless: wext-spy: Fix out-of-bounds warning cfg80211: fix default HE tx bitrate mask in 2G band mac80211: consider per-CPU statistics if present mac80211_hwsim: add concurrent channels scanning support over virtio IB/isert: Align target max I/O size to initiator size media, bpf: Do not copy more entries than user space requested net: ip: avoid OOM kills with large UDP sends over loopback RDMA/cma: Fix rdma_resolve_route() memory leak Bluetooth: btusb: Fixed too many in-token issue for Mediatek Chip. Bluetooth: Fix the HCI to MGMT status conversion table Bluetooth: Fix alt settings for incoming SCO with transparent coding format Bluetooth: Shutdown controller after workqueues are flushed or cancelled Bluetooth: btusb: Add a new QCA_ROME device (0cf3:e500) Bluetooth: L2CAP: Fix invalid access if ECRED Reconfigure fails Bluetooth: L2CAP: Fix invalid access on ECRED Connection response Bluetooth: btusb: Add support USB ALT 3 for WBS Bluetooth: mgmt: Fix the command returns garbage parameter value Bluetooth: btusb: fix bt fiwmare downloading failure issue for qca btsoc. sched/fair: Ensure _sum and _avg values stay consistent bpf: Fix false positive kmemleak report in bpf_ringbuf_area_alloc() flow_offload: action should not be NULL when it is referenced sctp: validate from_addr_param return sctp: add size validation when walking chunks MIPS: loongsoon64: Reserve memory below starting pfn to prevent Oops MIPS: set mips32r5 for virt extensions selftests/resctrl: Fix incorrect parsing of option "-t" MIPS: MT extensions are not available on MIPS32r1 ath11k: unlock on error path in ath11k_mac_op_add_interface() arm64: dts: rockchip: add rk3328 dwc3 usb controller node arm64: dts: rockchip: Enable USB3 for rk3328 Rock64 loop: fix I/O error on fsync() in detached loop devices mm,hwpoison: return -EBUSY when migration fails io_uring: simplify io_remove_personalities() io_uring: Convert personality_idr to XArray io_uring: convert io_buffer_idr to XArray scsi: iscsi: Fix race condition between login and sync thread scsi: iscsi: Fix iSCSI cls conn state powerpc/mm: Fix lockup on kernel exec fault powerpc/barrier: Avoid collision with clang's __lwsync macro powerpc/powernv/vas: Release reference to tgid during window close drm/amdgpu: Update NV SIMD-per-CU to 2 drm/amdgpu: enable sdma0 tmz for Raven/Renoir(V2) drm/radeon: Add the missed drm_gem_object_put() in radeon_user_framebuffer_create() drm/radeon: Call radeon_suspend_kms() in radeon_pci_shutdown() for Loongson64 drm/vc4: txp: Properly set the possible_crtcs mask drm/vc4: crtc: Skip the TXP drm/vc4: hdmi: Prevent clock unbalance drm/dp: Handle zeroed port counts in drm_dp_read_downstream_info() drm/rockchip: dsi: remove extra component_del() call drm/amd/display: fix incorrrect valid irq check pinctrl/amd: Add device HID for new AMD GPIO controller drm/amd/display: Reject non-zero src_y and src_x for video planes drm/tegra: Don't set allow_fb_modifiers explicitly drm/msm/mdp4: Fix modifier support enabling drm/arm/malidp: Always list modifiers drm/nouveau: Don't set allow_fb_modifiers explicitly drm/i915/display: Do not zero past infoframes.vsc mmc: sdhci-acpi: Disable write protect detection on Toshiba Encore 2 WT8-B mmc: sdhci: Fix warning message when accessing RPMB in HS400 mode mmc: core: clear flags before allowing to retune mmc: core: Allow UHS-I voltage switch for SDSC cards if supported ata: ahci_sunxi: Disable DIPM arm64: tlb: fix the TTL value of tlb_get_level cpu/hotplug: Cure the cpusets trainwreck clocksource/arm_arch_timer: Improve Allwinner A64 timer workaround fpga: stratix10-soc: Add missing fpga_mgr_free() call ASoC: tegra: Set driver_name=tegra for all machine drivers i40e: fix PTP on 5Gb links qemu_fw_cfg: Make fw_cfg_rev_attr a proper kobj_attribute ipmi/watchdog: Stop watchdog timer when the current action is 'none' thermal/drivers/int340x/processor_thermal: Fix tcc setting ubifs: Fix races between xattr_{set|get} and listxattr operations power: supply: ab8500: Fix an old bug mfd: syscon: Free the allocated name field of struct regmap_config nvmem: core: add a missing of_node_put lkdtm/bugs: XFAIL UNALIGNED_LOAD_STORE_WRITE selftests/lkdtm: Fix expected text for CR4 pinning extcon: intel-mrfld: Sync hardware and software state on init seq_buf: Fix overflow in seq_buf_putmem_hex() rq-qos: fix missed wake-ups in rq_qos_throttle try two tracing: Simplify & fix saved_tgids logic tracing: Resize tgid_map to pid_max, not PID_MAX_DEFAULT ipack/carriers/tpci200: Fix a double free in tpci200_pci_probe coresight: Propagate symlink failure coresight: tmc-etf: Fix global-out-of-bounds in tmc_update_etf_buffer() dm zoned: check zone capacity dm writecache: flush origin device when writing and cache is full dm btree remove: assign new_root only when removal succeeds PCI: Leave Apple Thunderbolt controllers on for s2idle or standby PCI: aardvark: Fix checking for PIO Non-posted Request PCI: aardvark: Implement workaround for the readback value of VEND_ID media: subdev: disallow ioctl for saa6588/davinci media: dtv5100: fix control-request directions media: zr364xx: fix memory leak in zr364xx_start_readpipe media: gspca/sq905: fix control-request direction media: gspca/sunplus: fix zero-length control requests media: uvcvideo: Fix pixel format change for Elgato Cam Link 4K io_uring: fix clear IORING_SETUP_R_DISABLED in wrong function dm writecache: write at least 4k when committing pinctrl: mcp23s08: Fix missing unlock on error in mcp23s08_irq() drm/ast: Remove reference to struct drm_device.pdev jfs: fix GPF in diFree smackfs: restrict bytes count in smk_set_cipso() ext4: fix memory leak in ext4_fill_super f2fs: fix to avoid racing on fsync_entry_slab by multi filesystem instances Linux 5.10.51 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: Icb10fed733a0050848ecc23db13ae3d134895acd |
||
![]() |
2e66c36f13 |
bpf: Fix up register-based shifts in interpreter to silence KUBSAN
[ Upstream commit 28131e9d933339a92f78e7ab6429f4aaaa07061c ] syzbot reported a shift-out-of-bounds that KUBSAN observed in the interpreter: [...] UBSAN: shift-out-of-bounds in kernel/bpf/core.c:1420:2 shift exponent 255 is too large for 64-bit type 'long long unsigned int' CPU: 1 PID: 11097 Comm: syz-executor.4 Not tainted 5.12.0-rc2-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:79 [inline] dump_stack+0x141/0x1d7 lib/dump_stack.c:120 ubsan_epilogue+0xb/0x5a lib/ubsan.c:148 __ubsan_handle_shift_out_of_bounds.cold+0xb1/0x181 lib/ubsan.c:327 ___bpf_prog_run.cold+0x19/0x56c kernel/bpf/core.c:1420 __bpf_prog_run32+0x8f/0xd0 kernel/bpf/core.c:1735 bpf_dispatcher_nop_func include/linux/bpf.h:644 [inline] bpf_prog_run_pin_on_cpu include/linux/filter.h:624 [inline] bpf_prog_run_clear_cb include/linux/filter.h:755 [inline] run_filter+0x1a1/0x470 net/packet/af_packet.c:2031 packet_rcv+0x313/0x13e0 net/packet/af_packet.c:2104 dev_queue_xmit_nit+0x7c2/0xa90 net/core/dev.c:2387 xmit_one net/core/dev.c:3588 [inline] dev_hard_start_xmit+0xad/0x920 net/core/dev.c:3609 __dev_queue_xmit+0x2121/0x2e00 net/core/dev.c:4182 __bpf_tx_skb net/core/filter.c:2116 [inline] __bpf_redirect_no_mac net/core/filter.c:2141 [inline] __bpf_redirect+0x548/0xc80 net/core/filter.c:2164 ____bpf_clone_redirect net/core/filter.c:2448 [inline] bpf_clone_redirect+0x2ae/0x420 net/core/filter.c:2420 ___bpf_prog_run+0x34e1/0x77d0 kernel/bpf/core.c:1523 __bpf_prog_run512+0x99/0xe0 kernel/bpf/core.c:1737 bpf_dispatcher_nop_func include/linux/bpf.h:644 [inline] bpf_test_run+0x3ed/0xc50 net/bpf/test_run.c:50 bpf_prog_test_run_skb+0xabc/0x1c50 net/bpf/test_run.c:582 bpf_prog_test_run kernel/bpf/syscall.c:3127 [inline] __do_sys_bpf+0x1ea9/0x4f00 kernel/bpf/syscall.c:4406 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xae [...] Generally speaking, KUBSAN reports from the kernel should be fixed. However, in case of BPF, this particular report caused concerns since the large shift is not wrong from BPF point of view, just undefined. In the verifier, K-based shifts that are >= {64,32} (depending on the bitwidth of the instruction) are already rejected. The register-based cases were not given their content might not be known at verification time. Ideas such as verifier instruction rewrite with an additional AND instruction for the source register were brought up, but regularly rejected due to the additional runtime overhead they incur. As Edward Cree rightly put it: Shifts by more than insn bitness are legal in the BPF ISA; they are implementation-defined behaviour [of the underlying architecture], rather than UB, and have been made legal for performance reasons. Each of the JIT backends compiles the BPF shift operations to machine instructions which produce implementation-defined results in such a case; the resulting contents of the register may be arbitrary but program behaviour as a whole remains defined. Guard checks in the fast path (i.e. affecting JITted code) will thus not be accepted. The case of division by zero is not truly analogous here, as division instructions on many of the JIT-targeted architectures will raise a machine exception / fault on division by zero, whereas (to the best of my knowledge) none will do so on an out-of-bounds shift. Given the KUBSAN report only affects the BPF interpreter, but not JITs, one solution is to add the ANDs with 63 or 31 into ___bpf_prog_run(). That would make the shifts defined, and thus shuts up KUBSAN, and the compiler would optimize out the AND on any CPU that interprets the shift amounts modulo the width anyway (e.g., confirmed from disassembly that on x86-64 and arm64 the generated interpreter code is the same before and after this fix). The BPF interpreter is slow path, and most likely compiled out anyway as distros select BPF_JIT_ALWAYS_ON to avoid speculative execution of BPF instructions by the interpreter. Given the main argument was to avoid sacrificing performance, the fact that the AND is optimized away from compiler for mainstream archs helps as well as a solution moving forward. Also add a comment on LSH/RSH/ARSH translation for JIT authors to provide guidance when they see the ___bpf_prog_run() interpreter code and use it as a model for a new JIT backend. Reported-by: syzbot+bed360704c521841c85d@syzkaller.appspotmail.com Reported-by: Kurt Manucredo <fuzzybritches0@gmail.com> Signed-off-by: Eric Biggers <ebiggers@kernel.org> Co-developed-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Tested-by: syzbot+bed360704c521841c85d@syzkaller.appspotmail.com Cc: Edward Cree <ecree.xilinx@gmail.com> Link: https://lore.kernel.org/bpf/0000000000008f912605bd30d5d7@google.com Link: https://lore.kernel.org/bpf/bac16d8d-c174-bdc4-91bd-bfa62b410190@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
![]() |
b16bfd6279 |
Revert "Revert "bpf: Fix fexit trampoline.""
This reverts commit
|
||
![]() |
bc751d322e |
Revert "bpf: Fix fexit trampoline."
This reverts commit
|
||
![]() |
e92949726c |
Merge 5.10.28 into android12-5.10
Changes in 5.10.28 arm64: mm: correct the inside linear map range during hotplug check bpf: Fix fexit trampoline. virtiofs: Fail dax mount if device does not support it ext4: shrink race window in ext4_should_retry_alloc() ext4: fix bh ref count on error paths fs: nfsd: fix kconfig dependency warning for NFSD_V4 rpc: fix NULL dereference on kmalloc failure iomap: Fix negative assignment to unsigned sis->pages in iomap_swapfile_activate ASoC: rt1015: fix i2c communication error ASoC: rt5640: Fix dac- and adc- vol-tlv values being off by a factor of 10 ASoC: rt5651: Fix dac- and adc- vol-tlv values being off by a factor of 10 ASoC: sgtl5000: set DAP_AVC_CTRL register to correct default value on probe ASoC: es8316: Simplify adc_pga_gain_tlv table ASoC: soc-core: Prevent warning if no DMI table is present ASoC: cs42l42: Fix Bitclock polarity inversion ASoC: cs42l42: Fix channel width support ASoC: cs42l42: Fix mixer volume control ASoC: cs42l42: Always wait at least 3ms after reset NFSD: fix error handling in NFSv4.0 callbacks kernel: freezer should treat PF_IO_WORKER like PF_KTHREAD for freezing vhost: Fix vhost_vq_reset() io_uring: fix ->flags races by linked timeouts scsi: st: Fix a use after free in st_open() scsi: qla2xxx: Fix broken #endif placement staging: comedi: cb_pcidas: fix request_irq() warn staging: comedi: cb_pcidas64: fix request_irq() warn ASoC: rt5659: Update MCLK rate in set_sysclk() ASoC: rt711: add snd_soc_component remove callback thermal/core: Add NULL pointer check before using cooling device stats locking/ww_mutex: Simplify use_ww_ctx & ww_ctx handling locking/ww_mutex: Fix acquire/release imbalance in ww_acquire_init()/ww_acquire_fini() nvmet-tcp: fix kmap leak when data digest in use io_uring: imply MSG_NOSIGNAL for send[msg]()/recv[msg]() calls static_call: Align static_call_is_init() patching condition ext4: do not iput inode under running transaction in ext4_rename() io_uring: call req_set_fail_links() on short send[msg]()/recv[msg]() with MSG_WAITALL net: mvpp2: fix interrupt mask/unmask skip condition flow_dissector: fix TTL and TOS dissection on IPv4 fragments can: dev: move driver related infrastructure into separate subdir net: introduce CAN specific pointer in the struct net_device can: tcan4x5x: fix max register value brcmfmac: clear EAP/association status bits on linkdown events ath11k: add ieee80211_unregister_hw to avoid kernel crash caused by NULL pointer rtw88: coex: 8821c: correct antenna switch function netdevsim: dev: Initialize FIB module after debugfs iwlwifi: pcie: don't disable interrupts for reg_lock ath10k: hold RCU lock when calling ieee80211_find_sta_by_ifaddr() net: ethernet: aquantia: Handle error cleanup of start on open appletalk: Fix skb allocation size in loopback case net: ipa: remove two unused register definitions net: ipa: fix register write command validation net: wan/lmc: unregister device when no matching device is found net: 9p: advance iov on empty read bpf: Remove MTU check in __bpf_skb_max_len ACPI: tables: x86: Reserve memory occupied by ACPI tables ACPI: processor: Fix CPU0 wakeup in acpi_idle_play_dead() ALSA: usb-audio: Apply sample rate quirk to Logitech Connect ALSA: hda: Re-add dropped snd_poewr_change_state() calls ALSA: hda: Add missing sanity checks in PM prepare/complete callbacks ALSA: hda/realtek: fix a determine_headset_type issue for a Dell AIO ALSA: hda/realtek: call alc_update_headset_mode() in hp_automute_hook ALSA: hda/realtek: fix mute/micmute LEDs for HP 640 G8 xtensa: fix uaccess-related livelock in do_page_fault xtensa: move coprocessor_flush to the .text section KVM: SVM: load control fields from VMCB12 before checking them KVM: SVM: ensure that EFER.SVME is set when running nested guest or on nested vmexit PM: runtime: Fix race getting/putting suppliers at probe PM: runtime: Fix ordering in pm_runtime_get_suppliers() tracing: Fix stack trace event size s390/vdso: copy tod_steering_delta value to vdso_data page s390/vdso: fix tod_steering_delta type mm: fix race by making init_zero_pfn() early_initcall drm/amdkfd: dqm fence memory corruption drm/amdgpu: fix offset calculation in amdgpu_vm_bo_clear_mappings() drm/amdgpu: check alignment on CPU page for bo map reiserfs: update reiserfs_xattrs_initialized() condition drm/imx: fix memory leak when fails to init drm/tegra: dc: Restore coupling of display controllers drm/tegra: sor: Grab runtime PM reference across reset vfio/nvlink: Add missing SPAPR_TCE_IOMMU depends pinctrl: rockchip: fix restore error in resume extcon: Add stubs for extcon_register_notifier_all() functions extcon: Fix error handling in extcon_dev_register firmware: stratix10-svc: reset COMMAND_RECONFIG_FLAG_PARTIAL to 0 usb: dwc3: pci: Enable dis_uX_susphy_quirk for Intel Merrifield video: hyperv_fb: Fix a double free in hvfb_probe firewire: nosy: Fix a use-after-free bug in nosy_ioctl() usbip: vhci_hcd fix shift out-of-bounds in vhci_hub_control() USB: quirks: ignore remote wake-up on Fibocom L850-GL LTE modem usb: musb: Fix suspend with devices connected for a64 usb: xhci-mtk: fix broken streams issue on 0.96 xHCI cdc-acm: fix BREAK rx code path adding necessary calls USB: cdc-acm: untangle a circular dependency between callback and softint USB: cdc-acm: downgrade message to debug USB: cdc-acm: fix double free on probe failure USB: cdc-acm: fix use-after-free after probe failure usb: gadget: udc: amd5536udc_pci fix null-ptr-dereference usb: dwc2: Fix HPRT0.PrtSusp bit setting for HiKey 960 board. usb: dwc2: Prevent core suspend when port connection flag is 0 usb: dwc3: qcom: skip interconnect init for ACPI probe usb: dwc3: gadget: Clear DEP flags after stop transfers in ep disable soc: qcom-geni-se: Cleanup the code to remove proxy votes staging: rtl8192e: Fix incorrect source in memcpy() staging: rtl8192e: Change state information from u16 to u8 driver core: clear deferred probe reason on probe retry drivers: video: fbcon: fix NULL dereference in fbcon_cursor() riscv: evaluate put_user() arg before enabling user access Revert "kernel: freezer should treat PF_IO_WORKER like PF_KTHREAD for freezing" bpf: Use NOP_ATOMIC5 instead of emit_nops(&prog, 5) for BPF_TRAMP_F_CALL_ORIG Linux 5.10.28 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: Ifdbbeda8de3ee22a7aa3f5d3b10becf0aba1a124 |
||
![]() |
e21d2b9235 |
bpf: Fix fexit trampoline.
[ Upstream commit e21aa341785c679dd409c8cb71f864c00fe6c463 ]
The fexit/fmod_ret programs can be attached to kernel functions that can sleep.
The synchronize_rcu_tasks() will not wait for such tasks to complete.
In such case the trampoline image will be freed and when the task
wakes up the return IP will point to freed memory causing the crash.
Solve this by adding percpu_ref_get/put for the duration of trampoline
and separate trampoline vs its image life times.
The "half page" optimization has to be removed, since
first_half->second_half->first_half transition cannot be guaranteed to
complete in deterministic time. Every trampoline update becomes a new image.
The image with fmod_ret or fexit progs will be freed via percpu_ref_kill and
call_rcu_tasks. Together they will wait for the original function and
trampoline asm to complete. The trampoline is patched from nop to jmp to skip
fexit progs. They are freed independently from the trampoline. The image with
fentry progs only will be freed via call_rcu_tasks_trace+call_rcu_tasks which
will wait for both sleepable and non-sleepable progs to complete.
Fixes:
|
||
![]() |
2ff446fc4d |
ANDROID: bpf: Add vendor hook
Add vendor hook for bpf, so we can get memory type and use it to do memory type check for architecture dependent page table setting. Bug: 181639260 Signed-off-by: Kuan-Ying Lee <Kuan-Ying.Lee@mediatek.com> Change-Id: Icac325a040fb88c7f6b04b2409029b623bd8515f |
||
![]() |
080b6f4076 |
bpf: Don't rely on GCC __attribute__((optimize)) to disable GCSE
Commit |
||
![]() |
9ff9b0d392 |
Merge tag 'net-next-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski: - Add redirect_neigh() BPF packet redirect helper, allowing to limit stack traversal in common container configs and improving TCP back-pressure. Daniel reports ~10Gbps => ~15Gbps single stream TCP performance gain. - Expand netlink policy support and improve policy export to user space. (Ge)netlink core performs request validation according to declared policies. Expand the expressiveness of those policies (min/max length and bitmasks). Allow dumping policies for particular commands. This is used for feature discovery by user space (instead of kernel version parsing or trial and error). - Support IGMPv3/MLDv2 multicast listener discovery protocols in bridge. - Allow more than 255 IPv4 multicast interfaces. - Add support for Type of Service (ToS) reflection in SYN/SYN-ACK packets of TCPv6. - In Multi-patch TCP (MPTCP) support concurrent transmission of data on multiple subflows in a load balancing scenario. Enhance advertising addresses via the RM_ADDR/ADD_ADDR options. - Support SMC-Dv2 version of SMC, which enables multi-subnet deployments. - Allow more calls to same peer in RxRPC. - Support two new Controller Area Network (CAN) protocols - CAN-FD and ISO 15765-2:2016. - Add xfrm/IPsec compat layer, solving the 32bit user space on 64bit kernel problem. - Add TC actions for implementing MPLS L2 VPNs. - Improve nexthop code - e.g. handle various corner cases when nexthop objects are removed from groups better, skip unnecessary notifications and make it easier to offload nexthops into HW by converting to a blocking notifier. - Support adding and consuming TCP header options by BPF programs, opening the doors for easy experimental and deployment-specific TCP option use. - Reorganize TCP congestion control (CC) initialization to simplify life of TCP CC implemented in BPF. - Add support for shipping BPF programs with the kernel and loading them early on boot via the User Mode Driver mechanism, hence reusing all the user space infra we have. - Support sleepable BPF programs, initially targeting LSM and tracing. - Add bpf_d_path() helper for returning full path for given 'struct path'. - Make bpf_tail_call compatible with bpf-to-bpf calls. - Allow BPF programs to call map_update_elem on sockmaps. - Add BPF Type Format (BTF) support for type and enum discovery, as well as support for using BTF within the kernel itself (current use is for pretty printing structures). - Support listing and getting information about bpf_links via the bpf syscall. - Enhance kernel interfaces around NIC firmware update. Allow specifying overwrite mask to control if settings etc. are reset during update; report expected max time operation may take to users; support firmware activation without machine reboot incl. limits of how much impact reset may have (e.g. dropping link or not). - Extend ethtool configuration interface to report IEEE-standard counters, to limit the need for per-vendor logic in user space. - Adopt or extend devlink use for debug, monitoring, fw update in many drivers (dsa loop, ice, ionic, sja1105, qed, mlxsw, mv88e6xxx, dpaa2-eth). - In mlxsw expose critical and emergency SFP module temperature alarms. Refactor port buffer handling to make the defaults more suitable and support setting these values explicitly via the DCBNL interface. - Add XDP support for Intel's igb driver. - Support offloading TC flower classification and filtering rules to mscc_ocelot switches. - Add PTP support for Marvell Octeontx2 and PP2.2 hardware, as well as fixed interval period pulse generator and one-step timestamping in dpaa-eth. - Add support for various auth offloads in WiFi APs, e.g. SAE (WPA3) offload. - Add Lynx PHY/PCS MDIO module, and convert various drivers which have this HW to use it. Convert mvpp2 to split PCS. - Support Marvell Prestera 98DX3255 24-port switch ASICs, as well as 7-port Mediatek MT7531 IP. - Add initial support for QCA6390 and IPQ6018 in ath11k WiFi driver, and wcn3680 support in wcn36xx. - Improve performance for packets which don't require much offloads on recent Mellanox NICs by 20% by making multiple packets share a descriptor entry. - Move chelsio inline crypto drivers (for TLS and IPsec) from the crypto subtree to drivers/net. Move MDIO drivers out of the phy directory. - Clean up a lot of W=1 warnings, reportedly the actively developed subsections of networking drivers should now build W=1 warning free. - Make sure drivers don't use in_interrupt() to dynamically adapt their code. Convert tasklets to use new tasklet_setup API (sadly this conversion is not yet complete). * tag 'net-next-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2583 commits) Revert "bpfilter: Fix build error with CONFIG_BPFILTER_UMH" net, sockmap: Don't call bpf_prog_put() on NULL pointer bpf, selftest: Fix flaky tcp_hdr_options test when adding addr to lo bpf, sockmap: Add locking annotations to iterator netfilter: nftables: allow re-computing sctp CRC-32C in 'payload' statements net: fix pos incrementment in ipv6_route_seq_next net/smc: fix invalid return code in smcd_new_buf_create() net/smc: fix valid DMBE buffer sizes net/smc: fix use-after-free of delayed events bpfilter: Fix build error with CONFIG_BPFILTER_UMH cxgb4/ch_ipsec: Replace the module name to ch_ipsec from chcr net: sched: Fix suspicious RCU usage while accessing tcf_tunnel_info bpf: Fix register equivalence tracking. rxrpc: Fix loss of final ack on shutdown rxrpc: Fix bundle counting for exclusive connections netfilter: restore NF_INET_NUMHOOKS ibmveth: Identify ingress large send packets. ibmveth: Switch order of ibmveth_helper calls. cxgb4: handle 4-tuple PEDIT to NAT mode translation selftests: Add VRF route leaking tests ... |
||
![]() |
3aac1ead5e |
bpf: Move prog->aux->linked_prog and trampoline into bpf_link on attach
In preparation for allowing multiple attachments of freplace programs, move the references to the target program and trampoline into the bpf_tracing_link structure when that is created. To do this atomically, introduce a new mutex in prog->aux to protect writing to the two pointers to target prog and trampoline, and rename the members to make it clear that they are related. With this change, it is no longer possible to attach the same tracing program multiple times (detaching in-between), since the reference from the tracing program to the target disappears on the first attach. However, since the next patch will let the caller supply an attach target, that will also make it possible to attach to the same place multiple times. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/160138355059.48470.2503076992210324984.stgit@toke.dk |
||
![]() |
eb411377ae |
bpf: Add bpf_seq_printf_btf helper
A helper is added to allow seq file writing of kernel data structures using vmlinux BTF. Its signature is long bpf_seq_printf_btf(struct seq_file *m, struct btf_ptr *ptr, u32 btf_ptr_size, u64 flags); Flags and struct btf_ptr definitions/use are identical to the bpf_snprintf_btf helper, and the helper returns 0 on success or a negative error value. Suggested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com> Signed-off-by: Alan Maguire <alan.maguire@oracle.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/1601292670-1616-8-git-send-email-alan.maguire@oracle.com |
||
![]() |
c4d0bfb450 |
bpf: Add bpf_snprintf_btf helper
A helper is added to support tracing kernel type information in BPF using the BPF Type Format (BTF). Its signature is long bpf_snprintf_btf(char *str, u32 str_size, struct btf_ptr *ptr, u32 btf_ptr_size, u64 flags); struct btf_ptr * specifies - a pointer to the data to be traced - the BTF id of the type of data pointed to - a flags field is provided for future use; these flags are not to be confused with the BTF_F_* flags below that control how the btf_ptr is displayed; the flags member of the struct btf_ptr may be used to disambiguate types in kernel versus module BTF, etc; the main distinction is the flags relate to the type and information needed in identifying it; not how it is displayed. For example a BPF program with a struct sk_buff *skb could do the following: static struct btf_ptr b = { }; b.ptr = skb; b.type_id = __builtin_btf_type_id(struct sk_buff, 1); bpf_snprintf_btf(str, sizeof(str), &b, sizeof(b), 0, 0); Default output looks like this: (struct sk_buff){ .transport_header = (__u16)65535, .mac_header = (__u16)65535, .end = (sk_buff_data_t)192, .head = (unsigned char *)0x000000007524fd8b, .data = (unsigned char *)0x000000007524fd8b, .truesize = (unsigned int)768, .users = (refcount_t){ .refs = (atomic_t){ .counter = (int)1, }, }, } Flags modifying display are as follows: - BTF_F_COMPACT: no formatting around type information - BTF_F_NONAME: no struct/union member names/types - BTF_F_PTR_RAW: show raw (unobfuscated) pointer values; equivalent to %px. - BTF_F_ZERO: show zero-valued struct/union members; they are not displayed by default Signed-off-by: Alan Maguire <alan.maguire@oracle.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/1601292670-1616-4-git-send-email-alan.maguire@oracle.com |
||
![]() |
ebf7d1f508 |
bpf, x64: rework pro/epilogue and tailcall handling in JIT
This commit serves two things: 1) it optimizes BPF prologue/epilogue generation 2) it makes possible to have tailcalls within BPF subprogram Both points are related to each other since without 1), 2) could not be achieved. In [1], Alexei says: "The prologue will look like: nop5 xor eax,eax // two new bytes if bpf_tail_call() is used in this // function push rbp mov rbp, rsp sub rsp, rounded_stack_depth push rax // zero init tail_call counter variable number of push rbx,r13,r14,r15 Then bpf_tail_call will pop variable number rbx,.. and final 'pop rax' Then 'add rsp, size_of_current_stack_frame' jmp to next function and skip over 'nop5; xor eax,eax; push rpb; mov rbp, rsp' This way new function will set its own stack size and will init tail call counter with whatever value the parent had. If next function doesn't use bpf_tail_call it won't have 'xor eax,eax'. Instead it would need to have 'nop2' in there." Implement that suggestion. Since the layout of stack is changed, tail call counter handling can not rely anymore on popping it to rbx just like it have been handled for constant prologue case and later overwrite of rbx with actual value of rbx pushed to stack. Therefore, let's use one of the register (%rcx) that is considered to be volatile/caller-saved and pop the value of tail call counter in there in the epilogue. Drop the BUILD_BUG_ON in emit_prologue and in emit_bpf_tail_call_indirect where instruction layout is not constant anymore. Introduce new poke target, 'tailcall_bypass' to poke descriptor that is dedicated for skipping the register pops and stack unwind that are generated right before the actual jump to target program. For case when the target program is not present, BPF program will skip the pop instructions and nop5 dedicated for jmpq $target. An example of such state when only R6 of callee saved registers is used by program: ffffffffc0513aa1: e9 0e 00 00 00 jmpq 0xffffffffc0513ab4 ffffffffc0513aa6: 5b pop %rbx ffffffffc0513aa7: 58 pop %rax ffffffffc0513aa8: 48 81 c4 00 00 00 00 add $0x0,%rsp ffffffffc0513aaf: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1) ffffffffc0513ab4: 48 89 df mov %rbx,%rdi When target program is inserted, the jump that was there to skip pops/nop5 will become the nop5, so CPU will go over pops and do the actual tailcall. One might ask why there simply can not be pushes after the nop5? In the following example snippet: ffffffffc037030c: 48 89 fb mov %rdi,%rbx (...) ffffffffc0370332: 5b pop %rbx ffffffffc0370333: 58 pop %rax ffffffffc0370334: 48 81 c4 00 00 00 00 add $0x0,%rsp ffffffffc037033b: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1) ffffffffc0370340: 48 81 ec 00 00 00 00 sub $0x0,%rsp ffffffffc0370347: 50 push %rax ffffffffc0370348: 53 push %rbx ffffffffc0370349: 48 89 df mov %rbx,%rdi ffffffffc037034c: e8 f7 21 00 00 callq 0xffffffffc0372548 There is the bpf2bpf call (at ffffffffc037034c) right after the tailcall and jump target is not present. ctx is in %rbx register and BPF subprogram that we will call into on ffffffffc037034c is relying on it, e.g. it will pick ctx from there. Such code layout is therefore broken as we would overwrite the content of %rbx with the value that was pushed on the prologue. That is the reason for the 'bypass' approach. Special care needs to be taken during the install/update/remove of tailcall target. In case when target program is not present, the CPU must not execute the pop instructions that precede the tailcall. To address that, the following states can be defined: A nop, unwind, nop B nop, unwind, tail C skip, unwind, nop D skip, unwind, tail A is forbidden (lead to incorrectness). The state transitions between tailcall install/update/remove will work as follows: First install tail call f: C->D->B(f) * poke the tailcall, after that get rid of the skip Update tail call f to f': B(f)->B(f') * poke the tailcall (poke->tailcall_target) and do NOT touch the poke->tailcall_bypass Remove tail call: B(f')->C(f') * poke->tailcall_bypass is poked back to jump, then we wait the RCU grace period so that other programs will finish its execution and after that we are safe to remove the poke->tailcall_target Install new tail call (f''): C(f')->D(f'')->B(f''). * same as first step This way CPU can never be exposed to "unwind, tail" state. Last but not least, when tailcalls get mixed with bpf2bpf calls, it would be possible to encounter the endless loop due to clearing the tailcall counter if for example we would use the tailcall3-like from BPF selftests program that would be subprogram-based, meaning the tailcall would be present within the BPF subprogram. This test, broken down to particular steps, would do: entry -> set tailcall counter to 0, bump it by 1, tailcall to func0 func0 -> call subprog_tail (we are NOT skipping the first 11 bytes of prologue and this subprogram has a tailcall, therefore we clear the counter...) subprog -> do the same thing as entry and then loop forever. To address this, the idea is to go through the call chain of bpf2bpf progs and look for a tailcall presence throughout whole chain. If we saw a single tail call then each node in this call chain needs to be marked as a subprog that can reach the tailcall. We would later feed the JIT with this info and: - set eax to 0 only when tailcall is reachable and this is the entry prog - if tailcall is reachable but there's no tailcall in insns of currently JITed prog then push rax anyway, so that it will be possible to propagate further down the call chain - finally if tailcall is reachable, then we need to precede the 'call' insn with mov rax, [rbp - (stack_depth + 8)] Tail call related cases from test_verifier kselftest are also working fine. Sample BPF programs that utilize tail calls (sockex3, tracex5) work properly as well. [1]: https://lore.kernel.org/bpf/20200517043227.2gpq22ifoq37ogst@ast-mbp.dhcp.thefacebook.com/ Suggested-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> |
||
![]() |
cf71b174d3 |
bpf: rename poke descriptor's 'ip' member to 'tailcall_target'
Reflect the actual purpose of poke->ip and rename it to poke->tailcall_target so that it will not the be confused with another poke target that will be introduced in next commit. While at it, do the same thing with poke->ip_stable - rename it to poke->tailcall_target_stable. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> |
||
![]() |
984fe94f94 |
bpf: Mutex protect used_maps array and count
To support modifying the used_maps array, we use a mutex to protect the use of the counter and the array. The mutex is initialized right after the prog aux is allocated, and destroyed right before prog aux is freed. This way we guarantee it's initialized for both cBPF and eBPF. Signed-off-by: YiFei Zhu <zhuyifei@google.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Cc: YiFei Zhu <zhuyifei1999@gmail.com> Link: https://lore.kernel.org/bpf/20200915234543.3220146-2-sdf@google.com |
||
![]() |
00089c048e |
objtool: Rename frame.h -> objtool.h
Header frame.h is getting more code annotations to help objtool analyze object files. Rename the file to objtool.h. [ jpoimboe: add objtool.h to MAINTAINERS ] Signed-off-by: Julien Thierry <jthierry@redhat.com> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> |
||
![]() |
b8c1a30907 |
bpf: Delete repeated words in comments
Drop repeated words in kernel/bpf/: {has, the} Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200807033141.10437-1-rdunlap@infradead.org |
||
![]() |
7d9c342789 |
bpf: Make cgroup storages shared between programs on the same cgroup
This change comes in several parts: One, the restriction that the CGROUP_STORAGE map can only be used by one program is removed. This results in the removal of the field 'aux' in struct bpf_cgroup_storage_map, and removal of relevant code associated with the field, and removal of now-noop functions bpf_free_cgroup_storage and bpf_cgroup_storage_release. Second, we permit a key of type u64 as the key to the map. Providing such a key type indicates that the map should ignore attach type when comparing map keys. However, for simplicity newly linked storage will still have the attach type at link time in its key struct. cgroup_storage_check_btf is adapted to accept u64 as the type of the key. Third, because the storages are now shared, the storages cannot be unconditionally freed on program detach. There could be two ways to solve this issue: * A. Reference count the usage of the storages, and free when the last program is detached. * B. Free only when the storage is impossible to be referred to again, i.e. when either the cgroup_bpf it is attached to, or the map itself, is freed. Option A has the side effect that, when the user detach and reattach a program, whether the program gets a fresh storage depends on whether there is another program attached using that storage. This could trigger races if the user is multi-threaded, and since nondeterminism in data races is evil, go with option B. The both the map and the cgroup_bpf now tracks their associated storages, and the storage unlink and free are removed from cgroup_bpf_detach and added to cgroup_bpf_release and cgroup_storage_map_free. The latter also new holds the cgroup_mutex to prevent any races with the former. Fourth, on attach, we reuse the old storage if the key already exists in the map, via cgroup_storage_lookup. If the storage does not exist yet, we create a new one, and publish it at the last step in the attach process. This does not create a race condition because for the whole attach the cgroup_mutex is held. We keep track of an array of new storages that was allocated and if the process fails only the new storages would get freed. Signed-off-by: YiFei Zhu <zhuyifei@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/d5401c6106728a00890401190db40020a1f84ff1.1595565795.git.zhuyifei@google.com |
||
![]() |
ce3aa9cc51 |
bpf, netns: Handle multiple link attachments
Extend the BPF netns link callbacks to rebuild (grow/shrink) or update the prog_array at given position when link gets attached/updated/released. This let's us lift the limit of having just one link attached for the new attach type introduced by subsequent patch. No functional changes intended. Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200717103536.397595-2-jakub@cloudflare.com |
||
![]() |
cb8e59cc87 |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from David Miller: 1) Allow setting bluetooth L2CAP modes via socket option, from Luiz Augusto von Dentz. 2) Add GSO partial support to igc, from Sasha Neftin. 3) Several cleanups and improvements to r8169 from Heiner Kallweit. 4) Add IF_OPER_TESTING link state and use it when ethtool triggers a device self-test. From Andrew Lunn. 5) Start moving away from custom driver versions, use the globally defined kernel version instead, from Leon Romanovsky. 6) Support GRO vis gro_cells in DSA layer, from Alexander Lobakin. 7) Allow hard IRQ deferral during NAPI, from Eric Dumazet. 8) Add sriov and vf support to hinic, from Luo bin. 9) Support Media Redundancy Protocol (MRP) in the bridging code, from Horatiu Vultur. 10) Support netmap in the nft_nat code, from Pablo Neira Ayuso. 11) Allow UDPv6 encapsulation of ESP in the ipsec code, from Sabrina Dubroca. Also add ipv6 support for espintcp. 12) Lots of ReST conversions of the networking documentation, from Mauro Carvalho Chehab. 13) Support configuration of ethtool rxnfc flows in bcmgenet driver, from Doug Berger. 14) Allow to dump cgroup id and filter by it in inet_diag code, from Dmitry Yakunin. 15) Add infrastructure to export netlink attribute policies to userspace, from Johannes Berg. 16) Several optimizations to sch_fq scheduler, from Eric Dumazet. 17) Fallback to the default qdisc if qdisc init fails because otherwise a packet scheduler init failure will make a device inoperative. From Jesper Dangaard Brouer. 18) Several RISCV bpf jit optimizations, from Luke Nelson. 19) Correct the return type of the ->ndo_start_xmit() method in several drivers, it's netdev_tx_t but many drivers were using 'int'. From Yunjian Wang. 20) Add an ethtool interface for PHY master/slave config, from Oleksij Rempel. 21) Add BPF iterators, from Yonghang Song. 22) Add cable test infrastructure, including ethool interfaces, from Andrew Lunn. Marvell PHY driver is the first to support this facility. 23) Remove zero-length arrays all over, from Gustavo A. R. Silva. 24) Calculate and maintain an explicit frame size in XDP, from Jesper Dangaard Brouer. 25) Add CAP_BPF, from Alexei Starovoitov. 26) Support terse dumps in the packet scheduler, from Vlad Buslov. 27) Support XDP_TX bulking in dpaa2 driver, from Ioana Ciornei. 28) Add devm_register_netdev(), from Bartosz Golaszewski. 29) Minimize qdisc resets, from Cong Wang. 30) Get rid of kernel_getsockopt and kernel_setsockopt in order to eliminate set_fs/get_fs calls. From Christoph Hellwig. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2517 commits) selftests: net: ip_defrag: ignore EPERM net_failover: fixed rollback in net_failover_open() Revert "tipc: Fix potential tipc_aead refcnt leak in tipc_crypto_rcv" Revert "tipc: Fix potential tipc_node refcnt leak in tipc_rcv" vmxnet3: allow rx flow hash ops only when rss is enabled hinic: add set_channels ethtool_ops support selftests/bpf: Add a default $(CXX) value tools/bpf: Don't use $(COMPILE.c) bpf, selftests: Use bpf_probe_read_kernel s390/bpf: Use bcr 0,%0 as tail call nop filler s390/bpf: Maintain 8-byte stack alignment selftests/bpf: Fix verifier test selftests/bpf: Fix sample_cnt shared between two threads bpf, selftests: Adapt cls_redirect to call csum_level helper bpf: Add csum_level helper for fixing up csum levels bpf: Fix up bpf_skb_adjust_room helper's skb csum setting sfc: add missing annotation for efx_ef10_try_update_nic_stats_vf() crypto/chtls: IPv6 support for inline TLS Crypto/chcr: Fixes a coccinile check error Crypto/chcr: Fixes compilations warnings ... |
||
![]() |
88dca4ca5a |
mm: remove the pgprot argument to __vmalloc
The pgprot argument to __vmalloc is always PAGE_KERNEL now, so remove it. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Michael Kelley <mikelley@microsoft.com> [hyperv] Acked-by: Gao Xiang <xiang@kernel.org> [erofs] Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Wei Liu <wei.liu@kernel.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Airlie <airlied@linux.ie> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Laura Abbott <labbott@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Sakari Ailus <sakari.ailus@linux.intel.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Link: http://lkml.kernel.org/r/20200414131348.444715-22-hch@lst.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
![]() |
0142dddcbe |
bpf: Fix spelling in comment explaining ARG1 in ___bpf_prog_run
Change 'handeled' to 'handled'. Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200525230025.14470-1-chris.packham@alliedtelesis.co.nz Signed-off-by: Alexei Starovoitov <ast@kernel.org> |
||
![]() |
2c78ee898d |
bpf: Implement CAP_BPF
Implement permissions as stated in uapi/linux/capability.h In order to do that the verifier allow_ptr_leaks flag is split into four flags and they are set as: env->allow_ptr_leaks = bpf_allow_ptr_leaks(); env->bypass_spec_v1 = bpf_bypass_spec_v1(); env->bypass_spec_v4 = bpf_bypass_spec_v4(); env->bpf_capable = bpf_capable(); The first three currently equivalent to perfmon_capable(), since leaking kernel pointers and reading kernel memory via side channel attacks is roughly equivalent to reading kernel memory with cap_perfmon. 'bpf_capable' enables bounded loops, precision tracking, bpf to bpf calls and other verifier features. 'allow_ptr_leaks' enable ptr leaks, ptr conversions, subtraction of pointers. 'bypass_spec_v1' disables speculative analysis in the verifier, run time mitigations in bpf array, and enables indirect variable access in bpf programs. 'bypass_spec_v4' disables emission of sanitation code by the verifier. That means that the networking BPF program loaded with CAP_BPF + CAP_NET_ADMIN will have speculative checks done by the verifier and other spectre mitigation applied. Such networking BPF program will not be able to leak kernel pointers and will not be able to access arbitrary kernel memory. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200513230355.7858-3-alexei.starovoitov@gmail.com |
||
![]() |
6b0b0fa2bc |
crypto: lib/sha1 - rename "sha" to "sha1"
The library implementation of the SHA-1 compression function is confusingly called just "sha_transform()". Alongside it are some "SHA_" constants and "sha_init()". Presumably these are left over from a time when SHA just meant SHA-1. But now there are also SHA-2 and SHA-3, and moreover SHA-1 is now considered insecure and thus shouldn't be used. Therefore, rename these functions and constants to make it very clear that they are for SHA-1. Also add a comment to make it clear that these shouldn't be used. For the extra-misleadingly named "SHA_MESSAGE_BYTES", rename it to SHA1_BLOCK_SIZE and define it to just '64' rather than '(512/8)' so that it matches the same definition in <crypto/sha.h>. This prepares for merging <linux/cryptohash.h> into <crypto/sha.h>. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> |
||
![]() |
71d1921477 |
bpf: add bpf_ktime_get_boot_ns()
On a device like a cellphone which is constantly suspending and resuming CLOCK_MONOTONIC is not particularly useful for keeping track of or reacting to external network events. Instead you want to use CLOCK_BOOTTIME. Hence add bpf_ktime_get_boot_ns() as a mirror of bpf_ktime_get_ns() based around CLOCK_BOOTTIME instead of CLOCK_MONOTONIC. Signed-off-by: Maciej Żenczykowski <maze@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> |
||
![]() |
6890896bd7 |
bpf: Fix missing bpf_base_func_proto in cgroup_base_func_proto for CGROUP_NET=n
linux-next build bot reported compile issue [1] with one of its
configs. It looks like when we have CONFIG_NET=n and
CONFIG_BPF{,_SYSCALL}=y, we are missing the bpf_base_func_proto
definition (from net/core/filter.c) in cgroup_base_func_proto.
I'm reshuffling the code a bit to make it work. The common helpers
are moved into kernel/bpf/helpers.c and the bpf_base_func_proto is
exported from there.
Also, bpf_get_raw_cpu_id goes into kernel/bpf/core.c akin to existing
bpf_user_rnd_u32.
[1] https://lore.kernel.org/linux-next/CAKH8qBsBvKHswiX1nx40LgO+BGeTmb1NX8tiTttt_0uu6T3dCA@mail.gmail.com/T/#mff8b0c083314c68c2e2ef0211cb11bc20dc13c72
Fixes:
|
||
![]() |
0f09abd105 |
bpf: Enable bpf cgroup hooks to retrieve cgroup v2 and ancestor id
Enable the bpf_get_current_cgroup_id() helper for connect(), sendmsg(),
recvmsg() and bind-related hooks in order to retrieve the cgroup v2
context which can then be used as part of the key for BPF map lookups,
for example. Given these hooks operate in process context 'current' is
always valid and pointing to the app that is performing mentioned
syscalls if it's subject to a v2 cgroup. Also with same motivation of
commit
|
||
![]() |
dba122fb5e |
bpf: Add bpf_ksym_add/del functions
Separating /proc/kallsyms add/del code and adding bpf_ksym_add/del functions for that. Moving bpf_prog_ksym_node_add/del functions to __bpf_ksym_add/del and changing their argument to 'struct bpf_ksym' object. This way we can call them for other bpf objects types like trampoline and dispatcher. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200312195610.346362-10-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org> |
||
![]() |
cbd76f8d5a |
bpf: Add prog flag to struct bpf_ksym object
Adding 'prog' bool flag to 'struct bpf_ksym' to mark that this object belongs to bpf_prog object. This change allows having bpf_prog objects together with other types (trampolines and dispatchers) in the single bpf_tree. It's used when searching for bpf_prog exception tables by the bpf_prog_ksym_find function, where we need to get the bpf_prog pointer. >From now we can safely add bpf_ksym support for trampoline or dispatcher objects, because we can differentiate them from bpf_prog objects. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200312195610.346362-9-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org> |
||
![]() |
eda0c92902 |
bpf: Add bpf_ksym_find function
Adding bpf_ksym_find function that is used bpf bpf address lookup functions: __bpf_address_lookup is_bpf_text_address while keeping bpf_prog_kallsyms_find to be used only for lookup of bpf_prog objects (will happen in following changes). Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200312195610.346362-8-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org> |
||
![]() |
ca4424c920 |
bpf: Move ksym_tnode to bpf_ksym
Moving ksym_tnode list node to 'struct bpf_ksym' object, so the symbol itself can be chained and used in other objects like bpf_trampoline and bpf_dispatcher. We need bpf_ksym object to be linked both in bpf_kallsyms via lnode for /proc/kallsyms and in bpf_tree via tnode for bpf address lookup functions like __bpf_address_lookup or bpf_prog_kallsyms_find. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200312195610.346362-7-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org> |
||
![]() |
ecb60d1c67 |
bpf: Move lnode list node to struct bpf_ksym
Adding lnode list node to 'struct bpf_ksym' object, so the struct bpf_ksym itself can be chained and used in other objects like bpf_trampoline and bpf_dispatcher. Changing iterator to bpf_ksym in bpf_get_kallsym function. The ksym->start is holding the prog->bpf_func value, so it's ok to use it as value in bpf_get_kallsym. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20200312195610.346362-6-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org> |
||
![]() |
bfea9a8574 |
bpf: Add name to struct bpf_ksym
Adding name to 'struct bpf_ksym' object to carry the name of the symbol for bpf_prog, bpf_trampoline, bpf_dispatcher objects. The current benefit is that name is now generated only when the symbol is added to the list, so we don't need to generate it every time it's accessed. The future benefit is that we will have all the bpf objects symbols represented by struct bpf_ksym. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20200312195610.346362-5-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org> |
||
![]() |
535911c80a |
bpf: Add struct bpf_ksym
Adding 'struct bpf_ksym' object that will carry the kallsym information for bpf symbol. Adding the start and end address to begin with. It will be used by bpf_prog, bpf_trampoline, bpf_dispatcher objects. The symbol_start/symbol_end values were originally used to sort bpf_prog objects. For the address displayed in /proc/kallsyms we are using prog->bpf_func value. I'm using the bpf_func value for program symbol start instead of the symbol_start, because it makes no difference for sorting bpf_prog objects and we can use it directly as an address to display it in /proc/kallsyms. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20200312195610.346362-4-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org> |
||
![]() |
b4490c5c4e |
bpf: Added new helper bpf_get_ns_current_pid_tgid
New bpf helper bpf_get_ns_current_pid_tgid, This helper will return pid and tgid from current task which namespace matches dev_t and inode number provided, this will allows us to instrument a process inside a container. Signed-off-by: Carlos Neira <cneirabustos@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20200304204157.58695-3-cneirabustos@gmail.com |
||
![]() |
5576b991e9 |
bpf: Add BPF_FUNC_jiffies64
This patch adds a helper to read the 64bit jiffies. It will be used in a later patch to implement the bpf_cubic.c. The helper is inlined for jit_requested and 64 BITS_PER_LONG as the map_gen_lookup(). Other cases could be considered together with map_gen_lookup() if needed. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200122233646.903260-1-kafai@fb.com |
||
![]() |
2bbc078f81 |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says: ==================== pull-request: bpf-next 2019-12-27 The following pull-request contains BPF updates for your *net-next* tree. We've added 127 non-merge commits during the last 17 day(s) which contain a total of 110 files changed, 6901 insertions(+), 2721 deletions(-). There are three merge conflicts. Conflicts and resolution looks as follows: 1) Merge conflict in net/bpf/test_run.c: There was a tree-wide cleanup |
||
![]() |
5bf2fc1f9c |
bpf: Remove unnecessary assertion on fp_old
The two callers of bpf_prog_realloc - bpf_patch_insn_single and bpf_migrate_filter dereference the struct fp_old, before passing it to the function. Thus assertion to check fp_old is unnecessary and can be removed. Signed-off-by: Aditya Pakki <pakki001@umn.edu> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20191219175735.19231-1-pakki001@umn.edu |
||
![]() |
e47304232b |
bpf: Fix cgroup local storage prog tracking
Recently noticed that we're tracking programs related to local storage maps
through their prog pointer. This is a wrong assumption since the prog pointer
can still change throughout the verification process, for example, whenever
bpf_patch_insn_single() is called.
Therefore, the prog pointer that was assigned via bpf_cgroup_storage_assign()
is not guaranteed to be the same as we pass in bpf_cgroup_storage_release()
and the map would therefore remain in busy state forever. Fix this by using
the prog's aux pointer which is stable throughout verification and beyond.
Fixes:
|
||
![]() |
a2ea07465c |
bpf: Fix missing prog untrack in release_maps
Commit |
||
![]() |
81c22041d9 |
bpf, x86, arm64: Enable jit by default when not built as always-on
After Spectre 2 fix via |
||
![]() |
da765a2f59 |
bpf: Add poke dependency tracking for prog array maps
This work adds program tracking to prog array maps. This is needed such that upon prog array updates/deletions we can fix up all programs which make use of this tail call map. We add ops->map_poke_{un,}track() helpers to maps to maintain the list of programs and ops->map_poke_run() for triggering the actual update. bpf_array_aux is extended to contain the list head and poke_mutex in order to serialize program patching during updates/deletions. bpf_free_used_maps() will untrack the program shortly before dropping the reference to the map. For clearing out the prog array once all urefs are dropped we need to use schedule_work() to have a sleepable context. The prog_array_map_poke_run() is triggered during updates/deletions and walks the maintained prog list. It checks in their poke_tabs whether the map and key is matching and runs the actual bpf_arch_text_poke() for patching in the nop or new jmp location. Depending on the type of update, we use one of BPF_MOD_{NOP_TO_JUMP,JUMP_TO_NOP,JUMP_TO_JUMP}. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/1fb364bb3c565b3e415d5ea348f036ff379e779d.1574452833.git.daniel@iogearbox.net |
||
![]() |
a66886fe6c |
bpf: Add initial poke descriptor table for jit images
Add initial poke table data structures and management to the BPF prog that can later be used by JITs. Also add an instance of poke specific data for tail call maps; plan for later work is to extend this also for BPF static keys. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/1db285ec2ea4207ee0455b3f8e191a4fc58b9ade.1574452833.git.daniel@iogearbox.net |
||
![]() |
2beee5f574 |
bpf: Move owner type, jited info into array auxiliary data
We're going to extend this with further information which is only relevant for prog array at this point. Given this info is not used in critical path, move it into its own structure such that the main array map structure can be kept on diet. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/b9ddccdb0f6f7026489ee955f16c96381e1e7238.1574452833.git.daniel@iogearbox.net |
||
![]() |
6332be04c0 |
bpf: Move bpf_free_used_maps into sleepable section
We later on are going to need a sleepable context as opposed to plain RCU callback in order to untrack programs we need to poke at runtime and tracking as well as image update is performed under mutex. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/09823b1d5262876e9b83a8e75df04cf0467357a4.1574452833.git.daniel@iogearbox.net |