Commit Graph

652 Commits

Author SHA1 Message Date
Greg Kroah-Hartman
2300418cc6 Merge 5.10.65 into android12-5.10-lts
Changes in 5.10.65
	locking/mutex: Fix HANDOFF condition
	regmap: fix the offset of register error log
	regulator: tps65910: Silence deferred probe error
	crypto: mxs-dcp - Check for DMA mapping errors
	sched/deadline: Fix reset_on_fork reporting of DL tasks
	power: supply: axp288_fuel_gauge: Report register-address on readb / writeb errors
	crypto: omap-sham - clear dma flags only after omap_sham_update_dma_stop()
	sched/deadline: Fix missing clock update in migrate_task_rq_dl()
	rcu/tree: Handle VM stoppage in stall detection
	EDAC/mce_amd: Do not load edac_mce_amd module on guests
	posix-cpu-timers: Force next expiration recalc after itimer reset
	hrtimer: Avoid double reprogramming in __hrtimer_start_range_ns()
	hrtimer: Ensure timerfd notification for HIGHRES=n
	udf: Check LVID earlier
	udf: Fix iocharset=utf8 mount option
	isofs: joliet: Fix iocharset=utf8 mount option
	bcache: add proper error unwinding in bcache_device_init
	blk-throtl: optimize IOPS throttle for large IO scenarios
	nvme-tcp: don't update queue count when failing to set io queues
	nvme-rdma: don't update queue count when failing to set io queues
	nvmet: pass back cntlid on successful completion
	power: supply: smb347-charger: Add missing pin control activation
	power: supply: max17042_battery: fix typo in MAx17042_TOFF
	s390/cio: add dev_busid sysfs entry for each subchannel
	s390/zcrypt: fix wrong offset index for APKA master key valid state
	libata: fix ata_host_start()
	crypto: omap - Fix inconsistent locking of device lists
	crypto: qat - do not ignore errors from enable_vf2pf_comms()
	crypto: qat - handle both source of interrupt in VF ISR
	crypto: qat - fix reuse of completion variable
	crypto: qat - fix naming for init/shutdown VF to PF notifications
	crypto: qat - do not export adf_iov_putmsg()
	fcntl: fix potential deadlock for &fasync_struct.fa_lock
	udf_get_extendedattr() had no boundary checks.
	s390/kasan: fix large PMD pages address alignment check
	s390/pci: fix misleading rc in clp_set_pci_fn()
	s390/debug: keep debug data on resize
	s390/debug: fix debug area life cycle
	s390/ap: fix state machine hang after failure to enable irq
	power: supply: cw2015: use dev_err_probe to allow deferred probe
	m68k: emu: Fix invalid free in nfeth_cleanup()
	sched/numa: Fix is_core_idle()
	sched: Fix UCLAMP_FLAG_IDLE setting
	rcu: Fix to include first blocked task in stall warning
	rcu: Add lockdep_assert_irqs_disabled() to rcu_sched_clock_irq() and callees
	rcu: Fix stall-warning deadlock due to non-release of rcu_node ->lock
	m68k: Fix invalid RMW_INSNS on CPUs that lack CAS
	block: return ELEVATOR_DISCARD_MERGE if possible
	spi: spi-fsl-dspi: Fix issue with uninitialized dma_slave_config
	spi: spi-pic32: Fix issue with uninitialized dma_slave_config
	genirq/timings: Fix error return code in irq_timings_test_irqs()
	irqchip/loongson-pch-pic: Improve edge triggered interrupt support
	lib/mpi: use kcalloc in mpi_resize
	clocksource/drivers/sh_cmt: Fix wrong setting if don't request IRQ for clock source channel
	block: nbd: add sanity check for first_minor
	spi: coldfire-qspi: Use clk_disable_unprepare in the remove function
	irqchip/gic-v3: Fix priority comparison when non-secure priorities are used
	crypto: qat - use proper type for vf_mask
	certs: Trigger creation of RSA module signing key if it's not an RSA key
	tpm: ibmvtpm: Avoid error message when process gets signal while waiting
	x86/mce: Defer processing of early errors
	spi: davinci: invoke chipselect callback
	blk-crypto: fix check for too-large dun_bytes
	regulator: vctrl: Use locked regulator_get_voltage in probe path
	regulator: vctrl: Avoid lockdep warning in enable/disable ops
	spi: sprd: Fix the wrong WDG_LOAD_VAL
	spi: spi-zynq-qspi: use wait_for_completion_timeout to make zynq_qspi_exec_mem_op not interruptible
	EDAC/i10nm: Fix NVDIMM detection
	drm/panfrost: Fix missing clk_disable_unprepare() on error in panfrost_clk_init()
	drm/gma500: Fix end of loop tests for list_for_each_entry
	ASoC: mediatek: mt8183: Fix Unbalanced pm_runtime_enable in mt8183_afe_pcm_dev_probe
	media: TDA1997x: enable EDID support
	leds: is31fl32xx: Fix missing error code in is31fl32xx_parse_dt()
	soc: rockchip: ROCKCHIP_GRF should not default to y, unconditionally
	media: cxd2880-spi: Fix an error handling path
	drm/of: free the right object
	bpf: Fix a typo of reuseport map in bpf.h.
	bpf: Fix potential memleak and UAF in the verifier.
	drm/of: free the iterator object on failure
	gve: fix the wrong AdminQ buffer overflow check
	libbpf: Fix the possible memory leak on error
	ARM: dts: aspeed-g6: Fix HVI3C function-group in pinctrl dtsi
	arm64: dts: renesas: r8a77995: draak: Remove bogus adv7511w properties
	i40e: improve locking of mac_filter_hash
	soc: qcom: rpmhpd: Use corner in power_off
	libbpf: Fix removal of inner map in bpf_object__create_map
	gfs2: Fix memory leak of object lsi on error return path
	firmware: fix theoretical UAF race with firmware cache and resume
	driver core: Fix error return code in really_probe()
	ionic: cleanly release devlink instance
	media: dvb-usb: fix uninit-value in dvb_usb_adapter_dvb_init
	media: dvb-usb: fix uninit-value in vp702x_read_mac_addr
	media: dvb-usb: Fix error handling in dvb_usb_i2c_init
	media: go7007: fix memory leak in go7007_usb_probe
	media: go7007: remove redundant initialization
	media: rockchip/rga: use pm_runtime_resume_and_get()
	media: rockchip/rga: fix error handling in probe
	media: coda: fix frame_mem_ctrl for YUV420 and YVU420 formats
	media: atomisp: fix the uninitialized use and rename "retvalue"
	Bluetooth: sco: prevent information leak in sco_conn_defer_accept()
	6lowpan: iphc: Fix an off-by-one check of array index
	drm/amdgpu/acp: Make PM domain really work
	tcp: seq_file: Avoid skipping sk during tcp_seek_last_pos
	ARM: dts: meson8: Use a higher default GPU clock frequency
	ARM: dts: meson8b: odroidc1: Fix the pwm regulator supply properties
	ARM: dts: meson8b: mxq: Fix the pwm regulator supply properties
	ARM: dts: meson8b: ec100: Fix the pwm regulator supply properties
	net/mlx5e: Prohibit inner indir TIRs in IPoIB
	net/mlx5e: Block LRO if firmware asks for tunneled LRO
	cgroup/cpuset: Fix a partition bug with hotplug
	drm: mxsfb: Enable recovery on underflow
	drm: mxsfb: Increase number of outstanding requests on V4 and newer HW
	drm: mxsfb: Clear FIFO_CLEAR bit
	net: cipso: fix warnings in netlbl_cipsov4_add_std
	Bluetooth: mgmt: Fix wrong opcode in the response for add_adv cmd
	arm64: dts: renesas: rzg2: Convert EtherAVB to explicit delay handling
	arm64: dts: renesas: hihope-rzg2-ex: Add EtherAVB internal rx delay
	devlink: Break parameter notification sequence to be before/after unload/load driver
	net/mlx5: Fix missing return value in mlx5_devlink_eswitch_inline_mode_set()
	i2c: highlander: add IRQ check
	leds: lt3593: Put fwnode in any case during ->probe()
	leds: trigger: audio: Add an activate callback to ensure the initial brightness is set
	media: em28xx-input: fix refcount bug in em28xx_usb_disconnect
	media: venus: venc: Fix potential null pointer dereference on pointer fmt
	PCI: PM: Avoid forcing PCI_D0 for wakeup reasons inconsistently
	PCI: PM: Enable PME if it can be signaled from D3cold
	bpf, samples: Add missing mprog-disable to xdp_redirect_cpu's optstring
	soc: qcom: smsm: Fix missed interrupts if state changes while masked
	debugfs: Return error during {full/open}_proxy_open() on rmmod
	Bluetooth: increase BTNAMSIZ to 21 chars to fix potential buffer overflow
	PM: EM: Increase energy calculation precision
	selftests/bpf: Fix bpf-iter-tcp4 test to print correctly the dest IP
	drm/msm/mdp4: refactor HW revision detection into read_mdp_hw_revision
	drm/msm/mdp4: move HW revision detection to earlier phase
	drm/msm/dpu: make dpu_hw_ctl_clear_all_blendstages clear necessary LMs
	arm64: dts: exynos: correct GIC CPU interfaces address range on Exynos7
	counter: 104-quad-8: Return error when invalid mode during ceiling_write
	cgroup/cpuset: Miscellaneous code cleanup
	cgroup/cpuset: Fix violation of cpuset locking rule
	ASoC: Intel: Fix platform ID matching
	Bluetooth: fix repeated calls to sco_sock_kill
	drm/msm/dsi: Fix some reference counted resource leaks
	net/mlx5: Register to devlink ingress VLAN filter trap
	net/mlx5: Fix unpublish devlink parameters
	ASoC: rt5682: Implement remove callback
	ASoC: rt5682: Properly turn off regulators if wrong device ID
	usb: dwc3: meson-g12a: add IRQ check
	usb: dwc3: qcom: add IRQ check
	usb: gadget: udc: at91: add IRQ check
	usb: gadget: udc: s3c2410: add IRQ check
	usb: phy: fsl-usb: add IRQ check
	usb: phy: twl6030: add IRQ checks
	usb: gadget: udc: renesas_usb3: Fix soc_device_match() abuse
	selftests/bpf: Fix test_core_autosize on big-endian machines
	devlink: Clear whole devlink_flash_notify struct
	samples: pktgen: add missing IPv6 option to pktgen scripts
	Bluetooth: Move shutdown callback before flushing tx and rx queue
	PM: cpu: Make notifier chain use a raw_spinlock_t
	usb: host: ohci-tmio: add IRQ check
	usb: phy: tahvo: add IRQ check
	libbpf: Re-build libbpf.so when libbpf.map changes
	mac80211: Fix insufficient headroom issue for AMSDU
	locking/lockdep: Mark local_lock_t
	locking/local_lock: Add missing owner initialization
	lockd: Fix invalid lockowner cast after vfs_test_lock
	nfsd4: Fix forced-expiry locking
	arm64: dts: marvell: armada-37xx: Extend PCIe MEM space
	clk: staging: correct reference to config IOMEM to config HAS_IOMEM
	i2c: synquacer: fix deferred probing
	firmware: raspberrypi: Keep count of all consumers
	firmware: raspberrypi: Fix a leak in 'rpi_firmware_get()'
	usb: gadget: mv_u3d: request_irq() after initializing UDC
	mm/swap: consider max pages in iomap_swapfile_add_extent
	lkdtm: replace SCSI_DISPATCH_CMD with SCSI_QUEUE_RQ
	Bluetooth: add timeout sanity check to hci_inquiry
	i2c: iop3xx: fix deferred probing
	i2c: s3c2410: fix IRQ check
	i2c: fix platform_get_irq.cocci warnings
	i2c: hix5hd2: fix IRQ check
	gfs2: init system threads before freeze lock
	rsi: fix error code in rsi_load_9116_firmware()
	rsi: fix an error code in rsi_probe()
	ASoC: Intel: kbl_da7219_max98927: Fix format selection for max98373
	ASoC: Intel: Skylake: Leave data as is when invoking TLV IPCs
	ASoC: Intel: Skylake: Fix module resource and format selection
	mmc: sdhci: Fix issue with uninitialized dma_slave_config
	mmc: dw_mmc: Fix issue with uninitialized dma_slave_config
	mmc: moxart: Fix issue with uninitialized dma_slave_config
	bpf: Fix possible out of bound write in narrow load handling
	CIFS: Fix a potencially linear read overflow
	i2c: mt65xx: fix IRQ check
	i2c: xlp9xx: fix main IRQ check
	usb: ehci-orion: Handle errors of clk_prepare_enable() in probe
	usb: bdc: Fix an error handling path in 'bdc_probe()' when no suitable DMA config is available
	usb: bdc: Fix a resource leak in the error handling path of 'bdc_probe()'
	tty: serial: fsl_lpuart: fix the wrong mapbase value
	ASoC: wcd9335: Fix a double irq free in the remove function
	ASoC: wcd9335: Fix a memory leak in the error handling path of the probe function
	ASoC: wcd9335: Disable irq on slave ports in the remove function
	iwlwifi: follow the new inclusive terminology
	iwlwifi: skip first element in the WTAS ACPI table
	ice: Only lock to update netdev dev_addr
	ath6kl: wmi: fix an error code in ath6kl_wmi_sync_point()
	atlantic: Fix driver resume flow.
	bcma: Fix memory leak for internally-handled cores
	brcmfmac: pcie: fix oops on failure to resume and reprobe
	ipv6: make exception cache less predictible
	ipv4: make exception cache less predictible
	net: sched: Fix qdisc_rate_table refcount leak when get tcf_block failed
	net: qualcomm: fix QCA7000 checksum handling
	octeontx2-af: Fix loop in free and unmap counter
	octeontx2-af: Fix static code analyzer reported issues
	octeontx2-af: Set proper errorcode for IPv4 checksum errors
	ipv4: fix endianness issue in inet_rtm_getroute_build_skb()
	ASoC: rt5682: Remove unused variable in rt5682_i2c_remove()
	iwlwifi Add support for ax201 in Samsung Galaxy Book Flex2 Alpha
	f2fs: guarantee to write dirty data when enabling checkpoint back
	time: Handle negative seconds correctly in timespec64_to_ns()
	io_uring: IORING_OP_WRITE needs hash_reg_file set
	bio: fix page leak bio_add_hw_page failure
	tty: Fix data race between tiocsti() and flush_to_ldisc()
	perf/x86/amd/ibs: Extend PERF_PMU_CAP_NO_EXCLUDE to IBS Op
	x86/resctrl: Fix a maybe-uninitialized build warning treated as error
	Revert "KVM: x86: mmu: Add guest physical address check in translate_gpa()"
	KVM: s390: index kvm->arch.idle_mask by vcpu_idx
	KVM: x86: Update vCPU's hv_clock before back to guest when tsc_offset is adjusted
	KVM: VMX: avoid running vmx_handle_exit_irqoff in case of emulation
	KVM: nVMX: Unconditionally clear nested.pi_pending on nested VM-Enter
	ARM: dts: at91: add pinctrl-{names, 0} for all gpios
	fuse: truncate pagecache on atomic_o_trunc
	fuse: flush extending writes
	IMA: remove -Wmissing-prototypes warning
	IMA: remove the dependency on CRYPTO_MD5
	fbmem: don't allow too huge resolutions
	backlight: pwm_bl: Improve bootloader/kernel device handover
	clk: kirkwood: Fix a clocking boot regression
	Linux 5.10.65

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ie0b9306ba6ee4193de3200df7cdacaeba152b83e
2021-09-15 14:16:47 +02:00
Andrey Ignatov
b0491ab7d4 bpf: Fix possible out of bound write in narrow load handling
[ Upstream commit d7af7e497f0308bc97809cc48b58e8e0f13887e1 ]

Fix a verifier bug found by smatch static checker in [0].

This problem has never been seen in production, to the best of my
knowledge. Fixing it still seems worthwhile, since it is hard to say for
certain whether a combination of convert_ctx_access() and a narrow load
could ever lead to an out-of-bounds write.

When a narrow load is handled, one or two new instructions are added to
the insn_buf array, but the only prior check was

	cnt >= ARRAY_SIZE(insn_buf)

which only guarantees room for a single new instruction at insn_buf[cnt++].
Adding a second one writes out of bounds, and that is exactly what can
happen if `shift` is set.

Fix it by making sure that if the BPF_RSH instruction has to be added in
addition to BPF_AND then there is enough space for two more instructions
in insn_buf.
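
For reference, a minimal sketch of the shape of the fix (paraphrased from
the surrounding verifier code quoted below, not necessarily the verbatim
upstream hunk):

  if (is_narrower_load && size < target_size) {
      u8 shift = bpf_ctx_narrow_access_offset(off, size, size_default) * 8;

      /* Up to two insns (BPF_RSH + BPF_AND) are appended below, so
       * require room for both of them, not just one.
       */
      if (shift && cnt + 1 >= ARRAY_SIZE(insn_buf)) {
          verbose(env, "bpf verifier narrow ctx load misconfigured\n");
          return -EINVAL;
      }

      if (ctx_field_size <= 4) {
          if (shift)
              insn_buf[cnt++] = BPF_ALU32_IMM(BPF_RSH, insn->dst_reg, shift);
          insn_buf[cnt++] = BPF_ALU32_IMM(BPF_AND, insn->dst_reg,
                                          (1 << size * 8) - 1);
      } else {
          if (shift)
              insn_buf[cnt++] = BPF_ALU64_IMM(BPF_RSH, insn->dst_reg, shift);
          insn_buf[cnt++] = BPF_ALU64_IMM(BPF_AND, insn->dst_reg,
                                          (1ULL << size * 8) - 1);
      }
  }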

The full report [0] is below:

kernel/bpf/verifier.c:12304 convert_ctx_accesses() warn: offset 'cnt' incremented past end of array
kernel/bpf/verifier.c:12311 convert_ctx_accesses() warn: offset 'cnt' incremented past end of array

kernel/bpf/verifier.c
    12282
    12283 			insn->off = off & ~(size_default - 1);
    12284 			insn->code = BPF_LDX | BPF_MEM | size_code;
    12285 		}
    12286
    12287 		target_size = 0;
    12288 		cnt = convert_ctx_access(type, insn, insn_buf, env->prog,
    12289 					 &target_size);
    12290 		if (cnt == 0 || cnt >= ARRAY_SIZE(insn_buf) ||
                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^
Bounds check.

    12291 		    (ctx_field_size && !target_size)) {
    12292 			verbose(env, "bpf verifier is misconfigured\n");
    12293 			return -EINVAL;
    12294 		}
    12295
    12296 		if (is_narrower_load && size < target_size) {
    12297 			u8 shift = bpf_ctx_narrow_access_offset(
    12298 				off, size, size_default) * 8;
    12299 			if (ctx_field_size <= 4) {
    12300 				if (shift)
    12301 					insn_buf[cnt++] = BPF_ALU32_IMM(BPF_RSH,
                                                         ^^^^^
increment beyond end of array

    12302 									insn->dst_reg,
    12303 									shift);
--> 12304 				insn_buf[cnt++] = BPF_ALU32_IMM(BPF_AND, insn->dst_reg,
                                                 ^^^^^
out of bounds write

    12305 								(1 << size * 8) - 1);
    12306 			} else {
    12307 				if (shift)
    12308 					insn_buf[cnt++] = BPF_ALU64_IMM(BPF_RSH,
    12309 									insn->dst_reg,
    12310 									shift);
    12311 				insn_buf[cnt++] = BPF_ALU64_IMM(BPF_AND, insn->dst_reg,
                                        ^^^^^^^^^^^^^^^
Same.

    12312 								(1ULL << size * 8) - 1);
    12313 			}
    12314 		}
    12315
    12316 		new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, cnt);
    12317 		if (!new_prog)
    12318 			return -ENOMEM;
    12319
    12320 		delta += cnt - 1;
    12321
    12322 		/* keep walking new program and skip insns we just inserted */
    12323 		env->prog = new_prog;
    12324 		insn      = new_prog->insnsi + i + delta;
    12325 	}
    12326
    12327 	return 0;
    12328 }

[0] https://lore.kernel.org/bpf/20210817050843.GA21456@kili/

v1->v2:
- clarify that problem was only seen by static checker but not in prod;

Fixes: 46f53a65d2 ("bpf: Allow narrow loads with offset > 0")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210820163935.1902398-1-rdna@fb.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-15 09:50:43 +02:00
He Fengqing
389dfd1147 bpf: Fix potential memleak and UAF in the verifier.
[ Upstream commit 75f0fc7b48ad45a2e5736bcf8de26c8872fe8695 ]

In bpf_patch_insn_data(), we first use bpf_patch_insn_single() to insert
new instructions, then use adjust_insn_aux_data() to adjust insn_aux_data.
If the old env->prog does not have enough room for the newly inserted
instructions, we use bpf_prog_realloc() to construct new_prog and free the
old env->prog.

There are two errors here. First, if adjust_insn_aux_data() returns
-ENOMEM, we should free new_prog. Second, in that same case
bpf_patch_insn_data() returns NULL while env->prog has already been freed
by bpf_prog_realloc(), yet bpf_check() will keep using it.

So in this patch, we make adjust_insn_aux_data() never fail. In
bpf_patch_insn_data(), we first pre-allocate memory for the new
insn_aux_data, then call bpf_patch_insn_single() to insert the new
instructions, and finally call adjust_insn_aux_data() to adjust
insn_aux_data.
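
A minimal sketch of the reworked flow (simplified, with the remaining
bookkeeping elided; treat it as illustrative rather than the verbatim
upstream hunk):

  static struct bpf_prog *bpf_patch_insn_data(struct bpf_verifier_env *env,
                                              u32 off,
                                              const struct bpf_insn *patch,
                                              u32 len)
  {
      struct bpf_insn_aux_data *new_data = NULL;
      struct bpf_prog *new_prog;

      /* Allocate the enlarged aux array up front so nothing that runs
       * after the realloc inside bpf_patch_insn_single() can fail.
       */
      if (len > 1) {
          new_data = vzalloc(array_size(env->prog->len + len - 1,
                                        sizeof(struct bpf_insn_aux_data)));
          if (!new_data)
              return NULL;
      }

      new_prog = bpf_patch_insn_single(env->prog, off, patch, len);
      if (IS_ERR(new_prog)) {
          vfree(new_data);    /* drop the pre-allocated aux array */
          return NULL;
      }

      /* Cannot fail anymore: it only copies into new_data. */
      adjust_insn_aux_data(env, new_data, new_prog, off, len);
      return new_prog;
  }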

Fixes: 8041902dae ("bpf: adjust insn_aux_data when patching insns")
Signed-off-by: He Fengqing <hefengqing@huawei.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20210714101815.164322-1-hefengqing@huawei.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-15 09:50:31 +02:00
Greg Kroah-Hartman
674d2ac211 Merge 5.10.62 into android12-5.10-lts
Changes in 5.10.62
	net: qrtr: fix another OOB Read in qrtr_endpoint_post
	bpf: Fix ringbuf helper function compatibility
	bpf: Fix NULL pointer dereference in bpf_get_local_storage() helper
	ASoC: rt5682: Adjust headset volume button threshold
	ASoC: component: Remove misplaced prefix handling in pin control functions
	ARC: Fix CONFIG_STACKDEPOT
	netfilter: conntrack: collect all entries in one cycle
	once: Fix panic when module unload
	blk-iocost: fix lockdep warning on blkcg->lock
	ovl: fix uninitialized pointer read in ovl_lookup_real_one()
	net: mscc: Fix non-GPL export of regmap APIs
	can: usb: esd_usb2: esd_usb2_rx_event(): fix the interchange of the CAN RX and TX error counters
	ceph: correctly handle releasing an embedded cap flush
	riscv: Ensure the value of FP registers in the core dump file is up to date
	Revert "btrfs: compression: don't try to compress if we don't have enough pages"
	drm/amdgpu: Cancel delayed work when GFXOFF is disabled
	Revert "USB: serial: ch341: fix character loss at high transfer rates"
	USB: serial: option: add new VID/PID to support Fibocom FG150
	usb: renesas-xhci: Prefer firmware loading on unknown ROM state
	usb: dwc3: gadget: Fix dwc3_calc_trbs_left()
	usb: dwc3: gadget: Stop EP0 transfers during pullup disable
	scsi: core: Fix hang of freezing queue between blocking and running device
	RDMA/bnxt_re: Add missing spin lock initialization
	IB/hfi1: Fix possible null-pointer dereference in _extend_sdma_tx_descs()
	RDMA/bnxt_re: Remove unpaired rtnl unlock in bnxt_re_dev_init()
	ice: do not abort devlink info if board identifier can't be found
	net: usb: pegasus: fixes of set_register(s) return value evaluation;
	igc: fix page fault when thunderbolt is unplugged
	igc: Use num_tx_queues when iterating over tx_ring queue
	e1000e: Fix the max snoop/no-snoop latency for 10M
	e1000e: Do not take care about recovery NVM checksum
	RDMA/efa: Free IRQ vectors on error flow
	ip_gre: add validation for csum_start
	xgene-v2: Fix a resource leak in the error handling path of 'xge_probe()'
	net: marvell: fix MVNETA_TX_IN_PRGRS bit number
	ucounts: Increase ucounts reference counter before the security hook
	net/sched: ets: fix crash when flipping from 'strict' to 'quantum'
	ipv6: use siphash in rt6_exception_hash()
	ipv4: use siphash instead of Jenkins in fnhe_hashfun()
	cxgb4: dont touch blocked freelist bitmap after free
	rtnetlink: Return correct error on changing device netns
	net: hns3: clear hardware resource when loading driver
	net: hns3: add waiting time before cmdq memory is released
	net: hns3: fix duplicate node in VLAN list
	net: hns3: fix get wrong pfc_en when query PFC configuration
	Revert "mmc: sdhci-iproc: Set SDHCI_QUIRK_CAP_CLOCK_BASE_BROKEN on BCM2711"
	net: stmmac: add mutex lock to protect est parameters
	net: stmmac: fix kernel panic due to NULL pointer dereference of plat->est
	drm/i915: Fix syncmap memory leak
	usb: gadget: u_audio: fix race condition on endpoint stop
	dt-bindings: sifive-l2-cache: Fix 'select' matching
	perf/x86/intel/uncore: Fix integer overflow on 23 bit left shift of a u32
	clk: renesas: rcar-usb2-clock-sel: Fix kernel NULL pointer dereference
	iwlwifi: pnvm: accept multiple HW-type TLVs
	opp: remove WARN when no valid OPPs remain
	cpufreq: blocklist Qualcomm sm8150 in cpufreq-dt-platdev
	virtio: Improve vq->broken access to avoid any compiler optimization
	virtio_pci: Support surprise removal of virtio pci device
	virtio_vdpa: reject invalid vq indices
	vringh: Use wiov->used to check for read/write desc order
	tools/virtio: fix build
	qed: qed ll2 race condition fixes
	qed: Fix null-pointer dereference in qed_rdma_create_qp()
	Revert "drm/amd/pm: fix workload mismatch on vega10"
	drm/amd/pm: change the workload type for some cards
	blk-mq: don't grab rq's refcount in blk_mq_check_expired()
	drm: Copy drm_wait_vblank to user before returning
	drm/nouveau/disp: power down unused DP links during init
	drm/nouveau/kms/nv50: workaround EFI GOP window channel format differences
	net/rds: dma_map_sg is entitled to merge entries
	btrfs: fix race between marking inode needs to be logged and log syncing
	pipe: avoid unnecessary EPOLLET wakeups under normal loads
	pipe: do FASYNC notifications for every pipe IO, not just state changes
	mtd: spinand: Fix incorrect parameters for on-die ECC
	tipc: call tipc_wait_for_connect only when dlen is not 0
	vt_kdsetmode: extend console locking
	Bluetooth: btusb: check conditions before enabling USB ALT 3 for WBS
	riscv: Fixup wrong ftrace remove cflag
	riscv: Fixup patch_text panic in ftrace
	perf env: Fix memory leak of bpf_prog_info_linear member
	perf symbol-elf: Fix memory leak by freeing sdt_note.args
	perf record: Fix memory leak in vDSO found using ASAN
	perf tools: Fix arm64 build error with gcc-11
	perf annotate: Fix jump parsing for C++ code.
	powerpc/perf: Invoke per-CPU variable access with disabled interrupts
	srcu: Provide internal interface to start a Tree SRCU grace period
	srcu: Provide polling interfaces for Tree SRCU grace periods
	srcu: Provide internal interface to start a Tiny SRCU grace period
	srcu: Make Tiny SRCU use multi-bit grace-period counter
	srcu: Provide polling interfaces for Tiny SRCU grace periods
	tracepoint: Use rcu get state and cond sync for static call updates
	usb: typec: ucsi: acpi: Always decode connector change information
	usb: typec: ucsi: Work around PPM losing change information
	usb: typec: ucsi: Clear pending after acking connector change
	net: dsa: mt7530: fix VLAN traffic leaks again
	lkdtm: Enable DOUBLE_FAULT on all architectures
	arm64: dts: qcom: msm8994-angler: Fix gpio-reserved-ranges 85-88
	btrfs: fix NULL pointer dereference when deleting device by invalid id
	kthread: Fix PF_KTHREAD vs to_kthread() race
	Revert "floppy: reintroduce O_NDELAY fix"
	Revert "parisc: Add assembly implementations for memset, strlen, strcpy, strncpy and strcat"
	net: don't unconditionally copy_from_user a struct ifreq for socket ioctls
	audit: move put_tree() to avoid trim_trees refcount underflow and UAF
	bpf: Fix potentially incorrect results with bpf_get_local_storage()
	Linux 5.10.62

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I5a9bf4b2c254ae21a10f838494cae1c3fa016be3
2021-09-03 10:51:56 +02:00
Daniel Borkmann
9dd6f6d896 bpf: Fix ringbuf helper function compatibility
commit 5b029a32cfe4600f5e10e36b41778506b90fd4de upstream.

Commit 457f44363a ("bpf: Implement BPF ring buffer and verifier support
for it") extended check_map_func_compatibility() by enforcing map -> helper
function match, but not helper -> map type match.

Due to this all of the bpf_ringbuf_*() helper functions could be used with
a wrong map type such as array or hash map, leading to invalid access due
to type confusion.

Also, both BPF_FUNC_ringbuf_{submit,discard} have ARG_PTR_TO_ALLOC_MEM as
argument and not a BPF map. Therefore, their check_map_func_compatibility()
presence is incorrect since it's only for map type checking.
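
A minimal sketch of the missing direction of the check (paraphrased, not
necessarily the verbatim upstream hunk): check_map_func_compatibility()
also needs helper -> map cases so the ringbuf helpers that do take a map
only accept a ringbuf map, while submit/discard stay out of this function
entirely since they take no map argument:

  switch (func_id) {
  case BPF_FUNC_ringbuf_output:
  case BPF_FUNC_ringbuf_reserve:
  case BPF_FUNC_ringbuf_query:
      if (map->map_type != BPF_MAP_TYPE_RINGBUF)
          goto error;
      break;
  default:
      break;
  }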

Fixes: 457f44363a ("bpf: Implement BPF ring buffer and verifier support for it")
Reported-by: Ryota Shiga (Flatt Security)
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-09-03 10:09:21 +02:00
Greg Kroah-Hartman
a6777a7cee Merge 5.10.61 into android12-5.10-lts
Changes in 5.10.61
	ath: Use safer key clearing with key cache entries
	ath9k: Clear key cache explicitly on disabling hardware
	ath: Export ath_hw_keysetmac()
	ath: Modify ath_key_delete() to not need full key entry
	ath9k: Postpone key cache entry deletion for TXQ frames reference it
	mtd: cfi_cmdset_0002: fix crash when erasing/writing AMD cards
	media: zr364xx: propagate errors from zr364xx_start_readpipe()
	media: zr364xx: fix memory leaks in probe()
	media: drivers/media/usb: fix memory leak in zr364xx_probe
	KVM: x86: Factor out x86 instruction emulation with decoding
	KVM: X86: Fix warning caused by stale emulation context
	USB: core: Avoid WARNings for 0-length descriptor requests
	USB: core: Fix incorrect pipe calculation in do_proc_control()
	dmaengine: xilinx_dma: Fix read-after-free bug when terminating transfers
	dmaengine: usb-dmac: Fix PM reference leak in usb_dmac_probe()
	spi: spi-mux: Add module info needed for autoloading
	net: xfrm: Fix end of loop tests for list_for_each_entry
	ARM: dts: am43x-epos-evm: Reduce i2c0 bus speed for tps65218
	dmaengine: of-dma: router_xlate to return -EPROBE_DEFER if controller is not yet available
	scsi: pm80xx: Fix TMF task completion race condition
	scsi: megaraid_mm: Fix end of loop tests for list_for_each_entry()
	scsi: scsi_dh_rdac: Avoid crash during rdac_bus_attach()
	scsi: core: Avoid printing an error if target_alloc() returns -ENXIO
	scsi: core: Fix capacity set to zero after offlinining device
	drm/amdgpu: fix the doorbell missing when in CGPG issue for renoir.
	qede: fix crash in rmmod qede while automatic debug collection
	ARM: dts: nomadik: Fix up interrupt controller node names
	net: usb: pegasus: Check the return value of get_geristers() and friends;
	net: usb: lan78xx: don't modify phy_device state concurrently
	drm/amd/display: Fix Dynamic bpp issue with 8K30 with Navi 1X
	drm/amd/display: workaround for hard hang on HPD on native DP
	Bluetooth: hidp: use correct wait queue when removing ctrl_wait
	arm64: dts: qcom: c630: fix correct powerdown pin for WSA881x
	arm64: dts: qcom: msm8992-bullhead: Remove PSCI
	iommu: Check if group is NULL before remove device
	cpufreq: armada-37xx: forbid cpufreq for 1.2 GHz variant
	dccp: add do-while-0 stubs for dccp_pr_debug macros
	virtio: Protect vqs list access
	vhost-vdpa: Fix integer overflow in vhost_vdpa_process_iotlb_update()
	bus: ti-sysc: Fix error handling for sysc_check_active_timer()
	vhost: Fix the calculation in vhost_overflow()
	vdpa/mlx5: Avoid destroying MR on empty iotlb
	soc / drm: mediatek: Move DDP component defines into mtk-mmsys.h
	drm/mediatek: Fix aal size config
	drm/mediatek: Add AAL output size configuration
	bpf: Clear zext_dst of dead insns
	bnxt: don't lock the tx queue from napi poll
	bnxt: disable napi before canceling DIM
	bnxt: make sure xmit_more + errors does not miss doorbells
	bnxt: count Tx drops
	net: 6pack: fix slab-out-of-bounds in decode_data
	ptp_pch: Restore dependency on PCI
	bnxt_en: Disable aRFS if running on 212 firmware
	bnxt_en: Add missing DMA memory barriers
	vrf: Reset skb conntrack connection on VRF rcv
	virtio-net: support XDP when not more queues
	virtio-net: use NETIF_F_GRO_HW instead of NETIF_F_LRO
	net: qlcnic: add missed unlock in qlcnic_83xx_flash_read32
	ixgbe, xsk: clean up the resources in ixgbe_xsk_pool_enable error path
	sch_cake: fix srchost/dsthost hashing mode
	net: mdio-mux: Don't ignore memory allocation errors
	net: mdio-mux: Handle -EPROBE_DEFER correctly
	ovs: clear skb->tstamp in forwarding path
	iommu/vt-d: Consolidate duplicate cache invaliation code
	iommu/vt-d: Fix incomplete cache flush in intel_pasid_tear_down_entry()
	r8152: fix writing USB_BP2_EN
	i40e: Fix ATR queue selection
	iavf: Fix ping is lost after untrusted VF had tried to change MAC
	Revert "flow_offload: action should not be NULL when it is referenced"
	mmc: dw_mmc: Fix hang on data CRC error
	mmc: mmci: stm32: Check when the voltage switch procedure should be done
	mmc: sdhci-msm: Update the software timeout value for sdhc
	clk: imx6q: fix uart earlycon unwork
	clk: qcom: gdsc: Ensure regulator init state matches GDSC state
	ALSA: hda - fix the 'Capture Switch' value change notifications
	tracing / histogram: Fix NULL pointer dereference on strcmp() on NULL event name
	slimbus: messaging: start transaction ids from 1 instead of zero
	slimbus: messaging: check for valid transaction id
	slimbus: ngd: reset dma setup during runtime pm
	ipack: tpci200: fix many double free issues in tpci200_pci_probe
	ipack: tpci200: fix memory leak in the tpci200_register
	ALSA: hda/realtek: Enable 4-speaker output for Dell XPS 15 9510 laptop
	mmc: sdhci-iproc: Cap min clock frequency on BCM2711
	mmc: sdhci-iproc: Set SDHCI_QUIRK_CAP_CLOCK_BASE_BROKEN on BCM2711
	btrfs: prevent rename2 from exchanging a subvol with a directory from different parents
	ALSA: hda/via: Apply runtime PM workaround for ASUS B23E
	s390/pci: fix use after free of zpci_dev
	PCI: Increase D3 delay for AMD Renoir/Cezanne XHCI
	ALSA: hda/realtek: Limit mic boost on HP ProBook 445 G8
	ASoC: intel: atom: Fix breakage for PCM buffer address setup
	mm: memcontrol: fix occasional OOMs due to proportional memory.low reclaim
	fs: warn about impending deprecation of mandatory locks
	io_uring: fix xa_alloc_cycle() error return value check
	io_uring: only assign io_uring_enter() SQPOLL error in actual error case
	Linux 5.10.61

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I5b6e2a66b03d1cb01c8310b83dcc2a119c1bd6b3
2021-08-27 20:51:37 +02:00
Ilya Leoshkevich
585ff7344e bpf: Clear zext_dst of dead insns
[ Upstream commit 45c709f8c71b525b51988e782febe84ce933e7e0 ]

"access skb fields ok" verifier test fails on s390 with the "verifier
bug. zext_dst is set, but no reg is defined" message. The first insns
of the test prog are ...

   0:	61 01 00 00 00 00 00 00 	ldxw %r0,[%r1+0]
   8:	35 00 00 01 00 00 00 00 	jge %r0,0,1
  10:	61 01 00 08 00 00 00 00 	ldxw %r0,[%r1+8]

... and the 3rd one is dead (this does not look intentional to me, but
this is a separate topic).

sanitize_dead_code() converts dead insns into "ja -1", but keeps
zext_dst. When opt_subreg_zext_lo32_rnd_hi32() tries to parse such
an insn, it sees this discrepancy and bails. This problem can be seen
only with JITs whose bpf_jit_needs_zext() returns true.

Fix by clearing dead insns' zext_dst.
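
A minimal sketch of sanitize_dead_code() with the fix applied (simplified,
not necessarily the verbatim upstream hunk):

  static void sanitize_dead_code(struct bpf_verifier_env *env)
  {
      struct bpf_insn_aux_data *aux_data = env->insn_aux_data;
      struct bpf_insn trap = BPF_JMP_IMM(BPF_JA, 0, 0, -1);
      struct bpf_insn *insn = env->prog->insnsi;
      const int insn_cnt = env->prog->len;
      int i;

      for (i = 0; i < insn_cnt; i++) {
          if (aux_data[i].seen)
              continue;
          memcpy(insn + i, &trap, sizeof(trap));
          /* The insn is dead: there is no defined dst reg left to
           * zero-extend, so drop the marker as well.
           */
          aux_data[i].zext_dst = false;
      }
  }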

The commits that contributed to this problem are:

1. 5aa5bd14c5 ("bpf: add initial suite for selftests"), which
   introduced the test with the dead code.
2. 5327ed3d44 ("bpf: verifier: mark verified-insn with
   sub-register zext flag"), which introduced the zext_dst flag.
3. 83a2881903f3 ("bpf: Account for BPF_FETCH in
   insn_has_def32()"), which introduced the sanity check.
4. 9183671af6db ("bpf: Fix leakage under speculation on
   mispredicted branches"), which bisect points to.

It's best to fix this on stable branches that contain the second one,
since that's the point where the inconsistency was introduced.

Fixes: 5327ed3d44 ("bpf: verifier: mark verified-insn with sub-register zext flag")
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210812151811.184086-2-iii@linux.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-08-26 08:35:43 -04:00
Greg Kroah-Hartman
8b444656fa Merge 5.10.56 into android12-5.10-lts
Changes in 5.10.56
	selftest: fix build error in tools/testing/selftests/vm/userfaultfd.c
	io_uring: fix null-ptr-deref in io_sq_offload_start()
	x86/asm: Ensure asm/proto.h can be included stand-alone
	pipe: make pipe writes always wake up readers
	btrfs: fix rw device counting in __btrfs_free_extra_devids
	btrfs: mark compressed range uptodate only if all bio succeed
	Revert "ACPI: resources: Add checks for ACPI IRQ override"
	ACPI: DPTF: Fix reading of attributes
	x86/kvm: fix vcpu-id indexed array sizes
	KVM: add missing compat KVM_CLEAR_DIRTY_LOG
	ocfs2: fix zero out valid data
	ocfs2: issue zeroout to EOF blocks
	can: j1939: j1939_xtp_rx_dat_one(): fix rxtimer value between consecutive TP.DT to 750ms
	can: raw: raw_setsockopt(): fix raw_rcv panic for sock UAF
	can: peak_usb: pcan_usb_handle_bus_evt(): fix reading rxerr/txerr values
	can: mcba_usb_start(): add missing urb->transfer_dma initialization
	can: usb_8dev: fix memory leak
	can: ems_usb: fix memory leak
	can: esd_usb2: fix memory leak
	alpha: register early reserved memory in memblock
	HID: wacom: Re-enable touch by default for Cintiq 24HDT / 27QHDT
	NIU: fix incorrect error return, missed in previous revert
	drm/amd/display: ensure dentist display clock update finished in DCN20
	drm/amdgpu: Avoid printing of stack contents on firmware load error
	drm/amdgpu: Fix resource leak on probe error path
	blk-iocost: fix operation ordering in iocg_wake_fn()
	nfc: nfcsim: fix use after free during module unload
	cfg80211: Fix possible memory leak in function cfg80211_bss_update
	RDMA/bnxt_re: Fix stats counters
	bpf: Fix OOB read when printing XDP link fdinfo
	mac80211: fix enabling 4-address mode on a sta vif after assoc
	netfilter: conntrack: adjust stop timestamp to real expiry value
	netfilter: nft_nat: allow to specify layer 4 protocol NAT only
	i40e: Fix logic of disabling queues
	i40e: Fix firmware LLDP agent related warning
	i40e: Fix queue-to-TC mapping on Tx
	i40e: Fix log TC creation failure when max num of queues is exceeded
	tipc: fix implicit-connect for SYN+
	tipc: fix sleeping in tipc accept routine
	net: Set true network header for ECN decapsulation
	net: qrtr: fix memory leaks
	ionic: remove intr coalesce update from napi
	ionic: fix up dim accounting for tx and rx
	ionic: count csum_none when offload enabled
	tipc: do not write skb_shinfo frags when doing decrytion
	octeontx2-pf: Fix interface down flag on error
	mlx4: Fix missing error code in mlx4_load_one()
	KVM: x86: Check the right feature bit for MSR_KVM_ASYNC_PF_ACK access
	net: llc: fix skb_over_panic
	drm/msm/dpu: Fix sm8250_mdp register length
	drm/msm/dp: Initialize the INTF_CONFIG register
	skmsg: Make sk_psock_destroy() static
	net/mlx5: Fix flow table chaining
	net/mlx5e: Fix nullptr in mlx5e_hairpin_get_mdev()
	sctp: fix return value check in __sctp_rcv_asconf_lookup
	tulip: windbond-840: Fix missing pci_disable_device() in probe and remove
	sis900: Fix missing pci_disable_device() in probe and remove
	can: hi311x: fix a signedness bug in hi3110_cmd()
	bpf: Introduce BPF nospec instruction for mitigating Spectre v4
	bpf: Fix leakage due to insufficient speculative store bypass mitigation
	bpf: Remove superfluous aux sanitation on subprog rejection
	bpf: verifier: Allocate idmap scratch in verifier env
	bpf: Fix pointer arithmetic mask tightening under state pruning
	SMB3: fix readpage for large swap cache
	powerpc/pseries: Fix regression while building external modules
	Revert "perf map: Fix dso->nsinfo refcounting"
	i40e: Add additional info to PHY type error
	can: j1939: j1939_session_deactivate(): clarify lifetime of session object
	Linux 5.10.56

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ib3c9244afb7ee5d6ee8d3235efe8956898f486c4
2021-08-04 15:02:23 +02:00
Daniel Borkmann
be561c0154 bpf: Fix pointer arithmetic mask tightening under state pruning
commit e042aa532c84d18ff13291d00620502ce7a38dda upstream.

In 7fedb63a8307 ("bpf: Tighten speculative pointer arithmetic mask") we
narrowed the offset mask for unprivileged pointer arithmetic in order to
mitigate a corner case where in the speculative domain it is possible to
advance, for example, the map value pointer by up to value_size-1 out-of-
bounds in order to leak kernel memory via side-channel to user space.

The verifier's state pruning for scalars leaves one corner case open
where in the first verification path R_x holds an unknown scalar with an
aux->alu_limit of e.g. 7, and in a second verification path that same
register R_x, here denoted as R_x', holds an unknown scalar which has
tighter bounds and would thus satisfy range_within(R_x, R_x') as well as
tnum_in(R_x, R_x') for state pruning, yielding an aux->alu_limit of 3:
Given the second path fits the register constraints for pruning, the final
generated mask from aux->alu_limit will remain at 7. While technically
not wrong for the non-speculative domain, it would however be possible
to craft similar cases where the mask would be too wide as in 7fedb63a8307.

One way to fix it is to detect the presence of unknown scalar map pointer
arithmetic and force a deeper search on unknown scalars to ensure that
we do not run into a masking mismatch.
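
A rough sketch of that deeper-search idea (illustrative only; the flag
name explore_alu_limits follows the upstream naming as far as I can tell,
but treat the exact code as an assumption): once sanitize_ptr_alu() has
recorded an alu_limit for unknown-scalar pointer arithmetic, scalar
registers are no longer pruned by range subsumption, so every path keeps a
mask matching its own bounds. In regsafe():

  case SCALAR_VALUE:
      /* A register with tighter bounds must not be pruned against one
       * with wider bounds here, since the already-generated mask would
       * stay at the wider aux->alu_limit.
       */
      if (env->explore_alu_limits)
          return false;
      /* otherwise the usual range_within()/tnum_in() checks apply */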

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-08-04 12:46:45 +02:00
Lorenz Bauer
ffb9d5c48b bpf: verifier: Allocate idmap scratch in verifier env
commit c9e73e3d2b1eb1ea7ff068e05007eec3bd8ef1c9 upstream.

func_states_equal makes a very short lived allocation for idmap,
probably because it's too large to fit on the stack. However the
function is called quite often, leading to a lot of alloc / free
churn. Replace the temporary allocation with dedicated scratch
space in struct bpf_verifier_env.
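
A minimal sketch of the change (simplified, not necessarily the verbatim
upstream hunk): the idmap becomes per-env scratch space that is simply
cleared on each comparison instead of being allocated and freed every time:

  /* added to struct bpf_verifier_env */
  struct bpf_id_pair idmap_scratch[BPF_ID_MAP_SIZE];

  static bool func_states_equal(struct bpf_verifier_env *env,
                                struct bpf_func_state *old,
                                struct bpf_func_state *cur)
  {
      int i;

      memset(env->idmap_scratch, 0, sizeof(env->idmap_scratch));
      for (i = 0; i < MAX_BPF_REG; i++)
          if (!regsafe(&old->regs[i], &cur->regs[i],
                       env->idmap_scratch))
              return false;

      if (!stacksafe(old, cur, env->idmap_scratch))
          return false;

      return refsafe(old, cur);
  }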

Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Edward Cree <ecree.xilinx@gmail.com>
Link: https://lore.kernel.org/bpf/20210429134656.122225-4-lmb@cloudflare.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-08-04 12:46:45 +02:00
Daniel Borkmann
a11ca29c65 bpf: Remove superfluous aux sanitation on subprog rejection
commit 59089a189e3adde4cf85f2ce479738d1ae4c514d upstream.

Follow-up to fe9a5ca7e370 ("bpf: Do not mark insn as seen under speculative
path verification"). The sanitize_insn_aux_data() helper does not serve a
particular purpose in today's code. The original intention for the helper
was that if function-by-function verification fails, a given program would
be cleared from temporary insn_aux_data[], and then its verification would
be re-attempted in the context of the main program a second time.

However, a failure in do_check_subprogs() will skip do_check_main() and
propagate the error to the user instead, thus such situation can never occur.
Given its interaction is not compatible to the Spectre v1 mitigation (due to
comparing aux->seen with env->pass_cnt), just remove sanitize_insn_aux_data()
to avoid future bugs in this area.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-08-04 12:46:44 +02:00
Daniel Borkmann
0e9280654a bpf: Fix leakage due to insufficient speculative store bypass mitigation
[ Upstream commit 2039f26f3aca5b0e419b98f65dd36481337b86ee ]

Spectre v4 gadgets make use of memory disambiguation, which is a set of
techniques that execute memory access instructions, that is, loads and
stores, out of program order; Intel's optimization manual, section 2.4.4.5:

  A load instruction micro-op may depend on a preceding store. Many
  microarchitectures block loads until all preceding store addresses are
  known. The memory disambiguator predicts which loads will not depend on
  any previous stores. When the disambiguator predicts that a load does
  not have such a dependency, the load takes its data from the L1 data
  cache. Eventually, the prediction is verified. If an actual conflict is
  detected, the load and all succeeding instructions are re-executed.

af86ca4e30 ("bpf: Prevent memory disambiguation attack") tried to mitigate
this attack by sanitizing the memory locations through preemptive "fast"
(low latency) stores of zero prior to the actual "slow" (high latency) store
of a pointer value such that upon dependency misprediction the CPU then
speculatively executes the load of the pointer value and retrieves the zero
value instead of the attacker controlled scalar value previously stored at
that location, meaning, subsequent access in the speculative domain is then
redirected to the "zero page".

The sanitized preemptive store of zero prior to the actual "slow" store is
done through a simple ST instruction based on r10 (frame pointer) with
relative offset to the stack location that the verifier has been tracking
on the original used register for STX, which does not have to be r10. Thus,
there are no memory dependencies for this store, since it's only using r10
and immediate constant of zero; hence af86ca4e30 /assumed/ a low latency
operation.

However, a recent attack demonstrated that this mitigation is not sufficient
since the preemptive store of zero could also be turned into a "slow" store
and is thus bypassed as well:

  [...]
  // r2 = oob address (e.g. scalar)
  // r7 = pointer to map value
  31: (7b) *(u64 *)(r10 -16) = r2
  // r9 will remain "fast" register, r10 will become "slow" register below
  32: (bf) r9 = r10
  // JIT maps BPF reg to x86 reg:
  //  r9  -> r15 (callee saved)
  //  r10 -> rbp
  // train store forward prediction to break dependency link between both r9
  // and r10 by evicting them from the predictor's LRU table.
  33: (61) r0 = *(u32 *)(r7 +24576)
  34: (63) *(u32 *)(r7 +29696) = r0
  35: (61) r0 = *(u32 *)(r7 +24580)
  36: (63) *(u32 *)(r7 +29700) = r0
  37: (61) r0 = *(u32 *)(r7 +24584)
  38: (63) *(u32 *)(r7 +29704) = r0
  39: (61) r0 = *(u32 *)(r7 +24588)
  40: (63) *(u32 *)(r7 +29708) = r0
  [...]
  543: (61) r0 = *(u32 *)(r7 +25596)
  544: (63) *(u32 *)(r7 +30716) = r0
  // prepare call to bpf_ringbuf_output() helper. the latter will cause rbp
  // to spill to stack memory while r13/r14/r15 (all callee saved regs) remain
  // in hardware registers. rbp becomes slow due to push/pop latency. below is
  // disasm of bpf_ringbuf_output() helper for better visual context:
  //
  // ffffffff8117ee20: 41 54                 push   r12
  // ffffffff8117ee22: 55                    push   rbp
  // ffffffff8117ee23: 53                    push   rbx
  // ffffffff8117ee24: 48 f7 c1 fc ff ff ff  test   rcx,0xfffffffffffffffc
  // ffffffff8117ee2b: 0f 85 af 00 00 00     jne    ffffffff8117eee0 <-- jump taken
  // [...]
  // ffffffff8117eee0: 49 c7 c4 ea ff ff ff  mov    r12,0xffffffffffffffea
  // ffffffff8117eee7: 5b                    pop    rbx
  // ffffffff8117eee8: 5d                    pop    rbp
  // ffffffff8117eee9: 4c 89 e0              mov    rax,r12
  // ffffffff8117eeec: 41 5c                 pop    r12
  // ffffffff8117eeee: c3                    ret
  545: (18) r1 = map[id:4]
  547: (bf) r2 = r7
  548: (b7) r3 = 0
  549: (b7) r4 = 4
  550: (85) call bpf_ringbuf_output#194288
  // instruction 551 inserted by verifier    \
  551: (7a) *(u64 *)(r10 -16) = 0            | /both/ are now slow stores here
  // storing map value pointer r7 at fp-16   | since value of r10 is "slow".
  552: (7b) *(u64 *)(r10 -16) = r7           /
  // following "fast" read to the same memory location, but due to dependency
  // misprediction it will speculatively execute before insn 551/552 completes.
  553: (79) r2 = *(u64 *)(r9 -16)
  // in speculative domain contains attacker controlled r2. in non-speculative
  // domain this contains r7, and thus accesses r7 +0 below.
  554: (71) r3 = *(u8 *)(r2 +0)
  // leak r3

As can be seen, the current speculative store bypass mitigation which the
verifier inserts at line 551 is insufficient since /both/, the write of
the zero sanitation as well as the map value pointer are a high latency
instruction due to prior memory access via push/pop of r10 (rbp) in contrast
to the low latency read in line 553 as r9 (r15) which stays in hardware
registers. Thus, architecturally, fp-16 is r7, however, microarchitecturally,
fp-16 can still be r2.

Initial thoughts to address this issue was to track spilled pointer loads
from stack and enforce their load via LDX through r10 as well so that /both/
the preemptive store of zero /as well as/ the load use the /same/ register
such that a dependency is created between the store and load. However, this
option is not sufficient either since it can be bypassed as well under
speculation. An updated attack with pointer spill/fills now _all_ based on
r10 would look as follows:

  [...]
  // r2 = oob address (e.g. scalar)
  // r7 = pointer to map value
  [...]
  // longer store forward prediction training sequence than before.
  2062: (61) r0 = *(u32 *)(r7 +25588)
  2063: (63) *(u32 *)(r7 +30708) = r0
  2064: (61) r0 = *(u32 *)(r7 +25592)
  2065: (63) *(u32 *)(r7 +30712) = r0
  2066: (61) r0 = *(u32 *)(r7 +25596)
  2067: (63) *(u32 *)(r7 +30716) = r0
  // store the speculative load address (scalar) this time after the store
  // forward prediction training.
  2068: (7b) *(u64 *)(r10 -16) = r2
  // preoccupy the CPU store port by running sequence of dummy stores.
  2069: (63) *(u32 *)(r7 +29696) = r0
  2070: (63) *(u32 *)(r7 +29700) = r0
  2071: (63) *(u32 *)(r7 +29704) = r0
  2072: (63) *(u32 *)(r7 +29708) = r0
  2073: (63) *(u32 *)(r7 +29712) = r0
  2074: (63) *(u32 *)(r7 +29716) = r0
  2075: (63) *(u32 *)(r7 +29720) = r0
  2076: (63) *(u32 *)(r7 +29724) = r0
  2077: (63) *(u32 *)(r7 +29728) = r0
  2078: (63) *(u32 *)(r7 +29732) = r0
  2079: (63) *(u32 *)(r7 +29736) = r0
  2080: (63) *(u32 *)(r7 +29740) = r0
  2081: (63) *(u32 *)(r7 +29744) = r0
  2082: (63) *(u32 *)(r7 +29748) = r0
  2083: (63) *(u32 *)(r7 +29752) = r0
  2084: (63) *(u32 *)(r7 +29756) = r0
  2085: (63) *(u32 *)(r7 +29760) = r0
  2086: (63) *(u32 *)(r7 +29764) = r0
  2087: (63) *(u32 *)(r7 +29768) = r0
  2088: (63) *(u32 *)(r7 +29772) = r0
  2089: (63) *(u32 *)(r7 +29776) = r0
  2090: (63) *(u32 *)(r7 +29780) = r0
  2091: (63) *(u32 *)(r7 +29784) = r0
  2092: (63) *(u32 *)(r7 +29788) = r0
  2093: (63) *(u32 *)(r7 +29792) = r0
  2094: (63) *(u32 *)(r7 +29796) = r0
  2095: (63) *(u32 *)(r7 +29800) = r0
  2096: (63) *(u32 *)(r7 +29804) = r0
  2097: (63) *(u32 *)(r7 +29808) = r0
  2098: (63) *(u32 *)(r7 +29812) = r0
  // overwrite scalar with dummy pointer; same as before, also including the
  // sanitation store with 0 from the current mitigation by the verifier.
  2099: (7a) *(u64 *)(r10 -16) = 0         | /both/ are now slow stores here
  2100: (7b) *(u64 *)(r10 -16) = r7        | since store unit is still busy.
  // load from stack intended to bypass stores.
  2101: (79) r2 = *(u64 *)(r10 -16)
  2102: (71) r3 = *(u8 *)(r2 +0)
  // leak r3
  [...]

Looking at the CPU microarchitecture, the scheduler might issue loads (such
as seen in line 2101) before stores (line 2099,2100) because the load execution
units become available while the store execution unit is still busy with the
sequence of dummy stores (line 2069-2098). And so the load may use the prior
stored scalar from r2 at address r10 -16 for speculation. The updated attack
may work less reliable on CPU microarchitectures where loads and stores share
execution resources.

This concludes that the sanitizing with zero stores from af86ca4e30 ("bpf:
Prevent memory disambiguation attack") is insufficient. Moreover, the detection
of stack reuse from af86ca4e30 where previously data (STACK_MISC) has been
written to a given stack slot where a pointer value is now to be stored does
not have sufficient coverage as precondition for the mitigation either; for
several reasons outlined as follows:

 1) Stack content from prior program runs could still be preserved and is
    therefore not "random", best example is to split a speculative store
    bypass attack between tail calls, program A would prepare and store the
    oob address at a given stack slot and then tail call into program B which
    does the "slow" store of a pointer to the stack with subsequent "fast"
    read. From program B PoV such stack slot type is STACK_INVALID, and
    therefore also must be subject to mitigation.

 2) The STACK_SPILL must not be coupled to register_is_const(&stack->spilled_ptr)
    condition, for example, the previous content of that memory location could
    also be a pointer to map or map value. Without the fix, a speculative
    store bypass is not mitigated in such precondition and can then lead to
    a type confusion in the speculative domain leaking kernel memory near
    these pointer types.

While brainstorming on various alternative mitigation possibilities, we also
stumbled upon a retrospective from Chrome developers [0]:

  [...] For variant 4, we implemented a mitigation to zero the unused memory
  of the heap prior to allocation, which cost about 1% when done concurrently
  and 4% for scavenging. Variant 4 defeats everything we could think of. We
  explored more mitigations for variant 4 but the threat proved to be more
  pervasive and dangerous than we anticipated. For example, stack slots used
  by the register allocator in the optimizing compiler could be subject to
  type confusion, leading to pointer crafting. Mitigating type confusion for
  stack slots alone would have required a complete redesign of the backend of
  the optimizing compiler, perhaps man years of work, without a guarantee of
  completeness. [...]

From BPF side, the problem space is reduced, however, options are rather
limited. One idea that has been explored was to xor-obfuscate pointer spills
to the BPF stack:

  [...]
  // preoccupy the CPU store port by running sequence of dummy stores.
  [...]
  2106: (63) *(u32 *)(r7 +29796) = r0
  2107: (63) *(u32 *)(r7 +29800) = r0
  2108: (63) *(u32 *)(r7 +29804) = r0
  2109: (63) *(u32 *)(r7 +29808) = r0
  2110: (63) *(u32 *)(r7 +29812) = r0
  // overwrite scalar with dummy pointer; xored with random 'secret' value
  // of 943576462 before store ...
  2111: (b4) w11 = 943576462
  2112: (af) r11 ^= r7
  2113: (7b) *(u64 *)(r10 -16) = r11
  2114: (79) r11 = *(u64 *)(r10 -16)
  2115: (b4) w2 = 943576462
  2116: (af) r2 ^= r11
  // ... and restored with the same 'secret' value with the help of AX reg.
  2117: (71) r3 = *(u8 *)(r2 +0)
  [...]

While the above would not prevent speculation, it would make data leakage
infeasible by directing it to random locations. In order to be effective
and prevent type confusion under speculation, such random secret would have
to be regenerated for each store. The additional complexity involved for a
tracking mechanism that prevents jumps such that restoring spilled pointers
would not get corrupted is not worth the gain for unprivileged. Hence, the
fix in here eventually opted for emitting a non-public BPF_ST | BPF_NOSPEC
instruction which the x86 JIT translates into a lfence opcode. Inserting the
latter in between the store and load instruction is one of the mitigations
options [1]. The x86 instruction manual notes:

  [...] An LFENCE that follows an instruction that stores to memory might
  complete before the data being stored have become globally visible. [...]

The latter meaning that the preceding store instruction finished execution
and the store is at minimum guaranteed to be in the CPU's store queue, but
it's not guaranteed to be in that CPU's L1 cache at that point (globally
visible). The latter would only be guaranteed via sfence. So the load which
is guaranteed to execute after the lfence for that local CPU would have to
rely on store-to-load forwarding. [2], in section 2.3 on store buffers says:

  [...] For every store operation that is added to the ROB, an entry is
  allocated in the store buffer. This entry requires both the virtual and
  physical address of the target. Only if there is no free entry in the store
  buffer, the frontend stalls until there is an empty slot available in the
  store buffer again. Otherwise, the CPU can immediately continue adding
  subsequent instructions to the ROB and execute them out of order. On Intel
  CPUs, the store buffer has up to 56 entries. [...]

One small upside on the fix is that it lifts constraints from af86ca4e30
where the sanitize_stack_off relative to r10 must be the same when coming
from different paths. The BPF_ST | BPF_NOSPEC gets emitted after a BPF_STX
or BPF_ST instruction. This happens either when we store a pointer or data
value to the BPF stack for the first time, or upon later pointer spills.
The former needs to be enforced since otherwise stale stack data could be
leaked under speculation as outlined earlier. For non-x86 JITs the BPF_ST |
BPF_NOSPEC mapping is currently optimized away, but others could emit a
speculation barrier as well if necessary. For real-world unprivileged
programs e.g. generated by LLVM, pointer spill/fill is only generated upon
register pressure and LLVM only tries to do that for pointers which are not
used often. The program main impact will be the initial BPF_ST | BPF_NOSPEC
sanitation for the STACK_INVALID case when the first write to a stack slot
occurs e.g. upon map lookup. In future we might refine ways to mitigate
the latter cost.

  [0] https://arxiv.org/pdf/1902.05178.pdf
  [1] https://msrc-blog.microsoft.com/2018/05/21/analysis-and-mitigation-of-speculative-store-bypass-cve-2018-3639/
  [2] https://arxiv.org/pdf/1905.05725.pdf
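
As a reference point, roughly how the x86 JIT lowers the new instruction
(a sketch based on the description above; treat the details as an
assumption rather than the verbatim upstream hunk):

  /* speculation barrier: BPF_ST | BPF_NOSPEC carries no operands */
  case BPF_ST | BPF_NOSPEC:
      if (boot_cpu_has(X86_FEATURE_XMM2))
          EMIT3(0x0F, 0xAE, 0xE8);    /* lfence */
      break;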

Fixes: af86ca4e30 ("bpf: Prevent memory disambiguation attack")
Fixes: f7cf25b202 ("bpf: track spill/fill of constants")
Co-developed-by: Piotr Krysiuk <piotras@gmail.com>
Co-developed-by: Benedict Schlueter <benedict.schlueter@rub.de>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Piotr Krysiuk <piotras@gmail.com>
Signed-off-by: Benedict Schlueter <benedict.schlueter@rub.de>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-08-04 12:46:44 +02:00
Greg Kroah-Hartman
e4cac2c332 Merge 5.10.54 into android12-5.10-lts
Changes in 5.10.54
	igc: Fix use-after-free error during reset
	igb: Fix use-after-free error during reset
	igc: change default return of igc_read_phy_reg()
	ixgbe: Fix an error handling path in 'ixgbe_probe()'
	igc: Fix an error handling path in 'igc_probe()'
	igb: Fix an error handling path in 'igb_probe()'
	fm10k: Fix an error handling path in 'fm10k_probe()'
	e1000e: Fix an error handling path in 'e1000_probe()'
	iavf: Fix an error handling path in 'iavf_probe()'
	igb: Check if num of q_vectors is smaller than max before array access
	igb: Fix position of assignment to *ring
	gve: Fix an error handling path in 'gve_probe()'
	net: add kcov handle to skb extensions
	bonding: fix suspicious RCU usage in bond_ipsec_add_sa()
	bonding: fix null dereference in bond_ipsec_add_sa()
	ixgbevf: use xso.real_dev instead of xso.dev in callback functions of struct xfrmdev_ops
	bonding: fix suspicious RCU usage in bond_ipsec_del_sa()
	bonding: disallow setting nested bonding + ipsec offload
	bonding: Add struct bond_ipesc to manage SA
	bonding: fix suspicious RCU usage in bond_ipsec_offload_ok()
	bonding: fix incorrect return value of bond_ipsec_offload_ok()
	ipv6: fix 'disable_policy' for fwd packets
	stmmac: platform: Fix signedness bug in stmmac_probe_config_dt()
	selftests: icmp_redirect: remove from checking for IPv6 route get
	selftests: icmp_redirect: IPv6 PMTU info should be cleared after redirect
	pwm: sprd: Ensure configuring period and duty_cycle isn't wrongly skipped
	cxgb4: fix IRQ free race during driver unload
	mptcp: fix warning in __skb_flow_dissect() when do syn cookie for subflow join
	nvme-pci: do not call nvme_dev_remove_admin from nvme_remove
	KVM: x86/pmu: Clear anythread deprecated bit when 0xa leaf is unsupported on the SVM
	perf inject: Fix dso->nsinfo refcounting
	perf map: Fix dso->nsinfo refcounting
	perf probe: Fix dso->nsinfo refcounting
	perf env: Fix sibling_dies memory leak
	perf test session_topology: Delete session->evlist
	perf test event_update: Fix memory leak of evlist
	perf dso: Fix memory leak in dso__new_map()
	perf test maps__merge_in: Fix memory leak of maps
	perf env: Fix memory leak of cpu_pmu_caps
	perf report: Free generated help strings for sort option
	perf script: Fix memory 'threads' and 'cpus' leaks on exit
	perf lzma: Close lzma stream on exit
	perf probe-file: Delete namelist in del_events() on the error path
	perf data: Close all files in close_dir()
	perf sched: Fix record failure when CONFIG_SCHEDSTATS is not set
	ASoC: wm_adsp: Correct wm_coeff_tlv_get handling
	spi: imx: add a check for speed_hz before calculating the clock
	spi: stm32: fixes pm_runtime calls in probe/remove
	regulator: hi6421: Use correct variable type for regmap api val argument
	regulator: hi6421: Fix getting wrong drvdata
	spi: mediatek: fix fifo rx mode
	ASoC: rt5631: Fix regcache sync errors on resume
	bpf, test: fix NULL pointer dereference on invalid expected_attach_type
	bpf: Fix tail_call_reachable rejection for interpreter when jit failed
	xdp, net: Fix use-after-free in bpf_xdp_link_release
	timers: Fix get_next_timer_interrupt() with no timers pending
	liquidio: Fix unintentional sign extension issue on left shift of u16
	s390/bpf: Perform r1 range checking before accessing jit->seen_reg[r1]
	bpf, sockmap: Fix potential memory leak on unlikely error case
	bpf, sockmap, tcp: sk_prot needs inuse_idx set for proc stats
	bpf, sockmap, udp: sk_prot needs inuse_idx set for proc stats
	bpftool: Check malloc return value in mount_bpffs_for_pin
	net: fix uninit-value in caif_seqpkt_sendmsg
	usb: hso: fix error handling code of hso_create_net_device
	dma-mapping: handle vmalloc addresses in dma_common_{mmap,get_sgtable}
	efi/tpm: Differentiate missing and invalid final event log table.
	net: decnet: Fix sleeping inside in af_decnet
	KVM: PPC: Book3S: Fix CONFIG_TRANSACTIONAL_MEM=n crash
	KVM: PPC: Fix kvm_arch_vcpu_ioctl vcpu_load leak
	net: sched: fix memory leak in tcindex_partial_destroy_work
	sctp: trim optlen when it's a huge value in sctp_setsockopt
	netrom: Decrease sock refcount when sock timers expire
	scsi: iscsi: Fix iface sysfs attr detection
	scsi: target: Fix protect handling in WRITE SAME(32)
	spi: cadence: Correct initialisation of runtime PM again
	ACPI: Kconfig: Fix table override from built-in initrd
	bnxt_en: don't disable an already disabled PCI device
	bnxt_en: Refresh RoCE capabilities in bnxt_ulp_probe()
	bnxt_en: Add missing check for BNXT_STATE_ABORT_ERR in bnxt_fw_rset_task()
	bnxt_en: Validate vlan protocol ID on RX packets
	bnxt_en: Check abort error state in bnxt_half_open_nic()
	net: hisilicon: rename CACHE_LINE_MASK to avoid redefinition
	net/tcp_fastopen: fix data races around tfo_active_disable_stamp
	ALSA: hda: intel-dsp-cfg: add missing ElkhartLake PCI ID
	net: hns3: fix possible mismatches resp of mailbox
	net: hns3: fix rx VLAN offload state inconsistent issue
	spi: spi-bcm2835: Fix deadlock
	net/sched: act_skbmod: Skip non-Ethernet packets
	ipv6: fix another slab-out-of-bounds in fib6_nh_flush_exceptions
	ceph: don't WARN if we're still opening a session to an MDS
	nvme-pci: don't WARN_ON in nvme_reset_work if ctrl.state is not RESETTING
	Revert "USB: quirks: ignore remote wake-up on Fibocom L850-GL LTE modem"
	afs: Fix tracepoint string placement with built-in AFS
	r8169: Avoid duplicate sysfs entry creation error
	nvme: set the PRACT bit when using Write Zeroes with T10 PI
	sctp: update active_key for asoc when old key is being replaced
	tcp: disable TFO blackhole logic by default
	net: dsa: sja1105: make VID 4095 a bridge VLAN too
	net: sched: cls_api: Fix the the wrong parameter
	drm/panel: raspberrypi-touchscreen: Prevent double-free
	cifs: only write 64kb at a time when fallocating a small region of a file
	cifs: fix fallocate when trying to allocate a hole.
	proc: Avoid mixing integer types in mem_rw()
	mmc: core: Don't allocate IDA for OF aliases
	s390/ftrace: fix ftrace_update_ftrace_func implementation
	s390/boot: fix use of expolines in the DMA code
	ALSA: usb-audio: Add missing proc text entry for BESPOKEN type
	ALSA: usb-audio: Add registration quirk for JBL Quantum headsets
	ALSA: sb: Fix potential ABBA deadlock in CSP driver
	ALSA: hda/realtek: Fix pop noise and 2 Front Mic issues on a machine
	ALSA: hdmi: Expose all pins on MSI MS-7C94 board
	ALSA: pcm: Call substream ack() method upon compat mmap commit
	ALSA: pcm: Fix mmap capability check
	Revert "usb: renesas-xhci: Fix handling of unknown ROM state"
	usb: xhci: avoid renesas_usb_fw.mem when it's unusable
	xhci: Fix lost USB 2 remote wake
	KVM: PPC: Book3S: Fix H_RTAS rets buffer overflow
	KVM: PPC: Book3S HV Nested: Sanitise H_ENTER_NESTED TM state
	usb: hub: Disable USB 3 device initiated lpm if exit latency is too high
	usb: hub: Fix link power management max exit latency (MEL) calculations
	USB: usb-storage: Add LaCie Rugged USB3-FW to IGNORE_UAS
	usb: max-3421: Prevent corruption of freed memory
	usb: renesas_usbhs: Fix superfluous irqs happen after usb_pkt_pop()
	USB: serial: option: add support for u-blox LARA-R6 family
	USB: serial: cp210x: fix comments for GE CS1000
	USB: serial: cp210x: add ID for CEL EM3588 USB ZigBee stick
	usb: gadget: Fix Unbalanced pm_runtime_enable in tegra_xudc_probe
	usb: dwc2: gadget: Fix GOUTNAK flow for Slave mode.
	usb: dwc2: gadget: Fix sending zero length packet in DDMA mode.
	usb: typec: stusb160x: register role switch before interrupt registration
	firmware/efi: Tell memblock about EFI iomem reservations
	tracepoints: Update static_call before tp_funcs when adding a tracepoint
	tracing/histogram: Rename "cpu" to "common_cpu"
	tracing: Fix bug in rb_per_cpu_empty() that might cause deadloop.
	tracing: Synthetic event field_pos is an index not a boolean
	btrfs: check for missing device in btrfs_trim_fs
	media: ngene: Fix out-of-bounds bug in ngene_command_config_free_buf()
	ixgbe: Fix packet corruption due to missing DMA sync
	bus: mhi: core: Validate channel ID when processing command completions
	posix-cpu-timers: Fix rearm racing against process tick
	selftest: use mmap instead of posix_memalign to allocate memory
	io_uring: explicitly count entries for poll reqs
	io_uring: remove double poll entry on arm failure
	userfaultfd: do not untag user pointers
	memblock: make for_each_mem_range() traverse MEMBLOCK_HOTPLUG regions
	hugetlbfs: fix mount mode command line processing
	rbd: don't hold lock_rwsem while running_list is being drained
	rbd: always kick acquire on "acquired" and "released" notifications
	misc: eeprom: at24: Always append device id even if label property is set.
	nds32: fix up stack guard gap
	driver core: Prevent warning when removing a device link from unregistered consumer
	drm: Return -ENOTTY for non-drm ioctls
	drm/amdgpu: update golden setting for sienna_cichlid
	net: dsa: mv88e6xxx: enable SerDes RX stats for Topaz
	net: dsa: mv88e6xxx: enable SerDes PCS register dump via ethtool -d on Topaz
	PCI: Mark AMD Navi14 GPU ATS as broken
	bonding: fix build issue
	skbuff: Release nfct refcount on napi stolen or re-used skbs
	Documentation: Fix intiramfs script name
	perf inject: Close inject.output on exit
	usb: ehci: Prevent missed ehci interrupts with edge-triggered MSI
	drm/i915/gvt: Clear d3_entered on elsp cmd submission.
	sfc: ensure correct number of XDP queues
	xhci: add xhci_get_virt_ep() helper
	skbuff: Fix build with SKB extensions disabled
	Linux 5.10.54

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ifd2823b47ab1544cd1f168b138624ffe060a471e
2021-07-28 15:23:47 +02:00
Daniel Borkmann
39f1735c81 bpf: Fix tail_call_reachable rejection for interpreter when jit failed
[ Upstream commit 5dd0a6b8582ffbfa88351949d50eccd5b6694ade ]

During testing of f263a81451c1 ("bpf: Track subprog poke descriptors correctly
and fix use-after-free") under various failure conditions, for example, when
jit_subprogs() fails and tries to clean up the program to be run under the
interpreter, we ran into the following freeze:

  [...]
  #127/8 tailcall_bpf2bpf_3:FAIL
  [...]
  [   92.041251] BUG: KASAN: slab-out-of-bounds in ___bpf_prog_run+0x1b9d/0x2e20
  [   92.042408] Read of size 8 at addr ffff88800da67f68 by task test_progs/682
  [   92.043707]
  [   92.044030] CPU: 1 PID: 682 Comm: test_progs Tainted: G   O   5.13.0-53301-ge6c08cb33a30-dirty #87
  [   92.045542] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1 04/01/2014
  [   92.046785] Call Trace:
  [   92.047171]  ? __bpf_prog_run_args64+0xc0/0xc0
  [   92.047773]  ? __bpf_prog_run_args32+0x8b/0xb0
  [   92.048389]  ? __bpf_prog_run_args64+0xc0/0xc0
  [   92.049019]  ? ktime_get+0x117/0x130
  [...] // few hundred [similar] lines more
  [   92.659025]  ? ktime_get+0x117/0x130
  [   92.659845]  ? __bpf_prog_run_args64+0xc0/0xc0
  [   92.660738]  ? __bpf_prog_run_args32+0x8b/0xb0
  [   92.661528]  ? __bpf_prog_run_args64+0xc0/0xc0
  [   92.662378]  ? print_usage_bug+0x50/0x50
  [   92.663221]  ? print_usage_bug+0x50/0x50
  [   92.664077]  ? bpf_ksym_find+0x9c/0xe0
  [   92.664887]  ? ktime_get+0x117/0x130
  [   92.665624]  ? kernel_text_address+0xf5/0x100
  [   92.666529]  ? __kernel_text_address+0xe/0x30
  [   92.667725]  ? unwind_get_return_address+0x2f/0x50
  [   92.668854]  ? ___bpf_prog_run+0x15d4/0x2e20
  [   92.670185]  ? ktime_get+0x117/0x130
  [   92.671130]  ? __bpf_prog_run_args64+0xc0/0xc0
  [   92.672020]  ? __bpf_prog_run_args32+0x8b/0xb0
  [   92.672860]  ? __bpf_prog_run_args64+0xc0/0xc0
  [   92.675159]  ? ktime_get+0x117/0x130
  [   92.677074]  ? lock_is_held_type+0xd5/0x130
  [   92.678662]  ? ___bpf_prog_run+0x15d4/0x2e20
  [   92.680046]  ? ktime_get+0x117/0x130
  [   92.681285]  ? __bpf_prog_run32+0x6b/0x90
  [   92.682601]  ? __bpf_prog_run64+0x90/0x90
  [   92.683636]  ? lock_downgrade+0x370/0x370
  [   92.684647]  ? mark_held_locks+0x44/0x90
  [   92.685652]  ? ktime_get+0x117/0x130
  [   92.686752]  ? lockdep_hardirqs_on+0x79/0x100
  [   92.688004]  ? ktime_get+0x117/0x130
  [   92.688573]  ? __cant_migrate+0x2b/0x80
  [   92.689192]  ? bpf_test_run+0x2f4/0x510
  [   92.689869]  ? bpf_test_timer_continue+0x1c0/0x1c0
  [   92.690856]  ? rcu_read_lock_bh_held+0x90/0x90
  [   92.691506]  ? __kasan_slab_alloc+0x61/0x80
  [   92.692128]  ? eth_type_trans+0x128/0x240
  [   92.692737]  ? __build_skb+0x46/0x50
  [   92.693252]  ? bpf_prog_test_run_skb+0x65e/0xc50
  [   92.693954]  ? bpf_prog_test_run_raw_tp+0x2d0/0x2d0
  [   92.694639]  ? __fget_light+0xa1/0x100
  [   92.695162]  ? bpf_prog_inc+0x23/0x30
  [   92.695685]  ? __sys_bpf+0xb40/0x2c80
  [   92.696324]  ? bpf_link_get_from_fd+0x90/0x90
  [   92.697150]  ? mark_held_locks+0x24/0x90
  [   92.698007]  ? lockdep_hardirqs_on_prepare+0x124/0x220
  [   92.699045]  ? finish_task_switch+0xe6/0x370
  [   92.700072]  ? lockdep_hardirqs_on+0x79/0x100
  [   92.701233]  ? finish_task_switch+0x11d/0x370
  [   92.702264]  ? __switch_to+0x2c0/0x740
  [   92.703148]  ? mark_held_locks+0x24/0x90
  [   92.704155]  ? __x64_sys_bpf+0x45/0x50
  [   92.705146]  ? do_syscall_64+0x35/0x80
  [   92.706953]  ? entry_SYSCALL_64_after_hwframe+0x44/0xae
  [...]

Turns out that the program rejection from e411901c0b ("bpf: allow for tailcalls
in BPF subprograms for x64 JIT") is buggy since env->prog->aux->tail_call_reachable
is never true. Commit ebf7d1f508 ("bpf, x64: rework pro/epilogue and tailcall
handling in JIT") added a tracker into check_max_stack_depth() which propagates
the tail_call_reachable condition throughout the subprograms. This info is then
assigned to the subprogram's func[i]->aux->tail_call_reachable. However, the
rejection check upon JIT failure consults env->prog->aux->tail_call_reachable,
and func[0]->aux->tail_call_reachable, which represents the main program's
information, was never propagated to the outer env->prog->aux. Add this
propagation into check_max_stack_depth() where it belongs so that the check
can be done reliably.
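
A minimal sketch of the propagation being added (illustrative structures,
not the verifier's real ones):

  #include <stdbool.h>

  struct prog_aux     { bool tail_call_reachable; };
  struct subprog_info { bool tail_call_reachable; };

  /* Mirror the main program's (subprog 0) reachability into the outer
   * prog aux, which is what the JIT-failure rejection check consults. */
  static void propagate_tail_call_reachable(struct prog_aux *aux,
                                            const struct subprog_info *subprog)
  {
      if (subprog[0].tail_call_reachable)
          aux->tail_call_reachable = true;
  }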

Fixes: ebf7d1f508 ("bpf, x64: rework pro/epilogue and tailcall handling in JIT")
Fixes: e411901c0b ("bpf: allow for tailcalls in BPF subprograms for x64 JIT")
Co-developed-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/bpf/618c34e3163ad1a36b1e82377576a6081e182f25.1626123173.git.daniel@iogearbox.net
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-28 14:35:37 +02:00
Greg Kroah-Hartman
67e686fc73 Revert "bpf: Track subprog poke descriptors correctly and fix use-after-free"
This reverts commit a9f36bf361 which is
commit f263a81451c12da5a342d90572e317e611846f2c upstream.

It breaks the Android KABI and is not needed for Android devices at this
point in time.

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Id5d1b6407f9bd5228630b9a813e9799dbc448b96
2021-07-26 11:53:25 +02:00
John Fastabend
a9f36bf361 bpf: Track subprog poke descriptors correctly and fix use-after-free
commit f263a81451c12da5a342d90572e317e611846f2c upstream.

Subprograms are calling map_poke_track(), but on program release there is no
hook to call map_poke_untrack(). However, on program release, the aux memory
(and poke descriptor table) is freed even though we still have a reference to
it in the element list of the map aux data. When we run map_poke_run(), we then
end up accessing free'd memory, triggering KASAN in prog_array_map_poke_run():

  [...]
  [  402.824689] BUG: KASAN: use-after-free in prog_array_map_poke_run+0xc2/0x34e
  [  402.824698] Read of size 4 at addr ffff8881905a7940 by task hubble-fgs/4337
  [  402.824705] CPU: 1 PID: 4337 Comm: hubble-fgs Tainted: G          I       5.12.0+ #399
  [  402.824715] Call Trace:
  [  402.824719]  dump_stack+0x93/0xc2
  [  402.824727]  print_address_description.constprop.0+0x1a/0x140
  [  402.824736]  ? prog_array_map_poke_run+0xc2/0x34e
  [  402.824740]  ? prog_array_map_poke_run+0xc2/0x34e
  [  402.824744]  kasan_report.cold+0x7c/0xd8
  [  402.824752]  ? prog_array_map_poke_run+0xc2/0x34e
  [  402.824757]  prog_array_map_poke_run+0xc2/0x34e
  [  402.824765]  bpf_fd_array_map_update_elem+0x124/0x1a0
  [...]

The elements concerned are walked as follows:

    for (i = 0; i < elem->aux->size_poke_tab; i++) {
           poke = &elem->aux->poke_tab[i];
    [...]

The access to size_poke_tab is a 4 byte read, verified by checking offsets
in the KASAN dump:

  [  402.825004] The buggy address belongs to the object at ffff8881905a7800
                 which belongs to the cache kmalloc-1k of size 1024
  [  402.825008] The buggy address is located 320 bytes inside of
                 1024-byte region [ffff8881905a7800, ffff8881905a7c00)

The pahole output of bpf_prog_aux:

  struct bpf_prog_aux {
    [...]
    /* --- cacheline 5 boundary (320 bytes) --- */
    u32                        size_poke_tab;        /*   320     4 */
    [...]

In general, subprograms do not necessarily manage their own data structures.
For example, BTF func_info and linfo are just pointers to the main program
structure. This allows reference counting and cleanup to be done on the latter
which simplifies their management a bit. The aux->poke_tab struct, however,
did not follow this logic. The initial proposed fix for this use-after-free
bug further embedded poke data tracking into the subprogram with proper
reference counting. However, Daniel and Alexei questioned why we were treating
these objects specially; I agree, it's unnecessary. The fix here removes the per
subprogram poke table allocation and map tracking and instead simply points
the aux->poke_tab pointer at the main program's poke table. This way, map
tracking is simplified to the main program and we do not need to manage them
per subprogram.

This also means that bpf_prog_free_deferred(), which unwinds the program reference
counting and kfrees objects, needs to ensure that we don't try to double free
the poke_tab when freeing the subprog structures. This is easily solved by
NULL'ing the poke_tab pointer. The second detail is to ensure that per
subprogram JIT logic only does fixups on poke_tab[] entries it owns. To do
this, we add a pointer in the poke structure to point at the subprogram value
so JITs can easily check while walking the poke_tab structure if the current
entry belongs to the current program. The aux pointer is stable and therefore
suitable for such comparison. On the jit_subprogs() error path, we omit
cleaning up the poke->aux field because these are only ever referenced from
the JIT side, but on error we will never make it to the JIT, so it's fine to
leave them dangling. Removing these pointers would complicate the error path
for no reason. However, we do need to untrack all poke descriptors from the
main program as otherwise they could race with the freeing of JIT memory from
the subprograms. Lastly, a748c6975d ("bpf: propagate poke descriptors to
subprograms") had an off-by-one on the subprogram instruction index range
check as it was testing 'insn_idx >= subprog_start && insn_idx <= subprog_end'.
However, subprog_end is the next subprogram's start instruction, so the
containment check needs a strict upper bound.
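
A minimal sketch of the corrected containment check (standalone model, not
the kernel code):

  #include <assert.h>
  #include <stdbool.h>

  /* subprog_end is the start of the *next* subprogram, so an insn at
   * exactly subprog_end already belongs to that next subprogram. */
  static bool insn_in_subprog(int insn_idx, int subprog_start, int subprog_end)
  {
      return insn_idx >= subprog_start && insn_idx < subprog_end;
  }

  int main(void)
  {
      assert(insn_in_subprog(4, 0, 5));   /* last insn of subprog 0  */
      assert(!insn_in_subprog(5, 0, 5));  /* first insn of subprog 1 */
      return 0;
  }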

Fixes: a748c6975d ("bpf: propagate poke descriptors to subprograms")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Co-developed-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210707223848.14580-2-john.fastabend@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-07-25 14:36:21 +02:00
John Fastabend
f97b9c4c07 bpf: Fix null ptr deref with mixed tail calls and subprogs
[ Upstream commit 7506d211b932870155bcb39e3dd9e39fab45a7c7 ]

The sub-programs prog->aux->poke_tab[] is populated in jit_subprogs() and
then used when emitting 'BPF_JMP|BPF_TAIL_CALL' insn->code from the
individual JITs. The poke_tab[] to use is stored in the insn->imm by
the code adding it to that array slot. The JIT then uses imm to find the
right entry for an individual instruction. In the x86 bpf_jit_comp.c
this is done by calling emit_bpf_tail_call_direct with the poke_tab[]
of the imm value.

However, we observed the below null-ptr-deref when mixing tail call
programs with subprog programs. For this to happen we just need to
mix bpf-2-bpf calls and tailcalls with some extra calls or instructions
that would be patched later by one of the fixup routines. So whats
happening?

Before the fixup_call_args() -- where the jit op is done -- various
code patching is done by do_misc_fixups(). This may increase the
insn count, for example when we patch a map lookup using the map_gen_lookup
hook. This does two things. First, it means the instruction index,
insn_idx field, of a tail call instruction will move by a 'delta'.

In verifier code,

 struct bpf_jit_poke_descriptor desc = {
  .reason = BPF_POKE_REASON_TAIL_CALL,
  .tail_call.map = BPF_MAP_PTR(aux->map_ptr_state),
  .tail_call.key = bpf_map_key_immediate(aux),
  .insn_idx = i + delta,
 };

Then subprog start values subprog_info[i].start will be updated
with the delta and any poke descriptor index will also be updated
with the delta in adjust_poke_desc(). If we look at how the
subprog starts are adjusted, though, we see a start is only shifted
when the patched instructions land before it,

        /* NOTE: fake 'exit' subprog should be updated as well. */
        for (i = 0; i <= env->subprog_cnt; i++) {
                if (env->subprog_info[i].start <= off)
                        continue;

Earlier subprograms are not changed because their start values
are not moved. But adjust_poke_desc() applies the delta
indiscriminately. The result is that poke descriptors are potentially
corrupted.

Then, in jit_subprogs(), we only populate the poke_tab[] when the above
insn_idx is less than the next subprogram start. Since we corrupted the
insn_idx above, we might incorrectly conclude that a poke descriptor is not
used in a subprogram and omit it from that subprogram. Finally, when the JIT
runs, it dereferences poke_tab while emitting the instruction and crashes
with the trace below, because the earlier step omitted the poke descriptor.

The fix is straightforward given the above context: simply move the same
logic from adjust_subprog_starts() into adjust_poke_descs() and only adjust
insn_idx when needed.
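
A minimal sketch of what that ends up looking like (simplified model
mirroring the adjust_subprog_starts() check quoted above, not the kernel
source):

  struct poke_desc { int insn_idx; };

  static void adjust_poke_descs(struct poke_desc *tab, int cnt,
                                int off, int delta)
  {
      for (int i = 0; i < cnt; i++) {
          /* descriptors before the patched range keep their index */
          if (tab[i].insn_idx <= off)
              continue;
          tab[i].insn_idx += delta;
      }
  }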

[   82.396354] bpf_testmod: version magic '5.12.0-rc2alu+ SMP preempt mod_unload ' should be '5.12.0+ SMP preempt mod_unload '
[   82.623001] loop10: detected capacity change from 0 to 8
[   88.487424] ==================================================================
[   88.487438] BUG: KASAN: null-ptr-deref in do_jit+0x184a/0x3290
[   88.487455] Write of size 8 at addr 0000000000000008 by task test_progs/5295
[   88.487471] CPU: 7 PID: 5295 Comm: test_progs Tainted: G          I       5.12.0+ #386
[   88.487483] Hardware name: Dell Inc. Precision 5820 Tower/002KVM, BIOS 1.9.2 01/24/2019
[   88.487490] Call Trace:
[   88.487498]  dump_stack+0x93/0xc2
[   88.487515]  kasan_report.cold+0x5f/0xd8
[   88.487530]  ? do_jit+0x184a/0x3290
[   88.487542]  do_jit+0x184a/0x3290
 ...
[   88.487709]  bpf_int_jit_compile+0x248/0x810
 ...
[   88.487765]  bpf_check+0x3718/0x5140
 ...
[   88.487920]  bpf_prog_load+0xa22/0xf10

Fixes: a748c6975d ("bpf: propagate poke descriptors to subprograms")
Reported-by: Jussi Maki <joamaki@gmail.com>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14 16:56:26 +02:00
Daniel Borkmann
8c82c52d1d bpf: Do not mark insn as seen under speculative path verification
[ Upstream commit fe9a5ca7e370e613a9a75a13008a3845ea759d6e ]

... in such circumstances, we do not want to mark the instruction as seen given
the goal is still to jmp-1 rewrite/sanitize dead code, if it is not reachable
from the non-speculative path verification. We do however want to verify it for
safety regardless.

With the patch as-is all the insns that have been marked as seen before the
patch will also be marked as seen after the patch (just with a potentially
different non-zero count). An upcoming patch will also verify paths that are
unreachable in the non-speculative domain, hence this extension is needed.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Reviewed-by: Benedict Schlueter <benedict.schlueter@rub.de>
Reviewed-by: Piotr Krysiuk <piotras@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-23 14:42:49 +02:00
Daniel Borkmann
e9d271731d bpf: Inherit expanded/patched seen count from old aux data
[ Upstream commit d203b0fd863a2261e5d00b97f3d060c4c2a6db71 ]

Instead of relying on current env->pass_cnt, use the seen count from the
old aux data in adjust_insn_aux_data(), and expand it to the new range of
patched instructions. This change is valid given we always expand 1:n
with n>=1, so what applies to the old/original instruction needs to apply
for the replacement as well.
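
A minimal sketch of the intended behaviour (toy structures, not the
verifier's):

  struct insn_aux { unsigned int seen; };

  /* When one insn is expanded into n patched insns, every new aux entry
   * inherits the original insn's seen count instead of being stamped
   * with the current pass counter. */
  static void expand_seen(struct insn_aux *new_aux, int n,
                          const struct insn_aux *old_aux)
  {
      for (int i = 0; i < n; i++)
          new_aux[i].seen = old_aux->seen;
  }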

Not relying on env->pass_cnt is a prerequisite for a later change where we
want to avoid marking an instruction seen when verified under speculative
execution path.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Reviewed-by: Benedict Schlueter <benedict.schlueter@rub.de>
Reviewed-by: Piotr Krysiuk <piotras@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-23 14:42:49 +02:00
Daniel Borkmann
5fc6ed1831 bpf: Fix leakage under speculation on mispredicted branches
[ Upstream commit 9183671af6dbf60a1219371d4ed73e23f43b49db ]

The verifier only enumerates valid control-flow paths and skips paths that
are unreachable in the non-speculative domain. And so it can miss issues
under speculative execution on mispredicted branches.

For example, a type confusion has been demonstrated with the following
crafted program:

  // r0 = pointer to a map array entry
  // r6 = pointer to readable stack slot
  // r9 = scalar controlled by attacker
  1: r0 = *(u64 *)(r0) // cache miss
  2: if r0 != 0x0 goto line 4
  3: r6 = r9
  4: if r0 != 0x1 goto line 6
  5: r9 = *(u8 *)(r6)
  6: // leak r9

Since line 3 runs iff r0 == 0 and line 5 runs iff r0 == 1, the verifier
concludes that the pointer dereference on line 5 is safe. But: if the
attacker trains both the branches to fall-through, such that the following
is speculatively executed ...

  r6 = r9
  r9 = *(u8 *)(r6)
  // leak r9

... then the program will dereference an attacker-controlled value and could
leak its content under speculative execution via side-channel. This requires
mistraining the branch predictor, which can be rather tricky, because the
branches are mutually exclusive. However such training can be done at
congruent addresses in user space using different branches that are not
mutually exclusive. That is, by training branches in user space ...

  A:  if r0 != 0x0 goto line C
  B:  ...
  C:  if r0 != 0x0 goto line D
  D:  ...

... such that addresses A and C collide to the same CPU branch prediction
entries in the PHT (pattern history table) as those of the BPF program's
lines 2 and 4, respectively. A non-privileged attacker could simply brute
force such collisions in the PHT until observing the attack succeeding.

Alternative methods to mistrain the branch predictor are also possible that
avoid brute forcing the collisions in the PHT. A reliable attack has been
demonstrated, for example, using the following crafted program:

  // r0 = pointer to a [control] map array entry
  // r7 = *(u64 *)(r0 + 0), training/attack phase
  // r8 = *(u64 *)(r0 + 8), oob address
  // [...]
  // r0 = pointer to a [data] map array entry
  1: if r7 == 0x3 goto line 3
  2: r8 = r0
  // crafted sequence of conditional jumps to separate the conditional
  // branch in line 193 from the current execution flow
  3: if r0 != 0x0 goto line 5
  4: if r0 == 0x0 goto exit
  5: if r0 != 0x0 goto line 7
  6: if r0 == 0x0 goto exit
  [...]
  187: if r0 != 0x0 goto line 189
  188: if r0 == 0x0 goto exit
  // load any slowly-loaded value (due to cache miss in phase 3) ...
  189: r3 = *(u64 *)(r0 + 0x1200)
  // ... and turn it into known zero for verifier, while preserving slowly-
  // loaded dependency when executing:
  190: r3 &= 1
  191: r3 &= 2
  // speculatively bypassed phase dependency
  192: r7 += r3
  193: if r7 == 0x3 goto exit
  194: r4 = *(u8 *)(r8 + 0)
  // leak r4

As can be seen, in training phase (phase != 0x3), the condition in line 1
turns into false and therefore r8 with the oob address is overridden with
the valid map value address, which in line 194 we can read out without
issues. However, in attack phase, line 2 is skipped, and due to the cache
miss in line 189 where the map value is (zeroed and later) added to the
phase register, the condition in line 193 takes the fall-through path due
to prior branch predictor training, where under speculation, it'll load the
byte at oob address r8 (unknown scalar type at that point) which could then
be leaked via side-channel.

One way to mitigate these is to 'branch off' an unreachable path, meaning,
the current verification path keeps following the is_branch_taken() path
and we push the other branch to the verification stack. Given this is
unreachable from the non-speculative domain, this branch's vstate is
explicitly marked as speculative. This is needed for two reasons: i) if
this path is solely seen from speculative execution, then we later on still
want the dead code elimination to kick in in order to sanitize these
instructions with jmp-1s, and ii) to ensure that paths walked in the
non-speculative domain are not pruned from earlier walks of paths walked in
the speculative domain. Additionally, for robustness, we mark the registers
which have been part of the conditional as unknown in the speculative path
given there should be no assumptions made on their content.

The fix in here mitigates type confusion attacks described earlier due to
i) all code paths in the BPF program being explored and ii) existing
verifier logic already ensuring that given memory access instruction
references one specific data structure.
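
A minimal model of the branch handling described above (toy state structure,
not the verifier's):

  #include <stdbool.h>

  struct toy_state {
      bool speculative;    /* path only reachable under misprediction */
      bool reg_known[11];  /* crude stand-in for register tracking    */
  };

  /* Even when is_branch_taken() knows the real direction, the other edge
   * is still pushed for verification, flagged as speculative, with the
   * compared register treated as unknown on that path. */
  static void push_other_branch(struct toy_state *stack, int *top,
                                const struct toy_state *cur, int cmp_reg)
  {
      struct toy_state other = *cur;

      other.speculative = true;
      other.reg_known[cmp_reg] = false;
      stack[(*top)++] = other;
  }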

An alternative to this fix that has also been looked at in this scope was to
mark aux->alu_state at the jump instruction with a BPF_JMP_TAKEN state as
well as direction encoding (always-goto, always-fallthrough, unknown), such
that mixing of different always-* directions themselves as well as mixing of
always-* with unknown directions would cause a program rejection by the
verifier, e.g. programs with constructs like 'if ([...]) { x = 0; } else
{ x = 1; }' with subsequent 'if (x == 1) { [...] }'. For unprivileged, this
would result in only single direction always-* taken paths, and unknown taken
paths being allowed, such that the former could be patched from a conditional
jump to an unconditional jump (ja). Compared to this approach here, it would
have two downsides: i) valid programs that otherwise are not performing any
pointer arithmetic, etc, would potentially be rejected/broken, and ii) we are
required to turn off path pruning for unprivileged, where both can be avoided
in this work through pushing the invalid branch to the verification stack.

The issue was originally discovered by Adam and Ofek, and later independently
discovered and reported as a result of Benedict and Piotr's research work.

Fixes: b2157399cc ("bpf: prevent out-of-bounds speculation")
Reported-by: Adam Morrison <mad@cs.tau.ac.il>
Reported-by: Ofek Kirzner <ofekkir@gmail.com>
Reported-by: Benedict Schlueter <benedict.schlueter@rub.de>
Reported-by: Piotr Krysiuk <piotras@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Reviewed-by: Benedict Schlueter <benedict.schlueter@rub.de>
Reviewed-by: Piotr Krysiuk <piotras@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-23 14:42:45 +02:00
Yinjun Zhang
24cb8bb7f6 bpf, offload: Reorder offload callback 'prepare' in verifier
[ Upstream commit ceb11679d9fcf3fdb358a310a38760fcbe9b63ed ]

Commit 4976b718c3 ("bpf: Introduce pseudo_btf_id") switched the
order of resolve_pseudo_ldimm(), in which some pseudo instructions
are rewritten. Thus those rewritten instructions cannot be passed
to driver via 'prepare' offload callback.

Reorder the 'prepare' offload callback to fix it.

Fixes: 4976b718c3 ("bpf: Introduce pseudo_btf_id")
Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20210520085834.15023-1-simon.horman@netronome.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-03 09:00:49 +02:00
Daniel Borkmann
27acfd11ba bpf: No need to simulate speculative domain for immediates
commit a7036191277f9fa68d92f2071ddc38c09b1e5ee5 upstream.

In 801c6058d14a ("bpf: Fix leakage of uninitialized bpf stack under
speculation") we replaced masking logic with direct loads of immediates
if the register is a known constant. Given in this case we do not apply
any masking, there is also no reason for the operation to be truncated
under the speculative domain.

Therefore, there is also zero reason for the verifier to branch-off and
simulate this case, it only needs to do it for unknown but bounded scalars.
As a side-effect, this also enables a few test cases that were previously
rejected due to simulation under zero truncation.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Piotr Krysiuk <piotras@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-28 13:17:43 +02:00
Daniel Borkmann
c87ef240a8 bpf: Fix mask direction swap upon off reg sign change
commit bb01a1bba579b4b1c5566af24d95f1767859771e upstream.

Masking direction as indicated via mask_to_left is considered to be
calculated once and then used to derive pointer limits. Thus, this
needs to be placed into bpf_sanitize_info instead so we can pass it
to sanitize_ptr_alu() call after the pointer move. Piotr noticed a
corner case where the off reg causes masking direction change which
then results in an incorrect final aux->alu_limit.

Fixes: 7fedb63a8307 ("bpf: Tighten speculative pointer arithmetic mask")
Reported-by: Piotr Krysiuk <piotras@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Piotr Krysiuk <piotras@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-28 13:17:43 +02:00
Daniel Borkmann
4e2c7b2974 bpf: Wrap aux data inside bpf_sanitize_info container
commit 3d0220f6861d713213b015b582e9f21e5b28d2e0 upstream.

Add a container structure struct bpf_sanitize_info which holds
the current aux info, and update call-sites to sanitize_ptr_alu()
to pass it in. This is needed for passing in additional state
later on.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Piotr Krysiuk <piotras@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-28 13:17:43 +02:00
Daniel Borkmann
282bfc8848 bpf: Fix alu32 const subreg bound tracking on bitwise operations
commit 049c4e13714ecbca567b4d5f6d563f05d431c80e upstream.

Fix a bug in the verifier's scalar32_min_max_*() functions which leads to
incorrect tracking of 32 bit bounds for the simulation of and/or/xor bitops.
When both the src & dst subreg is a known constant, then the assumption is
that scalar_min_max_*() will take care to update bounds correctly. However,
this is not the case, for example, consider a register R2 which has a tnum
of 0xffffffff00000000, meaning the lower 32 bits are a known constant, in this
case with value 0x00000001. R2 is then and'ed with a register R3 which is a
64 bit known constant, here, 0x100000002.

What can be seen in line '10:' is that 32 bit bounds reach an invalid state
where {u,s}32_min_value > {u,s}32_max_value. The reason is scalar32_min_max_*()
delegates 32 bit bounds updates to scalar_min_max_*(), however, that really
only takes place when both the 64 bit src & dst register is a known constant.
Given scalar32_min_max_*() is intended to be designed as closely as possible
to scalar_min_max_*(), update the 32 bit bounds in this situation through
__mark_reg32_known() which will set all {u,s}32_{min,max}_value to the correct
constant, which is 0x00000000 after the fix (given 0x00000001 & 0x00000002 in
32 bit space). This is possible given var32_off already holds the final value
as dst_reg->var_off is updated before calling scalar32_min_max_*().
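
For the example above, the 32 bit result is indeed a single known constant
(a quick standalone check, illustrative only):

  #include <stdint.h>
  #include <stdio.h>

  int main(void)
  {
      uint32_t r2_lo = 0x00000001;                /* known low 32 bits of R2 */
      uint32_t r3_lo = (uint32_t)0x100000002ULL;  /* low 32 bits of R3       */

      /* prints 0x00000000, the value __mark_reg32_known() sets all
       * {u,s}32_{min,max}_value to after the fix */
      printf("0x%08x\n", (unsigned)(r2_lo & r3_lo));
      return 0;
  }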

Before fix, invalid tracking of R2:

  [...]
  9: R0_w=inv1337 R1=ctx(id=0,off=0,imm=0) R2_w=inv(id=0,smin_value=-9223372036854775807 (0x8000000000000001),smax_value=9223372032559808513 (0x7fffffff00000001),umin_value=1,umax_value=0xffffffff00000001,var_off=(0x1; 0xffffffff00000000),s32_min_value=1,s32_max_value=1,u32_min_value=1,u32_max_value=1) R3_w=inv4294967298 R10=fp0
  9: (5f) r2 &= r3
  10: R0_w=inv1337 R1=ctx(id=0,off=0,imm=0) R2_w=inv(id=0,smin_value=0,smax_value=4294967296 (0x100000000),umin_value=0,umax_value=0x100000000,var_off=(0x0; 0x100000000),s32_min_value=1,s32_max_value=0,u32_min_value=1,u32_max_value=0) R3_w=inv4294967298 R10=fp0
  [...]

After fix, correct tracking of R2:

  [...]
  9: R0_w=inv1337 R1=ctx(id=0,off=0,imm=0) R2_w=inv(id=0,smin_value=-9223372036854775807 (0x8000000000000001),smax_value=9223372032559808513 (0x7fffffff00000001),umin_value=1,umax_value=0xffffffff00000001,var_off=(0x1; 0xffffffff00000000),s32_min_value=1,s32_max_value=1,u32_min_value=1,u32_max_value=1) R3_w=inv4294967298 R10=fp0
  9: (5f) r2 &= r3
  10: R0_w=inv1337 R1=ctx(id=0,off=0,imm=0) R2_w=inv(id=0,smin_value=0,smax_value=4294967296 (0x100000000),umin_value=0,umax_value=0x100000000,var_off=(0x0; 0x100000000),s32_min_value=0,s32_max_value=0,u32_min_value=0,u32_max_value=0) R3_w=inv4294967298 R10=fp0
  [...]

Fixes: 3f50f132d8 ("bpf: Verifier, do explicit ALU32 bounds tracking")
Fixes: 2921c90d47 ("bpf: Fix a verifier failure with xor")
Reported-by: Manfred Paul (@_manfp)
Reported-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-14 09:50:45 +02:00
Daniel Borkmann
4394be0a18 bpf: Fix propagation of 32 bit unsigned bounds from 64 bit bounds
[ Upstream commit 10bf4e83167cc68595b85fd73bb91e8f2c086e36 ]

Similarly as b02709587e ("bpf: Fix propagation of 32-bit signed bounds
from 64-bit bounds."), we also need to fix the propagation of 32 bit
unsigned bounds from 64 bit counterparts. That is, really only set the
u32_{min,max}_value when /both/ {umin,umax}_value safely fit in 32 bit
space. For example, the register with a umin_value == 1 does /not/ imply
that u32_min_value is also equal to 1, since umax_value could be much
larger than 32 bit subregister can hold, and thus u32_min_value is in
the interval [0,1] instead.
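
A simplified model of the rule (standalone sketch along the lines of
__reg64_bound_u32(), not the verifier code):

  #include <stdbool.h>
  #include <stdint.h>

  static bool bound_fits_u32(uint64_t v)
  {
      return v <= UINT32_MAX;
  }

  /* Only when both 64 bit bounds fit into 32 bit space can they be copied
   * into the subregister bounds; otherwise the subreg stays unknown. */
  static void sync_u32_bounds(uint64_t umin, uint64_t umax,
                              uint32_t *u32_min, uint32_t *u32_max)
  {
      if (bound_fits_u32(umin) && bound_fits_u32(umax)) {
          *u32_min = (uint32_t)umin;
          *u32_max = (uint32_t)umax;
      } else {
          *u32_min = 0;
          *u32_max = UINT32_MAX;
      }
  }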

Before fix, invalid tracking result of R2_w=inv1:

  [...]
  5: R0_w=inv1337 R1=ctx(id=0,off=0,imm=0) R2_w=inv(id=0) R10=fp0
  5: (35) if r2 >= 0x1 goto pc+1
  [...] // goto path
  7: R0=inv1337 R1=ctx(id=0,off=0,imm=0) R2=inv(id=0,umin_value=1) R10=fp0
  7: (b6) if w2 <= 0x1 goto pc+1
  [...] // goto path
  9: R0=inv1337 R1=ctx(id=0,off=0,imm=0) R2=inv(id=0,smin_value=-9223372036854775807,smax_value=9223372032559808513,umin_value=1,umax_value=18446744069414584321,var_off=(0x1; 0xffffffff00000000),s32_min_value=1,s32_max_value=1,u32_max_value=1) R10=fp0
  9: (bc) w2 = w2
  10: R0=inv1337 R1=ctx(id=0,off=0,imm=0) R2_w=inv1 R10=fp0
  [...]

After fix, correct tracking result of R2_w=inv(id=0,umax_value=1,var_off=(0x0; 0x1)):

  [...]
  5: R0_w=inv1337 R1=ctx(id=0,off=0,imm=0) R2_w=inv(id=0) R10=fp0
  5: (35) if r2 >= 0x1 goto pc+1
  [...] // goto path
  7: R0=inv1337 R1=ctx(id=0,off=0,imm=0) R2=inv(id=0,umin_value=1) R10=fp0
  7: (b6) if w2 <= 0x1 goto pc+1
  [...] // goto path
  9: R0=inv1337 R1=ctx(id=0,off=0,imm=0) R2=inv(id=0,smax_value=9223372032559808513,umax_value=18446744069414584321,var_off=(0x0; 0xffffffff00000001),s32_min_value=0,s32_max_value=1,u32_max_value=1) R10=fp0
  9: (bc) w2 = w2
  10: R0=inv1337 R1=ctx(id=0,off=0,imm=0) R2_w=inv(id=0,umax_value=1,var_off=(0x0; 0x1)) R10=fp0
  [...]

Thus, same issue as in b02709587e holds for unsigned subregister tracking.
Also, align __reg64_bound_u32() similarly to __reg64_bound_s32() as done in
b02709587e to make them uniform again.

Fixes: 3f50f132d8 ("bpf: Verifier, do explicit ALU32 bounds tracking")
Reported-by: Manfred Paul (@_manfp)
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-14 09:50:44 +02:00
Daniel Borkmann
2fa15d61e4 bpf: Fix leakage of uninitialized bpf stack under speculation
commit 801c6058d14a82179a7ee17a4b532cac6fad067f upstream.

The current implemented mechanisms to mitigate data disclosure under
speculation mainly address stack and map value oob access from the
speculative domain. However, Piotr discovered that uninitialized BPF
stack is not protected yet, and thus old data from the kernel stack,
potentially including addresses of kernel structures, could still be
extracted from that 512 bytes large window. The BPF stack is special
compared to map values since it's not zero initialized for every
program invocation, whereas map values /are/ zero initialized upon
their initial allocation and thus cannot leak any prior data in either
domain. In the non-speculative domain, the verifier ensures that every
stack slot read must have a prior stack slot write by the BPF program
to avoid such data leaking issue.

However, this is not enough: for example, when the pointer arithmetic
operation moves the stack pointer from the last valid stack offset to
the first valid offset, the sanitation logic allows for any intermediate
offsets during speculative execution, which could then be used to
extract any restricted stack content via side-channel.

Given for unprivileged stack pointer arithmetic the use of unknown
but bounded scalars is generally forbidden, we can simply turn the
register-based arithmetic operation into an immediate-based arithmetic
operation without the need for masking. This also gives the benefit
of reducing the needed instructions for the operation. Given after
the work in 7fedb63a8307 ("bpf: Tighten speculative pointer arithmetic
mask"), the aux->alu_limit already holds the final immediate value for
the offset register with the known scalar. Thus, a simple mov of the
immediate into the AX register, using AX as the source for the original
instruction, is now sufficient in this case.
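
A rough model of the rewrite decision (illustrative stub emitters and
register id, not the kernel's fixup code):

  #include <stdbool.h>

  #define REG_AX 11   /* illustrative id for the auxiliary register */

  static void emit_mov_imm(int reg, long imm)   { (void)reg; (void)imm; }
  static void emit_add_reg(int dst, int src)    { (void)dst; (void)src; }
  static void emit_masked_add(int dst, int off) { (void)dst; (void)off; }

  static void rewrite_ptr_alu(int dst, int off_reg,
                              bool off_is_const, long const_off)
  {
      if (off_is_const) {
          /* known scalar: load the immediate into AX and use AX as the
           * source, so no masking and no intermediate offsets exist */
          emit_mov_imm(REG_AX, const_off);
          emit_add_reg(dst, REG_AX);
      } else {
          /* unknown-but-bounded scalar: keep the masking rewrite */
          emit_masked_add(dst, off_reg);
      }
  }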

Reported-by: Piotr Krysiuk <piotras@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Piotr Krysiuk <piotras@gmail.com>
Reviewed-by: Piotr Krysiuk <piotras@gmail.com>
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-07 11:04:31 +02:00
Daniel Borkmann
2cfa537674 bpf: Fix masking negation logic upon negative dst register
commit b9b34ddbe2076ade359cd5ce7537d5ed019e9807 upstream.

The negation logic for the case where the off_reg is sitting in the
dst register is not correct, given that we cannot just invert the add
into a sub or vice versa. As a fix, perform the final bitwise and-op
unconditionally into AX from the off_reg, then move the pointer from
the src to dst and finally use AX as the source for the original
pointer arithmetic operation such that the inversion yields a correct
result. The single non-AX mov in between is possible given constant
blinding is retaining it as it's not an immediate based operation.

Fixes: 979d63d50c ("bpf: prevent out of bounds speculation on pointer arithmetic")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Piotr Krysiuk <piotras@gmail.com>
Reviewed-by: Piotr Krysiuk <piotras@gmail.com>
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-07 11:04:31 +02:00
Daniel Borkmann
b642e493a9 bpf: Tighten speculative pointer arithmetic mask
[ Upstream commit 7fedb63a8307dda0ec3b8969a3b233a1dd7ea8e0 ]

This work tightens the offset mask we use for unprivileged pointer arithmetic
in order to mitigate a corner case reported by Piotr and Benedict where in
the speculative domain it is possible to advance, for example, the map value
pointer by up to value_size-1 out-of-bounds in order to leak kernel memory
via side-channel to user space.

Before this change, the computed ptr_limit for retrieve_ptr_limit() helper
represents largest valid distance when moving pointer to the right or left
which is then fed as aux->alu_limit to generate masking instructions against
the offset register. After the change, the derived aux->alu_limit represents
the largest potential value of the offset register which we mask against which
is just a narrower subset of the former limit.

For minimal complexity, we call sanitize_ptr_alu() from 2 observation points
in adjust_ptr_min_max_vals(), that is, before and after the simulated alu
operation. In the first step, we retrieve the alu_state and alu_limit before
the operation as well as we branch-off a verifier path and push it to the
verification stack as we did before which checks the dst_reg under truncation,
in other words, when the speculative domain would attempt to move the pointer
out-of-bounds.

In the second step, we retrieve the new alu_limit and calculate the absolute
distance between both. Moreover, we commit the alu_state and final alu_limit
via update_alu_sanitation_state() to the env's instruction aux data, and bail
out from there if there is a mismatch due to coming from different verification
paths with different states.

Reported-by: Piotr Krysiuk <piotras@gmail.com>
Reported-by: Benedict Schlueter <benedict.schlueter@rub.de>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Tested-by: Benedict Schlueter <benedict.schlueter@rub.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-04-28 13:40:00 +02:00
Daniel Borkmann
2982ea926b bpf: Refactor and streamline bounds check into helper
[ Upstream commit 073815b756c51ba9d8384d924c5d1c03ca3d1ae4 ]

Move the bounds check in adjust_ptr_min_max_vals() into a small helper named
sanitize_check_bounds() in order to simplify the former a bit.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-04-28 13:40:00 +02:00
Andrei Matei
f3c4b01689 bpf: Allow variable-offset stack access
[ Upstream commit 01f810ace9ed37255f27608a0864abebccf0aab3 ]

Before this patch, variable offset access to the stack was disallowed
for regular instructions, but was allowed for "indirect" accesses (i.e.
helpers). This patch removes the restriction, allowing reading and
writing to the stack through stack pointers with variable offsets. This
makes stack-allocated buffers more usable in programs, and brings stack
pointers closer to other types of pointers.

The motivation is being able to use stack-allocated buffers for data
manipulation. When the stack size limit is sufficient, allocating
buffers on the stack is simpler than per-cpu arrays, or other
alternatives.

In unprivileged programs, variable-offset reads and writes are
disallowed (they were already disallowed for the indirect access case)
because the speculative execution checking code doesn't support them.
Additionally, when writing through a variable-offset stack pointer, if
any pointers are in the accessible range, there's a possibility of later
leaking pointers because the write cannot be tracked precisely.

Writes with variable offset mark the whole range as initialized, even
though we don't know which stack slots are actually written. This is in
order to not reject future reads to these slots. Note that this doesn't
affect writes done through helpers; like before, helpers need the whole
stack range to be initialized to begin with.
All the stack slots in range are considered scalars after the write;
variable-offset register spills are not tracked.

For reads, all the stack slots in the variable range need to be
initialized (but see above about what writes do), otherwise the read is
rejected. All registers spilled in stack slots that might be read are
marked as having been read, however reads through such pointers don't do
register filling; the target register will always be either a scalar or
a constant zero.
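
A toy model of the write/read rules described above (not verifier code;
slot granularity and names are illustrative):

  #include <stdbool.h>

  #define SLOTS 64   /* 512-byte stack, 8-byte slots */

  static bool slot_init[SLOTS];

  /* a variable-offset write may hit any slot in range, so the whole
   * range is marked initialized (as scalars) */
  static void var_off_write(int min_slot, int max_slot)
  {
      for (int i = min_slot; i <= max_slot; i++)
          slot_init[i] = true;
  }

  /* a variable-offset read is only accepted if every slot that might be
   * read has been initialized */
  static bool var_off_read_ok(int min_slot, int max_slot)
  {
      for (int i = min_slot; i <= max_slot; i++)
          if (!slot_init[i])
              return false;
      return true;
  }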

Signed-off-by: Andrei Matei <andreimatei1@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210207011027.676572-2-andreimatei1@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-04-28 13:40:00 +02:00
Yonghong Song
f79efcb007 bpf: Permits pointers on stack for helper calls
[ Upstream commit cd17d38f8b28f808c368121041c0a4fa91757e0d ]

Currently, when checking stack memory accessed by helper calls,
for spills, only PTR_TO_BTF_ID and SCALAR_VALUE are
allowed.

Song discovered an issue where the below bpf program
  int dump_task(struct bpf_iter__task *ctx)
  {
    struct seq_file *seq = ctx->meta->seq;
    static char info[] = "abc";
    BPF_SEQ_PRINTF(seq, "%s\n", info);
    return 0;
  }
may cause a verifier failure.

The verifier output looks like:
  ; struct seq_file *seq = ctx->meta->seq;
  1: (79) r1 = *(u64 *)(r1 +0)
  ; BPF_SEQ_PRINTF(seq, "%s\n", info);
  2: (18) r2 = 0xffff9054400f6000
  4: (7b) *(u64 *)(r10 -8) = r2
  5: (bf) r4 = r10
  ;
  6: (07) r4 += -8
  ; BPF_SEQ_PRINTF(seq, "%s\n", info);
  7: (18) r2 = 0xffff9054400fe000
  9: (b4) w3 = 4
  10: (b4) w5 = 8
  11: (85) call bpf_seq_printf#126
   R1_w=ptr_seq_file(id=0,off=0,imm=0) R2_w=map_value(id=0,off=0,ks=4,vs=4,imm=0)
  R3_w=inv4 R4_w=fp-8 R5_w=inv8 R10=fp0 fp-8_w=map_value
  last_idx 11 first_idx 0
  regs=8 stack=0 before 10: (b4) w5 = 8
  regs=8 stack=0 before 9: (b4) w3 = 4
  invalid indirect read from stack off -8+0 size 8

Basically, the verifier complains about the map_value pointer at the "fp-8" location.
To fix the issue, if env->allow_ptr_leaks is true, let us also permit
pointers on the stack to be accessible by the helper.

Reported-by: Song Liu <songliubraving@fb.com>
Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20201210013349.943719-1-yhs@fb.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-04-28 13:40:00 +02:00
Daniel Borkmann
fbe6603e7c bpf: Move sanitize_val_alu out of op switch
commit f528819334881fd622fdadeddb3f7edaed8b7c9b upstream.

Add a small sanitize_needed() helper function and move sanitize_val_alu()
out of the main opcode switch. In upcoming work, we'll move sanitize_ptr_alu()
as well out of its opcode switch so this helps to streamline both.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-04-21 13:01:00 +02:00
Daniel Borkmann
7723d32438 bpf: Improve verifier error messages for users
commit a6aaece00a57fa6f22575364b3903dfbccf5345d upstream.

Consolidate all error handling and provide more user-friendly error messages
from sanitize_ptr_alu() and sanitize_val_alu().

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-04-21 13:01:00 +02:00
Daniel Borkmann
55565c3079 bpf: Rework ptr_limit into alu_limit and add common error path
commit b658bbb844e28f1862867f37e8ca11a8e2aa94a3 upstream.

Small refactor with no semantic changes in order to consolidate the max
ptr_limit boundary check.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-04-21 13:01:00 +02:00
Daniel Borkmann
480d875f12 bpf: Move off_reg into sanitize_ptr_alu
[ Upstream commit 6f55b2f2a1178856c19bbce2f71449926e731914 ]

Small refactor to drag off_reg into sanitize_ptr_alu(), so we later on can
use off_reg for generalizing some of the checks for all pointer types.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-04-21 13:01:00 +02:00
Daniel Borkmann
589fd9684d bpf: Ensure off_reg has no mixed signed bounds for all types
[ Upstream commit 24c109bb1537c12c02aeed2d51a347b4d6a9b76e ]

The mixed signed bounds check really belongs into retrieve_ptr_limit()
instead of outside of it in adjust_ptr_min_max_vals(). The reason is
that this check is not tied to PTR_TO_MAP_VALUE only, but to all pointer
types that we handle in retrieve_ptr_limit() and given errors from the latter
propagate back to adjust_ptr_min_max_vals() and lead to rejection of the
program, it's a better place to reside to avoid anything slipping through
for future types. The reason why we must reject such off_reg is that we
otherwise would not be able to derive a mask, see details in 9d7eceede7
("bpf: restrict unknown scalars of mixed signed bounds for unprivileged").

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-04-21 13:00:59 +02:00
Daniel Borkmann
4f3ff11204 bpf: Use correct permission flag for mixed signed bounds arithmetic
[ Upstream commit 9601148392520e2e134936e76788fc2a6371e7be ]

We forbid adding unknown scalars with mixed signed bounds due to the
spectre v1 masking mitigation. Hence this also needs bypass_spec_v1
flag instead of allow_ptr_leaks.

Fixes: 2c78ee898d ("bpf: Implement CAP_BPF")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-04-21 13:00:59 +02:00
Toke Høiland-Jørgensen
b7004ecafa bpf: Enforce that struct_ops programs be GPL-only
commit 12aa8a9467b354ef893ce0fc5719a4de4949a9fb upstream.

With the introduction of the struct_ops program type, it became possible to
implement kernel functionality in BPF, making it viable to use BPF in place
of a regular kernel module for these particular operations.

Thus far, the only user of this mechanism is for implementing TCP
congestion control algorithms. These are clearly marked as GPL-only when
implemented as modules (as seen by the use of EXPORT_SYMBOL_GPL for
tcp_register_congestion_control()), so it seems like an oversight that this
was not carried over to BPF implementations. Since this is the only user
of the struct_ops mechanism, just enforcing GPL-only for the struct_ops
program type seems like the simplest way to fix this.

Fixes: 0baf26b0fc ("bpf: tcp: Support tcp_congestion_ops in bpf")
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20210326100314.121853-1-toke@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-04-14 08:42:00 +02:00
Piotr Krysiuk
1010f17aaa bpf: Add sanity check for upper ptr_limit
commit 1b1597e64e1a610c7a96710fc4717158e98a08b3 upstream.

Given we know the max possible value of ptr_limit at the time of retrieving
the latter, add basic assertions, so that the verifier can bail out if
anything looks odd and reject the program. Nothing triggered this so far,
but it also does not hurt to have these.

Signed-off-by: Piotr Krysiuk <piotras@gmail.com>
Co-developed-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-20 10:43:43 +01:00
Piotr Krysiuk
6a3504bf40 bpf: Simplify alu_limit masking for pointer arithmetic
commit b5871dca250cd391885218b99cc015aca1a51aea upstream.

Instead of having the mov32 with aux->alu_limit - 1 immediate, move this
operation to retrieve_ptr_limit() instead to simplify the logic and to
allow for subsequent sanity boundary checks inside retrieve_ptr_limit().
This avoids the situation where, at the time of the verifier masking rewrite,
we'd run into an underflow which would not sign extend due to the nature
of the mov32 instruction.

Signed-off-by: Piotr Krysiuk <piotras@gmail.com>
Co-developed-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-20 10:43:43 +01:00
Piotr Krysiuk
ac1b87a18c bpf: Fix off-by-one for area size in creating mask to left
commit 10d2bb2e6b1d8c4576c56a748f697dbeb8388899 upstream.

retrieve_ptr_limit() computes the ptr_limit for registers with stack and
map_value type. ptr_limit is the size of the memory area that is still
valid / in-bounds from the point of the current position and direction
of the operation (add / sub). This size will later be used for masking
the operation such that attempting out-of-bounds access in the speculative
domain is redirected to remain within the bounds of the current map value.

When masking to the right the size is correct; however, when masking to
the left, the size is off by one, which would lead to an incorrect mask
and thus an incorrect arithmetic operation in the non-speculative domain.
Piotr found that if the resulting alu_limit value is zero, then the
BPF_MOV32_IMM() from the fixup_bpf_calls() rewrite will end up loading
0xffffffff into AX instead of sign-extending to the full 64 bit range, and
as a result, this allows abuse for executing speculatively out-of-bounds
loads against a 4GB window of address space and thus extracting the
contents of kernel memory via a side channel.
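
To make that failure mode concrete, here is a small standalone C demo (not
kernel code) of why a 32-bit move of alu_limit - 1 with alu_limit == 0
yields 0xffffffff in the full register rather than a sign-extended -1:

  #include <inttypes.h>
  #include <stdint.h>
  #include <stdio.h>

  int main(void)
  {
          uint32_t alu_limit = 0;

          /* mov32 writes the low 32 bits and zero-extends the register */
          uint64_t ax_mov32 = (uint32_t)(alu_limit - 1);
          /* a sign-extending 64-bit move would have produced -1 */
          uint64_t ax_mov64 = (uint64_t)(int64_t)(int32_t)(alu_limit - 1);

          printf("mov32: 0x%016" PRIx64 "\n", ax_mov32); /* 0x00000000ffffffff */
          printf("mov64: 0x%016" PRIx64 "\n", ax_mov64); /* 0xffffffffffffffff */
          return 0;
  }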

Fixes: 979d63d50c ("bpf: prevent out of bounds speculation on pointer arithmetic")
Signed-off-by: Piotr Krysiuk <piotras@gmail.com>
Co-developed-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-20 10:43:43 +01:00
Piotr Krysiuk
c4d37eea1c bpf: Prohibit alu ops for pointer types not defining ptr_limit
commit f232326f6966cf2a1d1db7bc917a4ce5f9f55f76 upstream.

The purpose of this patch is to streamline error propagation and in particular
to propagate retrieve_ptr_limit() errors for pointer types that do not define
a ptr_limit, such that register-based alu ops against these types can be
rejected.

The main rationale is that a gap has been identified by Piotr in the existing
protection against speculatively out-of-bounds loads: for example, in the case
of ctx pointers, unprivileged programs can still perform pointer arithmetic.
This can be abused to execute speculatively out-of-bounds loads without
restrictions and thus extract the contents of kernel memory.

Fix this by rejecting unprivileged programs that attempt any pointer arithmetic
on unprotected pointer types. The two affected ones are pointer to ctx as well
as pointer to map. Field access to a modified ctx pointer is rejected at a
later point in time in the verifier, and 7c69673262 ("bpf: Permit map_ptr
arithmetic with opcode add and offset 0") is only relevant for root-only use
cases. The risk of unprivileged program breakage is considered very low.
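
A simplified sketch of the resulting flow (function names from the
verifier; the error message is abbreviated and the exact call sites are
assumed): retrieve_ptr_limit() now fails for types without a ptr_limit,
and the pointer add/sub handling propagates that failure instead of
silently skipping the masking:

  /* in retrieve_ptr_limit(): */
  default:
          return -EOPNOTSUPP;     /* e.g. ctx or map pointers: no ptr_limit */

  /* in the BPF_ADD/BPF_SUB handling of pointer arithmetic: */
  ret = sanitize_ptr_alu(env, insn, ptr_reg, dst_reg, smin_val < 0);
  if (ret < 0) {
          verbose(env, "R%d pointer arithmetic prohibited\n", dst);
          return ret;
  }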

Fixes: 7c69673262 ("bpf: Permit map_ptr arithmetic with opcode add and offset 0")
Fixes: b2157399cc ("bpf: prevent out-of-bounds speculation")
Signed-off-by: Piotr Krysiuk <piotras@gmail.com>
Co-developed-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-20 10:43:43 +01:00
Ilya Leoshkevich
f4a5c7ff2a bpf: Clear subreg_def for global function return values
[ Upstream commit 45159b27637b0fef6d5ddb86fc7c46b13c77960f ]

test_global_func4 fails on s390 as reported by Yauheni in [1].

The immediate problem is that the zext code includes the instruction,
whose result needs to be zero-extended, into the zero-extension
patchlet, and if this instruction happens to be a branch, then its
delta is not adjusted. As a result, the verifier rejects the program
later.

However, according to [2], as far as the verifier's algorithm is
concerned and as specified by the insn_no_def() function, branching
insns do not define anything. This includes call insns, even though
one might argue that they define %r0.

This means that the real problem is that zero extension kicks in at
all. This happens because clear_caller_saved_regs() sets BPF_REG_0's
subreg_def after global function calls. This can be fixed in many
ways; this patch mimics what helper function call handling already
does.

  [1] https://lore.kernel.org/bpf/20200903140542.156624-1-yauheni.kaliuta@redhat.com/
  [2] https://lore.kernel.org/bpf/CAADnVQ+2RPKcftZw8d+B1UwB35cpBhpF5u3OocNh90D9pETPwg@mail.gmail.com/
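
A sketch of the approach (placement in the global-call handling assumed,
mirroring what the helper-call path already does): after the call, R0 is
marked unknown and explicitly flagged as not being a sub-register
definition, so no zero-extension patchlet gets attached to the branching
call instruction:

  mark_reg_unknown(env, caller->regs, BPF_REG_0);
  caller->regs[BPF_REG_0].subreg_def = DEF_NOT_SUBREG;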

Fixes: 51c39bb1d5 ("bpf: Introduce function-by-function verification")
Reported-by: Yauheni Kaliuta <yauheni.kaliuta@redhat.com>
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210212040408.90109-1-iii@linux.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-04 11:37:34 +01:00
Daniel Borkmann
3320bae8c1 bpf: Fix truncation handling for mod32 dst reg wrt zero
commit 9b00f1b78809309163dda2d044d9e94a3c0248a3 upstream.

Recently noticed that when a mod32 with a known src reg of 0 is performed,
the dst register is 32-bit truncated in the verifier:

  0: R1=ctx(id=0,off=0,imm=0) R10=fp0
  0: (b7) r0 = 0
  1: R0_w=inv0 R1=ctx(id=0,off=0,imm=0) R10=fp0
  1: (b7) r1 = -1
  2: R0_w=inv0 R1_w=inv-1 R10=fp0
  2: (b4) w2 = -1
  3: R0_w=inv0 R1_w=inv-1 R2_w=inv4294967295 R10=fp0
  3: (9c) w1 %= w0
  4: R0_w=inv0 R1_w=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R2_w=inv4294967295 R10=fp0
  4: (b7) r0 = 1
  5: R0_w=inv1 R1_w=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R2_w=inv4294967295 R10=fp0
  5: (1d) if r1 == r2 goto pc+1
   R0_w=inv1 R1_w=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R2_w=inv4294967295 R10=fp0
  6: R0_w=inv1 R1_w=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R2_w=inv4294967295 R10=fp0
  6: (b7) r0 = 2
  7: R0_w=inv2 R1_w=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R2_w=inv4294967295 R10=fp0
  7: (95) exit
  7: R0=inv1 R1=inv(id=0,umin_value=4294967295,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R2=inv4294967295 R10=fp0
  7: (95) exit

However, as a runtime result, we get 2 instead of 1, meaning the dst
register does not contain (u32)-1 in this case. The reason is fairly
straightforward given the 0 test leaves the dst register as-is:

  # ./bpftool p d x i 23
   0: (b7) r0 = 0
   1: (b7) r1 = -1
   2: (b4) w2 = -1
   3: (16) if w0 == 0x0 goto pc+1
   4: (9c) w1 %= w0
   5: (b7) r0 = 1
   6: (1d) if r1 == r2 goto pc+1
   7: (b7) r0 = 2
   8: (95) exit

This was originally not an issue given the dst register was marked as
completely unknown (aka 64 bit unknown). However, after 468f6eafa6
("bpf: fix 32-bit ALU op verification") the verifier casts the register
output to 32 bit, and hence it becomes 32 bit unknown. Note that for
the case where the src register is unknown, the dst register is marked
64 bit unknown. After the fix, the register is truncated by the runtime
and the test passes:

  # ./bpftool p d x i 23
   0: (b7) r0 = 0
   1: (b7) r1 = -1
   2: (b4) w2 = -1
   3: (16) if w0 == 0x0 goto pc+2
   4: (9c) w1 %= w0
   5: (05) goto pc+1
   6: (bc) w1 = w1
   7: (b7) r0 = 1
   8: (1d) if r1 == r2 goto pc+1
   9: (b7) r0 = 2
  10: (95) exit

Semantics also match with {R,W}x mod{64,32} 0 -> {R,W}x. Invalid div
has always been {R,W}x div{64,32} 0 -> 0. Rewrites are as follows:

  mod32:                            mod64:

  (16) if w0 == 0x0 goto pc+2       (15) if r0 == 0x0 goto pc+1
  (9c) w1 %= w0                     (9f) r1 %= r0
  (05) goto pc+1
  (bc) w1 = w1
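
A sketch of the mod32 patchlet shape behind the rewrite above (macros from
linux/filter.h; the emission boilerplate around bpf_patch_insn_data() is
omitted and assumed). The trailing mov32 runs only in the divisor-is-zero
case, truncating the dst register at runtime just as the verifier assumes:

  struct bpf_insn mod32_patchlet[] = {
          /* if w_src == 0, skip the mod and only truncate the dst reg */
          BPF_RAW_INSN(BPF_JMP32 | BPF_JEQ | BPF_K, insn->src_reg, 0, 2, 0),
          *insn,                                       /* w_dst %= w_src */
          BPF_JMP_IMM(BPF_JA, 0, 0, 1),                /* skip the mov32 below */
          BPF_MOV32_REG(insn->dst_reg, insn->dst_reg), /* w_dst = w_dst */
  };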

Fixes: 468f6eafa6 ("bpf: fix 32-bit ALU op verification")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-02-26 10:12:59 +01:00
Daniel Borkmann
67afdc7d95 bpf: Fix verifier jsgt branch analysis on max bound
commit ee114dd64c0071500345439fc79dd5e0f9d106ed upstream.

Fix incorrect is_branch{32,64}_taken() analysis for the jsgt case. The return
code for both will tell the caller whether a given conditional jump is taken
or not, e.g. 1 means the branch will be taken [for the involved registers] and
the goto target will be executed, 0 means the branch will not be taken and we
instead fall through to the next insn, and last but not least a -1 denotes that
it is not known at verification time whether the branch will be taken or not.
Now, while jsgt has the branch-taken case correct with reg->s32_min_value >
sval, the branch-not-taken case is off by one when testing for
reg->s32_max_value < sval, since the branch is also never taken for
reg->s32_max_value == sval. The jgt branch analysis, for example, gets this
right.
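
A sketch of the corrected 32-bit comparison (the 64-bit variant is
analogous; return values as described above, with -1 meaning unknown):

  case BPF_JSGT:
          if (reg->s32_min_value > sval)
                  return 1;
          else if (reg->s32_max_value <= sval)    /* was: < sval */
                  return 0;
          break;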

Fixes: 3f50f132d8 ("bpf: Verifier, do explicit ALU32 bounds tracking")
Fixes: 4f7b3e8258 ("bpf: improve verifier branch analysis")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-02-13 13:55:15 +01:00
Daniel Borkmann
1d16cc210f bpf: Fix 32 bit src register truncation on div/mod
commit e88b2c6e5a4d9ce30d75391e4d950da74bb2bd90 upstream.

While reviewing a different fix, John and I noticed an oddity in one of the
BPF program dumps that stood out, for example:

  # bpftool p d x i 13
   0: (b7) r0 = 808464450
   1: (b4) w4 = 808464432
   2: (bc) w0 = w0
   3: (15) if r0 == 0x0 goto pc+1
   4: (9c) w4 %= w0
  [...]

In line 2 we noticed that the mov32 would 32-bit truncate the original src
register for the div/mod operation. While for the two operations the dst
register is typically marked unknown, e.g. from adjust_scalar_min_max_vals(),
the src register is not, and thus the verifier keeps tracking the original
bounds, simplified:

  0: R1=ctx(id=0,off=0,imm=0) R10=fp0
  0: (b7) r0 = -1
  1: R0_w=invP-1 R1=ctx(id=0,off=0,imm=0) R10=fp0
  1: (b7) r1 = -1
  2: R0_w=invP-1 R1_w=invP-1 R10=fp0
  2: (3c) w0 /= w1
  3: R0_w=invP(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R1_w=invP-1 R10=fp0
  3: (77) r1 >>= 32
  4: R0_w=invP(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R1_w=invP4294967295 R10=fp0
  4: (bf) r0 = r1
  5: R0_w=invP4294967295 R1_w=invP4294967295 R10=fp0
  5: (95) exit
  processed 6 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0

The runtime result of r0 at exit is 0 instead of the expected -1. Remove the
verifier mov32 src rewrite in div/mod and replace it with a jmp32 test
instead. After the fix, we end up with the following code generation with
dividend r1 and divisor r6:

  div, 64 bit:                             div, 32 bit:

   0: (b7) r6 = 8                           0: (b7) r6 = 8
   1: (b7) r1 = 8                           1: (b7) r1 = 8
   2: (55) if r6 != 0x0 goto pc+2           2: (56) if w6 != 0x0 goto pc+2
   3: (ac) w1 ^= w1                         3: (ac) w1 ^= w1
   4: (05) goto pc+1                        4: (05) goto pc+1
   5: (3f) r1 /= r6                         5: (3c) w1 /= w6
   6: (b7) r0 = 0                           6: (b7) r0 = 0
   7: (95) exit                             7: (95) exit

  mod, 64 bit:                             mod, 32 bit:

   0: (b7) r6 = 8                           0: (b7) r6 = 8
   1: (b7) r1 = 8                           1: (b7) r1 = 8
   2: (15) if r6 == 0x0 goto pc+1           2: (16) if w6 == 0x0 goto pc+1
   3: (9f) r1 %= r6                         3: (9c) w1 %= w6
   4: (b7) r0 = 0                           4: (b7) r0 = 0
   5: (95) exit                             5: (95) exit

x86 in particular can throw a 'divide error' exception for the div
instruction not only when the divisor is zero, but also when the quotient
is too large for the designated register. For the edx:eax and rdx:rax
dividend pairs it is not an issue in the x86 BPF JIT since we always zero
edx (rdx). Hence really the only protection needed is against the divisor
being zero.
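
A sketch of the div patchlet shape behind the rewrites above (macros from
linux/filter.h; emission details are omitted and assumed, and the 32-bit
variant differs as shown in the listing):

  struct bpf_insn chk_and_div[] = {
          /* [R,W]x div 0 -> 0 */
          BPF_RAW_INSN((is64 ? BPF_JMP : BPF_JMP32) | BPF_JNE | BPF_K,
                       insn->src_reg, 0, 2, 0),
          BPF_ALU32_REG(BPF_XOR, insn->dst_reg, insn->dst_reg),
          BPF_JMP_IMM(BPF_JA, 0, 0, 1),
          *insn,                            /* rX /= rY or wX /= wY */
  };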

Fixes: 68fda450a7 ("bpf: fix 32-bit divide by zero")
Co-developed-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-02-13 13:55:14 +01:00
Daniel Borkmann
569033c082 bpf: Fix verifier jmp32 pruning decision logic
commit fd675184fc7abfd1e1c52d23e8e900676b5a1c1a upstream.

Anatoly has been fuzzing with the kBdysch harness and reported a hang in
one of the outcomes:

  func#0 @0
  0: R1=ctx(id=0,off=0,imm=0) R10=fp0
  0: (b7) r0 = 808464450
  1: R0_w=invP808464450 R1=ctx(id=0,off=0,imm=0) R10=fp0
  1: (b4) w4 = 808464432
  2: R0_w=invP808464450 R1=ctx(id=0,off=0,imm=0) R4_w=invP808464432 R10=fp0
  2: (9c) w4 %= w0
  3: R0_w=invP808464450 R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R10=fp0
  3: (66) if w4 s> 0x30303030 goto pc+0
   R0_w=invP808464450 R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff),s32_max_value=808464432) R10=fp0
  4: R0_w=invP808464450 R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff),s32_max_value=808464432) R10=fp0
  4: (7f) r0 >>= r0
  5: R0_w=invP(id=0) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff),s32_max_value=808464432) R10=fp0
  5: (9c) w4 %= w0
  6: R0_w=invP(id=0) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0
  6: (66) if w0 s> 0x3030 goto pc+0
   R0_w=invP(id=0,s32_max_value=12336) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0
  7: R0=invP(id=0,s32_max_value=12336) R1=ctx(id=0,off=0,imm=0) R4=invP(id=0) R10=fp0
  7: (d6) if w0 s<= 0x303030 goto pc+1
  9: R0=invP(id=0,s32_max_value=12336) R1=ctx(id=0,off=0,imm=0) R4=invP(id=0) R10=fp0
  9: (95) exit
  propagating r0

  from 6 to 7: safe
  4: R0_w=invP808464450 R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0,umin_value=808464433,umax_value=2147483647,var_off=(0x0; 0x7fffffff)) R10=fp0
  4: (7f) r0 >>= r0
  5: R0_w=invP(id=0) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0,umin_value=808464433,umax_value=2147483647,var_off=(0x0; 0x7fffffff)) R10=fp0
  5: (9c) w4 %= w0
  6: R0_w=invP(id=0) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0
  6: (66) if w0 s> 0x3030 goto pc+0
   R0_w=invP(id=0,s32_max_value=12336) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0
  propagating r0
  7: safe
  propagating r0

  from 6 to 7: safe
  processed 15 insns (limit 1000000) max_states_per_insn 0 total_states 1 peak_states 1 mark_read 1

The underlying program was xlated as follows:

  # bpftool p d x i 10
   0: (b7) r0 = 808464450
   1: (b4) w4 = 808464432
   2: (bc) w0 = w0
   3: (15) if r0 == 0x0 goto pc+1
   4: (9c) w4 %= w0
   5: (66) if w4 s> 0x30303030 goto pc+0
   6: (7f) r0 >>= r0
   7: (bc) w0 = w0
   8: (15) if r0 == 0x0 goto pc+1
   9: (9c) w4 %= w0
  10: (66) if w0 s> 0x3030 goto pc+0
  11: (d6) if w0 s<= 0x303030 goto pc+1
  12: (05) goto pc-1
  13: (95) exit

The verifier rewrote the original instructions it recognized as dead code with
'goto pc-1', but reality differs from the verifier's simulation in that we are
actually able to trigger a hang by hitting these 'goto pc-1' instructions.

Taking a closer look at the verifier analysis, the reason is that it misjudges
its pruning decision at the first 'from 6 to 7: safe' occasion. What happens
is that while both the old and cur registers are marked as precise, they get
misjudged for the jmp32 case because range_within() yields true, meaning that
the prior verification path with a wider register bound could be verified
successfully and therefore the current path with a narrower register bound is
deemed safe as well, whereas in reality it is not. The bounds of R0 on the old
and cur paths compare as follows:

  old: smin_value=0x8000000000000000,smax_value=0x7fffffffffffffff,umin_value=0x0,umax_value=0xffffffffffffffff,var_off=(0x0; 0xffffffffffffffff)
  cur: smin_value=0x8000000000000000,smax_value=0x7fffffff7fffffff,umin_value=0x0,umax_value=0xffffffff7fffffff,var_off=(0x0; 0xffffffff7fffffff)

  old: s32_min_value=0x80000000,s32_max_value=0x00003030,u32_min_value=0x00000000,u32_max_value=0xffffffff
  cur: s32_min_value=0x00003031,s32_max_value=0x7fffffff,u32_min_value=0x00003031,u32_max_value=0x7fffffff

The 64 bit bounds generally look okay, and while the information that got
propagated from 32 to 64 bit looks correct as well, it is not precise enough
for judging a conditional jmp32. Given the latter only operates on
subregisters, we also need to take these into account for the range_within()
probe in order to be able to prune paths. Extending the range_within()
constraint to both sets of bounds tells us that the old signed 32 bit bounds
are not wider than the cur signed 32 bit bounds.
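
A sketch of what the extended pruning constraint amounts to (field names
from struct bpf_reg_state; exact placement in the state-equivalence code
assumed): the old bounds must enclose the cur bounds on both the 64-bit
and the 32-bit (sub-register) level before a state may be pruned:

  static bool range_within(struct bpf_reg_state *old, struct bpf_reg_state *cur)
  {
          return old->umin_value <= cur->umin_value &&
                 old->umax_value >= cur->umax_value &&
                 old->smin_value <= cur->smin_value &&
                 old->smax_value >= cur->smax_value &&
                 old->u32_min_value <= cur->u32_min_value &&
                 old->u32_max_value >= cur->u32_max_value &&
                 old->s32_min_value <= cur->s32_min_value &&
                 old->s32_max_value >= cur->s32_max_value;
  }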

With the fix in place, the 'goto' branch case of the program is now verified
as it should have been:

  [...]
  6: R0_w=invP(id=0) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0
  6: (66) if w0 s> 0x3030 goto pc+0
   R0_w=invP(id=0,s32_max_value=12336) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0
  7: R0=invP(id=0,s32_max_value=12336) R1=ctx(id=0,off=0,imm=0) R4=invP(id=0) R10=fp0
  7: (d6) if w0 s<= 0x303030 goto pc+1
  9: R0=invP(id=0,s32_max_value=12336) R1=ctx(id=0,off=0,imm=0) R4=invP(id=0) R10=fp0
  9: (95) exit

  7: R0_w=invP(id=0,smax_value=9223372034707292159,umax_value=18446744071562067967,var_off=(0x0; 0xffffffff7fffffff),s32_min_value=12337,u32_min_value=12337,u32_max_value=2147483647) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0
  7: (d6) if w0 s<= 0x303030 goto pc+1
   R0_w=invP(id=0,smax_value=9223372034707292159,umax_value=18446744071562067967,var_off=(0x0; 0xffffffff7fffffff),s32_min_value=3158065,u32_min_value=3158065,u32_max_value=2147483647) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0
  8: R0_w=invP(id=0,smax_value=9223372034707292159,umax_value=18446744071562067967,var_off=(0x0; 0xffffffff7fffffff),s32_min_value=3158065,u32_min_value=3158065,u32_max_value=2147483647) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0
  8: (30) r0 = *(u8 *)skb[808464432]
  BPF_LD_[ABS|IND] uses reserved fields
  processed 11 insns (limit 1000000) max_states_per_insn 1 total_states 1 peak_states 1 mark_read 1

The bug is quite subtle in the sense that when the verifier determines that a
given branch is dead code, it will (here: wrongly) remove these instructions
from the program and hard-wire the taken branch for privileged programs,
instead of the 'goto pc-1' rewrites, which causes hard-to-debug problems.

Fixes: 3f50f132d8 ("bpf: Verifier, do explicit ALU32 bounds tracking")
Reported-by: Anatoly Trosinenko <anatoly.trosinenko@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-02-13 13:55:14 +01:00
Gilad Reti
de661caaee bpf: Support PTR_TO_MEM{,_OR_NULL} register spilling
commit 744ea4e3885eccb6d332a06fae9eb7420a622c0f upstream.

Add support for spilling pointer-to-mem registers, to allow the verifier
to track pointers to valid memory addresses. Such pointers are returned,
for example, by a successful call to the bpf_ringbuf_reserve helper.

The patch was partially contributed by CyberArk Software, Inc.
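
A sketch of where this lands conceptually (placement assumed): the set of
register types that may be spilled to the stack with their type preserved
gains the two new entries:

  static bool is_spillable_regtype(enum bpf_reg_type type)
  {
          switch (type) {
          /* ... existing spillable pointer types ... */
          case PTR_TO_MEM:
          case PTR_TO_MEM_OR_NULL:
                  return true;
          default:
                  return false;
          }
  }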

Fixes: 457f44363a ("bpf: Implement BPF ring buffer and verifier support for it")
Suggested-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Gilad Reti <gilad.reti@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: KP Singh <kpsingh@kernel.org>
Link: https://lore.kernel.org/bpf/20210113053810.13518-1-gilad.reti@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-23 16:03:58 +01:00
Daniel Borkmann
5f52a8a71b bpf: Fix signed_{sub,add32}_overflows type handling
commit bc895e8b2a64e502fbba72748d59618272052a8b upstream.

Fix incorrect signed_{sub,add32}_overflows() input types (and a related buggy
comment). It looks like this might have slipped in via a copy/paste issue,
also given that prior to 3f50f132d8 ("bpf: Verifier, do explicit ALU32 bounds
tracking") the signature of signed_sub_overflows() had s64 a and s64 b as its
input args, whereas now they are truncated to s32. Thus restore the proper
types. Also, signed_add32_overflows() is not consistent with
signed_sub32_overflows(); both should take s32 as inputs, therefore align the
former.
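
Sketch of the restored prototypes (bodies elided): the 64-bit helper takes
s64 again and the 32-bit add/sub helpers consistently take s32:

  static bool signed_sub_overflows(s64 a, s64 b);
  static bool signed_add32_overflows(s32 a, s32 b);
  static bool signed_sub32_overflows(s32 a, s32 b);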

Fixes: 3f50f132d8 ("bpf: Verifier, do explicit ALU32 bounds tracking")
Reported-by: De4dCr0w <sa516203@mail.ustc.edu.cn>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-23 16:03:58 +01:00