Commit Graph

392 Commits

Author: Greg Kroah-Hartman
SHA1: d8c7f0a3cd
Message: Merge 5.10.20 into android12-5.10
Changes in 5.10.20
	vmlinux.lds.h: add DWARF v5 sections
	vdpa/mlx5: fix param validation in mlx5_vdpa_get_config()
	debugfs: be more robust at handling improper input in debugfs_lookup()
	debugfs: do not attempt to create a new file before the filesystem is initalized
	scsi: libsas: docs: Remove notify_ha_event()
	scsi: qla2xxx: Fix mailbox Ch erroneous error
	kdb: Make memory allocations more robust
	w1: w1_therm: Fix conversion result for negative temperatures
	PCI: qcom: Use PHY_REFCLK_USE_PAD only for ipq8064
	PCI: Decline to resize resources if boot config must be preserved
	virt: vbox: Do not use wait_event_interruptible when called from kernel context
	bfq: Avoid false bfq queue merging
	ALSA: usb-audio: Fix PCM buffer allocation in non-vmalloc mode
	MIPS: vmlinux.lds.S: add missing PAGE_ALIGNED_DATA() section
	vmlinux.lds.h: Define SANTIZER_DISCARDS with CONFIG_GCOV_KERNEL=y
	random: fix the RNDRESEEDCRNG ioctl
	ALSA: pcm: Call sync_stop at disconnection
	ALSA: pcm: Assure sync with the pending stop operation at suspend
	ALSA: pcm: Don't call sync_stop if it hasn't been stopped
	drm/i915/gt: One more flush for Baytrail clear residuals
	ath10k: Fix error handling in case of CE pipe init failure
	Bluetooth: btqcomsmd: Fix a resource leak in error handling paths in the probe function
	Bluetooth: hci_uart: Fix a race for write_work scheduling
	Bluetooth: Fix initializing response id after clearing struct
	arm64: dts: renesas: beacon kit: Fix choppy Bluetooth Audio
	arm64: dts: renesas: beacon: Fix audio-1.8V pin enable
	ARM: dts: exynos: correct PMIC interrupt trigger level on Artik 5
	ARM: dts: exynos: correct PMIC interrupt trigger level on Monk
	ARM: dts: exynos: correct PMIC interrupt trigger level on Rinato
	ARM: dts: exynos: correct PMIC interrupt trigger level on Spring
	ARM: dts: exynos: correct PMIC interrupt trigger level on Arndale Octa
	ARM: dts: exynos: correct PMIC interrupt trigger level on Odroid XU3 family
	arm64: dts: exynos: correct PMIC interrupt trigger level on TM2
	arm64: dts: exynos: correct PMIC interrupt trigger level on Espresso
	memory: mtk-smi: Fix PM usage counter unbalance in mtk_smi ops
	Bluetooth: hci_qca: Fix memleak in qca_controller_memdump
	staging: vchiq: Fix bulk userdata handling
	staging: vchiq: Fix bulk transfers on 64-bit builds
	arm64: dts: qcom: msm8916-samsung-a5u: Fix iris compatible
	net: stmmac: dwmac-meson8b: fix enabling the timing-adjustment clock
	bpf: Add bpf_patch_call_args prototype to include/linux/bpf.h
	bpf: Avoid warning when re-casting __bpf_call_base into __bpf_call_base_args
	firmware: arm_scmi: Fix call site of scmi_notification_exit
	arm64: dts: allwinner: A64: properly connect USB PHY to port 0
	arm64: dts: allwinner: H6: properly connect USB PHY to port 0
	arm64: dts: allwinner: Drop non-removable from SoPine/LTS SD card
	arm64: dts: allwinner: H6: Allow up to 150 MHz MMC bus frequency
	arm64: dts: allwinner: A64: Limit MMC2 bus frequency to 150 MHz
	arm64: dts: qcom: msm8916-samsung-a2015: Fix sensors
	cpufreq: brcmstb-avs-cpufreq: Free resources in error path
	cpufreq: brcmstb-avs-cpufreq: Fix resource leaks in ->remove()
	arm64: dts: rockchip: rk3328: Add clock_in_out property to gmac2phy node
	ACPICA: Fix exception code class checks
	usb: gadget: u_audio: Free requests only after callback
	arm64: dts: qcom: sdm845-db845c: Fix reset-pin of ov8856 node
	soc: qcom: socinfo: Fix an off by one in qcom_show_pmic_model()
	soc: ti: pm33xx: Fix some resource leak in the error handling paths of the probe function
	staging: media: atomisp: Fix size_t format specifier in hmm_alloc() debug statemenet
	Bluetooth: drop HCI device reference before return
	Bluetooth: Put HCI device if inquiry procedure interrupts
	memory: ti-aemif: Drop child node when jumping out loop
	ARM: dts: Configure missing thermal interrupt for 4430
	usb: dwc2: Do not update data length if it is 0 on inbound transfers
	usb: dwc2: Abort transaction after errors with unknown reason
	usb: dwc2: Make "trimming xfer length" a debug message
	staging: rtl8723bs: wifi_regd.c: Fix incorrect number of regulatory rules
	x86/MSR: Filter MSR writes through X86_IOC_WRMSR_REGS ioctl too
	arm64: dts: renesas: beacon: Fix EEPROM compatible value
	can: mcp251xfd: mcp251xfd_probe(): fix errata reference
	ARM: dts: armada388-helios4: assign pinctrl to LEDs
	ARM: dts: armada388-helios4: assign pinctrl to each fan
	arm64: dts: armada-3720-turris-mox: rename u-boot mtd partition to a53-firmware
	opp: Correct debug message in _opp_add_static_v2()
	Bluetooth: btusb: Fix memory leak in btusb_mtk_wmt_recv
	soc: qcom: ocmem: don't return NULL in of_get_ocmem
	arm64: dts: msm8916: Fix reserved and rfsa nodes unit address
	arm64: dts: meson: fix broken wifi node for Khadas VIM3L
	iwlwifi: mvm: set enabled in the PPAG command properly
	ARM: s3c: fix fiq for clang IAS
	optee: simplify i2c access
	staging: wfx: fix possible panic with re-queued frames
	ARM: at91: use proper asm syntax in pm_suspend
	ath10k: Fix suspicious RCU usage warning in ath10k_wmi_tlv_parse_peer_stats_info()
	ath10k: Fix lockdep assertion warning in ath10k_sta_statistics
	ath11k: fix a locking bug in ath11k_mac_op_start()
	soc: aspeed: snoop: Add clock control logic
	iwlwifi: mvm: fix the type we use in the PPAG table validity checks
	iwlwifi: mvm: store PPAG enabled/disabled flag properly
	iwlwifi: mvm: send stored PPAG command instead of local
	iwlwifi: mvm: assign SAR table revision to the command later
	iwlwifi: mvm: don't check if CSA event is running before removing
	bpf_lru_list: Read double-checked variable once without lock
	iwlwifi: pnvm: set the PNVM again if it was already loaded
	iwlwifi: pnvm: increment the pointer before checking the TLV
	ath9k: fix data bus crash when setting nf_override via debugfs
	selftests/bpf: Convert test_xdp_redirect.sh to bash
	ibmvnic: Set to CLOSED state even on error
	bnxt_en: reverse order of TX disable and carrier off
	bnxt_en: Fix devlink info's stored fw.psid version format.
	xen/netback: fix spurious event detection for common event case
	dpaa2-eth: fix memory leak in XDP_REDIRECT
	net: phy: consider that suspend2ram may cut off PHY power
	net/mlx5e: Don't change interrupt moderation params when DIM is enabled
	net/mlx5e: Change interrupt moderation channel params also when channels are closed
	net/mlx5: Fix health error state handling
	net/mlx5e: Replace synchronize_rcu with synchronize_net
	net/mlx5e: kTLS, Use refcounts to free kTLS RX priv context
	net/mlx5: Disable devlink reload for multi port slave device
	net/mlx5: Disallow RoCE on multi port slave device
	net/mlx5: Disallow RoCE on lag device
	net/mlx5: Disable devlink reload for lag devices
	net/mlx5e: CT: manage the lifetime of the ct entry object
	net/mlx5e: Check tunnel offload is required before setting SWP
	mac80211: fix potential overflow when multiplying to u32 integers
	libbpf: Ignore non function pointer member in struct_ops
	bpf: Fix an unitialized value in bpf_iter
	bpf, devmap: Use GFP_KERNEL for xdp bulk queue allocation
	bpf: Fix bpf_fib_lookup helper MTU check for SKB ctx
	selftests: mptcp: fix ACKRX debug message
	tcp: fix SO_RCVLOWAT related hangs under mem pressure
	net: axienet: Handle deferred probe on clock properly
	cxgb4/chtls/cxgbit: Keeping the max ofld immediate data size same in cxgb4 and ulds
	b43: N-PHY: Fix the update of coef for the PHY revision >= 3case
	bpf: Clear subreg_def for global function return values
	ibmvnic: add memory barrier to protect long term buffer
	ibmvnic: skip send_request_unmap for timeout reset
	net: dsa: felix: perform teardown in reverse order of setup
	net: dsa: felix: don't deinitialize unused ports
	net: phy: mscc: adding LCPLL reset to VSC8514
	net: amd-xgbe: Reset the PHY rx data path when mailbox command timeout
	net: amd-xgbe: Fix NETDEV WATCHDOG transmit queue timeout warning
	net: amd-xgbe: Reset link when the link never comes back
	net: amd-xgbe: Fix network fluctuations when using 1G BELFUSE SFP
	net: mvneta: Remove per-cpu queue mapping for Armada 3700
	net: enetc: fix destroyed phylink dereference during unbind
	tty: convert tty_ldisc_ops 'read()' function to take a kernel pointer
	tty: implement read_iter
	fbdev: aty: SPARC64 requires FB_ATY_CT
	drm/gma500: Fix error return code in psb_driver_load()
	gma500: clean up error handling in init
	drm/fb-helper: Add missed unlocks in setcmap_legacy()
	drm/panel: mantix: Tweak init sequence
	drm/vc4: hdmi: Take into account the clock doubling flag in atomic_check
	crypto: sun4i-ss - linearize buffers content must be kept
	crypto: sun4i-ss - fix kmap usage
	crypto: arm64/aes-ce - really hide slower algos when faster ones are enabled
	hwrng: ingenic - Fix a resource leak in an error handling path
	media: allegro: Fix use after free on error
	kcsan: Rewrite kcsan_prandom_u32_max() without prandom_u32_state()
	drm: rcar-du: Fix PM reference leak in rcar_cmm_enable()
	drm: rcar-du: Fix crash when using LVDS1 clock for CRTC
	drm: rcar-du: Fix the return check of of_parse_phandle and of_find_device_by_node
	drm/amdgpu: Fix macro name _AMDGPU_TRACE_H_ in preprocessor if condition
	MIPS: c-r4k: Fix section mismatch for loongson2_sc_init
	MIPS: lantiq: Explicitly compare LTQ_EBU_PCC_ISTAT against 0
	drm/virtio: make sure context is created in gem open
	drm/fourcc: fix Amlogic format modifier masks
	media: ipu3-cio2: Build only for x86
	media: i2c: ov5670: Fix PIXEL_RATE minimum value
	media: imx: Unregister csc/scaler only if registered
	media: imx: Fix csc/scaler unregister
	media: mtk-vcodec: fix error return code in vdec_vp9_decode()
	media: camss: missing error code in msm_video_register()
	media: vsp1: Fix an error handling path in the probe function
	media: em28xx: Fix use-after-free in em28xx_alloc_urbs
	media: media/pci: Fix memleak in empress_init
	media: tm6000: Fix memleak in tm6000_start_stream
	media: aspeed: fix error return code in aspeed_video_setup_video()
	ASoC: cs42l56: fix up error handling in probe
	ASoC: qcom: qdsp6: Move frontend AIFs to q6asm-dai
	evm: Fix memleak in init_desc
	crypto: bcm - Rename struct device_private to bcm_device_private
	sched/fair: Avoid stale CPU util_est value for schedutil in task dequeue
	drm/sun4i: tcon: fix inverted DCLK polarity
	media: imx7: csi: Fix regression for parallel cameras on i.MX6UL
	media: imx7: csi: Fix pad link validation
	media: ti-vpe: cal: fix write to unallocated memory
	MIPS: properly stop .eh_frame generation
	MIPS: Compare __SYNC_loongson3_war against 0
	drm/tegra: Fix reference leak when pm_runtime_get_sync() fails
	drm/amdgpu: toggle on DF Cstate after finishing xgmi injection
	bsg: free the request before return error code
	macintosh/adb-iop: Use big-endian autopoll mask
	drm/amd/display: Fix 10/12 bpc setup in DCE output bit depth reduction.
	drm/amd/display: Fix HDMI deep color output for DCE 6-11.
	media: software_node: Fix refcounts in software_node_get_next_child()
	media: lmedm04: Fix misuse of comma
	media: vidtv: psi: fix missing crc for PMT
	media: atomisp: Fix a buffer overflow in debug code
	media: qm1d1c0042: fix error return code in qm1d1c0042_init()
	media: cx25821: Fix a bug when reallocating some dma memory
	media: mtk-vcodec: fix argument used when DEBUG is defined
	media: pxa_camera: declare variable when DEBUG is defined
	media: uvcvideo: Accept invalid bFormatIndex and bFrameIndex values
	sched/eas: Don't update misfit status if the task is pinned
	f2fs: compress: fix potential deadlock
	ASoC: qcom: lpass-cpu: Remove bit clock state check
	ASoC: SOF: Intel: hda: cancel D0i3 work during runtime suspend
	perf/arm-cmn: Fix PMU instance naming
	perf/arm-cmn: Move IRQs when migrating context
	mtd: parser: imagetag: fix error codes in bcm963xx_parse_imagetag_partitions()
	crypto: talitos - Work around SEC6 ERRATA (AES-CTR mode data size error)
	crypto: talitos - Fix ctr(aes) on SEC1
	drm/nouveau: bail out of nouveau_channel_new if channel init fails
	mm: proc: Invalidate TLB after clearing soft-dirty page state
	ata: ahci_brcm: Add back regulators management
	ASoC: cpcap: fix microphone timeslot mask
	ASoC: codecs: add missing max_register in regmap config
	mtd: parsers: afs: Fix freeing the part name memory in failure
	f2fs: fix to avoid inconsistent quota data
	drm/amdgpu: Prevent shift wrapping in amdgpu_read_mask()
	f2fs: fix a wrong condition in __submit_bio
	ASoC: qcom: Fix typo error in HDMI regmap config callbacks
	KVM: nSVM: Don't strip host's C-bit from guest's CR3 when reading PDPTRs
	drm/mediatek: Check if fb is null
	Drivers: hv: vmbus: Avoid use-after-free in vmbus_onoffer_rescind()
	ASoC: Intel: sof_sdw: add missing TGL_HDMI quirk for Dell SKU 0A5E
	ASoC: Intel: sof_sdw: add missing TGL_HDMI quirk for Dell SKU 0A3E
	locking/lockdep: Avoid unmatched unlock
	ASoC: qcom: lpass: Fix i2s ctl register bit map
	ASoC: rt5682: Fix panic in rt5682_jack_detect_handler happening during system shutdown
	ASoC: SOF: debug: Fix a potential issue on string buffer termination
	btrfs: clarify error returns values in __load_free_space_cache
	btrfs: fix double accounting of ordered extent for subpage case in btrfs_invalidapge
	KVM: x86: Restore all 64 bits of DR6 and DR7 during RSM on x86-64
	s390/zcrypt: return EIO when msg retry limit reached
	drm/vc4: hdmi: Move hdmi reset to bind
	drm/vc4: hdmi: Fix register offset with longer CEC messages
	drm/vc4: hdmi: Fix up CEC registers
	drm/vc4: hdmi: Restore cec physical address on reconnect
	drm/vc4: hdmi: Compute the CEC clock divider from the clock rate
	drm/vc4: hdmi: Update the CEC clock divider on HSM rate change
	drm/lima: fix reference leak in lima_pm_busy
	drm/dp_mst: Don't cache EDIDs for physical ports
	hwrng: timeriomem - Fix cooldown period calculation
	crypto: ecdh_helper - Ensure 'len >= secret.len' in decode_key()
	io_uring: fix possible deadlock in io_uring_poll
	nvmet-tcp: fix receive data digest calculation for multiple h2cdata PDUs
	nvmet-tcp: fix potential race of tcp socket closing accept_work
	nvme-multipath: set nr_zones for zoned namespaces
	nvmet: remove extra variable in identify ns
	nvmet: set status to 0 in case for invalid nsid
	ASoC: SOF: sof-pci-dev: add missing Up-Extreme quirk
	ima: Free IMA measurement buffer on error
	ima: Free IMA measurement buffer after kexec syscall
	ASoC: simple-card-utils: Fix device module clock
	fs/jfs: fix potential integer overflow on shift of a int
	jffs2: fix use after free in jffs2_sum_write_data()
	ubifs: Fix memleak in ubifs_init_authentication
	ubifs: replay: Fix high stack usage, again
	ubifs: Fix error return code in alloc_wbufs()
	irqchip/imx: IMX_INTMUX should not default to y, unconditionally
	smp: Process pending softirqs in flush_smp_call_function_from_idle()
	drm/amdgpu/display: remove hdcp_srm sysfs on device removal
	capabilities: Don't allow writing ambiguous v3 file capabilities
	HSI: Fix PM usage counter unbalance in ssi_hw_init
	power: supply: cpcap: Add missing IRQF_ONESHOT to fix regression
	clk: meson: clk-pll: fix initializing the old rate (fallback) for a PLL
	clk: meson: clk-pll: make "ret" a signed integer
	clk: meson: clk-pll: propagate the error from meson_clk_pll_set_rate()
	selftests/powerpc: Make the test check in eeh-basic.sh posix compliant
	regulator: qcom-rpmh-regulator: add pm8009-1 chip revision
	arm64: dts: qcom: qrb5165-rb5: fix pm8009 regulators
	quota: Fix memory leak when handling corrupted quota file
	i2c: iproc: handle only slave interrupts which are enabled
	i2c: iproc: update slave isr mask (ISR_MASK_SLAVE)
	i2c: iproc: handle master read request
	spi: cadence-quadspi: Abort read if dummy cycles required are too many
	clk: sunxi-ng: h6: Fix CEC clock
	clk: renesas: r8a779a0: Remove non-existent S2 clock
	clk: renesas: r8a779a0: Fix parent of CBFUSA clock
	HID: core: detect and skip invalid inputs to snto32()
	RDMA/siw: Fix handling of zero-sized Read and Receive Queues.
	dmaengine: fsldma: Fix a resource leak in the remove function
	dmaengine: fsldma: Fix a resource leak in an error handling path of the probe function
	dmaengine: owl-dma: Fix a resource leak in the remove function
	dmaengine: hsu: disable spurious interrupt
	mfd: bd9571mwv: Use devm_mfd_add_devices()
	power: supply: cpcap-charger: Fix missing power_supply_put()
	power: supply: cpcap-battery: Fix missing power_supply_put()
	power: supply: cpcap-charger: Fix power_supply_put on null battery pointer
	fdt: Properly handle "no-map" field in the memory region
	of/fdt: Make sure no-map does not remove already reserved regions
	RDMA/rtrs: Extend ibtrs_cq_qp_create
	RDMA/rtrs-srv: Release lock before call into close_sess
	RDMA/rtrs-srv: Use sysfs_remove_file_self for disconnect
	RDMA/rtrs-clt: Set mininum limit when create QP
	RDMA/rtrs: Call kobject_put in the failure path
	RDMA/rtrs-srv: Fix missing wr_cqe
	RDMA/rtrs-clt: Refactor the failure cases in alloc_clt
	RDMA/rtrs-srv: Init wr_cnt as 1
	power: reset: at91-sama5d2_shdwc: fix wkupdbc mask
	rtc: s5m: select REGMAP_I2C
	dmaengine: idxd: set DMA channel to be private
	power: supply: fix sbs-charger build, needs REGMAP_I2C
	clocksource/drivers/ixp4xx: Select TIMER_OF when needed
	clocksource/drivers/mxs_timer: Add missing semicolon when DEBUG is defined
	spi: imx: Don't print error on -EPROBEDEFER
	RDMA/mlx5: Use the correct obj_id upon DEVX TIR creation
	IB/mlx5: Add mutex destroy call to cap_mask_mutex mutex
	clk: sunxi-ng: h6: Fix clock divider range on some clocks
	platform/chrome: cros_ec_proto: Use EC_HOST_EVENT_MASK not BIT
	platform/chrome: cros_ec_proto: Add LID and BATTERY to default mask
	regulator: axp20x: Fix reference cout leak
	watch_queue: Drop references to /dev/watch_queue
	certs: Fix blacklist flag type confusion
	regulator: s5m8767: Fix reference count leak
	spi: atmel: Put allocated master before return
	regulator: s5m8767: Drop regulators OF node reference
	power: supply: axp20x_usb_power: Init work before enabling IRQs
	power: supply: smb347-charger: Fix interrupt usage if interrupt is unavailable
	regulator: core: Avoid debugfs: Directory ... already present! error
	isofs: release buffer head before return
	watchdog: intel-mid_wdt: Postpone IRQ handler registration till SCU is ready
	auxdisplay: ht16k33: Fix refresh rate handling
	objtool: Fix error handling for STD/CLD warnings
	objtool: Fix retpoline detection in asm code
	objtool: Fix ".cold" section suffix check for newer versions of GCC
	scsi: lpfc: Fix ancient double free
	iommu: Switch gather->end to the inclusive end
	IB/umad: Return EIO in case of when device disassociated
	IB/umad: Return EPOLLERR in case of when device disassociated
	KVM: PPC: Make the VMX instruction emulation routines static
	powerpc/47x: Disable 256k page size
	powerpc/time: Enable sched clock for irqtime
	mmc: owl-mmc: Fix a resource leak in an error handling path and in the remove function
	mmc: sdhci-sprd: Fix some resource leaks in the remove function
	mmc: usdhi6rol0: Fix a resource leak in the error handling path of the probe
	mmc: renesas_sdhi_internal_dmac: Fix DMA buffer alignment from 8 to 128-bytes
	ARM: 9046/1: decompressor: Do not clear SCTLR.nTLSMD for ARMv7+ cores
	i2c: qcom-geni: Store DMA mapping data in geni_i2c_dev struct
	amba: Fix resource leak for drivers without .remove
	iommu: Move iotlb_sync_map out from __iommu_map
	iommu: Properly pass gfp_t in _iommu_map() to avoid atomic sleeping
	IB/mlx5: Return appropriate error code instead of ENOMEM
	IB/cm: Avoid a loop when device has 255 ports
	tracepoint: Do not fail unregistering a probe due to memory failure
	rtc: zynqmp: depend on HAS_IOMEM
	perf tools: Fix DSO filtering when not finding a map for a sampled address
	perf vendor events arm64: Fix Ampere eMag event typo
	RDMA/rxe: Fix coding error in rxe_recv.c
	RDMA/rxe: Fix coding error in rxe_rcv_mcast_pkt
	RDMA/rxe: Correct skb on loopback path
	spi: stm32: properly handle 0 byte transfer
	mfd: altera-sysmgr: Fix physical address storing more
	mfd: wm831x-auxadc: Prevent use after free in wm831x_auxadc_read_irq()
	powerpc/pseries/dlpar: handle ibm, configure-connector delay status
	powerpc/8xx: Fix software emulation interrupt
	clk: qcom: gcc-msm8998: Fix Alpha PLL type for all GPLLs
	kunit: tool: fix unit test cleanup handling
	kselftests: dmabuf-heaps: Fix Makefile's inclusion of the kernel's usr/include dir
	RDMA/hns: Fixed wrong judgments in the goto branch
	RDMA/siw: Fix calculation of tx_valid_cpus size
	RDMA/hns: Fix type of sq_signal_bits
	RDMA/hns: Disable RQ inline by default
	clk: divider: fix initialization with parent_hw
	spi: pxa2xx: Fix the controller numbering for Wildcat Point
	powerpc/uaccess: Avoid might_fault() when user access is enabled
	powerpc/kuap: Restore AMR after replaying soft interrupts
	regulator: qcom-rpmh: fix pm8009 ldo7
	clk: aspeed: Fix APLL calculate formula from ast2600-A2
	selftests/ftrace: Update synthetic event syntax errors
	perf symbols: Use (long) for iterator for bfd symbols
	regulator: bd718x7, bd71828, Fix dvs voltage levels
	spi: dw: Avoid stack content exposure
	spi: Skip zero-length transfers in spi_transfer_one_message()
	printk: avoid prb_first_valid_seq() where possible
	perf symbols: Fix return value when loading PE DSO
	nfsd: register pernet ops last, unregister first
	svcrdma: Hold private mutex while invoking rdma_accept()
	ceph: fix flush_snap logic after putting caps
	RDMA/hns: Fixes missing error code of CMDQ
	RDMA/ucma: Fix use-after-free bug in ucma_create_uevent
	RDMA/rtrs-srv: Fix stack-out-of-bounds
	RDMA/rtrs: Only allow addition of path to an already established session
	RDMA/rtrs-srv: fix memory leak by missing kobject free
	RDMA/rtrs-srv-sysfs: fix missing put_device
	RDMA/rtrs-srv: Do not pass a valid pointer to PTR_ERR()
	Input: sur40 - fix an error code in sur40_probe()
	perf record: Fix continue profiling after draining the buffer
	perf intel-pt: Fix missing CYC processing in PSB
	perf intel-pt: Fix premature IPC
	perf intel-pt: Fix IPC with CYC threshold
	perf test: Fix unaligned access in sample parsing test
	Input: elo - fix an error code in elo_connect()
	sparc64: only select COMPAT_BINFMT_ELF if BINFMT_ELF is set
	sparc: fix led.c driver when PROC_FS is not enabled
	Input: zinitix - fix return type of zinitix_init_touch()
	ARM: 9065/1: OABI compat: fix build when EPOLL is not enabled
	misc: eeprom_93xx46: Fix module alias to enable module autoprobe
	phy: rockchip-emmc: emmc_phy_init() always return 0
	phy: cadence-torrent: Fix error code in cdns_torrent_phy_probe()
	misc: eeprom_93xx46: Add module alias to avoid breaking support for non device tree users
	PCI: rcar: Always allocate MSI addresses in 32bit space
	soundwire: cadence: fix ACK/NAK handling
	pwm: rockchip: Enable APB clock during register access while probing
	pwm: rockchip: rockchip_pwm_probe(): Remove superfluous clk_unprepare()
	pwm: rockchip: Eliminate potential race condition when probing
	PCI: xilinx-cpm: Fix reference count leak on error path
	VMCI: Use set_page_dirty_lock() when unregistering guest memory
	PCI: Align checking of syscall user config accessors
	mei: hbm: call mei_set_devstate() on hbm stop response
	drm/msm: Fix MSM_INFO_GET_IOVA with carveout
	drm/msm/dsi: Correct io_start for MSM8994 (20nm PHY)
	drm/msm/mdp5: Fix wait-for-commit for cmd panels
	drm/msm: Fix race of GPU init vs timestamp power management.
	drm/msm: Fix races managing the OOB state for timestamp vs timestamps.
	drm/msm/dp: trigger unplug event in msm_dp_display_disable
	vfio/iommu_type1: Populate full dirty when detach non-pinned group
	vfio/iommu_type1: Fix some sanity checks in detach group
	vfio-pci/zdev: fix possible segmentation fault issue
	ext4: fix potential htree index checksum corruption
	phy: USB_LGM_PHY should depend on X86
	coresight: etm4x: Skip accessing TRCPDCR in save/restore
	nvmem: core: Fix a resource leak on error in nvmem_add_cells_from_of()
	nvmem: core: skip child nodes not matching binding
	soundwire: bus: use sdw_update_no_pm when initializing a device
	soundwire: bus: use sdw_write_no_pm when setting the bus scale registers
	soundwire: export sdw_write/read_no_pm functions
	soundwire: bus: fix confusion on device used by pm_runtime
	misc: fastrpc: fix incorrect usage of dma_map_sgtable
	remoteproc/mediatek: acknowledge watchdog IRQ after handled
	regmap: sdw: use _no_pm functions in regmap_read/write
	ext: EXT4_KUNIT_TESTS should depend on EXT4_FS instead of selecting it
	mailbox: sprd: correct definition of SPRD_OUTBOX_FIFO_FULL
	device-dax: Fix default return code of range_parse()
	PCI: pci-bridge-emul: Fix array overruns, improve safety
	PCI: cadence: Fix DMA range mapping early return error
	i40e: Fix flow for IPv6 next header (extension header)
	i40e: Add zero-initialization of AQ command structures
	i40e: Fix overwriting flow control settings during driver loading
	i40e: Fix addition of RX filters after enabling FW LLDP agent
	i40e: Fix VFs not created
	Take mmap lock in cacheflush syscall
	nios2: fixed broken sys_clone syscall
	i40e: Fix add TC filter for IPv6
	octeontx2-af: Fix an off by one in rvu_dbg_qsize_write()
	pwm: iqs620a: Fix overflow and optimize calculations
	vfio/type1: Use follow_pte()
	ice: report correct max number of TCs
	ice: Account for port VLAN in VF max packet size calculation
	ice: Fix state bits on LLDP mode switch
	ice: update the number of available RSS queues
	net: stmmac: fix CBS idleslope and sendslope calculation
	net/mlx4_core: Add missed mlx4_free_cmd_mailbox()
	PCI: rockchip: Make 'ep-gpios' DT property optional
	vxlan: move debug check after netdev unregister
	wireguard: device: do not generate ICMP for non-IP packets
	wireguard: kconfig: use arm chacha even with no neon
	ocfs2: fix a use after free on error
	mm: memcontrol: fix NR_ANON_THPS accounting in charge moving
	mm: memcontrol: fix slub memory accounting
	mm/memory.c: fix potential pte_unmap_unlock pte error
	mm/hugetlb: fix potential double free in hugetlb_register_node() error path
	mm/hugetlb: suppress wrong warning info when alloc gigantic page
	mm/compaction: fix misbehaviors of fast_find_migrateblock()
	r8169: fix jumbo packet handling on RTL8168e
	NFSv4: Fixes for nfs4_bitmask_adjust()
	KVM: SVM: Intercept INVPCID when it's disabled to inject #UD
	KVM: x86/mmu: Expand collapsible SPTE zap for TDP MMU to ZONE_DEVICE and HugeTLB pages
	arm64: Add missing ISB after invalidating TLB in __primary_switch
	i2c: brcmstb: Fix brcmstd_send_i2c_cmd condition
	i2c: exynos5: Preserve high speed master code
	mm,thp,shmem: make khugepaged obey tmpfs mount flags
	mm: fix memory_failure() handling of dax-namespace metadata
	mm/rmap: fix potential pte_unmap on an not mapped pte
	proc: use kvzalloc for our kernel buffer
	csky: Fix a size determination in gpr_get()
	scsi: bnx2fc: Fix Kconfig warning & CNIC build errors
	scsi: sd: sd_zbc: Don't pass GFP_NOIO to kvcalloc
	block: reopen the device in blkdev_reread_part
	ide/falconide: Fix module unload
	scsi: sd: Fix Opal support
	blk-settings: align max_sectors on "logical_block_size" boundary
	soundwire: intel: fix possible crash when no device is detected
	ACPI: property: Fix fwnode string properties matching
	ACPI: configfs: add missing check after configfs_register_default_group()
	cpufreq: ACPI: Set cpuinfo.max_freq directly if max boost is known
	HID: logitech-dj: add support for keyboard events in eQUAD step 4 Gaming
	HID: wacom: Ignore attempts to overwrite the touch_max value from HID
	Input: raydium_ts_i2c - do not send zero length
	Input: xpad - add support for PowerA Enhanced Wired Controller for Xbox Series X|S
	Input: joydev - prevent potential read overflow in ioctl
	Input: i8042 - add ASUS Zenbook Flip to noselftest list
	media: mceusb: Fix potential out-of-bounds shift
	USB: serial: option: update interface mapping for ZTE P685M
	usb: musb: Fix runtime PM race in musb_queue_resume_work
	usb: dwc3: gadget: Fix setting of DEPCFG.bInterval_m1
	usb: dwc3: gadget: Fix dep->interval for fullspeed interrupt
	USB: serial: ftdi_sio: fix FTX sub-integer prescaler
	USB: serial: pl2303: fix line-speed handling on newer chips
	USB: serial: mos7840: fix error code in mos7840_write()
	USB: serial: mos7720: fix error code in mos7720_write()
	phy: lantiq: rcu-usb2: wait after clock enable
	ALSA: fireface: fix to parse sync status register of latter protocol
	ALSA: hda: Add another CometLake-H PCI ID
	ALSA: hda/hdmi: Drop bogus check at closing a stream
	ALSA: hda/realtek: modify EAPD in the ALC886
	ALSA: hda/realtek: Quirk for HP Spectre x360 14 amp setup
	MIPS: Ingenic: Disable HPTLB for D0 XBurst CPUs too
	MIPS: Support binutils configured with --enable-mips-fix-loongson3-llsc=yes
	MIPS: VDSO: Use CLANG_FLAGS instead of filtering out '--target='
	Revert "MIPS: Octeon: Remove special handling of CONFIG_MIPS_ELF_APPENDED_DTB=y"
	Revert "bcache: Kill btree_io_wq"
	bcache: Give btree_io_wq correct semantics again
	bcache: Move journal work to new flush wq
	Revert "drm/amd/display: Update NV1x SR latency values"
	drm/amd/display: Add FPU wrappers to dcn21_validate_bandwidth()
	drm/amd/display: Remove Assert from dcn10_get_dig_frontend
	drm/amd/display: Add vupdate_no_lock interrupts for DCN2.1
	drm/amdkfd: Fix recursive lock warnings
	drm/amdgpu: Set reference clock to 100Mhz on Renoir (v2)
	drm/nouveau/kms: handle mDP connectors
	drm/modes: Switch to 64bit maths to avoid integer overflow
	drm/sched: Cancel and flush all outstanding jobs before finish.
	drm/panel: kd35t133: allow using non-continuous dsi clock
	drm/rockchip: Require the YTR modifier for AFBC
	ASoC: siu: Fix build error by a wrong const prefix
	selinux: fix inconsistency between inode_getxattr and inode_listsecurity
	erofs: initialized fields can only be observed after bit is set
	tpm_tis: Fix check_locality for correct locality acquisition
	tpm_tis: Clean up locality release
	KEYS: trusted: Fix incorrect handling of tpm_get_random()
	KEYS: trusted: Fix migratable=1 failing
	KEYS: trusted: Reserve TPM for seal and unseal operations
	btrfs: do not cleanup upper nodes in btrfs_backref_cleanup_node
	btrfs: do not warn if we can't find the reloc root when looking up backref
	btrfs: add asserts for deleting backref cache nodes
	btrfs: abort the transaction if we fail to inc ref in btrfs_copy_root
	btrfs: fix reloc root leak with 0 ref reloc roots on recovery
	btrfs: splice remaining dirty_bg's onto the transaction dirty bg list
	btrfs: handle space_info::total_bytes_pinned inside the delayed ref itself
	btrfs: account for new extents being deleted in total_bytes_pinned
	btrfs: fix extent buffer leak on failure to copy root
	drm/i915/gt: Flush before changing register state
	drm/i915/gt: Correct surface base address for renderclear
	crypto: arm64/sha - add missing module aliases
	crypto: aesni - prevent misaligned buffers on the stack
	crypto: michael_mic - fix broken misalignment handling
	crypto: sun4i-ss - checking sg length is not sufficient
	crypto: sun4i-ss - IV register does not work on A10 and A13
	crypto: sun4i-ss - handle BigEndian for cipher
	crypto: sun4i-ss - initialize need_fallback
	soc: samsung: exynos-asv: don't defer early on not-supported SoCs
	soc: samsung: exynos-asv: handle reading revision register error
	seccomp: Add missing return in non-void function
	arm64: ptrace: Fix seccomp of traced syscall -1 (NO_SYSCALL)
	misc: rtsx: init of rts522a add OCP power off when no card is present
	drivers/misc/vmw_vmci: restrict too big queue size in qp_host_alloc_queue
	pstore: Fix typo in compression option name
	dts64: mt7622: fix slow sd card access
	arm64: dts: agilex: fix phy interface bit shift for gmac1 and gmac2
	staging/mt7621-dma: mtk-hsdma.c->hsdma-mt7621.c
	staging: gdm724x: Fix DMA from stack
	staging: rtl8188eu: Add Edimax EW-7811UN V2 to device table
	floppy: reintroduce O_NDELAY fix
	media: i2c: max9286: fix access to unallocated memory
	media: ir_toy: add another IR Droid device
	media: ipu3-cio2: Fix mbus_code processing in cio2_subdev_set_fmt()
	media: marvell-ccic: power up the device on mclk enable
	media: smipcie: fix interrupt handling and IR timeout
	x86/virt: Eat faults on VMXOFF in reboot flows
	x86/reboot: Force all cpus to exit VMX root if VMX is supported
	x86/fault: Fix AMD erratum #91 errata fixup for user code
	x86/entry: Fix instrumentation annotation
	powerpc/prom: Fix "ibm,arch-vec-5-platform-support" scan
	rcu: Pull deferred rcuog wake up to rcu_eqs_enter() callers
	rcu/nocb: Perform deferred wake up before last idle's need_resched() check
	kprobes: Fix to delay the kprobes jump optimization
	arm64: Extend workaround for erratum 1024718 to all versions of Cortex-A55
	iommu/arm-smmu-qcom: Fix mask extraction for bootloader programmed SMRs
	arm64: kexec_file: fix memory leakage in create_dtb() when fdt_open_into() fails
	arm64: uprobe: Return EOPNOTSUPP for AARCH32 instruction probing
	arm64 module: set plt* section addresses to 0x0
	arm64: spectre: Prevent lockdep splat on v4 mitigation enable path
	riscv: Disable KSAN_SANITIZE for vDSO
	watchdog: qcom: Remove incorrect usage of QCOM_WDT_ENABLE_IRQ
	watchdog: mei_wdt: request stop on unregister
	coresight: etm4x: Handle accesses to TRCSTALLCTLR
	mtd: spi-nor: sfdp: Fix last erase region marking
	mtd: spi-nor: sfdp: Fix wrong erase type bitmask for overlaid region
	mtd: spi-nor: core: Fix erase type discovery for overlaid region
	mtd: spi-nor: core: Add erase size check for erase command initialization
	mtd: spi-nor: hisi-sfc: Put child node np on error path
	fs/affs: release old buffer head on error path
	seq_file: document how per-entry resources are managed.
	x86: fix seq_file iteration for pat/memtype.c
	mm: memcontrol: fix swap undercounting in cgroup2
	mm: memcontrol: fix get_active_memcg return value
	hugetlb: fix update_and_free_page contig page struct assumption
	hugetlb: fix copy_huge_page_from_user contig page struct assumption
	mm/vmscan: restore zone_reclaim_mode ABI
	mm, compaction: make fast_isolate_freepages() stay within zone
	KVM: nSVM: fix running nested guests when npt=0
	nvmem: qcom-spmi-sdam: Fix uninitialized pdev pointer
	module: Ignore _GLOBAL_OFFSET_TABLE_ when warning for undefined symbols
	mmc: sdhci-esdhc-imx: fix kernel panic when remove module
	mmc: sdhci-pci-o2micro: Bug fix for SDR104 HW tuning failure
	powerpc/32: Preserve cr1 in exception prolog stack check to fix build error
	powerpc/kexec_file: fix FDT size estimation for kdump kernel
	powerpc/32s: Add missing call to kuep_lock on syscall entry
	spmi: spmi-pmic-arb: Fix hw_irq overflow
	mei: fix transfer over dma with extended header
	mei: me: emmitsburg workstation DID
	mei: me: add adler lake point S DID
	mei: me: add adler lake point LP DID
	gpio: pcf857x: Fix missing first interrupt
	mfd: gateworks-gsc: Fix interrupt type
	printk: fix deadlock when kernel panic
	exfat: fix shift-out-of-bounds in exfat_fill_super()
	zonefs: Fix file size of zones in full condition
	kcmp: Support selection of SYS_kcmp without CHECKPOINT_RESTORE
	thermal: cpufreq_cooling: freq_qos_update_request() returns < 0 on error
	cpufreq: qcom-hw: drop devm_xxx() calls from init/exit hooks
	cpufreq: intel_pstate: Change intel_pstate_get_hwp_max() argument
	cpufreq: intel_pstate: Get per-CPU max freq via MSR_HWP_CAPABILITIES if available
	proc: don't allow async path resolution of /proc/thread-self components
	s390/vtime: fix inline assembly clobber list
	virtio/s390: implement virtio-ccw revision 2 correctly
	um: mm: check more comprehensively for stub changes
	um: defer killing userspace on page table update failures
	irqchip/loongson-pch-msi: Use bitmap_zalloc() to allocate bitmap
	f2fs: fix out-of-repair __setattr_copy()
	f2fs: enforce the immutable flag on open files
	f2fs: flush data when enabling checkpoint back
	sparc32: fix a user-triggerable oops in clear_user()
	spi: fsl: invert spisel_boot signal on MPC8309
	spi: spi-synquacer: fix set_cs handling
	gfs2: fix glock confusion in function signal_our_withdraw
	gfs2: Don't skip dlm unlock if glock has an lvb
	gfs2: Lock imbalance on error path in gfs2_recover_one
	gfs2: Recursive gfs2_quota_hold in gfs2_iomap_end
	dm: fix deadlock when swapping to encrypted device
	dm table: fix iterate_devices based device capability checks
	dm table: fix DAX iterate_devices based device capability checks
	dm table: fix zoned iterate_devices based device capability checks
	dm writecache: fix performance degradation in ssd mode
	dm writecache: return the exact table values that were set
	dm writecache: fix writing beyond end of underlying device when shrinking
	dm era: Recover committed writeset after crash
	dm era: Update in-core bitset after committing the metadata
	dm era: Verify the data block size hasn't changed
	dm era: Fix bitset memory leaks
	dm era: Use correct value size in equality function of writeset tree
	dm era: Reinitialize bitset cache before digesting a new writeset
	dm era: only resize metadata in preresume
	drm/i915: Reject 446-480MHz HDMI clock on GLK
	kgdb: fix to kill breakpoints on initmem after boot
	ipv6: silence compilation warning for non-IPV6 builds
	net: icmp: pass zeroed opts from icmp{,v6}_ndo_send before sending
	wireguard: selftests: test multiple parallel streams
	wireguard: queueing: get rid of per-peer ring buffers
	net: sched: fix police ext initialization
	net: qrtr: Fix memory leak in qrtr_tun_open
	net_sched: fix RTNL deadlock again caused by request_module()
	ARM: dts: aspeed: Add LCLK to lpc-snoop
	Linux 5.10.20

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I3fbcecd9413ce212dac68d5cc800c9457feba56a
2021-03-07 12:33:33 +01:00
Andrii Nakryiko
faf4b1fba2 bpf: Add bpf_patch_call_args prototype to include/linux/bpf.h
[ Upstream commit a643bff752dcf72a07e1b2ab2f8587e4f51118be ]

Add bpf_patch_call_args() prototype. This function is called from BPF verifier
and only if CONFIG_BPF_JIT_ALWAYS_ON is not defined. This fixes compiler
warning about missing prototype in some kernel configurations.

Fixes: 1ea47e01ad ("bpf: add support for bpf_call to interpreter")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210112075520.4103414-2-andrii@kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-04 11:37:22 +01:00
Sami Tolvanen
e97c57662c ANDROID: bpf: disable CFI in dispatcher functions
BPF dispatcher functions are patched at runtime to perform direct
instead of indirect calls. Disable CFI for the dispatcher functions
to avoid conflicts.

Bug: 145210207
Change-Id: Iea72f5a9fe09dd5adbb90b0174945707f42594b0
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
2021-01-14 16:32:10 +00:00
Daniel Borkmann
4a8f87e60f bpf: Allow for map-in-map with dynamic inner array map entries
Recent work in f4d0525921 ("bpf: Add map_meta_equal map ops") and 134fede4ee
("bpf: Relax max_entries check for most of the inner map types") added support
for dynamic inner max elements for most map-in-map types. Exceptions were maps
like array or prog array where the map_gen_lookup() callback uses the maps'
max_entries field as a constant when emitting instructions.

We recently implemented Maglev consistent hashing into Cilium's load balancer
which uses map-in-map with an outer map being hash and inner being array holding
the Maglev backend table for each service. This has been designed this way in
order to reduce overall memory consumption given the outer hash map allows to
avoid preallocating a large, flat memory area for all services. Also, the
number of service mappings is not always known a-priori.

The use case for dynamic inner array map entries is to further reduce memory
overhead, for example, some services might just have a small number of back
ends while others could have a large number. Right now the Maglev backend table
for small and large number of backends would need to have the same inner array
map entries which adds a lot of unneeded overhead.

Dynamic inner array map entries can be realized by avoiding the inlined code
generation for their lookup. The lookup will still be efficient since it will
be calling into array_map_lookup_elem() directly and thus avoiding retpoline.
The patch adds a BPF_F_INNER_MAP flag to map creation which therefore skips
inline code generation and relaxes array_map_meta_equal() check to ignore both
maps' max_entries. This still allows faster lookups for map-in-map
when BPF_F_INNER_MAP is not specified and hence dynamic max_entries is not needed.

Example code generation where inner map is dynamic sized array:

  # bpftool p d x i 125
  int handle__sys_enter(void * ctx):
  ; int handle__sys_enter(void *ctx)
     0: (b4) w1 = 0
  ; int key = 0;
     1: (63) *(u32 *)(r10 -4) = r1
     2: (bf) r2 = r10
  ;
     3: (07) r2 += -4
  ; inner_map = bpf_map_lookup_elem(&outer_arr_dyn, &key);
     4: (18) r1 = map[id:468]
     6: (07) r1 += 272
     7: (61) r0 = *(u32 *)(r2 +0)
     8: (35) if r0 >= 0x3 goto pc+5
     9: (67) r0 <<= 3
    10: (0f) r0 += r1
    11: (79) r0 = *(u64 *)(r0 +0)
    12: (15) if r0 == 0x0 goto pc+1
    13: (05) goto pc+1
    14: (b7) r0 = 0
    15: (b4) w6 = -1
  ; if (!inner_map)
    16: (15) if r0 == 0x0 goto pc+6
    17: (bf) r2 = r10
  ;
    18: (07) r2 += -4
  ; val = bpf_map_lookup_elem(inner_map, &key);
    19: (bf) r1 = r0                               | No inlining but instead
    20: (85) call array_map_lookup_elem#149280     | call to array_map_lookup_elem()
  ; return val ? *val : -1;                        | for inner array lookup.
    21: (15) if r0 == 0x0 goto pc+1
  ; return val ? *val : -1;
    22: (61) r6 = *(u32 *)(r0 +0)
  ; }
    23: (bc) w0 = w6
    24: (95) exit
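
The relaxed equality check described above can be sketched in userspace C. This is only an illustration under stated assumptions: struct sketch_map_meta, SKETCH_F_INNER_MAP and both functions are hypothetical stand-ins, not the kernel's struct bpf_map or its map ops.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the BPF_F_INNER_MAP creation flag. */
#define SKETCH_F_INNER_MAP (1U << 12)

/* Hypothetical subset of the metadata compared for map-in-map. */
struct sketch_map_meta {
	uint32_t map_type;
	uint32_t key_size;
	uint32_t value_size;
	uint32_t map_flags;
	uint32_t max_entries;
};

/* Generic comparison: everything except max_entries must match. */
bool sketch_base_meta_equal(const struct sketch_map_meta *a,
			    const struct sketch_map_meta *b)
{
	assert(a && b);
	return a->map_type == b->map_type &&
	       a->key_size == b->key_size &&
	       a->value_size == b->value_size &&
	       a->map_flags == b->map_flags;
}

/* Array variant: max_entries is ignored only when the inner-map flag
 * is set, because then no lookup code is inlined against it. */
bool sketch_array_meta_equal(const struct sketch_map_meta *a,
			     const struct sketch_map_meta *b)
{
	if (!sketch_base_meta_equal(a, b))
		return false;
	if (a->map_flags & SKETCH_F_INNER_MAP)
		return true;
	return a->max_entries == b->max_entries;
}
```

With the flag set, a 16-entry and a 65536-entry inner array compare as compatible; without it, the sizes must match exactly.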

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20201010234006.7075-4-daniel@iogearbox.net
2020-10-11 10:21:04 -07:00
Hao Luo
63d9b80dcf bpf: Introduce bpf_this_cpu_ptr()
Add bpf_this_cpu_ptr() to help access percpu var on this cpu. This
helper always returns a valid pointer, therefore no need to check
returned value for NULL. Also note that all programs run with
preemption disabled, which means that the returned pointer is stable
during all the execution of the program.

Signed-off-by: Hao Luo <haoluo@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200929235049.2533242-6-haoluo@google.com
2020-10-02 15:00:49 -07:00
Hao Luo
eaa6bcb71e bpf: Introduce bpf_per_cpu_ptr()
Add bpf_per_cpu_ptr() to help bpf programs access percpu vars.
bpf_per_cpu_ptr() has the same semantics as per_cpu_ptr() in the kernel
except that it may return NULL. This happens when the cpu parameter is
out of range. So the caller must check the returned value.
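
The caller-side contract can be illustrated with a userspace sketch. Everything here is assumed for illustration: SKETCH_NR_CPUS, sketch_counters, sketch_per_cpu_ptr() and sketch_bump() are hypothetical names that only mimic the "may return NULL when cpu is out of range" behaviour; this is not the BPF helper itself.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_NR_CPUS 4	/* hypothetical CPU count */

/* One slot per CPU, standing in for a percpu variable. */
static long sketch_counters[SKETCH_NR_CPUS];

/* Mimics the described contract: NULL when cpu is out of range. */
long *sketch_per_cpu_ptr(long *base, unsigned int cpu)
{
	assert(base != NULL);
	if (cpu >= SKETCH_NR_CPUS)
		return NULL;
	return &base[cpu];
}

/* Caller pattern: check the returned pointer before dereferencing. */
int sketch_bump(unsigned int cpu)
{
	long *p = sketch_per_cpu_ptr(sketch_counters, cpu);

	if (!p)
		return -1;
	(*p)++;
	return 0;
}
```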

Signed-off-by: Hao Luo <haoluo@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200929235049.2533242-5-haoluo@google.com
2020-10-02 15:00:49 -07:00
Toke Høiland-Jørgensen
4a1e7c0c63 bpf: Support attaching freplace programs to multiple attach points
This enables support for attaching freplace programs to multiple attach
points. It does this by amending the UAPI for bpf_link_create with a target
btf ID that can be used to supply the new attachment point along with the
target program fd. The target must be compatible with the target that was
supplied at program load time.

The implementation reuses the checks that were factored out of
check_attach_btf_id() to ensure compatibility between the BTF types of the
old and new attachment. If these match, a new bpf_tracing_link will be
created for the new attach target, allowing multiple attachments to
co-exist simultaneously.

The code could theoretically support multiple-attach of other types of
tracing programs as well, but since I don't have a use case for any of
those, there is no API support for doing so.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/160138355169.48470.17165680973640685368.stgit@toke.dk
2020-09-29 13:09:24 -07:00
Toke Høiland-Jørgensen
3aac1ead5e bpf: Move prog->aux->linked_prog and trampoline into bpf_link on attach
In preparation for allowing multiple attachments of freplace programs, move
the references to the target program and trampoline into the
bpf_tracing_link structure when that is created. To do this atomically,
introduce a new mutex in prog->aux to protect writing to the two pointers
to target prog and trampoline, and rename the members to make it clear that
they are related.

With this change, it is no longer possible to attach the same tracing
program multiple times (detaching in-between), since the reference from the
tracing program to the target disappears on the first attach. However,
since the next patch will let the caller supply an attach target, that will
also make it possible to attach to the same place multiple times.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/160138355059.48470.2503076992210324984.stgit@toke.dk
2020-09-29 13:09:23 -07:00
Alan Maguire
c4d0bfb450 bpf: Add bpf_snprintf_btf helper
A helper is added to support tracing kernel type information in BPF
using the BPF Type Format (BTF).  Its signature is

long bpf_snprintf_btf(char *str, u32 str_size, struct btf_ptr *ptr,
		      u32 btf_ptr_size, u64 flags);

struct btf_ptr * specifies

- a pointer to the data to be traced
- the BTF id of the type of data pointed to
- a flags field is provided for future use; these flags
  are not to be confused with the BTF_F_* flags
  below that control how the btf_ptr is displayed; the
  flags member of the struct btf_ptr may be used to
  disambiguate types in kernel versus module BTF, etc;
  the main distinction is the flags relate to the type
  and information needed in identifying it; not how it
  is displayed.

For example a BPF program with a struct sk_buff *skb
could do the following:

	static struct btf_ptr b = { };

	b.ptr = skb;
	b.type_id = __builtin_btf_type_id(struct sk_buff, 1);
	bpf_snprintf_btf(str, sizeof(str), &b, sizeof(b), 0);

Default output looks like this:

(struct sk_buff){
 .transport_header = (__u16)65535,
 .mac_header = (__u16)65535,
 .end = (sk_buff_data_t)192,
 .head = (unsigned char *)0x000000007524fd8b,
 .data = (unsigned char *)0x000000007524fd8b,
 .truesize = (unsigned int)768,
 .users = (refcount_t){
  .refs = (atomic_t){
   .counter = (int)1,
  },
 },
}

Flags modifying display are as follows:

- BTF_F_COMPACT:	no formatting around type information
- BTF_F_NONAME:		no struct/union member names/types
- BTF_F_PTR_RAW:	show raw (unobfuscated) pointer values;
			equivalent to %px.
- BTF_F_ZERO:		show zero-valued struct/union members;
			they are not displayed by default

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/1601292670-1616-4-git-send-email-alan.maguire@oracle.com
2020-09-28 18:26:58 -07:00
Alan Maguire
76654e67f3 bpf: Provide function to get vmlinux BTF information
It will be used later for BPF structure display support

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/1601292670-1616-2-git-send-email-alan.maguire@oracle.com
2020-09-28 18:26:58 -07:00
Toke Høiland-Jørgensen
f7b12b6fea bpf: verifier: refactor check_attach_btf_id()
The check_attach_btf_id() function really does three things:

1. It performs a bunch of checks on the program to ensure that the
   attachment is valid.

2. It stores a bunch of state about the attachment being requested in
   the verifier environment and struct bpf_prog objects.

3. It allocates a trampoline for the attachment.

This patch splits out (1.) and (3.) into separate functions which will
perform the checks, but return the computed values instead of directly
modifying the environment. This is done in preparation for reusing the
checks when the actual attachment is happening, which will allow tracing
programs to have multiple (compatible) attachments.

This also fixes a bug where a bunch of checks were skipped if a trampoline
already existed for the tracing target.

Fixes: 6ba43b761c ("bpf: Attachment verification for BPF_MODIFY_RETURN")
Fixes: 1e6c62a882 ("bpf: Introduce sleepable BPF programs")
Acked-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-09-28 17:10:34 -07:00
Toke Høiland-Jørgensen
efc68158c4 bpf: change logging calls from verbose() to bpf_log() and use log pointer
In preparation for moving code around, change a bunch of references to
env->log (and the verbose() logging helper) to use bpf_log() and a direct
pointer to struct bpf_verifier_log. While we're touching the function
signature, mark the 'prog' argument to bpf_check_type_match() as const.

Also enhance the bpf_verifier_log_needed() check to handle NULL pointers
for the log struct so we can re-use the code with logging disabled.

Acked-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-09-28 17:09:59 -07:00
Song Liu
1b4d60ec16 bpf: Enable BPF_PROG_TEST_RUN for raw_tracepoint
Add .test_run for raw_tracepoint. Also, introduce a new feature that runs
the target program on a specific CPU. This is achieved by a new flag in
bpf_attr.test, BPF_F_TEST_RUN_ON_CPU. When this flag is set, the program
is triggered on cpu with id bpf_attr.test.cpu. This feature is needed for
BPF programs that handle perf_event and other percpu resources, as the
program can access these resources locally.

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200925205432.1777-2-songliubraving@fb.com
2020-09-28 21:52:36 +02:00
John Fastabend
ba5f4cfeac bpf: Add comment to document BTF type PTR_TO_BTF_ID_OR_NULL
The meaning of PTR_TO_BTF_ID_OR_NULL differs slightly from other types
denoted with the *_OR_NULL type. For example the types PTR_TO_SOCKET
and PTR_TO_SOCKET_OR_NULL can be used for branch analysis because the
type PTR_TO_SOCKET is guaranteed to _not_ have a null value.

In contrast PTR_TO_BTF_ID and PTR_TO_BTF_ID_OR_NULL have slightly
different meanings. A PTR_TO_BTF_ID may be a pointer to NULL value,
but it is safe to read this pointer in the program context because
the program context will handle any faults. The fallout is for
PTR_TO_BTF_ID the verifier can assume reads are safe, but can not
use the type in branch analysis. Additionally, authors need to be
extra careful when passing PTR_TO_BTF_ID into helpers. In general
helpers consuming type PTR_TO_BTF_ID will need to assume it may
be null.

Since the above is not obvious to readers without the background
knowledge, let's add a comment in the type definition.

Editorial comment, as networking and tracing programs get closer
and more tightly merged we may need to consider a new type that we
can ensure is non-null for branch analysis and also passing into
helpers.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Lorenz Bauer <lmb@cloudflare.com>
2020-09-25 17:05:14 -07:00
Martin KaFai Lau
1df8f55a37 bpf: Enable bpf_skc_to_* sock casting helper to networking prog type
There is a constant need to add more fields into the bpf_tcp_sock
for the bpf programs running at tc, sock_ops...etc.

A current workaround could be to use bpf_probe_read_kernel().  However,
other than making another helper call for reading each field and missing
CO-RE, it is also not as intuitive to use as directly reading
"tp->lsndtime" for example.  While already having perfmon cap to do
bpf_probe_read_kernel(), it will be much easier if the bpf prog can
directly read from the tcp_sock.

This patch tries to do that by using the existing casting-helpers
bpf_skc_to_*() whose func_proto returns a btf_id.  For example, the
func_proto of bpf_skc_to_tcp_sock returns the btf_id of the
kernel "struct tcp_sock".

These helpers are also added to is_ptr_cast_function().
It ensures the returning reg (BPF_REG_0) also carries the ref_obj_id.
That keeps ref-tracking working properly.

The bpf_skc_to_* helpers are made available to most of the bpf prog
types in filter.c. The bpf_skc_to_* helpers will be limited by
perfmon cap.

This patch adds an ARG_PTR_TO_BTF_ID_SOCK_COMMON.  The helper accepting
this arg can accept a btf-id-ptr (PTR_TO_BTF_ID + &btf_sock_ids[BTF_SOCK_TYPE_SOCK_COMMON])
or a legacy-ctx-convert-skc-ptr (PTR_TO_SOCK_COMMON).  The bpf_skc_to_*()
helpers are changed to take ARG_PTR_TO_BTF_ID_SOCK_COMMON such that
they will accept pointer obtained from skb->sk.

Instead of specifying both arg_type and arg_btf_id in the same func_proto
which is how the current ARG_PTR_TO_BTF_ID does, the arg_btf_id of
the new ARG_PTR_TO_BTF_ID_SOCK_COMMON is specified in the
compatible_reg_types[] in verifier.c.  The reason is the arg_btf_id is
always the same.  Discussion in this thread:
https://lore.kernel.org/bpf/20200922070422.1917351-1-kafai@fb.com/

The ARG_PTR_TO_BTF_ID_ part gives a clear expectation that the helper is
expecting a PTR_TO_BTF_ID which could be NULL.  This is the same
behavior as the existing helper taking ARG_PTR_TO_BTF_ID.

The _SOCK_COMMON part means the helper is also expecting the legacy
SOCK_COMMON pointer.

By excluding the _OR_NULL part, the bpf prog cannot call helper
with a literal NULL which doesn't make sense in most cases.
e.g. bpf_skc_to_tcp_sock(NULL) will be rejected.  All PTR_TO_*_OR_NULL
reg has to do a NULL check first before passing into the helper or else
the bpf prog will be rejected.  This behavior is nothing new and
consistent with the current expectation during bpf-prog-load.

[ ARG_PTR_TO_BTF_ID_SOCK_COMMON will be used to replace
  ARG_PTR_TO_SOCK* of other existing helpers later such that
  those existing helpers can take the PTR_TO_BTF_ID returned by
  the bpf_skc_to_*() helpers.

  The only special case is bpf_sk_lookup_assign() which can accept a
  literal NULL ptr.  It has to be handled specially in another follow
  up patch if there is a need (e.g. by renaming ARG_PTR_TO_SOCKET_OR_NULL
  to ARG_PTR_TO_BTF_ID_SOCK_COMMON_OR_NULL). ]

[ When converting the older helpers that take ARG_PTR_TO_SOCK* in
  the later patch, if the kernel does not support BTF,
  ARG_PTR_TO_BTF_ID_SOCK_COMMON will behave like ARG_PTR_TO_SOCK_COMMON
  because no reg->type could have PTR_TO_BTF_ID in this case.

  It is not a concern for the newer-btf-only helper like the bpf_skc_to_*()
  here though because these helpers must require BTF vmlinux to begin
  with. ]

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200925000350.3855720-1-kafai@fb.com
2020-09-25 13:58:01 -07:00
Lorenz Bauer
f79e7ea571 bpf: Use a table to drive helper arg type checks
The mapping between bpf_arg_type and bpf_reg_type is encoded in a big
hairy if statement that is hard to follow. The debug output also leaves
something to be desired: if a reg_type doesn't match we only print one of
the options, instead of printing all the valid ones.

Convert the if statement into a table which is then used to drive type
checking. If none of the reg_types match we print all options, e.g.:

    R2 type=rdonly_buf expected=fp, pkt, pkt_meta, map_value
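
A table-driven check of this kind might look like the following userspace sketch. The enums, the table contents and sketch_check_reg_type() are hypothetical and heavily simplified; they are not the verifier's actual types, only an illustration of one table row per argument type driving both the match and the error message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define SK_MAX_TYPES 8

/* Hypothetical register and argument types; SK_NOT_INIT terminates rows. */
enum sketch_reg_type {
	SK_NOT_INIT = 0, SK_FP, SK_PKT, SK_PKT_META,
	SK_MAP_VALUE, SK_RDONLY_BUF, SK_CTX,
};
enum sketch_arg_type { SK_ARG_PTR_TO_MEM, SK_ARG_PTR_TO_CTX, SK_ARG_MAX };

static const char *const sketch_reg_name[] = {
	[SK_NOT_INIT] = "?", [SK_FP] = "fp", [SK_PKT] = "pkt",
	[SK_PKT_META] = "pkt_meta", [SK_MAP_VALUE] = "map_value",
	[SK_RDONLY_BUF] = "rdonly_buf", [SK_CTX] = "ctx",
};

struct sketch_reg_types {
	enum sketch_reg_type types[SK_MAX_TYPES];
};

/* One row per argument type; unused slots are zero, i.e. SK_NOT_INIT. */
static const struct sketch_reg_types sketch_compatible[SK_ARG_MAX] = {
	[SK_ARG_PTR_TO_MEM] = { { SK_FP, SK_PKT, SK_PKT_META, SK_MAP_VALUE } },
	[SK_ARG_PTR_TO_CTX] = { { SK_CTX } },
};

/* Returns 0 on a match; otherwise writes every accepted type to log. */
int sketch_check_reg_type(int regno, enum sketch_reg_type type,
			  enum sketch_arg_type arg, char *log, size_t log_size)
{
	const enum sketch_reg_type *t;
	size_t i, used;
	int n;

	assert(arg < SK_ARG_MAX && log != NULL && log_size > 0);
	t = sketch_compatible[arg].types;

	for (i = 0; i < SK_MAX_TYPES && t[i] != SK_NOT_INIT; i++)
		if (t[i] == type)
			return 0;

	n = snprintf(log, log_size, "R%d type=%s expected=", regno,
		     sketch_reg_name[type]);
	used = n < 0 ? 0 : (size_t)n;
	for (i = 0; i < SK_MAX_TYPES && t[i] != SK_NOT_INIT && used < log_size; i++) {
		n = snprintf(log + used, log_size - used, "%s%s",
			     i ? ", " : "", sketch_reg_name[t[i]]);
		if (n < 0)
			break;
		used += (size_t)n;
	}
	return -1;
}
```

Adding a new accepted register type then means editing one table row rather than extending a chain of conditions.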

Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20200921121227.255763-12-lmb@cloudflare.com
2020-09-21 15:00:41 -07:00
Lorenz Bauer
9436ef6e86 bpf: Allow specifying a BTF ID per argument in function protos
Function prototypes using ARG_PTR_TO_BTF_ID currently use two ways to signal
which BTF IDs are acceptable. First, bpf_func_proto.btf_id is an array of
IDs, one for each argument. This array is only accessed up to the highest
numbered argument that uses ARG_PTR_TO_BTF_ID and may therefore be less than
five arguments long. It usually points at a BTF_ID_LIST. Second, check_btf_id
is a function pointer that is called by the verifier if present. It gets the
actual BTF ID of the register, and the argument number we're currently checking.
It turns out that the only user, check_arg_btf_id, ignores the argument and is
simply used to check whether the BTF ID has a struct sock_common at its start.

Replace both of these mechanisms with an explicit BTF ID for each argument
in a function proto. Thanks to btf_struct_ids_match this is very flexible:
check_arg_btf_id can be replaced by requiring struct sock_common.

Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20200921121227.255763-5-lmb@cloudflare.com
2020-09-21 15:00:40 -07:00
Lorenz Bauer
2af30f115d btf: Make btf_set_contains take a const pointer
bsearch doesn't modify the contents of the array, so we can take a const pointer.
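
As a userspace illustration of why a const pointer suffices, the sketch below searches a sorted ID set with the C library's bsearch(), which takes its base array as const void *. The struct layout and the names sketch_id_set, sketch_id_cmp() and sketch_set_contains() are assumptions made for this example, not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_SET_MAX 8

/* Hypothetical ID set; ids[] must be sorted in ascending order. */
struct sketch_id_set {
	uint32_t cnt;
	uint32_t ids[SKETCH_SET_MAX];
};

/* Three-way comparison that avoids overflow from subtraction. */
static int sketch_id_cmp(const void *a, const void *b)
{
	uint32_t x = *(const uint32_t *)a;
	uint32_t y = *(const uint32_t *)b;

	return (x > y) - (x < y);
}

/* The set is only read, so it can be passed as a const pointer. */
bool sketch_set_contains(const struct sketch_id_set *set, uint32_t id)
{
	assert(set != NULL && set->cnt <= SKETCH_SET_MAX);
	return bsearch(&id, set->ids, set->cnt, sizeof(set->ids[0]),
		       sketch_id_cmp) != NULL;
}
```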

Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200921121227.255763-2-lmb@cloudflare.com
2020-09-21 15:00:40 -07:00
Maciej Fijalkowski
ebf7d1f508 bpf, x64: rework pro/epilogue and tailcall handling in JIT
This commit serves two things:
1) it optimizes BPF prologue/epilogue generation
2) it makes it possible to have tailcalls within BPF subprograms

Both points are related to each other since without 1), 2) could not be
achieved.

In [1], Alexei says:
"The prologue will look like:
nop5
xor eax,eax  // two new bytes if bpf_tail_call() is used in this
             // function
push rbp
mov rbp, rsp
sub rsp, rounded_stack_depth
push rax // zero init tail_call counter
variable number of push rbx,r13,r14,r15

Then bpf_tail_call will pop variable number rbx,..
and final 'pop rax'
Then 'add rsp, size_of_current_stack_frame'
jmp to next function and skip over 'nop5; xor eax,eax; push rbp; mov
rbp, rsp'

This way new function will set its own stack size and will init tail
call counter with whatever value the parent had.

If next function doesn't use bpf_tail_call it won't have 'xor eax,eax'.
Instead it would need to have 'nop2' in there."

Implement that suggestion.

Since the stack layout has changed, tail call counter handling can no
longer rely on popping the counter into rbx, as was done for the constant
prologue case, and later overwriting rbx with the actual value of rbx
pushed to the stack. Therefore, let's use one of the registers (%rcx) that
is considered to be volatile/caller-saved and pop the value of the tail
call counter into it in the epilogue.

Drop the BUILD_BUG_ON in emit_prologue and in
emit_bpf_tail_call_indirect where instruction layout is not constant
anymore.

Introduce new poke target, 'tailcall_bypass' to poke descriptor that is
dedicated for skipping the register pops and stack unwind that are
generated right before the actual jump to target program.
For case when the target program is not present, BPF program will skip
the pop instructions and nop5 dedicated for jmpq $target. An example of
such state when only R6 of callee saved registers is used by program:

ffffffffc0513aa1:       e9 0e 00 00 00          jmpq   0xffffffffc0513ab4
ffffffffc0513aa6:       5b                      pop    %rbx
ffffffffc0513aa7:       58                      pop    %rax
ffffffffc0513aa8:       48 81 c4 00 00 00 00    add    $0x0,%rsp
ffffffffc0513aaf:       0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
ffffffffc0513ab4:       48 89 df                mov    %rbx,%rdi

When target program is inserted, the jump that was there to skip
pops/nop5 will become the nop5, so CPU will go over pops and do the
actual tailcall.

One might ask why there simply can not be pushes after the nop5?
In the following example snippet:

ffffffffc037030c:       48 89 fb                mov    %rdi,%rbx
(...)
ffffffffc0370332:       5b                      pop    %rbx
ffffffffc0370333:       58                      pop    %rax
ffffffffc0370334:       48 81 c4 00 00 00 00    add    $0x0,%rsp
ffffffffc037033b:       0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
ffffffffc0370340:       48 81 ec 00 00 00 00    sub    $0x0,%rsp
ffffffffc0370347:       50                      push   %rax
ffffffffc0370348:       53                      push   %rbx
ffffffffc0370349:       48 89 df                mov    %rbx,%rdi
ffffffffc037034c:       e8 f7 21 00 00          callq  0xffffffffc0372548

There is the bpf2bpf call (at ffffffffc037034c) right after the tailcall
and jump target is not present. ctx is in %rbx register and BPF
subprogram that we will call into on ffffffffc037034c is relying on it,
e.g. it will pick ctx from there. Such code layout is therefore broken
as we would overwrite the content of %rbx with the value that was pushed
on the prologue. That is the reason for the 'bypass' approach.

Special care needs to be taken during the install/update/remove of
tailcall target. In case when target program is not present, the CPU
must not execute the pop instructions that precede the tailcall.

To address that, the following states can be defined:
A nop, unwind, nop
B nop, unwind, tail
C skip, unwind, nop
D skip, unwind, tail

A is forbidden (lead to incorrectness). The state transitions between
tailcall install/update/remove will work as follows:

First install tail call f: C->D->B(f)
 * poke the tailcall, after that get rid of the skip
Update tail call f to f': B(f)->B(f')
 * poke the tailcall (poke->tailcall_target) and do NOT touch the
   poke->tailcall_bypass
Remove tail call: B(f')->C(f')
 * poke->tailcall_bypass is poked back to jump, then we wait for the RCU
   grace period so that other programs finish their execution and
   after that we are safe to remove the poke->tailcall_target
Install new tail call (f''): C(f')->D(f'')->B(f'').
 * same as first step

This way CPU can never be exposed to "unwind, tail" state.
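
The ordering constraint can be modelled with two booleans. This is a rough userspace sketch under one reading of the states above, where the first word is the bypass slot ("skip" means it holds a jump) and the last word is the tailcall slot ("tail" means it holds a jump to the target). All names are hypothetical, and the RCU wait is only marked by a comment.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the two patchable sites of one poke descriptor. */
struct sketch_poke {
	bool bypass_is_jump;	/* true: "skip", false: "nop" */
	bool target_is_jump;	/* true: "tail", false: "nop" */
};

/* State A (nop, unwind, nop): pops would run with no tailcall to take. */
bool sketch_forbidden(const struct sketch_poke *p)
{
	return !p->bypass_is_jump && !p->target_is_jump;
}

/* Install: poke the target first, only then drop the skip (C -> D -> B). */
void sketch_install(struct sketch_poke *p)
{
	p->target_is_jump = true;
	assert(!sketch_forbidden(p));
	p->bypass_is_jump = false;
	assert(!sketch_forbidden(p));
}

/* Remove: restore the skip first, then clear the target. */
void sketch_remove(struct sketch_poke *p)
{
	p->bypass_is_jump = true;
	assert(!sketch_forbidden(p));
	/* a real implementation would wait for an RCU grace period here */
	p->target_is_jump = false;
	assert(!sketch_forbidden(p));
}
```

Reversing the two stores in either function would pass through state A, which is what the asserts are there to catch.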

Last but not least, when tailcalls get mixed with bpf2bpf calls, it
would be possible to encounter the endless loop due to clearing the
tailcall counter if for example we would use the tailcall3-like from BPF
selftests program that would be subprogram-based, meaning the tailcall
would be present within the BPF subprogram.

This test, broken down to particular steps, would do:
entry -> set tailcall counter to 0, bump it by 1, tailcall to func0
func0 -> call subprog_tail
(we are NOT skipping the first 11 bytes of prologue and this subprogram
has a tailcall, therefore we clear the counter...)
subprog -> do the same thing as entry

and then loop forever.

To address this, the idea is to go through the call chain of bpf2bpf progs
and look for a tailcall presence throughout whole chain. If we saw a single
tail call then each node in this call chain needs to be marked as a subprog
that can reach the tailcall. We would later feed the JIT with this info
and:
- set eax to 0 only when tailcall is reachable and this is the entry prog
- if tailcall is reachable but there's no tailcall in insns of currently
  JITed prog then push rax anyway, so that it will be possible to
  propagate further down the call chain
- finally if tailcall is reachable, then we need to precede the 'call'
  insn with mov rax, [rbp - (stack_depth + 8)]
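
The marking step can be sketched as follows, assuming a single linear call chain given as an array of subprogram indices. struct sketch_subprog, sketch_mark_chain() and sketch_zero_counter() are hypothetical names for illustration and do not reflect how the verifier actually walks the call graph.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-subprogram info that would later be fed to the JIT. */
struct sketch_subprog {
	bool has_tail_call;		/* a tailcall appears in its own insns */
	bool tail_call_reachable;	/* some prog in its chain has one */
};

/* If any subprog in the chain has a tailcall, mark every node in it. */
void sketch_mark_chain(struct sketch_subprog *progs, const int *chain,
		       size_t depth)
{
	bool seen = false;
	size_t i;

	assert(progs != NULL && chain != NULL);
	for (i = 0; i < depth; i++)
		if (progs[chain[i]].has_tail_call)
			seen = true;
	if (!seen)
		return;
	for (i = 0; i < depth; i++)
		progs[chain[i]].tail_call_reachable = true;
}

/* First rule from the list above: only the entry prog zeroes the counter. */
bool sketch_zero_counter(const struct sketch_subprog *prog, bool is_entry)
{
	return prog->tail_call_reachable && is_entry;
}
```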

Tail call related cases from test_verifier kselftest are also working
fine. Sample BPF programs that utilize tail calls (sockex3, tracex5)
work properly as well.

[1]: https://lore.kernel.org/bpf/20200517043227.2gpq22ifoq37ogst@ast-mbp.dhcp.thefacebook.com/

Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-09-17 19:55:30 -07:00
Maciej Fijalkowski
cf71b174d3 bpf: rename poke descriptor's 'ip' member to 'tailcall_target'
Reflect the actual purpose of poke->ip and rename it to
poke->tailcall_target so that it will not be confused with another
poke target that will be introduced in next commit.

While at it, do the same thing with poke->ip_stable - rename it to
poke->tailcall_target_stable.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-09-17 12:59:31 -07:00
Maciej Fijalkowski
a748c6975d bpf: propagate poke descriptors to subprograms
Previously, there was no need for poke descriptors being present in
subprogram's bpf_prog_aux struct since tailcalls were simply not allowed
in them. Each subprog is JITed independently so in order to enable
JITing subprograms that use tailcalls, do the following:

- in fixup_bpf_calls() store the index of tailcall insn onto the generated
  poke descriptor,
- in case when insn patching occurs, adjust the tailcall insn idx from
  bpf_patch_insn_data,
- then in jit_subprogs() check whether the given poke descriptor belongs
  to the current subprog by checking if that previously stored absolute
  index of tail call insn is in the scope of the insns of given subprog,
- update the insn->imm with new poke descriptor slot so that while JITing
  the proper poke descriptor will be grabbed

This way each of the main program's poke descriptors is distributed
across the subprograms' poke descriptor arrays, so the main program's
descriptors can be untracked from the prog array map.
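
A minimal sketch of the ownership check, assuming each subprogram covers a
half-open instruction range [start, end). The struct and function names are
hypothetical and the real jit_subprogs() bookkeeping is more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: only the stored absolute index of the tail call insn. */
struct poke_sketch {
	unsigned int insn_idx;
};

/* Hypothetical: insns of one subprog, as the half-open range [start, end). */
struct subprog_range_sketch {
	unsigned int start;
	unsigned int end;
};

static bool poke_belongs_sketch(const struct poke_sketch *poke,
				const struct subprog_range_sketch *sp)
{
	return poke->insn_idx >= sp->start && poke->insn_idx < sp->end;
}

/* Copies the descriptors owned by sp into out[] and returns how many. */
static size_t collect_pokes_sketch(const struct poke_sketch *pokes, size_t nr,
				   const struct subprog_range_sketch *sp,
				   struct poke_sketch *out)
{
	size_t i, n = 0;

	assert(pokes != NULL && sp != NULL && out != NULL);

	for (i = 0; i < nr; i++)
		if (poke_belongs_sketch(&pokes[i], sp))
			out[n++] = pokes[i];
	return n;
}
```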

Also add the subprog's aux struct to the BPF map poke_progs list by
calling map_poke_track() on it.

In case of any error, call the map_poke_untrack() on subprog's aux
structs that have already been registered to prog array map.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-09-17 12:59:31 -07:00
YiFei Zhu
984fe94f94 bpf: Mutex protect used_maps array and count
To support modifying the used_maps array, we use a mutex to protect
the use of the counter and the array. The mutex is initialized right
after the prog aux is allocated, and destroyed right before prog
aux is freed. This way we guarantee it's initialized for both cBPF
and eBPF.

Signed-off-by: YiFei Zhu <zhuyifei@google.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Cc: YiFei Zhu <zhuyifei1999@gmail.com>
Link: https://lore.kernel.org/bpf/20200915234543.3220146-2-sdf@google.com
2020-09-15 18:28:27 -07:00
Alexei Starovoitov
07be4c4a3e bpf: Add bpf_copy_from_user() helper.
Sleepable BPF programs can now use copy_from_user() to access user memory.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: KP Singh <kpsingh@google.com>
Link: https://lore.kernel.org/bpf/20200827220114.69225-4-alexei.starovoitov@gmail.com
2020-08-28 21:20:33 +02:00
Alexei Starovoitov
1e6c62a882 bpf: Introduce sleepable BPF programs
Introduce sleepable BPF programs that can request such property for themselves
via BPF_F_SLEEPABLE flag at program load time. In such case they will be able
to use helpers like bpf_copy_from_user() that might sleep. At present only
fentry/fexit/fmod_ret and lsm programs can request to be sleepable and only
when they are attached to kernel functions that are known to allow sleeping.

The non-sleepable programs are relying on implicit rcu_read_lock() and
migrate_disable() to protect life time of programs, maps that they use and
per-cpu kernel structures used to pass info between bpf programs and the
kernel. The sleepable programs cannot be enclosed into rcu_read_lock().
migrate_disable() maps to preempt_disable() in non-RT kernels, so the progs
should not be enclosed in migrate_disable() as well. Therefore
rcu_read_lock_trace is used to protect the life time of sleepable progs.

There are many networking and tracing program types. In many cases the
'struct bpf_prog *' pointer itself is rcu protected within some other kernel
data structure and the kernel code is using rcu_dereference() to load that
program pointer and call BPF_PROG_RUN() on it. All these cases are not touched.
Instead sleepable bpf programs are allowed with bpf trampoline only. The
program pointers are hard-coded into generated assembly of bpf trampoline and
synchronize_rcu_tasks_trace() is used to protect the life time of the program.
The same trampoline can hold both sleepable and non-sleepable progs.

When rcu_read_lock_trace is held it means that some sleepable bpf program is
running from bpf trampoline. Those programs can use bpf arrays and preallocated
hash/lru maps. These map types are waiting on programs to complete via
synchronize_rcu_tasks_trace();

Updates to trampoline now have to do synchronize_rcu_tasks_trace() and
synchronize_rcu_tasks() to wait for sleepable progs to finish and for
trampoline assembly to finish.

This is the first step of introducing sleepable progs. Eventually dynamically
allocated hash maps can be allowed and networking program types can become
sleepable too.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: KP Singh <kpsingh@google.com>
Link: https://lore.kernel.org/bpf/20200827220114.69225-3-alexei.starovoitov@gmail.com
2020-08-28 21:20:33 +02:00
Martin KaFai Lau
f4d0525921 bpf: Add map_meta_equal map ops
Some properties of the inner map are used at verification time.
When an inner map is inserted into an outer map at runtime,
bpf_map_meta_equal() is currently used to ensure those properties
of the inserted inner map stay the same as at verification
time.

In particular, the current bpf_map_meta_equal() checks max_entries which
turns out to be too restrictive for most of the maps which do not use
max_entries during the verification time.  It limits the use case that
wants to replace a smaller inner map with a larger inner map.  Some
maps do use max_entries during verification though.  For example,
the map_gen_lookup in array_map_ops uses the max_entries to generate
the inline lookup code.

To accommodate differences between maps, the map_meta_equal is added
to bpf_map_ops.  Each map-type can decide what to check when its
map is used as an inner map during runtime.

Also, some map types cannot be used as an inner map and they are
currently black listed in bpf_map_meta_alloc() in map_in_map.c.
It is not unusual that new map types are not aware that such a
blacklist exists.  This patch enforces an explicit opt-in
and only allows a map to be used as an inner map if it has
implemented the map_meta_equal ops.  It is based on the
discussion in [1].
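
The opt-in idea can be illustrated with a small user-space sketch. All names
ending in _sketch are hypothetical, and the compared fields are an assumed
subset; the point is only that a map type without the callback is rejected as
an inner map, while a map type with it decides what "equal" means.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct map_sketch;

/* Hypothetical ops table: a NULL callback means "not usable as inner map". */
struct map_ops_sketch {
	bool (*map_meta_equal)(const struct map_sketch *meta,
			       const struct map_sketch *map);
};

struct map_sketch {
	const struct map_ops_sketch *ops;
	unsigned int map_type, key_size, value_size, max_entries;
};

/* Strict comparison that also checks max_entries. */
static bool meta_equal_strict_sketch(const struct map_sketch *meta,
				     const struct map_sketch *map)
{
	return meta->map_type == map->map_type &&
	       meta->key_size == map->key_size &&
	       meta->value_size == map->value_size &&
	       meta->max_entries == map->max_entries;
}

/* Decides whether map may replace an inner map described by meta. */
static bool can_insert_inner_sketch(const struct map_sketch *meta,
				    const struct map_sketch *map)
{
	assert(meta != NULL && map != NULL && map->ops != NULL);

	if (!map->ops->map_meta_equal)
		return false; /* map type never opted in */
	return map->ops->map_meta_equal(meta, map);
}
```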

All maps that support being an inner map have their map_meta_equal
pointing to bpf_map_meta_equal in this patch.  A later patch will
relax the max_entries check for most maps.  bpf_types.h
counts 28 map types.  This patch adds 23 ".map_meta_equal"
by using coccinelle.  -5 for
	BPF_MAP_TYPE_PROG_ARRAY
	BPF_MAP_TYPE_(PERCPU)_CGROUP_STORAGE
	BPF_MAP_TYPE_STRUCT_OPS
	BPF_MAP_TYPE_ARRAY_OF_MAPS
	BPF_MAP_TYPE_HASH_OF_MAPS

The "if (inner_map->inner_map_meta)" check in bpf_map_meta_alloc()
is moved such that the same error is returned.

[1]: https://lore.kernel.org/bpf/20200522022342.899756-1-kafai@fb.com/

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200828011806.1970400-1-kafai@fb.com
2020-08-28 15:41:30 +02:00
Jiri Olsa
eae2e83e62 bpf: Add BTF_SET_START/END macros
Adding support to define sorted set of BTF ID values.

Following defines sorted set of BTF ID values:

  BTF_SET_START(btf_allowlist_d_path)
  BTF_ID(func, vfs_truncate)
  BTF_ID(func, vfs_fallocate)
  BTF_ID(func, dentry_open)
  BTF_ID(func, vfs_getattr)
  BTF_ID(func, filp_close)
  BTF_SET_END(btf_allowlist_d_path)

It defines following 'struct btf_id_set' variable to access
values and count:

  struct btf_id_set btf_allowlist_d_path;

Adding 'allowed' callback to struct bpf_func_proto, to let the
verifier check for allowed callers.

Adding btf_id_set_contains function, which will be used by
allowed callbacks to verify the caller's BTF ID value is
within allowed set.
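
Because the set is sorted, a membership test can be a binary search. The
following is a user-space sketch with hypothetical names (id_set_sketch,
id_set_contains_sketch) and a pointer in place of the in-kernel layout; it is
not the kernel's btf_id_set_contains() itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical mirror of a sorted ID set: a count plus ascending IDs. */
struct id_set_sketch {
	uint32_t cnt;
	const uint32_t *ids;
};

static int id_cmp_sketch(const void *a, const void *b)
{
	uint32_t x = *(const uint32_t *)a;
	uint32_t y = *(const uint32_t *)b;

	return (x > y) - (x < y);
}

/* Binary search over the sorted IDs; true when id is a member. */
static bool id_set_contains_sketch(const struct id_set_sketch *set, uint32_t id)
{
	assert(set != NULL && set->ids != NULL);

	return bsearch(&id, set->ids, set->cnt, sizeof(id),
		       id_cmp_sketch) != NULL;
}
```

An 'allowed' callback would then reduce to one such lookup with the
caller's BTF ID.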

Also removing extra '\' in __BTF_ID_LIST macro.

Added BTF_SET_START_GLOBAL macro for global sets.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200825192124.710397-10-jolsa@kernel.org
2020-08-25 15:37:41 -07:00
Jiri Olsa
faaf4a790d bpf: Add btf_struct_ids_match function
Adding btf_struct_ids_match function to check if given address provided
by BTF object + offset is also address of another nested BTF object.

This allows passing an argument to a helper that is defined via a parent
BTF object + offset, like for bpf_d_path (added in following changes):

  SEC("fentry/filp_close")
  int BPF_PROG(prog_close, struct file *file, void *id)
  {
    ...
    ret = bpf_d_path(&file->f_path, ...

The first bpf_d_path argument is held by the verifier as BTF file object
plus offset of f_path member.

The btf_struct_ids_match function will walk the struct file object and
check if there's nested struct path object on the given offset.
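
The walk can be pictured with a toy type table. Everything below is a
hypothetical model (type_sketch, member_sketch, a fixed depth limit) rather
than the BTF data structures: starting from the outer type and an offset, it
descends into whichever member covers the offset until it either lands
exactly on the wanted type or runs into a scalar.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical member description; type_id < 0 means "scalar". */
struct member_sketch {
	uint32_t offset;
	uint32_t size;
	int type_id;
};

struct type_sketch {
	const struct member_sketch *members;
	size_t nr_members;
};

/* True if a nested object of type need_id starts at (id, off). */
static bool struct_ids_match_sketch(const struct type_sketch *types,
				    int id, uint32_t off, int need_id)
{
	int depth;

	assert(types != NULL);

	for (depth = 0; depth < 16; depth++) { /* assumed nesting limit */
		const struct type_sketch *t = &types[id];
		const struct member_sketch *hit = NULL;
		size_t i;

		if (off == 0 && id == need_id)
			return true;

		for (i = 0; i < t->nr_members; i++) {
			const struct member_sketch *m = &t->members[i];

			if (off >= m->offset && off < m->offset + m->size) {
				hit = m;
				break;
			}
		}
		if (!hit || hit->type_id < 0)
			return false;

		off -= hit->offset; /* offset relative to the nested struct */
		id = hit->type_id;
	}
	return false;
}
```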

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200825192124.710397-9-jolsa@kernel.org
2020-08-25 15:37:41 -07:00
KP Singh
f836a56e84 bpf: Generalize bpf_sk_storage
Refactor the functionality in bpf_sk_storage.c so that concept of
storage linked to kernel objects can be extended to other objects like
inode, task_struct etc.

Each new local storage will still be a separate map and provide its own
set of helpers. This allows for future object specific extensions and
still share a lot of the underlying implementation.

This includes the changes suggested by Martin in:

  https://lore.kernel.org/bpf/20200725013047.4006241-1-kafai@fb.com/

adding new map operations to support bpf_local_storage maps:

* storages for different kernel objects to optionally have different
  memory charging strategy (map_local_storage_charge,
  map_local_storage_uncharge)
* Functionality to extract the storage pointer from a pointer to the
  owning object (map_owner_storage_ptr)

Co-developed-by: Martin KaFai Lau <kafai@fb.com>

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: KP Singh <kpsingh@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200825182919.1118197-4-kpsingh@chromium.org
2020-08-25 15:00:04 -07:00
Lorenz Bauer
13b79d3ffb bpf: sockmap: Call sock_map_update_elem directly
Don't go via map->ops to call sock_map_update_elem, since we know
what function to call in bpf_map_update_value. Since we currently
don't allow calling map_update_elem from BPF context, we can remove
ops->map_update_elem and rename the function to sock_map_update_elem_sys.

Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200821102948.21918-4-lmb@cloudflare.com
2020-08-21 15:16:11 -07:00
Yonghong Song
b76f222690 bpf: Implement link_query callbacks in map element iterators
For bpf_map_elem and bpf_sk_local_storage bpf iterators,
additional map_id should be shown for fdinfo and
userspace query. For example, the following is for
a bpf_map_elem iterator.
  $ cat /proc/1753/fdinfo/9
  pos:    0
  flags:  02000000
  mnt_id: 14
  link_type:      iter
  link_id:        34
  prog_tag:       104be6d3fe45e6aa
  prog_id:        173
  target_name:    bpf_map_elem
  map_id: 127

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200821184419.574240-1-yhs@fb.com
2020-08-21 14:01:39 -07:00
Yonghong Song
6b0a249a30 bpf: Implement link_query for bpf iterators
This patch implemented bpf_link callback functions
show_fdinfo and fill_link_info to support link_query
interface.

The general interface for show_fdinfo and fill_link_info
will print/fill the target_name. Each target can
register show_fdinfo and fill_link_info callbacks
to print/fill more target specific information.

For example, the below is a fdinfo result for a bpf
task iterator.
  $ cat /proc/1749/fdinfo/7
  pos:    0
  flags:  02000000
  mnt_id: 14
  link_type:      iter
  link_id:        11
  prog_tag:       990e1f8152f7e54f
  prog_id:        59
  target_name:    task

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200821184418.574122-1-yhs@fb.com
2020-08-21 14:01:39 -07:00
Alexei Starovoitov
005142b8a1 bpf: Factor out bpf_link_by_id() helper.
Refactor the code a bit to extract bpf_link_by_id() helper.
It's similar to existing bpf_prog_by_id().

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20200819042759.51280-2-alexei.starovoitov@gmail.com
2020-08-20 16:02:36 +02:00
Yonghong Song
5e7b30205c bpf: Change uapi for bpf iterator map elements
Commit a5cbe05a66 ("bpf: Implement bpf iterator for
map elements") added bpf iterator support for
map elements. The map element bpf iterator requires
info to identify a particular map. In the above
commit, the attr->link_create.target_fd is used
to carry map_fd and an enum bpf_iter_link_info
is added to uapi to specify the target_fd actually
representing a map_fd:
    enum bpf_iter_link_info {
	BPF_ITER_LINK_UNSPEC = 0,
	BPF_ITER_LINK_MAP_FD = 1,

	MAX_BPF_ITER_LINK_INFO,
    };

This is an extensible approach as we can grow
enumerator for pid, cgroup_id, etc. and we can
unionize target_fd for pid, cgroup_id, etc.
But in the future, there are chances that
more complex customization may happen, e.g.,
for tasks, it could be filtered based on
both cgroup_id and user_id.

This patch changed the uapi to have fields
	__aligned_u64	iter_info;
	__u32		iter_info_len;
for additional iter_info for link_create.
The iter_info is defined as
	union bpf_iter_link_info {
		struct {
			__u32   map_fd;
		} map;
	};

So future extension for additional customization
will be easier. The bpf_iter_link_info will be
passed to target callback to validate and generic
bpf_iter framework does not need to deal with it any
more.

Note that map_fd = 0 will be considered invalid
and -EBADF will be returned to user space.
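
A minimal sketch of the target-side validation, assuming a local mirror of
the union (iter_link_info_sketch) and a hypothetical callback name. It only
shows the rule stated above: a zero map_fd is rejected with -EBADF.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical local mirror of the union described above. */
union iter_link_info_sketch {
	struct {
		uint32_t map_fd;
	} map;
};

/* Returns 0 when the info names a map, -EBADF when map_fd is 0. */
static int check_map_target_sketch(const union iter_link_info_sketch *linfo)
{
	assert(linfo != NULL);

	if (linfo->map.map_fd == 0)
		return -EBADF;
	return 0;
}
```

A zero-initialized info, as user space would pass when it forgets to fill
in the map, therefore fails the check.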

Fixes: a5cbe05a66 ("bpf: Implement bpf iterator for map elements")
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200805055056.1457463-1-yhs@fb.com
2020-08-06 16:39:14 -07:00
Andrii Nakryiko
73b11c2ab0 bpf: Add support for forced LINK_DETACH command
Add LINK_DETACH command to force-detach bpf_link without destroying it. It has
the same behavior as auto-detaching of bpf_link due to cgroup dying for
bpf_cgroup_link or net_device being destroyed for bpf_xdp_link. In such case,
bpf_link is still a valid kernel object, but is defunct and doesn't hold BPF
program attached to corresponding BPF hook. This functionality allows users
with enough access rights to manually force-detach attached bpf_link without
killing respective owner process.

This patch implements LINK_DETACH for cgroup, xdp, and netns links, mostly
re-using existing link release handling code.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200731182830.286260-2-andriin@fb.com
2020-08-01 20:38:28 -07:00
Andrii Nakryiko
6cc7d1e8e9 bpf: Make bpf_link API available independently of CONFIG_BPF_SYSCALL
Similarly to bpf_prog, make bpf_link and related generic API available
unconditionally to make it easier to have bpf_link support in various parts of
the kernel. Stub out init/prime/settle/cleanup and inc/put APIs.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200722064603.3350758-2-andriin@fb.com
2020-07-25 20:37:01 -07:00
Song Liu
7b04d6d60f bpf: Separate bpf_get_[stack|stackid] for perf events BPF
Calling get_perf_callchain() on perf_events from PEBS entries may cause
unwinder errors. To fix this issue, the callchain is fetched early. Such
perf_events are marked with __PERF_SAMPLE_CALLCHAIN_EARLY.

Similarly, calling bpf_get_[stack|stackid] on perf_events from PEBS may
also cause unwinder errors. To fix this, add separate version of these
two helpers, bpf_get_[stack|stackid]_pe. These two helpers use callchain in
bpf_perf_event_data_kern->data->callchain.

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200723180648.1429892-2-songliubraving@fb.com
2020-07-25 20:16:34 -07:00
Yonghong Song
a5cbe05a66 bpf: Implement bpf iterator for map elements
The bpf iterator for map elements is implemented.
The bpf program will receive four parameters:
  bpf_iter_meta *meta: the meta data
  bpf_map *map:        the bpf_map whose elements are traversed
  void *key:           the key of one element
  void *value:         the value of the same element

Here, meta and map pointers are always valid, and
key has register type PTR_TO_RDONLY_BUF_OR_NULL and
value has register type PTR_TO_RDWR_BUF_OR_NULL.
The kernel will track the access range of key and value
during verification time. Later, these values will be compared
against the values in the actual map to ensure all accesses
are within range.

A new field iter_seq_info is added to bpf_map_ops which
is used to add map type specific information, i.e., seq_ops,
init/fini seq_file func and seq_file private data size.
Subsequent patches will have actual implementation
for bpf_map_ops->iter_seq_info.

In user space, BPF_ITER_LINK_MAP_FD needs to be
specified in prog attr->link_create.flags, which indicates
that attr->link_create.target_fd is a map_fd.
The reason for such an explicit flag is for possible
future cases where one bpf iterator may allow more than
one possible customization, e.g., pid and cgroup id for
task_file.

Current kernel internal implementation only allows
the target to register at most one required bpf_iter_link_info.
To support the above case, optional bpf_iter_link_info's
are needed, the target can be extended to register such link
infos, and user provided link_info needs to match one of
target supported ones.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200723184112.590360-1-yhs@fb.com
2020-07-25 20:16:32 -07:00
Yonghong Song
afbf21dce6 bpf: Support readonly/readwrite buffers in verifier
Readonly and readwrite buffer register states
are introduced. In total, four states,
PTR_TO_RDONLY_BUF[_OR_NULL] and PTR_TO_RDWR_BUF[_OR_NULL]
are supported. As suggested by their respective
names, PTR_TO_RDONLY_BUF[_OR_NULL] are for
readonly buffers and PTR_TO_RDWR_BUF[_OR_NULL]
for read/write buffers.

These new register states will be used
by later bpf map element iterator.

New register states share some similarity to
PTR_TO_TP_BUFFER as it will calculate accessed buffer
size during verification time. The accessed buffer
size will be later compared to other metrics during
later attach/link_create time.

Similar to reg_state PTR_TO_BTF_ID_OR_NULL in bpf
iterator programs, PTR_TO_RDONLY_BUF_OR_NULL or
PTR_TO_RDWR_BUF_OR_NULL reg_types can be set at
prog->aux->bpf_ctx_arg_aux, and bpf verifier will
retrieve the values during btf_ctx_access().
Later bpf map element iterator implementation
will show how such information will be assigned
during target registration time.

The verifier is also enhanced such that PTR_TO_RDONLY_BUF
can be passed to ARG_PTR_TO_MEM[_OR_NULL] helper argument, and
PTR_TO_RDWR_BUF can be passed to ARG_PTR_TO_MEM[_OR_NULL] or
ARG_PTR_TO_UNINIT_MEM.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200723184111.590274-1-yhs@fb.com
2020-07-25 20:16:32 -07:00
Yonghong Song
f9c7927295 bpf: Refactor to provide aux info to bpf_iter_init_seq_priv_t
This patch refactored target bpf_iter_init_seq_priv_t callback
function to accept additional information. This will be needed
in later patches for map element targets since a particular
map should be passed to traverse elements for that particular
map. In the future, other information may be passed to target
as well, e.g., pid, cgroup id, etc. to customize the iterator.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200723184110.590156-1-yhs@fb.com
2020-07-25 20:16:32 -07:00
Yonghong Song
14fc6bd6b7 bpf: Refactor bpf_iter_reg to have separate seq_info member
There is no functionality change for this patch.
Struct bpf_iter_reg is used to register a bpf_iter target,
which includes information for prog_load, link_create
and seq_file creation.

This patch puts fields related to seq_file creation into
a different structure. This will be useful for map
elements iterator where one iterator covers different
map types and different map types may have different
seq_ops, init/fini private_data function and
private_data size.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200723184109.590030-1-yhs@fb.com
2020-07-25 20:16:32 -07:00
Alexei Starovoitov
a228a64fc1 bpf: Add bpf_prog iterator
It's mostly a copy paste of commit 6086d29def ("bpf: Add bpf_map iterator")
that is used to implement bpf_seq_file operations to traverse all bpf programs.

v1->v2: Tweak to use build time btf_id

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
2020-07-25 20:16:32 -07:00
Yonghong Song
951cf368bc bpf: net: Use precomputed btf_id for bpf iterators
One additional field btf_id is added to struct
bpf_ctx_arg_aux to store the precomputed btf_ids.
The btf_id is computed at build time with
BTF_ID_LIST or BTF_ID_LIST_GLOBAL macro definitions.
All existing bpf iterators are changed to use
precomputed btf_ids.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200720163403.1393551-1-yhs@fb.com
2020-07-21 13:26:26 -07:00
Yonghong Song
bc4f0548f6 bpf: Compute bpf_skc_to_*() helper socket btf ids at build time
Currently, socket types (struct tcp_sock, udp_sock, etc.)
used by bpf_skc_to_*() helpers are computed when vmlinux_btf
is first built in the kernel.

Commit 5a2798ab32
("bpf: Add BTF_ID_LIST/BTF_ID/BTF_ID_UNUSED macros")
implemented a mechanism to compute btf_ids at kernel build
time which can simplify kernel implementation and reduce
runtime overhead by removing in-kernel btf_id calculation.
This patch did exactly this, removing in-kernel btf_id
computation and utilizing build-time btf_id computation.

If CONFIG_DEBUG_INFO_BTF is not defined, BTF_ID_LIST will
define an array with size of 5, which is not enough for
btf_sock_ids. So define its own static array if
CONFIG_DEBUG_INFO_BTF is not defined.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200720163358.1393023-1-yhs@fb.com
2020-07-21 13:26:26 -07:00
Jakub Sitnicki
e9ddbb7707 bpf: Introduce SK_LOOKUP program type with a dedicated attach point
Add a new program type BPF_PROG_TYPE_SK_LOOKUP with a dedicated attach type
BPF_SK_LOOKUP. The new program kind is to be invoked by the transport layer
when looking up a listening socket for a new connection request for
connection oriented protocols, or when looking up an unconnected socket for
a packet for connection-less protocols.

When called, SK_LOOKUP BPF program can select a socket that will receive
the packet. This serves as a mechanism to overcome the limits of what
bind() API allows to express. Two use-cases driving this work are:

 (1) steer packets destined to an IP range, on fixed port to a socket

     192.0.2.0/24, port 80 -> NGINX socket

 (2) steer packets destined to an IP address, on any port to a socket

     198.51.100.1, any port -> L7 proxy socket

In its run-time context program receives information about the packet that
triggered the socket lookup. Namely IP version, L4 protocol identifier, and
address 4-tuple. Context can be further extended to include ingress
interface identifier.

To select a socket BPF program fetches it from a map holding socket
references, like SOCKMAP or SOCKHASH, and calls bpf_sk_assign(ctx, sk, ...)
helper to record the selection. Transport layer then uses the selected
socket as a result of socket lookup.

In its basic form, SK_LOOKUP acts as a filter and hence must return either
SK_PASS or SK_DROP. If the program returns with SK_PASS, transport should
look for a socket to receive the packet, or use the one selected by the
program if available, while SK_DROP informs the transport layer that the
lookup should fail.
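
How the transport layer might combine the verdict with the program's
selection can be sketched as follows. The enum, struct and function names are
hypothetical stand-ins chosen for this illustration, and the return
convention (0 or -1) is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the SK_DROP / SK_PASS verdicts. */
enum verdict_sketch {
	VERDICT_DROP = 0,
	VERDICT_PASS = 1,
};

struct sock_sketch {
	int id;
};

/*
 * Returns 0 and stores the socket to use in *out (NULL if none was
 * found), or -1 when the program asked for the lookup to fail.
 * selected is the socket recorded by the program, if any; regular is
 * what the ordinary lookup would have found.
 */
static int resolve_lookup_sketch(enum verdict_sketch verdict,
				 struct sock_sketch *selected,
				 struct sock_sketch *regular,
				 struct sock_sketch **out)
{
	assert(out != NULL);

	*out = NULL;
	if (verdict == VERDICT_DROP)
		return -1;

	*out = selected ? selected : regular;
	return 0;
}
```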

This patch only enables the user to attach an SK_LOOKUP program to a
network namespace. Subsequent patches hook it up to run on local delivery
path in ipv4 and ipv6 stacks.

Suggested-by: Marek Majkowski <marek@cloudflare.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200717103536.397595-3-jakub@cloudflare.com
2020-07-17 20:18:16 -07:00
Jakub Sitnicki
ce3aa9cc51 bpf, netns: Handle multiple link attachments
Extend the BPF netns link callbacks to rebuild (grow/shrink) or update the
prog_array at given position when link gets attached/updated/released.

This lets us lift the limit of having just one link attached for the new
attach type introduced by subsequent patch.

No functional changes intended.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200717103536.397595-2-jakub@cloudflare.com
2020-07-17 20:18:16 -07:00
Lorenzo Bianconi
9216477449 bpf: cpumap: Add the possibility to attach an eBPF program to cpumap
Introduce the capability to attach an eBPF program to cpumap entries.
The idea behind this feature is to add the possibility to define
which CPU runs the eBPF program if the underlying hw does not support
RSS. Currently supported verdicts are XDP_DROP and XDP_PASS.

This patch has been tested on Marvell ESPRESSObin using xdp_redirect_cpu
sample available in the kernel tree to identify possible performance
regressions. Results show there are no observable differences in
packet-per-second:

$./xdp_redirect_cpu --progname xdp_cpu_map0 --dev eth0 --cpu 1
rx: 354.8 Kpps
rx: 356.0 Kpps
rx: 356.8 Kpps
rx: 356.3 Kpps
rx: 356.6 Kpps
rx: 356.6 Kpps
rx: 356.7 Kpps
rx: 355.8 Kpps
rx: 356.8 Kpps
rx: 356.8 Kpps

Co-developed-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Link: https://lore.kernel.org/bpf/5c9febdf903d810b3415732e5cd98491d7d9067a.1594734381.git.lorenzo@kernel.org
2020-07-16 17:00:32 +02:00
David S. Miller
71930d6102 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
All conflicts seemed rather trivial, with some guidance from
Saeed Mameed on the tc_ct.c one.

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-11 00:46:00 -07:00
Song Liu
fa28dcb82a bpf: Introduce helper bpf_get_task_stack()
Introduce helper bpf_get_task_stack(), which dumps stack trace of given
task. This is different to bpf_get_stack(), which gets stack trace of
current task. One potential use case of bpf_get_task_stack() is to call
it from bpf_iter__task and dump all /proc/<pid>/stack to a seq_file.

bpf_get_task_stack() uses stack_trace_save_tsk() instead of
get_perf_callchain() for kernel stack. The benefit of this choice is that
stack_trace_save_tsk() doesn't require changes in arch/. The downside of
using stack_trace_save_tsk() is that stack_trace_save_tsk() dumps the
stack trace to unsigned long array. For 32-bit systems, we need to
translate it to u64 array.
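
The 32-bit translation can be done in place when the buffer is already sized
for u64 entries, by widening from the last entry backwards so that no unread
32-bit value is overwritten. The sketch below is a user-space illustration
with a hypothetical function name; it uses memcpy() to sidestep aliasing
concerns and is not the helper's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * buf holds n packed 32-bit entries but has room for n 64-bit entries.
 * Walk backwards: entry i is read from bytes [4i, 4i+4) and written to
 * bytes [8i, 8i+8), which never overlaps a lower entry not yet read.
 */
static void widen_in_place_sketch(void *buf, size_t n)
{
	unsigned char *p = buf;
	size_t i;

	assert(buf != NULL || n == 0);

	for (i = n; i-- > 0; ) {
		uint32_t v32;
		uint64_t v64;

		memcpy(&v32, p + i * sizeof(v32), sizeof(v32));
		v64 = v32;
		memcpy(p + i * sizeof(v64), &v64, sizeof(v64));
	}
}
```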

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200630062846.664389-3-songliubraving@fb.com
2020-07-01 08:23:19 -07:00
Lorenz Bauer
bb0de3131f bpf: sockmap: Require attach_bpf_fd when detaching a program
The sockmap code currently ignores the value of attach_bpf_fd when
detaching a program. This is contrary to the usual behaviour of
checking that attach_bpf_fd represents the currently attached
program.

Ensure that attach_bpf_fd is indeed the currently attached
program. It turns out that all sockmap selftests already do this,
which indicates that this is unlikely to cause breakage.
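
The stricter detach rule amounts to a compare-then-clear on the attachment
slot. The sketch below uses hypothetical names and assumes -ENOENT as the
error for a mismatch; treat both as illustration rather than as the sockmap
code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct prog_sketch {
	int id;
};

/*
 * Clears *slot only if it currently holds expected. A mismatch (or an
 * empty slot) is reported as -ENOENT, which is an assumption of this
 * sketch.
 */
static int detach_prog_sketch(struct prog_sketch **slot,
			      const struct prog_sketch *expected)
{
	assert(slot != NULL);

	if (*slot == NULL || *slot != expected)
		return -ENOENT;

	*slot = NULL;
	return 0;
}
```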

Fixes: 604326b41a ("bpf, sockmap: convert to generic sk_msg interface")
Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200629095630.7933-5-lmb@cloudflare.com
2020-06-30 10:46:39 -07:00
Yonghong Song
0d4fad3e57 bpf: Add bpf_skc_to_udp6_sock() helper
The helper is used in tracing programs to cast a socket
pointer to a udp6_sock pointer.
The return value could be NULL if the casting is illegal.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Cc: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/bpf/20200623230815.3988481-1-yhs@fb.com
2020-06-24 18:37:59 -07:00