49c25af89c6ad02b60a1a2bcbe16ecef330782cf
830 Commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
![]() |
49c25af89c |
Revert "bpf: Remove extra lock_sock for TCP_ZEROCOPY_RECEIVE"
This reverts commit
|
||
![]() |
477f5e6b9e |
Merge 5.10.188 into android12-5.10-lts
Changes in 5.10.188 media: atomisp: fix "variable dereferenced before check 'asd'" x86/smp: Use dedicated cache-line for mwait_play_dead() can: isotp: isotp_sendmsg(): fix return error fix on TX path video: imsttfb: check for ioremap() failures fbdev: imsttfb: Fix use after free bug in imsttfb_probe HID: wacom: Use ktime_t rather than int when dealing with timestamps HID: logitech-hidpp: add HIDPP_QUIRK_DELAYED_INIT for the T651. Revert "thermal/drivers/mediatek: Use devm_of_iomap to avoid resource leak in mtk_thermal_probe" scripts/tags.sh: Resolve gtags empty index generation drm/amdgpu: Validate VM ioctl flags. nubus: Partially revert proc_create_single_data() conversion fs: pipe: reveal missing function protoypes x86/resctrl: Only show tasks' pid in current pid namespace blk-iocost: use spin_lock_irqsave in adjust_inuse_and_calc_cost md/raid10: check slab-out-of-bounds in md_bitmap_get_counter md/raid10: fix overflow of md/safe_mode_delay md/raid10: fix wrong setting of max_corr_read_errors md/raid10: fix null-ptr-deref of mreplace in raid10_sync_request md/raid10: fix io loss while replacement replace rdev irqchip/jcore-aic: Kill use of irq_create_strict_mappings() irqchip/jcore-aic: Fix missing allocation of IRQ descriptors posix-timers: Prevent RT livelock in itimer_delete() tracing/timer: Add missing hrtimer modes to decode_hrtimer_mode(). clocksource/drivers/cadence-ttc: Fix memory leak in ttc_timer_probe PM: domains: fix integer overflow issues in genpd_parse_state() perf/arm-cmn: Fix DTC reset powercap: RAPL: Fix CONFIG_IOSF_MBI dependency ARM: 9303/1: kprobes: avoid missing-declaration warnings cpufreq: intel_pstate: Fix energy_performance_preference for passive thermal/drivers/sun8i: Fix some error handling paths in sun8i_ths_probe() rcuscale: Console output claims too few grace periods rcuscale: Always log error message rcuscale: Move shutdown from wait_event() to wait_event_idle() rcu/rcuscale: Move rcu_scale_*() after kfree_scale_cleanup() rcu/rcuscale: Stop kfree_scale_thread thread(s) after unloading rcuscale perf/ibs: Fix interface via core pmu events x86/mm: Fix __swp_entry_to_pte() for Xen PV guests evm: Complete description of evm_inode_setattr() ima: Fix build warnings pstore/ram: Add check for kstrdup igc: Enable and fix RX hash usage by netstack wifi: ath9k: fix AR9003 mac hardware hang check register offset calculation wifi: ath9k: avoid referencing uninit memory in ath9k_wmi_ctrl_rx samples/bpf: Fix buffer overflow in tcp_basertt spi: spi-geni-qcom: Correct CS_TOGGLE bit in SPI_TRANS_CFG wifi: wilc1000: fix for absent RSN capabilities WFA testcase wifi: mwifiex: Fix the size of a memory allocation in mwifiex_ret_802_11_scan() bpf: Remove extra lock_sock for TCP_ZEROCOPY_RECEIVE sctp: add bpf_bypass_getsockopt proto callback libbpf: fix offsetof() and container_of() to work with CO-RE nfc: constify several pointers to u8, char and sk_buff nfc: llcp: fix possible use of uninitialized variable in nfc_llcp_send_connect() bpftool: JIT limited misreported as negative value on aarch64 regulator: core: Fix more error checking for debugfs_create_dir() regulator: core: Streamline debugfs operations wifi: orinoco: Fix an error handling path in spectrum_cs_probe() wifi: orinoco: Fix an error handling path in orinoco_cs_probe() wifi: atmel: Fix an error handling path in atmel_probe() wl3501_cs: Fix misspelling and provide missing documentation net: create netdev->dev_addr assignment helpers wl3501_cs: use eth_hw_addr_set() wifi: wl3501_cs: Fix an error handling path in wl3501_probe() wifi: ray_cs: Utilize strnlen() in parse_addr() wifi: ray_cs: Drop useless status variable in parse_addr() wifi: ray_cs: Fix an error handling path in ray_probe() wifi: ath9k: don't allow to overwrite ENDPOINT0 attributes wifi: rsi: Do not configure WoWlan in shutdown hook if not enabled wifi: rsi: Do not set MMC_PM_KEEP_POWER in shutdown watchdog/perf: define dummy watchdog_update_hrtimer_threshold() on correct config watchdog/perf: more properly prevent false positives with turbo modes kexec: fix a memory leak in crash_shrink_memory() memstick r592: make memstick_debug_get_tpc_name() static wifi: ath9k: Fix possible stall on ath9k_txq_list_has_key() rtnetlink: extend RTEXT_FILTER_SKIP_STATS to IFLA_VF_INFO wifi: iwlwifi: pull from TXQs with softirqs disabled wifi: cfg80211: rewrite merging of inherited elements wifi: ath9k: convert msecs to jiffies where needed igc: Fix race condition in PTP tx code net: stmmac: fix double serdes powerdown netlink: fix potential deadlock in netlink_set_err() netlink: do not hard code device address lenth in fdb dumps selftests: rtnetlink: remove netdevsim device after ipsec offload test gtp: Fix use-after-free in __gtp_encap_destroy(). net: axienet: Move reset before 64-bit DMA detection sfc: fix crash when reading stats while NIC is resetting nfc: llcp: simplify llcp_sock_connect() error paths net: nfc: Fix use-after-free caused by nfc_llcp_find_local lib/ts_bm: reset initial match offset for every block of text netfilter: conntrack: dccp: copy entire header to stack buffer, not just basic one netfilter: nf_conntrack_sip: fix the ct_sip_parse_numerical_param() return value. ipvlan: Fix return value of ipvlan_queue_xmit() netlink: Add __sock_i_ino() for __netlink_diag_dump(). radeon: avoid double free in ci_dpm_init() drm/amd/display: Explicitly specify update type per plane info change Input: drv260x - sleep between polling GO bit drm/bridge: tc358768: always enable HS video mode drm/bridge: tc358768: fix PLL parameters computation drm/bridge: tc358768: fix PLL target frequency drm/bridge: tc358768: fix TCLK_ZEROCNT computation drm/bridge: tc358768: Add atomic_get_input_bus_fmts() implementation drm/bridge: tc358768: fix TCLK_TRAILCNT computation drm/bridge: tc358768: fix THS_ZEROCNT computation drm/bridge: tc358768: fix TXTAGOCNT computation drm/bridge: tc358768: fix THS_TRAILCNT computation drm/vram-helper: fix function names in vram helper doc ARM: dts: BCM5301X: Drop "clock-names" from the SPI node ARM: dts: meson8b: correct uart_B and uart_C clock references Input: adxl34x - do not hardcode interrupt trigger type drm: sun4i_tcon: use devm_clk_get_enabled in `sun4i_tcon_init_clocks` drm/panel: sharp-ls043t1le01: adjust mode settings ARM: dts: stm32: Move ethernet MAC EEPROM from SoM to carrier boards bus: ti-sysc: Fix dispc quirk masking bool variables arm64: dts: microchip: sparx5: do not use PSCI on reference boards RDMA/bnxt_re: Disable/kill tasklet only if it is enabled RDMA/bnxt_re: Fix to remove unnecessary return labels RDMA/bnxt_re: Use unique names while registering interrupts RDMA/bnxt_re: Remove a redundant check inside bnxt_re_update_gid RDMA/bnxt_re: Fix to remove an unnecessary log ARM: dts: gta04: Move model property out of pinctrl node arm64: dts: qcom: msm8916: correct camss unit address arm64: dts: qcom: msm8994: correct SPMI unit address arm64: dts: qcom: msm8996: correct camss unit address drm/panel: simple: fix active size for Ampire AM-480272H3TMQW-T01H ARM: ep93xx: fix missing-prototype warnings ARM: omap2: fix missing tick_broadcast() prototype arm64: dts: qcom: apq8096: fix fixed regulator name property ARM: dts: stm32: Shorten the AV96 HDMI sound card name memory: brcmstb_dpfe: fix testing array offset after use ASoC: es8316: Increment max value for ALC Capture Target Volume control ASoC: es8316: Do not set rate constraints for unsupported MCLKs ARM: dts: meson8: correct uart_B and uart_C clock references soc/fsl/qe: fix usb.c build errors IB/hfi1: Use bitmap_zalloc() when applicable IB/hfi1: Fix sdma.h tx->num_descs off-by-one errors IB/hfi1: Fix wrong mmu_node used for user SDMA packet after invalidate RDMA: Remove uverbs_ex_cmd_mask values that are linked to functions RDMA/hns: Fix coding style issues RDMA/hns: Use refcount_t APIs for HEM RDMA/hns: Clean the hardware related code for HEM RDMA/hns: Fix hns_roce_table_get return value ARM: dts: iwg20d-q7-common: Fix backlight pwm specifier arm64: dts: renesas: ulcb-kf: Remove flow control for SCIF1 fbdev: omapfb: lcd_mipid: Fix an error handling path in mipid_spi_probe() arm64: dts: ti: k3-j7200: Fix physical address of pin ARM: dts: stm32: Fix audio routing on STM32MP15xx DHCOM PDK2 ARM: dts: stm32: fix i2s endpoint format property for stm32mp15xx-dkx hwmon: (gsc-hwmon) fix fan pwm temperature scaling hwmon: (adm1275) enable adm1272 temperature reporting hwmon: (adm1275) Allow setting sample averaging hwmon: (pmbus/adm1275) Fix problems with temperature monitoring on ADM1272 ARM: dts: BCM5301X: fix duplex-full => full-duplex drm/amdkfd: Fix potential deallocation of previously deallocated memory. drm/radeon: fix possible division-by-zero errors amdgpu: validate offset_in_bo of drm_amdgpu_gem_va RDMA/bnxt_re: wraparound mbox producer index RDMA/bnxt_re: Avoid calling wake_up threads from spin_lock context clk: imx: clk-imx8mn: fix memory leak in imx8mn_clocks_probe clk: imx: clk-imx8mp: improve error handling in imx8mp_clocks_probe() clk: tegra: tegra124-emc: Fix potential memory leak ALSA: ac97: Fix possible NULL dereference in snd_ac97_mixer drm/msm/dpu: do not enable color-management if DSPPs are not available drm/msm/dp: Free resources after unregistering them clk: vc5: check memory returned by kasprintf() clk: cdce925: check return value of kasprintf() clk: si5341: Allow different output VDD_SEL values clk: si5341: Add sysfs properties to allow checking/resetting device faults clk: si5341: return error if one synth clock registration fails clk: si5341: check return value of {devm_}kasprintf() clk: si5341: free unused memory on probe failure clk: keystone: sci-clk: check return value of kasprintf() clk: ti: clkctrl: check return value of kasprintf() drivers: meson: secure-pwrc: always enable DMA domain ovl: update of dentry revalidate flags after copy up ASoC: imx-audmix: check return value of devm_kasprintf() PCI: cadence: Fix Gen2 Link Retraining process scsi: qedf: Fix NULL dereference in error handling pinctrl: bcm2835: Handle gpiochip_add_pin_range() errors PCI/ASPM: Disable ASPM on MFD function removal to avoid use-after-free scsi: 3w-xxxx: Add error handling for initialization failure in tw_probe() PCI: pciehp: Cancel bringup sequence if card is not present PCI: ftpci100: Release the clock resources PCI: Add pci_clear_master() stub for non-CONFIG_PCI perf bench: Use unbuffered output when pipe/tee'ing to a file perf bench: Add missing setlocale() call to allow usage of %'d style formatting pinctrl: cherryview: Return correct value if pin in push-pull mode kcsan: Don't expect 64 bits atomic builtins from 32 bits architectures perf script: Fixup 'struct evsel_script' method prefix perf script: Fix allocation of evsel->priv related to per-event dump files perf dwarf-aux: Fix off-by-one in die_get_varname() pinctrl: at91-pio4: check return value of devm_kasprintf() powerpc/powernv/sriov: perform null check on iov before dereferencing iov mm: rename pud_page_vaddr to pud_pgtable and make it return pmd_t * mm: rename p4d_page_vaddr to p4d_pgtable and make it return pud_t * powerpc/book3s64/mm: Fix DirectMap stats in /proc/meminfo powerpc/mm/dax: Fix the condition when checking if altmap vmemap can cross-boundary hwrng: virtio - add an internal buffer hwrng: virtio - don't wait on cleanup hwrng: virtio - don't waste entropy hwrng: virtio - always add a pending request hwrng: virtio - Fix race on data_avail and actual data crypto: nx - fix build warnings when DEBUG_FS is not enabled modpost: fix section mismatch message for R_ARM_ABS32 modpost: fix section mismatch message for R_ARM_{PC24,CALL,JUMP24} crypto: marvell/cesa - Fix type mismatch warning modpost: fix off by one in is_executable_section() ARC: define ASM_NL and __ALIGN(_STR) outside #ifdef __ASSEMBLY__ guard NFSv4.1: freeze the session table upon receiving NFS4ERR_BADSESSION dax: Fix dax_mapping_release() use after free dax: Introduce alloc_dev_dax_id() hwrng: st - keep clock enabled while hwrng is registered io_uring: ensure IOPOLL locks around deferred work USB: serial: option: add LARA-R6 01B PIDs usb: dwc3: gadget: Propagate core init errors to UDC during pullup phy: tegra: xusb: Clear the driver reference in usb-phy dev block: fix signed int overflow in Amiga partition support block: change all __u32 annotations to __be32 in affs_hardblocks.h SUNRPC: Fix UAF in svc_tcp_listen_data_ready() w1: w1_therm: fix locking behavior in convert_t w1: fix loop in w1_fini() sh: j2: Use ioremap() to translate device tree address into kernel memory serial: 8250: omap: Fix freeing of resources on failed register clk: qcom: gcc-ipq6018: Use floor ops for sdcc clocks media: usb: Check az6007_read() return value media: videodev2.h: Fix struct v4l2_input tuner index comment media: usb: siano: Fix warning due to null work_func_t function pointer clk: qcom: reset: Allow specifying custom reset delay clk: qcom: reset: support resetting multiple bits clk: qcom: ipq6018: fix networking resets usb: dwc3: qcom: Fix potential memory leak usb: gadget: u_serial: Add null pointer check in gserial_suspend extcon: Fix kernel doc of property fields to avoid warnings extcon: Fix kernel doc of property capability fields to avoid warnings usb: phy: phy-tahvo: fix memory leak in tahvo_usb_probe() usb: hide unused usbfs_notify_suspend/resume functions serial: 8250: lock port for stop_rx() in omap8250_irq() serial: 8250: lock port for UART_IER access in omap8250_irq() kernfs: fix missing kernfs_idr_lock to remove an ID from the IDR coresight: Fix loss of connection info when a module is unloaded mfd: rt5033: Drop rt5033-battery sub-device media: venus: helpers: Fix ALIGN() of non power of two media: atomisp: gmin_platform: fix out_len in gmin_get_config_dsm_var() KVM: s390: fix KVM_S390_GET_CMMA_BITS for GFNs in memslot holes usb: dwc3: qcom: Release the correct resources in dwc3_qcom_remove() usb: dwc3: qcom: Fix an error handling path in dwc3_qcom_probe() usb: common: usb-conn-gpio: Set last role to unknown before initial detection usb: dwc3-meson-g12a: Fix an error handling path in dwc3_meson_g12a_probe() mfd: intel-lpss: Add missing check for platform_get_resource Revert "usb: common: usb-conn-gpio: Set last role to unknown before initial detection" serial: 8250_omap: Use force_suspend and resume for system suspend test_firmware: return ENOMEM instead of ENOSPC on failed memory allocation mfd: stmfx: Fix error path in stmfx_chip_init mfd: stmfx: Nullify stmfx->vdd in case of error KVM: s390: vsie: fix the length of APCB bitmap mfd: stmpe: Only disable the regulators if they are enabled phy: tegra: xusb: check return value of devm_kzalloc() pwm: imx-tpm: force 'real_period' to be zero in suspend pwm: sysfs: Do not apply state to already disabled PWMs rtc: st-lpc: Release some resources in st_rtc_probe() in case of error media: cec: i2c: ch7322: also select REGMAP sctp: fix potential deadlock on &net->sctp.addr_wq_lock Add MODULE_FIRMWARE() for FIRMWARE_TG357766. net: dsa: vsc73xx: fix MTU configuration spi: bcm-qspi: return error if neither hif_mspi nor mspi is available mailbox: ti-msgmgr: Fill non-message tx data fields with 0x0 f2fs: fix error path handling in truncate_dnode() octeontx2-af: Fix mapping for NIX block from CGX connection powerpc: allow PPC_EARLY_DEBUG_CPM only when SERIAL_CPM=y net: bridge: keep ports without IFF_UNICAST_FLT in BR_PROMISC mode tcp: annotate data races in __tcp_oow_rate_limited() xsk: Honor SO_BINDTODEVICE on bind net/sched: act_pedit: Add size check for TCA_PEDIT_PARMS_EX pptp: Fix fib lookup calls. net: dsa: tag_sja1105: fix MAC DA patching from meta frames s390/qeth: Fix vipa deletion sh: dma: Fix DMA channel offset calculation apparmor: fix missing error check for rhashtable_insert_fast i2c: xiic: Defer xiic_wakeup() and __xiic_start_xfer() in xiic_process() i2c: xiic: Don't try to handle more interrupt events after error ALSA: jack: Fix mutex call in snd_jack_report() i2c: qup: Add missing unwind goto in qup_i2c_probe() NFSD: add encoding of op_recall flag for write delegation io_uring: wait interruptibly for request completions on exit mmc: core: disable TRIM on Kingston EMMC04G-M627 mmc: core: disable TRIM on Micron MTFC4GACAJCN-1M mmc: mmci: Set PROBE_PREFER_ASYNCHRONOUS mmc: sdhci: fix DMA configure compatibility issue when 64bit DMA mode is used. bcache: fixup btree_cache_wait list damage bcache: Remove unnecessary NULL point check in node allocations bcache: Fix __bch_btree_node_alloc to make the failure behavior consistent um: Use HOST_DIR for mrproper integrity: Fix possible multiple allocation in integrity_inode_get() autofs: use flexible array in ioctl structure shmem: use ramfs_kill_sb() for kill_sb method of ramfs-based tmpfs jffs2: reduce stack usage in jffs2_build_xattr_subsystem() fs: avoid empty option when generating legacy mount string ext4: Remove ext4 locking of moved directory Revert "f2fs: fix potential corruption when moving a directory" fs: Establish locking order for unrelated directories fs: Lock moved directories btrfs: add handling for RAID1C23/DUP to btrfs_reduce_alloc_profile btrfs: fix race when deleting quota root from the dirty cow roots list ASoC: mediatek: mt8173: Fix irq error path ASoC: mediatek: mt8173: Fix snd_soc_component_initialize error path ARM: orion5x: fix d2net gpio initialization leds: trigger: netdev: Recheck NETDEV_LED_MODE_LINKUP on dev rename fs: no need to check source fanotify: disallow mount/sb marks on kernel internal pseudo fs tpm, tpm_tis: Claim locality in interrupt handler selftests/bpf: Add verifier test for PTR_TO_MEM spill block: add overflow checks for Amiga partition support sh: pgtable-3level: Fix cast to pointer from integer of different size netfilter: nf_tables: use net_generic infra for transaction data netfilter: nf_tables: add rescheduling points during loop detection walks netfilter: nf_tables: incorrect error path handling with NFT_MSG_NEWRULE netfilter: nf_tables: fix chain binding transaction logic netfilter: nf_tables: add NFT_TRANS_PREPARE_ERROR to deal with bound set/chain netfilter: nf_tables: reject unbound anonymous set before commit phase netfilter: nf_tables: reject unbound chain set before commit phase netfilter: nftables: rename set element data activation/deactivation functions netfilter: nf_tables: drop map element references from preparation phase netfilter: nf_tables: unbind non-anonymous set if rule construction fails netfilter: nf_tables: fix scheduling-while-atomic splat netfilter: conntrack: Avoid nf_ct_helper_hash uses after free netfilter: nf_tables: do not ignore genmask when looking up chain by id netfilter: nf_tables: prevent OOB access in nft_byteorder_eval wireguard: queueing: use saner cpu selection wrapping wireguard: netlink: send staged packets when setting initial private key tty: serial: fsl_lpuart: add earlycon for imx8ulp platform rcu-tasks: Mark ->trc_reader_nesting data races rcu-tasks: Mark ->trc_reader_special.b.need_qs data races rcu-tasks: Simplify trc_read_check_handler() atomic operations block/partition: fix signedness issue for Amiga partitions io_uring: Use io_schedule* in cqring wait io_uring: add reschedule point to handle_tw_list() net: lan743x: Don't sleep in atomic context workqueue: clean up WORK_* constant types, clarify masking drm/panel: simple: Add connector_type for innolux_at043tn24 drm/panel: simple: Add Powertip PH800480T013 drm_display_mode flags igc: Remove delay during TX ring configuration net/mlx5e: fix double free in mlx5e_destroy_flow_table net/mlx5e: Check for NOT_READY flag state after locking igc: set TP bit in 'supported' and 'advertising' fields of ethtool_link_ksettings scsi: qla2xxx: Fix error code in qla2x00_start_sp() net: mvneta: fix txq_map in case of txq_number==1 net/sched: cls_fw: Fix improper refcount update leads to use-after-free gve: Set default duplex configuration to full ionic: remove WARN_ON to prevent panic_on_warn net: bgmac: postpone turning IRQs off to avoid SoC hangs net: prevent skb corruption on frag list segmentation icmp6: Fix null-ptr-deref of ip6_null_entry->rt6i_idev in icmp6_dev(). udp6: fix udp6_ehashfn() typo ntb: idt: Fix error handling in idt_pci_driver_init() NTB: amd: Fix error handling in amd_ntb_pci_driver_init() ntb: intel: Fix error handling in intel_ntb_pci_driver_init() NTB: ntb_transport: fix possible memory leak while device_register() fails NTB: ntb_tool: Add check for devm_kcalloc ipv6/addrconf: fix a potential refcount underflow for idev platform/x86: wmi: remove unnecessary argument platform/x86: wmi: use guid_t and guid_equal() platform/x86: wmi: move variables platform/x86: wmi: Break possible infinite loop when parsing GUID igc: Fix launchtime before start of cycle igc: Fix inserting of empty frame for launchtime riscv: bpf: Move bpf_jit_alloc_exec() and bpf_jit_free_exec() to core riscv: bpf: Avoid breaking W^X bpf, riscv: Support riscv jit to provide bpf_line_info riscv, bpf: Fix inconsistent JIT image generation erofs: avoid infinite loop in z_erofs_do_read_page() when reading beyond EOF wifi: airo: avoid uninitialized warning in airo_get_rate() net/sched: flower: Ensure both minimum and maximum ports are specified netdevsim: fix uninitialized data in nsim_dev_trap_fa_cookie_write() net/sched: make psched_mtu() RTNL-less safe net/sched: sch_qfq: refactor parsing of netlink parameters net/sched: sch_qfq: account for stab overhead in qfq_enqueue nvme-pci: fix DMA direction of unmapping integrity data f2fs: fix to avoid NULL pointer dereference f2fs_write_end_io() pinctrl: amd: Fix mistake in handling clearing pins at startup pinctrl: amd: Detect internal GPIO0 debounce handling pinctrl: amd: Only use special debounce behavior for GPIO 0 tpm: tpm_vtpm_proxy: fix a race condition in /dev/vtpmx creation mtd: rawnand: meson: fix unaligned DMA buffers handling net: bcmgenet: Ensure MDIO unregistration has clocks enabled powerpc: Fail build if using recordmcount with binutils v2.37 misc: fastrpc: Create fastrpc scalar with correct buffer count erofs: fix compact 4B support for 16k block size MIPS: Loongson: Fix cpu_probe_loongson() again ext4: Fix reusing stale buffer heads from last failed mounting ext4: fix wrong unit use in ext4_mb_clear_bb ext4: get block from bh in ext4_free_blocks for fast commit replay ext4: fix wrong unit use in ext4_mb_new_blocks ext4: only update i_reserved_data_blocks on successful block allocation jfs: jfs_dmap: Validate db_l2nbperpage while mounting hwrng: imx-rngc - fix the timeout for init and self check PCI/PM: Avoid putting EloPOS E2/S2/H2 PCIe Ports in D3cold PCI: Add function 1 DMA alias quirk for Marvell 88SE9235 PCI: qcom: Disable write access to read only registers for IP v2.3.3 PCI: rockchip: Assert PCI Configuration Enable bit after probe PCI: rockchip: Write PCI Device ID to correct register PCI: rockchip: Add poll and timeout to wait for PHY PLLs to be locked PCI: rockchip: Fix legacy IRQ generation for RK3399 PCIe endpoint core PCI: rockchip: Use u32 variable to access 32-bit registers PCI: rockchip: Set address alignment for endpoint mode misc: pci_endpoint_test: Free IRQs before removing the device misc: pci_endpoint_test: Re-init completion for every test md/raid0: add discard support for the 'original' layout fs: dlm: return positive pid value for F_GETLK drm/atomic: Allow vblank-enabled + self-refresh "disable" drm/rockchip: vop: Leave vblank enabled in self-refresh drm/amd/display: Correct `DMUB_FW_VERSION` macro serial: atmel: don't enable IRQs prematurely tty: serial: samsung_tty: Fix a memory leak in s3c24xx_serial_getclk() in case of error tty: serial: samsung_tty: Fix a memory leak in s3c24xx_serial_getclk() when iterating clk firmware: stratix10-svc: Fix a potential resource leak in svc_create_memory_pool() ceph: don't let check_caps skip sending responses for revoke msgs xhci: Fix resume issue of some ZHAOXIN hosts xhci: Fix TRB prefetch issue of ZHAOXIN hosts xhci: Show ZHAOXIN xHCI root hub speed correctly meson saradc: fix clock divider mask length Revert "8250: add support for ASIX devices with a FIFO bug" s390/decompressor: fix misaligned symbol build error tracing/histograms: Add histograms to hist_vars if they have referenced variables samples: ftrace: Save required argument registers in sample trampolines net: ena: fix shift-out-of-bounds in exponential backoff ring-buffer: Fix deadloop issue on reading trace_pipe xtensa: ISS: fix call to split_if_spec tracing: Fix null pointer dereference in tracing_err_log_open() tracing/probes: Fix not to count error code to total length scsi: qla2xxx: Wait for io return on terminate rport scsi: qla2xxx: Array index may go out of bound scsi: qla2xxx: Fix buffer overrun scsi: qla2xxx: Fix potential NULL pointer dereference scsi: qla2xxx: Check valid rport returned by fc_bsg_to_rport() scsi: qla2xxx: Correct the index of array scsi: qla2xxx: Pointer may be dereferenced scsi: qla2xxx: Remove unused nvme_ls_waitq wait queue net/sched: sch_qfq: reintroduce lmax bound check for MTU RDMA/cma: Ensure rdma_addr_cancel() happens before issuing more requests drm/atomic: Fix potential use-after-free in nonblocking commits ALSA: hda/realtek - remove 3k pull low procedure ALSA: hda/realtek: Enable Mute LED on HP Laptop 15s-eq2xxx keys: Fix linking a duplicate key to a keyring's assoc_array perf probe: Add test for regression introduced by switch to die_get_decl_file() btrfs: fix warning when putting transaction with qgroups enabled after abort fuse: revalidate: don't invalidate if interrupted selftests: tc: set timeout to 15 minutes selftests: tc: add 'ct' action kconfig dep regmap: Drop initial version of maximum transfer length fixes regmap: Account for register length in SMBus I/O limits can: bcm: Fix UAF in bcm_proc_show() drm/client: Fix memory leak in drm_client_target_cloned drm/client: Fix memory leak in drm_client_modeset_probe ASoC: fsl_sai: Disable bit clock with transmitter ext4: correct inline offset when handling xattrs in inode body debugobjects: Recheck debug_objects_enabled before reporting nbd: Add the maximum limit of allocated index in nbd_dev_add md: fix data corruption for raid456 when reshape restart while grow up md/raid10: prevent soft lockup while flush writes posix-timers: Ensure timer ID search-loop limit is valid btrfs: add xxhash to fast checksum implementations ACPI: button: Add lid disable DMI quirk for Nextbook Ares 8A ACPI: video: Add backlight=native DMI quirk for Apple iMac11,3 ACPI: video: Add backlight=native DMI quirk for Lenovo ThinkPad X131e (3371 AMD version) arm64: set __exception_irq_entry with __irq_entry as a default arm64: mm: fix VA-range sanity check sched/fair: Don't balance task to its current running CPU wifi: ath11k: fix registration of 6Ghz-only phy without the full channel range bpf: Address KCSAN report on bpf_lru_list devlink: report devlink_port_type_warn source device wifi: wext-core: Fix -Wstringop-overflow warning in ioctl_standard_iw_point() wifi: iwlwifi: mvm: avoid baid size integer overflow igb: Fix igb_down hung on surprise removal spi: bcm63xx: fix max prepend length fbdev: imxfb: warn about invalid left/right margin pinctrl: amd: Use amd_pinconf_set() for all config options net: ethernet: ti: cpsw_ale: Fix cpsw_ale_get_field()/cpsw_ale_set_field() bridge: Add extack warning when enabling STP in netns. iavf: Fix use-after-free in free_netdev iavf: Fix out-of-bounds when setting channels on remove security: keys: Modify mismatched function name octeontx2-pf: Dont allocate BPIDs for LBK interfaces tcp: annotate data-races around tcp_rsk(req)->ts_recent net: ipv4: Use kfree_sensitive instead of kfree net:ipv6: check return value of pskb_trim() Revert "tcp: avoid the lookup process failing to get sk in ehash table" fbdev: au1200fb: Fix missing IRQ check in au1200fb_drv_probe llc: Don't drop packet from non-root netns. netfilter: nf_tables: fix spurious set element insertion failure netfilter: nf_tables: can't schedule in nft_chain_validate netfilter: nft_set_pipapo: fix improper element removal netfilter: nf_tables: skip bound chain in netns release path netfilter: nf_tables: skip bound chain on rule flush tcp: annotate data-races around tp->tcp_tx_delay tcp: annotate data-races around tp->keepalive_time tcp: annotate data-races around tp->keepalive_intvl tcp: annotate data-races around tp->keepalive_probes net: Introduce net.ipv4.tcp_migrate_req. tcp: Fix data-races around sysctl_tcp_syn(ack)?_retries. tcp: annotate data-races around icsk->icsk_syn_retries tcp: annotate data-races around tp->linger2 tcp: annotate data-races around rskq_defer_accept tcp: annotate data-races around tp->notsent_lowat tcp: annotate data-races around icsk->icsk_user_timeout tcp: annotate data-races around fastopenq.max_qlen net: phy: prevent stale pointer dereference in phy_init() tracing/histograms: Return an error if we fail to add histogram to hist_vars list tracing: Fix memory leak of iter->temp when reading trace_pipe ftrace: Store the order of pages allocated in ftrace_page ftrace: Fix possible warning on checking all pages used in ftrace_process_locs() Linux 5.10.188 Change-Id: Ibcc1adc43df5b8f649b12078eedd5d4f57de4578 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> |
||
![]() |
937105d2b0 |
tcp: annotate data-races around tcp_rsk(req)->ts_recent
[ Upstream commit eba20811f32652bc1a52d5e7cc403859b86390d9 ]
TCP request sockets are lockless, tcp_rsk(req)->ts_recent
can change while being read by another cpu as syzbot noticed.
This is harmless, but we should annotate the known races.
Note that tcp_check_req() changes req->ts_recent a bit early,
we might change this in the future.
BUG: KCSAN: data-race in tcp_check_req / tcp_check_req
write to 0xffff88813c8afb84 of 4 bytes by interrupt on cpu 1:
tcp_check_req+0x694/0xc70 net/ipv4/tcp_minisocks.c:762
tcp_v4_rcv+0x12db/0x1b70 net/ipv4/tcp_ipv4.c:2071
ip_protocol_deliver_rcu+0x356/0x6d0 net/ipv4/ip_input.c:205
ip_local_deliver_finish+0x13c/0x1a0 net/ipv4/ip_input.c:233
NF_HOOK include/linux/netfilter.h:303 [inline]
ip_local_deliver+0xec/0x1c0 net/ipv4/ip_input.c:254
dst_input include/net/dst.h:468 [inline]
ip_rcv_finish net/ipv4/ip_input.c:449 [inline]
NF_HOOK include/linux/netfilter.h:303 [inline]
ip_rcv+0x197/0x270 net/ipv4/ip_input.c:569
__netif_receive_skb_one_core net/core/dev.c:5493 [inline]
__netif_receive_skb+0x90/0x1b0 net/core/dev.c:5607
process_backlog+0x21f/0x380 net/core/dev.c:5935
__napi_poll+0x60/0x3b0 net/core/dev.c:6498
napi_poll net/core/dev.c:6565 [inline]
net_rx_action+0x32b/0x750 net/core/dev.c:6698
__do_softirq+0xc1/0x265 kernel/softirq.c:571
do_softirq+0x7e/0xb0 kernel/softirq.c:472
__local_bh_enable_ip+0x64/0x70 kernel/softirq.c:396
local_bh_enable+0x1f/0x20 include/linux/bottom_half.h:33
rcu_read_unlock_bh include/linux/rcupdate.h:843 [inline]
__dev_queue_xmit+0xabb/0x1d10 net/core/dev.c:4271
dev_queue_xmit include/linux/netdevice.h:3088 [inline]
neigh_hh_output include/net/neighbour.h:528 [inline]
neigh_output include/net/neighbour.h:542 [inline]
ip_finish_output2+0x700/0x840 net/ipv4/ip_output.c:229
ip_finish_output+0xf4/0x240 net/ipv4/ip_output.c:317
NF_HOOK_COND include/linux/netfilter.h:292 [inline]
ip_output+0xe5/0x1b0 net/ipv4/ip_output.c:431
dst_output include/net/dst.h:458 [inline]
ip_local_out net/ipv4/ip_output.c:126 [inline]
__ip_queue_xmit+0xa4d/0xa70 net/ipv4/ip_output.c:533
ip_queue_xmit+0x38/0x40 net/ipv4/ip_output.c:547
__tcp_transmit_skb+0x1194/0x16e0 net/ipv4/tcp_output.c:1399
tcp_transmit_skb net/ipv4/tcp_output.c:1417 [inline]
tcp_write_xmit+0x13ff/0x2fd0 net/ipv4/tcp_output.c:2693
__tcp_push_pending_frames+0x6a/0x1a0 net/ipv4/tcp_output.c:2877
tcp_push_pending_frames include/net/tcp.h:1952 [inline]
__tcp_sock_set_cork net/ipv4/tcp.c:3336 [inline]
tcp_sock_set_cork+0xe8/0x100 net/ipv4/tcp.c:3343
rds_tcp_xmit_path_complete+0x3b/0x40 net/rds/tcp_send.c:52
rds_send_xmit+0xf8d/0x1420 net/rds/send.c:422
rds_send_worker+0x42/0x1d0 net/rds/threads.c:200
process_one_work+0x3e6/0x750 kernel/workqueue.c:2408
worker_thread+0x5f2/0xa10 kernel/workqueue.c:2555
kthread+0x1d7/0x210 kernel/kthread.c:379
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308
read to 0xffff88813c8afb84 of 4 bytes by interrupt on cpu 0:
tcp_check_req+0x32a/0xc70 net/ipv4/tcp_minisocks.c:622
tcp_v4_rcv+0x12db/0x1b70 net/ipv4/tcp_ipv4.c:2071
ip_protocol_deliver_rcu+0x356/0x6d0 net/ipv4/ip_input.c:205
ip_local_deliver_finish+0x13c/0x1a0 net/ipv4/ip_input.c:233
NF_HOOK include/linux/netfilter.h:303 [inline]
ip_local_deliver+0xec/0x1c0 net/ipv4/ip_input.c:254
dst_input include/net/dst.h:468 [inline]
ip_rcv_finish net/ipv4/ip_input.c:449 [inline]
NF_HOOK include/linux/netfilter.h:303 [inline]
ip_rcv+0x197/0x270 net/ipv4/ip_input.c:569
__netif_receive_skb_one_core net/core/dev.c:5493 [inline]
__netif_receive_skb+0x90/0x1b0 net/core/dev.c:5607
process_backlog+0x21f/0x380 net/core/dev.c:5935
__napi_poll+0x60/0x3b0 net/core/dev.c:6498
napi_poll net/core/dev.c:6565 [inline]
net_rx_action+0x32b/0x750 net/core/dev.c:6698
__do_softirq+0xc1/0x265 kernel/softirq.c:571
run_ksoftirqd+0x17/0x20 kernel/softirq.c:939
smpboot_thread_fn+0x30a/0x4a0 kernel/smpboot.c:164
kthread+0x1d7/0x210 kernel/kthread.c:379
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308
value changed: 0x1cd237f1 -> 0x1cd237f2
Fixes:
|
||
![]() |
08f61a3491 |
bpf: Remove extra lock_sock for TCP_ZEROCOPY_RECEIVE
[ Upstream commit 9cacf81f8161111db25f98e78a7a0e32ae142b3f ] Add custom implementation of getsockopt hook for TCP_ZEROCOPY_RECEIVE. We skip generic hooks for TCP_ZEROCOPY_RECEIVE and have a custom call in do_tcp_getsockopt using the on-stack data. This removes 3% overhead for locking/unlocking the socket. Without this patch: 3.38% 0.07% tcp_mmap [kernel.kallsyms] [k] __cgroup_bpf_run_filter_getsockopt | --3.30%--__cgroup_bpf_run_filter_getsockopt | --0.81%--__kmalloc With the patch applied: 0.52% 0.12% tcp_mmap [kernel.kallsyms] [k] __cgroup_bpf_run_filter_getsockopt_kern Note, exporting uapi/tcp.h requires removing netinet/tcp.h from test_progs.h because those headers have confliciting definitions. Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20210115163501.805133-2-sdf@google.com Stable-dep-of: 2598619e012c ("sctp: add bpf_bypass_getsockopt proto callback") Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
![]() |
f015c92c49 |
Revert "ipv4/tcp: do not use per netns ctl sockets"
This reverts commit
|
||
![]() |
a4be51e26a |
Revert "net: Find dst with sk's xfrm policy not ctl_sk"
This reverts commit
|
||
![]() |
c86beaeed1 |
Revert "tcp: fix possible sk_priority leak in tcp_v4_send_reset()"
This reverts commit
|
||
![]() |
e7fd68abbb |
tcp: fix possible sk_priority leak in tcp_v4_send_reset()
[ Upstream commit 1e306ec49a1f206fd2cc89a42fac6e6f592a8cc1 ]
When tcp_v4_send_reset() is called with @sk == NULL,
we do not change ctl_sk->sk_priority, which could have been
set from a prior invocation.
Change tcp_v4_send_reset() to set sk_priority and sk_mark
fields before calling ip_send_unicast_reply().
This means tcp_v4_send_reset() and tcp_v4_send_ack()
no longer have to clear ctl_sk->sk_mark after
their call to ip_send_unicast_reply().
Fixes:
|
||
![]() |
788791990d |
net: Find dst with sk's xfrm policy not ctl_sk
[ Upstream commit e22aa14866684f77b4f6b6cae98539e520ddb731 ] If we set XFRM security policy by calling setsockopt with option IPV6_XFRM_POLICY, the policy will be stored in 'sock_policy' in 'sock' struct. However tcp_v6_send_response doesn't look up dst_entry with the actual socket but looks up with tcp control socket. This may cause a problem that a RST packet is sent without ESP encryption & peer's TCP socket can't receive it. This patch will make the function look up dest_entry with actual socket, if the socket has XFRM policy(sock_policy), so that the TCP response packet via this function can be encrypted, & aligned on the encrypted TCP socket. Tested: We encountered this problem when a TCP socket which is encrypted in ESP transport mode encryption, receives challenge ACK at SYN_SENT state. After receiving challenge ACK, TCP needs to send RST to establish the socket at next SYN try. But the RST was not encrypted & peer TCP socket still remains on ESTABLISHED state. So we verified this with test step as below. [Test step] 1. Making a TCP state mismatch between client(IDLE) & server(ESTABLISHED). 2. Client tries a new connection on the same TCP ports(src & dst). 3. Server will return challenge ACK instead of SYN,ACK. 4. Client will send RST to server to clear the SOCKET. 5. Client will retransmit SYN to server on the same TCP ports. [Expected result] The TCP connection should be established. Cc: Maciej Żenczykowski <maze@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Sehee Lee <seheele@google.com> Signed-off-by: Sewook Seo <sewookseo@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Stable-dep-of: 1e306ec49a1f ("tcp: fix possible sk_priority leak in tcp_v4_send_reset()") Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
![]() |
a9ef8b2589 |
ipv4/tcp: do not use per netns ctl sockets
[ Upstream commit 37ba017dcc3b1123206808979834655ddcf93251 ]
TCP ipv4 uses per-cpu/per-netns ctl sockets in order to send
RST and some ACK packets (on behalf of TIMEWAIT sockets).
This adds memory and cpu costs, which do not seem needed.
Now typical servers have 256 or more cores, this adds considerable
tax to netns users.
tcp sockets are used from BH context, are not receiving packets,
and do not store any persistent state but the 'struct net' pointer
in order to be able to use IPv4 output functions.
Note that I attempted a related change in the past, that had
to be hot-fixed in commit
|
||
![]() |
c0af4d005a |
dccp/tcp: Reset saddr on failure after inet6?_hash_connect().
[ Upstream commit 77934dc6db0d2b111a8f2759e9ad2fb67f5cffa5 ] When connect() is called on a socket bound to the wildcard address, we change the socket's saddr to a local address. If the socket fails to connect() to the destination, we have to reset the saddr. However, when an error occurs after inet_hash6?_connect() in (dccp|tcp)_v[46]_conect(), we forget to reset saddr and leave the socket bound to the address. From the user's point of view, whether saddr is reset or not varies with errno. Let's fix this inconsistent behaviour. Note that after this patch, the repro [0] will trigger the WARN_ON() in inet_csk_get_port() again, but this patch is not buggy and rather fixes a bug papering over the bhash2's bug for which we need another fix. For the record, the repro causes -EADDRNOTAVAIL in inet_hash6_connect() by this sequence: s1 = socket() s1.setsockopt(SOL_SOCKET, SO_REUSEADDR, 1) s1.bind(('127.0.0.1', 10000)) s1.sendto(b'hello', MSG_FASTOPEN, (('127.0.0.1', 10000))) # or s1.connect(('127.0.0.1', 10000)) s2 = socket() s2.setsockopt(SOL_SOCKET, SO_REUSEADDR, 1) s2.bind(('0.0.0.0', 10000)) s2.connect(('127.0.0.1', 10000)) # -EADDRNOTAVAIL s2.listen(32) # WARN_ON(inet_csk(sk)->icsk_bind2_hash != tb2); [0]: https://syzkaller.appspot.com/bug?extid=015d756bbd1f8b5c8f09 Fixes: |
||
![]() |
4f23cb2be5 |
tcp: fix a signed-integer-overflow bug in tcp_add_backlog()
[ Upstream commit ec791d8149ff60c40ad2074af3b92a39c916a03f ]
The type of sk_rcvbuf and sk_sndbuf in struct sock is int, and
in tcp_add_backlog(), the variable limit is caculated by adding
sk_rcvbuf, sk_sndbuf and 64 * 1024, it may exceed the max value
of int and overflow. This patch reduces the limit budget by
halving the sndbuf to solve this issue since ACK packets are much
smaller than the payload.
Fixes:
|
||
![]() |
49713d7c38 |
tcp: minor optimization in tcp_add_backlog()
[ Upstream commit d519f350967a60b85a574ad8aeac43f2b4384746 ] If packet is going to be coalesced, sk_sndbuf/sk_rcvbuf values are not used. Defer their access to the point we need them. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Stable-dep-of: ec791d8149ff ("tcp: fix a signed-integer-overflow bug in tcp_add_backlog()") Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
![]() |
f039b43cba |
inet: fully convert sk->sk_rx_dst to RCU rules
commit 8f905c0e7354ef261360fb7535ea079b1082c105 upstream.
syzbot reported various issues around early demux,
one being included in this changelog [1]
sk->sk_rx_dst is using RCU protection without clearly
documenting it.
And following sequences in tcp_v4_do_rcv()/tcp_v6_do_rcv()
are not following standard RCU rules.
[a] dst_release(dst);
[b] sk->sk_rx_dst = NULL;
They look wrong because a delete operation of RCU protected
pointer is supposed to clear the pointer before
the call_rcu()/synchronize_rcu() guarding actual memory freeing.
In some cases indeed, dst could be freed before [b] is done.
We could cheat by clearing sk_rx_dst before calling
dst_release(), but this seems the right time to stick
to standard RCU annotations and debugging facilities.
[1]
BUG: KASAN: use-after-free in dst_check include/net/dst.h:470 [inline]
BUG: KASAN: use-after-free in tcp_v4_early_demux+0x95b/0x960 net/ipv4/tcp_ipv4.c:1792
Read of size 2 at addr ffff88807f1cb73a by task syz-executor.5/9204
CPU: 0 PID: 9204 Comm: syz-executor.5 Not tainted 5.16.0-rc5-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description.constprop.0.cold+0x8d/0x320 mm/kasan/report.c:247
__kasan_report mm/kasan/report.c:433 [inline]
kasan_report.cold+0x83/0xdf mm/kasan/report.c:450
dst_check include/net/dst.h:470 [inline]
tcp_v4_early_demux+0x95b/0x960 net/ipv4/tcp_ipv4.c:1792
ip_rcv_finish_core.constprop.0+0x15de/0x1e80 net/ipv4/ip_input.c:340
ip_list_rcv_finish.constprop.0+0x1b2/0x6e0 net/ipv4/ip_input.c:583
ip_sublist_rcv net/ipv4/ip_input.c:609 [inline]
ip_list_rcv+0x34e/0x490 net/ipv4/ip_input.c:644
__netif_receive_skb_list_ptype net/core/dev.c:5508 [inline]
__netif_receive_skb_list_core+0x549/0x8e0 net/core/dev.c:5556
__netif_receive_skb_list net/core/dev.c:5608 [inline]
netif_receive_skb_list_internal+0x75e/0xd80 net/core/dev.c:5699
gro_normal_list net/core/dev.c:5853 [inline]
gro_normal_list net/core/dev.c:5849 [inline]
napi_complete_done+0x1f1/0x880 net/core/dev.c:6590
virtqueue_napi_complete drivers/net/virtio_net.c:339 [inline]
virtnet_poll+0xca2/0x11b0 drivers/net/virtio_net.c:1557
__napi_poll+0xaf/0x440 net/core/dev.c:7023
napi_poll net/core/dev.c:7090 [inline]
net_rx_action+0x801/0xb40 net/core/dev.c:7177
__do_softirq+0x29b/0x9c2 kernel/softirq.c:558
invoke_softirq kernel/softirq.c:432 [inline]
__irq_exit_rcu+0x123/0x180 kernel/softirq.c:637
irq_exit_rcu+0x5/0x20 kernel/softirq.c:649
common_interrupt+0x52/0xc0 arch/x86/kernel/irq.c:240
asm_common_interrupt+0x1e/0x40 arch/x86/include/asm/idtentry.h:629
RIP: 0033:0x7f5e972bfd57
Code: 39 d1 73 14 0f 1f 80 00 00 00 00 48 8b 50 f8 48 83 e8 08 48 39 ca 77 f3 48 39 c3 73 3e 48 89 13 48 8b 50 f8 48 89 38 49 8b 0e <48> 8b 3e 48 83 c3 08 48 83 c6 08 eb bc 48 39 d1 72 9e 48 39 d0 73
RSP: 002b:00007fff8a413210 EFLAGS: 00000283
RAX: 00007f5e97108990 RBX: 00007f5e97108338 RCX: ffffffff81d3aa45
RDX: ffffffff81d3aa45 RSI: 00007f5e97108340 RDI: ffffffff81d3aa45
RBP: 00007f5e97107eb8 R08: 00007f5e97108d88 R09: 0000000093c2e8d9
R10: 0000000000000000 R11: 0000000000000000 R12: 00007f5e97107eb0
R13: 00007f5e97108338 R14: 00007f5e97107ea8 R15: 0000000000000019
</TASK>
Allocated by task 13:
kasan_save_stack+0x1e/0x50 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:46 [inline]
set_alloc_info mm/kasan/common.c:434 [inline]
__kasan_slab_alloc+0x90/0xc0 mm/kasan/common.c:467
kasan_slab_alloc include/linux/kasan.h:259 [inline]
slab_post_alloc_hook mm/slab.h:519 [inline]
slab_alloc_node mm/slub.c:3234 [inline]
slab_alloc mm/slub.c:3242 [inline]
kmem_cache_alloc+0x202/0x3a0 mm/slub.c:3247
dst_alloc+0x146/0x1f0 net/core/dst.c:92
rt_dst_alloc+0x73/0x430 net/ipv4/route.c:1613
ip_route_input_slow+0x1817/0x3a20 net/ipv4/route.c:2340
ip_route_input_rcu net/ipv4/route.c:2470 [inline]
ip_route_input_noref+0x116/0x2a0 net/ipv4/route.c:2415
ip_rcv_finish_core.constprop.0+0x288/0x1e80 net/ipv4/ip_input.c:354
ip_list_rcv_finish.constprop.0+0x1b2/0x6e0 net/ipv4/ip_input.c:583
ip_sublist_rcv net/ipv4/ip_input.c:609 [inline]
ip_list_rcv+0x34e/0x490 net/ipv4/ip_input.c:644
__netif_receive_skb_list_ptype net/core/dev.c:5508 [inline]
__netif_receive_skb_list_core+0x549/0x8e0 net/core/dev.c:5556
__netif_receive_skb_list net/core/dev.c:5608 [inline]
netif_receive_skb_list_internal+0x75e/0xd80 net/core/dev.c:5699
gro_normal_list net/core/dev.c:5853 [inline]
gro_normal_list net/core/dev.c:5849 [inline]
napi_complete_done+0x1f1/0x880 net/core/dev.c:6590
virtqueue_napi_complete drivers/net/virtio_net.c:339 [inline]
virtnet_poll+0xca2/0x11b0 drivers/net/virtio_net.c:1557
__napi_poll+0xaf/0x440 net/core/dev.c:7023
napi_poll net/core/dev.c:7090 [inline]
net_rx_action+0x801/0xb40 net/core/dev.c:7177
__do_softirq+0x29b/0x9c2 kernel/softirq.c:558
Freed by task 13:
kasan_save_stack+0x1e/0x50 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:46
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:366 [inline]
____kasan_slab_free mm/kasan/common.c:328 [inline]
__kasan_slab_free+0xff/0x130 mm/kasan/common.c:374
kasan_slab_free include/linux/kasan.h:235 [inline]
slab_free_hook mm/slub.c:1723 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1749
slab_free mm/slub.c:3513 [inline]
kmem_cache_free+0xbd/0x5d0 mm/slub.c:3530
dst_destroy+0x2d6/0x3f0 net/core/dst.c:127
rcu_do_batch kernel/rcu/tree.c:2506 [inline]
rcu_core+0x7ab/0x1470 kernel/rcu/tree.c:2741
__do_softirq+0x29b/0x9c2 kernel/softirq.c:558
Last potentially related work creation:
kasan_save_stack+0x1e/0x50 mm/kasan/common.c:38
__kasan_record_aux_stack+0xf5/0x120 mm/kasan/generic.c:348
__call_rcu kernel/rcu/tree.c:2985 [inline]
call_rcu+0xb1/0x740 kernel/rcu/tree.c:3065
dst_release net/core/dst.c:177 [inline]
dst_release+0x79/0xe0 net/core/dst.c:167
tcp_v4_do_rcv+0x612/0x8d0 net/ipv4/tcp_ipv4.c:1712
sk_backlog_rcv include/net/sock.h:1030 [inline]
__release_sock+0x134/0x3b0 net/core/sock.c:2768
release_sock+0x54/0x1b0 net/core/sock.c:3300
tcp_sendmsg+0x36/0x40 net/ipv4/tcp.c:1441
inet_sendmsg+0x99/0xe0 net/ipv4/af_inet.c:819
sock_sendmsg_nosec net/socket.c:704 [inline]
sock_sendmsg+0xcf/0x120 net/socket.c:724
sock_write_iter+0x289/0x3c0 net/socket.c:1057
call_write_iter include/linux/fs.h:2162 [inline]
new_sync_write+0x429/0x660 fs/read_write.c:503
vfs_write+0x7cd/0xae0 fs/read_write.c:590
ksys_write+0x1ee/0x250 fs/read_write.c:643
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
The buggy address belongs to the object at ffff88807f1cb700
which belongs to the cache ip_dst_cache of size 176
The buggy address is located 58 bytes inside of
176-byte region [ffff88807f1cb700, ffff88807f1cb7b0)
The buggy address belongs to the page:
page:ffffea0001fc72c0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x7f1cb
flags: 0xfff00000000200(slab|node=0|zone=1|lastcpupid=0x7ff)
raw: 00fff00000000200 dead000000000100 dead000000000122 ffff8881413bb780
raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 0, migratetype Unmovable, gfp_mask 0x112a20(GFP_ATOMIC|__GFP_NOWARN|__GFP_NORETRY|__GFP_HARDWALL), pid 5, ts 108466983062, free_ts 108048976062
prep_new_page mm/page_alloc.c:2418 [inline]
get_page_from_freelist+0xa72/0x2f50 mm/page_alloc.c:4149
__alloc_pages+0x1b2/0x500 mm/page_alloc.c:5369
alloc_pages+0x1a7/0x300 mm/mempolicy.c:2191
alloc_slab_page mm/slub.c:1793 [inline]
allocate_slab mm/slub.c:1930 [inline]
new_slab+0x32d/0x4a0 mm/slub.c:1993
___slab_alloc+0x918/0xfe0 mm/slub.c:3022
__slab_alloc.constprop.0+0x4d/0xa0 mm/slub.c:3109
slab_alloc_node mm/slub.c:3200 [inline]
slab_alloc mm/slub.c:3242 [inline]
kmem_cache_alloc+0x35c/0x3a0 mm/slub.c:3247
dst_alloc+0x146/0x1f0 net/core/dst.c:92
rt_dst_alloc+0x73/0x430 net/ipv4/route.c:1613
__mkroute_output net/ipv4/route.c:2564 [inline]
ip_route_output_key_hash_rcu+0x921/0x2d00 net/ipv4/route.c:2791
ip_route_output_key_hash+0x18b/0x300 net/ipv4/route.c:2619
__ip_route_output_key include/net/route.h:126 [inline]
ip_route_output_flow+0x23/0x150 net/ipv4/route.c:2850
ip_route_output_key include/net/route.h:142 [inline]
geneve_get_v4_rt+0x3a6/0x830 drivers/net/geneve.c:809
geneve_xmit_skb drivers/net/geneve.c:899 [inline]
geneve_xmit+0xc4a/0x3540 drivers/net/geneve.c:1082
__netdev_start_xmit include/linux/netdevice.h:4994 [inline]
netdev_start_xmit include/linux/netdevice.h:5008 [inline]
xmit_one net/core/dev.c:3590 [inline]
dev_hard_start_xmit+0x1eb/0x920 net/core/dev.c:3606
__dev_queue_xmit+0x299a/0x3650 net/core/dev.c:4229
page last free stack trace:
reset_page_owner include/linux/page_owner.h:24 [inline]
free_pages_prepare mm/page_alloc.c:1338 [inline]
free_pcp_prepare+0x374/0x870 mm/page_alloc.c:1389
free_unref_page_prepare mm/page_alloc.c:3309 [inline]
free_unref_page+0x19/0x690 mm/page_alloc.c:3388
qlink_free mm/kasan/quarantine.c:146 [inline]
qlist_free_all+0x5a/0xc0 mm/kasan/quarantine.c:165
kasan_quarantine_reduce+0x180/0x200 mm/kasan/quarantine.c:272
__kasan_slab_alloc+0xa2/0xc0 mm/kasan/common.c:444
kasan_slab_alloc include/linux/kasan.h:259 [inline]
slab_post_alloc_hook mm/slab.h:519 [inline]
slab_alloc_node mm/slub.c:3234 [inline]
kmem_cache_alloc_node+0x255/0x3f0 mm/slub.c:3270
__alloc_skb+0x215/0x340 net/core/skbuff.c:414
alloc_skb include/linux/skbuff.h:1126 [inline]
alloc_skb_with_frags+0x93/0x620 net/core/skbuff.c:6078
sock_alloc_send_pskb+0x783/0x910 net/core/sock.c:2575
mld_newpack+0x1df/0x770 net/ipv6/mcast.c:1754
add_grhead+0x265/0x330 net/ipv6/mcast.c:1857
add_grec+0x1053/0x14e0 net/ipv6/mcast.c:1995
mld_send_initial_cr.part.0+0xf6/0x230 net/ipv6/mcast.c:2242
mld_send_initial_cr net/ipv6/mcast.c:1232 [inline]
mld_dad_work+0x1d3/0x690 net/ipv6/mcast.c:2268
process_one_work+0x9b2/0x1690 kernel/workqueue.c:2298
worker_thread+0x658/0x11f0 kernel/workqueue.c:2445
Memory state around the buggy address:
ffff88807f1cb600: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88807f1cb680: fb fb fb fb fb fb fc fc fc fc fc fc fc fc fc fc
>ffff88807f1cb700: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88807f1cb780: fb fb fb fb fb fb fc fc fc fc fc fc fc fc fc fc
ffff88807f1cb800: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
Fixes:
|
||
![]() |
e4a7acd6b4 |
tcp: Fix data-races around sysctl_tcp_reflect_tos.
[ Upstream commit 870e3a634b6a6cb1543b359007aca73fe6a03ac5 ]
While reading sysctl_tcp_reflect_tos, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its readers.
Fixes:
|
||
![]() |
b6c189aa80 |
tcp: Fix a data-race around sysctl_tcp_tw_reuse.
[ Upstream commit cbfc6495586a3f09f6f07d9fb3c7cafe807e3c55 ]
While reading sysctl_tcp_tw_reuse, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.
Fixes:
|
||
![]() |
5f614f5f70 |
tcp: add a missing nf_reset_ct() in 3WHS handling
commit 6f0012e35160cd08a53e46e3b3bbf724b92dfe68 upstream.
When the third packet of 3WHS connection establishment
contains payload, it is added into socket receive queue
without the XFRM check and the drop of connection tracking
context.
This means that if the data is left unread in the socket
receive queue, conntrack module can not be unloaded.
As most applications usually reads the incoming data
immediately after accept(), bug has been hiding for
quite a long time.
Commit 68822bdf76f1 ("net: generalize skb freeing
deferral to per-cpu lists") exposed this bug because
even if the application reads this data, the skb
with nfct state could stay in a per-cpu cache for
an arbitrary time, if said cpu no longer process RX softirqs.
Many thanks to Ilya Maximets for reporting this issue,
and for testing various patches:
https://lore.kernel.org/netdev/20220619003919.394622-1-i.maximets@ovn.org/
Note that I also added a missing xfrm4_policy_check() call,
although this is probably not a big issue, as the SYN
packet should have been dropped earlier.
Fixes:
|
||
![]() |
38d984e5e8 |
tcp: md5: Fix overlap between vrf and non-vrf keys
[ Upstream commit 86f1e3a8489f6a0232c1f3bc2bdb379f5ccdecec ]
With net.ipv4.tcp_l3mdev_accept=1 it is possible for a listen socket to
accept connection from the same client address in different VRFs. It is
also possible to set different MD5 keys for these clients which differ
only in the tcpm_l3index field.
This appears to work when distinguishing between different VRFs but not
between non-VRF and VRF connections. In particular:
* tcp_md5_do_lookup_exact will match a non-vrf key against a vrf key.
This means that adding a key with l3index != 0 after a key with l3index
== 0 will cause the earlier key to be deleted. Both keys can be present
if the non-vrf key is added later.
* _tcp_md5_do_lookup can match a non-vrf key before a vrf key. This
casues failures if the passwords differ.
Fix this by making tcp_md5_do_lookup_exact perform an actual exact
comparison on l3index and by making __tcp_md5_do_lookup perfer
vrf-bound keys above other considerations like prefixlen.
Fixes:
|
||
![]() |
a7d0a59e21 |
tcp: seq_file: Avoid skipping sk during tcp_seek_last_pos
[ Upstream commit 525e2f9fd0229eb10cb460a9e6d978257f24804e ]
st->bucket stores the current bucket number.
st->offset stores the offset within this bucket that is the sk to be
seq_show(). Thus, st->offset only makes sense within the same
st->bucket.
These two variables are an optimization for the common no-lseek case.
When resuming the seq_file iteration (i.e. seq_start()),
tcp_seek_last_pos() tries to continue from the st->offset
at bucket st->bucket.
However, it is possible that the bucket pointed by st->bucket
has changed and st->offset may end up skipping the whole st->bucket
without finding a sk. In this case, tcp_seek_last_pos() currently
continues to satisfy the offset condition in the next (and incorrect)
bucket. Instead, regardless of the offset value, the first sk of the
next bucket should be returned. Thus, "bucket == st->bucket" check is
added to tcp_seek_last_pos().
The chance of hitting this is small and the issue is a decade old,
so targeting for the next tree.
Fixes:
|
||
![]() |
164294d09c |
tcp: disable TFO blackhole logic by default
[ Upstream commit 213ad73d06073b197a02476db3a4998e219ddb06 ]
Multiple complaints have been raised from the TFO users on the internet
stating that the TFO blackhole logic is too aggressive and gets falsely
triggered too often.
(e.g. https://blog.apnic.net/2021/07/05/tcp-fast-open-not-so-fast/)
Considering that most middleboxes no longer drop TFO packets, we decide
to disable the blackhole logic by setting
/proc/sys/net/ipv4/tcp_fastopen_blackhole_timeout_set to 0 by default.
Fixes:
|
||
![]() |
d60f07bcb7 |
tcp: annotate data races around tp->mtu_info
commit 561022acb1ce62e50f7a8258687a21b84282a4cb upstream.
While tp->mtu_info is read while socket is owned, the write
sides happen from err handlers (tcp_v[46]_mtu_reduced)
which only own the socket spinlock.
Fixes:
|
||
![]() |
e9c4068fb0 |
tcp: Fix potential use-after-free due to double kfree()
commit c89dffc70b340780e5b933832d8c3e045ef3791e upstream. Receiving ACK with a valid SYN cookie, cookie_v4_check() allocates struct request_sock and then can allocate inet_rsk(req)->ireq_opt. After that, tcp_v4_syn_recv_sock() allocates struct sock and copies ireq_opt to inet_sk(sk)->inet_opt. Normally, tcp_v4_syn_recv_sock() inserts the full socket into ehash and sets NULL to ireq_opt. Otherwise, tcp_v4_syn_recv_sock() has to reset inet_opt by NULL and free the full socket. The commit |
||
![]() |
981e180774 |
tcp: do not mess with cloned skbs in tcp_add_backlog()
commit b160c28548bc0a87cbd16d5af6d3edcfd70b8c9a upstream.
Heiner Kallweit reported that some skbs were sent with
the following invalid GSO properties :
- gso_size > 0
- gso_type == 0
This was triggerring a WARN_ON_ONCE() in rtl8169_tso_csum_v2.
Juerg Haefliger was able to reproduce a similar issue using
a lan78xx NIC and a workload mixing TCP incoming traffic
and forwarded packets.
The problem is that tcp_add_backlog() is writing
over gso_segs and gso_size even if the incoming packet will not
be coalesced to the backlog tail packet.
While skb_try_coalesce() would bail out if tail packet is cloned,
this overwriting would lead to corruptions of other packets
cooked by lan78xx, sharing a common super-packet.
The strategy used by lan78xx is to use a big skb, and split
it into all received packets using skb_clone() to avoid copies.
The drawback of this strategy is that all the small skb share a common
struct skb_shared_info.
This patch rewrites TCP gso_size/gso_segs handling to only
happen on the tail skb, since skb_try_coalesce() made sure
it was not cloned.
Fixes:
|
||
![]() |
8ef44b6fe4 |
tcp: Retain ECT bits for tos reflection
For DCTCP, we have to retain the ECT bits set by the congestion control
algorithm on the socket when reflecting syn TOS in syn-ack, in order to
make ECN work properly.
Fixes:
|
||
![]() |
407c85c7dd |
tcp: Set ECT0 bit in tos/tclass for synack when BPF needs ECN
When a BPF program is used to select between a type of TCP congestion
control algorithm that uses either ECN or not there is a case where the
synack for the frame was coming up without the ECT0 bit set. A bit of
research found that this was due to the final socket being configured to
dctcp while the listener socket was staying in cubic.
To reproduce it all that is needed is to monitor TCP traffic while running
the sample bpf program "samples/bpf/tcp_cong_kern.c". What is observed,
assuming tcp_dctcp module is loaded or compiled in and the traffic matches
the rules in the sample file, is that for all frames with the exception of
the synack the ECT0 bit is set.
To address that it is necessary to make one additional call to
tcp_bpf_ca_needs_ecn using the request socket and then use the output of
that to set the ECT0 bit for the tos/tclass of the packet.
Fixes:
|
||
![]() |
01770a1661 |
tcp: fix race condition when creating child sockets from syncookies
When the TCP stack is in SYN flood mode, the server child socket is created from the SYN cookie received in a TCP packet with the ACK flag set. The child socket is created when the server receives the first TCP packet with a valid SYN cookie from the client. Usually, this packet corresponds to the final step of the TCP 3-way handshake, the ACK packet. But is also possible to receive a valid SYN cookie from the first TCP data packet sent by the client, and thus create a child socket from that SYN cookie. Since a client socket is ready to send data as soon as it receives the SYN+ACK packet from the server, the client can send the ACK packet (sent by the TCP stack code), and the first data packet (sent by the userspace program) almost at the same time, and thus the server will equally receive the two TCP packets with valid SYN cookies almost at the same instant. When such event happens, the TCP stack code has a race condition that occurs between the momement a lookup is done to the established connections hashtable to check for the existence of a connection for the same client, and the moment that the child socket is added to the established connections hashtable. As a consequence, this race condition can lead to a situation where we add two child sockets to the established connections hashtable and deliver two sockets to the userspace program to the same client. This patch fixes the race condition by checking if an existing child socket exists for the same client when we are adding the second child socket to the established connections socket. If an existing child socket exists, we drop the packet and discard the second child socket to the same client. Signed-off-by: Ricardo Dias <rdias@singlestore.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20201120111133.GA67501@rdias-suse-pc.lan Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
![]() |
861602b577 |
tcp: Allow full IP tos/IPv6 tclass to be reflected in L3 header
An issue was recently found where DCTCP SYN/ACK packets did not have the
ECT bit set in the L3 header. A bit of code review found that the recent
change referenced below had gone though and added a mask that prevented the
ECN bits from being populated in the L3 header.
This patch addresses that by rolling back the mask so that it is only
applied to the flags coming from the incoming TCP request instead of
applying it to the socket tos/tclass field. Doing this the ECT bits were
restored in the SYN/ACK packets in my testing.
One thing that is not addressed by this patch set is the fact that
tcp_reflect_tos appears to be incompatible with ECN based congestion
avoidance algorithms. At a minimum the feature should likely be documented
which it currently isn't.
Fixes:
|
||
![]() |
9d49aea13f |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Small conflict around locking in rxrpc_process_event() - channel_lock moved to bundle in next, while state lock needs _bh() from net. Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
![]() |
86bccd0367 |
tcp: fix receive window update in tcp_add_backlog()
We got reports from GKE customers flows being reset by netfilter
conntrack unless nf_conntrack_tcp_be_liberal is set to 1.
Traces seemed to suggest ACK packet being dropped by the
packet capture, or more likely that ACK were received in the
wrong order.
wscale=7, SYN and SYNACK not shown here.
This ACK allows the sender to send 1871*128 bytes from seq 51359321 :
New right edge of the window -> 51359321+1871*128=51598809
09:17:23.389210 IP A > B: Flags [.], ack 51359321, win 1871, options [nop,nop,TS val 10 ecr 999], length 0
09:17:23.389212 IP B > A: Flags [.], seq 51422681:51424089, ack 1577, win 268, options [nop,nop,TS val 999 ecr 10], length 1408
09:17:23.389214 IP A > B: Flags [.], ack 51422681, win 1376, options [nop,nop,TS val 10 ecr 999], length 0
09:17:23.389253 IP B > A: Flags [.], seq 51424089:51488857, ack 1577, win 268, options [nop,nop,TS val 999 ecr 10], length 64768
09:17:23.389272 IP A > B: Flags [.], ack 51488857, win 859, options [nop,nop,TS val 10 ecr 999], length 0
09:17:23.389275 IP B > A: Flags [.], seq 51488857:51521241, ack 1577, win 268, options [nop,nop,TS val 999 ecr 10], length 32384
Receiver now allows to send 606*128=77568 from seq 51521241 :
New right edge of the window -> 51521241+606*128=51598809
09:17:23.389296 IP A > B: Flags [.], ack 51521241, win 606, options [nop,nop,TS val 10 ecr 999], length 0
09:17:23.389308 IP B > A: Flags [.], seq 51521241:51553625, ack 1577, win 268, options [nop,nop,TS val 999 ecr 10], length 32384
It seems the sender exceeds RWIN allowance, since 51611353 > 51598809
09:17:23.389346 IP B > A: Flags [.], seq 51553625:51611353, ack 1577, win 268, options [nop,nop,TS val 999 ecr 10], length 57728
09:17:23.389356 IP B > A: Flags [.], seq 51611353:51618393, ack 1577, win 268, options [nop,nop,TS val 999 ecr 10], length 7040
09:17:23.389367 IP A > B: Flags [.], ack 51611353, win 0, options [nop,nop,TS val 10 ecr 999], length 0
netfilter conntrack is not happy and sends RST
09:17:23.389389 IP A > B: Flags [R], seq 92176528, win 0, length 0
09:17:23.389488 IP B > A: Flags [R], seq 174478967, win 0, length 0
Now imagine ACK were delivered out of order and tcp_add_backlog() sets window based on wrong packet.
New right edge of the window -> 51521241+859*128=51631193
Normally TCP stack handles OOO packets just fine, but it
turns out tcp_add_backlog() does not. It can update the window
field of the aggregated packet even if the ACK sequence
of the last received packet is too old.
Many thanks to Alexandre Ferrieux for independently reporting the issue
and suggesting a fix.
Fixes:
|
||
![]() |
ac8f1710c1 |
tcp: reflect tos value received in SYN to the socket
This commit adds a new TCP feature to reflect the tos value received in SYN, and send it out on the SYN-ACK, and eventually set the tos value of the established socket with this reflected tos value. This provides a way to set the traffic class/QoS level for all traffic in the same connection to be the same as the incoming SYN request. It could be useful in data centers to provide equivalent QoS according to the incoming request. This feature is guarded by /proc/sys/net/ipv4/tcp_reflect_tos, and is by default turned off. Signed-off-by: Wei Wang <weiwan@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
![]() |
de033b7d15 |
ip: pass tos into ip_build_and_send_pkt()
This commit adds tos as a new passed in parameter to ip_build_and_send_pkt() which will be used in the later commit. This is a pure restructure and does not have any functional change. Signed-off-by: Wei Wang <weiwan@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
![]() |
150f29f5e6 |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says: ==================== pull-request: bpf-next 2020-09-01 The following pull-request contains BPF updates for your *net-next* tree. There are two small conflicts when pulling, resolve as follows: 1) Merge conflict in tools/lib/bpf/libbpf.c between |
||
![]() |
2bdcc73c88 |
net: ipv4: delete repeated words
Drop duplicate words in comments in net/ipv4/. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
![]() |
331fca4315 |
bpf: tcp: Add bpf_skops_hdr_opt_len() and bpf_skops_write_hdr_opt()
The bpf prog needs to parse the SYN header to learn what options have been sent by the peer's bpf-prog before writing its options into SYNACK. This patch adds a "syn_skb" arg to tcp_make_synack() and send_synack(). This syn_skb will eventually be made available (as read-only) to the bpf prog. This will be the only SYN packet available to the bpf prog during syncookie. For other regular cases, the bpf prog can also use the saved_syn. When writing options, the bpf prog will first be called to tell the kernel its required number of bytes. It is done by the new bpf_skops_hdr_opt_len(). The bpf prog will only be called when the new BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG is set in tp->bpf_sock_ops_cb_flags. When the bpf prog returns, the kernel will know how many bytes are needed and then update the "*remaining" arg accordingly. 4 byte alignment will be included in the "*remaining" before this function returns. The 4 byte aligned number of bytes will also be stored into the opts->bpf_opt_len. "bpf_opt_len" is a newly added member to the struct tcp_out_options. Then the new bpf_skops_write_hdr_opt() will call the bpf prog to write the header options. The bpf prog is only called if it has reserved spaces before (opts->bpf_opt_len > 0). The bpf prog is the last one getting a chance to reserve header space and writing the header option. These two functions are half implemented to highlight the changes in TCP stack. The actual codes preparing the bpf running context and invoking the bpf prog will be added in the later patch with other necessary bpf pieces. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/bpf/20200820190052.2885316-1-kafai@fb.com |
||
![]() |
f9c7927295 |
bpf: Refactor to provide aux info to bpf_iter_init_seq_priv_t
This patch refactored target bpf_iter_init_seq_priv_t callback function to accept additional information. This will be needed in later patches for map element targets since a particular map should be passed to traverse elements for that particular map. In the future, other information may be passed to target as well, e.g., pid, cgroup id, etc. to customize the iterator. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200723184110.590156-1-yhs@fb.com |
||
![]() |
14fc6bd6b7 |
bpf: Refactor bpf_iter_reg to have separate seq_info member
There is no functionality change for this patch. Struct bpf_iter_reg is used to register a bpf_iter target, which includes information for both prog_load, link_create and seq_file creation. This patch puts fields related seq_file creation into a different structure. This will be useful for map elements iterator where one iterator covers different map types and different map types may have different seq_ops, init/fini private_data function and private_data size. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200723184109.590030-1-yhs@fb.com |
||
![]() |
d4c19c4914 |
net/tcp: switch ->md5_parse to sockptr_t
Pass a sockptr_t to prepare for set_fs-less handling of the kernel pointer from bpf-cgroup. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
![]() |
dee72f8a0c |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Alexei Starovoitov says: ==================== pull-request: bpf-next 2020-07-21 The following pull-request contains BPF updates for your *net-next* tree. We've added 46 non-merge commits during the last 6 day(s) which contain a total of 68 files changed, 4929 insertions(+), 526 deletions(-). The main changes are: 1) Run BPF program on socket lookup, from Jakub. 2) Introduce cpumap, from Lorenzo. 3) s390 JIT fixes, from Ilya. 4) teach riscv JIT to emit compressed insns, from Luke. 5) use build time computed BTF ids in bpf iter, from Yonghong. ==================== Purely independent overlapping changes in both filter.h and xdp.h Signed-off-by: David S. Miller <davem@davemloft.net> |
||
![]() |
951cf368bc |
bpf: net: Use precomputed btf_id for bpf iterators
One additional field btf_id is added to struct bpf_ctx_arg_aux to store the precomputed btf_ids. The btf_id is computed at build time with BTF_ID_LIST or BTF_ID_LIST_GLOBAL macro definitions. All existing bpf iterators are changed to used pre-compute btf_ids. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200720163403.1393551-1-yhs@fb.com |
||
![]() |
3021ad5299 |
net/ipv6: remove compat_ipv6_{get,set}sockopt
Handle the few cases that need special treatment in-line using in_compat_syscall(). This also removes all the now unused compat_{get,set}sockopt methods. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
![]() |
b6238c04c0 |
net/ipv4: remove compat_ip_{get,set}sockopt
Handle the few cases that need special treatment in-line using in_compat_syscall(). Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
![]() |
71930d6102 |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
All conflicts seemed rather trivial, with some guidance from Saeed Mameed on the tc_ct.c one. Signed-off-by: David S. Miller <davem@davemloft.net> |
||
![]() |
e6ced831ef |
tcp: md5: refine tcp_md5_do_add()/tcp_md5_hash_key() barriers
My prior fix went a bit too far, according to Herbert and Mathieu.
Since we accept that concurrent TCP MD5 lookups might see inconsistent
keys, we can use READ_ONCE()/WRITE_ONCE() instead of smp_rmb()/smp_wmb()
Clearing all key->key[] is needed to avoid possible KMSAN reports,
if key->keylen is increased. Since tcp_md5_do_add() is not fast path,
using __GFP_ZERO to clear all struct tcp_md5sig_key is simpler.
data_race() was added in linux-5.8 and will prevent KCSAN reports,
this can safely be removed in stable backports, if data_race() is
not yet backported.
v2: use data_race() both in tcp_md5_hash_key() and tcp_md5_do_add()
Fixes:
|
||
![]() |
6a2febec33 |
tcp: md5: add missing memory barriers in tcp_md5_do_add()/tcp_md5_hash_key()
MD5 keys are read with RCU protection, and tcp_md5_do_add()
might update in-place a prior key.
Normally, typical RCU updates would allocate a new piece
of memory. In this case only key->key and key->keylen might
be updated, and we do not care if an incoming packet could
see the old key, the new one, or some intermediate value,
since changing the key on a live flow is known to be problematic
anyway.
We only want to make sure that in the case key->keylen
is changed, cpus in tcp_md5_hash_key() wont try to use
uninitialized data, or crash because key->keylen was
read twice to feed sg_init_one() and ahash_request_set_crypt()
Fixes:
|
||
![]() |
52d87d5f64 |
net: bpf: Implement bpf iterator for tcp
The bpf iterator for tcp is implemented. Both tcp4 and tcp6 sockets will be traversed. It is up to bpf program to filter for tcp4 or tcp6 only, or both families of sockets. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20200623230805.3987959-1-yhs@fb.com |
||
![]() |
b08d4d3b6c |
net: bpf: Add bpf_seq_afinfo in tcp_iter_state
A new field bpf_seq_afinfo is added to tcp_iter_state to provide bpf tcp iterator afinfo. There are two reasons on why we did this. First, the current way to get afinfo from PDE_DATA does not work for bpf iterator as its seq_file inode does not conform to /proc/net/{tcp,tcp6} inode structures. More specifically, anonymous bpf iterator will use an anonymous inode which is shared in the system and we cannot change inode private data structure at all. Second, bpf iterator for tcp/tcp6 wants to traverse all tcp and tcp6 sockets in one pass and bpf program can control whether they want to skip one sk_family or not. Having a different afinfo with family AF_UNSPEC make it easier to understand in the code. This patch does not change /proc/net/{tcp,tcp6} behavior as the bpf_seq_afinfo will be NULL for these two proc files. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20200623230804.3987829-1-yhs@fb.com |
||
![]() |
d29245692a |
tcp: ipv6: support RFC 6069 (TCP-LD)
Make tcp_ld_RTO_revert() helper available to IPv6, and implement RFC 6069 : Quoting this RFC : 3. Connectivity Disruption Indication For Internet Protocol version 6 (IPv6) [RFC2460], the counterpart of the ICMP destination unreachable message of code 0 (net unreachable) and of code 1 (host unreachable) is the ICMPv6 destination unreachable message of code 0 (no route to destination) [RFC4443]. As with IPv4, a router should generate an ICMPv6 destination unreachable message of code 0 in response to a packet that cannot be delivered to its destination address because it lacks a matching entry in its routing table. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
![]() |
a12daf13a4 |
tcp: rename tcp_v4_err() skb parameter
This essentially reverts
|
||
![]() |
f745664257 |
tcp: add tcp_ld_RTO_revert() helper
RFC 6069 logic has been implemented for IPv4 only so far, right in the middle of tcp_v4_err() and was error prone. Move this code to one helper, to make tcp_v4_err() more readable and to eventually expand RFC 6069 to IPv6 in the future. Also perform sock_owned_by_user() check a bit sooner. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Tested-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
![]() |
239174945d |
tcp: tcp_v4_err() icmp skb is named icmp_skb
I missed the fact that tcp_v4_err() differs from tcp_v6_err(). After commit |