Commit Graph

276 Commits

Author SHA1 Message Date
Greg Kroah-Hartman
64414277da Revert "net: Introduce net.ipv4.tcp_migrate_req."
This reverts commit cf6c06ac74 which is
commit f9ac779f881c2ec3d1cdcd7fa9d4f9442bf60e80 upstream.

It breaks the Android abi.  If it is required in the future, it can come
back in an abi-safe way.

Bug: 161946584
Change-Id: Ia4d18c9acdd553b1806ca844c47e34a76f6c6b93
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-08-04 07:24:44 +00:00
Greg Kroah-Hartman
477f5e6b9e Merge 5.10.188 into android12-5.10-lts
Changes in 5.10.188
	media: atomisp: fix "variable dereferenced before check 'asd'"
	x86/smp: Use dedicated cache-line for mwait_play_dead()
	can: isotp: isotp_sendmsg(): fix return error fix on TX path
	video: imsttfb: check for ioremap() failures
	fbdev: imsttfb: Fix use after free bug in imsttfb_probe
	HID: wacom: Use ktime_t rather than int when dealing with timestamps
	HID: logitech-hidpp: add HIDPP_QUIRK_DELAYED_INIT for the T651.
	Revert "thermal/drivers/mediatek: Use devm_of_iomap to avoid resource leak in mtk_thermal_probe"
	scripts/tags.sh: Resolve gtags empty index generation
	drm/amdgpu: Validate VM ioctl flags.
	nubus: Partially revert proc_create_single_data() conversion
	fs: pipe: reveal missing function protoypes
	x86/resctrl: Only show tasks' pid in current pid namespace
	blk-iocost: use spin_lock_irqsave in adjust_inuse_and_calc_cost
	md/raid10: check slab-out-of-bounds in md_bitmap_get_counter
	md/raid10: fix overflow of md/safe_mode_delay
	md/raid10: fix wrong setting of max_corr_read_errors
	md/raid10: fix null-ptr-deref of mreplace in raid10_sync_request
	md/raid10: fix io loss while replacement replace rdev
	irqchip/jcore-aic: Kill use of irq_create_strict_mappings()
	irqchip/jcore-aic: Fix missing allocation of IRQ descriptors
	posix-timers: Prevent RT livelock in itimer_delete()
	tracing/timer: Add missing hrtimer modes to decode_hrtimer_mode().
	clocksource/drivers/cadence-ttc: Fix memory leak in ttc_timer_probe
	PM: domains: fix integer overflow issues in genpd_parse_state()
	perf/arm-cmn: Fix DTC reset
	powercap: RAPL: Fix CONFIG_IOSF_MBI dependency
	ARM: 9303/1: kprobes: avoid missing-declaration warnings
	cpufreq: intel_pstate: Fix energy_performance_preference for passive
	thermal/drivers/sun8i: Fix some error handling paths in sun8i_ths_probe()
	rcuscale: Console output claims too few grace periods
	rcuscale: Always log error message
	rcuscale: Move shutdown from wait_event() to wait_event_idle()
	rcu/rcuscale: Move rcu_scale_*() after kfree_scale_cleanup()
	rcu/rcuscale: Stop kfree_scale_thread thread(s) after unloading rcuscale
	perf/ibs: Fix interface via core pmu events
	x86/mm: Fix __swp_entry_to_pte() for Xen PV guests
	evm: Complete description of evm_inode_setattr()
	ima: Fix build warnings
	pstore/ram: Add check for kstrdup
	igc: Enable and fix RX hash usage by netstack
	wifi: ath9k: fix AR9003 mac hardware hang check register offset calculation
	wifi: ath9k: avoid referencing uninit memory in ath9k_wmi_ctrl_rx
	samples/bpf: Fix buffer overflow in tcp_basertt
	spi: spi-geni-qcom: Correct CS_TOGGLE bit in SPI_TRANS_CFG
	wifi: wilc1000: fix for absent RSN capabilities WFA testcase
	wifi: mwifiex: Fix the size of a memory allocation in mwifiex_ret_802_11_scan()
	bpf: Remove extra lock_sock for TCP_ZEROCOPY_RECEIVE
	sctp: add bpf_bypass_getsockopt proto callback
	libbpf: fix offsetof() and container_of() to work with CO-RE
	nfc: constify several pointers to u8, char and sk_buff
	nfc: llcp: fix possible use of uninitialized variable in nfc_llcp_send_connect()
	bpftool: JIT limited misreported as negative value on aarch64
	regulator: core: Fix more error checking for debugfs_create_dir()
	regulator: core: Streamline debugfs operations
	wifi: orinoco: Fix an error handling path in spectrum_cs_probe()
	wifi: orinoco: Fix an error handling path in orinoco_cs_probe()
	wifi: atmel: Fix an error handling path in atmel_probe()
	wl3501_cs: Fix misspelling and provide missing documentation
	net: create netdev->dev_addr assignment helpers
	wl3501_cs: use eth_hw_addr_set()
	wifi: wl3501_cs: Fix an error handling path in wl3501_probe()
	wifi: ray_cs: Utilize strnlen() in parse_addr()
	wifi: ray_cs: Drop useless status variable in parse_addr()
	wifi: ray_cs: Fix an error handling path in ray_probe()
	wifi: ath9k: don't allow to overwrite ENDPOINT0 attributes
	wifi: rsi: Do not configure WoWlan in shutdown hook if not enabled
	wifi: rsi: Do not set MMC_PM_KEEP_POWER in shutdown
	watchdog/perf: define dummy watchdog_update_hrtimer_threshold() on correct config
	watchdog/perf: more properly prevent false positives with turbo modes
	kexec: fix a memory leak in crash_shrink_memory()
	memstick r592: make memstick_debug_get_tpc_name() static
	wifi: ath9k: Fix possible stall on ath9k_txq_list_has_key()
	rtnetlink: extend RTEXT_FILTER_SKIP_STATS to IFLA_VF_INFO
	wifi: iwlwifi: pull from TXQs with softirqs disabled
	wifi: cfg80211: rewrite merging of inherited elements
	wifi: ath9k: convert msecs to jiffies where needed
	igc: Fix race condition in PTP tx code
	net: stmmac: fix double serdes powerdown
	netlink: fix potential deadlock in netlink_set_err()
	netlink: do not hard code device address lenth in fdb dumps
	selftests: rtnetlink: remove netdevsim device after ipsec offload test
	gtp: Fix use-after-free in __gtp_encap_destroy().
	net: axienet: Move reset before 64-bit DMA detection
	sfc: fix crash when reading stats while NIC is resetting
	nfc: llcp: simplify llcp_sock_connect() error paths
	net: nfc: Fix use-after-free caused by nfc_llcp_find_local
	lib/ts_bm: reset initial match offset for every block of text
	netfilter: conntrack: dccp: copy entire header to stack buffer, not just basic one
	netfilter: nf_conntrack_sip: fix the ct_sip_parse_numerical_param() return value.
	ipvlan: Fix return value of ipvlan_queue_xmit()
	netlink: Add __sock_i_ino() for __netlink_diag_dump().
	radeon: avoid double free in ci_dpm_init()
	drm/amd/display: Explicitly specify update type per plane info change
	Input: drv260x - sleep between polling GO bit
	drm/bridge: tc358768: always enable HS video mode
	drm/bridge: tc358768: fix PLL parameters computation
	drm/bridge: tc358768: fix PLL target frequency
	drm/bridge: tc358768: fix TCLK_ZEROCNT computation
	drm/bridge: tc358768: Add atomic_get_input_bus_fmts() implementation
	drm/bridge: tc358768: fix TCLK_TRAILCNT computation
	drm/bridge: tc358768: fix THS_ZEROCNT computation
	drm/bridge: tc358768: fix TXTAGOCNT computation
	drm/bridge: tc358768: fix THS_TRAILCNT computation
	drm/vram-helper: fix function names in vram helper doc
	ARM: dts: BCM5301X: Drop "clock-names" from the SPI node
	ARM: dts: meson8b: correct uart_B and uart_C clock references
	Input: adxl34x - do not hardcode interrupt trigger type
	drm: sun4i_tcon: use devm_clk_get_enabled in `sun4i_tcon_init_clocks`
	drm/panel: sharp-ls043t1le01: adjust mode settings
	ARM: dts: stm32: Move ethernet MAC EEPROM from SoM to carrier boards
	bus: ti-sysc: Fix dispc quirk masking bool variables
	arm64: dts: microchip: sparx5: do not use PSCI on reference boards
	RDMA/bnxt_re: Disable/kill tasklet only if it is enabled
	RDMA/bnxt_re: Fix to remove unnecessary return labels
	RDMA/bnxt_re: Use unique names while registering interrupts
	RDMA/bnxt_re: Remove a redundant check inside bnxt_re_update_gid
	RDMA/bnxt_re: Fix to remove an unnecessary log
	ARM: dts: gta04: Move model property out of pinctrl node
	arm64: dts: qcom: msm8916: correct camss unit address
	arm64: dts: qcom: msm8994: correct SPMI unit address
	arm64: dts: qcom: msm8996: correct camss unit address
	drm/panel: simple: fix active size for Ampire AM-480272H3TMQW-T01H
	ARM: ep93xx: fix missing-prototype warnings
	ARM: omap2: fix missing tick_broadcast() prototype
	arm64: dts: qcom: apq8096: fix fixed regulator name property
	ARM: dts: stm32: Shorten the AV96 HDMI sound card name
	memory: brcmstb_dpfe: fix testing array offset after use
	ASoC: es8316: Increment max value for ALC Capture Target Volume control
	ASoC: es8316: Do not set rate constraints for unsupported MCLKs
	ARM: dts: meson8: correct uart_B and uart_C clock references
	soc/fsl/qe: fix usb.c build errors
	IB/hfi1: Use bitmap_zalloc() when applicable
	IB/hfi1: Fix sdma.h tx->num_descs off-by-one errors
	IB/hfi1: Fix wrong mmu_node used for user SDMA packet after invalidate
	RDMA: Remove uverbs_ex_cmd_mask values that are linked to functions
	RDMA/hns: Fix coding style issues
	RDMA/hns: Use refcount_t APIs for HEM
	RDMA/hns: Clean the hardware related code for HEM
	RDMA/hns: Fix hns_roce_table_get return value
	ARM: dts: iwg20d-q7-common: Fix backlight pwm specifier
	arm64: dts: renesas: ulcb-kf: Remove flow control for SCIF1
	fbdev: omapfb: lcd_mipid: Fix an error handling path in mipid_spi_probe()
	arm64: dts: ti: k3-j7200: Fix physical address of pin
	ARM: dts: stm32: Fix audio routing on STM32MP15xx DHCOM PDK2
	ARM: dts: stm32: fix i2s endpoint format property for stm32mp15xx-dkx
	hwmon: (gsc-hwmon) fix fan pwm temperature scaling
	hwmon: (adm1275) enable adm1272 temperature reporting
	hwmon: (adm1275) Allow setting sample averaging
	hwmon: (pmbus/adm1275) Fix problems with temperature monitoring on ADM1272
	ARM: dts: BCM5301X: fix duplex-full => full-duplex
	drm/amdkfd: Fix potential deallocation of previously deallocated memory.
	drm/radeon: fix possible division-by-zero errors
	amdgpu: validate offset_in_bo of drm_amdgpu_gem_va
	RDMA/bnxt_re: wraparound mbox producer index
	RDMA/bnxt_re: Avoid calling wake_up threads from spin_lock context
	clk: imx: clk-imx8mn: fix memory leak in imx8mn_clocks_probe
	clk: imx: clk-imx8mp: improve error handling in imx8mp_clocks_probe()
	clk: tegra: tegra124-emc: Fix potential memory leak
	ALSA: ac97: Fix possible NULL dereference in snd_ac97_mixer
	drm/msm/dpu: do not enable color-management if DSPPs are not available
	drm/msm/dp: Free resources after unregistering them
	clk: vc5: check memory returned by kasprintf()
	clk: cdce925: check return value of kasprintf()
	clk: si5341: Allow different output VDD_SEL values
	clk: si5341: Add sysfs properties to allow checking/resetting device faults
	clk: si5341: return error if one synth clock registration fails
	clk: si5341: check return value of {devm_}kasprintf()
	clk: si5341: free unused memory on probe failure
	clk: keystone: sci-clk: check return value of kasprintf()
	clk: ti: clkctrl: check return value of kasprintf()
	drivers: meson: secure-pwrc: always enable DMA domain
	ovl: update of dentry revalidate flags after copy up
	ASoC: imx-audmix: check return value of devm_kasprintf()
	PCI: cadence: Fix Gen2 Link Retraining process
	scsi: qedf: Fix NULL dereference in error handling
	pinctrl: bcm2835: Handle gpiochip_add_pin_range() errors
	PCI/ASPM: Disable ASPM on MFD function removal to avoid use-after-free
	scsi: 3w-xxxx: Add error handling for initialization failure in tw_probe()
	PCI: pciehp: Cancel bringup sequence if card is not present
	PCI: ftpci100: Release the clock resources
	PCI: Add pci_clear_master() stub for non-CONFIG_PCI
	perf bench: Use unbuffered output when pipe/tee'ing to a file
	perf bench: Add missing setlocale() call to allow usage of %'d style formatting
	pinctrl: cherryview: Return correct value if pin in push-pull mode
	kcsan: Don't expect 64 bits atomic builtins from 32 bits architectures
	perf script: Fixup 'struct evsel_script' method prefix
	perf script: Fix allocation of evsel->priv related to per-event dump files
	perf dwarf-aux: Fix off-by-one in die_get_varname()
	pinctrl: at91-pio4: check return value of devm_kasprintf()
	powerpc/powernv/sriov: perform null check on iov before dereferencing iov
	mm: rename pud_page_vaddr to pud_pgtable and make it return pmd_t *
	mm: rename p4d_page_vaddr to p4d_pgtable and make it return pud_t *
	powerpc/book3s64/mm: Fix DirectMap stats in /proc/meminfo
	powerpc/mm/dax: Fix the condition when checking if altmap vmemap can cross-boundary
	hwrng: virtio - add an internal buffer
	hwrng: virtio - don't wait on cleanup
	hwrng: virtio - don't waste entropy
	hwrng: virtio - always add a pending request
	hwrng: virtio - Fix race on data_avail and actual data
	crypto: nx - fix build warnings when DEBUG_FS is not enabled
	modpost: fix section mismatch message for R_ARM_ABS32
	modpost: fix section mismatch message for R_ARM_{PC24,CALL,JUMP24}
	crypto: marvell/cesa - Fix type mismatch warning
	modpost: fix off by one in is_executable_section()
	ARC: define ASM_NL and __ALIGN(_STR) outside #ifdef __ASSEMBLY__ guard
	NFSv4.1: freeze the session table upon receiving NFS4ERR_BADSESSION
	dax: Fix dax_mapping_release() use after free
	dax: Introduce alloc_dev_dax_id()
	hwrng: st - keep clock enabled while hwrng is registered
	io_uring: ensure IOPOLL locks around deferred work
	USB: serial: option: add LARA-R6 01B PIDs
	usb: dwc3: gadget: Propagate core init errors to UDC during pullup
	phy: tegra: xusb: Clear the driver reference in usb-phy dev
	block: fix signed int overflow in Amiga partition support
	block: change all __u32 annotations to __be32 in affs_hardblocks.h
	SUNRPC: Fix UAF in svc_tcp_listen_data_ready()
	w1: w1_therm: fix locking behavior in convert_t
	w1: fix loop in w1_fini()
	sh: j2: Use ioremap() to translate device tree address into kernel memory
	serial: 8250: omap: Fix freeing of resources on failed register
	clk: qcom: gcc-ipq6018: Use floor ops for sdcc clocks
	media: usb: Check az6007_read() return value
	media: videodev2.h: Fix struct v4l2_input tuner index comment
	media: usb: siano: Fix warning due to null work_func_t function pointer
	clk: qcom: reset: Allow specifying custom reset delay
	clk: qcom: reset: support resetting multiple bits
	clk: qcom: ipq6018: fix networking resets
	usb: dwc3: qcom: Fix potential memory leak
	usb: gadget: u_serial: Add null pointer check in gserial_suspend
	extcon: Fix kernel doc of property fields to avoid warnings
	extcon: Fix kernel doc of property capability fields to avoid warnings
	usb: phy: phy-tahvo: fix memory leak in tahvo_usb_probe()
	usb: hide unused usbfs_notify_suspend/resume functions
	serial: 8250: lock port for stop_rx() in omap8250_irq()
	serial: 8250: lock port for UART_IER access in omap8250_irq()
	kernfs: fix missing kernfs_idr_lock to remove an ID from the IDR
	coresight: Fix loss of connection info when a module is unloaded
	mfd: rt5033: Drop rt5033-battery sub-device
	media: venus: helpers: Fix ALIGN() of non power of two
	media: atomisp: gmin_platform: fix out_len in gmin_get_config_dsm_var()
	KVM: s390: fix KVM_S390_GET_CMMA_BITS for GFNs in memslot holes
	usb: dwc3: qcom: Release the correct resources in dwc3_qcom_remove()
	usb: dwc3: qcom: Fix an error handling path in dwc3_qcom_probe()
	usb: common: usb-conn-gpio: Set last role to unknown before initial detection
	usb: dwc3-meson-g12a: Fix an error handling path in dwc3_meson_g12a_probe()
	mfd: intel-lpss: Add missing check for platform_get_resource
	Revert "usb: common: usb-conn-gpio: Set last role to unknown before initial detection"
	serial: 8250_omap: Use force_suspend and resume for system suspend
	test_firmware: return ENOMEM instead of ENOSPC on failed memory allocation
	mfd: stmfx: Fix error path in stmfx_chip_init
	mfd: stmfx: Nullify stmfx->vdd in case of error
	KVM: s390: vsie: fix the length of APCB bitmap
	mfd: stmpe: Only disable the regulators if they are enabled
	phy: tegra: xusb: check return value of devm_kzalloc()
	pwm: imx-tpm: force 'real_period' to be zero in suspend
	pwm: sysfs: Do not apply state to already disabled PWMs
	rtc: st-lpc: Release some resources in st_rtc_probe() in case of error
	media: cec: i2c: ch7322: also select REGMAP
	sctp: fix potential deadlock on &net->sctp.addr_wq_lock
	Add MODULE_FIRMWARE() for FIRMWARE_TG357766.
	net: dsa: vsc73xx: fix MTU configuration
	spi: bcm-qspi: return error if neither hif_mspi nor mspi is available
	mailbox: ti-msgmgr: Fill non-message tx data fields with 0x0
	f2fs: fix error path handling in truncate_dnode()
	octeontx2-af: Fix mapping for NIX block from CGX connection
	powerpc: allow PPC_EARLY_DEBUG_CPM only when SERIAL_CPM=y
	net: bridge: keep ports without IFF_UNICAST_FLT in BR_PROMISC mode
	tcp: annotate data races in __tcp_oow_rate_limited()
	xsk: Honor SO_BINDTODEVICE on bind
	net/sched: act_pedit: Add size check for TCA_PEDIT_PARMS_EX
	pptp: Fix fib lookup calls.
	net: dsa: tag_sja1105: fix MAC DA patching from meta frames
	s390/qeth: Fix vipa deletion
	sh: dma: Fix DMA channel offset calculation
	apparmor: fix missing error check for rhashtable_insert_fast
	i2c: xiic: Defer xiic_wakeup() and __xiic_start_xfer() in xiic_process()
	i2c: xiic: Don't try to handle more interrupt events after error
	ALSA: jack: Fix mutex call in snd_jack_report()
	i2c: qup: Add missing unwind goto in qup_i2c_probe()
	NFSD: add encoding of op_recall flag for write delegation
	io_uring: wait interruptibly for request completions on exit
	mmc: core: disable TRIM on Kingston EMMC04G-M627
	mmc: core: disable TRIM on Micron MTFC4GACAJCN-1M
	mmc: mmci: Set PROBE_PREFER_ASYNCHRONOUS
	mmc: sdhci: fix DMA configure compatibility issue when 64bit DMA mode is used.
	bcache: fixup btree_cache_wait list damage
	bcache: Remove unnecessary NULL point check in node allocations
	bcache: Fix __bch_btree_node_alloc to make the failure behavior consistent
	um: Use HOST_DIR for mrproper
	integrity: Fix possible multiple allocation in integrity_inode_get()
	autofs: use flexible array in ioctl structure
	shmem: use ramfs_kill_sb() for kill_sb method of ramfs-based tmpfs
	jffs2: reduce stack usage in jffs2_build_xattr_subsystem()
	fs: avoid empty option when generating legacy mount string
	ext4: Remove ext4 locking of moved directory
	Revert "f2fs: fix potential corruption when moving a directory"
	fs: Establish locking order for unrelated directories
	fs: Lock moved directories
	btrfs: add handling for RAID1C23/DUP to btrfs_reduce_alloc_profile
	btrfs: fix race when deleting quota root from the dirty cow roots list
	ASoC: mediatek: mt8173: Fix irq error path
	ASoC: mediatek: mt8173: Fix snd_soc_component_initialize error path
	ARM: orion5x: fix d2net gpio initialization
	leds: trigger: netdev: Recheck NETDEV_LED_MODE_LINKUP on dev rename
	fs: no need to check source
	fanotify: disallow mount/sb marks on kernel internal pseudo fs
	tpm, tpm_tis: Claim locality in interrupt handler
	selftests/bpf: Add verifier test for PTR_TO_MEM spill
	block: add overflow checks for Amiga partition support
	sh: pgtable-3level: Fix cast to pointer from integer of different size
	netfilter: nf_tables: use net_generic infra for transaction data
	netfilter: nf_tables: add rescheduling points during loop detection walks
	netfilter: nf_tables: incorrect error path handling with NFT_MSG_NEWRULE
	netfilter: nf_tables: fix chain binding transaction logic
	netfilter: nf_tables: add NFT_TRANS_PREPARE_ERROR to deal with bound set/chain
	netfilter: nf_tables: reject unbound anonymous set before commit phase
	netfilter: nf_tables: reject unbound chain set before commit phase
	netfilter: nftables: rename set element data activation/deactivation functions
	netfilter: nf_tables: drop map element references from preparation phase
	netfilter: nf_tables: unbind non-anonymous set if rule construction fails
	netfilter: nf_tables: fix scheduling-while-atomic splat
	netfilter: conntrack: Avoid nf_ct_helper_hash uses after free
	netfilter: nf_tables: do not ignore genmask when looking up chain by id
	netfilter: nf_tables: prevent OOB access in nft_byteorder_eval
	wireguard: queueing: use saner cpu selection wrapping
	wireguard: netlink: send staged packets when setting initial private key
	tty: serial: fsl_lpuart: add earlycon for imx8ulp platform
	rcu-tasks: Mark ->trc_reader_nesting data races
	rcu-tasks: Mark ->trc_reader_special.b.need_qs data races
	rcu-tasks: Simplify trc_read_check_handler() atomic operations
	block/partition: fix signedness issue for Amiga partitions
	io_uring: Use io_schedule* in cqring wait
	io_uring: add reschedule point to handle_tw_list()
	net: lan743x: Don't sleep in atomic context
	workqueue: clean up WORK_* constant types, clarify masking
	drm/panel: simple: Add connector_type for innolux_at043tn24
	drm/panel: simple: Add Powertip PH800480T013 drm_display_mode flags
	igc: Remove delay during TX ring configuration
	net/mlx5e: fix double free in mlx5e_destroy_flow_table
	net/mlx5e: Check for NOT_READY flag state after locking
	igc: set TP bit in 'supported' and 'advertising' fields of ethtool_link_ksettings
	scsi: qla2xxx: Fix error code in qla2x00_start_sp()
	net: mvneta: fix txq_map in case of txq_number==1
	net/sched: cls_fw: Fix improper refcount update leads to use-after-free
	gve: Set default duplex configuration to full
	ionic: remove WARN_ON to prevent panic_on_warn
	net: bgmac: postpone turning IRQs off to avoid SoC hangs
	net: prevent skb corruption on frag list segmentation
	icmp6: Fix null-ptr-deref of ip6_null_entry->rt6i_idev in icmp6_dev().
	udp6: fix udp6_ehashfn() typo
	ntb: idt: Fix error handling in idt_pci_driver_init()
	NTB: amd: Fix error handling in amd_ntb_pci_driver_init()
	ntb: intel: Fix error handling in intel_ntb_pci_driver_init()
	NTB: ntb_transport: fix possible memory leak while device_register() fails
	NTB: ntb_tool: Add check for devm_kcalloc
	ipv6/addrconf: fix a potential refcount underflow for idev
	platform/x86: wmi: remove unnecessary argument
	platform/x86: wmi: use guid_t and guid_equal()
	platform/x86: wmi: move variables
	platform/x86: wmi: Break possible infinite loop when parsing GUID
	igc: Fix launchtime before start of cycle
	igc: Fix inserting of empty frame for launchtime
	riscv: bpf: Move bpf_jit_alloc_exec() and bpf_jit_free_exec() to core
	riscv: bpf: Avoid breaking W^X
	bpf, riscv: Support riscv jit to provide bpf_line_info
	riscv, bpf: Fix inconsistent JIT image generation
	erofs: avoid infinite loop in z_erofs_do_read_page() when reading beyond EOF
	wifi: airo: avoid uninitialized warning in airo_get_rate()
	net/sched: flower: Ensure both minimum and maximum ports are specified
	netdevsim: fix uninitialized data in nsim_dev_trap_fa_cookie_write()
	net/sched: make psched_mtu() RTNL-less safe
	net/sched: sch_qfq: refactor parsing of netlink parameters
	net/sched: sch_qfq: account for stab overhead in qfq_enqueue
	nvme-pci: fix DMA direction of unmapping integrity data
	f2fs: fix to avoid NULL pointer dereference f2fs_write_end_io()
	pinctrl: amd: Fix mistake in handling clearing pins at startup
	pinctrl: amd: Detect internal GPIO0 debounce handling
	pinctrl: amd: Only use special debounce behavior for GPIO 0
	tpm: tpm_vtpm_proxy: fix a race condition in /dev/vtpmx creation
	mtd: rawnand: meson: fix unaligned DMA buffers handling
	net: bcmgenet: Ensure MDIO unregistration has clocks enabled
	powerpc: Fail build if using recordmcount with binutils v2.37
	misc: fastrpc: Create fastrpc scalar with correct buffer count
	erofs: fix compact 4B support for 16k block size
	MIPS: Loongson: Fix cpu_probe_loongson() again
	ext4: Fix reusing stale buffer heads from last failed mounting
	ext4: fix wrong unit use in ext4_mb_clear_bb
	ext4: get block from bh in ext4_free_blocks for fast commit replay
	ext4: fix wrong unit use in ext4_mb_new_blocks
	ext4: only update i_reserved_data_blocks on successful block allocation
	jfs: jfs_dmap: Validate db_l2nbperpage while mounting
	hwrng: imx-rngc - fix the timeout for init and self check
	PCI/PM: Avoid putting EloPOS E2/S2/H2 PCIe Ports in D3cold
	PCI: Add function 1 DMA alias quirk for Marvell 88SE9235
	PCI: qcom: Disable write access to read only registers for IP v2.3.3
	PCI: rockchip: Assert PCI Configuration Enable bit after probe
	PCI: rockchip: Write PCI Device ID to correct register
	PCI: rockchip: Add poll and timeout to wait for PHY PLLs to be locked
	PCI: rockchip: Fix legacy IRQ generation for RK3399 PCIe endpoint core
	PCI: rockchip: Use u32 variable to access 32-bit registers
	PCI: rockchip: Set address alignment for endpoint mode
	misc: pci_endpoint_test: Free IRQs before removing the device
	misc: pci_endpoint_test: Re-init completion for every test
	md/raid0: add discard support for the 'original' layout
	fs: dlm: return positive pid value for F_GETLK
	drm/atomic: Allow vblank-enabled + self-refresh "disable"
	drm/rockchip: vop: Leave vblank enabled in self-refresh
	drm/amd/display: Correct `DMUB_FW_VERSION` macro
	serial: atmel: don't enable IRQs prematurely
	tty: serial: samsung_tty: Fix a memory leak in s3c24xx_serial_getclk() in case of error
	tty: serial: samsung_tty: Fix a memory leak in s3c24xx_serial_getclk() when iterating clk
	firmware: stratix10-svc: Fix a potential resource leak in svc_create_memory_pool()
	ceph: don't let check_caps skip sending responses for revoke msgs
	xhci: Fix resume issue of some ZHAOXIN hosts
	xhci: Fix TRB prefetch issue of ZHAOXIN hosts
	xhci: Show ZHAOXIN xHCI root hub speed correctly
	meson saradc: fix clock divider mask length
	Revert "8250: add support for ASIX devices with a FIFO bug"
	s390/decompressor: fix misaligned symbol build error
	tracing/histograms: Add histograms to hist_vars if they have referenced variables
	samples: ftrace: Save required argument registers in sample trampolines
	net: ena: fix shift-out-of-bounds in exponential backoff
	ring-buffer: Fix deadloop issue on reading trace_pipe
	xtensa: ISS: fix call to split_if_spec
	tracing: Fix null pointer dereference in tracing_err_log_open()
	tracing/probes: Fix not to count error code to total length
	scsi: qla2xxx: Wait for io return on terminate rport
	scsi: qla2xxx: Array index may go out of bound
	scsi: qla2xxx: Fix buffer overrun
	scsi: qla2xxx: Fix potential NULL pointer dereference
	scsi: qla2xxx: Check valid rport returned by fc_bsg_to_rport()
	scsi: qla2xxx: Correct the index of array
	scsi: qla2xxx: Pointer may be dereferenced
	scsi: qla2xxx: Remove unused nvme_ls_waitq wait queue
	net/sched: sch_qfq: reintroduce lmax bound check for MTU
	RDMA/cma: Ensure rdma_addr_cancel() happens before issuing more requests
	drm/atomic: Fix potential use-after-free in nonblocking commits
	ALSA: hda/realtek - remove 3k pull low procedure
	ALSA: hda/realtek: Enable Mute LED on HP Laptop 15s-eq2xxx
	keys: Fix linking a duplicate key to a keyring's assoc_array
	perf probe: Add test for regression introduced by switch to die_get_decl_file()
	btrfs: fix warning when putting transaction with qgroups enabled after abort
	fuse: revalidate: don't invalidate if interrupted
	selftests: tc: set timeout to 15 minutes
	selftests: tc: add 'ct' action kconfig dep
	regmap: Drop initial version of maximum transfer length fixes
	regmap: Account for register length in SMBus I/O limits
	can: bcm: Fix UAF in bcm_proc_show()
	drm/client: Fix memory leak in drm_client_target_cloned
	drm/client: Fix memory leak in drm_client_modeset_probe
	ASoC: fsl_sai: Disable bit clock with transmitter
	ext4: correct inline offset when handling xattrs in inode body
	debugobjects: Recheck debug_objects_enabled before reporting
	nbd: Add the maximum limit of allocated index in nbd_dev_add
	md: fix data corruption for raid456 when reshape restart while grow up
	md/raid10: prevent soft lockup while flush writes
	posix-timers: Ensure timer ID search-loop limit is valid
	btrfs: add xxhash to fast checksum implementations
	ACPI: button: Add lid disable DMI quirk for Nextbook Ares 8A
	ACPI: video: Add backlight=native DMI quirk for Apple iMac11,3
	ACPI: video: Add backlight=native DMI quirk for Lenovo ThinkPad X131e (3371 AMD version)
	arm64: set __exception_irq_entry with __irq_entry as a default
	arm64: mm: fix VA-range sanity check
	sched/fair: Don't balance task to its current running CPU
	wifi: ath11k: fix registration of 6Ghz-only phy without the full channel range
	bpf: Address KCSAN report on bpf_lru_list
	devlink: report devlink_port_type_warn source device
	wifi: wext-core: Fix -Wstringop-overflow warning in ioctl_standard_iw_point()
	wifi: iwlwifi: mvm: avoid baid size integer overflow
	igb: Fix igb_down hung on surprise removal
	spi: bcm63xx: fix max prepend length
	fbdev: imxfb: warn about invalid left/right margin
	pinctrl: amd: Use amd_pinconf_set() for all config options
	net: ethernet: ti: cpsw_ale: Fix cpsw_ale_get_field()/cpsw_ale_set_field()
	bridge: Add extack warning when enabling STP in netns.
	iavf: Fix use-after-free in free_netdev
	iavf: Fix out-of-bounds when setting channels on remove
	security: keys: Modify mismatched function name
	octeontx2-pf: Dont allocate BPIDs for LBK interfaces
	tcp: annotate data-races around tcp_rsk(req)->ts_recent
	net: ipv4: Use kfree_sensitive instead of kfree
	net:ipv6: check return value of pskb_trim()
	Revert "tcp: avoid the lookup process failing to get sk in ehash table"
	fbdev: au1200fb: Fix missing IRQ check in au1200fb_drv_probe
	llc: Don't drop packet from non-root netns.
	netfilter: nf_tables: fix spurious set element insertion failure
	netfilter: nf_tables: can't schedule in nft_chain_validate
	netfilter: nft_set_pipapo: fix improper element removal
	netfilter: nf_tables: skip bound chain in netns release path
	netfilter: nf_tables: skip bound chain on rule flush
	tcp: annotate data-races around tp->tcp_tx_delay
	tcp: annotate data-races around tp->keepalive_time
	tcp: annotate data-races around tp->keepalive_intvl
	tcp: annotate data-races around tp->keepalive_probes
	net: Introduce net.ipv4.tcp_migrate_req.
	tcp: Fix data-races around sysctl_tcp_syn(ack)?_retries.
	tcp: annotate data-races around icsk->icsk_syn_retries
	tcp: annotate data-races around tp->linger2
	tcp: annotate data-races around rskq_defer_accept
	tcp: annotate data-races around tp->notsent_lowat
	tcp: annotate data-races around icsk->icsk_user_timeout
	tcp: annotate data-races around fastopenq.max_qlen
	net: phy: prevent stale pointer dereference in phy_init()
	tracing/histograms: Return an error if we fail to add histogram to hist_vars list
	tracing: Fix memory leak of iter->temp when reading trace_pipe
	ftrace: Store the order of pages allocated in ftrace_page
	ftrace: Fix possible warning on checking all pages used in ftrace_process_locs()
	Linux 5.10.188

Change-Id: Ibcc1adc43df5b8f649b12078eedd5d4f57de4578
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-08-03 11:23:27 +00:00
Kuniyuki Iwashima
cf6c06ac74 net: Introduce net.ipv4.tcp_migrate_req.
[ Upstream commit f9ac779f881c2ec3d1cdcd7fa9d4f9442bf60e80 ]

This commit adds a new sysctl option: net.ipv4.tcp_migrate_req. If this
option is enabled or eBPF program is attached, we will be able to migrate
child sockets from a listener to another in the same reuseport group after
close() or shutdown() syscalls.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Benjamin Herrenschmidt <benh@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20210612123224.12525-2-kuniyu@amazon.co.jp
Stable-dep-of: 3a037f0f3c4b ("tcp: annotate data-races around icsk->icsk_syn_retries")
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-27 08:44:42 +02:00
Greg Kroah-Hartman
b7321283a9 Merge 5.10.184 into android12-5.10-lts
Changes in 5.10.184
	ata: ahci: fix enum constants for gcc-13
	gcc-plugins: Reorganize gimple includes for GCC 13
	sfc (gcc13): synchronize ef100_enqueue_skb()'s return type
	remove the sx8 block driver
	bonding (gcc13): synchronize bond_{a,t}lb_xmit() types
	f2fs: fix iostat lock protection
	blk-iocost: avoid 64-bit division in ioc_timer_fn
	block/blk-iocost (gcc13): keep large values in a new enum
	i40iw: fix build warning in i40iw_manage_apbvt()
	i40e: fix build warnings in i40e_alloc.h
	i40e: fix build warning in ice_fltr_add_mac_to_list()
	staging: vchiq_core: drop vchiq_status from vchiq_initialise
	spi: qup: Request DMA before enabling clocks
	afs: Fix setting of mtime when creating a file/dir/symlink
	wifi: mt76: mt7615: fix possible race in mt7615_mac_sta_poll
	neighbour: fix unaligned access to pneigh_entry
	net: dsa: lan9303: allow vid != 0 in port_fdb_{add|del} methods
	net/smc: Avoid to access invalid RMBs' MRs in SMCRv1 ADD LINK CONT
	net/sched: fq_pie: ensure reasonable TCA_FQ_PIE_QUANTUM values
	Bluetooth: Fix l2cap_disconnect_req deadlock
	Bluetooth: L2CAP: Add missing checks for invalid DCID
	qed/qede: Fix scheduling while atomic
	netfilter: conntrack: fix NULL pointer dereference in nf_confirm_cthelper
	netfilter: ipset: Add schedule point in call_ad().
	ipv6: rpl: Fix Route of Death.
	rfs: annotate lockless accesses to sk->sk_rxhash
	rfs: annotate lockless accesses to RFS sock flow table
	net: sched: move rtm_tca_policy declaration to include file
	net: sched: fix possible refcount leak in tc_chain_tmplt_add()
	bpf: Add extra path pointer check to d_path helper
	lib: cpu_rmap: Fix potential use-after-free in irq_cpu_rmap_release()
	bnxt_en: Don't issue AP reset during ethtool's reset operation
	bnxt_en: Query default VLAN before VNIC setup on a VF
	bnxt_en: Implement .set_port / .unset_port UDP tunnel callbacks
	batman-adv: Broken sync while rescheduling delayed work
	Input: xpad - delete a Razer DeathAdder mouse VID/PID entry
	Input: psmouse - fix OOB access in Elantech protocol
	ALSA: hda/realtek: Add a quirk for HP Slim Desktop S01
	ALSA: hda/realtek: Add Lenovo P3 Tower platform
	drm/amdgpu: fix xclk freq on CHIP_STONEY
	can: j1939: j1939_sk_send_loop_abort(): improved error queue handling in J1939 Socket
	can: j1939: change j1939_netdev_lock type to mutex
	can: j1939: avoid possible use-after-free when j1939_can_rx_register fails
	ceph: fix use-after-free bug for inodes when flushing capsnaps
	s390/dasd: Use correct lock while counting channel queue length
	Bluetooth: Fix use-after-free in hci_remove_ltk/hci_remove_irk
	Bluetooth: hci_qca: fix debugfs registration
	tee: amdtee: Add return_origin to 'struct tee_cmd_load_ta'
	rbd: move RBD_OBJ_FLAG_COPYUP_ENABLED flag setting
	rbd: get snapshot context after exclusive lock is ensured to be held
	pinctrl: meson-axg: add missing GPIOA_18 gpio group
	usb: usbfs: Enforce page requirements for mmap
	usb: usbfs: Use consistent mmap functions
	staging: vc04_services: fix gcc-13 build warning
	ASoC: codecs: wsa881x: do not set can_multi_write flag
	i2c: sprd: Delete i2c adapter in .remove's error path
	eeprom: at24: also select REGMAP
	riscv: fix kprobe __user string arg print fault issue
	vhost: support PACKED when setting-getting vring_base
	Revert "ext4: don't clear SB_RDONLY when remounting r/w until quota is re-enabled"
	ext4: only check dquot_initialize_needed() when debugging
	tcp: fix tcp_min_tso_segs sysctl
	xfs: verify buffer contents when we skip log replay
	MIPS: locking/atomic: Fix atomic{_64,}_sub_if_positive
	drm/atomic: Don't pollute crtc_state->mode_blob with error pointers
	btrfs: check return value of btrfs_commit_transaction in relocation
	btrfs: unset reloc control if transaction commit fails in prepare_to_relocate()
	Revert "staging: rtl8192e: Replace macro RTL_PCI_DEVICE with PCI_DEVICE"
	Linux 5.10.184

Change-Id: If2d013f1bba8d713f8935810a5887f80eabae81c
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-06-28 07:16:23 +00:00
Eric Dumazet
58e8cf94de tcp: fix tcp_min_tso_segs sysctl
commit d24f511b04b8b159b705ec32a3b8782667d1b06a upstream.

tcp_min_tso_segs is now stored in u8, so max value is 255.

255 limit is enforced by proc_dou8vec_minmax().

We can therefore remove the gso_max_segs variable.

Fixes: 47996b489bdc ("tcp: convert elligible sysctls to u8")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-06-14 11:09:59 +02:00
Greg Kroah-Hartman
a4023d8fc3 Revert "ipv4: shrink netns_ipv4 with sysctl conversions"
This reverts commit f662a0786d which is
commit 4b6bbf17d4e1939afa72821879fc033d725e9491 upstream.

It breaks the Android kernel abi, and is not needed at this point in
time.  If it is required, it can come back in an ABI-safe way.

Bug: 161946584
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I762a8a7eaee1baa7dbedc9c72854b85cfffbe7d9
2023-04-28 14:29:47 +00:00
Greg Kroah-Hartman
e2f3aab65b Revert "tcp: convert elligible sysctls to u8"
This reverts commit cc9f9a49f5 which is
commit 4ecc1baf362c5df2dcabe242511e38ee28486545 upstream.

It breaks the Android kernel abi, and is not needed at this point in
time.  If it is required, it can come back in an ABI-safe way.

Bug: 161946584
Change-Id: Ia44a4d7ef5887b2175038783e918badf464ae51b
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-04-27 11:34:15 +00:00
Greg Kroah-Hartman
036fa20734 Revert "tcp: restrict net.ipv4.tcp_app_win"
This reverts commit a069d4d98c which is
commit dc5110c2d959c1707e12df5f792f41d90614adaa upstream.

It breaks the Android kernel abi, and is not needed at this point in
time.  If it is required, it can come back in an ABI-safe way.

Bug: 161946584
Change-Id: I7ef145a192f15df2179150f9f0087746725d27ea
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-04-27 11:33:48 +00:00
Greg Kroah-Hartman
2d6a4ad08c Merge 5.10.178 into android12-5.10-lts
Changes in 5.10.178
	gpio: GPIO_REGMAP: select REGMAP instead of depending on it
	Drivers: vmbus: Check for channel allocation before looking up relids
	pwm: cros-ec: Explicitly set .polarity in .get_state()
	pwm: sprd: Explicitly set .polarity in .get_state()
	KVM: s390: pv: fix external interruption loop not always detected
	wifi: mac80211: fix invalid drv_sta_pre_rcu_remove calls for non-uploaded sta
	net: qrtr: combine nameservice into main module
	net: qrtr: Fix a refcount bug in qrtr_recvmsg()
	icmp: guard against too small mtu
	net: don't let netpoll invoke NAPI if in xmit context
	sctp: check send stream number after wait_for_sndbuf
	net: qrtr: Do not do DEL_SERVER broadcast after DEL_CLIENT
	ipv6: Fix an uninit variable access bug in __ip6_make_skb()
	gpio: davinci: Add irq chip flag to skip set wake
	net: ethernet: ti: am65-cpsw: Fix mdio cleanup in probe
	net: stmmac: fix up RX flow hash indirection table when setting channels
	sunrpc: only free unix grouplist after RCU settles
	NFSD: callback request does not use correct credential for AUTH_SYS
	usb: xhci: tegra: fix sleep in atomic call
	xhci: also avoid the XHCI_ZERO_64B_REGS quirk with a passthrough iommu
	USB: serial: cp210x: add Silicon Labs IFS-USB-DATACABLE IDs
	usb: typec: altmodes/displayport: Fix configure initial pin assignment
	USB: serial: option: add Telit FE990 compositions
	USB: serial: option: add Quectel RM500U-CN modem
	iio: adc: ti-ads7950: Set `can_sleep` flag for GPIO chip
	iio: dac: cio-dac: Fix max DAC write value check for 12-bit
	iio: light: cm32181: Unregister second I2C client if present
	tty: serial: sh-sci: Fix transmit end interrupt handler
	tty: serial: sh-sci: Fix Rx on RZ/G2L SCI
	tty: serial: fsl_lpuart: avoid checking for transfer complete when UARTCTRL_SBK is asserted in lpuart32_tx_empty
	nilfs2: fix potential UAF of struct nilfs_sc_info in nilfs_segctor_thread()
	nilfs2: fix sysfs interface lifetime
	dt-bindings: serial: renesas,scif: Fix 4th IRQ for 4-IRQ SCIFs
	ALSA: hda/realtek: Add quirk for Clevo X370SNW
	iio: adc: ad7791: fix IRQ flags
	scsi: iscsi_tcp: Check that sock is valid before iscsi_set_param()
	perf/core: Fix the same task check in perf_event_set_output
	ftrace: Mark get_lock_parent_ip() __always_inline
	ftrace: Fix issue that 'direct->addr' not restored in modify_ftrace_direct()
	can: j1939: j1939_tp_tx_dat_new(): fix out-of-bounds memory access
	can: isotp: isotp_ops: fix poll() to not report false EPOLLOUT events
	tracing: Free error logs of tracing instances
	ASoC: hdac_hdmi: use set_stream() instead of set_tdm_slots()
	drm/panfrost: Fix the panfrost_mmu_map_fault_addr() error path
	drm/nouveau/disp: Support more modes by checking with lower bpc
	ring-buffer: Fix race while reader and writer are on the same page
	mm/swap: fix swap_info_struct race between swapoff and get_swap_pages()
	selftests: intel_pstate: ftime() is deprecated
	drm/bridge: lt9611: Fix PLL being unable to lock
	Revert "media: ti: cal: fix possible memory leak in cal_ctx_create()"
	ocfs2: fix freeing uninitialized resource on ocfs2_dlm_shutdown
	bpftool: Print newline before '}' for struct with padding only fields
	Revert "pinctrl: amd: Disable and mask interrupts on resume"
	ALSA: emu10k1: fix capture interrupt handler unlinking
	ALSA: hda/sigmatel: add pin overrides for Intel DP45SG motherboard
	ALSA: i2c/cs8427: fix iec958 mixer control deactivation
	ALSA: firewire-tascam: add missing unwind goto in snd_tscm_stream_start_duplex()
	ALSA: hda/sigmatel: fix S/PDIF out on Intel D*45* motherboards
	Bluetooth: L2CAP: Fix use-after-free in l2cap_disconnect_{req,rsp}
	Bluetooth: Fix race condition in hidp_session_thread
	btrfs: print checksum type and implementation at mount time
	btrfs: fix fast csum implementation detection
	fbmem: Reject FB_ACTIVATE_KD_TEXT from userspace
	mtdblock: tolerate corrected bit-flips
	mtd: rawnand: meson: fix bitmask for length in command word
	mtd: rawnand: stm32_fmc2: remove unsupported EDO mode
	mtd: rawnand: stm32_fmc2: use timings.mode instead of checking tRC_min
	clk: sprd: set max_register according to mapping range
	IB/mlx5: Add support for NDR link speed
	IB/mlx5: Add support for 400G_8X lane speed
	RDMA/cma: Allow UD qp_type to join multicast only
	9p/xen : Fix use after free bug in xen_9pfs_front_remove due to race condition
	niu: Fix missing unwind goto in niu_alloc_channels()
	sysctl: add proc_dou8vec_minmax()
	ipv4: shrink netns_ipv4 with sysctl conversions
	tcp: convert elligible sysctls to u8
	tcp: restrict net.ipv4.tcp_app_win
	drm/armada: Fix a potential double free in an error handling path
	qlcnic: check pci_reset_function result
	net: qrtr: Fix an uninit variable access bug in qrtr_tx_resume()
	sctp: fix a potential overflow in sctp_ifwdtsn_skip
	RDMA/core: Fix GID entry ref leak when create_ah fails
	udp6: fix potential access to stale information
	net: macb: fix a memory corruption in extended buffer descriptor mode
	libbpf: Fix single-line struct definition output in btf_dump
	power: supply: cros_usbpd: reclassify "default case!" as debug
	wifi: mwifiex: mark OF related data as maybe unused
	i2c: imx-lpi2c: clean rx/tx buffers upon new message
	efi: sysfb_efi: Add quirk for Lenovo Yoga Book X91F/L
	drm: panel-orientation-quirks: Add quirk for Lenovo Yoga Book X90F
	verify_pefile: relax wrapper length check
	asymmetric_keys: log on fatal failures in PE/pkcs7
	riscv: add icache flush for nommu sigreturn trampoline
	net: sfp: initialize sfp->i2c_block_size at sfp allocation
	scsi: ses: Handle enclosure with just a primary component gracefully
	x86/PCI: Add quirk for AMD XHCI controller that loses MSI-X state in D3hot
	cgroup/cpuset: Wake up cpuset_attach_wq tasks in cpuset_cancel_attach()
	ubi: Fix failure attaching when vid_hdr offset equals to (sub)page size
	mtd: ubi: wl: Fix a couple of kernel-doc issues
	ubi: Fix deadlock caused by recursively holding work_sem
	powerpc/pseries: rename min_common_depth to primary_domain_index
	powerpc/pseries: Rename TYPE1_AFFINITY to FORM1_AFFINITY
	powerpc/pseries: Consolidate different NUMA distance update code paths
	powerpc/pseries: Add a helper for form1 cpu distance
	powerpc/pseries: Add support for FORM2 associativity
	powerpc/papr_scm: Update the NUMA distance table for the target node
	sched/fair: Move calculate of avg_load to a better location
	sched/fair: Fix imbalance overflow
	x86/rtc: Remove __init for runtime functions
	i2c: ocores: generate stop condition after timeout in polling mode
	watchdog: sbsa_wdog: Make sure the timeout programming is within the limits
	coresight-etm4: Fix for() loop drvdata->nr_addr_cmp range bug
	kbuild: check the minimum assembler version in Kconfig
	kbuild: Switch to 'f' variants of integrated assembler flag
	kbuild: check CONFIG_AS_IS_LLVM instead of LLVM_IAS
	riscv: Handle zicsr/zifencei issues between clang and binutils
	kexec: move locking into do_kexec_load
	kexec: turn all kexec_mutex acquisitions into trylocks
	panic, kexec: make __crash_kexec() NMI safe
	sysctl: Fix data-races in proc_dou8vec_minmax().
	Linux 5.10.178

Change-Id: I34107ee680c7b081bb0c2782483cbb7ec62252ca
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-04-25 16:47:24 +00:00
YueHaibing
a069d4d98c tcp: restrict net.ipv4.tcp_app_win
[ Upstream commit dc5110c2d959c1707e12df5f792f41d90614adaa ]

UBSAN: shift-out-of-bounds in net/ipv4/tcp_input.c:555:23
shift exponent 255 is too large for 32-bit type 'int'
CPU: 1 PID: 7907 Comm: ssh Not tainted 6.3.0-rc4-00161-g62bad54b26db-dirty #206
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0x136/0x150
 __ubsan_handle_shift_out_of_bounds+0x21f/0x5a0
 tcp_init_transfer.cold+0x3a/0xb9
 tcp_finish_connect+0x1d0/0x620
 tcp_rcv_state_process+0xd78/0x4d60
 tcp_v4_do_rcv+0x33d/0x9d0
 __release_sock+0x133/0x3b0
 release_sock+0x58/0x1b0

'maxwin' is int, shifting int for 32 or more bits is undefined behaviour.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-04-20 12:10:26 +02:00
Eric Dumazet
cc9f9a49f5 tcp: convert elligible sysctls to u8
[ Upstream commit 4ecc1baf362c5df2dcabe242511e38ee28486545 ]

Many tcp sysctls are either bools or small ints that can fit into u8.

Reducing space taken by sysctls can save few cache line misses
when sending/receiving data while cpu caches are empty,
for example after cpu idle period.

This is hard to measure with typical network performance tests,
but after this patch, struct netns_ipv4 has shrunk
by three cache lines.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stable-dep-of: dc5110c2d959 ("tcp: restrict net.ipv4.tcp_app_win")
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-04-20 12:10:26 +02:00
Eric Dumazet
f662a0786d ipv4: shrink netns_ipv4 with sysctl conversions
[ Upstream commit 4b6bbf17d4e1939afa72821879fc033d725e9491 ]

These sysctls that can fit in one byte instead of one int
are converted to save space and thus reduce cache line misses.

 - icmp_echo_ignore_all, icmp_echo_ignore_broadcasts,
 - icmp_ignore_bogus_error_responses, icmp_errors_use_inbound_ifaddr
 - tcp_ecn, tcp_ecn_fallback
 - ip_default_ttl, ip_no_pmtu_disc, ip_fwd_use_pmtu
 - ip_nonlocal_bind, ip_autobind_reuse
 - ip_dynaddr, ip_early_demux, raw_l3mdev_accept
 - nexthop_compat_mode, fwmark_reflect

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stable-dep-of: dc5110c2d959 ("tcp: restrict net.ipv4.tcp_app_win")
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-04-20 12:10:26 +02:00
Eric Biggers
f466ca1247 Merge 5.10.154 into android12-5.10-lts
Changes in 5.10.154
	serial: 8250: Let drivers request full 16550A feature probing
	serial: ar933x: Deassert Transmit Enable on ->rs485_config()
	KVM: nVMX: Pull KVM L0's desired controls directly from vmcs01
	KVM: nVMX: Don't propagate vmcs12's PERF_GLOBAL_CTRL settings to vmcs02
	KVM: x86: Trace re-injected exceptions
	KVM: x86: Treat #DBs from the emulator as fault-like (code and DR7.GD=1)
	x86/topology: Set cpu_die_id only if DIE_TYPE found
	x86/topology: Fix multiple packages shown on a single-package system
	x86/topology: Fix duplicated core ID within a package
	KVM: x86: Protect the unused bits in MSR exiting flags
	KVM: x86: Copy filter arg outside kvm_vm_ioctl_set_msr_filter()
	KVM: x86: Add compat handler for KVM_X86_SET_MSR_FILTER
	RDMA/cma: Use output interface for net_dev check
	IB/hfi1: Correctly move list in sc_disable()
	NFSv4: Fix a potential state reclaim deadlock
	NFSv4.1: Handle RECLAIM_COMPLETE trunking errors
	NFSv4.1: We must always send RECLAIM_COMPLETE after a reboot
	nfs4: Fix kmemleak when allocate slot failed
	net: dsa: Fix possible memory leaks in dsa_loop_init()
	RDMA/core: Fix null-ptr-deref in ib_core_cleanup()
	RDMA/qedr: clean up work queue on failure in qedr_alloc_resources()
	nfc: fdp: drop ftrace-like debugging messages
	nfc: fdp: Fix potential memory leak in fdp_nci_send()
	NFC: nxp-nci: remove unnecessary labels
	nfc: nxp-nci: Fix potential memory leak in nxp_nci_send()
	nfc: s3fwrn5: Fix potential memory leak in s3fwrn5_nci_send()
	nfc: nfcmrvl: Fix potential memory leak in nfcmrvl_i2c_nci_send()
	net: fec: fix improper use of NETDEV_TX_BUSY
	ata: pata_legacy: fix pdc20230_set_piomode()
	net: sched: Fix use after free in red_enqueue()
	net: tun: fix bugs for oversize packet when napi frags enabled
	netfilter: nf_tables: release flow rule object from commit path
	ipvs: use explicitly signed chars
	ipvs: fix WARNING in __ip_vs_cleanup_batch()
	ipvs: fix WARNING in ip_vs_app_net_cleanup()
	rose: Fix NULL pointer dereference in rose_send_frame()
	mISDN: fix possible memory leak in mISDN_register_device()
	isdn: mISDN: netjet: fix wrong check of device registration
	btrfs: fix inode list leak during backref walking at resolve_indirect_refs()
	btrfs: fix inode list leak during backref walking at find_parent_nodes()
	btrfs: fix ulist leaks in error paths of qgroup self tests
	Bluetooth: L2CAP: Fix use-after-free caused by l2cap_reassemble_sdu
	Bluetooth: L2CAP: fix use-after-free in l2cap_conn_del()
	net: mdio: fix undefined behavior in bit shift for __mdiobus_register
	net, neigh: Fix null-ptr-deref in neigh_table_clear()
	ipv6: fix WARNING in ip6_route_net_exit_late()
	drm/msm/hdmi: Remove spurious IRQF_ONESHOT flag
	drm/msm/hdmi: fix IRQ lifetime
	mmc: sdhci-esdhc-imx: Propagate ESDHC_FLAG_HS400* only on 8bit bus
	mmc: sdhci-pci: Avoid comma separated statements
	mmc: sdhci-pci-core: Disable ES for ASUS BIOS on Jasper Lake
	video/fbdev/stifb: Implement the stifb_fillrect() function
	fbdev: stifb: Fall back to cfb_fillrect() on 32-bit HCRX cards
	mtd: parsers: bcm47xxpart: print correct offset on read error
	mtd: parsers: bcm47xxpart: Fix halfblock reads
	xhci-pci: Set runtime PM as default policy on all xHC 1.2 or later devices
	s390/boot: add secure boot trailer
	media: rkisp1: Initialize color space on resizer sink and source pads
	media: rkisp1: Zero v4l2_subdev_format fields in when validating links
	media: s5p_cec: limit msg.len to CEC_MAX_MSG_SIZE
	media: cros-ec-cec: limit msg.len to CEC_MAX_MSG_SIZE
	media: dvb-frontends/drxk: initialize err to 0
	media: meson: vdec: fix possible refcount leak in vdec_probe()
	ACPI: APEI: Fix integer overflow in ghes_estatus_pool_init()
	scsi: core: Restrict legal sdev_state transitions via sysfs
	HID: saitek: add madcatz variant of MMO7 mouse device ID
	drm/amdgpu: set vm_update_mode=0 as default for Sienna Cichlid in SRIOV case
	i2c: xiic: Add platform module alias
	efi/tpm: Pass correct address to memblock_reserve
	ARM: dts: imx6qdl-gw59{10,13}: fix user pushbutton GPIO offset
	firmware: arm_scmi: Suppress the driver's bind attributes
	firmware: arm_scmi: Make Rx chan_setup fail on memory errors
	arm64: dts: juno: Add thermal critical trip points
	i2c: piix4: Fix adapter not be removed in piix4_remove()
	Bluetooth: L2CAP: Fix accepting connection request for invalid SPSM
	Bluetooth: L2CAP: Fix attempting to access uninitialized memory
	block, bfq: protect 'bfqd->queued' by 'bfqd->lock'
	ALSA: usb-audio: Add quirks for MacroSilicon MS2100/MS2106 devices
	fscrypt: simplify master key locking
	fscrypt: stop using keyrings subsystem for fscrypt_master_key
	fscrypt: fix keyring memory leak on mount failure
	tcp/udp: Fix memory leak in ipv6_renew_options().
	mtd: rawnand: gpmi: Set WAIT_FOR_READY timeout based on program/erase times
	memcg: enable accounting of ipc resources
	binder: fix UAF of alloc->vma in race with munmap()
	coresight: cti: Fix hang in cti_disable_hw()
	btrfs: fix type of parameter generation in btrfs_get_dentry
	ftrace: Fix use-after-free for dynamic ftrace_ops
	tcp/udp: Make early_demux back namespacified.
	tracing: kprobe: Fix memory leak in test_gen_kprobe/kretprobe_cmd()
	kprobe: reverse kp->flags when arm_kprobe failed
	tools/nolibc/string: Fix memcmp() implementation
	tracing/histogram: Update document for KEYS_MAX size
	capabilities: fix potential memleak on error path from vfs_getxattr_alloc()
	fuse: add file_modified() to fallocate
	efi: random: reduce seed size to 32 bytes
	efi: random: Use 'ACPI reclaim' memory for random seed
	perf/x86/intel: Fix pebs event constraints for ICL
	perf/x86/intel: Add Cooper Lake stepping to isolation_ucodes[]
	parisc: Make 8250_gsc driver dependend on CONFIG_PARISC
	parisc: Export iosapic_serial_irq() symbol for serial port driver
	parisc: Avoid printing the hardware path twice
	ext4: fix warning in 'ext4_da_release_space'
	ext4: fix BUG_ON() when directory entry has invalid rec_len
	KVM: x86: Mask off reserved bits in CPUID.80000006H
	KVM: x86: Mask off reserved bits in CPUID.8000001AH
	KVM: x86: Mask off reserved bits in CPUID.80000008H
	KVM: x86: Mask off reserved bits in CPUID.80000001H
	KVM: x86: emulator: em_sysexit should update ctxt->mode
	KVM: x86: emulator: introduce emulator_recalc_and_set_mode
	KVM: x86: emulator: update the emulation mode after CR0 write
	ext4,f2fs: fix readahead of verity data
	drm/rockchip: dsi: Force synchronous probe
	drm/i915/sdvo: Filter out invalid outputs more sensibly
	drm/i915/sdvo: Setup DDC fully before output init
	wifi: brcmfmac: Fix potential buffer overflow in brcmf_fweh_event_worker()
	ipc: remove memcg accounting for sops objects in do_semtimedop()
	Linux 5.10.154

Change-Id: I6965878bf3bad857fbdbcdeb7dd066cc280aa026
Signed-off-by: Eric Biggers <ebiggers@google.com>
2022-11-29 23:38:14 +00:00
Kuniyuki Iwashima
2bf33b5ea4 tcp/udp: Make early_demux back namespacified.
commit 11052589cf5c0bab3b4884d423d5f60c38fcf25d upstream.

Commit e21145a987 ("ipv4: namespacify ip_early_demux sysctl knob") made
it possible to enable/disable early_demux on a per-netns basis.  Then, we
introduced two knobs, tcp_early_demux and udp_early_demux, to switch it for
TCP/UDP in commit dddb64bcb3 ("net: Add sysctl to toggle early demux for
tcp and udp").  However, the .proc_handler() was wrong and actually
disabled us from changing the behaviour in each netns.

We can execute early_demux if net.ipv4.ip_early_demux is on and each proto
.early_demux() handler is not NULL.  When we toggle (tcp|udp)_early_demux,
the change itself is saved in each netns variable, but the .early_demux()
handler is a global variable, so the handler is switched based on the
init_net's sysctl variable.  Thus, netns (tcp|udp)_early_demux knobs have
nothing to do with the logic.  Whether we CAN execute proto .early_demux()
is always decided by init_net's sysctl knob, and whether we DO it or not is
by each netns ip_early_demux knob.

This patch namespacifies (tcp|udp)_early_demux again.  For now, the users
of the .early_demux() handler are TCP and UDP only, and they are called
directly to avoid retpoline.  So, we can remove the .early_demux() handler
from inet6?_protos and need not dereference them in ip6?_rcv_finish_core().
If another proto needs .early_demux(), we can restore it at that time.

Fixes: dddb64bcb3 ("net: Add sysctl to toggle early demux for tcp and udp")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20220713175207.7727-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-10 18:14:26 +01:00
Greg Kroah-Hartman
f6ce9a9115 Merge 5.10.134 into android12-5.10-lts
Changes in 5.10.134
	pinctrl: stm32: fix optional IRQ support to gpios
	riscv: add as-options for modules with assembly compontents
	mlxsw: spectrum_router: Fix IPv4 nexthop gateway indication
	lockdown: Fix kexec lockdown bypass with ima policy
	io_uring: Use original task for req identity in io_identity_cow()
	xen/gntdev: Ignore failure to unmap INVALID_GRANT_HANDLE
	docs: net: explain struct net_device lifetime
	net: make free_netdev() more lenient with unregistering devices
	net: make sure devices go through netdev_wait_all_refs
	net: move net_set_todo inside rollback_registered()
	net: inline rollback_registered()
	net: move rollback_registered_many()
	net: inline rollback_registered_many()
	Revert "m68knommu: only set CONFIG_ISA_DMA_API for ColdFire sub-arch"
	PCI: hv: Fix multi-MSI to allow more than one MSI vector
	PCI: hv: Fix hv_arch_irq_unmask() for multi-MSI
	PCI: hv: Reuse existing IRTE allocation in compose_msi_msg()
	PCI: hv: Fix interrupt mapping for multi-MSI
	serial: mvebu-uart: correctly report configured baudrate value
	xfrm: xfrm_policy: fix a possible double xfrm_pols_put() in xfrm_bundle_lookup()
	power/reset: arm-versatile: Fix refcount leak in versatile_reboot_probe
	pinctrl: ralink: Check for null return of devm_kcalloc
	perf/core: Fix data race between perf_event_set_output() and perf_mmap_close()
	drm/amdgpu/display: add quirk handling for stutter mode
	igc: Reinstate IGC_REMOVED logic and implement it properly
	ip: Fix data-races around sysctl_ip_no_pmtu_disc.
	ip: Fix data-races around sysctl_ip_fwd_use_pmtu.
	ip: Fix data-races around sysctl_ip_fwd_update_priority.
	ip: Fix data-races around sysctl_ip_nonlocal_bind.
	ip: Fix a data-race around sysctl_ip_autobind_reuse.
	ip: Fix a data-race around sysctl_fwmark_reflect.
	tcp/dccp: Fix a data-race around sysctl_tcp_fwmark_accept.
	tcp: Fix data-races around sysctl_tcp_mtu_probing.
	tcp: Fix data-races around sysctl_tcp_base_mss.
	tcp: Fix data-races around sysctl_tcp_min_snd_mss.
	tcp: Fix a data-race around sysctl_tcp_mtu_probe_floor.
	tcp: Fix a data-race around sysctl_tcp_probe_threshold.
	tcp: Fix a data-race around sysctl_tcp_probe_interval.
	net: stmmac: fix unbalanced ptp clock issue in suspend/resume flow
	i2c: cadence: Change large transfer count reset logic to be unconditional
	net: stmmac: fix dma queue left shift overflow issue
	net/tls: Fix race in TLS device down flow
	igmp: Fix data-races around sysctl_igmp_llm_reports.
	igmp: Fix a data-race around sysctl_igmp_max_memberships.
	igmp: Fix data-races around sysctl_igmp_max_msf.
	tcp: Fix data-races around keepalive sysctl knobs.
	tcp: Fix data-races around sysctl_tcp_syncookies.
	tcp: Fix data-races around sysctl_tcp_reordering.
	tcp: Fix data-races around some timeout sysctl knobs.
	tcp: Fix a data-race around sysctl_tcp_notsent_lowat.
	tcp: Fix a data-race around sysctl_tcp_tw_reuse.
	tcp: Fix data-races around sysctl_max_syn_backlog.
	tcp: Fix data-races around sysctl_tcp_fastopen.
	tcp: Fix data-races around sysctl_tcp_fastopen_blackhole_timeout.
	iavf: Fix handling of dummy receive descriptors
	i40e: Fix erroneous adapter reinitialization during recovery process
	ixgbe: Add locking to prevent panic when setting sriov_numvfs to zero
	gpio: pca953x: only use single read/write for No AI mode
	gpio: pca953x: use the correct range when do regmap sync
	gpio: pca953x: use the correct register address when regcache sync during init
	be2net: Fix buffer overflow in be_get_module_eeprom
	drm/imx/dcss: Add missing of_node_put() in fail path
	ipv4: Fix a data-race around sysctl_fib_multipath_use_neigh.
	ip: Fix data-races around sysctl_ip_prot_sock.
	udp: Fix a data-race around sysctl_udp_l3mdev_accept.
	tcp: Fix data-races around sysctl knobs related to SYN option.
	tcp: Fix a data-race around sysctl_tcp_early_retrans.
	tcp: Fix data-races around sysctl_tcp_recovery.
	tcp: Fix a data-race around sysctl_tcp_thin_linear_timeouts.
	tcp: Fix data-races around sysctl_tcp_slow_start_after_idle.
	tcp: Fix a data-race around sysctl_tcp_retrans_collapse.
	tcp: Fix a data-race around sysctl_tcp_stdurg.
	tcp: Fix a data-race around sysctl_tcp_rfc1337.
	tcp: Fix data-races around sysctl_tcp_max_reordering.
	spi: bcm2835: bcm2835_spi_handle_err(): fix NULL pointer deref for non DMA transfers
	KVM: Don't null dereference ops->destroy
	mm/mempolicy: fix uninit-value in mpol_rebind_policy()
	bpf: Make sure mac_header was set before using it
	sched/deadline: Fix BUG_ON condition for deboosted tasks
	x86/bugs: Warn when "ibrs" mitigation is selected on Enhanced IBRS parts
	dlm: fix pending remove if msg allocation fails
	drm/imx/dcss: fix unused but set variable warnings
	bitfield.h: Fix "type of reg too small for mask" test
	ALSA: memalloc: Align buffer allocations in page size
	Bluetooth: Add bt_skb_sendmsg helper
	Bluetooth: Add bt_skb_sendmmsg helper
	Bluetooth: SCO: Replace use of memcpy_from_msg with bt_skb_sendmsg
	Bluetooth: RFCOMM: Replace use of memcpy_from_msg with bt_skb_sendmmsg
	Bluetooth: Fix passing NULL to PTR_ERR
	Bluetooth: SCO: Fix sco_send_frame returning skb->len
	Bluetooth: Fix bt_skb_sendmmsg not allocating partial chunks
	x86/amd: Use IBPB for firmware calls
	x86/alternative: Report missing return thunk details
	watchqueue: make sure to serialize 'wqueue->defunct' properly
	tty: drivers/tty/, stop using tty_schedule_flip()
	tty: the rest, stop using tty_schedule_flip()
	tty: drop tty_schedule_flip()
	tty: extract tty_flip_buffer_commit() from tty_flip_buffer_push()
	tty: use new tty_insert_flip_string_and_push_buffer() in pty_write()
	net: usb: ax88179_178a needs FLAG_SEND_ZLP
	watch-queue: remove spurious double semicolon
	Linux 5.10.134

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I55defdcdd6658e3ec9a3684b7e8cdfe114772a19
2022-08-03 12:42:13 +02:00
Kuniyuki Iwashima
9add240f76 ip: Fix data-races around sysctl_ip_prot_sock.
[ Upstream commit 9b55c20f83369dd54541d9ddbe3a018a8377f451 ]

sysctl_ip_prot_sock is accessed concurrently, and there is always a chance
of data-race.  So, all readers and writers need some basic protection to
avoid load/store-tearing.

Fixes: 4548b683b7 ("Introduce a sysctl that modifies the value of PROT_SOCK.")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-07-29 17:19:20 +02:00
Greg Kroah-Hartman
32b16a3a3f Merge 5.10.32 into android12-5.10
Changes in 5.10.32
	net/sctp: fix race condition in sctp_destroy_sock
	mtd: rawnand: mtk: Fix WAITRDY break condition and timeout
	Input: nspire-keypad - enable interrupts only when opened
	gpio: sysfs: Obey valid_mask
	dmaengine: idxd: Fix clobbering of SWERR overflow bit on writeback
	dmaengine: idxd: fix delta_rec and crc size field for completion record
	dmaengine: idxd: fix opcap sysfs attribute output
	dmaengine: idxd: fix wq size store permission state
	dmaengine: dw: Make it dependent to HAS_IOMEM
	dmaengine: Fix a double free in dma_async_device_register
	dmaengine: plx_dma: add a missing put_device() on error path
	dmaengine: idxd: fix wq cleanup of WQCFG registers
	ACPI: x86: Call acpi_boot_table_init() after acpi_table_upgrade()
	ARM: dts: Drop duplicate sha2md5_fck to fix clk_disable race
	ARM: dts: Fix moving mmc devices with aliases for omap4 & 5
	lockdep: Add a missing initialization hint to the "INFO: Trying to register non-static key" message
	arc: kernel: Return -EFAULT if copy_to_user() fails
	iwlwifi: Fix softirq/hardirq disabling in iwl_pcie_enqueue_hcmd()
	xfrm: BEET mode doesn't support fragments for inner packets
	ASoC: max98373: Changed amp shutdown register as volatile
	ASoC: max98373: Added 30ms turn on/off time delay
	gpu/xen: Fix a use after free in xen_drm_drv_init
	neighbour: Disregard DEAD dst in neigh_update
	ARM: keystone: fix integer overflow warning
	ARM: omap1: fix building with clang IAS
	drm/msm: Fix a5xx/a6xx timestamps
	ASoC: fsl_esai: Fix TDM slot setup for I2S mode
	scsi: scsi_transport_srp: Don't block target in SRP_PORT_LOST state
	iwlwifi: add support for Qu with AX201 device
	net: ieee802154: stop dump llsec keys for monitors
	net: ieee802154: forbid monitor for add llsec key
	net: ieee802154: forbid monitor for del llsec key
	net: ieee802154: stop dump llsec devs for monitors
	net: ieee802154: forbid monitor for add llsec dev
	net: ieee802154: forbid monitor for del llsec dev
	net: ieee802154: stop dump llsec devkeys for monitors
	net: ieee802154: forbid monitor for add llsec devkey
	net: ieee802154: forbid monitor for del llsec devkey
	net: ieee802154: stop dump llsec seclevels for monitors
	net: ieee802154: forbid monitor for add llsec seclevel
	pcnet32: Use pci_resource_len to validate PCI resource
	mac80211: clear sta->fast_rx when STA removed from 4-addr VLAN
	virt_wifi: Return micros for BSS TSF values
	lib: fix kconfig dependency on ARCH_WANT_FRAME_POINTERS
	Input: s6sy761 - fix coordinate read bit shift
	Input: i8042 - fix Pegatron C15B ID entry
	HID: wacom: set EV_KEY and EV_ABS only for non-HID_GENERIC type of devices
	dm verity fec: fix misaligned RS roots IO
	readdir: make sure to verify directory entry for legacy interfaces too
	arm64: fix inline asm in load_unaligned_zeropad()
	arm64: alternatives: Move length validation in alternative_{insn, endif}
	vfio/pci: Add missing range check in vfio_pci_mmap
	riscv: Fix spelling mistake "SPARSEMEM" to "SPARSMEM"
	scsi: libsas: Reset num_scatter if libata marks qc as NODATA
	ixgbe: fix unbalanced device enable/disable in suspend/resume
	netfilter: flowtable: fix NAT IPv6 offload mangling
	netfilter: conntrack: do not print icmpv6 as unknown via /proc
	ice: Fix potential infinite loop when using u8 loop counter
	libnvdimm/region: Fix nvdimm_has_flush() to handle ND_REGION_ASYNC
	netfilter: bridge: add pre_exit hooks for ebtable unregistration
	netfilter: arp_tables: add pre_exit hook for table unregister
	libbpf: Fix potential NULL pointer dereference
	net: macb: fix the restore of cmp registers
	net/mlx5e: fix ingress_ifindex check in mlx5e_flower_parse_meta
	netfilter: nft_limit: avoid possible divide error in nft_limit_init
	net/mlx5e: Fix setting of RS FEC mode
	net: davicom: Fix regulator not turned off on failed probe
	net: sit: Unregister catch-all devices
	net: ip6_tunnel: Unregister catch-all devices
	mm: ptdump: fix build failure
	net: Make tcp_allowed_congestion_control readonly in non-init netns
	i40e: fix the panic when running bpf in xdpdrv mode
	ethtool: pause: make sure we init driver stats
	ia64: remove duplicate entries in generic_defconfig
	ia64: tools: remove inclusion of ia64-specific version of errno.h header
	ibmvnic: avoid calling napi_disable() twice
	ibmvnic: remove duplicate napi_schedule call in do_reset function
	ibmvnic: remove duplicate napi_schedule call in open function
	ch_ktls: Fix kernel panic
	ch_ktls: fix device connection close
	ch_ktls: tcb close causes tls connection failure
	ch_ktls: do not send snd_una update to TCB in middle
	gro: ensure frag0 meets IP header alignment
	ARM: OMAP2+: Fix warning for omap_init_time_of()
	ARM: 9069/1: NOMMU: Fix conversion for_each_membock() to for_each_mem_range()
	ARM: footbridge: fix PCI interrupt mapping
	ARM: OMAP2+: Fix uninitialized sr_inst
	arm64: dts: allwinner: Fix SD card CD GPIO for SOPine systems
	arm64: dts: allwinner: h6: beelink-gs1: Remove ext. 32 kHz osc reference
	bpf: Use correct permission flag for mixed signed bounds arithmetic
	KVM: VMX: Convert vcpu_vmx.exit_reason to a union
	KVM: VMX: Don't use vcpu->run->internal.ndata as an array index
	r8169: tweak max read request size for newer chips also in jumbo mtu mode
	r8169: don't advertise pause in jumbo mode
	bpf: Ensure off_reg has no mixed signed bounds for all types
	bpf: Move off_reg into sanitize_ptr_alu
	ARM: 9071/1: uprobes: Don't hook on thumb instructions
	arm64: mte: Ensure TIF_MTE_ASYNC_FAULT is set atomically
	bpf: Rework ptr_limit into alu_limit and add common error path
	bpf: Improve verifier error messages for users
	bpf: Move sanitize_val_alu out of op switch
	net: phy: marvell: fix detection of PHY on Topaz switches
	Linux 5.10.32

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: If247bf8401509195e4f55f03dcc514f80d467966
2021-04-22 11:12:08 +02:00
Jonathon Reinhart
35d7491e2f net: Make tcp_allowed_congestion_control readonly in non-init netns
commit 97684f0970f6e112926de631fdd98d9693c7e5c1 upstream.

Currently, tcp_allowed_congestion_control is global and writable;
writing to it in any net namespace will leak into all other net
namespaces.

tcp_available_congestion_control and tcp_allowed_congestion_control are
the only sysctls in ipv4_net_table (the per-netns sysctl table) with a
NULL data pointer; their handlers (proc_tcp_available_congestion_control
and proc_allowed_congestion_control) have no other way of referencing a
struct net. Thus, they operate globally.

Because ipv4_net_table does not use designated initializers, there is no
easy way to fix up this one "bad" table entry. However, the data pointer
updating logic shouldn't be applied to NULL pointers anyway, so we
instead force these entries to be read-only.

These sysctls used to exist in ipv4_table (init-net only), but they were
moved to the per-net ipv4_net_table, presumably without realizing that
tcp_allowed_congestion_control was writable and thus introduced a leak.

Because the intent of that commit was only to know (i.e. read) "which
congestion algorithms are available or allowed", this read-only solution
should be sufficient.

The logic added in recent commit
31c4d2f160eb: ("net: Ensure net namespace isolation of sysctls")
does not and cannot check for NULL data pointers, because
other table entries (e.g. /proc/sys/net/netfilter/nf_log/) have
.data=NULL but use other methods (.extra2) to access the struct net.

Fixes: 9cb8e048e5 ("net/ipv4/sysctl: show tcp_{allowed, available}_congestion_control in non-initial netns")
Signed-off-by: Jonathon Reinhart <jonathon.reinhart@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-04-21 13:00:57 +02:00
Maciej Żenczykowski
8a4b8ea595 ANDROID: net: introduce ip_local_unbindable_ports sysctl
and associated inet_is_local_unbindable_port() helper function:
use it to make explicitly binding to an unbindable port return
-EPERM 'Operation not permitted'.

Autobind doesn't honour this new sysctl since:
  (a) you can simply set both if that's the behaviour you desire
  (b) there could be a use for preventing explicit while allowing auto
  (c) it's faster in the relatively critical path of doing port selection
      during connect() to only check one bitmap instead of both

Various ports may have special use cases which are not suitable for
use by general userspace applications. Currently, ports specified in
ip_local_reserved_ports sysctl will not be returned only in case of
automatic port assignment, but nothing prevents you from explicitly
binding to them - even from an entirely unprivileged process.

In certain cases it is desirable to prevent the host from assigning the
ports even in case of explicit binds, even from superuser processes.

Example use cases might be:
 - a port being stolen by the nic for remote serial console, remote
   power management or some other sort of debugging functionality
   (crash collection, gdb, direct access to some other microcontroller
   on the nic or motherboard, remote management of the nic itself).
 - a transparent proxy where packets are being redirected: in case
   a socket matches this connection, packets from this application
   would be incorrectly sent to one of the endpoints.

Initially I wanted to solve this problem via the simple one line:

static inline bool inet_port_requires_bind_service(struct net *net, unsigned short port) {
-       return port < net->ipv4.sysctl_ip_prot_sock;
+       return port < net->ipv4.sysctl_ip_prot_sock || inet_is_local_reserved_port(net, port);
}

However, this doesn't work for two reasons:
  (a) it changes userspace visible behaviour of the existing local
      reserved ports sysctl, and there appears to be enough documentation
      on the internet talking about setting it to make this a bad idea
  (b) it doesn't prevent privileged apps from using these ports,
      CAP_BIND_SERVICE is relatively likely to be available to, for example,
      a recursive DNS server so it can listed on port 53, which also needs
      to do src port randomization for outgoing queries due to security
      reasons (and it thus does manual port binding).

If we *know* that certain ports are simply unusable, then it's better
nothing even gets the opportunity to try to use them.  This way we at
least get a quick failure, instead of some sort of timeout (or possibly
even corruption of the data stream of the non-kernel based use case).

Test:
  vm:~# cat /proc/sys/net/ipv4/ip_local_unbindable_ports

  vm:~# python -c 'import socket; s = socket.socket(socket.AF_INET6, socket.SOCK_STREAM, 0); s.bind(("::", 3967))'
  vm:~# python -c 'import socket; s = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM, 0); s.bind(("::", 3967))'
  vm:~# echo 3967 > /proc/sys/net/ipv4/ip_local_unbindable_ports
  vm:~# cat /proc/sys/net/ipv4/ip_local_unbindable_ports
  3967
  vm:~# python -c 'import socket; s = socket.socket(socket.AF_INET6, socket.SOCK_STREAM, 0); s.bind(("::", 3967))'
  socket.error: (1, 'Operation not permitted')
  vm:~# python -c 'import socket; s = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM, 0); s.bind(("::", 3967))'
  socket.error: (1, 'Operation not permitted')

Cc: Sean Tranchetti <stranche@codeaurora.org>
Cc: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Linux SCTP <linux-sctp@vger.kernel.org>
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Bug: 140404597
Change-Id: Ie96207bea90ae1345adf7b45724d0caf4d6e52c2
Signed-off-by: Todd Kjos <tkjos@google.com>
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
2021-02-09 15:49:37 +00:00
Wei Wang
ac8f1710c1 tcp: reflect tos value received in SYN to the socket
This commit adds a new TCP feature to reflect the tos value received in
SYN, and send it out on the SYN-ACK, and eventually set the tos value of
the established socket with this reflected tos value. This provides a
way to set the traffic class/QoS level for all traffic in the same
connection to be the same as the incoming SYN request. It could be
useful in data centers to provide equivalent QoS according to the
incoming request.
This feature is guarded by /proc/sys/net/ipv4/tcp_reflect_tos, and is by
default turned off.

Signed-off-by: Wei Wang <weiwan@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-10 13:15:40 -07:00
Jason Baron
f19008e676 tcp: correct read of TFO keys on big endian systems
When TFO keys are read back on big endian systems either via the global
sysctl interface or via getsockopt() using TCP_FASTOPEN_KEY, the values
don't match what was written.

For example, on s390x:

# echo "1-2-3-4" > /proc/sys/net/ipv4/tcp_fastopen_key
# cat /proc/sys/net/ipv4/tcp_fastopen_key
02000000-01000000-04000000-03000000

Instead of:

# cat /proc/sys/net/ipv4/tcp_fastopen_key
00000001-00000002-00000003-00000004

Fix this by converting to the correct endianness on read. This was
reported by Colin Ian King when running the 'tcp_fastopen_backup_key' net
selftest on s390x, which depends on the read value matching what was
written. I've confirmed that the test now passes on big and little endian
systems.

Signed-off-by: Jason Baron <jbaron@akamai.com>
Fixes: 438ac88009 ("net: fastopen: robustness and endianness fixes for SipHash")
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Eric Dumazet <edumazet@google.com>
Reported-and-tested-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-10 12:12:35 -07:00
David S. Miller
115506fea4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Alexei Starovoitov says:

====================
pull-request: bpf-next 2020-05-01 (v2)

The following pull-request contains BPF updates for your *net-next* tree.

We've added 61 non-merge commits during the last 6 day(s) which contain
a total of 153 files changed, 6739 insertions(+), 3367 deletions(-).

The main changes are:

1) pulled work.sysctl from vfs tree with sysctl bpf changes.

2) bpf_link observability, from Andrii.

3) BTF-defined map in map, from Andrii.

4) asan fixes for selftests, from Andrii.

5) Allow bpf_map_lookup_elem for SOCKMAP and SOCKHASH, from Jakub.

6) production cloudflare classifier as a selftes, from Lorenz.

7) bpf_ktime_get_*_ns() helper improvements, from Maciej.

8) unprivileged bpftool feature probe, from Quentin.

9) BPF_ENABLE_STATS command, from Song.

10) enable bpf_[gs]etsockopt() helpers for sock_ops progs, from Stanislav.

11) enable a bunch of common helpers for cg-device, sysctl, sockopt progs,
 from Stanislav.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-01 17:02:27 -07:00
Eric Dumazet
a70437cc09 tcp: add hrtimer slack to sack compression
Add a sysctl to control hrtimer slack, default of 100 usec.

This gives the opportunity to reduce system overhead,
and help very short RTT flows.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-30 13:24:01 -07:00
Roopa Prabhu
4f80116d3d net: ipv4: add sysctl for nexthop api compatibility mode
Current route nexthop API maintains user space compatibility
with old route API by default. Dumps and netlink notifications
support both new and old API format. In systems which have
moved to the new API, this compatibility mode cancels some
of the performance benefits provided by the new nexthop API.

This patch adds new sysctl nexthop_compat_mode which is on
by default but provides the ability to turn off compatibility
mode allowing systems to run entirely with the new routing
API. Old route API behaviour and support is not modified by this
sysctl.

Uses a single sysctl to cover both ipv4 and ipv6 following
other sysctls. Covers dumps and delete notifications as
suggested by David Ahern.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-28 12:50:37 -07:00
Christoph Hellwig
32927393dc sysctl: pass kernel pointers to ->proc_handler
Instead of having all the sysctl handlers deal with user pointers, which
is rather hairy in terms of the BPF interaction, copy the input to and
from  userspace in common code.  This also means that the strings are
always NUL-terminated by the common code, making the API a little bit
safer.

As most handler just pass through the data to one of the common handlers
a lot of the changes are mechnical.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-04-27 02:07:40 -04:00
Kuniyuki Iwashima
4b01a96742 tcp: bind(0) remove the SO_REUSEADDR restriction when ephemeral ports are exhausted.
Commit aacd9289af ("tcp: bind() use stronger
condition for bind_conflict") introduced a restriction to forbid to bind
SO_REUSEADDR enabled sockets to the same (addr, port) tuple in order to
assign ports dispersedly so that we can connect to the same remote host.

The change results in accelerating port depletion so that we fail to bind
sockets to the same local port even if we want to connect to the different
remote hosts.

You can reproduce this issue by following instructions below.

  1. # sysctl -w net.ipv4.ip_local_port_range="32768 32768"
  2. set SO_REUSEADDR to two sockets.
  3. bind two sockets to (localhost, 0) and the latter fails.

Therefore, when ephemeral ports are exhausted, bind(0) should fallback to
the legacy behaviour to enable the SO_REUSEADDR option and make it possible
to connect to different remote (addr, port) tuples.

This patch allows us to bind SO_REUSEADDR enabled sockets to the same
(addr, port) only when net.ipv4.ip_autobind_reuse is set 1 and all
ephemeral ports are exhausted. This also allows connect() and listen() to
share ports in the following way and may break some applications. So the
ip_autobind_reuse is 0 by default and disables the feature.

  1. setsockopt(sk1, SO_REUSEADDR)
  2. setsockopt(sk2, SO_REUSEADDR)
  3. bind(sk1, saddr, 0)
  4. bind(sk2, saddr, 0)
  5. connect(sk1, daddr)
  6. listen(sk2)

If it is set 1, we can fully utilize the 4-tuples, but we should use
IP_BIND_ADDRESS_NO_PORT for bind()+connect() as possible.

The notable thing is that if all sockets bound to the same port have
both SO_REUSEADDR and SO_REUSEPORT enabled, we can bind sockets to an
ephemeral port and also do listen().

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-12 12:08:09 -07:00
Christian Brauner
9cb8e048e5 net/ipv4/sysctl: show tcp_{allowed, available}_congestion_control in non-initial netns
It is currenty possible to switch the TCP congestion control algorithm
in non-initial network namespaces:

unshare -U --map-root --net --fork --pid --mount-proc
echo "reno" > /proc/sys/net/ipv4/tcp_congestion_control

works just fine. But currently non-initial network namespaces have no
way of kowing which congestion algorithms are available or allowed other
than through trial and error by writing the names of the algorithms into
the aforementioned file.
Since we already allow changing the congestion algorithm in non-initial
network namespaces by exposing the tcp_congestion_control file there is
no reason to not also expose the
tcp_{allowed,available}_congestion_control files to non-initial network
namespaces. After this change a container with a separate network
namespace will show:

root@f1:~# ls -al /proc/sys/net/ipv4/tcp_* | grep congestion
-rw-r--r-- 1 root root 0 Feb 19 11:54 /proc/sys/net/ipv4/tcp_allowed_congestion_control
-r--r--r-- 1 root root 0 Feb 19 11:54 /proc/sys/net/ipv4/tcp_available_congestion_control
-rw-r--r-- 1 root root 0 Feb 19 11:54 /proc/sys/net/ipv4/tcp_congestion_control

Link: https://github.com/lxc/lxc/issues/3267
Reported-by: Haw Loeung <haw.loeung@canonical.com>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-19 11:04:31 -08:00
Kevin(Yudong) Yang
65e6d90168 net-tcp: Disable TCP ssthresh metrics cache by default
This patch introduces a sysctl knob "net.ipv4.tcp_no_ssthresh_metrics_save"
that disables TCP ssthresh metrics cache by default. Other parts of TCP
metrics cache, e.g. rtt, cwnd, remain unchanged.

As modern networks becoming more and more dynamic, TCP metrics cache
today often causes more harm than benefits. For example, the same IP
address is often shared by different subscribers behind NAT in residential
networks. Even if the IP address is not shared by different users,
caching the slow-start threshold of a previous short flow using loss-based
congestion control (e.g. cubic) often causes the future longer flows of
the same network path to exit slow-start prematurely with abysmal
throughput.

Caching ssthresh is very risky and can lead to terrible performance.
Therefore it makes sense to make disabling ssthresh caching by
default and opt-in for specific networks by the administrators.
This practice also has worked well for several years of deployment with
CUBIC congestion control at Google.

Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Kevin(Yudong) Yang <yyd@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-09 20:17:48 -08:00
Jakub Kicinski
a9f852e92e Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Minor conflict in drivers/s390/net/qeth_l2_main.c, kept the lock
from commit c8183f5489 ("s390/qeth: fix potential deadlock on
workqueue flush"), removed the code which was removed by commit
9897d583b0 ("s390/qeth: consolidate some duplicated HW cmd code").

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-11-22 16:27:24 -08:00
Hangbin Liu
9bb59a21f5 tcp: warn if offset reach the maxlen limit when using snprintf
snprintf returns the number of chars that would be written, not number
of chars that were actually written. As such, 'offs' may get larger than
'tbl.maxlen', causing the 'tbl.maxlen - offs' being < 0, and since the
parameter is size_t, it would overflow.

Since using scnprintf may hide the limit error, while the buffer is still
enough now, let's just add a WARN_ON_ONCE in case it reach the limit
in future.

v2: Use WARN_ON_ONCE as Jiri and Eric suggested.

Suggested-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-20 22:23:36 -08:00
Marcelo Ricardo Leitner
ca749bbb10 net/ipv4: fix sysctl max for fib_multipath_hash_policy
Commit eec4844fae ("proc/sysctl: add shared variables for range
check") did:
-               .extra2         = &two,
+               .extra2         = SYSCTL_ONE,
here, which doesn't seem to be intentional, given the changelog.
This patch restores it to the previous, as the value of 2 still makes
sense (used in fib_multipath_hash()).

Fixes: eec4844fae ("proc/sysctl: add shared variables for range check")
Cc: Matteo Croce <mcroce@redhat.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-18 17:25:36 -08:00
Josh Hunt
c04b79b6cf tcp: add new tcp_mtu_probe_floor sysctl
The current implementation of TCP MTU probing can considerably
underestimate the MTU on lossy connections allowing the MSS to get down to
48. We have found that in almost all of these cases on our networks these
paths can handle much larger MTUs meaning the connections are being
artificially limited. Even though TCP MTU probing can raise the MSS back up
we have seen this not to be the case causing connections to be "stuck" with
an MSS of 48 when heavy loss is present.

Prior to pushing out this change we could not keep TCP MTU probing enabled
b/c of the above reasons. Now with a reasonble floor set we've had it
enabled for the past 6 months.

The new sysctl will still default to TCP_MIN_SND_MSS (48), but gives
administrators the ability to control the floor of MSS probing.

Signed-off-by: Josh Hunt <johunt@akamai.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-09 13:03:30 -07:00
Matteo Croce
eec4844fae proc/sysctl: add shared variables for range check
In the sysctl code the proc_dointvec_minmax() function is often used to
validate the user supplied value between an allowed range.  This
function uses the extra1 and extra2 members from struct ctl_table as
minimum and maximum allowed value.

On sysctl handler declaration, in every source file there are some
readonly variables containing just an integer which address is assigned
to the extra1 and extra2 members, so the sysctl range is enforced.

The special values 0, 1 and INT_MAX are very often used as range
boundary, leading duplication of variables like zero=0, one=1,
int_max=INT_MAX in different source files:

    $ git grep -E '\.extra[12].*&(zero|one|int_max)' |wc -l
    248

Add a const int array containing the most commonly used values, some
macros to refer more easily to the correct array member, and use them
instead of creating a local one for every object file.

This is the bloat-o-meter output comparing the old and new binary
compiled with the default Fedora config:

    # scripts/bloat-o-meter -d vmlinux.o.old vmlinux.o
    add/remove: 2/2 grow/shrink: 0/2 up/down: 24/-188 (-164)
    Data                                         old     new   delta
    sysctl_vals                                    -      12     +12
    __kstrtab_sysctl_vals                          -      12     +12
    max                                           14      10      -4
    int_max                                       16       -     -16
    one                                           68       -     -68
    zero                                         128      28    -100
    Total: Before=20583249, After=20583085, chg -0.00%

[mcroce@redhat.com: tipc: remove two unused variables]
  Link: http://lkml.kernel.org/r/20190530091952.4108-1-mcroce@redhat.com
[akpm@linux-foundation.org: fix net/ipv6/sysctl_net_ipv6.c]
[arnd@arndb.de: proc/sysctl: make firmware loader table conditional]
  Link: http://lkml.kernel.org/r/20190617130014.1713870-1-arnd@arndb.de
[akpm@linux-foundation.org: fix fs/eventpoll.c]
Link: http://lkml.kernel.org/r/20190430180111.10688-1-mcroce@redhat.com
Signed-off-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Aaron Tomlin <atomlin@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-18 17:08:07 -07:00
Ard Biesheuvel
438ac88009 net: fastopen: robustness and endianness fixes for SipHash
Some changes to the TCP fastopen code to make it more robust
against future changes in the choice of key/cookie size, etc.

- Instead of keeping the SipHash key in an untyped u8[] buffer
  and casting it to the right type upon use, use the correct
  type directly. This ensures that the key will appear at the
  correct alignment if we ever change the way these data
  structures are allocated. (Currently, they are only allocated
  via kmalloc so they always appear at the correct alignment)

- Use DIV_ROUND_UP when sizing the u64[] array to hold the
  cookie, so it is always of sufficient size, even if
  TCP_FASTOPEN_COOKIE_MAX is no longer a multiple of 8.

- Drop the 'len' parameter from the tcp_fastopen_reset_cipher()
  function, which is no longer used.

- Add endian swabbing when setting the keys and calculating the hash,
  to ensure that cookie values are the same for a given key and
  source/destination address pair regardless of the endianness of
  the server.

Note that none of these are functional changes wrt the current
state of the code, with the exception of the swabbing, which only
affects big endian systems.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-22 16:30:37 -07:00
David S. Miller
13091aa305 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Honestly all the conflicts were simple overlapping changes,
nothing really interesting to report.

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-17 20:20:36 -07:00
David S. Miller
4fddbf8a99 Merge branch 'tcp-fixes'
Eric Dumazet says:

====================
tcp: make sack processing more robust

Jonathan Looney brought to our attention multiple problems
in TCP stack at the sender side.

SACK processing can be abused by malicious peers to either
cause overflows, or increase of memory usage.

First two patches fix the immediate problems.

Since the malicious peers abuse senders by advertizing a very
small MSS in their SYN or SYNACK packet, the last two
patches add a new sysctl so that admins can chose a higher
limit for MSS clamping.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-17 10:39:56 -07:00
Eric Dumazet
2e05fcae83 tcp: fix compile error if !CONFIG_SYSCTL
tcp_tx_skb_cache_key and tcp_rx_skb_cache_key must be available
even if CONFIG_SYSCTL is not set.

Fixes: 0b7d7f6b22 ("tcp: add tcp_tx_skb_cache sysctl")
Fixes: ede61ca474 ("tcp: add tcp_rx_skb_cache sysctl")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-16 14:15:07 -07:00
Eric Dumazet
5f3e2bf008 tcp: add tcp_min_snd_mss sysctl
Some TCP peers announce a very small MSS option in their SYN and/or
SYN/ACK messages.

This forces the stack to send packets with a very high network/cpu
overhead.

Linux has enforced a minimal value of 48. Since this value includes
the size of TCP options, and that the options can consume up to 40
bytes, this means that each segment can include only 8 bytes of payload.

In some cases, it can be useful to increase the minimal value
to a saner value.

We still let the default to 48 (TCP_MIN_SND_MSS), for compatibility
reasons.

Note that TCP_MAXSEG socket option enforces a minimal value
of (TCP_MIN_MSS). David Miller increased this minimal value
in commit c39508d6f1 ("tcp: Make TCP_MAXSEG minimum more correct.")
from 64 to 88.

We might in the future merge TCP_MIN_SND_MSS and TCP_MIN_MSS.

CVE-2019-11479 -- tcp mss hardcoded to 48

Signed-off-by: Eric Dumazet <edumazet@google.com>
Suggested-by: Jonathan Looney <jtl@netflix.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Tyler Hicks <tyhicks@canonical.com>
Cc: Bruce Curtis <brucec@netflix.com>
Cc: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-15 18:47:31 -07:00
Eric Dumazet
0b7d7f6b22 tcp: add tcp_tx_skb_cache sysctl
Feng Tang reported a performance regression after introduction
of per TCP socket tx/rx caches, for TCP over loopback (netperf)

There is high chance the regression is caused by a change on
how well the 32 KB per-thread page (current->task_frag) can
be recycled, and lack of pcp caches for order-3 pages.

I could not reproduce the regression myself, cpus all being
spinning on the mm spinlocks for page allocs/freeing, regardless
of enabling or disabling the per tcp socket caches.

It seems best to disable the feature by default, and let
admins enabling it.

MM layer either needs to provide scalable order-3 pages
allocations, or could attempt a trylock on zone->lock if
the caller only attempts to get a high-order page and is
able to fallback to order-0 ones in case of pressure.

Tests run on a 56 cores host (112 hyper threads)

-	35.49%	netperf 		 [kernel.vmlinux]	  [k] queued_spin_lock_slowpath
   - 35.49% queued_spin_lock_slowpath
	  - 18.18% get_page_from_freelist
		 - __alloc_pages_nodemask
			- 18.18% alloc_pages_current
				 skb_page_frag_refill
				 sk_page_frag_refill
				 tcp_sendmsg_locked
				 tcp_sendmsg
				 inet_sendmsg
				 sock_sendmsg
				 __sys_sendto
				 __x64_sys_sendto
				 do_syscall_64
				 entry_SYSCALL_64_after_hwframe
				 __libc_send
	  + 17.31% __free_pages_ok
+	31.43%	swapper 		 [kernel.vmlinux]	  [k] intel_idle
+	 9.12%	netperf 		 [kernel.vmlinux]	  [k] copy_user_enhanced_fast_string
+	 6.53%	netserver		 [kernel.vmlinux]	  [k] copy_user_enhanced_fast_string
+	 0.69%	netserver		 [kernel.vmlinux]	  [k] queued_spin_lock_slowpath
+	 0.68%	netperf 		 [kernel.vmlinux]	  [k] skb_release_data
+	 0.52%	netperf 		 [kernel.vmlinux]	  [k] tcp_sendmsg_locked
	 0.46%	netperf 		 [kernel.vmlinux]	  [k] _raw_spin_lock_irqsave

Fixes: 472c2e07ee ("tcp: add one skb cache for tx")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-14 20:18:28 -07:00
Eric Dumazet
ede61ca474 tcp: add tcp_rx_skb_cache sysctl
Instead of relying on rps_needed, it is safer to use a separate
static key, since we do not want to enable TCP rx_skb_cache
by default. This feature can cause huge increase of memory
usage on hosts with millions of sockets.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-14 20:18:28 -07:00
Stephen Suryaputra
363887a2cd ipv4: Support multipath hashing on inner IP pkts for GRE tunnel
Multipath hash policy value of 0 isn't distributing since the outer IP
dest and src aren't varied eventhough the inner ones are. Since the flow
is on the inner ones in the case of tunneled traffic, hashing on them is
desired.

This is done mainly for IP over GRE, hence only tested for that. But
anything else supported by flow dissection should work.

v2: Use skb_flow_dissect_flow_keys() directly so that other tunneling
    can be supported through flow dissection (per Nikolay Aleksandrov).
v3: Remove accidental inclusion of ports in the hash keys and clarify
    the documentation (Nikolay Alexandrov).
Signed-off-by: Stephen Suryaputra <ssuryaextr@gmail.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-14 19:42:35 -07:00
Jason Baron
aa1236cdfa tcp: add support for optional TFO backup key to net.ipv4.tcp_fastopen_key
Add the ability to add a backup TFO key as:

# echo "x-x-x-x,x-x-x-x" > /proc/sys/net/ipv4/tcp_fastopen_key

The key before the comma acks as the primary TFO key and the key after the
comma is the backup TFO key. This change is intended to be backwards
compatible since if only one key is set, userspace will simply read back
that single key as follows:

# echo "x-x-x-x" > /proc/sys/net/ipv4/tcp_fastopen_key
# cat /proc/sys/net/ipv4/tcp_fastopen_key
x-x-x-x

Signed-off-by: Jason Baron <jbaron@akamai.com>
Signed-off-by: Christoph Paasch <cpaasch@apple.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-30 13:41:26 -07:00
Jason Baron
9092a76d3c tcp: add backup TFO key infrastructure
We would like to be able to rotate TFO keys while minimizing the number of
client cookies that are rejected. Currently, we have only one key which can
be used to generate and validate cookies, thus if we simply replace this
key clients can easily have cookies rejected upon rotation.

We propose having the ability to have both a primary key and a backup key.
The primary key is used to generate as well as to validate cookies.
The backup is only used to validate cookies. Thus, keys can be rotated as:

1) generate new key
2) add new key as the backup key
3) swap the primary and backup key, thus setting the new key as the primary

We don't simply set the new key as the primary key and move the old key to
the backup slot because the ip may be behind a load balancer and we further
allow for the fact that all machines behind the load balancer will not be
updated simultaneously.

We make use of this infrastructure in subsequent patches.

Suggested-by: Igor Lubashev <ilubashe@akamai.com>
Signed-off-by: Jason Baron <jbaron@akamai.com>
Signed-off-by: Christoph Paasch <cpaasch@apple.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-30 13:41:26 -07:00
David S. Miller
8b44836583 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Two easy cases of overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-25 23:52:29 -04:00
ZhangXiaoxu
19fad20d15 ipv4: set the tcp_min_rtt_wlen range from 0 to one day
There is a UBSAN report as below:
UBSAN: Undefined behaviour in net/ipv4/tcp_input.c:2877:56
signed integer overflow:
2147483647 * 1000 cannot be represented in type 'int'
CPU: 3 PID: 0 Comm: swapper/3 Not tainted 5.1.0-rc4-00058-g582549e #1
Call Trace:
 <IRQ>
 dump_stack+0x8c/0xba
 ubsan_epilogue+0x11/0x60
 handle_overflow+0x12d/0x170
 ? ttwu_do_wakeup+0x21/0x320
 __ubsan_handle_mul_overflow+0x12/0x20
 tcp_ack_update_rtt+0x76c/0x780
 tcp_clean_rtx_queue+0x499/0x14d0
 tcp_ack+0x69e/0x1240
 ? __wake_up_sync_key+0x2c/0x50
 ? update_group_capacity+0x50/0x680
 tcp_rcv_established+0x4e2/0xe10
 tcp_v4_do_rcv+0x22b/0x420
 tcp_v4_rcv+0xfe8/0x1190
 ip_protocol_deliver_rcu+0x36/0x180
 ip_local_deliver+0x15b/0x1a0
 ip_rcv+0xac/0xd0
 __netif_receive_skb_one_core+0x7f/0xb0
 __netif_receive_skb+0x33/0xc0
 netif_receive_skb_internal+0x84/0x1c0
 napi_gro_receive+0x2a0/0x300
 receive_buf+0x3d4/0x2350
 ? detach_buf_split+0x159/0x390
 virtnet_poll+0x198/0x840
 ? reweight_entity+0x243/0x4b0
 net_rx_action+0x25c/0x770
 __do_softirq+0x19b/0x66d
 irq_exit+0x1eb/0x230
 do_IRQ+0x7a/0x150
 common_interrupt+0xf/0xf
 </IRQ>

It can be reproduced by:
  echo 2147483647 > /proc/sys/net/ipv4/tcp_min_rtt_wlen

Fixes: f672258391 ("tcp: track min RTT using windowed min-filter")
Signed-off-by: ZhangXiaoxu <zhangxiaoxu5@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-17 13:57:11 -07:00
David Ahern
9ab948a91b ipv4: Allow amount of dirty memory from fib resizing to be controllable
fib_trie implementation calls synchronize_rcu when a certain amount of
pages are dirty from freed entries. The number of pages was determined
experimentally in 2009 (commit c3059477fc).

At the current setting, synchronize_rcu is called often -- 51 times in a
second in one test with an average of an 8 msec delay adding a fib entry.
The total impact is a lot of slow down modifying the fib. This is seen
in the output of 'time' - the difference between real time and sys+user.
For example, using 720,022 single path routes and 'ip -batch'[1]:

    $ time ./ip -batch ipv4/routes-1-hops
    real    0m14.214s
    user    0m2.513s
    sys     0m6.783s

So roughly 35% of the actual time to install the routes is from the ip
command getting scheduled out, most notably due to synchronize_rcu (this
is observed using 'perf sched timehist').

This patch makes the amount of dirty memory configurable between 64k where
the synchronize_rcu is called often (small, low end systems that are memory
sensitive) to 64M where synchronize_rcu is called rarely during a large
FIB change (for high end systems with lots of memory). The default is 512kB
which corresponds to the current setting of 128 pages with a 4kB page size.

As an example, at 16MB the worst interval shows 4 calls to synchronize_rcu
in a second blocking for up to 30 msec in a single instance, and a total
of almost 100 msec across the 4 calls in the second. The trade off is
allowing FIB entries to consume more memory in a given time window but
but with much better fib insertion rates (~30% increase in prefixes/sec).
With this patch and net.ipv4.fib_sync_mem set to 16MB, the same batch
file runs in:

    $ time ./ip -batch ipv4/routes-1-hops
    real    0m9.692s
    user    0m2.491s
    sys     0m6.769s

So the dead time is reduced to about 1/2 second or <5% of the real time.

[1] 'ip' modified to not request ACK messages which improves route
    insertion times by about 20%

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21 13:29:53 -07:00
Mike Manning
6897445fb1 net: provide a sysctl raw_l3mdev_accept for raw socket lookup with VRFs
Add a sysctl raw_l3mdev_accept to control raw socket lookup in a manner
similar to use of tcp_l3mdev_accept for stream and of udp_l3mdev_accept
for datagram sockets. Have this default to enabled for reasons of
backwards compatibility. This is so as to specify the output device
with cmsg and IP_PKTINFO, but using a socket not bound to the
corresponding VRF. This allows e.g. older ping implementations to be
run with specifying the device but without executing it in the VRF.
If the option is disabled, packets received in a VRF context are only
handled by a raw socket bound to the VRF, and correspondingly packets
in the default VRF are only handled by a socket not bound to any VRF.

Signed-off-by: Mike Manning <mmanning@vyatta.att-mail.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Tested-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07 16:12:38 -08:00
Maciej Żenczykowski
d4ce58082f net-tcp: /proc/sys/net/ipv4/tcp_probe_interval is a u32 not int
(fix documentation and sysctl access to treat it as such)

Tested:
  # zcat /proc/config.gz | egrep ^CONFIG_HZ
  CONFIG_HZ_1000=y
  CONFIG_HZ=1000
  # echo $[(1<<32)/1000 + 1] | tee /proc/sys/net/ipv4/tcp_probe_interval
  4294968
  tee: /proc/sys/net/ipv4/tcp_probe_interval: Invalid argument
  # echo $[(1<<32)/1000] | tee /proc/sys/net/ipv4/tcp_probe_interval
  4294967
  # echo 0 | tee /proc/sys/net/ipv4/tcp_probe_interval
  # echo -1 | tee /proc/sys/net/ipv4/tcp_probe_interval
  -1
  tee: /proc/sys/net/ipv4/tcp_probe_interval: Invalid argument

Signed-off-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-26 20:33:21 -07:00
Petr Machata
d18c5d1995 net: ipv4: Notify about changes to ip_forward_update_priority
Drivers may make offloading decision based on whether
ip_forward_update_priority is enabled or not. Therefore distribute
netevent notifications to give them a chance to react to a change.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-01 09:52:30 -07:00
Petr Machata
432e05d328 net: ipv4: Control SKB reprioritization after forwarding
After IPv4 packets are forwarded, the priority of the corresponding SKB
is updated according to the TOS field of IPv4 header. This overrides any
prioritization done earlier by e.g. an skbedit action or ingress-qos-map
defined at a vlan device.

Such overriding may not always be desirable. Even if the packet ends up
being routed, which implies this is an L3 network node, an administrator
may wish to preserve whatever prioritization was done earlier on in the
pipeline.

Therefore introduce a sysctl that controls this behavior. Keep the
default value at 1 to maintain backward-compatible behavior.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-01 09:52:30 -07:00