Commit Graph

218 Commits

Author SHA1 Message Date
Greg Kroah-Hartman
c553d9a246 Merge 5.10.80 into android12-5.10-lts
Changes in 5.10.80
	xhci: Fix USB 3.1 enumeration issues by increasing roothub power-on-good delay
	usb: xhci: Enable runtime-pm by default on AMD Yellow Carp platform
	binder: use euid from cred instead of using task
	binder: use cred instead of task for selinux checks
	binder: use cred instead of task for getsecid
	Input: iforce - fix control-message timeout
	Input: elantench - fix misreporting trackpoint coordinates
	Input: i8042 - Add quirk for Fujitsu Lifebook T725
	libata: fix read log timeout value
	ocfs2: fix data corruption on truncate
	scsi: core: Remove command size deduction from scsi_setup_scsi_cmnd()
	scsi: qla2xxx: Fix kernel crash when accessing port_speed sysfs file
	scsi: qla2xxx: Fix use after free in eh_abort path
	mmc: mtk-sd: Add wait dma stop done flow
	mmc: dw_mmc: Dont wait for DRTO on Write RSP error
	exfat: fix incorrect loading of i_blocks for large files
	parisc: Fix set_fixmap() on PA1.x CPUs
	parisc: Fix ptrace check on syscall return
	tpm: Check for integer overflow in tpm2_map_response_body()
	firmware/psci: fix application of sizeof to pointer
	crypto: s5p-sss - Add error handling in s5p_aes_probe()
	media: rkvdec: Do not override sizeimage for output format
	media: ite-cir: IR receiver stop working after receive overflow
	media: rkvdec: Support dynamic resolution changes
	media: ir-kbd-i2c: improve responsiveness of hauppauge zilog receivers
	media: v4l2-ioctl: Fix check_ext_ctrls
	ALSA: hda/realtek: Fix mic mute LED for the HP Spectre x360 14
	ALSA: hda/realtek: Add a quirk for HP OMEN 15 mute LED
	ALSA: hda/realtek: Add quirk for Clevo PC70HS
	ALSA: hda/realtek: Headset fixup for Clevo NH77HJQ
	ALSA: hda/realtek: Add a quirk for Acer Spin SP513-54N
	ALSA: hda/realtek: Add quirk for ASUS UX550VE
	ALSA: hda/realtek: Add quirk for HP EliteBook 840 G7 mute LED
	ALSA: ua101: fix division by zero at probe
	ALSA: 6fire: fix control and bulk message timeouts
	ALSA: line6: fix control and interrupt message timeouts
	ALSA: usb-audio: Line6 HX-Stomp XL USB_ID for 48k-fixed quirk
	ALSA: usb-audio: Add registration quirk for JBL Quantum 400
	ALSA: hda: Free card instance properly at probe errors
	ALSA: synth: missing check for possible NULL after the call to kstrdup
	ALSA: timer: Fix use-after-free problem
	ALSA: timer: Unconditionally unlink slave instances, too
	ext4: fix lazy initialization next schedule time computation in more granular unit
	ext4: ensure enough credits in ext4_ext_shift_path_extents
	ext4: refresh the ext4_ext_path struct after dropping i_data_sem.
	fuse: fix page stealing
	x86/sme: Use #define USE_EARLY_PGTABLE_L5 in mem_encrypt_identity.c
	x86/cpu: Fix migration safety with X86_BUG_NULL_SEL
	x86/irq: Ensure PI wakeup handler is unregistered before module unload
	ASoC: soc-core: fix null-ptr-deref in snd_soc_del_component_unlocked()
	ALSA: hda/realtek: Fixes HP Spectre x360 15-eb1xxx speakers
	cavium: Return negative value when pci_alloc_irq_vectors() fails
	scsi: qla2xxx: Return -ENOMEM if kzalloc() fails
	scsi: qla2xxx: Fix unmap of already freed sgl
	mISDN: Fix return values of the probe function
	cavium: Fix return values of the probe function
	sfc: Export fibre-specific supported link modes
	sfc: Don't use netif_info before net_device setup
	hyperv/vmbus: include linux/bitops.h
	ARM: dts: sun7i: A20-olinuxino-lime2: Fix ethernet phy-mode
	reset: socfpga: add empty driver allowing consumers to probe
	mmc: winbond: don't build on M68K
	drm: panel-orientation-quirks: Add quirk for Aya Neo 2021
	fcnal-test: kill hanging ping/nettest binaries on cleanup
	bpf: Define bpf_jit_alloc_exec_limit for arm64 JIT
	bpf: Prevent increasing bpf_jit_limit above max
	gpio: mlxbf2.c: Add check for bgpio_init failure
	xen/netfront: stop tx queues during live migration
	nvmet-tcp: fix a memory leak when releasing a queue
	spi: spl022: fix Microwire full duplex mode
	net: multicast: calculate csum of looped-back and forwarded packets
	watchdog: Fix OMAP watchdog early handling
	drm: panel-orientation-quirks: Add quirk for GPD Win3
	block: schedule queue restart after BLK_STS_ZONE_RESOURCE
	nvmet-tcp: fix header digest verification
	r8169: Add device 10ec:8162 to driver r8169
	vmxnet3: do not stop tx queues after netif_device_detach()
	nfp: bpf: relax prog rejection for mtu check through max_pkt_offset
	net/smc: Fix smc_link->llc_testlink_time overflow
	net/smc: Correct spelling mistake to TCPF_SYN_RECV
	rds: stop using dmapool
	btrfs: clear MISSING device status bit in btrfs_close_one_device
	btrfs: fix lost error handling when replaying directory deletes
	btrfs: call btrfs_check_rw_degradable only if there is a missing device
	KVM: VMX: Unregister posted interrupt wakeup handler on hardware unsetup
	ia64: kprobes: Fix to pass correct trampoline address to the handler
	selinux: fix race condition when computing ocontext SIDs
	hwmon: (pmbus/lm25066) Add offset coefficients
	regulator: s5m8767: do not use reset value as DVS voltage if GPIO DVS is disabled
	regulator: dt-bindings: samsung,s5m8767: correct s5m8767,pmic-buck-default-dvs-idx property
	EDAC/sb_edac: Fix top-of-high-memory value for Broadwell/Haswell
	mwifiex: fix division by zero in fw download path
	ath6kl: fix division by zero in send path
	ath6kl: fix control-message timeout
	ath10k: fix control-message timeout
	ath10k: fix division by zero in send path
	PCI: Mark Atheros QCA6174 to avoid bus reset
	rtl8187: fix control-message timeouts
	evm: mark evm_fixmode as __ro_after_init
	ifb: Depend on netfilter alternatively to tc
	wcn36xx: Fix HT40 capability for 2Ghz band
	wcn36xx: Fix tx_status mechanism
	wcn36xx: Fix (QoS) null data frame bitrate/modulation
	PM: sleep: Do not let "syscore" devices runtime-suspend during system transitions
	mwifiex: Read a PCI register after writing the TX ring write pointer
	mwifiex: Try waking the firmware until we get an interrupt
	libata: fix checking of DMA state
	wcn36xx: handle connection loss indication
	rsi: fix occasional initialisation failure with BT coex
	rsi: fix key enabled check causing unwanted encryption for vap_id > 0
	rsi: fix rate mask set leading to P2P failure
	rsi: Fix module dev_oper_mode parameter description
	perf/x86/intel/uncore: Support extra IMC channel on Ice Lake server
	perf/x86/intel/uncore: Fix Intel ICX IIO event constraints
	RDMA/qedr: Fix NULL deref for query_qp on the GSI QP
	signal: Remove the bogus sigkill_pending in ptrace_stop
	memory: renesas-rpc-if: Correct QSPI data transfer in Manual mode
	signal/mips: Update (_save|_restore)_fp_context to fail with -EFAULT
	soc: fsl: dpio: replace smp_processor_id with raw_smp_processor_id
	soc: fsl: dpio: use the combined functions to protect critical zone
	mtd: rawnand: socrates: Keep the driver compatible with on-die ECC engines
	power: supply: max17042_battery: Prevent int underflow in set_soc_threshold
	power: supply: max17042_battery: use VFSOC for capacity when no rsns
	KVM: arm64: Extract ESR_ELx.EC only
	KVM: nVMX: Query current VMCS when determining if MSR bitmaps are in use
	can: j1939: j1939_tp_cmd_recv(): ignore abort message in the BAM transport
	can: j1939: j1939_can_recv(): ignore messages with invalid source address
	powerpc/85xx: Fix oops when mpc85xx_smp_guts_ids node cannot be found
	ring-buffer: Protect ring_buffer_reset() from reentrancy
	serial: core: Fix initializing and restoring termios speed
	ifb: fix building without CONFIG_NET_CLS_ACT
	ALSA: mixer: oss: Fix racy access to slots
	ALSA: mixer: fix deadlock in snd_mixer_oss_set_volume
	xen/balloon: add late_initcall_sync() for initial ballooning done
	ovl: fix use after free in struct ovl_aio_req
	PCI: pci-bridge-emul: Fix emulation of W1C bits
	PCI: cadence: Add cdns_plat_pcie_probe() missing return
	PCI: aardvark: Do not clear status bits of masked interrupts
	PCI: aardvark: Fix checking for link up via LTSSM state
	PCI: aardvark: Do not unmask unused interrupts
	PCI: aardvark: Fix reporting Data Link Layer Link Active
	PCI: aardvark: Fix configuring Reference clock
	PCI: aardvark: Fix return value of MSI domain .alloc() method
	PCI: aardvark: Read all 16-bits from PCIE_MSI_PAYLOAD_REG
	PCI: aardvark: Fix support for bus mastering and PCI_COMMAND on emulated bridge
	PCI: aardvark: Fix support for PCI_BRIDGE_CTL_BUS_RESET on emulated bridge
	PCI: aardvark: Set PCI Bridge Class Code to PCI Bridge
	PCI: aardvark: Fix support for PCI_ROM_ADDRESS1 on emulated bridge
	quota: check block number when reading the block in quota file
	quota: correct error number in free_dqentry()
	pinctrl: core: fix possible memory leak in pinctrl_enable()
	coresight: cti: Correct the parameter for pm_runtime_put
	iio: dac: ad5446: Fix ad5622_write() return value
	iio: ad5770r: make devicetree property reading consistent
	USB: serial: keyspan: fix memleak on probe errors
	serial: 8250: fix racy uartclk update
	most: fix control-message timeouts
	USB: iowarrior: fix control-message timeouts
	USB: chipidea: fix interrupt deadlock
	power: supply: max17042_battery: Clear status bits in interrupt handler
	dma-buf: WARN on dmabuf release with pending attachments
	drm: panel-orientation-quirks: Update the Lenovo Ideapad D330 quirk (v2)
	drm: panel-orientation-quirks: Add quirk for KD Kurio Smart C15200 2-in-1
	drm: panel-orientation-quirks: Add quirk for the Samsung Galaxy Book 10.6
	Bluetooth: sco: Fix lock_sock() blockage by memcpy_from_msg()
	Bluetooth: fix use-after-free error in lock_sock_nested()
	drm/panel-orientation-quirks: add Valve Steam Deck
	rcutorture: Avoid problematic critical section nesting on PREEMPT_RT
	platform/x86: wmi: do not fail if disabling fails
	MIPS: lantiq: dma: add small delay after reset
	MIPS: lantiq: dma: reset correct number of channel
	locking/lockdep: Avoid RCU-induced noinstr fail
	net: sched: update default qdisc visibility after Tx queue cnt changes
	rcu-tasks: Move RTGS_WAIT_CBS to beginning of rcu_tasks_kthread() loop
	smackfs: Fix use-after-free in netlbl_catmap_walk()
	ath11k: Align bss_chan_info structure with firmware
	x86: Increase exception stack sizes
	mwifiex: Run SET_BSS_MODE when changing from P2P to STATION vif-type
	mwifiex: Properly initialize private structure on interface type changes
	fscrypt: allow 256-bit master keys with AES-256-XTS
	drm/amdgpu: Fix MMIO access page fault
	ath11k: Avoid reg rules update during firmware recovery
	ath11k: add handler for scan event WMI_SCAN_EVENT_DEQUEUED
	ath11k: Change DMA_FROM_DEVICE to DMA_TO_DEVICE when map reinjected packets
	ath10k: high latency fixes for beacon buffer
	media: mt9p031: Fix corrupted frame after restarting stream
	media: netup_unidvb: handle interrupt properly according to the firmware
	media: atomisp: Fix error handling in probe
	media: stm32: Potential NULL pointer dereference in dcmi_irq_thread()
	media: uvcvideo: Set capability in s_param
	media: uvcvideo: Return -EIO for control errors
	media: uvcvideo: Set unique vdev name based in type
	media: s5p-mfc: fix possible null-pointer dereference in s5p_mfc_probe()
	media: s5p-mfc: Add checking to s5p_mfc_probe().
	media: imx: set a media_device bus_info string
	media: mceusb: return without resubmitting URB in case of -EPROTO error.
	ia64: don't do IA64_CMPXCHG_DEBUG without CONFIG_PRINTK
	rtw88: fix RX clock gate setting while fifo dump
	brcmfmac: Add DMI nvram filename quirk for Cyberbook T116 tablet
	media: rcar-csi2: Add checking to rcsi2_start_receiver()
	ipmi: Disable some operations during a panic
	fs/proc/uptime.c: Fix idle time reporting in /proc/uptime
	ACPICA: Avoid evaluating methods too early during system resume
	media: ipu3-imgu: imgu_fmt: Handle properly try
	media: ipu3-imgu: VIDIOC_QUERYCAP: Fix bus_info
	media: usb: dvd-usb: fix uninit-value bug in dibusb_read_eeprom_byte()
	net-sysfs: try not to restart the syscall if it will fail eventually
	tracefs: Have tracefs directories not set OTH permission bits by default
	ath: dfs_pattern_detector: Fix possible null-pointer dereference in channel_detector_create()
	mmc: moxart: Fix reference count leaks in moxart_probe
	iov_iter: Fix iov_iter_get_pages{,_alloc} page fault return value
	ACPI: battery: Accept charges over the design capacity as full
	drm/amdkfd: fix resume error when iommu disabled in Picasso
	net: phy: micrel: make *-skew-ps check more lenient
	leaking_addresses: Always print a trailing newline
	drm/msm: prevent NULL dereference in msm_gpu_crashstate_capture()
	block: bump max plugged deferred size from 16 to 32
	md: update superblock after changing rdev flags in state_store
	memstick: r592: Fix a UAF bug when removing the driver
	lib/xz: Avoid overlapping memcpy() with invalid input with in-place decompression
	lib/xz: Validate the value before assigning it to an enum variable
	workqueue: make sysfs of unbound kworker cpumask more clever
	tracing/cfi: Fix cmp_entries_* functions signature mismatch
	mt76: mt7915: fix an off-by-one bound check
	mwl8k: Fix use-after-free in mwl8k_fw_state_machine()
	block: remove inaccurate requeue check
	media: allegro: ignore interrupt if mailbox is not initialized
	nvmet: fix use-after-free when a port is removed
	nvmet-rdma: fix use-after-free when a port is removed
	nvmet-tcp: fix use-after-free when a port is removed
	nvme: drop scan_lock and always kick requeue list when removing namespaces
	PM: hibernate: Get block device exclusively in swsusp_check()
	selftests: kvm: fix mismatched fclose() after popen()
	selftests/bpf: Fix perf_buffer test on system with offline cpus
	iwlwifi: mvm: disable RX-diversity in powersave
	smackfs: use __GFP_NOFAIL for smk_cipso_doi()
	ARM: clang: Do not rely on lr register for stacktrace
	gre/sit: Don't generate link-local addr if addr_gen_mode is IN6_ADDR_GEN_MODE_NONE
	gfs2: Cancel remote delete work asynchronously
	gfs2: Fix glock_hash_walk bugs
	ARM: 9136/1: ARMv7-M uses BE-8, not BE-32
	vrf: run conntrack only in context of lower/physdev for locally generated packets
	net: annotate data-race in neigh_output()
	ACPI: AC: Quirk GK45 to skip reading _PSR
	btrfs: reflink: initialize return value to 0 in btrfs_extent_same()
	btrfs: do not take the uuid_mutex in btrfs_rm_device
	spi: bcm-qspi: Fix missing clk_disable_unprepare() on error in bcm_qspi_probe()
	wcn36xx: Correct band/freq reporting on RX
	x86/hyperv: Protect set_hv_tscchange_cb() against getting preempted
	drm/amd/display: dcn20_resource_construct reduce scope of FPU enabled
	selftests/core: fix conflicting types compile error for close_range()
	parisc: fix warning in flush_tlb_all
	task_stack: Fix end_of_stack() for architectures with upwards-growing stack
	erofs: don't trigger WARN() when decompression fails
	parisc/unwind: fix unwinder when CONFIG_64BIT is enabled
	parisc/kgdb: add kgdb_roundup() to make kgdb work with idle polling
	netfilter: conntrack: set on IPS_ASSURED if flows enters internal stream state
	selftests/bpf: Fix strobemeta selftest regression
	Bluetooth: fix init and cleanup of sco_conn.timeout_work
	rcu: Fix existing exp request check in sync_sched_exp_online_cleanup()
	MIPS: lantiq: dma: fix burst length for DEU
	objtool: Add xen_start_kernel() to noreturn list
	x86/xen: Mark cpu_bringup_and_idle() as dead_end_function
	objtool: Fix static_call list generation
	drm/v3d: fix wait for TMU write combiner flush
	virtio-gpu: fix possible memory allocation failure
	lockdep: Let lock_is_held_type() detect recursive read as read
	net: net_namespace: Fix undefined member in key_remove_domain()
	cgroup: Make rebind_subsystems() disable v2 controllers all at once
	wcn36xx: Fix Antenna Diversity Switching
	wilc1000: fix possible memory leak in cfg_scan_result()
	Bluetooth: btmtkuart: fix a memleak in mtk_hci_wmt_sync
	crypto: caam - disable pkc for non-E SoCs
	rxrpc: Fix _usecs_to_jiffies() by using usecs_to_jiffies()
	net: dsa: rtl8366rb: Fix off-by-one bug
	ath11k: fix some sleeping in atomic bugs
	ath11k: Avoid race during regd updates
	ath11k: fix packet drops due to incorrect 6 GHz freq value in rx status
	ath11k: Fix memory leak in ath11k_qmi_driver_event_work
	ath10k: Fix missing frame timestamp for beacon/probe-resp
	ath10k: sdio: Add missing BH locking around napi_schdule()
	drm/ttm: stop calling tt_swapin in vm_access
	arm64: mm: update max_pfn after memory hotplug
	drm/amdgpu: fix warning for overflow check
	media: em28xx: add missing em28xx_close_extension
	media: cxd2880-spi: Fix a null pointer dereference on error handling path
	media: dvb-usb: fix ununit-value in az6027_rc_query
	media: v4l2-ioctl: S_CTRL output the right value
	media: TDA1997x: handle short reads of hdmi info frame.
	media: mtk-vpu: Fix a resource leak in the error handling path of 'mtk_vpu_probe()'
	media: radio-wl1273: Avoid card name truncation
	media: si470x: Avoid card name truncation
	media: tm6000: Avoid card name truncation
	media: cx23885: Fix snd_card_free call on null card pointer
	kprobes: Do not use local variable when creating debugfs file
	crypto: ecc - fix CRYPTO_DEFAULT_RNG dependency
	cpuidle: Fix kobject memory leaks in error paths
	media: em28xx: Don't use ops->suspend if it is NULL
	ath9k: Fix potential interrupt storm on queue reset
	PM: EM: Fix inefficient states detection
	EDAC/amd64: Handle three rank interleaving mode
	rcu: Always inline rcu_dynticks_task*_{enter,exit}()
	netfilter: nft_dynset: relax superfluous check on set updates
	media: dvb-frontends: mn88443x: Handle errors of clk_prepare_enable()
	crypto: qat - detect PFVF collision after ACK
	crypto: qat - disregard spurious PFVF interrupts
	hwrng: mtk - Force runtime pm ops for sleep ops
	b43legacy: fix a lower bounds test
	b43: fix a lower bounds test
	gve: Recover from queue stall due to missed IRQ
	mmc: sdhci-omap: Fix NULL pointer exception if regulator is not configured
	mmc: sdhci-omap: Fix context restore
	memstick: avoid out-of-range warning
	memstick: jmb38x_ms: use appropriate free function in jmb38x_ms_alloc_host()
	net, neigh: Fix NTF_EXT_LEARNED in combination with NTF_USE
	hwmon: Fix possible memleak in __hwmon_device_register()
	hwmon: (pmbus/lm25066) Let compiler determine outer dimension of lm25066_coeff
	ath10k: fix max antenna gain unit
	kernel/sched: Fix sched_fork() access an invalid sched_task_group
	tcp: switch orphan_count to bare per-cpu counters
	drm/msm: potential error pointer dereference in init()
	drm/msm: uninitialized variable in msm_gem_import()
	net: stream: don't purge sk_error_queue in sk_stream_kill_queues()
	media: ir_toy: assignment to be16 should be of correct type
	mmc: mxs-mmc: disable regulator on error and in the remove function
	platform/x86: thinkpad_acpi: Fix bitwise vs. logical warning
	mt76: mt7615: fix endianness warning in mt7615_mac_write_txwi
	mt76: mt76x02: fix endianness warnings in mt76x02_mac.c
	mt76: mt7915: fix possible infinite loop release semaphore
	mt76: mt7915: fix sta_rec_wtbl tag len
	mt76: mt7915: fix muar_idx in mt7915_mcu_alloc_sta_req()
	rsi: stop thread firstly in rsi_91x_init() error handling
	mwifiex: Send DELBA requests according to spec
	net: enetc: unmap DMA in enetc_send_cmd()
	phy: micrel: ksz8041nl: do not use power down mode
	nvme-rdma: fix error code in nvme_rdma_setup_ctrl
	PM: hibernate: fix sparse warnings
	clocksource/drivers/timer-ti-dm: Select TIMER_OF
	x86/sev: Fix stack type check in vc_switch_off_ist()
	drm/msm: Fix potential NULL dereference in DPU SSPP
	smackfs: use netlbl_cfg_cipsov4_del() for deleting cipso_v4_doi
	KVM: selftests: Add operand to vmsave/vmload/vmrun in svm.c
	KVM: selftests: Fix nested SVM tests when built with clang
	bpftool: Avoid leaking the JSON writer prepared for program metadata
	libbpf: Fix BTF data layout checks and allow empty BTF
	libbpf: Allow loading empty BTFs
	libbpf: Fix overflow in BTF sanity checks
	libbpf: Fix BTF header parsing checks
	s390/gmap: don't unconditionally call pte_unmap_unlock() in __gmap_zap()
	KVM: s390: pv: avoid double free of sida page
	KVM: s390: pv: avoid stalls for kvm_s390_pv_init_vm
	irq: mips: avoid nested irq_enter()
	tpm: fix Atmel TPM crash caused by too frequent queries
	tpm_tis_spi: Add missing SPI ID
	libbpf: Fix endianness detection in BPF_CORE_READ_BITFIELD_PROBED()
	tcp: don't free a FIN sk_buff in tcp_remove_empty_skb()
	spi: spi-rpc-if: Check return value of rpcif_sw_init()
	samples/kretprobes: Fix return value if register_kretprobe() failed
	KVM: s390: Fix handle_sske page fault handling
	libertas_tf: Fix possible memory leak in probe and disconnect
	libertas: Fix possible memory leak in probe and disconnect
	wcn36xx: add proper DMA memory barriers in rx path
	wcn36xx: Fix discarded frames due to wrong sequence number
	drm/amdgpu/gmc6: fix DMA mask from 44 to 40 bits
	selftests: bpf: Convert sk_lookup ctx access tests to PROG_TEST_RUN
	selftests/bpf: Fix fd cleanup in sk_lookup test
	net: amd-xgbe: Toggle PLL settings during rate change
	net: phylink: avoid mvneta warning when setting pause parameters
	crypto: pcrypt - Delay write to padata->info
	selftests/bpf: Fix fclose/pclose mismatch in test_progs
	udp6: allow SO_MARK ctrl msg to affect routing
	ibmvnic: don't stop queue in xmit
	ibmvnic: Process crqs after enabling interrupts
	cgroup: Fix rootcg cpu.stat guest double counting
	bpf: Fix propagation of bounds from 64-bit min/max into 32-bit and var_off.
	bpf: Fix propagation of signed bounds from 64-bit min/max into 32-bit.
	of: unittest: fix EXPECT text for gpio hog errors
	iio: st_sensors: Call st_sensors_power_enable() from bus drivers
	iio: st_sensors: disable regulators after device unregistration
	RDMA/rxe: Fix wrong port_cap_flags
	ARM: dts: BCM5301X: Fix memory nodes names
	clk: mvebu: ap-cpu-clk: Fix a memory leak in error handling paths
	ARM: s3c: irq-s3c24xx: Fix return value check for s3c24xx_init_intc()
	arm64: dts: rockchip: Fix GPU register width for RK3328
	ARM: dts: qcom: msm8974: Add xo_board reference clock to DSI0 PHY
	RDMA/bnxt_re: Fix query SRQ failure
	arm64: dts: ti: k3-j721e-main: Fix "max-virtual-functions" in PCIe EP nodes
	arm64: dts: ti: k3-j721e-main: Fix "bus-range" upto 256 bus number for PCIe
	arm64: dts: meson-g12a: Fix the pwm regulator supply properties
	arm64: dts: meson-g12b: Fix the pwm regulator supply properties
	bus: ti-sysc: Fix timekeeping_suspended warning on resume
	ARM: dts: at91: tse850: the emac<->phy interface is rmii
	scsi: dc395: Fix error case unwinding
	MIPS: loongson64: make CPU_LOONGSON64 depends on MIPS_FP_SUPPORT
	JFS: fix memleak in jfs_mount
	arm64: dts: qcom: msm8916: Fix Secondary MI2S bit clock
	arm64: dts: renesas: beacon: Fix Ethernet PHY mode
	arm64: dts: qcom: pm8916: Remove wrong reg-names for rtc@6000
	ALSA: hda: Reduce udelay() at SKL+ position reporting
	ALSA: hda: Release controller display power during shutdown/reboot
	ALSA: hda: Fix hang during shutdown due to link reset
	ALSA: hda: Use position buffer for SKL+ again
	soundwire: debugfs: use controller id and link_id for debugfs
	scsi: pm80xx: Fix misleading log statement in pm8001_mpi_get_nvmd_resp()
	driver core: Fix possible memory leak in device_link_add()
	arm: dts: omap3-gta04a4: accelerometer irq fix
	ASoC: SOF: topology: do not power down primary core during topology removal
	soc/tegra: Fix an error handling path in tegra_powergate_power_up()
	memory: fsl_ifc: fix leak of irq and nand_irq in fsl_ifc_ctrl_probe
	clk: at91: check pmc node status before registering syscore ops
	video: fbdev: chipsfb: use memset_io() instead of memset()
	powerpc: Refactor is_kvm_guest() declaration to new header
	powerpc: Rename is_kvm_guest() to check_kvm_guest()
	powerpc: Reintroduce is_kvm_guest() as a fast-path check
	powerpc: Fix is_kvm_guest() / kvm_para_available()
	powerpc: fix unbalanced node refcount in check_kvm_guest()
	serial: 8250_dw: Drop wrong use of ACPI_PTR()
	usb: gadget: hid: fix error code in do_config()
	power: supply: rt5033_battery: Change voltage values to µV
	power: supply: max17040: fix null-ptr-deref in max17040_probe()
	scsi: csiostor: Uninitialized data in csio_ln_vnp_read_cbfn()
	RDMA/mlx4: Return missed an error if device doesn't support steering
	usb: musb: select GENERIC_PHY instead of depending on it
	staging: most: dim2: do not double-register the same device
	staging: ks7010: select CRYPTO_HASH/CRYPTO_MICHAEL_MIC
	pinctrl: renesas: checker: Fix off-by-one bug in drive register check
	ARM: dts: stm32: Reduce DHCOR SPI NOR frequency to 50 MHz
	ARM: dts: stm32: fix SAI sub nodes register range
	ARM: dts: stm32: fix AV96 board SAI2 pin muxing on stm32mp15
	ASoC: cs42l42: Correct some register default values
	ASoC: cs42l42: Defer probe if request_threaded_irq() returns EPROBE_DEFER
	soc: qcom: rpmhpd: Provide some missing struct member descriptions
	soc: qcom: rpmhpd: Make power_on actually enable the domain
	usb: typec: STUSB160X should select REGMAP_I2C
	iio: adis: do not disabe IRQs in 'adis_init()'
	scsi: ufs: Refactor ufshcd_setup_clocks() to remove skip_ref_clk
	scsi: ufs: ufshcd-pltfrm: Fix memory leak due to probe defer
	serial: imx: fix detach/attach of serial console
	usb: dwc2: drd: fix dwc2_force_mode call in dwc2_ovr_init
	usb: dwc2: drd: fix dwc2_drd_role_sw_set when clock could be disabled
	usb: dwc2: drd: reset current session before setting the new one
	firmware: qcom_scm: Fix error retval in __qcom_scm_is_call_available()
	soc: qcom: apr: Add of_node_put() before return
	pinctrl: equilibrium: Fix function addition in multiple groups
	phy: qcom-qusb2: Fix a memory leak on probe
	phy: ti: gmii-sel: check of_get_address() for failure
	phy: qcom-snps: Correct the FSEL_MASK
	serial: xilinx_uartps: Fix race condition causing stuck TX
	clk: at91: sam9x60-pll: use DIV_ROUND_CLOSEST_ULL
	HID: u2fzero: clarify error check and length calculations
	HID: u2fzero: properly handle timeouts in usb_submit_urb
	powerpc/44x/fsp2: add missing of_node_put
	ASoC: cs42l42: Disable regulators if probe fails
	ASoC: cs42l42: Use device_property API instead of of_property
	ASoC: cs42l42: Correct configuring of switch inversion from ts-inv
	virtio_ring: check desc == NULL when using indirect with packed
	mips: cm: Convert to bitfield API to fix out-of-bounds access
	power: supply: bq27xxx: Fix kernel crash on IRQ handler register error
	apparmor: fix error check
	rpmsg: Fix rpmsg_create_ept return when RPMSG config is not defined
	nfsd: don't alloc under spinlock in rpc_parse_scope_id
	i2c: mediatek: fixing the incorrect register offset
	NFS: Fix dentry verifier races
	pnfs/flexfiles: Fix misplaced barrier in nfs4_ff_layout_prepare_ds
	drm/plane-helper: fix uninitialized variable reference
	PCI: aardvark: Don't spam about PIO Response Status
	PCI: aardvark: Fix preserving PCI_EXP_RTCTL_CRSSVE flag on emulated bridge
	opp: Fix return in _opp_add_static_v2()
	NFS: Fix deadlocks in nfs_scan_commit_list()
	fs: orangefs: fix error return code of orangefs_revalidate_lookup()
	mtd: spi-nor: hisi-sfc: Remove excessive clk_disable_unprepare()
	PCI: uniphier: Serialize INTx masking/unmasking and fix the bit operation
	mtd: core: don't remove debugfs directory if device is in use
	remoteproc: Fix a memory leak in an error handling path in 'rproc_handle_vdev()'
	rtc: rv3032: fix error handling in rv3032_clkout_set_rate()
	dmaengine: at_xdmac: fix AT_XDMAC_CC_PERID() macro
	NFS: Fix up commit deadlocks
	NFS: Fix an Oops in pnfs_mark_request_commit()
	Fix user namespace leak
	auxdisplay: img-ascii-lcd: Fix lock-up when displaying empty string
	auxdisplay: ht16k33: Connect backlight to fbdev
	auxdisplay: ht16k33: Fix frame buffer device blanking
	soc: fsl: dpaa2-console: free buffer before returning from dpaa2_console_read
	netfilter: nfnetlink_queue: fix OOB when mac header was cleared
	dmaengine: dmaengine_desc_callback_valid(): Check for `callback_result`
	signal/sh: Use force_sig(SIGKILL) instead of do_group_exit(SIGKILL)
	m68k: set a default value for MEMORY_RESERVE
	watchdog: f71808e_wdt: fix inaccurate report in WDIOC_GETTIMEOUT
	ar7: fix kernel builds for compiler test
	scsi: qla2xxx: Changes to support FCP2 Target
	scsi: qla2xxx: Relogin during fabric disturbance
	scsi: qla2xxx: Fix gnl list corruption
	scsi: qla2xxx: Turn off target reset during issue_lip
	NFSv4: Fix a regression in nfs_set_open_stateid_locked()
	i2c: xlr: Fix a resource leak in the error handling path of 'xlr_i2c_probe()'
	xen-pciback: Fix return in pm_ctrl_init()
	net: davinci_emac: Fix interrupt pacing disable
	ethtool: fix ethtool msg len calculation for pause stats
	openrisc: fix SMP tlb flush NULL pointer dereference
	net: vlan: fix a UAF in vlan_dev_real_dev()
	ice: Fix replacing VF hardware MAC to existing MAC filter
	ice: Fix not stopping Tx queues for VFs
	ACPI: PMIC: Fix intel_pmic_regs_handler() read accesses
	drm/nouveau/svm: Fix refcount leak bug and missing check against null bug
	net: phy: fix duplex out of sync problem while changing settings
	bonding: Fix a use-after-free problem when bond_sysfs_slave_add() failed
	mfd: core: Add missing of_node_put for loop iteration
	can: mcp251xfd: mcp251xfd_chip_start(): fix error handling for mcp251xfd_chip_rx_int_enable()
	mm/zsmalloc.c: close race window between zs_pool_dec_isolated() and zs_unregister_migration()
	zram: off by one in read_block_state()
	perf bpf: Add missing free to bpf_event__print_bpf_prog_info()
	llc: fix out-of-bound array index in llc_sk_dev_hash()
	nfc: pn533: Fix double free when pn533_fill_fragment_skbs() fails
	arm64: pgtable: make __pte_to_phys/__phys_to_pte_val inline functions
	bpf, sockmap: Remove unhash handler for BPF sockmap usage
	bpf: sockmap, strparser, and tls are reusing qdisc_skb_cb and colliding
	gve: Fix off by one in gve_tx_timeout()
	seq_file: fix passing wrong private data
	net/sched: sch_taprio: fix undefined behavior in ktime_mono_to_any
	net: hns3: fix kernel crash when unload VF while it is being reset
	net: hns3: allow configure ETS bandwidth of all TCs
	net: stmmac: allow a tc-taprio base-time of zero
	vsock: prevent unnecessary refcnt inc for nonblocking connect
	net/smc: fix sk_refcnt underflow on linkdown and fallback
	cxgb4: fix eeprom len when diagnostics not implemented
	selftests/net: udpgso_bench_rx: fix port argument
	ARM: 9155/1: fix early early_iounmap()
	ARM: 9156/1: drop cc-option fallbacks for architecture selection
	parisc: Fix backtrace to always include init funtion names
	MIPS: Fix assembly error from MIPSr2 code used within MIPS_ISA_ARCH_LEVEL
	x86/mce: Add errata workaround for Skylake SKX37
	posix-cpu-timers: Clear task::posix_cputimers_work in copy_process()
	irqchip/sifive-plic: Fixup EOI failed when masked
	f2fs: should use GFP_NOFS for directory inodes
	net, neigh: Enable state migration between NUD_PERMANENT and NTF_USE
	9p/net: fix missing error check in p9_check_errors
	memcg: prohibit unconditional exceeding the limit of dying tasks
	powerpc/lib: Add helper to check if offset is within conditional branch range
	powerpc/bpf: Validate branch ranges
	powerpc/security: Add a helper to query stf_barrier type
	powerpc/bpf: Emit stf barrier instruction sequences for BPF_NOSPEC
	mm, oom: pagefault_out_of_memory: don't force global OOM for dying tasks
	mm, oom: do not trigger out_of_memory from the #PF
	mfd: dln2: Add cell for initializing DLN2 ADC
	video: backlight: Drop maximum brightness override for brightness zero
	s390/cio: check the subchannel validity for dev_busid
	s390/tape: fix timer initialization in tape_std_assign()
	s390/ap: Fix hanging ioctl caused by orphaned replies
	s390/cio: make ccw_device_dma_* more robust
	mtd: rawnand: ams-delta: Keep the driver compatible with on-die ECC engines
	mtd: rawnand: xway: Keep the driver compatible with on-die ECC engines
	mtd: rawnand: mpc5121: Keep the driver compatible with on-die ECC engines
	mtd: rawnand: gpio: Keep the driver compatible with on-die ECC engines
	mtd: rawnand: pasemi: Keep the driver compatible with on-die ECC engines
	mtd: rawnand: orion: Keep the driver compatible with on-die ECC engines
	mtd: rawnand: plat_nand: Keep the driver compatible with on-die ECC engines
	mtd: rawnand: au1550nd: Keep the driver compatible with on-die ECC engines
	powerpc/powernv/prd: Unregister OPAL_MSG_PRD2 notifier during module unload
	powerpc/85xx: fix timebase sync issue when CONFIG_HOTPLUG_CPU=n
	drm/sun4i: Fix macros in sun8i_csc.h
	PCI: Add PCI_EXP_DEVCTL_PAYLOAD_* macros
	PCI: aardvark: Fix PCIe Max Payload Size setting
	SUNRPC: Partial revert of commit 6f9f17287e
	ath10k: fix invalid dma_addr_t token assignment
	mmc: moxart: Fix null pointer dereference on pointer host
	selftests/bpf: Fix also no-alu32 strobemeta selftest
	arch/cc: Introduce a function to check for confidential computing features
	x86/sev: Add an x86 version of cc_platform_has()
	x86/sev: Make the #VC exception stacks part of the default stacks storage
	soc/tegra: pmc: Fix imbalanced clock disabling in error code path
	Linux 5.10.80

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I21c750863965fbf584251fa2de3c941ae5922d3f
2021-11-19 11:50:41 +01:00
Waiman Long
ba8ba76885 cgroup: Make rebind_subsystems() disable v2 controllers all at once
[ Upstream commit 7ee285395b211cad474b2b989db52666e0430daf ]

It was found that the following warning was displayed when remounting
controllers from cgroup v2 to v1:

[ 8042.997778] WARNING: CPU: 88 PID: 80682 at kernel/cgroup/cgroup.c:3130 cgroup_apply_control_disable+0x158/0x190
   :
[ 8043.091109] RIP: 0010:cgroup_apply_control_disable+0x158/0x190
[ 8043.096946] Code: ff f6 45 54 01 74 39 48 8d 7d 10 48 c7 c6 e0 46 5a a4 e8 7b 67 33 00 e9 41 ff ff ff 49 8b 84 24 e8 01 00 00 0f b7 40 08 eb 95 <0f> 0b e9 5f ff ff ff 48 83 c4 08 5b 5d 41 5c 41 5d 41 5e 41 5f c3
[ 8043.115692] RSP: 0018:ffffba8a47c23d28 EFLAGS: 00010202
[ 8043.120916] RAX: 0000000000000036 RBX: ffffffffa624ce40 RCX: 000000000000181a
[ 8043.128047] RDX: ffffffffa63c43e0 RSI: ffffffffa63c43e0 RDI: ffff9d7284ee1000
[ 8043.135180] RBP: ffff9d72874c5800 R08: ffffffffa624b090 R09: 0000000000000004
[ 8043.142314] R10: ffffffffa624b080 R11: 0000000000002000 R12: ffff9d7284ee1000
[ 8043.149447] R13: ffff9d7284ee1000 R14: ffffffffa624ce70 R15: ffffffffa6269e20
[ 8043.156576] FS:  00007f7747cff740(0000) GS:ffff9d7a5fc00000(0000) knlGS:0000000000000000
[ 8043.164663] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 8043.170409] CR2: 00007f7747e96680 CR3: 0000000887d60001 CR4: 00000000007706e0
[ 8043.177539] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 8043.184673] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 8043.191804] PKRU: 55555554
[ 8043.194517] Call Trace:
[ 8043.196970]  rebind_subsystems+0x18c/0x470
[ 8043.201070]  cgroup_setup_root+0x16c/0x2f0
[ 8043.205177]  cgroup1_root_to_use+0x204/0x2a0
[ 8043.209456]  cgroup1_get_tree+0x3e/0x120
[ 8043.213384]  vfs_get_tree+0x22/0xb0
[ 8043.216883]  do_new_mount+0x176/0x2d0
[ 8043.220550]  __x64_sys_mount+0x103/0x140
[ 8043.224474]  do_syscall_64+0x38/0x90
[ 8043.228063]  entry_SYSCALL_64_after_hwframe+0x44/0xae

It was caused by the fact that rebind_subsystem() disables
controllers to be rebound one by one. If more than one disabled
controllers are originally from the default hierarchy, it means that
cgroup_apply_control_disable() will be called multiple times for the
same default hierarchy. A controller may be killed by css_kill() in
the first round. In the second round, the killed controller may not be
completely dead yet leading to the warning.

To avoid this problem, we collect all the ssid's of controllers that
needed to be disabled from the default hierarchy and then disable them
in one go instead of one by one.

Fixes: 334c3679ec ("cgroup: reimplement rebind_subsystems() using cgroup_apply_control() and friends")
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-18 14:04:02 +01:00
Greg Kroah-Hartman
a739489620 Merge 5.10.77 into android12-5.10-lts
Changes in 5.10.77
	ARM: 9132/1: Fix __get_user_check failure with ARM KASAN images
	ARM: 9133/1: mm: proc-macros: ensure *_tlb_fns are 4B aligned
	ARM: 9134/1: remove duplicate memcpy() definition
	ARM: 9138/1: fix link warning with XIP + frame-pointer
	ARM: 9139/1: kprobes: fix arch_init_kprobes() prototype
	ARM: 9141/1: only warn about XIP address when not compile testing
	io_uring: don't take uring_lock during iowq cancel
	powerpc/bpf: Fix BPF_MOD when imm == 1
	arm64: Avoid premature usercopy failure
	ext4: fix possible UAF when remounting r/o a mmp-protected file system
	usbnet: sanity check for maxpacket
	usbnet: fix error return code in usbnet_probe()
	Revert "pinctrl: bcm: ns: support updated DT binding as syscon subnode"
	pinctrl: amd: disable and mask interrupts on probe
	ata: sata_mv: Fix the error handling of mv_chip_id()
	tipc: fix size validations for the MSG_CRYPTO type
	nfc: port100: fix using -ERRNO as command type mask
	Revert "net: mdiobus: Fix memory leak in __mdiobus_register"
	net/tls: Fix flipped sign in tls_err_abort() calls
	mmc: vub300: fix control-message timeouts
	mmc: cqhci: clear HALT state after CQE enable
	mmc: mediatek: Move cqhci init behind ungate clock
	mmc: dw_mmc: exynos: fix the finding clock sample value
	mmc: sdhci: Map more voltage level to SDHCI_POWER_330
	mmc: sdhci-esdhc-imx: clear the buffer_read_ready to reset standard tuning circuit
	ocfs2: fix race between searching chunks and release journal_head from buffer_head
	nvme-tcp: fix H2CData PDU send accounting (again)
	cfg80211: scan: fix RCU in cfg80211_add_nontrans_list()
	cfg80211: fix management registrations locking
	net: lan78xx: fix division by zero in send path
	mm, thp: bail out early in collapse_file for writeback page
	drm/ttm: fix memleak in ttm_transfered_destroy
	drm/amdgpu: fix out of bounds write
	cgroup: Fix memory leak caused by missing cgroup_bpf_offline
	riscv, bpf: Fix potential NULL dereference
	tcp_bpf: Fix one concurrency problem in the tcp_bpf_send_verdict function
	bpf: Fix potential race in tail call compatibility check
	bpf: Fix error usage of map_fd and fdget() in generic_map_update_batch()
	IB/qib: Protect from buffer overflow in struct qib_user_sdma_pkt fields
	IB/hfi1: Fix abba locking issue with sc_disable()
	nvmet-tcp: fix data digest pointer calculation
	nvme-tcp: fix data digest pointer calculation
	nvme-tcp: fix possible req->offset corruption
	octeontx2-af: Display all enabled PF VF rsrc_alloc entries.
	RDMA/mlx5: Set user priority for DCT
	arm64: dts: allwinner: h5: NanoPI Neo 2: Fix ethernet node
	reset: brcmstb-rescal: fix incorrect polarity of status bit
	regmap: Fix possible double-free in regcache_rbtree_exit()
	net: batman-adv: fix error handling
	net-sysfs: initialize uid and gid before calling net_ns_get_ownership
	cfg80211: correct bridge/4addr mode check
	net: Prevent infinite while loop in skb_tx_hash()
	RDMA/sa_query: Use strscpy_pad instead of memcpy to copy a string
	gpio: xgs-iproc: fix parsing of ngpios property
	nios2: Make NIOS2_DTB_SOURCE_BOOL depend on !COMPILE_TEST
	mlxsw: pci: Recycle received packet upon allocation failure
	net: ethernet: microchip: lan743x: Fix driver crash when lan743x_pm_resume fails
	net: ethernet: microchip: lan743x: Fix dma allocation failure by using dma_set_mask_and_coherent
	net: nxp: lpc_eth.c: avoid hang when bringing interface down
	net/tls: Fix flipped sign in async_wait.err assignment
	phy: phy_ethtool_ksettings_get: Lock the phy for consistency
	phy: phy_ethtool_ksettings_set: Move after phy_start_aneg
	phy: phy_start_aneg: Add an unlocked version
	phy: phy_ethtool_ksettings_set: Lock the PHY while changing settings
	sctp: use init_tag from inithdr for ABORT chunk
	sctp: fix the processing for INIT_ACK chunk
	sctp: fix the processing for COOKIE_ECHO chunk
	sctp: add vtag check in sctp_sf_violation
	sctp: add vtag check in sctp_sf_do_8_5_1_E_sa
	sctp: add vtag check in sctp_sf_ootb
	lan743x: fix endianness when accessing descriptors
	KVM: s390: clear kicked_mask before sleeping again
	KVM: s390: preserve deliverable_mask in __airqs_kick_single_vcpu
	scsi: ufs: ufs-exynos: Correct timeout value setting registers
	riscv: fix misalgned trap vector base address
	riscv: Fix asan-stack clang build
	perf script: Check session->header.env.arch before using it
	Linux 5.10.77

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I4cd89af4d20b7a8a1a6d9906233d1aaf026659a8
2021-11-02 20:03:12 +01:00
Quanyang Wang
01599bf7cc cgroup: Fix memory leak caused by missing cgroup_bpf_offline
commit 04f8ef5643bcd8bcde25dfdebef998aea480b2ba upstream.

When enabling CONFIG_CGROUP_BPF, kmemleak can be observed by running
the command as below:

    $mount -t cgroup -o none,name=foo cgroup cgroup/
    $umount cgroup/

unreferenced object 0xc3585c40 (size 64):
  comm "mount", pid 425, jiffies 4294959825 (age 31.990s)
  hex dump (first 32 bytes):
    01 00 00 80 84 8c 28 c0 00 00 00 00 00 00 00 00  ......(.........
    00 00 00 00 00 00 00 00 6c 43 a0 c3 00 00 00 00  ........lC......
  backtrace:
    [<e95a2f9e>] cgroup_bpf_inherit+0x44/0x24c
    [<1f03679c>] cgroup_setup_root+0x174/0x37c
    [<ed4b0ac5>] cgroup1_get_tree+0x2c0/0x4a0
    [<f85b12fd>] vfs_get_tree+0x24/0x108
    [<f55aec5c>] path_mount+0x384/0x988
    [<e2d5e9cd>] do_mount+0x64/0x9c
    [<208c9cfe>] sys_mount+0xfc/0x1f4
    [<06dd06e0>] ret_fast_syscall+0x0/0x48
    [<a8308cb3>] 0xbeb4daa8

This is because that since the commit 2b0d3d3e4f ("percpu_ref: reduce
memory footprint of percpu_ref in fast path") root_cgrp->bpf.refcnt.data
is allocated by the function percpu_ref_init in cgroup_bpf_inherit which
is called by cgroup_setup_root when mounting, but not freed along with
root_cgrp when umounting. Adding cgroup_bpf_offline which calls
percpu_ref_kill to cgroup_kill_sb can free root_cgrp->bpf.refcnt.data in
umount path.

This patch also fixes the commit 4bfc0bb2c6 ("bpf: decouple the lifetime
of cgroup_bpf from cgroup itself"). A cgroup_bpf_offline is needed to do a
cleanup that frees the resources which are allocated by cgroup_bpf_inherit
in cgroup_setup_root.

And inside cgroup_bpf_offline, cgroup_get() is at the beginning and
cgroup_put is at the end of cgroup_bpf_release which is called by
cgroup_bpf_offline. So cgroup_bpf_offline can keep the balance of
cgroup's refcount.

Fixes: 2b0d3d3e4f ("percpu_ref: reduce memory footprint of percpu_ref in fast path")
Fixes: 4bfc0bb2c6 ("bpf: decouple the lifetime of cgroup_bpf from cgroup itself")
Signed-off-by: Quanyang Wang <quanyang.wang@windriver.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Roman Gushchin <guro@fb.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20211018075623.26884-1-quanyang.wang@windriver.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-02 19:48:21 +01:00
Greg Kroah-Hartman
b1a6760ddf Merge branch 'android12-5.10' into android12-5.10-lts
Sync up with android12-5.10 for the following commits:

dd139186ef ANDROID: usb: gadget: fix NULL pointer dereference in android_setup
07f65598af ANDROID: GKI: Disable kmem cgroup accounting
309aa7e7a2 FROMLIST: mm, memcg: inline swap-related functions to improve disabled memcg config
3ae8e2f183 BACKPORT: FROMLIST: mm, memcg: inline mem_cgroup_{charge/uncharge} to improve disabled memcg config
f73d029485 FROMLIST: mm, memcg: add mem_cgroup_disabled checks in vmpressure and swap-related functions
669df367a9 UPSTREAM: mm/memcg: bail early from swap accounting if memcg disabled
1f0c32a667 UPSTREAM: procfs/dmabuf: add inode number to /proc/*/fdinfo
0c8c125f57 UPSTREAM: procfs: allow reading fdinfo with PTRACE_MODE_READ
2e0476a465 Revert "FROMLIST: procfs: Allow reading fdinfo with PTRACE_MODE_READ"
5ded961aa2 Revert "FROMLIST: BACKPORT: procfs/dmabuf: Add inode number to /..."
3ee5565017 UPSTREAM: f2fs: initialize page->private when using for our internal use
dba79c3af3 ANDROID: mm: page_pinner: report test_page_isolation_failure
13362ab28e ANDROID: mm: page_pinner: add state of page_pinner
3254948484 ANDROID: mm: page_pinner: add more struct page fields
0445b67bee ANDROID: mm: page_pinner: change timestamp format
71da06728c ANDROID: mm: page_pinner: print_page_pinner refactoring
b83e564914 ANDROID: mm: page_pinner: remove shared_count
849f048050 ANDROID: mm: page_pinner: remove WARN_ON_ONCE
9a453100fc ANDROID: mm: page_pinner: fix typos
d012783a86 ANDROID: mm: page_pinner: reset migration failed page
470cce5085 ANDROID: mm: page_pinner: record every put_page
9f47e5fdda ANDROID: mm: page_pinner: change function names
a8385d61f2 ANDROID: Allow vendor module to reclaim a memcg
f41a95eadc ANDROID: Export memcg functions to allow module to add new files
46bf3b94e7 FROMGIT: dt-bindings: usb: dwc3: Update dwc3 TX fifo properties
b36b813e39 UPSTREAM: dt-bindings: usb: Convert DWC USB3 bindings to DT schema
9a80b7b728 FROMGIT: of: Add stub for of_add_property()
2742be5903 ANDROID: fips140: define fips_enabled to 1 to enable FIPS behavior
e886dd4c33 ANDROID: fips140: unregister existing DRBG algorithms
634445a640 ANDROID: fips140: fix deadlock in unregister_existing_fips140_algos()
0af06624ea ANDROID: fips140: check for errors from initcalls
92de53472e ANDROID: fips140: log already-live algorithms
0a7da21583 ANDROID: Update new mtk gki symbol
98085b5dd8 ANDROID: usb: Add vendor hook for usb suspend and resume
956db89e71 BACKPORT: FROMLIST: dma-heap: Let dma heap use dma_map_attrs to map & unmap iova
749d6e7f2c ANDROID: abi_gki_aarch64_qcom: Add vendor hook for shmem_alloc_page
b05bbe48be ANDROID: abi_gki_aarch64_qcom: Add reclaim_shmem_address_space
d80c70d7a8 ANDROID: android: export kernel function arch_mmap_rnd
25c7eb4932 ANDROID: mm: shmem: Fix build break with allnoconfig
1cdcf76b15 ANDROID: vendor_hooks: add hooks in mem_cgroup subsystem
726468dd4a ANDROID: GKI: add vendor padding variable in struct skb_shared_info
fc79c93657 FROMLIST: scsi: ufs: add quirk to enable host controller without interface configuration
2d5ae6b787 FROMLIST: scsi: ufs: add quirk to handle broken UIC command
38abaebab7 ANDROID: syscall_check: add vendor hook for bpf syscall
a7a3b31d58 ANDROID: syscall_check: add vendor hook for open syscall
a5543c9cd7 ANDROID: syscall_check: add vendor hook for mmap syscall
1f0769279f ANDROID: GKI: Add symbol to symbol list
2cff74e08c ANDROID: vendor_hooks: Add vendor hook to the net
25edba0d4d FROMLIST: scsi: ufs: Fix the SCSI abort handler
c0efdc4a5e ANDROID: android: export kernel function vm_unmapped_area
964220d080 ANDROID: shmem: vendor hook in shmem_alloc_page
bd2ca0ba5b FROMLIST: pstore/ram: Rework logic for detecting ramoops reserved memory region
daeabfe7fa ANDROID: mm: add reclaim_shmem_address_space() for faster reclaims
4c3dddf408 ANDROID: Update the generic ABI symbol list
4c4d8cbdef ANDROID: GKI: refresh ABI XML
01e4a037d8 ANDROID: GKI: turn on TIDY_ABI
edf973fd24 ANDROID: Update symbol list for VIVO
1702d2c8b7 FROMGIT: net: cdc_ncm: switch to eth%d interface naming
f4d6e8324c ANDROID: GKI: add allowed GKI symbol for Exynosauto SoC
444a0b7752 ANDROID: mm: add vendor hook for vmpressure
c799c6644b ANDROID: fips140: adjust some log messages
091338cb39 ANDROID: fips140: add missing static keyword to fips140_init()
70bfd6a7e0 ANDROID: GKI: update allowed list for exynosauto SoC
3e3147b280 UPSTREAM: scsi: ufs: ufshcd: Fix some function doc-rot
2c553e754f UPSTREAM: scsi: ufs: Adjust ufshcd_hold() during sending attribute requests
52ccdf90b9 FROMLIST: lockdep: Remove console_verbose when disable lock debugging
4458494476 ANDROID: ABI: qcom: Add symbols for 80211
5c51579fde ANDROID: fork: Export task_newtask tracepoint
e2a90797e8 ANDROID: Fix kernelci warnings for indentation in smp.c
bac33eaebf ANDROID: irqchip: gic-v3: Move struct gic_chip_data to header
bdac4418bf ANDROID: abi_gki_aarch64_qcom: Add android_vh_ufs_clock_scaling
65c1de0f06 ANDROID: Update symbol list for mtk
d4d02ab9b0 UPSTREAM: swiotlb: manipulate orig_addr when tlb_addr has offset
58aa0f2832 ANDROID: qcom: Add net related symbol
2f9f816445 ANDROID: Update the exynos symbol list
b2a9471239 ANDROID: Update symbol list for mtk
7c9599e204 FROMGIT: usb: dwc3: Create helper function getting MDWIDTH
0a24affb86 ANDROID: vendor_hooks: modify the function name
d686d5ffc6 ANDROID: GKI: Add some symbols to symbol list
bdfb11230b ANDROID: cpuidle: Allow for an early exit from cpuidle_enter_state()
f0b280c395 ANDROID: cpuidle: Update cpuidle_uninstall_idle_handler() to wakeup all online CPUs
14dd90ab37 ANDROID: scsi: ufs: Add hook to influence the UFS clock scaling policy
00aec39e2e FROMGIT: bpf: Support all gso types in bpf_skb_change_proto()

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I6e5be89f3f02c420237a549f4c6a08b5ed434581
2021-07-13 15:00:50 +02:00
Liujie Xie
f41a95eadc ANDROID: Export memcg functions to allow module to add new files
Export cgroup_add_legacy_cftypes and a helper function to allow vendor module to expose additional files in the memory cgroup hierarchy.

Bug: 192052083

Signed-off-by: Liujie Xie <xieliujie@oppo.com>
Change-Id: Ie2b936b3e77c7ab6d740d1bb6d70e03c70a326a7
2021-07-12 18:53:29 +00:00
Greg Kroah-Hartman
bc9699030e Merge branch 'android12-5.10' into android12-5.10-lts
Sync up with android12-5.10 for the following commits:

de1a0ea811 ANDROID: logbuf: Remove if directive for vendor hooks
3b6916b4d4 ANDROID: iommu/io-pgtable-arm: Add IOMMU_CACHE_ICACHE_OCACHE_NWA
2e289f3641 FROMGIT: mac80211_hwsim: add concurrent channels scanning support over virtio
5b1baee639 ANDROID: GKI: update allowed symbols for exynosauto soc
961be31178 ANDROID: GKI: initial upload list for exynosauto soc
01f2392e13 ANDROID: logbuf: Add new logbuf vendor hook to support pr_cont()
3cd04ea95a ANDROID: lib: Export show_mem() for vendor module usage
ba085dd70a FROMLIST: remoteproc: core: Export the rproc coredump APIs
1093a9bfdb ANDROID: sched: select fallback rq must check for allowed cpus
8943a2e7a3 ANDROID: android: Export symbols for invoking cpufreq_update_util()
e7cf28a1a4 ANDROID: Update symbol list for mtk
c685777105 ANDROID: kernel: add module info for debug_kinfo
ea53c24cb0 FROMGIT: bpf: Do not change gso_size during bpf_skb_change_proto()
b007e43692 FROMGIT: mm: slub: fix the leak of alloc/free traces debugfs interface
850f11aa85 Revert "KMI: BACKPORT: FROMGIT: scsi: ufs: Optimize host lock on transfer requests send/compl paths"
46575badbb Revert "BACKPORT: FROMGIT: scsi: ufs: Optimize host lock on transfer requests send/compl paths"
83d653257a Revert "FROMGIT: scsi: ufs: Utilize Transfer Request List Completion Notification Register"
9f8f2ea03e ANDROID: power: wakeup_reason: change abort log
d5a092726b ANDROID: GKI: Update abi_gki_aarch64_qcom list for rwsem list add
eabe9707f2 ANDROID: Add hook to show vendor info for transactions
12902c9996 ANDROID: vendor_hooks: Export direct reclaim trace points
cea24faf98 ANDROID: Update the ABI representation
c0e8aae5c5 ANDROID: qcom: Add xfrm and skb related symbols
8102df91f2 Merge remote-tracking branch 'aosp/upstream-f2fs-stable-linux-5.10.y' into android12-5.10
b70055f85a ANDROID: iommu: Revise vendor hook param for iova free tracking
a985701859 ANDROID: abi_gki_aarch64_qcom: Add additional symbols for 32bit execve
c7c351ab3f ANDROID: sched: add restricted tracehooks for 32bit execve
f502bc761a ANDROID: GKI: Update symbols to symbol list
4198bdb2cf ANDROID: coresight: Update ETE DT yaml file
6f1bd7583f ANDROID: coresight: Update ETE/TRBE to v6 merged upstream
ce71010347 ANDROID: kvm: arm64: Clarify the comment for SPE save context
ee7e80c81b BACKPORT: arm64: KVM: Enable access to TRBE support for host
b6b0927eac BACKPORT: KVM: arm64: Move SPE availability check to VCPU load
dde40c1089 UPSTREAM: KVM: arm64: Handle access to TRFCR_EL1
ee682ec766 ANDROID: GKI: Enable ARCH_SPRD and SPRD_TIMER
39147716e8 UPSTREAM: x86, lto: Pass -stack-alignment only on LLD < 13.0.0
7d3618b8b9 ANDROID: fix permission error of page_pinner
cfe1f5aea6 ANDROID: gki_config: disable per-cgroup pressure tracking
c7186c2c46 FROMGIT: cgroup: make per-cgroup pressure stall tracking configurable
0d054fc5d7 Revert "ANDROID: make per-cgroup PSI tracking configurable"
951bdfd077 FROMLIST: arm: Mark the recheduling IPI as raw interrupt
42f775df72 FROMLIST: arm64: Mark the recheduling IPI as raw interrupt
3f9d45d802 FROMLIST: genirq: Allow an interrupt to be marked as 'raw'
08327b9007 FROMLIST: genirq: Add __irq_modify_status() helper to clear/set special flags
77c9f446b6 ANDROID: GKI: Update abi_gki_aarch64_qcom list for shmem allocations
728626cb04 ANDROID: power: Add vendor hook to qos for GKI purpose.
9d55580966 ANDROID: selinux: modify RTM_GETNEIGH{TBL}
bb51a33182 Revert "f2fs: avoid attaching SB_ACTIVE flag during mount/remount"
c81ac64da1 f2fs: remove false alarm on iget failure during GC
cdeff03989 f2fs: enable extent cache for compression files in read-only
15a475975e f2fs: fix to avoid adding tab before doc section
44e0be85eb f2fs: introduce f2fs_casefolded_name slab cache
eaef955b91 f2fs: swap: support migrating swapfile in aligned write mode
34c703ff04 f2fs: swap: remove dead codes
ec3ea14d2f f2fs: compress: add compress_inode to cache compressed blocks
d90505e519 f2fs: clean up /sys/fs/f2fs/<disk>/features
95f2afc02d f2fs: add pin_file in feature list
d8755821c4 f2fs: Advertise encrypted casefolding in sysfs
7004c47db2 f2fs: Show casefolding support only when supported
832ee33262 f2fs: support RO feature
b5a393c8a8 f2fs: logging neatening

Change-Id: I42f9108240ebf0d1e75b5d2ee24644cb995f8b3c
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2021-06-30 15:34:53 +02:00
Suren Baghdasaryan
c7186c2c46 FROMGIT: cgroup: make per-cgroup pressure stall tracking configurable
PSI accounts stalls for each cgroup separately and aggregates it at each
level of the hierarchy. This causes additional overhead with psi_avgs_work
being called for each cgroup in the hierarchy. psi_avgs_work has been
highly optimized, however on systems with large number of cgroups the
overhead becomes noticeable.
Systems which use PSI only at the system level could avoid this overhead
if PSI can be configured to skip per-cgroup stall accounting.
Add "cgroup_disable=pressure" kernel command-line option to allow
requesting system-wide only pressure stall accounting. When set, it
keeps system-wide accounting under /proc/pressure/ but skips accounting
for individual cgroups and does not expose PSI nodes in cgroup hierarchy.

Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
Link:  https://lore.kernel.org/patchwork/patch/1435705
(cherry picked from commit 3958e2d0c34e18c41b60dc01832bd670a59ef70f
 https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git tj)

Bug: 178872719
Bug: 191734423
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: Ifc8fbc52f9a1131d7c2668edbb44c525c76c3360
2021-06-23 18:35:35 +00:00
Greg Kroah-Hartman
82658bfd88 Merge 5.10.44 into android12-5.10-lts
Changes in 5.10.44
	proc: Track /proc/$pid/attr/ opener mm_struct
	ASoC: max98088: fix ni clock divider calculation
	ASoC: amd: fix for pcm_read() error
	spi: Fix spi device unregister flow
	spi: spi-zynq-qspi: Fix stack violation bug
	bpf: Forbid trampoline attach for functions with variable arguments
	net/nfc/rawsock.c: fix a permission check bug
	usb: cdns3: Fix runtime PM imbalance on error
	ASoC: Intel: bytcr_rt5640: Add quirk for the Glavey TM800A550L tablet
	ASoC: Intel: bytcr_rt5640: Add quirk for the Lenovo Miix 3-830 tablet
	vfio-ccw: Reset FSM state to IDLE inside FSM
	vfio-ccw: Serialize FSM IDLE state with I/O completion
	ASoC: sti-sas: add missing MODULE_DEVICE_TABLE
	spi: sprd: Add missing MODULE_DEVICE_TABLE
	usb: chipidea: udc: assign interrupt number to USB gadget structure
	isdn: mISDN: netjet: Fix crash in nj_probe:
	bonding: init notify_work earlier to avoid uninitialized use
	netlink: disable IRQs for netlink_lock_table()
	net: mdiobus: get rid of a BUG_ON()
	cgroup: disable controllers at parse time
	wq: handle VM suspension in stall detection
	net/qla3xxx: fix schedule while atomic in ql_sem_spinlock
	RDS tcp loopback connection can hang
	net:sfc: fix non-freed irq in legacy irq mode
	scsi: bnx2fc: Return failure if io_req is already in ABTS processing
	scsi: vmw_pvscsi: Set correct residual data length
	scsi: hisi_sas: Drop free_irq() of devm_request_irq() allocated irq
	scsi: target: qla2xxx: Wait for stop_phase1 at WWN removal
	net: macb: ensure the device is available before accessing GEMGXL control registers
	net: appletalk: cops: Fix data race in cops_probe1
	net: dsa: microchip: enable phy errata workaround on 9567
	nvme-fabrics: decode host pathing error for connect
	MIPS: Fix kernel hang under FUNCTION_GRAPH_TRACER and PREEMPT_TRACER
	dm verity: fix require_signatures module_param permissions
	bnx2x: Fix missing error code in bnx2x_iov_init_one()
	nvme-tcp: remove incorrect Kconfig dep in BLK_DEV_NVME
	nvmet: fix false keep-alive timeout when a controller is torn down
	powerpc/fsl: set fsl,i2c-erratum-a004447 flag for P2041 i2c controllers
	powerpc/fsl: set fsl,i2c-erratum-a004447 flag for P1010 i2c controllers
	spi: Don't have controller clean up spi device before driver unbind
	spi: Cleanup on failure of initial setup
	i2c: mpc: Make use of i2c_recover_bus()
	i2c: mpc: implement erratum A-004447 workaround
	ALSA: seq: Fix race of snd_seq_timer_open()
	ALSA: firewire-lib: fix the context to call snd_pcm_stop_xrun()
	ALSA: hda/realtek: headphone and mic don't work on an Acer laptop
	ALSA: hda/realtek: fix mute/micmute LEDs and speaker for HP Elite Dragonfly G2
	ALSA: hda/realtek: fix mute/micmute LEDs and speaker for HP EliteBook x360 1040 G8
	ALSA: hda/realtek: fix mute/micmute LEDs for HP EliteBook 840 Aero G8
	ALSA: hda/realtek: fix mute/micmute LEDs for HP ZBook Power G8
	spi: bcm2835: Fix out-of-bounds access with more than 4 slaves
	Revert "ACPI: sleep: Put the FACS table after using it"
	drm: Fix use-after-free read in drm_getunique()
	drm: Lock pointer access in drm_master_release()
	perf/x86/intel/uncore: Fix M2M event umask for Ice Lake server
	KVM: X86: MMU: Use the correct inherited permissions to get shadow page
	kvm: avoid speculation-based attacks from out-of-range memslot accesses
	staging: rtl8723bs: Fix uninitialized variables
	async_xor: check src_offs is not NULL before updating it
	btrfs: return value from btrfs_mark_extent_written() in case of error
	btrfs: promote debugging asserts to full-fledged checks in validate_super
	cgroup1: don't allow '\n' in renaming
	ftrace: Do not blindly read the ip address in ftrace_bug()
	mmc: renesas_sdhi: abort tuning when timeout detected
	mmc: renesas_sdhi: Fix HS400 on R-Car M3-W+
	USB: f_ncm: ncm_bitrate (speed) is unsigned
	usb: f_ncm: only first packet of aggregate needs to start timer
	usb: pd: Set PD_T_SINK_WAIT_CAP to 310ms
	usb: dwc3-meson-g12a: fix usb2 PHY glue init when phy0 is disabled
	usb: dwc3: meson-g12a: Disable the regulator in the error handling path of the probe
	usb: dwc3: gadget: Bail from dwc3_gadget_exit() if dwc->gadget is NULL
	usb: dwc3: ep0: fix NULL pointer exception
	usb: musb: fix MUSB_QUIRK_B_DISCONNECT_99 handling
	usb: typec: wcove: Use LE to CPU conversion when accessing msg->header
	usb: typec: ucsi: Clear PPM capability data in ucsi_init() error path
	usb: typec: intel_pmc_mux: Put fwnode in error case during ->probe()
	usb: typec: intel_pmc_mux: Add missed error check for devm_ioremap_resource()
	usb: gadget: f_fs: Ensure io_completion_wq is idle during unbind
	USB: serial: ftdi_sio: add NovaTech OrionMX product ID
	USB: serial: omninet: add device id for Zyxel Omni 56K Plus
	USB: serial: quatech2: fix control-request directions
	USB: serial: cp210x: fix alternate function for CP2102N QFN20
	usb: gadget: eem: fix wrong eem header operation
	usb: fix various gadgets null ptr deref on 10gbps cabling.
	usb: fix various gadget panics on 10gbps cabling
	usb: typec: tcpm: cancel vdm and state machine hrtimer when unregister tcpm port
	usb: typec: tcpm: cancel frs hrtimer when unregister tcpm port
	regulator: core: resolve supply for boot-on/always-on regulators
	regulator: max77620: Use device_set_of_node_from_dev()
	regulator: bd718x7: Fix the BUCK7 voltage setting on BD71837
	regulator: fan53880: Fix missing n_voltages setting
	regulator: bd71828: Fix .n_voltages settings
	regulator: rtmv20: Fix .set_current_limit/.get_current_limit callbacks
	phy: usb: Fix misuse of IS_ENABLED
	usb: dwc3: gadget: Disable gadget IRQ during pullup disable
	usb: typec: mux: Fix copy-paste mistake in typec_mux_match
	drm/mcde: Fix off by 10^3 in calculation
	drm/msm/a6xx: fix incorrectly set uavflagprd_inv field for A650
	drm/msm/a6xx: update/fix CP_PROTECT initialization
	drm/msm/a6xx: avoid shadow NULL reference in failure path
	RDMA/ipoib: Fix warning caused by destroying non-initial netns
	RDMA/mlx4: Do not map the core_clock page to user space unless enabled
	ARM: cpuidle: Avoid orphan section warning
	vmlinux.lds.h: Avoid orphan section with !SMP
	tools/bootconfig: Fix error return code in apply_xbc()
	phy: cadence: Sierra: Fix error return code in cdns_sierra_phy_probe()
	ASoC: core: Fix Null-point-dereference in fmt_single_name()
	ASoC: meson: gx-card: fix sound-dai dt schema
	phy: ti: Fix an error code in wiz_probe()
	gpio: wcd934x: Fix shift-out-of-bounds error
	perf: Fix data race between pin_count increment/decrement
	sched/fair: Keep load_avg and load_sum synced
	sched/fair: Make sure to update tg contrib for blocked load
	sched/fair: Fix util_est UTIL_AVG_UNCHANGED handling
	x86/nmi_watchdog: Fix old-style NMI watchdog regression on old Intel CPUs
	KVM: x86: Ensure liveliness of nested VM-Enter fail tracepoint message
	IB/mlx5: Fix initializing CQ fragments buffer
	NFS: Fix a potential NULL dereference in nfs_get_client()
	NFSv4: Fix deadlock between nfs4_evict_inode() and nfs4_opendata_get_inode()
	perf session: Correct buffer copying when peeking events
	kvm: fix previous commit for 32-bit builds
	NFS: Fix use-after-free in nfs4_init_client()
	NFSv4: Fix second deadlock in nfs4_evict_inode()
	NFSv4: nfs4_proc_set_acl needs to restore NFS_CAP_UIDGID_NOMAP on error.
	scsi: core: Fix error handling of scsi_host_alloc()
	scsi: core: Fix failure handling of scsi_add_host_with_dma()
	scsi: core: Put .shost_dev in failure path if host state changes to RUNNING
	scsi: core: Only put parent device if host state differs from SHOST_CREATED
	tracing: Correct the length check which causes memory corruption
	proc: only require mm_struct for writing
	Linux 5.10.44

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ic64172b4e72ccb54d96000b3065dd8b33aa9fef5
2021-06-16 13:14:03 +02:00
Shakeel Butt
5ca472d40e cgroup: disable controllers at parse time
[ Upstream commit 45e1ba40837ac2f6f4d4716bddb8d44bd7e4a251 ]

This patch effectively reverts the commit a3e72739b7 ("cgroup: fix
too early usage of static_branch_disable()"). The commit 6041186a32
("init: initialize jump labels before command line option parsing") has
moved the jump_label_init() before parse_args() which has made the
commit a3e72739b7 unnecessary. On the other hand there are
consequences of disabling the controllers later as there are subsystems
doing the controller checks for different decisions. One such incident
is reported [1] regarding the memory controller and its impact on memory
reclaim code.

[1] https://lore.kernel.org/linux-mm/921e53f3-4b13-aab8-4a9e-e83ff15371e4@nec.com

Signed-off-by: Shakeel Butt <shakeelb@google.com>
Reported-by: NOMURA JUNICHI(野村 淳一) <junichi.nomura@nec.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Tested-by: Jun'ichi Nomura <junichi.nomura@nec.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-16 12:01:36 +02:00
Jing-Ting Wu
2bb462a3af ANDROID: cgroup: add vendor hook to cgroup .attach()
Add a vendor hook when a set of tasks change the cgroups in
the controller for performance tuning.

Bug: 187972571
Signed-off-by: Jing-Ting Wu <Jing-Ting.Wu@mediatek.com>
Change-Id: If0ea3d089e077a4650507c69b6e60952d102fa1d
2021-05-18 15:41:48 +00:00
Pavankumar Kondeti
29203f8c8f ANDROID: cgroup: Add android_rvh_cgroup_force_kthread_migration
In Android GKI, CONFIG_FAIR_GROUP_SCHED is enabled [1] to help
prioritize important work. Given that CPU shares of root cgroup
can't be changed, leaving the tasks inside root cgroup will give
them higher share compared to the other tasks inside important
cgroups. This is mitigated by moving all tasks inside root cgroup to
a different cgroup after Android is booted. However, there are many
kernel tasks stuck in the root cgroup after the boot.

It is possible to relax kernel threads and kworkers migrations under
certain scenarios. However the patch [2] posted at upstream is not
accepted. Hence add a restricted vendor hook to notify modules when a
kernel thread is requested for cgroup migration. The modules can relax
the restrictions forced by the kernel and allow the cgroup migration.

[1] f08f049de1
[2] https://lore.kernel.org/lkml/1617714261-18111-1-git-send-email-pkondeti@codeaurora.org

Bug: 184594949
Change-Id: I445a170ba797c8bece3b4b59b7a42cdd85438f1f
Signed-off-by: Pavankumar Kondeti <quic_pkondeti@quicinc.com>
2021-05-04 20:13:09 +00:00
Greg Kroah-Hartman
b129c98dc6 Merge 5.10.17 into android12-5.10
Changes in 5.10.17
	objtool: Fix seg fault with Clang non-section symbols
	Revert "dts: phy: add GPIO number and active state used for phy reset"
	gpio: mxs: GPIO_MXS should not default to y unconditionally
	gpio: ep93xx: fix BUG_ON port F usage
	gpio: ep93xx: Fix single irqchip with multi gpiochips
	tracing: Do not count ftrace events in top level enable output
	tracing: Check length before giving out the filter buffer
	drm/i915: Fix overlay frontbuffer tracking
	arm/xen: Don't probe xenbus as part of an early initcall
	cgroup: fix psi monitor for root cgroup
	Revert "drm/amd/display: Update NV1x SR latency values"
	drm/i915/tgl+: Make sure TypeC FIA is powered up when initializing it
	drm/dp_mst: Don't report ports connected if nothing is attached to them
	dmaengine: move channel device_node deletion to driver
	tmpfs: disallow CONFIG_TMPFS_INODE64 on s390
	tmpfs: disallow CONFIG_TMPFS_INODE64 on alpha
	soc: ti: omap-prm: Fix boot time errors for rst_map_012 bits 0 and 1
	arm64: dts: rockchip: Fix PCIe DT properties on rk3399
	arm64: dts: qcom: sdm845: Reserve LPASS clocks in gcc
	ARM: OMAP2+: Fix suspcious RCU usage splats for omap_enter_idle_coupled
	arm64: dts: rockchip: remove interrupt-names property from rk3399 vdec node
	platform/x86: hp-wmi: Disable tablet-mode reporting by default
	arm64: dts: rockchip: Disable display for NanoPi R2S
	ovl: perform vfs_getxattr() with mounter creds
	cap: fix conversions on getxattr
	ovl: skip getxattr of security labels
	scsi: lpfc: Fix EEH encountering oops with NVMe traffic
	x86/split_lock: Enable the split lock feature on another Alder Lake CPU
	nvme-pci: ignore the subsysem NQN on Phison E16
	drm/amd/display: Fix DPCD translation for LTTPR AUX_RD_INTERVAL
	drm/amd/display: Add more Clock Sources to DCN2.1
	drm/amd/display: Release DSC before acquiring
	drm/amd/display: Fix dc_sink kref count in emulated_link_detect
	drm/amd/display: Free atomic state after drm_atomic_commit
	drm/amd/display: Decrement refcount of dc_sink before reassignment
	riscv: virt_addr_valid must check the address belongs to linear mapping
	bfq-iosched: Revert "bfq: Fix computation of shallow depth"
	ARM: dts: lpc32xx: Revert set default clock rate of HCLK PLL
	kallsyms: fix nonconverging kallsyms table with lld
	ARM: ensure the signal page contains defined contents
	ARM: kexec: fix oops after TLB are invalidated
	ubsan: implement __ubsan_handle_alignment_assumption
	Revert "lib: Restrict cpumask_local_spread to houskeeping CPUs"
	x86/efi: Remove EFI PGD build time checks
	lkdtm: don't move ctors to .rodata
	KVM: x86: cleanup CR3 reserved bits checks
	cgroup-v1: add disabled controller check in cgroup1_parse_param()
	dmaengine: idxd: fix misc interrupt completion
	ath9k: fix build error with LEDS_CLASS=m
	mt76: dma: fix a possible memory leak in mt76_add_fragment()
	drm/vc4: hvs: Fix buffer overflow with the dlist handling
	dmaengine: idxd: check device state before issue command
	bpf: Unbreak BPF_PROG_TYPE_KPROBE when kprobe is called via do_int3
	bpf: Check for integer overflow when using roundup_pow_of_two()
	netfilter: xt_recent: Fix attempt to update deleted entry
	selftests: netfilter: fix current year
	netfilter: nftables: fix possible UAF over chains from packet path in netns
	netfilter: flowtable: fix tcp and udp header checksum update
	xen/netback: avoid race in xenvif_rx_ring_slots_available()
	net: hdlc_x25: Return meaningful error code in x25_open
	net: ipa: set error code in gsi_channel_setup()
	hv_netvsc: Reset the RSC count if NVSP_STAT_FAIL in netvsc_receive()
	net: enetc: initialize the RFS and RSS memories
	selftests: txtimestamp: fix compilation issue
	net: stmmac: set TxQ mode back to DCB after disabling CBS
	ibmvnic: Clear failover_pending if unable to schedule
	netfilter: conntrack: skip identical origin tuple in same zone only
	scsi: scsi_debug: Fix a memory leak
	x86/build: Disable CET instrumentation in the kernel for 32-bit too
	net: dsa: felix: implement port flushing on .phylink_mac_link_down
	net: hns3: add a check for queue_id in hclge_reset_vf_queue()
	net: hns3: add a check for tqp_index in hclge_get_ring_chain_from_mbx()
	net: hns3: add a check for index in hclge_get_rss_key()
	firmware_loader: align .builtin_fw to 8
	drm/sun4i: tcon: set sync polarity for tcon1 channel
	drm/sun4i: dw-hdmi: always set clock rate
	drm/sun4i: Fix H6 HDMI PHY configuration
	drm/sun4i: dw-hdmi: Fix max. frequency for H6
	clk: sunxi-ng: mp: fix parent rate change flag check
	i2c: stm32f7: fix configuration of the digital filter
	h8300: fix PREEMPTION build, TI_PRE_COUNT undefined
	scripts: set proper OpenSSL include dir also for sign-file
	x86/pci: Create PCI/MSI irqdomain after x86_init.pci.arch_init()
	arm64: mte: Allow PTRACE_PEEKMTETAGS access to the zero page
	rxrpc: Fix clearance of Tx/Rx ring when releasing a call
	udp: fix skb_copy_and_csum_datagram with odd segment sizes
	net: dsa: call teardown method on probe failure
	cpufreq: ACPI: Extend frequency tables to cover boost frequencies
	cpufreq: ACPI: Update arch scale-invariance max perf ratio if CPPC is not there
	net: gro: do not keep too many GRO packets in napi->rx_list
	net: fix iteration for sctp transport seq_files
	net/vmw_vsock: fix NULL pointer dereference
	net/vmw_vsock: improve locking in vsock_connect_timeout()
	net: watchdog: hold device global xmit lock during tx disable
	bridge: mrp: Fix the usage of br_mrp_port_switchdev_set_state
	switchdev: mrp: Remove SWITCHDEV_ATTR_ID_MRP_PORT_STAT
	vsock/virtio: update credit only if socket is not closed
	vsock: fix locking in vsock_shutdown()
	net/rds: restrict iovecs length for RDS_CMSG_RDMA_ARGS
	net/qrtr: restrict user-controlled length in qrtr_tun_write_iter()
	ovl: expand warning in ovl_d_real()
	kcov, usb: only collect coverage from __usb_hcd_giveback_urb in softirq
	Linux 5.10.17

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Id0300681f52b51d3f466f1e66ec3a6c25f65f4d3
2021-02-18 11:21:01 +01:00
Odin Ugedal
e72a65802a cgroup: fix psi monitor for root cgroup
commit 385aac1519417b89cb91b77c22e4ca21db563cd0 upstream.

Fix NULL pointer dereference when adding new psi monitor to the root
cgroup. PSI files for root cgroup was introduced in df5ba5be74 by using
system wide psi struct when reading, but file write/monitor was not
properly fixed. Since the PSI config for the root cgroup isn't
initialized, the current implementation tries to lock a NULL ptr,
resulting in a crash.

Can be triggered by running this as root:
$ tee /sys/fs/cgroup/cpu.pressure <<< "some 10000 1000000"

Signed-off-by: Odin Ugedal <odin@uged.al>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Acked-by: Dan Schatzberg <dschatzberg@fb.com>
Fixes: df5ba5be74 ("kernel/sched/psi.c: expose pressure metrics on root cgroup")
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: stable@vger.kernel.org # 5.2+
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-02-17 11:02:21 +01:00
Abhijeet Dharmapurikar
3975c98baf ANDROID: cgroup: Export functions used by modules
Below symbols would be used by vendor modules for tracking tasks
in various cgroups
	1. cgroup_taskset_first
	2. cgroup_taskset_next

Bug: 175045928
Change-Id: I6d89148eb4c71174a02a27acab196ff940be9082
Signed-off-by: Abhijeet Dharmapurikar <adharmap@codeaurora.org>
(cherry picked from commit 7333bd73c055e8ebe9862316eaaff691e0eace27)
2020-12-14 11:22:22 +00:00
Shaleen Agrawal
5a920a6503 ANDROID: Sched: Export scheduler symbols needed by vendor modules
Need to export internal scheduler symbols to facilitate vendor module
with scheduler based value-adds.

Bug: 173725277
Change-Id: I021f09097dfc1480abcc998cc8e05e75b2ee828b
Signed-off-by: Shaleen Agrawal <shalagra@codeaurora.org>
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
2020-12-03 16:50:04 +00:00
Jouni Roivas
65026da59c cgroup: Zero sized write should be no-op
Do not report failure on zero sized writes, and handle them as no-op.

There's issues for example in case of writev() when there's iovec
containing zero buffer as a first one. It's expected writev() on below
example to successfully perform the write to specified writable cgroup
file expecting integer value, and to return 2. For now it's returning
value -1, and skipping the write:

	int writetest(int fd) {
	  const char *buf1 = "";
	  const char *buf2 = "1\n";
          struct iovec iov[2] = {
                { .iov_base = (void*)buf1, .iov_len = 0 },
                { .iov_base = (void*)buf2, .iov_len = 2 }
          };
	  return writev(fd, iov, 2);
	}

This patch fixes the issue by checking if there's nothing to write,
and handling the write as no-op by just returning 0.

Signed-off-by: Jouni Roivas <jouni.roivas@tuxera.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2020-09-30 13:52:06 -04:00
Wei Yang
95d325185c cgroup: remove redundant kernfs_activate in cgroup_setup_root()
This step is already done in rebind_subsystems().

Not necessary to do it again.

Signed-off-by: Wei Yang <richard.weiyang@linux.alibaba.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2020-09-30 12:03:10 -04:00
Cong Wang
ad0f75e5f5 cgroup: fix cgroup_sk_alloc() for sk_clone_lock()
When we clone a socket in sk_clone_lock(), its sk_cgrp_data is
copied, so the cgroup refcnt must be taken too. And, unlike the
sk_alloc() path, sock_update_netprioidx() is not called here.
Therefore, it is safe and necessary to grab the cgroup refcnt
even when cgroup_sk_alloc is disabled.

sk_clone_lock() is in BH context anyway, the in_interrupt()
would terminate this function if called there. And for sk_alloc()
skcd->val is always zero. So it's safe to factor out the code
to make it more readable.

The global variable 'cgroup_sk_alloc_disabled' is used to determine
whether to take these reference counts. It is impossible to make
the reference counting correct unless we save this bit of information
in skcd->val. So, add a new bit there to record whether the socket
has already taken the reference counts. This obviously relies on
kmalloc() to align cgroup pointers to at least 4 bytes,
ARCH_KMALLOC_MINALIGN is certainly larger than that.

This bug seems to be introduced since the beginning, commit
d979a39d72 ("cgroup: duplicate cgroup reference when cloning sockets")
tried to fix it but not compeletely. It seems not easy to trigger until
the recent commit 090e28b229
("netprio_cgroup: Fix unlimited memory leak of v2 cgroups") was merged.

Fixes: bd1060a1d6 ("sock, cgroup: add sock->sk_cgroup")
Reported-by: Cameron Berkenpas <cam@neo-zeon.de>
Reported-by: Peter Geis <pgwipeout@gmail.com>
Reported-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Reported-by: Daniël Sonck <dsonck92@gmail.com>
Reported-by: Zhang Qiang <qiang.zhang@windriver.com>
Tested-by: Cameron Berkenpas <cam@neo-zeon.de>
Tested-by: Peter Geis <pgwipeout@gmail.com>
Tested-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Zefan Li <lizefan@huawei.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Roman Gushchin <guro@fb.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-07 13:34:11 -07:00
Linus Torvalds
4a7e89c5ec Merge branch 'for-5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:
 "Just two patches: one to add system-level cpu.stat to the root cgroup
  for convenience and a trivial comment update"

* 'for-5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: add cpu.stat file to root cgroup
  cgroup: Remove stale comments
2020-06-06 09:59:34 -07:00
Boris Burkov
936f2a70f2 cgroup: add cpu.stat file to root cgroup
Currently, the root cgroup does not have a cpu.stat file. Add one which
is consistent with /proc/stat to capture global cpu statistics that
might not fall under cgroup accounting.

We haven't done this in the past because the data are already presented
in /proc/stat and we didn't want to add overhead from collecting root
cgroup stats when cgroups are configured, but no cgroups have been
created.

By keeping the data consistent with /proc/stat, I think we avoid the
first problem, while improving the usability of cgroups stats.
We avoid the second problem by computing the contents of cpu.stat from
existing data collected for /proc/stat anyway.

Signed-off-by: Boris Burkov <boris@bur.io>
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
2020-05-28 10:06:35 -04:00
Zefan Li
6b6ebb3474 cgroup: Remove stale comments
- The default root is where we can create v2 cgroups.
- The __DEVEL__sane_behavior mount option has been removed long long ago.

Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2020-05-26 13:20:24 -04:00
Andrii Nakryiko
f9d041271c bpf: Refactor bpf_link update handling
Make bpf_link update support more generic by making it into another
bpf_link_ops methods. This allows generic syscall handling code to be agnostic
to various conditionally compiled features (e.g., the case of
CONFIG_CGROUP_BPF). This also allows to keep link type-specific code to remain
static within respective code base. Refactor existing bpf_cgroup_link code and
take advantage of this.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200429001614.1544-2-andriin@fb.com
2020-04-28 17:27:07 -07:00
Linus Torvalds
d883600523 Merge branch 'for-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:

 - Christian extended clone3 so that processes can be spawned into
   cgroups directly.

   This is not only neat in terms of semantics but also avoids grabbing
   the global cgroup_threadgroup_rwsem for migration.

 - Daniel added !root xattr support to cgroupfs.

   Userland already uses xattrs on cgroupfs for bookkeeping. This will
   allow delegated cgroups to support such usages.

 - Prateek tried to make cpuset hotplug handling synchronous but that
   led to possible deadlock scenarios. Reverted.

 - Other minor changes including release_agent_path handling cleanup.

* 'for-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  docs: cgroup-v1: Document the cpuset_v2_mode mount option
  Revert "cpuset: Make cpuset hotplug synchronous"
  cgroupfs: Support user xattrs
  kernfs: Add option to enable user xattrs
  kernfs: Add removed_size out param for simple_xattr_set
  kernfs: kvmalloc xattr value instead of kmalloc
  cgroup: Restructure release_agent_path handling
  selftests/cgroup: add tests for cloning into cgroups
  clone3: allow spawning processes into cgroups
  cgroup: add cgroup_may_write() helper
  cgroup: refactor fork helpers
  cgroup: add cgroup_get_from_file() helper
  cgroup: unify attach permission checking
  cpuset: Make cpuset hotplug synchronous
  cgroup.c: Use built-in RCU list checking
  kselftest/cgroup: add cgroup destruction test
  cgroup: Clean up css_set task traversal
2020-04-03 11:30:20 -07:00
Johannes Weiner
8a931f8013 mm: memcontrol: recursive memory.low protection
Right now, the effective protection of any given cgroup is capped by its
own explicit memory.low setting, regardless of what the parent says.  The
reasons for this are mostly historical and ease of implementation: to make
delegation of memory.low safe, effective protection is the min() of all
memory.low up the tree.

Unfortunately, this limitation makes it impossible to protect an entire
subtree from another without forcing the user to make explicit protection
allocations all the way to the leaf cgroups - something that is highly
undesirable in real life scenarios.

Consider memory in a data center host.  At the cgroup top level, we have a
distinction between system management software and the actual workload the
system is executing.  Both branches are further subdivided into individual
services, job components etc.

We want to protect the workload as a whole from the system management
software, but that doesn't mean we want to protect and prioritize
individual workload wrt each other.  Their memory demand can vary over
time, and we'd want the VM to simply cache the hottest data within the
workload subtree.  Yet, the current memory.low limitations force us to
allocate a fixed amount of protection to each workload component in order
to get protection from system management software in general.  This
results in very inefficient resource distribution.

Another concern with mandating downward allocation is that, as the
complexity of the cgroup tree grows, it gets harder for the lower levels
to be informed about decisions made at the host-level.  Consider a
container inside a namespace that in turn creates its own nested tree of
cgroups to run multiple workloads.  It'd be extremely difficult to
configure memory.low parameters in those leaf cgroups that on one hand
balance pressure among siblings as the container desires, while also
reflecting the host-level protection from e.g.  rpm upgrades, that lie
beyond one or more delegation and namespacing points in the tree.

It's highly unusual from a cgroup interface POV that nested levels have to
be aware of and reflect decisions made at higher levels for them to be
effective.

To enable such use cases and scale configurability for complex trees, this
patch implements a resource inheritance model for memory that is similar
to how the CPU and the IO controller implement work-conserving resource
allocations: a share of a resource allocated to a subree always applies to
the entire subtree recursively, while allowing, but not mandating,
children to further specify distribution rules.

That means that if protection is explicitly allocated among siblings,
those configured shares are being followed during page reclaim just like
they are now.  However, if the memory.low set at a higher level is not
fully claimed by the children in that subtree, the "floating" remainder is
applied to each cgroup in the tree in proportion to its size.  Since
reclaim pressure is applied in proportion to size as well, each child in
that tree gets the same boost, and the effect is neutral among siblings -
with respect to each other, they behave as if no memory control was
enabled at all, and the VM simply balances the memory demands optimally
within the subtree.  But collectively those cgroups enjoy a boost over the
cgroups in neighboring trees.

E.g.  a leaf cgroup with a memory.low setting of 0 no longer means that
it's not getting a share of the hierarchically assigned resource, just
that it doesn't claim a fixed amount of it to protect from its siblings.

This allows us to recursively protect one subtree (workload) from another
(system management), while letting subgroups compete freely among each
other - without having to assign fixed shares to each leaf, and without
nested groups having to echo higher-level settings.

The floating protection composes naturally with fixed protection.
Consider the following example tree:

		A            A: low = 2G
               / \          A1: low = 1G
              A1 A2         A2: low = 0G

As outside pressure is applied to this tree, A1 will enjoy a fixed
protection from A2 of 1G, but the remaining, unclaimed 1G from A is split
evenly among A1 and A2, coming out to 1.5G and 0.5G.

There is a slight risk of regressing theoretical setups where the
top-level cgroups don't know about the true budgeting and set bogusly high
"bypass" values that are meaningfully allocated down the tree.  Such
setups would rely on unclaimed protection to be discarded, and
distributing it would change the intended behavior.  Be safe and hide the
new behavior behind a mount option, 'memory_recursiveprot'.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Roman Gushchin <guro@fb.com>
Acked-by: Chris Down <chris@chrisdown.name>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Michal Koutný <mkoutny@suse.com>
Link: http://lkml.kernel.org/r/20200227195606.46212-4-hannes@cmpxchg.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-02 09:35:28 -07:00
Andrii Nakryiko
0c991ebc8c bpf: Implement bpf_prog replacement for an active bpf_cgroup_link
Add new operation (LINK_UPDATE), which allows to replace active bpf_prog from
under given bpf_link. Currently this is only supported for bpf_cgroup_link,
but will be extended to other kinds of bpf_links in follow-up patches.

For bpf_cgroup_link, implemented functionality matches existing semantics for
direct bpf_prog attachment (including BPF_F_REPLACE flag). User can either
unconditionally set new bpf_prog regardless of which bpf_prog is currently
active under given bpf_link, or, optionally, can specify expected active
bpf_prog. If active bpf_prog doesn't match expected one, no changes are
performed, old bpf_link stays intact and attached, operation returns
a failure.

cgroup_bpf_replace() operation is resolving race between auto-detachment and
bpf_prog update in the same fashion as it's done for bpf_link detachment,
except in this case update has no way of succeeding because of target cgroup
marked as dying. So in this case error is returned.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200330030001.2312810-3-andriin@fb.com
2020-03-30 17:36:33 -07:00
Andrii Nakryiko
af6eea5743 bpf: Implement bpf_link-based cgroup BPF program attachment
Implement new sub-command to attach cgroup BPF programs and return FD-based
bpf_link back on success. bpf_link, once attached to cgroup, cannot be
replaced, except by owner having its FD. Cgroup bpf_link supports only
BPF_F_ALLOW_MULTI semantics. Both link-based and prog-based BPF_F_ALLOW_MULTI
attachments can be freely intermixed.

To prevent bpf_cgroup_link from keeping cgroup alive past the point when no
BPF program can be executed, implement auto-detachment of link. When
cgroup_bpf_release() is called, all attached bpf_links are forced to release
cgroup refcounts, but they leave bpf_link otherwise active and allocated, as
well as still owning underlying bpf_prog. This is because user-space might
still have FDs open and active, so bpf_link as a user-referenced object can't
be freed yet. Once last active FD is closed, bpf_link will be freed and
underlying bpf_prog refcount will be dropped. But cgroup refcount won't be
touched, because cgroup is released already.

The inherent race between bpf_cgroup_link release (from closing last FD) and
cgroup_bpf_release() is resolved by both operations taking cgroup_mutex. So
the only additional check required is when bpf_cgroup_link attempts to detach
itself from cgroup. At that time we need to check whether there is still
cgroup associated with that link. And if not, exit with success, because
bpf_cgroup_link was already successfully detached.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Roman Gushchin <guro@fb.com>
Link: https://lore.kernel.org/bpf/20200330030001.2312810-2-andriin@fb.com
2020-03-30 17:35:59 -07:00
Daniel Xu
38aca3071c cgroupfs: Support user xattrs
This patch turns on xattr support for cgroupfs. This is useful for
letting non-root owners of delegated subtrees attach metadata to
cgroups.

One use case is for subtree owners to tell a userspace out of memory
killer to bias away from killing specific subtrees.

Tests:

    [/sys/fs/cgroup]# for i in $(seq 0 130); \
        do setfattr workload.slice -n user.name$i -v wow; done
    setfattr: workload.slice: No space left on device
    setfattr: workload.slice: No space left on device
    setfattr: workload.slice: No space left on device

    [/sys/fs/cgroup]# for i in $(seq 0 130); \
        do setfattr workload.slice --remove user.name$i; done
    setfattr: workload.slice: No such attribute
    setfattr: workload.slice: No such attribute
    setfattr: workload.slice: No such attribute

    [/sys/fs/cgroup]# for i in $(seq 0 130); \
        do setfattr workload.slice -n user.name$i -v wow; done
    setfattr: workload.slice: No space left on device
    setfattr: workload.slice: No space left on device
    setfattr: workload.slice: No space left on device

`seq 0 130` is inclusive, and 131 - 128 = 3, which is the number of
errors we expect to see.

    [/data]# cat testxattr.c
    #include <sys/types.h>
    #include <sys/xattr.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main() {
      char name[256];
      char *buf = malloc(64 << 10);
      if (!buf) {
        perror("malloc");
        return 1;
      }

      for (int i = 0; i < 4; ++i) {
        snprintf(name, 256, "user.bigone%d", i);
        if (setxattr("/sys/fs/cgroup/system.slice", name, buf,
                     64 << 10, 0)) {
          printf("setxattr failed on iteration=%d\n", i);
          return 1;
        }
      }

      return 0;
    }

    [/data]# ./a.out
    setxattr failed on iteration=2

    [/data]# ./a.out
    setxattr failed on iteration=0

    [/sys/fs/cgroup]# setfattr -x user.bigone0 system.slice/
    [/sys/fs/cgroup]# setfattr -x user.bigone1 system.slice/

    [/data]# ./a.out
    setxattr failed on iteration=2

Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Acked-by: Chris Down <chris@chrisdown.name>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
2020-03-16 15:53:47 -04:00
Linus Torvalds
1b51f69461 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from David Miller:
 "It looks like a decent sized set of fixes, but a lot of these are one
  liner off-by-one and similar type changes:

   1) Fix netlink header pointer to calcular bad attribute offset
      reported to user. From Pablo Neira Ayuso.

   2) Don't double clear PHY interrupts when ->did_interrupt is set,
      from Heiner Kallweit.

   3) Add missing validation of various (devlink, nl802154, fib, etc.)
      attributes, from Jakub Kicinski.

   4) Missing *pos increments in various netfilter seq_next ops, from
      Vasily Averin.

   5) Missing break in of_mdiobus_register() loop, from Dajun Jin.

   6) Don't double bump tx_dropped in veth driver, from Jiang Lidong.

   7) Work around FMAN erratum A050385, from Madalin Bucur.

   8) Make sure ARP header is pulled early enough in bonding driver,
      from Eric Dumazet.

   9) Do a cond_resched() during multicast processing of ipvlan and
      macvlan, from Mahesh Bandewar.

  10) Don't attach cgroups to unrelated sockets when in interrupt
      context, from Shakeel Butt.

  11) Fix tpacket ring state management when encountering unknown GSO
      types. From Willem de Bruijn.

  12) Fix MDIO bus PHY resume by checking mdio_bus_phy_may_suspend()
      only in the suspend context. From Heiner Kallweit"

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (112 commits)
  net: systemport: fix index check to avoid an array out of bounds access
  tc-testing: add ETS scheduler to tdc build configuration
  net: phy: fix MDIO bus PM PHY resuming
  net: hns3: clear port base VLAN when unload PF
  net: hns3: fix RMW issue for VLAN filter switch
  net: hns3: fix VF VLAN table entries inconsistent issue
  net: hns3: fix "tc qdisc del" failed issue
  taprio: Fix sending packets without dequeueing them
  net: mvmdio: avoid error message for optional IRQ
  net: dsa: mv88e6xxx: Add missing mask of ATU occupancy register
  net: memcg: fix lockdep splat in inet_csk_accept()
  s390/qeth: implement smarter resizing of the RX buffer pool
  s390/qeth: refactor buffer pool code
  s390/qeth: use page pointers to manage RX buffer pool
  seg6: fix SRv6 L2 tunnels to use IANA-assigned protocol number
  net: dsa: Don't instantiate phylink for CPU/DSA ports unless needed
  net/packet: tpacket_rcv: do not increment ring index on drop
  sxgbe: Fix off by one in samsung driver strncpy size arg
  net: caif: Add lockdep expression to RCU traversal primitive
  MAINTAINERS: remove Sathya Perla as Emulex NIC maintainer
  ...
2020-03-12 16:19:19 -07:00
Tejun Heo
a09833f7cd Merge branch 'for-5.6-fixes' into for-5.7 2020-03-12 16:44:18 -04:00
Shakeel Butt
e876ecc67d cgroup: memcg: net: do not associate sock with unrelated cgroup
We are testing network memory accounting in our setup and noticed
inconsistent network memory usage and often unrelated cgroups network
usage correlates with testing workload. On further inspection, it
seems like mem_cgroup_sk_alloc() and cgroup_sk_alloc() are broken in
irq context specially for cgroup v1.

mem_cgroup_sk_alloc() and cgroup_sk_alloc() can be called in irq context
and kind of assumes that this can only happen from sk_clone_lock()
and the source sock object has already associated cgroup. However in
cgroup v1, where network memory accounting is opt-in, the source sock
can be unassociated with any cgroup and the new cloned sock can get
associated with unrelated interrupted cgroup.

Cgroup v2 can also suffer if the source sock object was created by
process in the root cgroup or if sk_alloc() is called in irq context.
The fix is to just do nothing in interrupt.

WARNING: Please note that about half of the TCP sockets are allocated
from the IRQ context, so, memory used by such sockets will not be
accouted by the memcg.

The stack trace of mem_cgroup_sk_alloc() from IRQ-context:

CPU: 70 PID: 12720 Comm: ssh Tainted:  5.6.0-smp-DEV #1
Hardware name: ...
Call Trace:
 <IRQ>
 dump_stack+0x57/0x75
 mem_cgroup_sk_alloc+0xe9/0xf0
 sk_clone_lock+0x2a7/0x420
 inet_csk_clone_lock+0x1b/0x110
 tcp_create_openreq_child+0x23/0x3b0
 tcp_v6_syn_recv_sock+0x88/0x730
 tcp_check_req+0x429/0x560
 tcp_v6_rcv+0x72d/0xa40
 ip6_protocol_deliver_rcu+0xc9/0x400
 ip6_input+0x44/0xd0
 ? ip6_protocol_deliver_rcu+0x400/0x400
 ip6_rcv_finish+0x71/0x80
 ipv6_rcv+0x5b/0xe0
 ? ip6_sublist_rcv+0x2e0/0x2e0
 process_backlog+0x108/0x1e0
 net_rx_action+0x26b/0x460
 __do_softirq+0x104/0x2a6
 do_softirq_own_stack+0x2a/0x40
 </IRQ>
 do_softirq.part.19+0x40/0x50
 __local_bh_enable_ip+0x51/0x60
 ip6_finish_output2+0x23d/0x520
 ? ip6table_mangle_hook+0x55/0x160
 __ip6_finish_output+0xa1/0x100
 ip6_finish_output+0x30/0xd0
 ip6_output+0x73/0x120
 ? __ip6_finish_output+0x100/0x100
 ip6_xmit+0x2e3/0x600
 ? ipv6_anycast_cleanup+0x50/0x50
 ? inet6_csk_route_socket+0x136/0x1e0
 ? skb_free_head+0x1e/0x30
 inet6_csk_xmit+0x95/0xf0
 __tcp_transmit_skb+0x5b4/0xb20
 __tcp_send_ack.part.60+0xa3/0x110
 tcp_send_ack+0x1d/0x20
 tcp_rcv_state_process+0xe64/0xe80
 ? tcp_v6_connect+0x5d1/0x5f0
 tcp_v6_do_rcv+0x1b1/0x3f0
 ? tcp_v6_do_rcv+0x1b1/0x3f0
 __release_sock+0x7f/0xd0
 release_sock+0x30/0xa0
 __inet_stream_connect+0x1c3/0x3b0
 ? prepare_to_wait+0xb0/0xb0
 inet_stream_connect+0x3b/0x60
 __sys_connect+0x101/0x120
 ? __sys_getsockopt+0x11b/0x140
 __x64_sys_connect+0x1a/0x20
 do_syscall_64+0x51/0x200
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

The stack trace of mem_cgroup_sk_alloc() from IRQ-context:
Fixes: 2d75807383 ("mm: memcontrol: consolidate cgroup socket tracking")
Fixes: d979a39d72 ("cgroup: duplicate cgroup reference when cloning sockets")
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Reviewed-by: Roman Gushchin <guro@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-10 15:33:05 -07:00
Qian Cai
190ecb190a cgroup: fix psi_show() crash on 32bit ino archs
Similar to the commit d749534322 ("cgroup: fix incorrect
WARN_ON_ONCE() in cgroup_setup_root()"), cgroup_id(root_cgrp) does not
equal to 1 on 32bit ino archs which triggers all sorts of issues with
psi_show() on s390x. For example,

 BUG: KASAN: slab-out-of-bounds in collect_percpu_times+0x2d0/
 Read of size 4 at addr 000000001e0ce000 by task read_all/3667
 collect_percpu_times+0x2d0/0x798
 psi_show+0x7c/0x2a8
 seq_read+0x2ac/0x830
 vfs_read+0x92/0x150
 ksys_read+0xe2/0x188
 system_call+0xd8/0x2b4

Fix it by using cgroup_ino().

Fixes: 743210386c ("cgroup: use cgrp->kn->id as the cgroup ID")
Signed-off-by: Qian Cai <cai@lca.pw>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: stable@vger.kernel.org # v5.5
2020-03-04 11:15:58 -05:00
Christian Brauner
ef2c41cf38 clone3: allow spawning processes into cgroups
This adds support for creating a process in a different cgroup than its
parent. Callers can limit and account processes and threads right from
the moment they are spawned:
- A service manager can directly spawn new services into dedicated
  cgroups.
- A process can be directly created in a frozen cgroup and will be
  frozen as well.
- The initial accounting jitter experienced by process supervisors and
  daemons is eliminated with this.
- Threaded applications or even thread implementations can choose to
  create a specific cgroup layout where each thread is spawned
  directly into a dedicated cgroup.

This feature is limited to the unified hierarchy. Callers need to pass
a directory file descriptor for the target cgroup. The caller can
choose to pass an O_PATH file descriptor. All usual migration
restrictions apply, i.e. there can be no processes in inner nodes. In
general, creating a process directly in a target cgroup adheres to all
migration restrictions.

One of the biggest advantages of this feature is that CLONE_INTO_GROUP does
not need to grab the write side of the cgroup cgroup_threadgroup_rwsem.
This global lock makes moving tasks/threads around super expensive. With
clone3() this lock is avoided.

Cc: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: cgroups@vger.kernel.org
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2020-02-12 17:57:51 -05:00
Christian Brauner
f3553220d4 cgroup: add cgroup_may_write() helper
Add a cgroup_may_write() helper which we can use in the
CLONE_INTO_CGROUP patch series to verify that we can write to the
destination cgroup.

Cc: Tejun Heo <tj@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: cgroups@vger.kernel.org
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2020-02-12 17:57:51 -05:00
Christian Brauner
5a5cf5cb30 cgroup: refactor fork helpers
This refactors the fork helpers so they can be easily modified in the
next patches. The patch just moves the cgroup threadgroup rwsem grab and
release into the helpers. They don't need to be directly exposed in fork.c.

Cc: Tejun Heo <tj@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: cgroups@vger.kernel.org
Acked-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2020-02-12 17:57:51 -05:00
Christian Brauner
17703097f3 cgroup: add cgroup_get_from_file() helper
Add a helper cgroup_get_from_file(). The helper will be used in
subsequent patches to retrieve a cgroup while holding a reference to the
struct file it was taken from.

Cc: Tejun Heo <tj@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: cgroups@vger.kernel.org
Acked-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2020-02-12 17:57:51 -05:00
Christian Brauner
6df970e4f5 cgroup: unify attach permission checking
The core codepaths to check whether a process can be attached to a
cgroup are the same for threads and thread-group leaders. Only a small
piece of code verifying that source and destination cgroup are in the
same domain differentiates the thread permission checking from
thread-group leader permission checking.
Since cgroup_migrate_vet_dst() only matters cgroup2 - it is a noop on
cgroup1 - we can move it out of cgroup_attach_task().
All checks can now be consolidated into a new helper
cgroup_attach_permissions() callable from both cgroup_procs_write() and
cgroup_threads_write().

Cc: Tejun Heo <tj@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: cgroups@vger.kernel.org
Acked-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2020-02-12 17:57:51 -05:00
Madhuparna Bhowmik
3010c5b9f5 cgroup.c: Use built-in RCU list checking
list_for_each_entry_rcu has built-in RCU and lock checking.
Pass cond argument to list_for_each_entry_rcu() to silence
false lockdep warning when  CONFIG_PROVE_RCU_LIST is enabled
by default.

Even though the function css_next_child() already checks if
cgroup_mutex or rcu_read_lock() is held using
cgroup_assert_mutex_or_rcu_locked(), there is a need to pass
cond to list_for_each_entry_rcu() to avoid false positive
lockdep warning.

Signed-off-by: Madhuparna Bhowmik <madhuparnabhowmik10@gmail.com>
Acked-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2020-02-12 17:12:22 -05:00
Michal Koutný
f43caa2adc cgroup: Clean up css_set task traversal
css_task_iter stores pointer to head of each iterable list, this dates
back to commit 0f0a2b4fa6 ("cgroup: reorganize css_task_iter") when we
did not store cur_cset. Let us utilize list heads directly in cur_cset
and streamline css_task_iter_advance_css_set a bit. This is no
intentional function change.

Signed-off-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2020-02-12 17:11:52 -05:00
Michal Koutný
9c974c7724 cgroup: Iterate tasks that did not finish do_exit()
PF_EXITING is set earlier than actual removal from css_set when a task
is exitting. This can confuse cgroup.procs readers who see no PF_EXITING
tasks, however, rmdir is checking against css_set membership so it can
transitionally fail with EBUSY.

Fix this by listing tasks that weren't unlinked from css_set active
lists.
It may happen that other users of the task iterator (without
CSS_TASK_ITER_PROCS) spot a PF_EXITING task before cgroup_exit(). This
is equal to the state before commit c03cd7738a ("cgroup: Include dying
leaders with live threads in PROCS iterations") but it may be reviewed
later.

Reported-by: Suren Baghdasaryan <surenb@google.com>
Fixes: c03cd7738a ("cgroup: Include dying leaders with live threads in PROCS iterations")
Signed-off-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2020-02-12 17:02:53 -05:00
Vasily Averin
2d4ecb030d cgroup: cgroup_procs_next should increase position index
If seq_file .next fuction does not change position index,
read after some lseek can generate unexpected output:

1) dd bs=1 skip output of each 2nd elements
$ dd if=/sys/fs/cgroup/cgroup.procs bs=8 count=1
2
3
4
5
1+0 records in
1+0 records out
8 bytes copied, 0,000267297 s, 29,9 kB/s
[test@localhost ~]$ dd if=/sys/fs/cgroup/cgroup.procs bs=1 count=8
2
4 <<< NB! 3 was skipped
6 <<<    ... and 5 too
8 <<<    ... and 7
8+0 records in
8+0 records out
8 bytes copied, 5,2123e-05 s, 153 kB/s

 This happen because __cgroup_procs_start() makes an extra
 extra cgroup_procs_next() call

2) read after lseek beyond end of file generates whole last line.
3) read after lseek into middle of last line generates
expected rest of last line and unexpected whole line once again.

Additionally patch removes an extra position index changes in
__cgroup_procs_start()

Cc: stable@vger.kernel.org
https://bugzilla.kernel.org/show_bug.cgi?id=206283
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2020-02-12 17:00:04 -05:00
Linus Torvalds
0a679e13ea Merge branch 'for-5.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fix from Tejun Heo:
 "I made a mistake while removing cgroup task list lazy init
  optimization making the root cgroup.procs show entries for the
  init_tasks. The zero entries doesn't cause critical failures but does
  make systemd print out warning messages during boot.

  Fix it by omitting init_tasks as they should be"

* 'for-5.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: init_tasks shouldn't be linked to the root cgroup
2020-02-10 17:07:05 -08:00
Linus Torvalds
c9d35ee049 Merge branch 'merge.nfs-fs_parse.1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs file system parameter updates from Al Viro:
 "Saner fs_parser.c guts and data structures. The system-wide registry
  of syntax types (string/enum/int32/oct32/.../etc.) is gone and so is
  the horror switch() in fs_parse() that would have to grow another case
  every time something got added to that system-wide registry.

  New syntax types can be added by filesystems easily now, and their
  namespace is that of functions - not of system-wide enum members. IOW,
  they can be shared or kept private and if some turn out to be widely
  useful, we can make them common library helpers, etc., without having
  to do anything whatsoever to fs_parse() itself.

  And we already get that kind of requests - the thing that finally
  pushed me into doing that was "oh, and let's add one for timeouts -
  things like 15s or 2h". If some filesystem really wants that, let them
  do it. Without somebody having to play gatekeeper for the variants
  blessed by direct support in fs_parse(), TYVM.

  Quite a bit of boilerplate is gone. And IMO the data structures make a
  lot more sense now. -200LoC, while we are at it"

* 'merge.nfs-fs_parse.1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (25 commits)
  tmpfs: switch to use of invalfc()
  cgroup1: switch to use of errorfc() et.al.
  procfs: switch to use of invalfc()
  hugetlbfs: switch to use of invalfc()
  cramfs: switch to use of errofc() et.al.
  gfs2: switch to use of errorfc() et.al.
  fuse: switch to use errorfc() et.al.
  ceph: use errorfc() and friends instead of spelling the prefix out
  prefix-handling analogues of errorf() and friends
  turn fs_param_is_... into functions
  fs_parse: handle optional arguments sanely
  fs_parse: fold fs_parameter_desc/fs_parameter_spec
  fs_parser: remove fs_parameter_description name field
  add prefix to fs_context->log
  ceph_parse_param(), ceph_parse_mon_ips(): switch to passing fc_log
  new primitive: __fs_parse()
  switch rbd and libceph to p_log-based primitives
  struct p_log, variants of warnf() et.al. taking that one instead
  teach logfc() to handle prefices, give it saner calling conventions
  get rid of cg_invalf()
  ...
2020-02-08 13:26:41 -08:00
Al Viro
d7167b1499 fs_parse: fold fs_parameter_desc/fs_parameter_spec
The former contains nothing but a pointer to an array of the latter...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-02-07 14:48:37 -05:00
Eric Sandeen
96cafb9ccb fs_parser: remove fs_parameter_description name field
Unused now.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-02-07 14:48:36 -05:00
Tejun Heo
0cd9d33ace cgroup: init_tasks shouldn't be linked to the root cgroup
5153faac18 ("cgroup: remove cgroup_enable_task_cg_lists()
optimization") removed lazy initialization of css_sets so that new
tasks are always lniked to its css_set. In the process, it incorrectly
ended up adding init_tasks to root css_set. They show up as PID 0's in
root's cgroup.procs triggering warnings in systemd and generally
confusing people.

Fix it by skip css_set linking for init_tasks.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: https://github.com/joanbm
Link: https://github.com/systemd/systemd/issues/14682
Fixes: 5153faac18 ("cgroup: remove cgroup_enable_task_cg_lists() optimization")
Cc: stable@vger.kernel.org # v5.5+
2020-01-30 11:37:33 -05:00
Linus Torvalds
bd2463ac7d Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from David Miller:

 1) Add WireGuard

 2) Add HE and TWT support to ath11k driver, from John Crispin.

 3) Add ESP in TCP encapsulation support, from Sabrina Dubroca.

 4) Add variable window congestion control to TIPC, from Jon Maloy.

 5) Add BCM84881 PHY driver, from Russell King.

 6) Start adding netlink support for ethtool operations, from Michal
    Kubecek.

 7) Add XDP drop and TX action support to ena driver, from Sameeh
    Jubran.

 8) Add new ipv4 route notifications so that mlxsw driver does not have
    to handle identical routes itself. From Ido Schimmel.

 9) Add BPF dynamic program extensions, from Alexei Starovoitov.

10) Support RX and TX timestamping in igc, from Vinicius Costa Gomes.

11) Add support for macsec HW offloading, from Antoine Tenart.

12) Add initial support for MPTCP protocol, from Christoph Paasch,
    Matthieu Baerts, Florian Westphal, Peter Krystad, and many others.

13) Add Octeontx2 PF support, from Sunil Goutham, Geetha sowjanya, Linu
    Cherian, and others.

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1469 commits)
  net: phy: add default ARCH_BCM_IPROC for MDIO_BCM_IPROC
  udp: segment looped gso packets correctly
  netem: change mailing list
  qed: FW 8.42.2.0 debug features
  qed: rt init valid initialization changed
  qed: Debug feature: ilt and mdump
  qed: FW 8.42.2.0 Add fw overlay feature
  qed: FW 8.42.2.0 HSI changes
  qed: FW 8.42.2.0 iscsi/fcoe changes
  qed: Add abstraction for different hsi values per chip
  qed: FW 8.42.2.0 Additional ll2 type
  qed: Use dmae to write to widebus registers in fw_funcs
  qed: FW 8.42.2.0 Parser offsets modified
  qed: FW 8.42.2.0 Queue Manager changes
  qed: FW 8.42.2.0 Expose new registers and change windows
  qed: FW 8.42.2.0 Internal ram offsets modifications
  MAINTAINERS: Add entry for Marvell OcteonTX2 Physical Function driver
  Documentation: net: octeontx2: Add RVU HW and drivers overview
  octeontx2-pf: ethtool RSS config support
  octeontx2-pf: Add basic ethtool support
  ...
2020-01-28 16:02:33 -08:00
Michal Koutný
3bc0bb36fa cgroup: Prevent double killing of css when enabling threaded cgroup
The test_cgcore_no_internal_process_constraint_on_threads selftest when
running with subsystem controlling noise triggers two warnings:

> [  597.443115] WARNING: CPU: 1 PID: 28167 at kernel/cgroup/cgroup.c:3131 cgroup_apply_control_enable+0xe0/0x3f0
> [  597.443413] WARNING: CPU: 1 PID: 28167 at kernel/cgroup/cgroup.c:3177 cgroup_apply_control_disable+0xa6/0x160

Both stem from a call to cgroup_type_write. The first warning was also
triggered by syzkaller.

When we're switching cgroup to threaded mode shortly after a subsystem
was disabled on it, we can see the respective subsystem css dying there.

The warning in cgroup_apply_control_enable is harmless in this case
since we're not adding new subsys anyway.
The warning in cgroup_apply_control_disable indicates an attempt to kill
css of recently disabled subsystem repeatedly.

The commit prevents these situations by making cgroup_type_write wait
for all dying csses to go away before re-applying subtree controls.
When at it, the locations of WARN_ON_ONCE calls are moved so that
warning is triggered only when we are about to misuse the dying css.

Reported-by: syzbot+5493b2a54d31d6aea629@syzkaller.appspotmail.com
Reported-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2020-01-15 08:04:29 -08:00
Andrey Ignatov
7dd68b3279 bpf: Support replacing cgroup-bpf program in MULTI mode
The common use-case in production is to have multiple cgroup-bpf
programs per attach type that cover multiple use-cases. Such programs
are attached with BPF_F_ALLOW_MULTI and can be maintained by different
people.

Order of programs usually matters, for example imagine two egress
programs: the first one drops packets and the second one counts packets.
If they're swapped the result of counting program will be different.

It brings operational challenges with updating cgroup-bpf program(s)
attached with BPF_F_ALLOW_MULTI since there is no way to replace a
program:

* One way to update is to detach all programs first and then attach the
  new version(s) again in the right order. This introduces an
  interruption in the work a program is doing and may not be acceptable
  (e.g. if it's egress firewall);

* Another way is attach the new version of a program first and only then
  detach the old version. This introduces the time interval when two
  versions of same program are working, what may not be acceptable if a
  program is not idempotent. It also imposes additional burden on
  program developers to make sure that two versions of their program can
  co-exist.

Solve the problem by introducing a "replace" mode in BPF_PROG_ATTACH
command for cgroup-bpf programs being attached with BPF_F_ALLOW_MULTI
flag. This mode is enabled by newly introduced BPF_F_REPLACE attach flag
and bpf_attr.replace_bpf_fd attribute to pass fd of the old program to
replace

That way user can replace any program among those attached with
BPF_F_ALLOW_MULTI flag without the problems described above.

Details of the new API:

* If BPF_F_REPLACE is set but replace_bpf_fd doesn't have valid
  descriptor of BPF program, BPF_PROG_ATTACH will return corresponding
  error (EINVAL or EBADF).

* If replace_bpf_fd has valid descriptor of BPF program but such a
  program is not attached to specified cgroup, BPF_PROG_ATTACH will
  return ENOENT.

BPF_F_REPLACE is introduced to make the user intent clear, since
replace_bpf_fd alone can't be used for this (its default value, 0, is a
valid fd). BPF_F_REPLACE also makes it possible to extend the API in the
future (e.g. add BPF_F_BEFORE and BPF_F_AFTER if needed).

Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Andrii Narkyiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/30cd850044a0057bdfcaaf154b7d2f39850ba813.1576741281.git.rdna@fb.com
2019-12-19 21:22:25 -08:00
Linus Torvalds
1b96a41b42 Merge branch 'for-5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:
 "There are several notable changes here:

   - Single thread migrating itself has been optimized so that it
     doesn't need threadgroup rwsem anymore.

   - Freezer optimization to avoid unnecessary frozen state changes.

   - cgroup ID unification so that cgroup fs ino is the only unique ID
     used for the cgroup and can be used to directly look up live
     cgroups through filehandle interface on 64bit ino archs. On 32bit
     archs, cgroup fs ino is still the only ID in use but it is only
     unique when combined with gen.

   - selftest and other changes"

* 'for-5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (24 commits)
  writeback: fix -Wformat compilation warnings
  docs: cgroup: mm: Fix spelling of "list"
  cgroup: fix incorrect WARN_ON_ONCE() in cgroup_setup_root()
  cgroup: use cgrp->kn->id as the cgroup ID
  kernfs: use 64bit inos if ino_t is 64bit
  kernfs: implement custom exportfs ops and fid type
  kernfs: combine ino/id lookup functions into kernfs_find_and_get_node_by_id()
  kernfs: convert kernfs_node->id from union kernfs_node_id to u64
  kernfs: kernfs_find_and_get_node_by_ino() should only look up activated nodes
  kernfs: use dumber locking for kernfs_find_and_get_node_by_ino()
  netprio: use css ID instead of cgroup ID
  writeback: use ino_t for inodes in tracepoints
  kernfs: fix ino wrap-around detection
  kselftests: cgroup: Avoid the reuse of fd after it is deallocated
  cgroup: freezer: don't change task and cgroups status unnecessarily
  cgroup: use cgroup->last_bstat instead of cgroup->bstat_pending for consistency
  cgroup: remove cgroup_enable_task_cg_lists() optimization
  cgroup: pids: use atomic64_t for pids->limit
  selftests: cgroup: Run test_core under interfering stress
  selftests: cgroup: Add task migration tests
  ...
2019-11-25 19:23:46 -08:00