Commit Graph

665 Commits

Author SHA1 Message Date
Greg Kroah-Hartman
d221da1d6f Merge d04937ae94 ("x86/speculation: Warn about eIBRS + LFENCE + Unprivileged eBPF + SMT") into android12-5.10-lts
Steps on the way to 5.10.105

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I76951de21f6efca47dab5f20ad20d588f46729d0
2022-03-14 19:56:31 +01:00
Josh Poimboeuf
afc2d635b5 x86/speculation: Include unprivileged eBPF status in Spectre v2 mitigation reporting
commit 44a3918c8245ab10c6c9719dd12e7a8d291980d8 upstream.

With unprivileged eBPF enabled, eIBRS (without retpoline) is vulnerable
to Spectre v2 BHB-based attacks.

When both are enabled, print a warning message and report it in the
'spectre_v2' sysfs vulnerabilities file.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
[fllinden@amazon.com: backported to 5.10]
Signed-off-by: Frank van der Linden <fllinden@amazon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-11 12:11:49 +01:00
Greg Kroah-Hartman
a1bb21475e Merge 5.10.90 into android12-5.10-lts
Changes in 5.10.90
	Input: i8042 - add deferred probe support
	Input: i8042 - enable deferred probe quirk for ASUS UM325UA
	tomoyo: Check exceeded quota early in tomoyo_domain_quota_is_ok().
	tomoyo: use hwight16() in tomoyo_domain_quota_is_ok()
	parisc: Clear stale IIR value on instruction access rights trap
	platform/x86: apple-gmux: use resource_size() with res
	memblock: fix memblock_phys_alloc() section mismatch error
	recordmcount.pl: fix typo in s390 mcount regex
	selinux: initialize proto variable in selinux_ip_postroute_compat()
	scsi: lpfc: Terminate string in lpfc_debugfs_nvmeio_trc_write()
	net/mlx5: DR, Fix NULL vs IS_ERR checking in dr_domain_init_resources
	net/mlx5e: Wrap the tx reporter dump callback to extract the sq
	net/mlx5e: Fix ICOSQ recovery flow for XSK
	udp: using datalen to cap ipv6 udp max gso segments
	selftests: Calculate udpgso segment count without header adjustment
	sctp: use call_rcu to free endpoint
	net/smc: fix using of uninitialized completions
	net: usb: pegasus: Do not drop long Ethernet frames
	net: ag71xx: Fix a potential double free in error handling paths
	net: lantiq_xrx200: fix statistics of received bytes
	NFC: st21nfca: Fix memory leak in device probe and remove
	net/smc: improved fix wait on already cleared link
	net/smc: don't send CDC/LLC message if link not ready
	net/smc: fix kernel panic caused by race of smc_sock
	igc: Fix TX timestamp support for non-MSI-X platforms
	ionic: Initialize the 'lif->dbid_inuse' bitmap
	net/mlx5e: Fix wrong features assignment in case of error
	selftests/net: udpgso_bench_tx: fix dst ip argument
	net/ncsi: check for error return from call to nla_put_u32
	fsl/fman: Fix missing put_device() call in fman_port_probe
	i2c: validate user data in compat ioctl
	nfc: uapi: use kernel size_t to fix user-space builds
	uapi: fix linux/nfc.h userspace compilation errors
	drm/amdgpu: When the VCN(1.0) block is suspended, powergating is explicitly enabled
	drm/amdgpu: add support for IP discovery gc_info table v2
	xhci: Fresco FL1100 controller should not have BROKEN_MSI quirk set.
	usb: gadget: f_fs: Clear ffs_eventfd in ffs_data_clear.
	usb: mtu3: add memory barrier before set GPD's HWO
	usb: mtu3: fix list_head check warning
	usb: mtu3: set interval of FS intr and isoc endpoint
	binder: fix async_free_space accounting for empty parcels
	scsi: vmw_pvscsi: Set residual data length conditionally
	Input: appletouch - initialize work before device registration
	Input: spaceball - fix parsing of movement data packets
	net: fix use-after-free in tw_timer_handler
	perf script: Fix CPU filtering of a script's switch events
	bpf: Add kconfig knob for disabling unpriv bpf by default
	Linux 5.10.90

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I299d1e939d3b01b5d6f34f7b9ec701d624bbfde3
2022-01-05 13:23:32 +01:00
Daniel Borkmann
8c15bfb36a bpf: Add kconfig knob for disabling unpriv bpf by default
commit 08389d888287c3823f80b0216766b71e17f0aba5 upstream.

Add a kconfig knob which allows for unprivileged bpf to be disabled by default.
If set, the knob sets /proc/sys/kernel/unprivileged_bpf_disabled to value of 2.

This still allows a transition of 2 -> {0,1} through an admin. Similarly,
this also still keeps 1 -> {1} behavior intact, so that once set to permanently
disabled, it cannot be undone aside from a reboot.

We've also added extra2 with max of 2 for the procfs handler, so that an admin
still has a chance to toggle between 0 <-> 2.

Either way, as an additional alternative, applications can make use of CAP_BPF
that we added a while ago.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/74ec548079189e4e4dffaeb42b8987bb3c852eee.1620765074.git.daniel@iogearbox.net
Cc: Salvatore Bonaccorso <carnil@debian.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-01-05 12:40:34 +01:00
Charan Teja Reddy
71fdbce075 FROMLIST: mm: compaction: support triggering of proactive compaction by user
The proactive compaction[1] gets triggered for every 500msec and run
compaction on the node for COMPACTION_HPAGE_ORDER (usually order-9)
pages based on the value set to sysctl.compaction_proactiveness.
Triggering the compaction for every 500msec in search of
COMPACTION_HPAGE_ORDER pages is not needed for all applications,
especially on the embedded system usecases which may have few MB's of
RAM. Enabling the proactive compaction in its state will endup in
running almost always on such systems.

Other side, proactive compaction can still be very much useful for
getting a set of higher order pages in some controllable
manner(controlled by using the sysctl.compaction_proactiveness). Thus on
systems where enabling the proactive compaction always may proove not
required, can trigger the same from user space on write to its sysctl
interface. As an example, say app launcher decide to launch the memory
heavy application which can be launched fast if it gets more higher
order pages thus launcher can prepare the system in advance by
triggering the proactive compaction from userspace.

This triggering of proactive compaction is done on a write to
sysctl.compaction_proactiveness by user.

[1]https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit?id=facdaa917c4d5a376d09d25865f5a863f906234a

Bug: 186387247
Link: https://lore.kernel.org/patchwork/patch/1438211/
Signed-off-by: Charan Teja Reddy <charante@codeaurora.org>
Change-Id: Ie5208e274b9d7e7354471bb98ff1f10becf93595
2021-06-17 14:15:58 -07:00
Greg Kroah-Hartman
3ccfc59f82 Merge 5.10.24 into android12-5.10-lts
Changes in 5.10.24
	uapi: nfnetlink_cthelper.h: fix userspace compilation error
	powerpc/perf: Fix handling of privilege level checks in perf interrupt context
	powerpc/pseries: Don't enforce MSI affinity with kdump
	ethernet: alx: fix order of calls on resume
	crypto: mips/poly1305 - enable for all MIPS processors
	ath9k: fix transmitting to stations in dynamic SMPS mode
	net: Fix gro aggregation for udp encaps with zero csum
	net: check if protocol extracted by virtio_net_hdr_set_proto is correct
	net: avoid infinite loop in mpls_gso_segment when mpls_hlen == 0
	net: l2tp: reduce log level of messages in receive path, add counter instead
	can: skb: can_skb_set_owner(): fix ref counting if socket was closed before setting skb ownership
	can: flexcan: assert FRZ bit in flexcan_chip_freeze()
	can: flexcan: enable RX FIFO after FRZ/HALT valid
	can: flexcan: invoke flexcan_chip_freeze() to enter freeze mode
	can: tcan4x5x: tcan4x5x_init(): fix initialization - clear MRAM before entering Normal Mode
	tcp: Fix sign comparison bug in getsockopt(TCP_ZEROCOPY_RECEIVE)
	tcp: add sanity tests to TCP_QUEUE_SEQ
	netfilter: nf_nat: undo erroneous tcp edemux lookup
	netfilter: x_tables: gpf inside xt_find_revision()
	net: always use icmp{,v6}_ndo_send from ndo_start_xmit
	net: phy: fix save wrong speed and duplex problem if autoneg is on
	selftests/bpf: Use the last page in test_snprintf_btf on s390
	selftests/bpf: No need to drop the packet when there is no geneve opt
	selftests/bpf: Mask bpf_csum_diff() return value to 16 bits in test_verifier
	samples, bpf: Add missing munmap in xdpsock
	libbpf: Clear map_info before each bpf_obj_get_info_by_fd
	ibmvnic: Fix possibly uninitialized old_num_tx_queues variable warning.
	ibmvnic: always store valid MAC address
	mt76: dma: do not report truncated frames to mac80211
	powerpc/603: Fix protection of user pages mapped with PROT_NONE
	mount: fix mounting of detached mounts onto targets that reside on shared mounts
	cifs: return proper error code in statfs(2)
	Revert "mm, slub: consider rest of partial list if acquire_slab() fails"
	docs: networking: drop special stable handling
	net: dsa: tag_rtl4_a: fix egress tags
	sh_eth: fix TRSCER mask for SH771x
	net: enetc: don't overwrite the RSS indirection table when initializing
	net: enetc: take the MDIO lock only once per NAPI poll cycle
	net: enetc: fix incorrect TPID when receiving 802.1ad tagged packets
	net: enetc: don't disable VLAN filtering in IFF_PROMISC mode
	net: enetc: force the RGMII speed and duplex instead of operating in inband mode
	net: enetc: remove bogus write to SIRXIDR from enetc_setup_rxbdr
	net: enetc: keep RX ring consumer index in sync with hardware
	net: ethernet: mtk-star-emac: fix wrong unmap in RX handling
	net/mlx4_en: update moderation when config reset
	net: stmmac: fix incorrect DMA channel intr enable setting of EQoS v4.10
	nexthop: Do not flush blackhole nexthops when loopback goes down
	net: sched: avoid duplicates in classes dump
	net: mscc: ocelot: properly reject destination IP keys in VCAP IS1
	net: dsa: sja1105: fix SGMII PCS being forced to SPEED_UNKNOWN instead of SPEED_10
	net: usb: qmi_wwan: allow qmimux add/del with master up
	netdevsim: init u64 stats for 32bit hardware
	cipso,calipso: resolve a number of problems with the DOI refcounts
	net: stmmac: Fix VLAN filter delete timeout issue in Intel mGBE SGMII
	stmmac: intel: Fixes clock registration error seen for multiple interfaces
	net: lapbether: Remove netif_start_queue / netif_stop_queue
	net: davicom: Fix regulator not turned off on failed probe
	net: davicom: Fix regulator not turned off on driver removal
	net: enetc: allow hardware timestamping on TX queues with tc-etf enabled
	net: qrtr: fix error return code of qrtr_sendmsg()
	s390/qeth: fix memory leak after failed TX Buffer allocation
	r8169: fix r8168fp_adjust_ocp_cmd function
	ixgbe: fail to create xfrm offload of IPsec tunnel mode SA
	tools/resolve_btfids: Fix build error with older host toolchains
	perf build: Fix ccache usage in $(CC) when generating arch errno table
	net: stmmac: stop each tx channel independently
	net: stmmac: fix watchdog timeout during suspend/resume stress test
	net: stmmac: fix wrongly set buffer2 valid when sph unsupport
	ethtool: fix the check logic of at least one channel for RX/TX
	net: phy: make mdio_bus_phy_suspend/resume as __maybe_unused
	selftests: forwarding: Fix race condition in mirror installation
	mlxsw: spectrum_ethtool: Add an external speed to PTYS register
	perf traceevent: Ensure read cmdlines are null terminated.
	perf report: Fix -F for branch & mem modes
	net: hns3: fix query vlan mask value error for flow director
	net: hns3: fix bug when calculating the TCAM table info
	s390/cio: return -EFAULT if copy_to_user() fails again
	bnxt_en: reliably allocate IRQ table on reset to avoid crash
	gpiolib: acpi: Add ACPI_GPIO_QUIRK_ABSOLUTE_NUMBER quirk
	gpiolib: acpi: Allow to find GpioInt() resource by name and index
	gpio: pca953x: Set IRQ type when handle Intel Galileo Gen 2
	gpio: fix gpio-device list corruption
	drm/compat: Clear bounce structures
	drm/amd/display: Add a backlight module option
	drm/amdgpu/display: use GFP_ATOMIC in dcn21_validate_bandwidth_fp()
	drm/amd/display: Fix nested FPU context in dcn21_validate_bandwidth()
	drm/amd/pm: bug fix for pcie dpm
	drm/amdgpu/display: simplify backlight setting
	drm/amdgpu/display: don't assert in set backlight function
	drm/amdgpu/display: handle aux backlight in backlight_get_brightness
	drm/shmem-helper: Check for purged buffers in fault handler
	drm/shmem-helper: Don't remove the offset in vm_area_struct pgoff
	drm: Use USB controller's DMA mask when importing dmabufs
	drm: meson_drv add shutdown function
	drm/shmem-helpers: vunmap: Don't put pages for dma-buf
	drm/i915: Wedge the GPU if command parser setup fails
	s390/cio: return -EFAULT if copy_to_user() fails
	s390/crypto: return -EFAULT if copy_to_user() fails
	qxl: Fix uninitialised struct field head.surface_id
	sh_eth: fix TRSCER mask for R7S9210
	media: usbtv: Fix deadlock on suspend
	media: rkisp1: params: fix wrong bits settings
	media: v4l: vsp1: Fix uif null pointer access
	media: v4l: vsp1: Fix bru null pointer access
	media: rc: compile rc-cec.c into rc-core
	cifs: fix credit accounting for extra channel
	net: hns3: fix error mask definition of flow director
	s390/qeth: don't replace a fully completed async TX buffer
	s390/qeth: remove QETH_QDIO_BUF_HANDLED_DELAYED state
	s390/qeth: improve completion of pending TX buffers
	s390/qeth: fix notification for pending buffers during teardown
	net: dsa: implement a central TX reallocation procedure
	net: dsa: tag_ksz: don't allocate additional memory for padding/tagging
	net: dsa: trailer: don't allocate additional memory for padding/tagging
	net: dsa: tag_qca: let DSA core deal with TX reallocation
	net: dsa: tag_ocelot: let DSA core deal with TX reallocation
	net: dsa: tag_mtk: let DSA core deal with TX reallocation
	net: dsa: tag_lan9303: let DSA core deal with TX reallocation
	net: dsa: tag_edsa: let DSA core deal with TX reallocation
	net: dsa: tag_brcm: let DSA core deal with TX reallocation
	net: dsa: tag_dsa: let DSA core deal with TX reallocation
	net: dsa: tag_gswip: let DSA core deal with TX reallocation
	net: dsa: tag_ar9331: let DSA core deal with TX reallocation
	net: dsa: tag_mtk: fix 802.1ad VLAN egress
	enetc: Fix unused var build warning for CONFIG_OF
	net: enetc: initialize RFS/RSS memories for unused ports too
	ath11k: peer delete synchronization with firmware
	ath11k: start vdev if a bss peer is already created
	ath11k: fix AP mode for QCA6390
	i2c: rcar: faster irq code to minimize HW race condition
	i2c: rcar: optimize cacheline to minimize HW race condition
	scsi: ufs: WB is only available on LUN #0 to #7
	udf: fix silent AED tagLocation corruption
	iommu/vt-d: Clear PRQ overflow only when PRQ is empty
	mmc: mxs-mmc: Fix a resource leak in an error handling path in 'mxs_mmc_probe()'
	mmc: mediatek: fix race condition between msdc_request_timeout and irq
	mmc: sdhci-iproc: Add ACPI bindings for the RPi
	Platform: OLPC: Fix probe error handling
	powerpc/pci: Add ppc_md.discover_phbs()
	spi: stm32: make spurious and overrun interrupts visible
	powerpc: improve handling of unrecoverable system reset
	powerpc/perf: Record counter overflow always if SAMPLE_IP is unset
	HID: logitech-dj: add support for the new lightspeed connection iteration
	powerpc/64: Fix stack trace not displaying final frame
	iommu/amd: Fix performance counter initialization
	clk: qcom: gdsc: Implement NO_RET_PERIPH flag
	sparc32: Limit memblock allocation to low memory
	sparc64: Use arch_validate_flags() to validate ADI flag
	Input: applespi - don't wait for responses to commands indefinitely.
	PCI: xgene-msi: Fix race in installing chained irq handler
	PCI: mediatek: Add missing of_node_put() to fix reference leak
	drivers/base: build kunit tests without structleak plugin
	PCI/LINK: Remove bandwidth notification
	ext4: don't try to processed freed blocks until mballoc is initialized
	kbuild: clamp SUBLEVEL to 255
	PCI: Fix pci_register_io_range() memory leak
	i40e: Fix memory leak in i40e_probe
	kasan: fix memory corruption in kasan_bitops_tags test
	s390/smp: __smp_rescan_cpus() - move cpumask away from stack
	drivers/base/memory: don't store phys_device in memory blocks
	sysctl.c: fix underflow value setting risk in vm_table
	scsi: libiscsi: Fix iscsi_prep_scsi_cmd_pdu() error handling
	scsi: target: core: Add cmd length set before cmd complete
	scsi: target: core: Prevent underflow for service actions
	clk: qcom: gpucc-msm8998: Add resets, cxc, fix flags on gpu_gx_gdsc
	mmc: sdhci: Update firmware interface API
	ARM: 9029/1: Make iwmmxt.S support Clang's integrated assembler
	ARM: assembler: introduce adr_l, ldr_l and str_l macros
	ARM: efistub: replace adrl pseudo-op with adr_l macro invocation
	ALSA: usb: Add Plantronics C320-M USB ctrl msg delay quirk
	ALSA: hda/hdmi: Cancel pending works before suspend
	ALSA: hda/conexant: Add quirk for mute LED control on HP ZBook G5
	ALSA: hda/ca0132: Add Sound BlasterX AE-5 Plus support
	ALSA: hda: Drop the BATCH workaround for AMD controllers
	ALSA: hda: Flush pending unsolicited events before suspend
	ALSA: hda: Avoid spurious unsol event handling during S3/S4
	ALSA: usb-audio: Fix "cannot get freq eq" errors on Dell AE515 sound bar
	ALSA: usb-audio: Apply the control quirk to Plantronics headsets
	ALSA: usb-audio: Disable USB autosuspend properly in setup_disable_autosuspend()
	ALSA: usb-audio: fix NULL ptr dereference in usb_audio_probe
	ALSA: usb-audio: fix use after free in usb_audio_disconnect
	Revert 95ebabde382c ("capabilities: Don't allow writing ambiguous v3 file capabilities")
	block: Discard page cache of zone reset target range
	block: Try to handle busy underlying device on discard
	arm64: kasan: fix page_alloc tagging with DEBUG_VIRTUAL
	arm64: mte: Map hotplugged memory as Normal Tagged
	arm64: perf: Fix 64-bit event counter read truncation
	s390/dasd: fix hanging DASD driver unbind
	s390/dasd: fix hanging IO request during DASD driver unbind
	software node: Fix node registration
	xen/events: reset affinity of 2-level event when tearing it down
	mmc: mmci: Add MMC_CAP_NEED_RSP_BUSY for the stm32 variants
	mmc: core: Fix partition switch time for eMMC
	mmc: cqhci: Fix random crash when remove mmc module/card
	cifs: do not send close in compound create+close requests
	Goodix Fingerprint device is not a modem
	USB: gadget: udc: s3c2410_udc: fix return value check in s3c2410_udc_probe()
	USB: gadget: u_ether: Fix a configfs return code
	usb: gadget: f_uac2: always increase endpoint max_packet_size by one audio slot
	usb: gadget: f_uac1: stop playback on function disable
	usb: dwc3: qcom: Add missing DWC3 OF node refcount decrement
	usb: dwc3: qcom: add URS Host support for sdm845 ACPI boot
	usb: dwc3: qcom: add ACPI device id for sc8180x
	usb: dwc3: qcom: Honor wakeup enabled/disabled state
	USB: usblp: fix a hang in poll() if disconnected
	usb: renesas_usbhs: Clear PIPECFG for re-enabling pipe with other EPNUM
	usb: xhci: do not perform Soft Retry for some xHCI hosts
	xhci: Improve detection of device initiated wake signal.
	usb: xhci: Fix ASMedia ASM1042A and ASM3242 DMA addressing
	xhci: Fix repeated xhci wake after suspend due to uncleared internal wake state
	USB: serial: io_edgeport: fix memory leak in edge_startup
	USB: serial: ch341: add new Product ID
	USB: serial: cp210x: add ID for Acuity Brands nLight Air Adapter
	USB: serial: cp210x: add some more GE USB IDs
	usbip: fix stub_dev to check for stream socket
	usbip: fix vhci_hcd to check for stream socket
	usbip: fix vudc to check for stream socket
	usbip: fix stub_dev usbip_sockfd_store() races leading to gpf
	usbip: fix vhci_hcd attach_store() races leading to gpf
	usbip: fix vudc usbip_sockfd_store races leading to gpf
	Revert "serial: max310x: rework RX interrupt handling"
	misc/pvpanic: Export module FDT device table
	misc: fastrpc: restrict user apps from sending kernel RPC messages
	staging: rtl8192u: fix ->ssid overflow in r8192_wx_set_scan()
	staging: rtl8188eu: prevent ->ssid overflow in rtw_wx_set_scan()
	staging: rtl8712: unterminated string leads to read overflow
	staging: rtl8188eu: fix potential memory corruption in rtw_check_beacon_data()
	staging: ks7010: prevent buffer overflow in ks_wlan_set_scan()
	staging: rtl8712: Fix possible buffer overflow in r8712_sitesurvey_cmd
	staging: rtl8192e: Fix possible buffer overflow in _rtl92e_wx_set_scan
	staging: comedi: addi_apci_1032: Fix endian problem for COS sample
	staging: comedi: addi_apci_1500: Fix endian problem for command sample
	staging: comedi: adv_pci1710: Fix endian problem for AI command data
	staging: comedi: das6402: Fix endian problem for AI command data
	staging: comedi: das800: Fix endian problem for AI command data
	staging: comedi: dmm32at: Fix endian problem for AI command data
	staging: comedi: me4000: Fix endian problem for AI command data
	staging: comedi: pcl711: Fix endian problem for AI command data
	staging: comedi: pcl818: Fix endian problem for AI command data
	sh_eth: fix TRSCER mask for R7S72100
	cpufreq: qcom-hw: fix dereferencing freed memory 'data'
	cpufreq: qcom-hw: Fix return value check in qcom_cpufreq_hw_cpu_init()
	arm64/mm: Fix pfn_valid() for ZONE_DEVICE based memory
	SUNRPC: Set memalloc_nofs_save() for sync tasks
	NFS: Don't revalidate the directory permissions on a lookup failure
	NFS: Don't gratuitously clear the inode cache when lookup failed
	NFSv4.2: fix return value of _nfs4_get_security_label()
	block: rsxx: fix error return code of rsxx_pci_probe()
	nvme-fc: fix racing controller reset and create association
	configfs: fix a use-after-free in __configfs_open_file
	arm64: mm: use a 48-bit ID map when possible on 52-bit VA builds
	perf/core: Flush PMU internal buffers for per-CPU events
	perf/x86/intel: Set PERF_ATTACH_SCHED_CB for large PEBS and LBR
	hrtimer: Update softirq_expires_next correctly after __hrtimer_get_next_event()
	powerpc/64s/exception: Clean up a missed SRR specifier
	seqlock,lockdep: Fix seqcount_latch_init()
	stop_machine: mark helpers __always_inline
	include/linux/sched/mm.h: use rcu_dereference in in_vfork()
	zram: fix return value on writeback_store
	linux/compiler-clang.h: define HAVE_BUILTIN_BSWAP*
	sched/membarrier: fix missing local execution of ipi_sync_rq_state()
	efi: stub: omit SetVirtualAddressMap() if marked unsupported in RT_PROP table
	powerpc/64s: Fix instruction encoding for lis in ppc_function_entry()
	powerpc: Fix inverted SET_FULL_REGS bitop
	powerpc: Fix missing declaration of [en/dis]able_kernel_vsx()
	binfmt_misc: fix possible deadlock in bm_register_write
	x86/unwind/orc: Disable KASAN checking in the ORC unwinder, part 2
	x86/sev-es: Introduce ip_within_syscall_gap() helper
	x86/sev-es: Check regs->sp is trusted before adjusting #VC IST stack
	x86/entry: Move nmi entry/exit into common code
	x86/sev-es: Correctly track IRQ states in runtime #VC handler
	x86/sev-es: Use __copy_from_user_inatomic()
	x86/entry: Fix entry/exit mismatch on failed fast 32-bit syscalls
	KVM: x86: Ensure deadline timer has truly expired before posting its IRQ
	KVM: kvmclock: Fix vCPUs > 64 can't be online/hotpluged
	KVM: arm64: Fix range alignment when walking page tables
	KVM: arm64: Avoid corrupting vCPU context register in guest exit
	KVM: arm64: nvhe: Save the SPE context early
	KVM: arm64: Reject VM creation when the default IPA size is unsupported
	KVM: arm64: Fix exclusive limit for IPA size
	mm/userfaultfd: fix memory corruption due to writeprotect
	mm/madvise: replace ptrace attach requirement for process_madvise
	KVM: arm64: Ensure I-cache isolation between vcpus of a same VM
	mm/page_alloc.c: refactor initialization of struct page for holes in memory layout
	xen/events: don't unmask an event channel when an eoi is pending
	xen/events: avoid handling the same event on two cpus at the same time
	KVM: arm64: Fix nVHE hyp panic host context restore
	RDMA/umem: Use ib_dma_max_seg_size instead of dma_get_max_seg_size
	Linux 5.10.24

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ie53a3c1963066a18d41357b6be41cff00690bd40
2021-03-19 09:42:56 +01:00
Lin Feng
f49bdac3e7 sysctl.c: fix underflow value setting risk in vm_table
[ Upstream commit 3b3376f222e3ab58367d9dd405cafd09d5e37b7c ]

Apart from subsystem specific .proc_handler handler, all ctl_tables with
extra1 and extra2 members set should use proc_dointvec_minmax instead of
proc_dointvec, or the limit set in extra* never work and potentially echo
underflow values(negative numbers) is likely make system unstable.

Especially vfs_cache_pressure and zone_reclaim_mode, -1 is apparently not
a valid value, but we can set to them.  And then kernel may crash.

# echo -1 > /proc/sys/vm/vfs_cache_pressure

Link: https://lkml.kernel.org/r/20201223105535.2875-1-linf@wangsu.com
Signed-off-by: Lin Feng <linf@wangsu.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-17 17:06:25 +01:00
Greg Kroah-Hartman
ee385f5df9 Merge 5.9-rc6 into android-mainline
Linux 5.9-rc6

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I3bccdbb773bfc2c604742e6ff5983bf0b61ba0b5
2020-09-21 12:13:45 +02:00
Linus Torvalds
5ef64cc898 mm: allow a controlled amount of unfairness in the page lock
Commit 2a9127fcf2 ("mm: rewrite wait_on_page_bit_common() logic") made
the page locking entirely fair, in that if a waiter came in while the
lock was held, the lock would be transferred to the lockers strictly in
order.

That was intended to finally get rid of the long-reported watchdog
failures that involved the page lock under extreme load, where a process
could end up waiting essentially forever, as other page lockers stole
the lock from under it.

It also improved some benchmarks, but it ended up causing huge
performance regressions on others, simply because fair lock behavior
doesn't end up giving out the lock as aggressively, causing better
worst-case latency, but potentially much worse average latencies and
throughput.

Instead of reverting that change entirely, this introduces a controlled
amount of unfairness, with a sysctl knob to tune it if somebody needs
to.  But the default value should hopefully be good for any normal load,
allowing a few rounds of lock stealing, but enforcing the strict
ordering before the lock has been stolen too many times.

There is also a hint from Matthieu Baerts that the fair page coloring
may end up exposing an ABBA deadlock that is hidden by the usual
optimistic lock stealing, and while the unfairness doesn't fix the
fundamental issue (and I'm still looking at that), it avoids it in
practice.

The amount of unfairness can be modified by writing a new value to the
'sysctl_page_lock_unfairness' variable (default value of 5, exposed
through /proc/sys/vm/page_lock_unfairness), but that is hopefully
something we'd use mainly for debugging rather than being necessary for
any deep system tuning.

This whole issue has exposed just how critical the page lock can be, and
how contended it gets under certain locks.  And the main contention
doesn't really seem to be anything related to IO (which was the origin
of this lock), but for things like just verifying that the page file
mapping is stable while faulting in the page into a page table.

Link: https://lore.kernel.org/linux-fsdevel/ed8442fd-6f54-dd84-cd4a-941e8b7ee603@MichaelLarabel.com/
Link: https://www.phoronix.com/scan.php?page=article&item=linux-50-59&num=1
Link: https://lore.kernel.org/linux-fsdevel/c560a38d-8313-51fb-b1ec-e904bd8836bc@tessares.net/
Reported-and-tested-by: Michael Larabel <Michael@michaellarabel.com>
Tested-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Chris Mason <clm@fb.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-09-17 10:26:41 -07:00
Greg Kroah-Hartman
3d3ef2a059 Merge 5.9-rc4 into android-mainline
Linux 5.9-rc4

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I3d041935cae5e8f3421edcdee4892f17e2c776ad
2020-09-07 09:24:58 +02:00
Tobias Klauser
7787b6fc93 bpf, sysctl: Let bpf_stats_handler take a kernel pointer buffer
Commit 32927393dc ("sysctl: pass kernel pointers to ->proc_handler")
changed ctl_table.proc_handler to take a kernel pointer. Adjust the
signature of bpf_stats_handler to match ctl_table.proc_handler which
fixes the following sparse warning:

kernel/sysctl.c:226:49: warning: incorrect type in argument 3 (different address spaces)
kernel/sysctl.c:226:49:    expected void *
kernel/sysctl.c:226:49:    got void [noderef] __user *buffer
kernel/sysctl.c:2640:35: warning: incorrect type in initializer (incompatible argument 3 (different address spaces))
kernel/sysctl.c:2640:35:    expected int ( [usertype] *proc_handler )( ... )
kernel/sysctl.c:2640:35:    got int ( * )( ... )

Fixes: 32927393dc ("sysctl: pass kernel pointers to ->proc_handler")
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/bpf/20200824142047.22043-1-tklauser@distanz.ch
2020-08-24 21:11:40 -07:00
Greg Kroah-Hartman
418b4bd4a0 Merge dc06fe51d2 ("Merge tag 'rtc-5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux") into android-mainline
Steps on the way to 5.9-rc1.

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Iceded779988ff472863b7e1c54e22a9fa6383a30
2020-08-13 09:09:55 +02:00
Nitin Gupta
d34c0a7599 mm: use unsigned types for fragmentation score
Proactive compaction uses per-node/zone "fragmentation score" which is
always in range [0, 100], so use unsigned type of these scores as well as
for related constants.

Signed-off-by: Nitin Gupta <nigupta@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Baoquan He <bhe@redhat.com>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Link: http://lkml.kernel.org/r/20200618010319.13159-1-nigupta@nvidia.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-12 10:57:56 -07:00
Nitin Gupta
facdaa917c mm: proactive compaction
For some applications, we need to allocate almost all memory as hugepages.
However, on a running system, higher-order allocations can fail if the
memory is fragmented.  Linux kernel currently does on-demand compaction as
we request more hugepages, but this style of compaction incurs very high
latency.  Experiments with one-time full memory compaction (followed by
hugepage allocations) show that kernel is able to restore a highly
fragmented memory state to a fairly compacted memory state within <1 sec
for a 32G system.  Such data suggests that a more proactive compaction can
help us allocate a large fraction of memory as hugepages keeping
allocation latencies low.

For a more proactive compaction, the approach taken here is to define a
new sysctl called 'vm.compaction_proactiveness' which dictates bounds for
external fragmentation which kcompactd tries to maintain.

The tunable takes a value in range [0, 100], with a default of 20.

Note that a previous version of this patch [1] was found to introduce too
many tunables (per-order extfrag{low, high}), but this one reduces them to
just one sysctl.  Also, the new tunable is an opaque value instead of
asking for specific bounds of "external fragmentation", which would have
been difficult to estimate.  The internal interpretation of this opaque
value allows for future fine-tuning.

Currently, we use a simple translation from this tunable to [low, high]
"fragmentation score" thresholds (low=100-proactiveness, high=low+10%).
The score for a node is defined as weighted mean of per-zone external
fragmentation.  A zone's present_pages determines its weight.

To periodically check per-node score, we reuse per-node kcompactd threads,
which are woken up every 500 milliseconds to check the same.  If a node's
score exceeds its high threshold (as derived from user-provided
proactiveness value), proactive compaction is started until its score
reaches its low threshold value.  By default, proactiveness is set to 20,
which implies threshold values of low=80 and high=90.

This patch is largely based on ideas from Michal Hocko [2].  See also the
LWN article [3].

Performance data
================

System: x64_64, 1T RAM, 80 CPU threads.
Kernel: 5.6.0-rc3 + this patch

echo madvise | sudo tee /sys/kernel/mm/transparent_hugepage/enabled
echo madvise | sudo tee /sys/kernel/mm/transparent_hugepage/defrag

Before starting the driver, the system was fragmented from a userspace
program that allocates all memory and then for each 2M aligned section,
frees 3/4 of base pages using munmap.  The workload is mainly anonymous
userspace pages, which are easy to move around.  I intentionally avoided
unmovable pages in this test to see how much latency we incur when
hugepage allocations hit direct compaction.

1. Kernel hugepage allocation latencies

With the system in such a fragmented state, a kernel driver then allocates
as many hugepages as possible and measures allocation latency:

(all latency values are in microseconds)

- With vanilla 5.6.0-rc3

  percentile latency
  –––––––––– –––––––
	   5    7894
	  10    9496
	  25   12561
	  30   15295
	  40   18244
	  50   21229
	  60   27556
	  75   30147
	  80   31047
	  90   32859
	  95   33799

Total 2M hugepages allocated = 383859 (749G worth of hugepages out of 762G
total free => 98% of free memory could be allocated as hugepages)

- With 5.6.0-rc3 + this patch, with proactiveness=20

sysctl -w vm.compaction_proactiveness=20

  percentile latency
  –––––––––– –––––––
	   5       2
	  10       2
	  25       3
	  30       3
	  40       3
	  50       4
	  60       4
	  75       4
	  80       4
	  90       5
	  95     429

Total 2M hugepages allocated = 384105 (750G worth of hugepages out of 762G
total free => 98% of free memory could be allocated as hugepages)

2. JAVA heap allocation

In this test, we first fragment memory using the same method as for (1).

Then, we start a Java process with a heap size set to 700G and request the
heap to be allocated with THP hugepages.  We also set THP to madvise to
allow hugepage backing of this heap.

/usr/bin/time
 java -Xms700G -Xmx700G -XX:+UseTransparentHugePages -XX:+AlwaysPreTouch

The above command allocates 700G of Java heap using hugepages.

- With vanilla 5.6.0-rc3

17.39user 1666.48system 27:37.89elapsed

- With 5.6.0-rc3 + this patch, with proactiveness=20

8.35user 194.58system 3:19.62elapsed

Elapsed time remains around 3:15, as proactiveness is further increased.

Note that proactive compaction happens throughout the runtime of these
workloads.  The situation of one-time compaction, sufficient to supply
hugepages for following allocation stream, can probably happen for more
extreme proactiveness values, like 80 or 90.

In the above Java workload, proactiveness is set to 20.  The test starts
with a node's score of 80 or higher, depending on the delay between the
fragmentation step and starting the benchmark, which gives more-or-less
time for the initial round of compaction.  As t he benchmark consumes
hugepages, node's score quickly rises above the high threshold (90) and
proactive compaction starts again, which brings down the score to the low
threshold level (80).  Repeat.

bpftrace also confirms proactive compaction running 20+ times during the
runtime of this Java benchmark.  kcompactd threads consume 100% of one of
the CPUs while it tries to bring a node's score within thresholds.

Backoff behavior
================

Above workloads produce a memory state which is easy to compact.  However,
if memory is filled with unmovable pages, proactive compaction should
essentially back off.  To test this aspect:

- Created a kernel driver that allocates almost all memory as hugepages
  followed by freeing first 3/4 of each hugepage.
- Set proactiveness=40
- Note that proactive_compact_node() is deferred maximum number of times
  with HPAGE_FRAG_CHECK_INTERVAL_MSEC of wait between each check
  (=> ~30 seconds between retries).

[1] https://patchwork.kernel.org/patch/11098289/
[2] https://lore.kernel.org/linux-mm/20161230131412.GI13301@dhcp22.suse.cz/
[3] https://lwn.net/Articles/817905/

Signed-off-by: Nitin Gupta <nigupta@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Oleksandr Natalenko <oleksandr@redhat.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com>
Reviewed-by: Oleksandr Natalenko <oleksandr@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Khalid Aziz <khalid.aziz@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Nitin Gupta <ngupta@nitingupta.dev>
Cc: Oleksandr Natalenko <oleksandr@redhat.com>
Link: http://lkml.kernel.org/r/20200616204527.19185-1-nigupta@nvidia.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-12 10:57:56 -07:00
Greg Kroah-Hartman
a17a563d16 Merge 449dc8c970 ("Merge tag 'for-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply") into android-mainline
Merges along the way to 5.9-rc1

resolves conflicts in:
	Documentation/ABI/testing/sysfs-class-power
	drivers/power/supply/power_supply_sysfs.c
	fs/crypto/inline_crypt.c

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ia087834f54fb4e5269d68c3c404747ceed240701
2020-08-08 13:07:20 +02:00
Feng Tang
56f3547bfa mm: adjust vm_committed_as_batch according to vm overcommit policy
When checking a performance change for will-it-scale scalability mmap test
[1], we found very high lock contention for spinlock of percpu counter
'vm_committed_as':

    94.14%     0.35%  [kernel.kallsyms]         [k] _raw_spin_lock_irqsave
    48.21% _raw_spin_lock_irqsave;percpu_counter_add_batch;__vm_enough_memory;mmap_region;do_mmap;
    45.91% _raw_spin_lock_irqsave;percpu_counter_add_batch;__do_munmap;

Actually this heavy lock contention is not always necessary.  The
'vm_committed_as' needs to be very precise when the strict
OVERCOMMIT_NEVER policy is set, which requires a rather small batch number
for the percpu counter.

So keep 'batch' number unchanged for strict OVERCOMMIT_NEVER policy, and
lift it to 64X for OVERCOMMIT_ALWAYS and OVERCOMMIT_GUESS policies.  Also
add a sysctl handler to adjust it when the policy is reconfigured.

Benchmark with the same testcase in [1] shows 53% improvement on a 8C/16T
desktop, and 2097%(20X) on a 4S/72C/144T server.  We tested with test
platforms in 0day (server, desktop and laptop), and 80%+ platforms shows
improvements with that test.  And whether it shows improvements depends on
if the test mmap size is bigger than the batch number computed.

And if the lift is 16X, 1/3 of the platforms will show improvements,
though it should help the mmap/unmap usage generally, as Michal Hocko
mentioned:

: I believe that there are non-synthetic worklaods which would benefit from
: a larger batch.  E.g.  large in memory databases which do large mmaps
: during startups from multiple threads.

[1] https://lore.kernel.org/lkml/20200305062138.GI5972@shao2-debian/

Signed-off-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Qian Cai <cai@lca.pw>
Cc: Kees Cook <keescook@chromium.org>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Tim Chen <tim.c.chen@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: kernel test robot <rong.a.chen@intel.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/1589611660-89854-4-git-send-email-feng.tang@intel.com
Link: http://lkml.kernel.org/r/1592725000-73486-4-git-send-email-feng.tang@intel.com
Link: http://lkml.kernel.org/r/1594389708-60781-5-git-send-email-feng.tang@intel.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07 11:33:26 -07:00
Greg Kroah-Hartman
00d6a8a7ee Merge e4cbce4d13 ("Merge tag 'sched-core-2020-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip") into android-mainline
Baby steps for 5.9-rc1

Resolves some kernel/sched/ merge issues.

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I88cf5411ac7251f9795d9c50cb18b0df5bf0bcd6
2020-08-07 14:17:39 +02:00
Qais Yousef
13685c4a08 sched/uclamp: Add a new sysctl to control RT default boost value
RT tasks by default run at the highest capacity/performance level. When
uclamp is selected this default behavior is retained by enforcing the
requested uclamp.min (p->uclamp_req[UCLAMP_MIN]) of the RT tasks to be
uclamp_none(UCLAMP_MAX), which is SCHED_CAPACITY_SCALE; the maximum
value.

This is also referred to as 'the default boost value of RT tasks'.

See commit 1a00d99997 ("sched/uclamp: Set default clamps for RT tasks").

On battery powered devices, it is desired to control this default
(currently hardcoded) behavior at runtime to reduce energy consumed by
RT tasks.

For example, a mobile device manufacturer where big.LITTLE architecture
is dominant, the performance of the little cores varies across SoCs, and
on high end ones the big cores could be too power hungry.

Given the diversity of SoCs, the new knob allows manufactures to tune
the best performance/power for RT tasks for the particular hardware they
run on.

They could opt to further tune the value when the user selects
a different power saving mode or when the device is actively charging.

The runtime aspect of it further helps in creating a single kernel image
that can be run on multiple devices that require different tuning.

Keep in mind that a lot of RT tasks in the system are created by the
kernel. On Android for instance I can see over 50 RT tasks, only
a handful of which created by the Android framework.

To control the default behavior globally by system admins and device
integrator, introduce the new sysctl_sched_uclamp_util_min_rt_default
to change the default boost value of the RT tasks.

I anticipate this to be mostly in the form of modifying the init script
of a particular device.

To avoid polluting the fast path with unnecessary code, the approach
taken is to synchronously do the update by traversing all the existing
tasks in the system. This could race with a concurrent fork(), which is
dealt with by introducing sched_post_fork() function which will ensure
the racy fork will get the right update applied.

Tested on Juno-r2 in combination with the RT capacity awareness [1].
By default an RT task will go to the highest capacity CPU and run at the
maximum frequency, which is particularly energy inefficient on high end
mobile devices because the biggest core[s] are 'huge' and power hungry.

With this patch the RT task can be controlled to run anywhere by
default, and doesn't cause the frequency to be maximum all the time.
Yet any task that really needs to be boosted can easily escape this
default behavior by modifying its requested uclamp.min value
(p->uclamp_req[UCLAMP_MIN]) via sched_setattr() syscall.

[1] 804d402fb6: ("sched/rt: Make RT capacity-aware")

Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200716110347.19553-2-qais.yousef@arm.com
2020-07-29 13:51:47 +02:00
Greg Kroah-Hartman
a253db8915 Merge ad57a1022f ("Merge tag 'exfat-for-5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat") into android-mainline
Steps on the way to 5.8-rc1.

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I4bc42f572167ea2f815688b4d1eb6124b6d260d4
2020-06-24 17:54:12 +02:00
Greg Kroah-Hartman
1ec3464acb Merge ee01c4d72a ("Merge branch 'akpm' (patches from Andrew)") into android-mainline
Steps along the way to 5.8-rc1.

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I6cca4fa48322228c8182201d68dc05f9b72cfc50
2020-06-22 15:13:57 +02:00
Greg Kroah-Hartman
8a8d41512f Merge cb8e59cc87 ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next") into android-mainline
Steps along the way to 5.8-rc1.

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I280c0a50b5e137596b1c327759c6a18675908179
2020-06-22 14:58:18 +02:00
Greg Kroah-Hartman
035f08016d Merge 039aeb9deb ("Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm") into android-mainline
Baby steps on the way to 5.8-rc1.

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I5962e12546d3d215c73c3d74b00ad6263d96f64e
2020-06-20 09:49:29 +02:00
Peter Zijlstra
b4098bfc5e sched/deadline: Impose global limits on sched_attr::sched_period
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20190726161357.397880775@infradead.org
2020-06-15 14:10:04 +02:00
Rafael Aquini
e77132e758 kernel/sysctl.c: ignore out-of-range taint bits introduced via kernel.tainted
Users with SYS_ADMIN capability can add arbitrary taint flags to the
running kernel by writing to /proc/sys/kernel/tainted or issuing the
command 'sysctl -w kernel.tainted=...'.  This interface, however, is
open for any integer value and this might cause an invalid set of flags
being committed to the tainted_mask bitset.

This patch introduces a simple way for proc_taint() to ignore any
eventual invalid bit coming from the user input before committing those
bits to the kernel tainted_mask.

Signed-off-by: Rafael Aquini <aquini@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Link: http://lkml.kernel.org/r/20200512223946.888020-1-aquini@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-08 11:05:56 -07:00
Guilherme G. Piccoli
60c958d8df panic: add sysctl to dump all CPUs backtraces on oops event
Usually when the kernel reaches an oops condition, it's a point of no
return; in case not enough debug information is available in the kernel
splat, one of the last resorts would be to collect a kernel crash dump
and analyze it.  The problem with this approach is that in order to
collect the dump, a panic is required (to kexec-load the crash kernel).
When in an environment of multiple virtual machines, users may prefer to
try living with the oops, at least until being able to properly shutdown
their VMs / finish their important tasks.

This patch implements a way to collect a bit more debug details when an
oops event is reached, by printing all the CPUs backtraces through the
usage of NMIs (on architectures that support that).  The sysctl added
(and documented) here was called "oops_all_cpu_backtrace", and when set
will (as the name suggests) dump all CPUs backtraces.

Far from ideal, this may be the last option though for users that for
some reason cannot panic on oops.  Most of times oopses are clear enough
to indicate the kernel portion that must be investigated, but in virtual
environments it's possible to observe hypervisor/KVM issues that could
lead to oopses shown in other guests CPUs (like virtual APIC crashes).
This patch hence aims to help debug such complex issues without
resorting to kdump.

Signed-off-by: Guilherme G. Piccoli <gpiccoli@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Matthew Wilcox <willy@infradead.org>
Link: http://lkml.kernel.org/r/20200327224116.21030-1-gpiccoli@canonical.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-08 11:05:56 -07:00
Guilherme G. Piccoli
0ec9dc9bcb kernel/hung_task.c: introduce sysctl to print all traces when a hung task is detected
Commit 401c636a0e ("kernel/hung_task.c: show all hung tasks before
panic") introduced a change in that we started to show all CPUs
backtraces when a hung task is detected _and_ the sysctl/kernel
parameter "hung_task_panic" is set.  The idea is good, because usually
when observing deadlocks (that may lead to hung tasks), the culprit is
another task holding a lock and not necessarily the task detected as
hung.

The problem with this approach is that dumping backtraces is a slightly
expensive task, specially printing that on console (and specially in
many CPU machines, as servers commonly found nowadays).  So, users that
plan to collect a kdump to investigate the hung tasks and narrow down
the deadlock definitely don't need the CPUs backtrace on dmesg/console,
which will delay the panic and pollute the log (crash tool would easily
grab all CPUs traces with 'bt -a' command).

Also, there's the reciprocal scenario: some users may be interested in
seeing the CPUs backtraces but not have the system panic when a hung
task is detected.  The current approach hence is almost as embedding a
policy in the kernel, by forcing the CPUs backtraces' dump (only) on
hung_task_panic.

This patch decouples the panic event on hung task from the CPUs
backtraces dump, by creating (and documenting) a new sysctl called
"hung_task_all_cpu_backtrace", analog to the approach taken on soft/hard
lockups, that have both a panic and an "all_cpu_backtrace" sysctl to
allow individual control.  The new mechanism for dumping the CPUs
backtraces on hung task detection respects "hung_task_warnings" by not
dumping the traces in case there's no warnings left.

Signed-off-by: Guilherme G. Piccoli <gpiccoli@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Link: http://lkml.kernel.org/r/20200327223646.20779-1-gpiccoli@canonical.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-08 11:05:56 -07:00
Rafael Aquini
db38d5c106 kernel: add panic_on_taint
Analogously to the introduction of panic_on_warn, this patch introduces
a kernel option named panic_on_taint in order to provide a simple and
generic way to stop execution and catch a coredump when the kernel gets
tainted by any given flag.

This is useful for debugging sessions as it avoids having to rebuild the
kernel to explicitly add calls to panic() into the code sites that
introduce the taint flags of interest.

For instance, if one is interested in proceeding with a post-mortem
analysis at the point a given code path is hitting a bad page (i.e.
unaccount_page_cache_page(), or slab_bug()), a coredump can be collected
by rebooting the kernel with 'panic_on_taint=0x20' amended to the
command line.

Another, perhaps less frequent, use for this option would be as a means
for assuring a security policy case where only a subset of taints, or no
single taint (in paranoid mode), is allowed for the running system.  The
optional switch 'nousertaint' is handy in this particular scenario, as
it will avoid userspace induced crashes by writes to sysctl interface
/proc/sys/kernel/tainted causing false positive hits for such policies.

[akpm@linux-foundation.org: tweak kernel-parameters.txt wording]

Suggested-by: Qian Cai <cai@lca.pw>
Signed-off-by: Rafael Aquini <aquini@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Cc: Dave Young <dyoung@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kees Cook <keescook@chromium.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Adrian Bunk <bunk@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Takashi Iwai <tiwai@suse.de>
Link: http://lkml.kernel.org/r/20200515175502.146720-1-aquini@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-08 11:05:56 -07:00
Linus Torvalds
ee01c4d72a Merge branch 'akpm' (patches from Andrew)
Merge more updates from Andrew Morton:
 "More mm/ work, plenty more to come

  Subsystems affected by this patch series: slub, memcg, gup, kasan,
  pagealloc, hugetlb, vmscan, tools, mempolicy, memblock, hugetlbfs,
  thp, mmap, kconfig"

* akpm: (131 commits)
  arm64: mm: use ARCH_HAS_DEBUG_WX instead of arch defined
  x86: mm: use ARCH_HAS_DEBUG_WX instead of arch defined
  riscv: support DEBUG_WX
  mm: add DEBUG_WX support
  drivers/base/memory.c: cache memory blocks in xarray to accelerate lookup
  mm/thp: rename pmd_mknotpresent() as pmd_mkinvalid()
  powerpc/mm: drop platform defined pmd_mknotpresent()
  mm: thp: don't need to drain lru cache when splitting and mlocking THP
  hugetlbfs: get unmapped area below TASK_UNMAPPED_BASE for hugetlbfs
  sparc32: register memory occupied by kernel as memblock.memory
  include/linux/memblock.h: fix minor typo and unclear comment
  mm, mempolicy: fix up gup usage in lookup_node
  tools/vm/page_owner_sort.c: filter out unneeded line
  mm: swap: memcg: fix memcg stats for huge pages
  mm: swap: fix vmstats for huge pages
  mm: vmscan: limit the range of LRU type balancing
  mm: vmscan: reclaim writepage is IO cost
  mm: vmscan: determine anon/file pressure balance at the reclaim root
  mm: balance LRU lists based on relative thrashing
  mm: only count actual rotations as LRU reclaim cost
  ...
2020-06-03 20:24:15 -07:00
Johannes Weiner
c843966c55 mm: allow swappiness that prefers reclaiming anon over the file workingset
With the advent of fast random IO devices (SSDs, PMEM) and in-memory swap
devices such as zswap, it's possible for swap to be much faster than
filesystems, and for swapping to be preferable over thrashing filesystem
caches.

Allow setting swappiness - which defines the rough relative IO cost of
cache misses between page cache and swap-backed pages - to reflect such
situations by making the swap-preferred range configurable.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Rik van Riel <riel@surriel.com>
Link: http://lkml.kernel.org/r/20200520232525.798933-4-hannes@cmpxchg.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-03 20:09:48 -07:00
Linus Torvalds
cb8e59cc87 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from David Miller:

 1) Allow setting bluetooth L2CAP modes via socket option, from Luiz
    Augusto von Dentz.

 2) Add GSO partial support to igc, from Sasha Neftin.

 3) Several cleanups and improvements to r8169 from Heiner Kallweit.

 4) Add IF_OPER_TESTING link state and use it when ethtool triggers a
    device self-test. From Andrew Lunn.

 5) Start moving away from custom driver versions, use the globally
    defined kernel version instead, from Leon Romanovsky.

 6) Support GRO vis gro_cells in DSA layer, from Alexander Lobakin.

 7) Allow hard IRQ deferral during NAPI, from Eric Dumazet.

 8) Add sriov and vf support to hinic, from Luo bin.

 9) Support Media Redundancy Protocol (MRP) in the bridging code, from
    Horatiu Vultur.

10) Support netmap in the nft_nat code, from Pablo Neira Ayuso.

11) Allow UDPv6 encapsulation of ESP in the ipsec code, from Sabrina
    Dubroca. Also add ipv6 support for espintcp.

12) Lots of ReST conversions of the networking documentation, from Mauro
    Carvalho Chehab.

13) Support configuration of ethtool rxnfc flows in bcmgenet driver,
    from Doug Berger.

14) Allow to dump cgroup id and filter by it in inet_diag code, from
    Dmitry Yakunin.

15) Add infrastructure to export netlink attribute policies to
    userspace, from Johannes Berg.

16) Several optimizations to sch_fq scheduler, from Eric Dumazet.

17) Fallback to the default qdisc if qdisc init fails because otherwise
    a packet scheduler init failure will make a device inoperative. From
    Jesper Dangaard Brouer.

18) Several RISCV bpf jit optimizations, from Luke Nelson.

19) Correct the return type of the ->ndo_start_xmit() method in several
    drivers, it's netdev_tx_t but many drivers were using
    'int'. From Yunjian Wang.

20) Add an ethtool interface for PHY master/slave config, from Oleksij
    Rempel.

21) Add BPF iterators, from Yonghang Song.

22) Add cable test infrastructure, including ethool interfaces, from
    Andrew Lunn. Marvell PHY driver is the first to support this
    facility.

23) Remove zero-length arrays all over, from Gustavo A. R. Silva.

24) Calculate and maintain an explicit frame size in XDP, from Jesper
    Dangaard Brouer.

25) Add CAP_BPF, from Alexei Starovoitov.

26) Support terse dumps in the packet scheduler, from Vlad Buslov.

27) Support XDP_TX bulking in dpaa2 driver, from Ioana Ciornei.

28) Add devm_register_netdev(), from Bartosz Golaszewski.

29) Minimize qdisc resets, from Cong Wang.

30) Get rid of kernel_getsockopt and kernel_setsockopt in order to
    eliminate set_fs/get_fs calls. From Christoph Hellwig.

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2517 commits)
  selftests: net: ip_defrag: ignore EPERM
  net_failover: fixed rollback in net_failover_open()
  Revert "tipc: Fix potential tipc_aead refcnt leak in tipc_crypto_rcv"
  Revert "tipc: Fix potential tipc_node refcnt leak in tipc_rcv"
  vmxnet3: allow rx flow hash ops only when rss is enabled
  hinic: add set_channels ethtool_ops support
  selftests/bpf: Add a default $(CXX) value
  tools/bpf: Don't use $(COMPILE.c)
  bpf, selftests: Use bpf_probe_read_kernel
  s390/bpf: Use bcr 0,%0 as tail call nop filler
  s390/bpf: Maintain 8-byte stack alignment
  selftests/bpf: Fix verifier test
  selftests/bpf: Fix sample_cnt shared between two threads
  bpf, selftests: Adapt cls_redirect to call csum_level helper
  bpf: Add csum_level helper for fixing up csum levels
  bpf: Fix up bpf_skb_adjust_room helper's skb csum setting
  sfc: add missing annotation for efx_ef10_try_update_nic_stats_vf()
  crypto/chtls: IPv6 support for inline TLS
  Crypto/chcr: Fixes a coccinile check error
  Crypto/chcr: Fixes compilations warnings
  ...
2020-06-03 16:27:18 -07:00
Xiaoming Ni
b6522fa409 parisc: add sysctl file interface panic_on_stackoverflow
The variable sysctl_panic_on_stackoverflow is used in
arch/parisc/kernel/irq.c and arch/x86/kernel/irq_32.c, but the sysctl file
interface panic_on_stackoverflow only exists on x86.

Add sysctl file interface panic_on_stackoverflow for parisc

Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Helge Deller <deller@gmx.de>
2020-05-10 22:47:35 +02:00
Arnd Bergmann
5447e8e01e sysctl: Fix unused function warning
The newly added bpf_stats_handler function has the wrong #ifdef
check around it, leading to an unused-function warning when
CONFIG_SYSCTL is disabled:

kernel/sysctl.c:205:12: error: unused function 'bpf_stats_handler' [-Werror,-Wunused-function]
static int bpf_stats_handler(struct ctl_table *table, int write,

Fix the check to match the reference.

Fixes: d46edd671a ("bpf: Sharing bpf runtime stats with BPF_ENABLE_STATS")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20200505140734.503701-1-arnd@arndb.de
2020-05-05 11:59:32 -07:00
Song Liu
d46edd671a bpf: Sharing bpf runtime stats with BPF_ENABLE_STATS
Currently, sysctl kernel.bpf_stats_enabled controls BPF runtime stats.
Typical userspace tools use kernel.bpf_stats_enabled as follows:

  1. Enable kernel.bpf_stats_enabled;
  2. Check program run_time_ns;
  3. Sleep for the monitoring period;
  4. Check program run_time_ns again, calculate the difference;
  5. Disable kernel.bpf_stats_enabled.

The problem with this approach is that only one userspace tool can toggle
this sysctl. If multiple tools toggle the sysctl at the same time, the
measurement may be inaccurate.

To fix this problem while keep backward compatibility, introduce a new
bpf command BPF_ENABLE_STATS. On success, this command enables stats and
returns a valid fd. BPF_ENABLE_STATS takes argument "type". Currently,
only one type, BPF_STATS_RUN_TIME, is supported. We can extend the
command to support other types of stats in the future.

With BPF_ENABLE_STATS, user space tool would have the following flow:

  1. Get a fd with BPF_ENABLE_STATS, and make sure it is valid;
  2. Check program run_time_ns;
  3. Sleep for the monitoring period;
  4. Check program run_time_ns again, calculate the difference;
  5. Close the fd.

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200430071506.1408910-2-songliubraving@fb.com
2020-05-01 10:36:32 -07:00
Christoph Hellwig
32927393dc sysctl: pass kernel pointers to ->proc_handler
Instead of having all the sysctl handlers deal with user pointers, which
is rather hairy in terms of the BPF interaction, copy the input to and
from  userspace in common code.  This also means that the strings are
always NUL-terminated by the common code, making the API a little bit
safer.

As most handler just pass through the data to one of the common handlers
a lot of the changes are mechnical.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-04-27 02:07:40 -04:00
Christoph Hellwig
f461d2dcd5 sysctl: avoid forward declarations
Move the sysctl tables to the end of the file to avoid lots of pointless
forward declarations.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-04-27 02:07:26 -04:00
Christoph Hellwig
2374c09b1c sysctl: remove all extern declaration from sysctl.c
Extern declarations in .c files are a bad style and can lead to
mismatches.  Use existing definitions in headers where they exist,
and otherwise move the external declarations to suitable header
files.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-04-27 02:06:53 -04:00
Christoph Hellwig
26363af564 mm: remove watermark_boost_factor_sysctl_handler
watermark_boost_factor_sysctl_handler is just a pointless wrapper for
proc_dointvec_minmax, so remove it and use proc_dointvec_minmax
directly.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-04-27 02:06:52 -04:00
Greg Kroah-Hartman
47627513ce Merge e109f50607 ("Merge tag 'mtd/for-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux") into android-mainline
Baby steps on the way to 5.7-rc1

Change-Id: I136ebb5242e3499873dcd5f5178ad7f68512d11c
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2020-04-07 16:21:10 +02:00
Greg Kroah-Hartman
6aea7b7129 Merge 1a323ea535 ("x86: get rid of 'errret' argument to __get_user_xyz() macross") into android-mainline
In a quest to divide up the 5.7-rc1 merge chunks into reviewable pieces.

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I2e5960415348c06e8f10e10cbefb3ee5c3745e73
2020-04-04 12:17:50 +02:00
Sebastian Andrzej Siewior
6923aa0d8c mm/compaction: Disable compact_unevictable_allowed on RT
Since commit 5bbe3547aa ("mm: allow compaction of unevictable pages")
it is allowed to examine mlocked pages and compact them by default.  On
-RT even minor pagefaults are problematic because it may take a few 100us
to resolve them and until then the task is blocked.

Make compact_unevictable_allowed = 0 default and issue a warning on RT if
it is changed.

[bigeasy@linutronix.de: v5]
  Link: https://lore.kernel.org/linux-mm/20190710144138.qyn4tuttdq6h7kqx@linutronix.de/
  Link: http://lkml.kernel.org/r/20200319165536.ovi75tsr2seared4@linutronix.de
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Link: https://lore.kernel.org/linux-mm/20190710144138.qyn4tuttdq6h7kqx@linutronix.de/
Link: http://lkml.kernel.org/r/20200303202225.nhqc3v5gwlb7x6et@linutronix.de
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-02 09:35:31 -07:00
Sebastian Andrzej Siewior
964b692daf mm/compaction: really limit compact_unevictable_allowed to 0 and 1
The proc file `compact_unevictable_allowed' should allow 0 and 1 only, the
`extra*' attribues have been set properly but without
proc_dointvec_minmax() as the `proc_handler' the limit will not be
enforced.

Use proc_dointvec_minmax() as the `proc_handler' to enfoce the valid
specified range.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Link: http://lkml.kernel.org/r/20200303202054.gsosv7fsx2ma3cic@linutronix.de
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-02 09:35:31 -07:00
Dmitry Safonov
eaee41727e sysctl/sysrq: Remove __sysrq_enabled copy
Many embedded boards have a disconnected TTL level serial which can
generate some garbage that can lead to spurious false sysrq detects.

Currently, sysrq can be either completely disabled for serial console
or always disabled (with CONFIG_MAGIC_SYSRQ_SERIAL), since
commit 732dbf3a61 ("serial: do not accept sysrq characters via serial port")

At Arista, we have such boards that can generate BREAK and random
garbage. While disabling sysrq for serial console would solve
the problem with spurious false sysrq triggers, it's also desirable
to have a way to enable sysrq back.

Having the way to enable sysrq was beneficial to debug lockups with
a manual investigation in field and on the other side preventing false
sysrq detections.

As a preparation to add sysrq_toggle_support() call into uart,
remove a private copy of sysrq_enabled from sysctl - it should reflect
the actual status of sysrq.

Furthermore, the private copy isn't correct already in case
sysrq_always_enabled is true. So, remove __sysrq_enabled and use a
getter-helper sysrq_mask() to check sysrq_key_op enabled status.

Cc: Iurii Zaikin <yzaikin@google.com>
Cc: Jiri Slaby <jslaby@suse.com>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Dmitry Safonov <dima@arista.com>
Link: https://lore.kernel.org/r/20200302175135.269397-2-dima@arista.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-07 09:52:02 +01:00
Greg Kroah-Hartman
5fba1b18cf Merge 5.6-rc3 into android-mainline
Linux 5.6-rc3

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I930be7b064a8e084de12640e14a8e8b37873872a
2020-02-24 09:22:04 +01:00
Stephen Kitt
140588bfed s390: remove obsolete ieee_emulation_warnings
s390 math emulation was removed with commit 5a79859ae0 ("s390:
remove 31 bit support"), rendering ieee_emulation_warnings useless.
The code still built because it was protected by CONFIG_MATHEMU, which
was no longer selectable.

This patch removes the sysctl_ieee_emulation_warnings declaration and
the sysctl entry declaration.

Link: https://lkml.kernel.org/r/20200214172628.3598516-1-steve@sk2.org
Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Stephen Kitt <steve@sk2.org>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-02-19 13:51:46 +01:00
Greg Kroah-Hartman
28b159de8e Merge b5f7ab6b1c ("Merge tag 'fs-dedupe-last-block-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux") into android-mainline
Baby steps in the 5.6-rc1 merge cycle to make things easier to review
and debug.

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I005e68433be6b1d66bd56d7e1c8f44ab8e78bebe
2020-01-30 07:03:50 +01:00
Lai Jiangshan
b3e627d3d5 rcu: Make PREEMPT_RCU be a modifier to TREE_RCU
Currently PREEMPT_RCU and TREE_RCU are mutually exclusive Kconfig
options.  But PREEMPT_RCU actually specifies a kind of TREE_RCU,
namely a preemptible TREE_RCU. This commit therefore makes PREEMPT_RCU
be a modifer to the TREE_RCU Kconfig option.  This has the benefit of
simplifying several of the #if expressions that formerly needed to
check both, but now need only check one or the other.

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Signed-off-by: Lai Jiangshan <jiangshanlai@gmail.com>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2019-12-09 12:37:51 -08:00
Greg Kroah-Hartman
d3a196a371 Merge 5.5-rc1 into android-mainline
Linux 5.5-rc1

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I6f952ebdd40746115165a2f99bab340482f5c237
2019-12-09 12:12:00 +01:00
Johannes Weiner
204cb79ad4 kernel: sysctl: make drop_caches write-only
Currently, the drop_caches proc file and sysctl read back the last value
written, suggesting this is somehow a stateful setting instead of a
one-time command.  Make it write-only, like e.g.  compact_memory.

While mitigating a VM problem at scale in our fleet, there was confusion
about whether writing to this file will permanently switch the kernel into
a non-caching mode.  This influences the decision making in a tense
situation, where tens of people are trying to fix tens of thousands of
affected machines: Do we need a rollback strategy?  What are the
performance implications of operating in a non-caching state for several
days?  It also caused confusion when the kernel team said we may need to
write the file several times to make sure it's effective ("But it already
reads back 3?").

Link: http://lkml.kernel.org/r/20191031221602.9375-1-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Chris Down <chris@chrisdown.name>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-12-01 12:59:07 -08:00
Greg Kroah-Hartman
9be46ff3b6 Merge 5.4-rc4 into android-mainline
Linux 5.4-rc4

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I0edccd72fad8b6443b24c8c1005b66d6b8f532ce
2019-10-26 19:24:41 +02:00
Helge Deller
b67114db64 parisc: sysctl.c: Use CONFIG_PARISC instead of __hppa_ define
Signed-off-by: Helge Deller <deller@gmx.de>
2019-10-14 21:43:54 +02:00