154900b51bc320a4361dfa0de0302e7056cd6a44
272 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
e79e029826 |
Merge android-5.4.5 (9cdc723) into msm-5.4
* refs/heads/tmp-9cdc723:
Revert "usb: dwc3: gadget: Fix logical condition"
Revert "FROMLIST: scsi: ufs-qcom: Adjust bus bandwidth voting and unvoting"
Linux 5.4.5
r8169: add missing RX enabling for WoL on RTL8125
net: mscc: ocelot: unregister the PTP clock on deinit
ionic: keep users rss hash across lif reset
xdp: obtain the mem_id mutex before trying to remove an entry.
page_pool: do not release pool until inflight == 0.
net/mlx5e: ethtool, Fix analysis of speed setting
net/mlx5e: Fix translation of link mode into speed
net/mlx5e: Fix freeing flow with kfree() and not kvfree()
net/mlx5e: Fix SFF 8472 eeprom length
act_ct: support asymmetric conntrack
net/mlx5e: Fix TXQ indices to be sequential
net: Fixed updating of ethertype in skb_mpls_push()
hsr: fix a NULL pointer dereference in hsr_dev_xmit()
Fixed updating of ethertype in function skb_mpls_pop
gre: refetch erspan header from skb->data after pskb_may_pull()
cls_flower: Fix the behavior using port ranges with hw-offload
net: sched: allow indirect blocks to bind to clsact in TC
net: core: rename indirect block ingress cb function
tcp: Protect accesses to .ts_recent_stamp with {READ,WRITE}_ONCE()
tcp: tighten acceptance of ACKs not matching a child socket
tcp: fix rejected syncookies due to stale timestamps
net: ipv6_stub: use ip6_dst_lookup_flow instead of ip6_dst_lookup
net: ipv6: add net argument to ip6_dst_lookup_flow
net/mlx5e: Query global pause state before setting prio2buffer
tipc: fix ordering of tipc module init and exit routine
tcp: md5: fix potential overestimation of TCP option space
openvswitch: support asymmetric conntrack
net/tls: Fix return values to avoid ENOTSUPP
net: thunderx: start phy before starting autonegotiation
net_sched: validate TCA_KIND attribute in tc_chain_tmplt_add()
net: sched: fix dump qlen for sch_mq/sch_mqprio with NOLOCK subqueues
net: ethernet: ti: cpsw: fix extra rx interrupt
net: dsa: fix flow dissection on Tx path
net: bridge: deny dev_set_mac_address() when unregistering
mqprio: Fix out-of-bounds access in mqprio_dump
inet: protect against too small mtu values.
ANDROID: add initial ABI whitelist for android-5.4
ANDROID: abi update for 5.4.4
ANDROID: mm: Throttle rss_stat tracepoint
FROMLIST: vsprintf: Inline call to ptr_to_hashval
UPSTREAM: rss_stat: Add support to detect RSS updates of external mm
UPSTREAM: mm: emit tracepoint when RSS changes
Linux 5.4.4
EDAC/ghes: Do not warn when incrementing refcount on 0
r8169: fix rtl_hw_jumbo_disable for RTL8168evl
workqueue: Fix missing kfree(rescuer) in destroy_workqueue()
blk-mq: make sure that line break can be printed
ext4: fix leak of quota reservations
ext4: fix a bug in ext4_wait_for_tail_page_commit
splice: only read in as much information as there is pipe buffer space
rtc: disable uie before setting time and enable after
USB: dummy-hcd: increase max number of devices to 32
powerpc: Define arch_is_kernel_initmem_freed() for lockdep
mm/shmem.c: cast the type of unmap_start to u64
s390/kaslr: store KASLR offset for early dumps
s390/smp,vdso: fix ASCE handling
firmware: qcom: scm: Ensure 'a0' status code is treated as signed
ext4: work around deleting a file with i_nlink == 0 safely
mm: memcg/slab: wait for !root kmem_cache refcnt killing on root kmem_cache destruction
mfd: rk808: Fix RK818 ID template
mm, memfd: fix COW issue on MAP_PRIVATE and F_SEAL_FUTURE_WRITE mappings
powerpc: Fix vDSO clock_getres()
powerpc: Avoid clang warnings around setjmp and longjmp
omap: pdata-quirks: remove openpandora quirks for mmc3 and wl1251
omap: pdata-quirks: revert pandora specific gpiod additions
iio: ad7949: fix channels mixups
iio: ad7949: kill pointless "readback"-handling code
Revert "scsi: qla2xxx: Fix memory leak when sending I/O fails"
scsi: qla2xxx: Fix a dma_pool_free() call
scsi: qla2xxx: Fix SRB leak on switch command timeout
reiserfs: fix extended attributes on the root directory
ext4: Fix credit estimate for final inode freeing
quota: fix livelock in dquot_writeback_dquots
seccomp: avoid overflow in implicit constant conversion
ext2: check err when partial != NULL
quota: Check that quota is not dirty before release
video/hdmi: Fix AVI bar unpack
powerpc/xive: Skip ioremap() of ESB pages for LSI interrupts
powerpc: Allow flush_icache_range to work across ranges >4GB
powerpc/xive: Prevent page fault issues in the machine crash handler
powerpc: Allow 64bit VDSO __kernel_sync_dicache to work across ranges >4GB
coresight: Serialize enabling/disabling a link device.
stm class: Lose the protocol driver when dropping its reference
ppdev: fix PPGETTIME/PPSETTIME ioctls
RDMA/core: Fix ib_dma_max_seg_size()
ARM: dts: omap3-tao3530: Fix incorrect MMC card detection GPIO polarity
mmc: host: omap_hsmmc: add code for special init of wl1251 to get rid of pandora_wl1251_init_card
pinctrl: samsung: Fix device node refcount leaks in S3C64xx wakeup controller init
pinctrl: samsung: Fix device node refcount leaks in init code
pinctrl: samsung: Fix device node refcount leaks in S3C24xx wakeup controller init
pinctrl: samsung: Fix device node refcount leaks in Exynos wakeup controller init
pinctrl: samsung: Add of_node_put() before return in error path
pinctrl: armada-37xx: Fix irq mask access in armada_37xx_irq_set_type()
pinctrl: rza2: Fix gpio name typos
ACPI: PM: Avoid attaching ACPI PM domain to certain devices
ACPI: EC: Rework flushing of pending work
ACPI: bus: Fix NULL pointer check in acpi_bus_get_private_data()
ACPI: OSL: only free map once in osl.c
ACPI / hotplug / PCI: Allocate resources directly under the non-hotplug bridge
ACPI: LPSS: Add dmi quirk for skipping _DEP check for some device-links
ACPI: LPSS: Add LNXVIDEO -> BYT I2C1 to lpss_device_links
ACPI: LPSS: Add LNXVIDEO -> BYT I2C7 to lpss_device_links
ACPI / utils: Move acpi_dev_get_first_match_dev() under CONFIG_ACPI
ALSA: hda/realtek - Line-out jack doesn't work on a Dell AIO
ALSA: oxfw: fix return value in error path of isochronous resources reservation
ALSA: fireface: fix return value in error path of isochronous resources reservation
cpufreq: powernv: fix stack bloat and hard limit on number of CPUs
PM / devfreq: Lock devfreq in trans_stat_show
intel_th: pci: Add Tiger Lake CPU support
intel_th: pci: Add Ice Lake CPU support
intel_th: Fix a double put_device() in error path
powerpc/perf: Disable trace_imc pmu
drm/panfrost: Open/close the perfcnt BO
perf tests: Fix out of bounds memory access
erofs: zero out when listxattr is called with no xattr
cpuidle: use first valid target residency as poll time
cpuidle: teo: Fix "early hits" handling for disabled idle states
cpuidle: teo: Consider hits and misses metrics of disabled states
cpuidle: teo: Rename local variable in teo_select()
cpuidle: teo: Ignore disabled idle states that are too deep
cpuidle: Do not unset the driver if it is there already
media: cec.h: CEC_OP_REC_FLAG_ values were swapped
media: radio: wl1273: fix interrupt masking on release
media: bdisp: fix memleak on release
media: vimc: sen: remove unused kthread_sen field
media: hantro: Fix picture order count table enable
media: hantro: Fix motion vectors usage condition
media: hantro: Fix s_fmt for dynamic resolution changes
s390/mm: properly clear _PAGE_NOEXEC bit when it is not supported
ar5523: check NULL before memcpy() in ar5523_cmd()
wil6210: check len before memcpy() calls
cgroup: pids: use atomic64_t for pids->limit
blk-mq: avoid sysfs buffer overflow with too many CPU cores
md: improve handling of bio with REQ_PREFLUSH in md_flush_request()
ASoC: fsl_audmix: Add spin lock to protect tdms
ASoC: Jack: Fix NULL pointer dereference in snd_soc_jack_report
ASoC: rt5645: Fixed typo for buddy jack support.
ASoC: rt5645: Fixed buddy jack support.
workqueue: Fix pwq ref leak in rescuer_thread()
workqueue: Fix spurious sanity check failures in destroy_workqueue()
dm zoned: reduce overhead of backing device checks
dm writecache: handle REQ_FUA
hwrng: omap - Fix RNG wait loop timeout
ovl: relax WARN_ON() on rename to self
ovl: fix corner case of non-unique st_dev;st_ino
ovl: fix lookup failure on multi lower squashfs
lib: raid6: fix awk build warnings
rtlwifi: rtl8192de: Fix missing enable interrupt flag
rtlwifi: rtl8192de: Fix missing callback that tests for hw release of buffer
rtlwifi: rtl8192de: Fix missing code to retrieve RX buffer address
btrfs: record all roots for rename exchange on a subvol
Btrfs: send, skip backreference walking for extents with many references
btrfs: Remove btrfs_bio::flags member
btrfs: Avoid getting stuck during cyclic writebacks
Btrfs: fix negative subv_writers counter and data space leak after buffered write
Btrfs: fix metadata space leak on fixup worker failure to set range as delalloc
btrfs: use refcount_inc_not_zero in kill_all_nodes
btrfs: use btrfs_block_group_cache_done in update_block_group
btrfs: check page->mapping when loading free space cache
iwlwifi: pcie: fix support for transmitting SKBs with fraglist
usb: typec: fix use after free in typec_register_port()
phy: renesas: rcar-gen3-usb2: Fix sysfs interface of "role"
usb: dwc3: ep0: Clear started flag on completion
usb: dwc3: gadget: Clear started flag for non-IOC
usb: dwc3: gadget: Fix logical condition
usb: dwc3: pci: add ID for the Intel Comet Lake -H variant
virtio-balloon: fix managed page counts when migrating pages between zones
virt_wifi: fix use-after-free in virt_wifi_newlink()
mtd: rawnand: Change calculating of position page containing BBM
mtd: spear_smi: Fix Write Burst mode
brcmfmac: disable PCIe interrupts before bus reset
EDAC/altera: Use fast register IO for S10 IRQs
tpm: Switch to platform_get_irq_optional()
tpm: add check after commands attribs tab allocation
usb: mon: Fix a deadlock in usbmon between mmap and read
usb: core: urb: fix URB structure initialization function
USB: adutux: fix interface sanity check
usb: roles: fix a potential use after free
USB: serial: io_edgeport: fix epic endpoint lookup
USB: idmouse: fix interface sanity checks
USB: atm: ueagle-atm: add missing endpoint check
iio: adc: ad7124: Enable internal reference
iio: adc: ad7606: fix reading unnecessary data from device
iio: imu: inv_mpu6050: fix temperature reporting using bad unit
iio: humidity: hdc100x: fix IIO_HUMIDITYRELATIVE channel reporting
iio: adis16480: Fix scales factors
iio: imu: st_lsm6dsx: fix ODR check in st_lsm6dsx_write_raw
iio: adis16480: Add debugfs_reg_access entry
ARM: dts: pandora-common: define wl1251 as child node of mmc3
usb: common: usb-conn-gpio: Don't log an error on probe deferral
interconnect: qcom: qcs404: Walk the list safely on node removal
interconnect: qcom: sdm845: Walk the list safely on node removal
xhci: make sure interrupts are restored to correct state
xhci: handle some XHCI_TRUST_TX_LENGTH quirks cases as default behaviour.
xhci: Increase STS_HALT timeout in xhci_suspend()
xhci: fix USB3 device initiated resume race with roothub autosuspend
xhci: Fix memory leak in xhci_add_in_port()
usb: xhci: only set D3hot for pci device
staging: gigaset: add endpoint-type sanity check
staging: gigaset: fix illegal free on probe errors
staging: gigaset: fix general protection fault on probe
staging: vchiq: call unregister_chrdev_region() when driver registration fails
staging: rtl8712: fix interface sanity check
staging: rtl8188eu: fix interface sanity check
staging: exfat: fix multiple definition error of `rename_file'
binder: fix incorrect calculation for num_valid
usb: host: xhci-tegra: Correct phy enable sequence
usb: Allow USB device to be warm reset in suspended state
USB: documentation: flags on usb-storage versus UAS
USB: uas: heed CAPACITY_HEURISTICS
USB: uas: honor flag to avoid CAPACITY16
media: venus: remove invalid compat_ioctl32 handler
ceph: fix compat_ioctl for ceph_dir_operations
compat_ioctl: add compat_ptr_ioctl()
scsi: qla2xxx: Fix memory leak when sending I/O fails
scsi: qla2xxx: Fix double scsi_done for abort path
scsi: qla2xxx: Fix driver unload hang
scsi: qla2xxx: Do command completion on abort timeout
scsi: zfcp: trace channel log even for FCP command responses
scsi: lpfc: Fix bad ndlp ptr in xri aborted handling
Revert "nvme: Add quirk for Kingston NVME SSD running FW E8FK11.T"
nvme: Namepace identification descriptor list is optional
usb: gadget: pch_udc: fix use after free
usb: gadget: configfs: Fix missing spin_lock_init()
BACKPORT: FROMLIST: scsi: ufs: Export query request interfaces
ANDROID: update abi with unbindable_ports sysctl
BACKPORT: FROMLIST: net: introduce ip_local_unbindable_ports sysctl
ANDROID: update abi for 5.4.3 merge
ANDROID: update abi_gki_aarch64.xml for ion, drm changes
ANDROID: drivers: gpu: drm: export drm_mode_convert_umode symbol
ANDROID: ion: flush cache before exporting non-cached buffers
Linux 5.4.3
kselftest: Fix NULL INSTALL_PATH for TARGETS runlist
perf script: Fix invalid LBR/binary mismatch error
EDAC/ghes: Fix locking and memory barrier issues
watchdog: aspeed: Fix clock behaviour for ast2600
drm/mcde: Fix an error handling path in 'mcde_probe()'
md/raid0: Fix an error message in raid0_make_request()
cpufreq: imx-cpufreq-dt: Correct i.MX8MN's default speed grade value
ALSA: hda - Fix pending unsol events at shutdown
KVM: x86: fix out-of-bounds write in KVM_GET_EMULATED_CPUID (CVE-2019-19332)
binder: Handle start==NULL in binder_update_page_range()
binder: Prevent repeated use of ->mmap() via NULL mapping
binder: Fix race between mmap() and binder_alloc_print_pages()
Revert "serial/8250: Add support for NI-Serial PXI/PXIe+485 devices"
vcs: prevent write access to vcsu devices
thermal: Fix deadlock in thermal thermal_zone_device_check
iomap: Fix pipe page leakage during splicing
bdev: Refresh bdev size for disks without partitioning
bdev: Factor out bdev revalidation into a common helper
rfkill: allocate static minor
RDMA/qib: Validate ->show()/store() callbacks before calling them
can: ucan: fix non-atomic allocation in completion handler
spi: Fix NULL pointer when setting SPI_CS_HIGH for GPIO CS
spi: Fix SPI_CS_HIGH setting when using native and GPIO CS
spi: atmel: Fix CS high support
spi: stm32-qspi: Fix kernel oops when unbinding driver
spi: spi-fsl-qspi: Clear TDH bits in FLSHCR register
crypto: user - fix memory leak in crypto_reportstat
crypto: user - fix memory leak in crypto_report
crypto: ecdh - fix big endian bug in ECC library
crypto: ccp - fix uninitialized list head
crypto: geode-aes - switch to skcipher for cbc(aes) fallback
crypto: af_alg - cast ki_complete ternary op to int
crypto: atmel-aes - Fix IV handling when req->nbytes < ivsize
crypto: crypto4xx - fix double-free in crypto4xx_destroy_sdr
KVM: x86: Grab KVM's srcu lock when setting nested state
KVM: x86: Remove a spurious export of a static function
KVM: x86: fix presentation of TSX feature in ARCH_CAPABILITIES
KVM: x86: do not modify masked bits of shared MSRs
KVM: arm/arm64: vgic: Don't rely on the wrong pending table
KVM: nVMX: Always write vmcs02.GUEST_CR3 during nested VM-Enter
KVM: PPC: Book3S HV: XIVE: Set kvm->arch.xive when VPs are allocated
KVM: PPC: Book3S HV: XIVE: Fix potential page leak on error path
KVM: PPC: Book3S HV: XIVE: Free previous EQ page when setting up a new one
arm64: dts: exynos: Revert "Remove unneeded address space mapping for soc node"
arm64: Validate tagged addresses in access_ok() called from kernel threads
drm/i810: Prevent underflow in ioctl
drm: damage_helper: Fix race checking plane->state->fb
drm/msm: fix memleak on release
jbd2: Fix possible overflow in jbd2_log_space_left()
kernfs: fix ino wrap-around detection
nfsd: restore NFSv3 ACL support
nfsd: Ensure CLONE persists data and metadata changes to the target file
can: slcan: Fix use-after-free Read in slcan_open
tty: vt: keyboard: reject invalid keycodes
CIFS: Fix SMB2 oplock break processing
CIFS: Fix NULL-pointer dereference in smb2_push_mandatory_locks
x86/PCI: Avoid AMD FCH XHCI USB PME# from D0 defect
x86/mm/32: Sync only to VMALLOC_END in vmalloc_sync_all()
media: rc: mark input device as pointing stick
Input: Fix memory leak in psxpad_spi_probe
coresight: etm4x: Fix input validation for sysfs.
Input: goodix - add upside-down quirk for Teclast X89 tablet
Input: synaptics-rmi4 - don't increment rmiaddr for SMBus transfers
Input: synaptics-rmi4 - re-enable IRQs in f34v7_do_reflash
Input: synaptics - switch another X1 Carbon 6 to RMI/SMbus
soc: mediatek: cmdq: fixup wrong input order of write api
ALSA: hda: Modify stream stripe mask only when needed
ALSA: hda - Add mute led support for HP ProBook 645 G4
ALSA: pcm: oss: Avoid potential buffer overflows
ALSA: hda/realtek - Fix inverted bass GPIO pin on Acer 8951G
ALSA: hda/realtek - Dell headphone has noise on unmute for ALC236
ALSA: hda/realtek - Enable the headset-mic on a Xiaomi's laptop
ALSA: hda/realtek - Enable internal speaker of ASUS UX431FLC
SUNRPC: Avoid RPC delays when exiting suspend
io_uring: ensure req->submit is copied when req is deferred
io_uring: fix missing kmap() declaration on powerpc
fuse: verify attributes
fuse: verify write return
fuse: verify nlink
fuse: fix leak of fuse_io_priv
io_uring: transform send/recvmsg() -ERESTARTSYS to -EINTR
io_uring: fix dead-hung for non-iter fixed rw
mwifiex: Re-work support for SDIO HW reset
serial: ifx6x60: add missed pm_runtime_disable
serial: 8250_dw: Avoid double error messaging when IRQ absent
serial: stm32: fix clearing interrupt error flags
serial: serial_core: Perform NULL checks for break_ctl ops
serial: pl011: Fix DMA ->flush_buffer()
tty: serial: msm_serial: Fix flow control
tty: serial: fsl_lpuart: use the sg count from dma_map_sg
serial: 8250-mtk: Use platform_get_irq_optional() for optional irq
usb: gadget: u_serial: add missing port entry locking
staging/octeon: Use stubs for MIPS && !CAVIUM_OCTEON_SOC
mailbox: tegra: Fix superfluous IRQ error message
time: Zero the upper 32-bits in __kernel_timespec on 32-bit
lp: fix sparc64 LPSETTIMEOUT ioctl
sparc64: implement ioremap_uc
perf scripts python: exported-sql-viewer.py: Fix use of TRUE with SQLite
arm64: tegra: Fix 'active-low' warning for Jetson Xavier regulator
arm64: tegra: Fix 'active-low' warning for Jetson TX1 regulator
rsi: release skb if rsi_prepare_beacon fails
FROMLIST: scsi: ufs: Fix ufshcd_hold() caused scheduling while atomic
FROMLIST: scsi: ufs: Add dev ref clock gating wait time support
FROMLIST: scsi: ufs-qcom: Adjust bus bandwidth voting and unvoting
FROMLIST: scsi: ufs: Remove the check before call setup clock notify vops
FROMLIST: scsi: ufs: set load before setting voltage in regulators
FROMLIST: scsi: ufs: Flush exception event before suspend
FROMLIST: scsi: ufs: Do not rely on prefetched data
FROMLIST: scsi: ufs: Fix up clock scaling
FROMGIT: scsi: ufs: Do not free irq in suspend
FROMGIT: scsi: ufs: Do not clear the DL layer timers
FROMGIT: scsi: ufs: Release clock if DMA map fails
FROMGIT: scsi: ufs: Use DBD setting in mode sense
FROMGIT: scsi: core: Adjust DBD setting in MODE SENSE for caching mode page per LLD
FROMGIT: scsi: ufs: Complete pending requests in host reset and restore path
FROMGIT: scsi: ufs: Avoid messing up the compl_time_stamp of lrbs
FROMGIT: scsi: ufs: Update VCCQ2 and VCCQ min/max voltage hard codes
FROMGIT: scsi: ufs: Recheck bkops level if bkops is disabled
ANDROID: update abi_gki_aarch64.xml for LTO, CFI, and SCS
ANDROID: gki_defconfig: enable LTO, CFI, and SCS
ANDROID: update abi_gki_aarch64.xml for CONFIG_GNSS
ANDROID: cuttlefish_defconfig: Enable CONFIG_GNSS
ANDROID: gki_defconfig: enable HID configs
UPSTREAM: arm64: Validate tagged addresses in access_ok() called from kernel threads
ANDROID: kbuild: limit LTO inlining
ANDROID: kbuild: merge module sections with LTO
ANDROID: f2fs: fix possible merge of unencrypted with encrypted I/O
ANDROID: gki_defconfig: Enable UCLAMP by default
ANDROID: make sure proc mount options are applied
ANDROID: sound: usb: Add helper APIs to enable audio stream
ANDROID: Update ABI representation
ANDROID: Don't base allmodconfig on gki_defconfig
ANDROID: Disable UNWINDER_ORC for allmodconfig
ANDROID: ASoC: Fix 'allmodconfig' build break
Linux 5.4.2
platform/x86: hp-wmi: Fix ACPI errors caused by passing 0 as input size
platform/x86: hp-wmi: Fix ACPI errors caused by too small buffer
HID: core: check whether Usage Page item is after Usage ID items
crypto: talitos - Fix build error by selecting LIB_DES
Revert "jffs2: Fix possible null-pointer dereferences in jffs2_add_frag_to_fragtree()"
ext4: add more paranoia checking in ext4_expand_extra_isize handling
r8169: fix resume on cable plug-in
r8169: fix jumbo configuration for RTL8168evl
selftests: pmtu: use -oneline for ip route list cache
tipc: fix link name length check
selftests: bpf: correct perror strings
selftests: bpf: test_sockmap: handle file creation failures gracefully
net/tls: use sg_next() to walk sg entries
net/tls: remove the dead inplace_crypto code
selftests/tls: add a test for fragmented messages
net: skmsg: fix TLS 1.3 crash with full sk_msg
net/tls: free the record on encryption error
net/tls: take into account that bpf_exec_tx_verdict() may free the record
openvswitch: remove another BUG_ON()
openvswitch: drop unneeded BUG_ON() in ovs_flow_cmd_build_info()
sctp: cache netns in sctp_ep_common
slip: Fix use-after-free Read in slip_open
sctp: Fix memory leak in sctp_sf_do_5_2_4_dupcook
openvswitch: fix flow command message size
net: sched: fix `tc -s class show` no bstats on class with nolock subqueues
net: psample: fix skb_over_panic
net: macb: add missed tasklet_kill
net: dsa: sja1105: fix sja1105_parse_rgmii_delays()
mdio_bus: don't use managed reset-controller
macvlan: schedule bc_work even if error
gve: Fix the queue page list allocated pages count
x86/fpu: Don't cache access to fpu_fpregs_owner_ctx
thunderbolt: Power cycle the router if NVM authentication fails
mei: me: add comet point V device id
mei: bus: prefix device names on bus with the bus name
USB: serial: ftdi_sio: add device IDs for U-Blox C099-F9P
staging: rtl8723bs: Add 024c:0525 to the list of SDIO device-ids
staging: rtl8723bs: Drop ACPI device ids
staging: rtl8192e: fix potential use after free
staging: wilc1000: fix illegal memory access in wilc_parse_join_bss_param()
usb: dwc2: use a longer core rest timeout in dwc2_core_reset()
driver core: platform: use the correct callback type for bus_find_device
crypto: inside-secure - Fix stability issue with Macchiatobin
net: disallow ancillary data for __sys_{send,recv}msg_file()
net: separate out the msghdr copy from ___sys_{send,recv}msg()
io_uring: async workers should inherit the user creds
ANDROID: Update ABI representation
UPSTREAM: of: property: Add device link support for interrupt-parent, dmas and -gpio(s)
UPSTREAM: of: property: Fix the semantics of of_is_ancestor_of()
UPSTREAM: i2c: of: Populate fwnode in of_i2c_get_board_info()
UPSTREAM: regulator: core: Don't try to remove device links if add failed
UPSTREAM: driver core: Clarify documentation for fwnode_operations.add_links()
ANDROID: Update ABI representation
ANDROID: gki_defconfig: IIO=y
ANDROID: Update ABI representation
ANDROID: ASoC: core - add hostless DAI support
ANDROID: gki_defconfig: =m's applied for virtio configs in arm64
ANDROID: Update ABI representation after 5.4.1 merge
Linux 5.4.1
KVM: PPC: Book3S HV: Flush link stack on guest exit to host kernel
powerpc/book3s64: Fix link stack flush on context switch
staging: comedi: usbduxfast: usbduxfast_ai_cmdtest rounding error
USB: serial: option: add support for Foxconn T77W968 LTE modules
USB: serial: option: add support for DW5821e with eSIM support
USB: serial: mos7840: fix remote wakeup
USB: serial: mos7720: fix remote wakeup
USB: serial: mos7840: add USB ID to support Moxa UPort 2210
appledisplay: fix error handling in the scheduled work
USB: chaoskey: fix error case of a timeout
usb-serial: cp201x: support Mark-10 digital force gauge
usbip: Fix uninitialized symbol 'nents' in stub_recv_cmd_submit()
usbip: tools: fix fd leakage in the function of read_attr_usbip_status
USBIP: add config dependency for SGL_ALLOC
ALSA: hda - Disable audio component for legacy Nvidia HDMI codecs
media: mceusb: fix out of bounds read in MCE receiver buffer
media: imon: invalid dereference in imon_touch_event
media: cxusb: detect cxusb_ctrl_msg error in query
media: b2c2-flexcop-usb: add sanity checking
media: uvcvideo: Fix error path in control parsing failure
futex: Prevent exit livelock
futex: Provide distinct return value when owner is exiting
futex: Add mutex around futex exit
futex: Provide state handling for exec() as well
futex: Sanitize exit state handling
futex: Mark the begin of futex exit explicitly
futex: Set task::futex_state to DEAD right after handling futex exit
futex: Split futex_mm_release() for exit/exec
exit/exec: Seperate mm_release()
futex: Replace PF_EXITPIDONE with a state
futex: Move futex exit handling into futex code
cpufreq: Add NULL checks to show() and store() methods of cpufreq
media: usbvision: Fix races among open, close, and disconnect
media: usbvision: Fix invalid accesses after device disconnect
media: vivid: Fix wrong locking that causes race conditions on streaming stop
media: vivid: Set vid_cap_streaming and vid_out_streaming to true
ALSA: usb-audio: Fix Scarlett 6i6 Gen 2 port data
ALSA: usb-audio: Fix NULL dereference at parsing BADD
futex: Prevent robust futex exit race
x86/entry/32: Fix FIXUP_ESPFIX_STACK with user CR3
x86/pti/32: Calculate the various PTI cpu_entry_area sizes correctly, make the CPU_ENTRY_AREA_PAGES assert precise
selftests/x86/sigreturn/32: Invalidate DS and ES when abusing the kernel
selftests/x86/mov_ss_trap: Fix the SYSENTER test
x86/entry/32: Fix NMI vs ESPFIX
x86/entry/32: Unwind the ESPFIX stack earlier on exception entry
x86/entry/32: Move FIXUP_FRAME after pushing %fs in SAVE_ALL
x86/entry/32: Use %ss segment where required
x86/entry/32: Fix IRET exception
x86/cpu_entry_area: Add guard page for entry stack on 32bit
x86/pti/32: Size initial_page_table correctly
x86/doublefault/32: Fix stack canaries in the double fault handler
x86/xen/32: Simplify ring check in xen_iret_crit_fixup()
x86/xen/32: Make xen_iret_crit_fixup() independent of frame layout
x86/stackframe/32: Repair 32-bit Xen PV
nbd: prevent memory leak
x86/speculation: Fix redundant MDS mitigation message
x86/speculation: Fix incorrect MDS/TAA mitigation status
x86/insn: Fix awk regexp warnings
md/raid10: prevent access of uninitialized resync_pages offset
Revert "dm crypt: use WQ_HIGHPRI for the IO and crypt workqueues"
Revert "Bluetooth: hci_ll: set operational frequency earlier"
ath10k: restore QCA9880-AR1A (v1) detection
ath10k: Fix HOST capability QMI incompatibility
ath10k: Fix a NULL-ptr-deref bug in ath10k_usb_alloc_urb_from_pipe
ath9k_hw: fix uninitialized variable data
Bluetooth: Fix invalid-free in bcsp_close()
ANDROID: gki_defconfig: enable CONFIG_REGULATOR_FIXED_VOLTAGE
FROMLIST: crypto: arm64/sha: fix function types
ANDROID: arm64: kvm: disable CFI
ANDROID: arm64: add __nocfi to __apply_alternatives
ANDROID: arm64: add __pa_function
ANDROID: arm64: add __nocfi to functions that jump to a physical address
ANDROID: arm64: bpf: implement arch_bpf_jit_check_func
ANDROID: bpf: validate bpf_func when BPF_JIT is enabled with CFI
ANDROID: add support for Clang's Control Flow Integrity (CFI)
ANDROID: arm64: allow LTO_CLANG and THINLTO to be selected
FROMLIST: arm64: fix alternatives with LLVM's integrated assembler
FROMLIST: arm64: lse: fix LSE atomics with LLVM's integrated assembler
ANDROID: arm64: disable HAVE_ARCH_PREL32_RELOCATIONS with LTO_CLANG
ANDROID: arm64: vdso: disable LTO
ANDROID: irqchip/gic-v3: rename gic_of_init to work around a ThinLTO+CFI bug
ANDROID: soc/tegra: disable ARCH_TEGRA_210_SOC with LTO
ANDROID: init: ensure initcall ordering with LTO
ANDROID: drivers/misc/lkdtm: disable LTO for rodata.o
ANDROID: efi/libstub: disable LTO
ANDROID: scripts/mod: disable LTO for empty.c
ANDROID: kbuild: fix dynamic ftrace with clang LTO
ANDROID: kbuild: add support for Clang LTO
ANDROID: kbuild: add CONFIG_LD_IS_LLD
FROMGIT: driver core: platform: use the correct callback type for bus_find_device
FROMLIST: arm64: implement Shadow Call Stack
FROMLIST: arm64: disable SCS for hypervisor code
FROMLIST: arm64: vdso: disable Shadow Call Stack
FROMLIST: arm64: efi: restore x18 if it was corrupted
FROMLIST: arm64: preserve x18 when CPU is suspended
FROMLIST: arm64: reserve x18 from general allocation with SCS
FROMLIST: arm64: disable function graph tracing with SCS
FROMLIST: scs: add support for stack usage debugging
FROMLIST: scs: add accounting
FROMLIST: add support for Clang's Shadow Call Stack (SCS)
FROMLIST: arm64: kernel: avoid x18 in __cpu_soft_restart
FROMLIST: arm64: kvm: stop treating register x18 as caller save
FROMLIST: arm64/lib: copy_page: avoid x18 register in assembler code
FROMLIST: arm64: mm: avoid x18 in idmap_kpti_install_ng_mappings
ANDROID: clang: update to 10.0.1
ANDROID: update ABI representation
Conflicts:
Documentation/devicetree/bindings
Documentation/devicetree/bindings/net/wireless/qcom,ath10k.txt
arch/arm64/Kconfig
drivers/firmware/qcom_scm-64.c
drivers/hwtracing/coresight/coresight.c
drivers/scsi/ufs/ufs.h
drivers/scsi/ufs/ufshcd.c
drivers/scsi/ufs/ufshcd.h
drivers/scsi/ufs/unipro.h
drivers/staging/android/ion/heaps/ion_cma_heap.c
drivers/staging/android/ion/heaps/ion_system_heap.c
drivers/usb/dwc3/ep0.c
drivers/usb/dwc3/gadget.c
include/sound/pcm.h
include/sound/soc.h
kernel/exit.c
kernel/sched/core.c
Change-Id: I66ea973ddcafd352ba999a1dc98e04df33397e3b
Signed-off-by: Blagovest Kolenichev <bkolenichev@codeaurora.org>
|
||
|
|
75258ab7ca |
mm: vmstat: add pageoutclean
vmstat events currently count pgpgout, but that includes only the writebacks, and not the reclaim of clean pages. Add an event to count clean page evictions. This is helpful to evaluate page thrashing cases. Change-Id: Icfb797877a544a58c289074bdc290dfbc1384514 Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org> Signed-off-by: Charan Teja Reddy <charante@codeaurora.org> [isaacm@codeaurora.org: Add Kconfig entry to prevent ABI breakages in GKI] Signed-off-by: Isaac J. Manjarres <isaacm@codeaurora.org> |
||
|
|
201ea48219 |
kernel: Add snapshot of changes to support cpu isolation
This snapshot is taken from msm-4.19 as of commit 5debecbe7195
("trace: filter out spurious preemption and IRQs disable traces").
Change-Id: I222aa448ac68f7365065f62dba9db94925da38a0
Signed-off-by: Satya Durga Srinivasu Prabhala <satyap@codeaurora.org>
|
||
|
|
7f498a4b7b |
FROMLIST: scs: add accounting
This change adds accounting for the memory allocated for shadow stacks. Bug: 145210207 Change-Id: Iee94c22abefcabb63a3bcd4db8ba952130f30a82 (am from https://lore.kernel.org/patchwork/patch/1149055/) Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Sami Tolvanen <samitolvanen@google.com> |
||
|
|
93b3a67448 |
mm, vmstat: reduce zone->lock holding time by /proc/pagetypeinfo
pagetypeinfo_showfree_print is called by zone->lock held in irq mode. This is not really nice because it blocks both any interrupts on that cpu and the page allocator. On large machines this might even trigger the hard lockup detector. Considering the pagetypeinfo is a debugging tool we do not really need exact numbers here. The primary reason to look at the outuput is to see how pageblocks are spread among different migratetypes and low number of pages is much more interesting therefore putting a bound on the number of pages on the free_list sounds like a reasonable tradeoff. The new output will simply tell [...] Node 6, zone Normal, type Movable >100000 >100000 >100000 >100000 41019 31560 23996 10054 3229 983 648 instead of Node 6, zone Normal, type Movable 399568 294127 221558 102119 41019 31560 23996 10054 3229 983 648 The limit has been chosen arbitrary and it is a subject of a future change should there be a need for that. While we are at it, also drop the zone lock after each free_list iteration which will help with the IRQ and page allocator responsiveness even further as the IRQ lock held time is always bound to those 100k pages. [akpm@linux-foundation.org: tweak comment text, per David Hildenbrand] Link: http://lkml.kernel.org/r/20191025072610.18526-3-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Suggested-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Waiman Long <longman@redhat.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Rafael Aquini <aquini@redhat.com> Acked-by: David Rientjes <rientjes@google.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jann Horn <jannh@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Cc: Mel Gorman <mgorman@suse.de> Cc: Roman Gushchin <guro@fb.com> Cc: Song Liu <songliubraving@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
abaed0112c |
mm, vmstat: hide /proc/pagetypeinfo from normal users
/proc/pagetypeinfo is a debugging tool to examine internal page
allocator state wrt to fragmentation. It is not very useful for any
other use so normal users really do not need to read this file.
Waiman Long has noticed that reading this file can have negative side
effects because zone->lock is necessary for gathering data and that a)
interferes with the page allocator and its users and b) can lead to hard
lockups on large machines which have very long free_list.
Reduce both issues by simply not exporting the file to regular users.
Link: http://lkml.kernel.org/r/20191025072610.18526-2-mhocko@kernel.org
Fixes:
|
||
|
|
60fbf0ab5d |
mm,thp: stats for file backed THP
In preparation for non-shmem THP, this patch adds a few stats and exposes them in /proc/meminfo, /sys/bus/node/devices/<node>/meminfo, and /proc/<pid>/task/<tid>/smaps. This patch is mostly a rewrite of Kirill A. Shutemov's earlier version: https://lkml.kernel.org/r/20170126115819.58875-5-kirill.shutemov@linux.intel.com/ Link: http://lkml.kernel.org/r/20190801184244.3169074-5-songliubraving@fb.com Signed-off-by: Song Liu <songliubraving@fb.com> Acked-by: Rik van Riel <riel@surriel.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Hillf Danton <hdanton@sina.com> Cc: Hugh Dickins <hughd@google.com> Cc: William Kucharski <william.kucharski@oracle.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
457c899653 |
treewide: Add SPDX license identifier for missed files
Add SPDX license identifiers to all files which: - Have no license information of any form - Have EXPORT_.*_SYMBOL_GPL inside which was used in the initial scan/conversion to ignore the file These files fall under the project license, GPL v2 only. The resulting SPDX license identifier is: GPL-2.0-only Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
|
e8277b3b52 |
mm/vmstat.c: fix /proc/vmstat format for CONFIG_DEBUG_TLBFLUSH=y CONFIG_SMP=n
Commit |
||
|
|
d9f7979c92 |
mm: no need to check return value of debugfs_create functions
When calling debugfs functions, there is no need to ever check the return value. The function can work or not, but the code logic should never do something different based on this. Link: http://lkml.kernel.org/r/20190122152151.16139-14-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: David Rientjes <rientjes@google.com> Cc: Laura Abbott <labbott@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
9705bea5f8 |
mm: convert zone->managed_pages to atomic variable
totalram_pages, zone->managed_pages and totalhigh_pages updates are protected by managed_page_count_lock, but readers never care about it. Convert these variables to atomic to avoid readers potentially seeing a store tear. This patch converts zone->managed_pages. Subsequent patches will convert totalram_panges, totalhigh_pages and eventually managed_page_count_lock will be removed. Main motivation was that managed_page_count_lock handling was complicating things. It was discussed in length here, https://lore.kernel.org/patchwork/patch/995739/#1181785 So it seemes better to remove the lock and convert variables to atomic, with preventing poteintial store-to-read tearing as a bonus. Link: http://lkml.kernel.org/r/1542090790-21750-3-git-send-email-arunks@codeaurora.org Signed-off-by: Arun KS <arunks@codeaurora.org> Suggested-by: Michal Hocko <mhocko@suse.com> Suggested-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Reviewed-by: David Hildenbrand <david@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
13c9aaf7fa |
mm/vmstat.c: fix NUMA statistics updates
Scan through the whole array to see if an update is needed. While we're
at it, use sizeof() to be safe against any possible type changes in the
future.
The bug here is that we wouldn't sync per-cpu counters into global ones
if there was an update of numa_stats for higher cpus. Highly
theoretical one though because it is much more probable that zone_stats
are updated so we would refresh anyway. So I wouldn't bother to mark
this for stable, yet something nice to fix.
[mhocko@suse.com: changelog enhancement]
Link: http://lkml.kernel.org/r/1541601517-17282-1-git-send-email-janne.huttunen@nokia.com
Fixes:
|
||
|
|
f0ecf25a09 |
mm/vmstat.c: assert that vmstat_text is in sync with stat_items_size
Having two gigantic arrays that must manually be kept in sync, including ifdefs, isn't exactly robust. To make it easier to catch such issues in the future, add a BUILD_BUG_ON(). Link: http://lkml.kernel.org/r/20181001143138.95119-3-jannh@google.com Signed-off-by: Jann Horn <jannh@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Roman Gushchin <guro@fb.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Christoph Lameter <clameter@sgi.com> Cc: Kemi Wang <kemi.wang@intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
68d48e6a2d |
mm: workingset: add vmstat counter for shadow nodes
Make it easier to catch bugs in the shadow node shrinker by adding a counter for the shadow nodes in circulation. [akpm@linux-foundation.org: assert that irqs are disabled, for __inc_lruvec_page_state()] [akpm@linux-foundation.org: s/WARN_ON_ONCE/VM_WARN_ON_ONCE/, per Johannes] Link: http://lkml.kernel.org/r/20181009184732.762-4-hannes@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
1899ad18c6 |
mm: workingset: tell cache transitions from workingset thrashing
Refaults happen during transitions between workingsets as well as in-place thrashing. Knowing the difference between the two has a range of applications, including measuring the impact of memory shortage on the system performance, as well as the ability to smarter balance pressure between the filesystem cache and the swap-backed workingset. During workingset transitions, inactive cache refaults and pushes out established active cache. When that active cache isn't stale, however, and also ends up refaulting, that's bonafide thrashing. Introduce a new page flag that tells on eviction whether the page has been active or not in its lifetime. This bit is then stored in the shadow entry, to classify refaults as transitioning or thrashing. How many page->flags does this leave us with on 32-bit? 20 bits are always page flags 21 if you have an MMU 23 with the zone bits for DMA, Normal, HighMem, Movable 29 with the sparsemem section bits 30 if PAE is enabled 31 with this patch. So on 32-bit PAE, that leaves 1 bit for distinguishing two NUMA nodes. If that's not enough, the system can switch to discontigmem and re-gain the 6 or 7 sparsemem section bits. Link: http://lkml.kernel.org/r/20180828172258.3185-3-hannes@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Daniel Drake <drake@endlessm.com> Tested-by: Suren Baghdasaryan <surenb@google.com> Cc: Christopher Lameter <cl@linux.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Johannes Weiner <jweiner@fb.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Enderborg <peter.enderborg@sony.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Shakeel Butt <shakeelb@google.com> Cc: Tejun Heo <tj@kernel.org> Cc: Vinayak Menon <vinmenon@codeaurora.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
b29940c1ab |
mm: rename and change semantics of nr_indirectly_reclaimable_bytes
The vmstat counter NR_INDIRECTLY_RECLAIMABLE_BYTES was introduced by
commit
|
||
|
|
58bc4c34d2 |
mm/vmstat.c: skip NR_TLB_REMOTE_FLUSH* properly
|
||
|
|
28e2c4bb99 |
mm/vmstat.c: fix outdated vmstat_text
|
||
|
|
28557cc106 |
Revert mm/vmstat.c: fix vmstat_update() preemption BUG
Revert commit
|
||
|
|
fddda2b7b5 |
proc: introduce proc_create_seq{,_data}
Variants of proc_create{,_data} that directly take a struct seq_operations
argument and drastically reduces the boilerplate code in the callers.
All trivial callers converted over.
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
||
|
|
7aaf772723 |
mm: don't show nr_indirectly_reclaimable in /proc/vmstat
Don't show nr_indirectly_reclaimable in /proc/vmstat, because there is no need to export this vm counter to userspace, and some changes are expected in reclaimable object accounting, which can alter this counter. Link: http://lkml.kernel.org/r/20180425191422.9159-1-guro@fb.com Signed-off-by: Roman Gushchin <guro@fb.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Michal Hocko <mhocko@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
eb59254608 |
mm: introduce NR_INDIRECTLY_RECLAIMABLE_BYTES
Patch series "indirectly reclaimable memory", v2. This patchset introduces the concept of indirectly reclaimable memory and applies it to fix the issue of when a big number of dentries with external names can significantly affect the MemAvailable value. This patch (of 3): Introduce a concept of indirectly reclaimable memory and adds the corresponding memory counter and /proc/vmstat item. Indirectly reclaimable memory is any sort of memory, used by the kernel (except of reclaimable slabs), which is actually reclaimable, i.e. will be released under memory pressure. The counter is in bytes, as it's not always possible to count such objects in pages. The name contains BYTES by analogy to NR_KERNEL_STACK_KB. Link: http://lkml.kernel.org/r/20180305133743.12746-2-guro@fb.com Signed-off-by: Roman Gushchin <guro@fb.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Michal Hocko <mhocko@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
c7f26ccfb2 |
mm/vmstat.c: fix vmstat_update() preemption BUG
Attempting to hotplug CPUs with CONFIG_VM_EVENT_COUNTERS enabled can cause vmstat_update() to report a BUG due to preemption not being disabled around smp_processor_id(). Discovered on Ubiquiti EdgeRouter Pro with Cavium Octeon II processor. BUG: using smp_processor_id() in preemptible [00000000] code: kworker/1:1/269 caller is vmstat_update+0x50/0xa0 CPU: 0 PID: 269 Comm: kworker/1:1 Not tainted 4.16.0-rc4-Cavium-Octeon-00009-gf83bbd5-dirty #1 Workqueue: mm_percpu_wq vmstat_update Call Trace: show_stack+0x94/0x128 dump_stack+0xa4/0xe0 check_preemption_disabled+0x118/0x120 vmstat_update+0x50/0xa0 process_one_work+0x144/0x348 worker_thread+0x150/0x4b8 kthread+0x110/0x140 ret_from_kernel_thread+0x14/0x1c Link: http://lkml.kernel.org/r/1520881552-25659-1-git-send-email-steven.hill@cavium.com Signed-off-by: Steven J. Hill <steven.hill@cavium.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Tejun Heo <htejun@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
4518085e12 |
mm, sysctl: make NUMA stats configurable
This is the second step which introduces a tunable interface that allow numa stats configurable for optimizing zone_statistics(), as suggested by Dave Hansen and Ying Huang. ========================================================================= When page allocation performance becomes a bottleneck and you can tolerate some possible tool breakage and decreased numa counter precision, you can do: echo 0 > /proc/sys/vm/numa_stat In this case, numa counter update is ignored. We can see about *4.8%*(185->176) drop of cpu cycles per single page allocation and reclaim on Jesper's page_bench01 (single thread) and *8.1%*(343->315) drop of cpu cycles per single page allocation and reclaim on Jesper's page_bench03 (88 threads) running on a 2-Socket Broadwell-based server (88 threads, 126G memory). Benchmark link provided by Jesper D Brouer (increase loop times to 10000000): https://github.com/netoptimizer/prototype-kernel/tree/master/kernel/mm/bench ========================================================================= When page allocation performance is not a bottleneck and you want all tooling to work, you can do: echo 1 > /proc/sys/vm/numa_stat This is system default setting. Many thanks to Michal Hocko, Dave Hansen, Ying Huang and Vlastimil Babka for comments to help improve the original patch. [keescook@chromium.org: make sure mutex is a global static] Link: http://lkml.kernel.org/r/20171107213809.GA4314@beast Link: http://lkml.kernel.org/r/1508290927-8518-1-git-send-email-kemi.wang@intel.com Signed-off-by: Kemi Wang <kemi.wang@intel.com> Signed-off-by: Kees Cook <keescook@chromium.org> Reported-by: Jesper Dangaard Brouer <brouer@redhat.com> Suggested-by: Dave Hansen <dave.hansen@intel.com> Suggested-by: Ying Huang <ying.huang@intel.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Michal Hocko <mhocko@suse.com> Cc: "Luis R . Rodriguez" <mcgrof@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Christopher Lameter <cl@linux.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Tim Chen <tim.c.chen@intel.com> Cc: Andi Kleen <andi.kleen@intel.com> Cc: Aaron Lu <aaron.lu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
3a50d14d0d |
mm: remove unused pgdat->inactive_ratio
Since commit
|
||
|
|
638032224e |
mm: consider the number in local CPUs when reading NUMA stats
To avoid deviation, the per cpu number of NUMA stats in vm_numa_stat_diff[] is included when a user *reads* the NUMA stats. Since NUMA stats does not be read by users frequently, and kernel does not need it to make a decision, it will not be a problem to make the readers more expensive. Link: http://lkml.kernel.org/r/1503568801-21305-4-git-send-email-kemi.wang@intel.com Signed-off-by: Kemi Wang <kemi.wang@intel.com> Reported-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Cc: Aaron Lu <aaron.lu@intel.com> Cc: Andi Kleen <andi.kleen@intel.com> Cc: Christopher Lameter <cl@linux.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Tim Chen <tim.c.chen@intel.com> Cc: Ying Huang <ying.huang@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
1d90ca897c |
mm: update NUMA counter threshold size
There is significant overhead in cache bouncing caused by zone counters (NUMA associated counters) update in parallel in multi-threaded page allocation (suggested by Dave Hansen). This patch updates NUMA counter threshold to a fixed size of MAX_U16 - 2, as a small threshold greatly increases the update frequency of the global counter from local per cpu counter(suggested by Ying Huang). The rationality is that these statistics counters don't affect the kernel's decision, unlike other VM counters, so it's not a problem to use a large threshold. With this patchset, we see 31.3% drop of CPU cycles(537-->369) for per single page allocation and reclaim on Jesper's page_bench03 benchmark. Benchmark provided by Jesper D Brouer(increase loop times to 10000000): https://github.com/netoptimizer/prototype-kernel/tree/master/kernel/mm/ bench Threshold CPU cycles Throughput(88 threads) 32 799 241760478 64 640 301628829 125 537 358906028 <==> system by default (base) 256 468 412397590 512 428 450550704 4096 399 482520943 20000 394 489009617 30000 395 488017817 65533 369(-31.3%) 521661345(+45.3%) <==> with this patchset N/A 342(-36.3%) 562900157(+56.8%) <==> disable zone_statistics Link: http://lkml.kernel.org/r/1503568801-21305-3-git-send-email-kemi.wang@intel.com Signed-off-by: Kemi Wang <kemi.wang@intel.com> Reported-by: Jesper Dangaard Brouer <brouer@redhat.com> Suggested-by: Dave Hansen <dave.hansen@intel.com> Suggested-by: Ying Huang <ying.huang@intel.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Cc: Aaron Lu <aaron.lu@intel.com> Cc: Andi Kleen <andi.kleen@intel.com> Cc: Christopher Lameter <cl@linux.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Tim Chen <tim.c.chen@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
3a321d2a3d |
mm: change the call sites of numa statistics items
Patch series "Separate NUMA statistics from zone statistics", v2. Each page allocation updates a set of per-zone statistics with a call to zone_statistics(). As discussed in 2017 MM summit, these are a substantial source of overhead in the page allocator and are very rarely consumed. This significant overhead in cache bouncing caused by zone counters (NUMA associated counters) update in parallel in multi-threaded page allocation (pointed out by Dave Hansen). A link to the MM summit slides: http://people.netfilter.org/hawk/presentations/MM-summit2017/MM-summit2017-JesperBrouer.pdf To mitigate this overhead, this patchset separates NUMA statistics from zone statistics framework, and update NUMA counter threshold to a fixed size of MAX_U16 - 2, as a small threshold greatly increases the update frequency of the global counter from local per cpu counter (suggested by Ying Huang). The rationality is that these statistics counters don't need to be read often, unlike other VM counters, so it's not a problem to use a large threshold and make readers more expensive. With this patchset, we see 31.3% drop of CPU cycles(537-->369, see below) for per single page allocation and reclaim on Jesper's page_bench03 benchmark. Meanwhile, this patchset keeps the same style of virtual memory statistics with little end-user-visible effects (only move the numa stats to show behind zone page stats, see the first patch for details). I did an experiment of single page allocation and reclaim concurrently using Jesper's page_bench03 benchmark on a 2-Socket Broadwell-based server (88 processors with 126G memory) with different size of threshold of pcp counter. Benchmark provided by Jesper D Brouer(increase loop times to 10000000): https://github.com/netoptimizer/prototype-kernel/tree/master/kernel/mm/bench Threshold CPU cycles Throughput(88 threads) 32 799 241760478 64 640 301628829 125 537 358906028 <==> system by default 256 468 412397590 512 428 450550704 4096 399 482520943 20000 394 489009617 30000 395 488017817 65533 369(-31.3%) 521661345(+45.3%) <==> with this patchset N/A 342(-36.3%) 562900157(+56.8%) <==> disable zone_statistics This patch (of 3): In this patch, NUMA statistics is separated from zone statistics framework, all the call sites of NUMA stats are changed to use numa-stats-specific functions, it does not have any functionality change except that the number of NUMA stats is shown behind zone page stats when users *read* the zone info. E.g. cat /proc/zoneinfo ***Base*** ***With this patch*** nr_free_pages 3976 nr_free_pages 3976 nr_zone_inactive_anon 0 nr_zone_inactive_anon 0 nr_zone_active_anon 0 nr_zone_active_anon 0 nr_zone_inactive_file 0 nr_zone_inactive_file 0 nr_zone_active_file 0 nr_zone_active_file 0 nr_zone_unevictable 0 nr_zone_unevictable 0 nr_zone_write_pending 0 nr_zone_write_pending 0 nr_mlock 0 nr_mlock 0 nr_page_table_pages 0 nr_page_table_pages 0 nr_kernel_stack 0 nr_kernel_stack 0 nr_bounce 0 nr_bounce 0 nr_zspages 0 nr_zspages 0 numa_hit 0 *nr_free_cma 0* numa_miss 0 numa_hit 0 numa_foreign 0 numa_miss 0 numa_interleave 0 numa_foreign 0 numa_local 0 numa_interleave 0 numa_other 0 numa_local 0 *nr_free_cma 0* numa_other 0 ... ... vm stats threshold: 10 vm stats threshold: 10 ... ... The next patch updates the numa stats counter size and threshold. [akpm@linux-foundation.org: coding-style fixes] Link: http://lkml.kernel.org/r/1503568801-21305-2-git-send-email-kemi.wang@intel.com Signed-off-by: Kemi Wang <kemi.wang@intel.com> Reported-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Christopher Lameter <cl@linux.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Andi Kleen <andi.kleen@intel.com> Cc: Ying Huang <ying.huang@intel.com> Cc: Aaron Lu <aaron.lu@intel.com> Cc: Tim Chen <tim.c.chen@intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
cbc65df240 |
mm, swap: add swap readahead hit statistics
Patch series "mm, swap: VMA based swap readahead", v4. The swap readahead is an important mechanism to reduce the swap in latency. Although pure sequential memory access pattern isn't very popular for anonymous memory, the space locality is still considered valid. In the original swap readahead implementation, the consecutive blocks in swap device are readahead based on the global space locality estimation. But the consecutive blocks in swap device just reflect the order of page reclaiming, don't necessarily reflect the access pattern in virtual memory space. And the different tasks in the system may have different access patterns, which makes the global space locality estimation incorrect. In this patchset, when page fault occurs, the virtual pages near the fault address will be readahead instead of the swap slots near the fault swap slot in swap device. This avoid to readahead the unrelated swap slots. At the same time, the swap readahead is changed to work on per-VMA from globally. So that the different access patterns of the different VMAs could be distinguished, and the different readahead policy could be applied accordingly. The original core readahead detection and scaling algorithm is reused, because it is an effect algorithm to detect the space locality. In addition to the swap readahead changes, some new sysfs interface is added to show the efficiency of the readahead algorithm and some other swap statistics. This new implementation will incur more small random read, on SSD, the improved correctness of estimation and readahead target should beat the potential increased overhead, this is also illustrated in the test results below. But on HDD, the overhead may beat the benefit, so the original implementation will be used by default. The test and result is as follow, Common test condition ===================== Test Machine: Xeon E5 v3 (2 sockets, 72 threads, 32G RAM) Swap device: NVMe disk Micro-benchmark with combined access pattern ============================================ vm-scalability, sequential swap test case, 4 processes to eat 50G virtual memory space, repeat the sequential memory writing until 300 seconds. The first round writing will trigger swap out, the following rounds will trigger sequential swap in and out. At the same time, run vm-scalability random swap test case in background, 8 processes to eat 30G virtual memory space, repeat the random memory write until 300 seconds. This will trigger random swap-in in the background. This is a combined workload with sequential and random memory accessing at the same time. The result (for sequential workload) is as follow, Base Optimized ---- --------- throughput 345413 KB/s 414029 KB/s (+19.9%) latency.average 97.14 us 61.06 us (-37.1%) latency.50th 2 us 1 us latency.60th 2 us 1 us latency.70th 98 us 2 us latency.80th 160 us 2 us latency.90th 260 us 217 us latency.95th 346 us 369 us latency.99th 1.34 ms 1.09 ms ra_hit% 52.69% 99.98% The original swap readahead algorithm is confused by the background random access workload, so readahead hit rate is lower. The VMA-base readahead algorithm works much better. Linpack ======= The test memory size is bigger than RAM to trigger swapping. Base Optimized ---- --------- elapsed_time 393.49 s 329.88 s (-16.2%) ra_hit% 86.21% 98.82% The score of base and optimized kernel hasn't visible changes. But the elapsed time reduced and readahead hit rate improved, so the optimized kernel runs better for startup and tear down stages. And the absolute value of readahead hit rate is high, shows that the space locality is still valid in some practical workloads. This patch (of 5): The statistics for total readahead pages and total readahead hits are recorded and exported via the following sysfs interface. /sys/kernel/mm/swap/ra_hits /sys/kernel/mm/swap/ra_total With them, the efficiency of the swap readahead could be measured, so that the swap readahead algorithm and parameters could be tuned accordingly. [akpm@linux-foundation.org: don't display swap stats if CONFIG_SWAP=n] Link: http://lkml.kernel.org/r/20170807054038.1843-2-ying.huang@intel.com Signed-off-by: "Huang, Ying" <ying.huang@intel.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Rik van Riel <riel@redhat.com> Cc: Shaohua Li <shli@kernel.org> Cc: Hugh Dickins <hughd@google.com> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Tim Chen <tim.c.chen@intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
f113e64121 |
mm/vmstat.c: fix wrong comment
Comment for pagetypeinfo_showblockcount() is mistakenly duplicated from
pagetypeinfo_show_free()'s comment. This commit fixes it.
Link: http://lkml.kernel.org/r/20170809185816.11244-1-sj38.park@gmail.com
Fixes:
|
||
|
|
88d6ac40c1 |
mm/vmstat: fix divide error at __fragmentation_index
When order is -1 or too big, *1UL << order* will be 0, which will cause a divide error. Although it seems that all callers of __fragmentation_index() will only do so with a valid order, the patch can make it more robust. Should prevent reoccurrences of https://bugzilla.kernel.org/show_bug.cgi?id=196555 Link: http://lkml.kernel.org/r/1501751520-2598-1-git-send-email-wen.yang99@zte.com.cn Signed-off-by: Wen Yang <wen.yang99@zte.com.cn> Reviewed-by: Jiang Biao <jiang.biao2@zte.com.cn> Suggested-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
c41f012ade |
mm: rename global_page_state to global_zone_page_state
global_page_state is error prone as a recent bug report pointed out [1].
It only returns proper values for zone based counters as the enum it
gets suggests. We already have global_node_page_state so let's rename
global_page_state to global_zone_page_state to be more explicit here.
All existing users seems to be correct:
$ git grep "global_page_state(NR_" | sed 's@.*(\(NR_[A-Z_]*\)).*@\1@' | sort | uniq -c
2 NR_BOUNCE
2 NR_FREE_CMA_PAGES
11 NR_FREE_PAGES
1 NR_KERNEL_STACK_KB
1 NR_MLOCK
2 NR_PAGETABLE
This patch shouldn't introduce any functional change.
[1] http://lkml.kernel.org/r/201707260628.v6Q6SmaS030814@www262.sakura.ne.jp
Link: http://lkml.kernel.org/r/20170801134256.5400-2-hannes@cmpxchg.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
||
|
|
fe490cc0fe |
mm, THP, swap: add THP swapping out fallback counting
When swapping out THP (Transparent Huge Page), instead of swapping out the THP as a whole, sometimes we have to fallback to split the THP into normal pages before swapping, because no free swap clusters are available, or cgroup limit is exceeded, etc. To count the number of the fallback, a new VM event THP_SWPOUT_FALLBACK is added, and counted when we fallback to split the THP. Link: http://lkml.kernel.org/r/20170724051840.2309-13-ying.huang@intel.com Signed-off-by: "Huang, Ying" <ying.huang@intel.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Hugh Dickins <hughd@google.com> Cc: Shaohua Li <shli@kernel.org> Cc: Rik van Riel <riel@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Ross Zwisler <ross.zwisler@intel.com> [for brd.c, zram_drv.c, pmem.c] Cc: Vishal L Verma <vishal.l.verma@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
225311a464 |
mm: test code to write THP to swap device as a whole
To support delay splitting THP (Transparent Huge Page) after swapped out, we need to enhance swap writing code to support to write a THP as a whole. This will improve swap write IO performance. As Ming Lei <ming.lei@redhat.com> pointed out, this should be based on multipage bvec support, which hasn't been merged yet. So this patch is only for testing the functionality of the other patches in the series. And will be reimplemented after multipage bvec support is merged. Link: http://lkml.kernel.org/r/20170724051840.2309-7-ying.huang@intel.com Signed-off-by: "Huang, Ying" <ying.huang@intel.com> Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Rik van Riel <riel@redhat.com> Cc: Ross Zwisler <ross.zwisler@intel.com> [for brd.c, zram_drv.c, pmem.c] Cc: Shaohua Li <shli@kernel.org> Cc: Vishal L Verma <vishal.l.verma@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
727c080f03 |
mm: avoid taking zone lock in pagetypeinfo_showmixed()
pagetypeinfo_showmixedcount_print is found to take a lot of time to complete and it does this holding the zone lock and disabling interrupts. In some cases it is found to take more than a second (On a 2.4GHz,8Gb RAM,arm64 cpu). Avoid taking the zone lock similar to what is done by read_page_owner, which means possibility of inaccurate results. Link: http://lkml.kernel.org/r/1498045643-12257-1-git-send-email-vinmenon@codeaurora.org Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: zhongjiang <zhongjiang@huawei.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Sudip Mukherjee <sudipm.mukherjee@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@suse.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: David Rientjes <rientjes@google.com> Cc: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
385386cff4 |
mm: vmstat: move slab statistics from zone to node counters
Patch series "mm: per-lruvec slab stats" Josef is working on a new approach to balancing slab caches and the page cache. For this to work, he needs slab cache statistics on the lruvec level. These patches implement that by adding infrastructure that allows updating and reading generic VM stat items per lruvec, then switches some existing VM accounting sites, including the slab accounting ones, to this new cgroup-aware API. I'll follow up with more patches on this, because there is actually substantial simplification that can be done to the memory controller when we replace private memcg accounting with making the existing VM accounting sites cgroup-aware. But this is enough for Josef to base his slab reclaim work on, so here goes. This patch (of 5): To re-implement slab cache vs. page cache balancing, we'll need the slab counters at the lruvec level, which, ever since lru reclaim was moved from the zone to the node, is the intersection of the node, not the zone, and the memcg. We could retain the per-zone counters for when the page allocator dumps its memory information on failures, and have counters on both levels - which on all but NUMA node 0 is usually redundant. But let's keep it simple for now and just move them. If anybody complains we can restore the per-zone counters. [hannes@cmpxchg.org: fix oops] Link: http://lkml.kernel.org/r/20170605183511.GA8915@cmpxchg.org Link: http://lkml.kernel.org/r/20170530181724.27197-3-hannes@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Josef Bacik <josef@toxicpanda.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
8e675f7af5 |
mm/oom_kill: count global and memory cgroup oom kills
Show count of oom killer invocations in /proc/vmstat and count of processes killed in memory cgroup in knob "memory.events" (in memory.oom_control for v1 cgroup). Also describe difference between "oom" and "oom_kill" in memory cgroup documentation. Currently oom in memory cgroup kills tasks iff shortage has happened inside page fault. These counters helps in monitoring oom kills - for now the only way is grepping for magic words in kernel log. [akpm@linux-foundation.org: fix for mem_cgroup_count_vm_event() rename] [akpm@linux-foundation.org: fix comment, per Konstantin] Link: http://lkml.kernel.org/r/149570810989.203600.9492483715840752937.stgit@buzz Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Cc: Michal Hocko <mhocko@kernel.org> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Roman Guschin <guroan@gmail.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
d336e94e44 |
mm, vmstat: skip reporting offline pages in pagetypeinfo
pagetypeinfo_showblockcount_print skips over invalid pfns but it would report pages which are offline because those have a valid pfn. Their migrate type is misleading at best. Now that we have pfn_to_online_page() we can use it instead of pfn_valid() and fix this. [mhocko@suse.com: fix build] Link: http://lkml.kernel.org/r/20170519072225.GA13041@dhcp22.suse.cz Link: http://lkml.kernel.org/r/20170515085827.16474-11-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Reported-by: Joonsoo Kim <js1304@gmail.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Daniel Kiper <daniel.kiper@oracle.com> Cc: David Rientjes <rientjes@google.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Reza Arbab <arbab@linux.vnet.ibm.com> Cc: Tobias Regnery <tobias.regnery@gmail.com> Cc: Toshi Kani <toshi.kani@hpe.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Xishi Qiu <qiuxishi@huawei.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
9d85e15f1d |
mm/vmstat.c: standardize file operations variable names
Standardize the file operation variable names related to all four memory management /proc interface files. Also change all the symbol permissions (S_IRUGO) into octal permissions (0444) as it got complaints from checkpatch.pl. This does not create any functional change to the interface. Link: http://lkml.kernel.org/r/20170427030632.8588-1-khandual@linux.vnet.ibm.com Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
8d35bb3106 |
mm, vmstat: Remove spurious WARN() during zoneinfo print
After commit |
||
|
|
7dfb8bf3b9 |
mm, vmstat: suppress pcp stats for unpopulated zones in zoneinfo
After "mm, vmstat: print non-populated zones in zoneinfo", /proc/zoneinfo will show unpopulated zones. The per-cpu pageset statistics are not relevant for unpopulated zones and can be potentially lengthy, so supress them when they are not interesting. Also moves lowmem reserve protection information above pcp stats since it is relevant for all zones per vm.lowmem_reserve_ratio. Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1703061400500.46428@chino.kir.corp.google.com Signed-off-by: David Rientjes <rientjes@google.com> Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
b2bd859819 |
mm, vmstat: print non-populated zones in zoneinfo
Initscripts can use the information (protection levels) from /proc/zoneinfo to configure vm.lowmem_reserve_ratio at boot. vm.lowmem_reserve_ratio is an array of ratios for each configured zone on the system. If a zone is not populated on an arch, /proc/zoneinfo suppresses its output. This results in there not being a 1:1 mapping between the set of zones emitted by /proc/zoneinfo and the zones configured by vm.lowmem_reserve_ratio. This patch shows statistics for non-populated zones in /proc/zoneinfo. The zones exist and hold a spot in the vm.lowmem_reserve_ratio array. Without this patch, it is not possible to determine which index in the array controls which zone if one or more zones on the system are not populated. Remaining users of walk_zones_in_node() are unchanged. Files such as /proc/pagetypeinfo require certain zone data to be initialized properly for display, which is not done for unpopulated zones. Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1703031451310.98023@chino.kir.corp.google.com Signed-off-by: David Rientjes <rientjes@google.com> Reviewed-by: Anshuman Khandual <khandual@linux.vnet.ibm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
f7ad2a6cb9 |
mm: move MADV_FREE pages into LRU_INACTIVE_FILE list
madv()'s MADV_FREE indicate pages are 'lazyfree'. They are still anonymous pages, but they can be freed without pageout. To distinguish these from normal anonymous pages, we clear their SwapBacked flag. MADV_FREE pages could be freed without pageout, so they pretty much like used once file pages. For such pages, we'd like to reclaim them once there is memory pressure. Also it might be unfair reclaiming MADV_FREE pages always before used once file pages and we definitively want to reclaim the pages before other anonymous and file pages. To speed up MADV_FREE pages reclaim, we put the pages into LRU_INACTIVE_FILE list. The rationale is LRU_INACTIVE_FILE list is tiny nowadays and should be full of used once file pages. Reclaiming MADV_FREE pages will not have much interfere of anonymous and active file pages. And the inactive file pages and MADV_FREE pages will be reclaimed according to their age, so we don't reclaim too many MADV_FREE pages too. Putting the MADV_FREE pages into LRU_INACTIVE_FILE_LIST also means we can reclaim the pages without swap support. This idea is suggested by Johannes. This patch doesn't move MADV_FREE pages to LRU_INACTIVE_FILE list yet to avoid bisect failure, next patch will do it. The patch is based on Minchan's original patch. [akpm@linux-foundation.org: coding-style fixes] Link: http://lkml.kernel.org/r/2f87063c1e9354677b7618c647abde77b07561e5.1487965799.git.shli@fb.com Signed-off-by: Shaohua Li <shli@fb.com> Suggested-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Minchan Kim <minchan@kernel.org> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: Hugh Dickins <hughd@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
c822f6223d |
mm: delete NR_PAGES_SCANNED and pgdat_reclaimable()
NR_PAGES_SCANNED counts number of pages scanned since the last page free event in the allocator. This was used primarily to measure the reclaimability of zones and nodes, and determine when reclaim should give up on them. In that role, it has been replaced in the preceding patches by a different mechanism. Being implemented as an efficient vmstat counter, it was automatically exported to userspace as well. It's however unlikely that anyone outside the kernel is using this counter in any meaningful way. Remove the counter and the unused pgdat_reclaimable(). Link: http://lkml.kernel.org/r/20170228214007.5621-8-hannes@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Jia He <hejianet@gmail.com> Cc: Mel Gorman <mgorman@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
c73322d098 |
mm: fix 100% CPU kswapd busyloop on unreclaimable nodes
Patch series "mm: kswapd spinning on unreclaimable nodes - fixes and
cleanups".
Jia reported a scenario in which the kswapd of a node indefinitely spins
at 100% CPU usage. We have seen similar cases at Facebook.
The kernel's current method of judging its ability to reclaim a node (or
whether to back off and sleep) is based on the amount of scanned pages
in proportion to the amount of reclaimable pages. In Jia's and our
scenarios, there are no reclaimable pages in the node, however, and the
condition for backing off is never met. Kswapd busyloops in an attempt
to restore the watermarks while having nothing to work with.
This series reworks the definition of an unreclaimable node based not on
scanning but on whether kswapd is able to actually reclaim pages in
MAX_RECLAIM_RETRIES (16) consecutive runs. This is the same criteria
the page allocator uses for giving up on direct reclaim and invoking the
OOM killer. If it cannot free any pages, kswapd will go to sleep and
leave further attempts to direct reclaim invocations, which will either
make progress and re-enable kswapd, or invoke the OOM killer.
Patch #1 fixes the immediate problem Jia reported, the remainder are
smaller fixlets, cleanups, and overall phasing out of the old method.
Patch #6 is the odd one out. It's a nice cleanup to get_scan_count(),
and directly related to #5, but in itself not relevant to the series.
If the whole series is too ambitious for 4.11, I would consider the
first three patches fixes, the rest cleanups.
This patch (of 9):
Jia He reports a problem with kswapd spinning at 100% CPU when
requesting more hugepages than memory available in the system:
$ echo 4000 >/proc/sys/vm/nr_hugepages
top - 13:42:59 up 3:37, 1 user, load average: 1.09, 1.03, 1.01
Tasks: 1 total, 1 running, 0 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.0 us, 12.5 sy, 0.0 ni, 85.5 id, 2.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem: 31371520 total, 30915136 used, 456384 free, 320 buffers
KiB Swap: 6284224 total, 115712 used, 6168512 free. 48192 cached Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
76 root 20 0 0 0 0 R 100.0 0.000 217:17.29 kswapd3
At that time, there are no reclaimable pages left in the node, but as
kswapd fails to restore the high watermarks it refuses to go to sleep.
Kswapd needs to back away from nodes that fail to balance. Up until
commit
|
||
|
|
80d136e138 |
mm: make mm_percpu_wq non freezable
Geert has reported a freeze during PM resume and some additional
debugging has shown that the device_resume worker cannot make a forward
progress because it waits for an event which is stuck waiting in
drain_all_pages:
INFO: task kworker/u4:0:5 blocked for more than 120 seconds.
Not tainted 4.11.0-rc7-koelsch-00029-g005882e53d62f25d-dirty #3476
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kworker/u4:0 D 0 5 2 0x00000000
Workqueue: events_unbound async_run_entry_fn
__schedule
schedule
schedule_timeout
wait_for_common
dpm_wait_for_superior
device_resume
async_resume
async_run_entry_fn
process_one_work
worker_thread
kthread
[...]
bash D 0 1703 1694 0x00000000
__schedule
schedule
schedule_timeout
wait_for_common
flush_work
drain_all_pages
start_isolate_page_range
alloc_contig_range
cma_alloc
__alloc_from_contiguous
cma_allocator_alloc
__dma_alloc
arm_dma_alloc
sh_eth_ring_init
sh_eth_open
sh_eth_resume
dpm_run_callback
device_resume
dpm_resume
dpm_resume_end
suspend_devices_and_enter
pm_suspend
state_store
kernfs_fop_write
__vfs_write
vfs_write
SyS_write
[...]
Showing busy workqueues and worker pools:
[...]
workqueue mm_percpu_wq: flags=0xc
pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=0/0
delayed: drain_local_pages_wq, vmstat_update
pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=0/0
delayed: drain_local_pages_wq BAR(1703), vmstat_update
Tetsuo has properly noted that mm_percpu_wq is created as WQ_FREEZABLE
so it is frozen this early during resume so we are effectively
deadlocked. Fix this by dropping WQ_FREEZABLE when creating
mm_percpu_wq. We really want to have it operational all the time.
Fixes:
|
||
|
|
ce612879dd |
mm: move pcp and lru-pcp draining into single wq
We currently have 2 specific WQ_RECLAIM workqueues in the mm code.
vmstat_wq for updating pcp stats and lru_add_drain_wq dedicated to drain
per cpu lru caches. This seems more than necessary because both can run
on a single WQ. Both do not block on locks requiring a memory
allocation nor perform any allocations themselves. We will save one
rescuer thread this way.
On the other hand drain_all_pages() queues work on the system wq which
doesn't have rescuer and so this depend on memory allocation (when all
workers are stuck allocating and new ones cannot be created).
Initially we thought this would be more of a theoretical problem but
Hugh Dickins has reported:
: 4.11-rc has been giving me hangs after hours of swapping load. At
: first they looked like memory leaks ("fork: Cannot allocate memory");
: but for no good reason I happened to do "cat /proc/sys/vm/stat_refresh"
: before looking at /proc/meminfo one time, and the stat_refresh stuck
: in D state, waiting for completion of flush_work like many kworkers.
: kthreadd waiting for completion of flush_work in drain_all_pages().
This worker should be using WQ_RECLAIM as well in order to guarantee a
forward progress. We can reuse the same one as for lru draining and
vmstat.
Link: http://lkml.kernel.org/r/20170307131751.24936-1-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Suggested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mel Gorman <mgorman@suse.de>
Tested-by: Yang Li <pku.leo@gmail.com>
Tested-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
||
|
|
597b7305dd |
mm: move mm_percpu_wq initialization earlier
Yang Li has reported that drain_all_pages triggers a WARN_ON which means that this function is called earlier than the mm_percpu_wq is initialized on arm64 with CMA configured: WARNING: CPU: 2 PID: 1 at mm/page_alloc.c:2423 drain_all_pages+0x244/0x25c Modules linked in: CPU: 2 PID: 1 Comm: swapper/0 Not tainted 4.11.0-rc1-next-20170310-00027-g64dfbc5 #127 Hardware name: Freescale Layerscape 2088A RDB Board (DT) task: ffffffc07c4a6d00 task.stack: ffffffc07c4a8000 PC is at drain_all_pages+0x244/0x25c LR is at start_isolate_page_range+0x14c/0x1f0 [...] drain_all_pages+0x244/0x25c start_isolate_page_range+0x14c/0x1f0 alloc_contig_range+0xec/0x354 cma_alloc+0x100/0x1fc dma_alloc_from_contiguous+0x3c/0x44 atomic_pool_init+0x7c/0x208 arm64_dma_init+0x44/0x4c do_one_initcall+0x38/0x128 kernel_init_freeable+0x1a0/0x240 kernel_init+0x10/0xfc ret_from_fork+0x10/0x20 Fix this by moving the whole setup_vmstat which is an initcall right now to init_mm_internals which will be called right after the WQ subsystem is initialized. Link: http://lkml.kernel.org/r/20170315164021.28532-1-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Reported-by: Yang Li <pku.leo@gmail.com> Tested-by: Yang Li <pku.leo@gmail.com> Tested-by: Xiaolong Ye <xiaolong.ye@intel.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
ce9311cf95 |
mm/vmstats: add thp_split_pud event for clarity
We added support for PUD-sized transparent hugepages, however we count the event "thp split pud" into thp_split_pmd event. To separate the event count of thp split pud from pmd, add a new event named thp_split_pud. Link: http://lkml.kernel.org/r/1488282380-5076-1-git-send-email-xieyisheng1@huawei.com Signed-off-by: Yisheng Xie <xieyisheng1@huawei.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Sebastian Siewior <bigeasy@linutronix.de> Cc: Hugh Dickins <hughd@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Ebru Akagunduz <ebru.akagunduz@gmail.com> Cc: David Rientjes <rientjes@google.com> Cc: Hanjun Guo <guohanjun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
7f354a548d |
mm, compaction: add vmstats for kcompactd work
A "compact_daemon_wake" vmstat exists that represents the number of times kcompactd has woken up. This doesn't represent how much work it actually did, though. It's useful to understand how much compaction work is being done by kcompactd versus other methods such as direct compaction and explicitly triggered per-node (or system) compaction. This adds two new vmstats: "compact_daemon_migrate_scanned" and "compact_daemon_free_scanned" to represent the number of pages kcompactd has scanned as part of its migration scanner and freeing scanner, respectively. These values are still accounted for in the general "compact_migrate_scanned" and "compact_free_scanned" for compatibility. It could be argued that explicitly triggered compaction could also be tracked separately, and that could be added if others find it useful. Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1612071749390.69852@chino.kir.corp.google.com Signed-off-by: David Rientjes <rientjes@google.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Michal Hocko <mhocko@kernel.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |