Commit Graph

953 Commits

Author SHA1 Message Date
Blagovest Kolenichev
e79e029826 Merge android-5.4.5 (9cdc723) into msm-5.4
* refs/heads/tmp-9cdc723:
  Revert "usb: dwc3: gadget: Fix logical condition"
  Revert "FROMLIST: scsi: ufs-qcom: Adjust bus bandwidth voting and unvoting"
  Linux 5.4.5
  r8169: add missing RX enabling for WoL on RTL8125
  net: mscc: ocelot: unregister the PTP clock on deinit
  ionic: keep users rss hash across lif reset
  xdp: obtain the mem_id mutex before trying to remove an entry.
  page_pool: do not release pool until inflight == 0.
  net/mlx5e: ethtool, Fix analysis of speed setting
  net/mlx5e: Fix translation of link mode into speed
  net/mlx5e: Fix freeing flow with kfree() and not kvfree()
  net/mlx5e: Fix SFF 8472 eeprom length
  act_ct: support asymmetric conntrack
  net/mlx5e: Fix TXQ indices to be sequential
  net: Fixed updating of ethertype in skb_mpls_push()
  hsr: fix a NULL pointer dereference in hsr_dev_xmit()
  Fixed updating of ethertype in function skb_mpls_pop
  gre: refetch erspan header from skb->data after pskb_may_pull()
  cls_flower: Fix the behavior using port ranges with hw-offload
  net: sched: allow indirect blocks to bind to clsact in TC
  net: core: rename indirect block ingress cb function
  tcp: Protect accesses to .ts_recent_stamp with {READ,WRITE}_ONCE()
  tcp: tighten acceptance of ACKs not matching a child socket
  tcp: fix rejected syncookies due to stale timestamps
  net: ipv6_stub: use ip6_dst_lookup_flow instead of ip6_dst_lookup
  net: ipv6: add net argument to ip6_dst_lookup_flow
  net/mlx5e: Query global pause state before setting prio2buffer
  tipc: fix ordering of tipc module init and exit routine
  tcp: md5: fix potential overestimation of TCP option space
  openvswitch: support asymmetric conntrack
  net/tls: Fix return values to avoid ENOTSUPP
  net: thunderx: start phy before starting autonegotiation
  net_sched: validate TCA_KIND attribute in tc_chain_tmplt_add()
  net: sched: fix dump qlen for sch_mq/sch_mqprio with NOLOCK subqueues
  net: ethernet: ti: cpsw: fix extra rx interrupt
  net: dsa: fix flow dissection on Tx path
  net: bridge: deny dev_set_mac_address() when unregistering
  mqprio: Fix out-of-bounds access in mqprio_dump
  inet: protect against too small mtu values.
  ANDROID: add initial ABI whitelist for android-5.4
  ANDROID: abi update for 5.4.4
  ANDROID: mm: Throttle rss_stat tracepoint
  FROMLIST: vsprintf: Inline call to ptr_to_hashval
  UPSTREAM: rss_stat: Add support to detect RSS updates of external mm
  UPSTREAM: mm: emit tracepoint when RSS changes
  Linux 5.4.4
  EDAC/ghes: Do not warn when incrementing refcount on 0
  r8169: fix rtl_hw_jumbo_disable for RTL8168evl
  workqueue: Fix missing kfree(rescuer) in destroy_workqueue()
  blk-mq: make sure that line break can be printed
  ext4: fix leak of quota reservations
  ext4: fix a bug in ext4_wait_for_tail_page_commit
  splice: only read in as much information as there is pipe buffer space
  rtc: disable uie before setting time and enable after
  USB: dummy-hcd: increase max number of devices to 32
  powerpc: Define arch_is_kernel_initmem_freed() for lockdep
  mm/shmem.c: cast the type of unmap_start to u64
  s390/kaslr: store KASLR offset for early dumps
  s390/smp,vdso: fix ASCE handling
  firmware: qcom: scm: Ensure 'a0' status code is treated as signed
  ext4: work around deleting a file with i_nlink == 0 safely
  mm: memcg/slab: wait for !root kmem_cache refcnt killing on root kmem_cache destruction
  mfd: rk808: Fix RK818 ID template
  mm, memfd: fix COW issue on MAP_PRIVATE and F_SEAL_FUTURE_WRITE mappings
  powerpc: Fix vDSO clock_getres()
  powerpc: Avoid clang warnings around setjmp and longjmp
  omap: pdata-quirks: remove openpandora quirks for mmc3 and wl1251
  omap: pdata-quirks: revert pandora specific gpiod additions
  iio: ad7949: fix channels mixups
  iio: ad7949: kill pointless "readback"-handling code
  Revert "scsi: qla2xxx: Fix memory leak when sending I/O fails"
  scsi: qla2xxx: Fix a dma_pool_free() call
  scsi: qla2xxx: Fix SRB leak on switch command timeout
  reiserfs: fix extended attributes on the root directory
  ext4: Fix credit estimate for final inode freeing
  quota: fix livelock in dquot_writeback_dquots
  seccomp: avoid overflow in implicit constant conversion
  ext2: check err when partial != NULL
  quota: Check that quota is not dirty before release
  video/hdmi: Fix AVI bar unpack
  powerpc/xive: Skip ioremap() of ESB pages for LSI interrupts
  powerpc: Allow flush_icache_range to work across ranges >4GB
  powerpc/xive: Prevent page fault issues in the machine crash handler
  powerpc: Allow 64bit VDSO __kernel_sync_dicache to work across ranges >4GB
  coresight: Serialize enabling/disabling a link device.
  stm class: Lose the protocol driver when dropping its reference
  ppdev: fix PPGETTIME/PPSETTIME ioctls
  RDMA/core: Fix ib_dma_max_seg_size()
  ARM: dts: omap3-tao3530: Fix incorrect MMC card detection GPIO polarity
  mmc: host: omap_hsmmc: add code for special init of wl1251 to get rid of pandora_wl1251_init_card
  pinctrl: samsung: Fix device node refcount leaks in S3C64xx wakeup controller init
  pinctrl: samsung: Fix device node refcount leaks in init code
  pinctrl: samsung: Fix device node refcount leaks in S3C24xx wakeup controller init
  pinctrl: samsung: Fix device node refcount leaks in Exynos wakeup controller init
  pinctrl: samsung: Add of_node_put() before return in error path
  pinctrl: armada-37xx: Fix irq mask access in armada_37xx_irq_set_type()
  pinctrl: rza2: Fix gpio name typos
  ACPI: PM: Avoid attaching ACPI PM domain to certain devices
  ACPI: EC: Rework flushing of pending work
  ACPI: bus: Fix NULL pointer check in acpi_bus_get_private_data()
  ACPI: OSL: only free map once in osl.c
  ACPI / hotplug / PCI: Allocate resources directly under the non-hotplug bridge
  ACPI: LPSS: Add dmi quirk for skipping _DEP check for some device-links
  ACPI: LPSS: Add LNXVIDEO -> BYT I2C1 to lpss_device_links
  ACPI: LPSS: Add LNXVIDEO -> BYT I2C7 to lpss_device_links
  ACPI / utils: Move acpi_dev_get_first_match_dev() under CONFIG_ACPI
  ALSA: hda/realtek - Line-out jack doesn't work on a Dell AIO
  ALSA: oxfw: fix return value in error path of isochronous resources reservation
  ALSA: fireface: fix return value in error path of isochronous resources reservation
  cpufreq: powernv: fix stack bloat and hard limit on number of CPUs
  PM / devfreq: Lock devfreq in trans_stat_show
  intel_th: pci: Add Tiger Lake CPU support
  intel_th: pci: Add Ice Lake CPU support
  intel_th: Fix a double put_device() in error path
  powerpc/perf: Disable trace_imc pmu
  drm/panfrost: Open/close the perfcnt BO
  perf tests: Fix out of bounds memory access
  erofs: zero out when listxattr is called with no xattr
  cpuidle: use first valid target residency as poll time
  cpuidle: teo: Fix "early hits" handling for disabled idle states
  cpuidle: teo: Consider hits and misses metrics of disabled states
  cpuidle: teo: Rename local variable in teo_select()
  cpuidle: teo: Ignore disabled idle states that are too deep
  cpuidle: Do not unset the driver if it is there already
  media: cec.h: CEC_OP_REC_FLAG_ values were swapped
  media: radio: wl1273: fix interrupt masking on release
  media: bdisp: fix memleak on release
  media: vimc: sen: remove unused kthread_sen field
  media: hantro: Fix picture order count table enable
  media: hantro: Fix motion vectors usage condition
  media: hantro: Fix s_fmt for dynamic resolution changes
  s390/mm: properly clear _PAGE_NOEXEC bit when it is not supported
  ar5523: check NULL before memcpy() in ar5523_cmd()
  wil6210: check len before memcpy() calls
  cgroup: pids: use atomic64_t for pids->limit
  blk-mq: avoid sysfs buffer overflow with too many CPU cores
  md: improve handling of bio with REQ_PREFLUSH in md_flush_request()
  ASoC: fsl_audmix: Add spin lock to protect tdms
  ASoC: Jack: Fix NULL pointer dereference in snd_soc_jack_report
  ASoC: rt5645: Fixed typo for buddy jack support.
  ASoC: rt5645: Fixed buddy jack support.
  workqueue: Fix pwq ref leak in rescuer_thread()
  workqueue: Fix spurious sanity check failures in destroy_workqueue()
  dm zoned: reduce overhead of backing device checks
  dm writecache: handle REQ_FUA
  hwrng: omap - Fix RNG wait loop timeout
  ovl: relax WARN_ON() on rename to self
  ovl: fix corner case of non-unique st_dev;st_ino
  ovl: fix lookup failure on multi lower squashfs
  lib: raid6: fix awk build warnings
  rtlwifi: rtl8192de: Fix missing enable interrupt flag
  rtlwifi: rtl8192de: Fix missing callback that tests for hw release of buffer
  rtlwifi: rtl8192de: Fix missing code to retrieve RX buffer address
  btrfs: record all roots for rename exchange on a subvol
  Btrfs: send, skip backreference walking for extents with many references
  btrfs: Remove btrfs_bio::flags member
  btrfs: Avoid getting stuck during cyclic writebacks
  Btrfs: fix negative subv_writers counter and data space leak after buffered write
  Btrfs: fix metadata space leak on fixup worker failure to set range as delalloc
  btrfs: use refcount_inc_not_zero in kill_all_nodes
  btrfs: use btrfs_block_group_cache_done in update_block_group
  btrfs: check page->mapping when loading free space cache
  iwlwifi: pcie: fix support for transmitting SKBs with fraglist
  usb: typec: fix use after free in typec_register_port()
  phy: renesas: rcar-gen3-usb2: Fix sysfs interface of "role"
  usb: dwc3: ep0: Clear started flag on completion
  usb: dwc3: gadget: Clear started flag for non-IOC
  usb: dwc3: gadget: Fix logical condition
  usb: dwc3: pci: add ID for the Intel Comet Lake -H variant
  virtio-balloon: fix managed page counts when migrating pages between zones
  virt_wifi: fix use-after-free in virt_wifi_newlink()
  mtd: rawnand: Change calculating of position page containing BBM
  mtd: spear_smi: Fix Write Burst mode
  brcmfmac: disable PCIe interrupts before bus reset
  EDAC/altera: Use fast register IO for S10 IRQs
  tpm: Switch to platform_get_irq_optional()
  tpm: add check after commands attribs tab allocation
  usb: mon: Fix a deadlock in usbmon between mmap and read
  usb: core: urb: fix URB structure initialization function
  USB: adutux: fix interface sanity check
  usb: roles: fix a potential use after free
  USB: serial: io_edgeport: fix epic endpoint lookup
  USB: idmouse: fix interface sanity checks
  USB: atm: ueagle-atm: add missing endpoint check
  iio: adc: ad7124: Enable internal reference
  iio: adc: ad7606: fix reading unnecessary data from device
  iio: imu: inv_mpu6050: fix temperature reporting using bad unit
  iio: humidity: hdc100x: fix IIO_HUMIDITYRELATIVE channel reporting
  iio: adis16480: Fix scales factors
  iio: imu: st_lsm6dsx: fix ODR check in st_lsm6dsx_write_raw
  iio: adis16480: Add debugfs_reg_access entry
  ARM: dts: pandora-common: define wl1251 as child node of mmc3
  usb: common: usb-conn-gpio: Don't log an error on probe deferral
  interconnect: qcom: qcs404: Walk the list safely on node removal
  interconnect: qcom: sdm845: Walk the list safely on node removal
  xhci: make sure interrupts are restored to correct state
  xhci: handle some XHCI_TRUST_TX_LENGTH quirks cases as default behaviour.
  xhci: Increase STS_HALT timeout in xhci_suspend()
  xhci: fix USB3 device initiated resume race with roothub autosuspend
  xhci: Fix memory leak in xhci_add_in_port()
  usb: xhci: only set D3hot for pci device
  staging: gigaset: add endpoint-type sanity check
  staging: gigaset: fix illegal free on probe errors
  staging: gigaset: fix general protection fault on probe
  staging: vchiq: call unregister_chrdev_region() when driver registration fails
  staging: rtl8712: fix interface sanity check
  staging: rtl8188eu: fix interface sanity check
  staging: exfat: fix multiple definition error of `rename_file'
  binder: fix incorrect calculation for num_valid
  usb: host: xhci-tegra: Correct phy enable sequence
  usb: Allow USB device to be warm reset in suspended state
  USB: documentation: flags on usb-storage versus UAS
  USB: uas: heed CAPACITY_HEURISTICS
  USB: uas: honor flag to avoid CAPACITY16
  media: venus: remove invalid compat_ioctl32 handler
  ceph: fix compat_ioctl for ceph_dir_operations
  compat_ioctl: add compat_ptr_ioctl()
  scsi: qla2xxx: Fix memory leak when sending I/O fails
  scsi: qla2xxx: Fix double scsi_done for abort path
  scsi: qla2xxx: Fix driver unload hang
  scsi: qla2xxx: Do command completion on abort timeout
  scsi: zfcp: trace channel log even for FCP command responses
  scsi: lpfc: Fix bad ndlp ptr in xri aborted handling
  Revert "nvme: Add quirk for Kingston NVME SSD running FW E8FK11.T"
  nvme: Namepace identification descriptor list is optional
  usb: gadget: pch_udc: fix use after free
  usb: gadget: configfs: Fix missing spin_lock_init()
  BACKPORT: FROMLIST: scsi: ufs: Export query request interfaces
  ANDROID: update abi with unbindable_ports sysctl
  BACKPORT: FROMLIST: net: introduce ip_local_unbindable_ports sysctl
  ANDROID: update abi for 5.4.3 merge
  ANDROID: update abi_gki_aarch64.xml for ion, drm changes
  ANDROID: drivers: gpu: drm: export drm_mode_convert_umode symbol
  ANDROID: ion: flush cache before exporting non-cached buffers
  Linux 5.4.3
  kselftest: Fix NULL INSTALL_PATH for TARGETS runlist
  perf script: Fix invalid LBR/binary mismatch error
  EDAC/ghes: Fix locking and memory barrier issues
  watchdog: aspeed: Fix clock behaviour for ast2600
  drm/mcde: Fix an error handling path in 'mcde_probe()'
  md/raid0: Fix an error message in raid0_make_request()
  cpufreq: imx-cpufreq-dt: Correct i.MX8MN's default speed grade value
  ALSA: hda - Fix pending unsol events at shutdown
  KVM: x86: fix out-of-bounds write in KVM_GET_EMULATED_CPUID (CVE-2019-19332)
  binder: Handle start==NULL in binder_update_page_range()
  binder: Prevent repeated use of ->mmap() via NULL mapping
  binder: Fix race between mmap() and binder_alloc_print_pages()
  Revert "serial/8250: Add support for NI-Serial PXI/PXIe+485 devices"
  vcs: prevent write access to vcsu devices
  thermal: Fix deadlock in thermal thermal_zone_device_check
  iomap: Fix pipe page leakage during splicing
  bdev: Refresh bdev size for disks without partitioning
  bdev: Factor out bdev revalidation into a common helper
  rfkill: allocate static minor
  RDMA/qib: Validate ->show()/store() callbacks before calling them
  can: ucan: fix non-atomic allocation in completion handler
  spi: Fix NULL pointer when setting SPI_CS_HIGH for GPIO CS
  spi: Fix SPI_CS_HIGH setting when using native and GPIO CS
  spi: atmel: Fix CS high support
  spi: stm32-qspi: Fix kernel oops when unbinding driver
  spi: spi-fsl-qspi: Clear TDH bits in FLSHCR register
  crypto: user - fix memory leak in crypto_reportstat
  crypto: user - fix memory leak in crypto_report
  crypto: ecdh - fix big endian bug in ECC library
  crypto: ccp - fix uninitialized list head
  crypto: geode-aes - switch to skcipher for cbc(aes) fallback
  crypto: af_alg - cast ki_complete ternary op to int
  crypto: atmel-aes - Fix IV handling when req->nbytes < ivsize
  crypto: crypto4xx - fix double-free in crypto4xx_destroy_sdr
  KVM: x86: Grab KVM's srcu lock when setting nested state
  KVM: x86: Remove a spurious export of a static function
  KVM: x86: fix presentation of TSX feature in ARCH_CAPABILITIES
  KVM: x86: do not modify masked bits of shared MSRs
  KVM: arm/arm64: vgic: Don't rely on the wrong pending table
  KVM: nVMX: Always write vmcs02.GUEST_CR3 during nested VM-Enter
  KVM: PPC: Book3S HV: XIVE: Set kvm->arch.xive when VPs are allocated
  KVM: PPC: Book3S HV: XIVE: Fix potential page leak on error path
  KVM: PPC: Book3S HV: XIVE: Free previous EQ page when setting up a new one
  arm64: dts: exynos: Revert "Remove unneeded address space mapping for soc node"
  arm64: Validate tagged addresses in access_ok() called from kernel threads
  drm/i810: Prevent underflow in ioctl
  drm: damage_helper: Fix race checking plane->state->fb
  drm/msm: fix memleak on release
  jbd2: Fix possible overflow in jbd2_log_space_left()
  kernfs: fix ino wrap-around detection
  nfsd: restore NFSv3 ACL support
  nfsd: Ensure CLONE persists data and metadata changes to the target file
  can: slcan: Fix use-after-free Read in slcan_open
  tty: vt: keyboard: reject invalid keycodes
  CIFS: Fix SMB2 oplock break processing
  CIFS: Fix NULL-pointer dereference in smb2_push_mandatory_locks
  x86/PCI: Avoid AMD FCH XHCI USB PME# from D0 defect
  x86/mm/32: Sync only to VMALLOC_END in vmalloc_sync_all()
  media: rc: mark input device as pointing stick
  Input: Fix memory leak in psxpad_spi_probe
  coresight: etm4x: Fix input validation for sysfs.
  Input: goodix - add upside-down quirk for Teclast X89 tablet
  Input: synaptics-rmi4 - don't increment rmiaddr for SMBus transfers
  Input: synaptics-rmi4 - re-enable IRQs in f34v7_do_reflash
  Input: synaptics - switch another X1 Carbon 6 to RMI/SMbus
  soc: mediatek: cmdq: fixup wrong input order of write api
  ALSA: hda: Modify stream stripe mask only when needed
  ALSA: hda - Add mute led support for HP ProBook 645 G4
  ALSA: pcm: oss: Avoid potential buffer overflows
  ALSA: hda/realtek - Fix inverted bass GPIO pin on Acer 8951G
  ALSA: hda/realtek - Dell headphone has noise on unmute for ALC236
  ALSA: hda/realtek - Enable the headset-mic on a Xiaomi's laptop
  ALSA: hda/realtek - Enable internal speaker of ASUS UX431FLC
  SUNRPC: Avoid RPC delays when exiting suspend
  io_uring: ensure req->submit is copied when req is deferred
  io_uring: fix missing kmap() declaration on powerpc
  fuse: verify attributes
  fuse: verify write return
  fuse: verify nlink
  fuse: fix leak of fuse_io_priv
  io_uring: transform send/recvmsg() -ERESTARTSYS to -EINTR
  io_uring: fix dead-hung for non-iter fixed rw
  mwifiex: Re-work support for SDIO HW reset
  serial: ifx6x60: add missed pm_runtime_disable
  serial: 8250_dw: Avoid double error messaging when IRQ absent
  serial: stm32: fix clearing interrupt error flags
  serial: serial_core: Perform NULL checks for break_ctl ops
  serial: pl011: Fix DMA ->flush_buffer()
  tty: serial: msm_serial: Fix flow control
  tty: serial: fsl_lpuart: use the sg count from dma_map_sg
  serial: 8250-mtk: Use platform_get_irq_optional() for optional irq
  usb: gadget: u_serial: add missing port entry locking
  staging/octeon: Use stubs for MIPS && !CAVIUM_OCTEON_SOC
  mailbox: tegra: Fix superfluous IRQ error message
  time: Zero the upper 32-bits in __kernel_timespec on 32-bit
  lp: fix sparc64 LPSETTIMEOUT ioctl
  sparc64: implement ioremap_uc
  perf scripts python: exported-sql-viewer.py: Fix use of TRUE with SQLite
  arm64: tegra: Fix 'active-low' warning for Jetson Xavier regulator
  arm64: tegra: Fix 'active-low' warning for Jetson TX1 regulator
  rsi: release skb if rsi_prepare_beacon fails
  FROMLIST: scsi: ufs: Fix ufshcd_hold() caused scheduling while atomic
  FROMLIST: scsi: ufs: Add dev ref clock gating wait time support
  FROMLIST: scsi: ufs-qcom: Adjust bus bandwidth voting and unvoting
  FROMLIST: scsi: ufs: Remove the check before call setup clock notify vops
  FROMLIST: scsi: ufs: set load before setting voltage in regulators
  FROMLIST: scsi: ufs: Flush exception event before suspend
  FROMLIST: scsi: ufs: Do not rely on prefetched data
  FROMLIST: scsi: ufs: Fix up clock scaling
  FROMGIT: scsi: ufs: Do not free irq in suspend
  FROMGIT: scsi: ufs: Do not clear the DL layer timers
  FROMGIT: scsi: ufs: Release clock if DMA map fails
  FROMGIT: scsi: ufs: Use DBD setting in mode sense
  FROMGIT: scsi: core: Adjust DBD setting in MODE SENSE for caching mode page per LLD
  FROMGIT: scsi: ufs: Complete pending requests in host reset and restore path
  FROMGIT: scsi: ufs: Avoid messing up the compl_time_stamp of lrbs
  FROMGIT: scsi: ufs: Update VCCQ2 and VCCQ min/max voltage hard codes
  FROMGIT: scsi: ufs: Recheck bkops level if bkops is disabled
  ANDROID: update abi_gki_aarch64.xml for LTO, CFI, and SCS
  ANDROID: gki_defconfig: enable LTO, CFI, and SCS
  ANDROID: update abi_gki_aarch64.xml for CONFIG_GNSS
  ANDROID: cuttlefish_defconfig: Enable CONFIG_GNSS
  ANDROID: gki_defconfig: enable HID configs
  UPSTREAM: arm64: Validate tagged addresses in access_ok() called from kernel threads
  ANDROID: kbuild: limit LTO inlining
  ANDROID: kbuild: merge module sections with LTO
  ANDROID: f2fs: fix possible merge of unencrypted with encrypted I/O
  ANDROID: gki_defconfig: Enable UCLAMP by default
  ANDROID: make sure proc mount options are applied
  ANDROID: sound: usb: Add helper APIs to enable audio stream
  ANDROID: Update ABI representation
  ANDROID: Don't base allmodconfig on gki_defconfig
  ANDROID: Disable UNWINDER_ORC for allmodconfig
  ANDROID: ASoC: Fix 'allmodconfig' build break
  Linux 5.4.2
  platform/x86: hp-wmi: Fix ACPI errors caused by passing 0 as input size
  platform/x86: hp-wmi: Fix ACPI errors caused by too small buffer
  HID: core: check whether Usage Page item is after Usage ID items
  crypto: talitos - Fix build error by selecting LIB_DES
  Revert "jffs2: Fix possible null-pointer dereferences in jffs2_add_frag_to_fragtree()"
  ext4: add more paranoia checking in ext4_expand_extra_isize handling
  r8169: fix resume on cable plug-in
  r8169: fix jumbo configuration for RTL8168evl
  selftests: pmtu: use -oneline for ip route list cache
  tipc: fix link name length check
  selftests: bpf: correct perror strings
  selftests: bpf: test_sockmap: handle file creation failures gracefully
  net/tls: use sg_next() to walk sg entries
  net/tls: remove the dead inplace_crypto code
  selftests/tls: add a test for fragmented messages
  net: skmsg: fix TLS 1.3 crash with full sk_msg
  net/tls: free the record on encryption error
  net/tls: take into account that bpf_exec_tx_verdict() may free the record
  openvswitch: remove another BUG_ON()
  openvswitch: drop unneeded BUG_ON() in ovs_flow_cmd_build_info()
  sctp: cache netns in sctp_ep_common
  slip: Fix use-after-free Read in slip_open
  sctp: Fix memory leak in sctp_sf_do_5_2_4_dupcook
  openvswitch: fix flow command message size
  net: sched: fix `tc -s class show` no bstats on class with nolock subqueues
  net: psample: fix skb_over_panic
  net: macb: add missed tasklet_kill
  net: dsa: sja1105: fix sja1105_parse_rgmii_delays()
  mdio_bus: don't use managed reset-controller
  macvlan: schedule bc_work even if error
  gve: Fix the queue page list allocated pages count
  x86/fpu: Don't cache access to fpu_fpregs_owner_ctx
  thunderbolt: Power cycle the router if NVM authentication fails
  mei: me: add comet point V device id
  mei: bus: prefix device names on bus with the bus name
  USB: serial: ftdi_sio: add device IDs for U-Blox C099-F9P
  staging: rtl8723bs: Add 024c:0525 to the list of SDIO device-ids
  staging: rtl8723bs: Drop ACPI device ids
  staging: rtl8192e: fix potential use after free
  staging: wilc1000: fix illegal memory access in wilc_parse_join_bss_param()
  usb: dwc2: use a longer core rest timeout in dwc2_core_reset()
  driver core: platform: use the correct callback type for bus_find_device
  crypto: inside-secure - Fix stability issue with Macchiatobin
  net: disallow ancillary data for __sys_{send,recv}msg_file()
  net: separate out the msghdr copy from ___sys_{send,recv}msg()
  io_uring: async workers should inherit the user creds
  ANDROID: Update ABI representation
  UPSTREAM: of: property: Add device link support for interrupt-parent, dmas and -gpio(s)
  UPSTREAM: of: property: Fix the semantics of of_is_ancestor_of()
  UPSTREAM: i2c: of: Populate fwnode in of_i2c_get_board_info()
  UPSTREAM: regulator: core: Don't try to remove device links if add failed
  UPSTREAM: driver core: Clarify documentation for fwnode_operations.add_links()
  ANDROID: Update ABI representation
  ANDROID: gki_defconfig: IIO=y
  ANDROID: Update ABI representation
  ANDROID: ASoC: core - add hostless DAI support
  ANDROID: gki_defconfig: =m's applied for virtio configs in arm64
  ANDROID: Update ABI representation after 5.4.1 merge
  Linux 5.4.1
  KVM: PPC: Book3S HV: Flush link stack on guest exit to host kernel
  powerpc/book3s64: Fix link stack flush on context switch
  staging: comedi: usbduxfast: usbduxfast_ai_cmdtest rounding error
  USB: serial: option: add support for Foxconn T77W968 LTE modules
  USB: serial: option: add support for DW5821e with eSIM support
  USB: serial: mos7840: fix remote wakeup
  USB: serial: mos7720: fix remote wakeup
  USB: serial: mos7840: add USB ID to support Moxa UPort 2210
  appledisplay: fix error handling in the scheduled work
  USB: chaoskey: fix error case of a timeout
  usb-serial: cp201x: support Mark-10 digital force gauge
  usbip: Fix uninitialized symbol 'nents' in stub_recv_cmd_submit()
  usbip: tools: fix fd leakage in the function of read_attr_usbip_status
  USBIP: add config dependency for SGL_ALLOC
  ALSA: hda - Disable audio component for legacy Nvidia HDMI codecs
  media: mceusb: fix out of bounds read in MCE receiver buffer
  media: imon: invalid dereference in imon_touch_event
  media: cxusb: detect cxusb_ctrl_msg error in query
  media: b2c2-flexcop-usb: add sanity checking
  media: uvcvideo: Fix error path in control parsing failure
  futex: Prevent exit livelock
  futex: Provide distinct return value when owner is exiting
  futex: Add mutex around futex exit
  futex: Provide state handling for exec() as well
  futex: Sanitize exit state handling
  futex: Mark the begin of futex exit explicitly
  futex: Set task::futex_state to DEAD right after handling futex exit
  futex: Split futex_mm_release() for exit/exec
  exit/exec: Seperate mm_release()
  futex: Replace PF_EXITPIDONE with a state
  futex: Move futex exit handling into futex code
  cpufreq: Add NULL checks to show() and store() methods of cpufreq
  media: usbvision: Fix races among open, close, and disconnect
  media: usbvision: Fix invalid accesses after device disconnect
  media: vivid: Fix wrong locking that causes race conditions on streaming stop
  media: vivid: Set vid_cap_streaming and vid_out_streaming to true
  ALSA: usb-audio: Fix Scarlett 6i6 Gen 2 port data
  ALSA: usb-audio: Fix NULL dereference at parsing BADD
  futex: Prevent robust futex exit race
  x86/entry/32: Fix FIXUP_ESPFIX_STACK with user CR3
  x86/pti/32: Calculate the various PTI cpu_entry_area sizes correctly, make the CPU_ENTRY_AREA_PAGES assert precise
  selftests/x86/sigreturn/32: Invalidate DS and ES when abusing the kernel
  selftests/x86/mov_ss_trap: Fix the SYSENTER test
  x86/entry/32: Fix NMI vs ESPFIX
  x86/entry/32: Unwind the ESPFIX stack earlier on exception entry
  x86/entry/32: Move FIXUP_FRAME after pushing %fs in SAVE_ALL
  x86/entry/32: Use %ss segment where required
  x86/entry/32: Fix IRET exception
  x86/cpu_entry_area: Add guard page for entry stack on 32bit
  x86/pti/32: Size initial_page_table correctly
  x86/doublefault/32: Fix stack canaries in the double fault handler
  x86/xen/32: Simplify ring check in xen_iret_crit_fixup()
  x86/xen/32: Make xen_iret_crit_fixup() independent of frame layout
  x86/stackframe/32: Repair 32-bit Xen PV
  nbd: prevent memory leak
  x86/speculation: Fix redundant MDS mitigation message
  x86/speculation: Fix incorrect MDS/TAA mitigation status
  x86/insn: Fix awk regexp warnings
  md/raid10: prevent access of uninitialized resync_pages offset
  Revert "dm crypt: use WQ_HIGHPRI for the IO and crypt workqueues"
  Revert "Bluetooth: hci_ll: set operational frequency earlier"
  ath10k: restore QCA9880-AR1A (v1) detection
  ath10k: Fix HOST capability QMI incompatibility
  ath10k: Fix a NULL-ptr-deref bug in ath10k_usb_alloc_urb_from_pipe
  ath9k_hw: fix uninitialized variable data
  Bluetooth: Fix invalid-free in bcsp_close()
  ANDROID: gki_defconfig: enable CONFIG_REGULATOR_FIXED_VOLTAGE
  FROMLIST: crypto: arm64/sha: fix function types
  ANDROID: arm64: kvm: disable CFI
  ANDROID: arm64: add __nocfi to __apply_alternatives
  ANDROID: arm64: add __pa_function
  ANDROID: arm64: add __nocfi to functions that jump to a physical address
  ANDROID: arm64: bpf: implement arch_bpf_jit_check_func
  ANDROID: bpf: validate bpf_func when BPF_JIT is enabled with CFI
  ANDROID: add support for Clang's Control Flow Integrity (CFI)
  ANDROID: arm64: allow LTO_CLANG and THINLTO to be selected
  FROMLIST: arm64: fix alternatives with LLVM's integrated assembler
  FROMLIST: arm64: lse: fix LSE atomics with LLVM's integrated assembler
  ANDROID: arm64: disable HAVE_ARCH_PREL32_RELOCATIONS with LTO_CLANG
  ANDROID: arm64: vdso: disable LTO
  ANDROID: irqchip/gic-v3: rename gic_of_init to work around a ThinLTO+CFI bug
  ANDROID: soc/tegra: disable ARCH_TEGRA_210_SOC with LTO
  ANDROID: init: ensure initcall ordering with LTO
  ANDROID: drivers/misc/lkdtm: disable LTO for rodata.o
  ANDROID: efi/libstub: disable LTO
  ANDROID: scripts/mod: disable LTO for empty.c
  ANDROID: kbuild: fix dynamic ftrace with clang LTO
  ANDROID: kbuild: add support for Clang LTO
  ANDROID: kbuild: add CONFIG_LD_IS_LLD
  FROMGIT: driver core: platform: use the correct callback type for bus_find_device
  FROMLIST: arm64: implement Shadow Call Stack
  FROMLIST: arm64: disable SCS for hypervisor code
  FROMLIST: arm64: vdso: disable Shadow Call Stack
  FROMLIST: arm64: efi: restore x18 if it was corrupted
  FROMLIST: arm64: preserve x18 when CPU is suspended
  FROMLIST: arm64: reserve x18 from general allocation with SCS
  FROMLIST: arm64: disable function graph tracing with SCS
  FROMLIST: scs: add support for stack usage debugging
  FROMLIST: scs: add accounting
  FROMLIST: add support for Clang's Shadow Call Stack (SCS)
  FROMLIST: arm64: kernel: avoid x18 in __cpu_soft_restart
  FROMLIST: arm64: kvm: stop treating register x18 as caller save
  FROMLIST: arm64/lib: copy_page: avoid x18 register in assembler code
  FROMLIST: arm64: mm: avoid x18 in idmap_kpti_install_ng_mappings
  ANDROID: clang: update to 10.0.1
  ANDROID: update ABI representation

Conflicts:
	Documentation/devicetree/bindings
	Documentation/devicetree/bindings/net/wireless/qcom,ath10k.txt
	arch/arm64/Kconfig
	drivers/firmware/qcom_scm-64.c
	drivers/hwtracing/coresight/coresight.c
	drivers/scsi/ufs/ufs.h
	drivers/scsi/ufs/ufshcd.c
	drivers/scsi/ufs/ufshcd.h
	drivers/scsi/ufs/unipro.h
	drivers/staging/android/ion/heaps/ion_cma_heap.c
	drivers/staging/android/ion/heaps/ion_system_heap.c
	drivers/usb/dwc3/ep0.c
	drivers/usb/dwc3/gadget.c
	include/sound/pcm.h
	include/sound/soc.h
	kernel/exit.c
	kernel/sched/core.c

Change-Id: I66ea973ddcafd352ba999a1dc98e04df33397e3b
Signed-off-by: Blagovest Kolenichev <bkolenichev@codeaurora.org>
2020-01-23 04:00:53 -08:00
Satya Durga Srinivasu Prabhala
64b577b9cc sched: Add snapshot of Window Assisted Load Tracking (WALT)
This snapshot is taken from msm-4.19 as of commit 5debecbe7195
("trace: filter out spurious preemption and IRQs disable traces").

Change-Id: I8fab4084971baadcaa037f40ab549fc073a4b1ea
Signed-off-by: Satya Durga Srinivasu Prabhala <satyap@codeaurora.org>
2019-12-10 12:39:14 -08:00
Greg Kroah-Hartman
c32aefc014 Merge 5.4.1 into android-5.4
Changes in 5.4.1
	Bluetooth: Fix invalid-free in bcsp_close()
	ath9k_hw: fix uninitialized variable data
	ath10k: Fix a NULL-ptr-deref bug in ath10k_usb_alloc_urb_from_pipe
	ath10k: Fix HOST capability QMI incompatibility
	ath10k: restore QCA9880-AR1A (v1) detection
	Revert "Bluetooth: hci_ll: set operational frequency earlier"
	Revert "dm crypt: use WQ_HIGHPRI for the IO and crypt workqueues"
	md/raid10: prevent access of uninitialized resync_pages offset
	x86/insn: Fix awk regexp warnings
	x86/speculation: Fix incorrect MDS/TAA mitigation status
	x86/speculation: Fix redundant MDS mitigation message
	nbd: prevent memory leak
	x86/stackframe/32: Repair 32-bit Xen PV
	x86/xen/32: Make xen_iret_crit_fixup() independent of frame layout
	x86/xen/32: Simplify ring check in xen_iret_crit_fixup()
	x86/doublefault/32: Fix stack canaries in the double fault handler
	x86/pti/32: Size initial_page_table correctly
	x86/cpu_entry_area: Add guard page for entry stack on 32bit
	x86/entry/32: Fix IRET exception
	x86/entry/32: Use %ss segment where required
	x86/entry/32: Move FIXUP_FRAME after pushing %fs in SAVE_ALL
	x86/entry/32: Unwind the ESPFIX stack earlier on exception entry
	x86/entry/32: Fix NMI vs ESPFIX
	selftests/x86/mov_ss_trap: Fix the SYSENTER test
	selftests/x86/sigreturn/32: Invalidate DS and ES when abusing the kernel
	x86/pti/32: Calculate the various PTI cpu_entry_area sizes correctly, make the CPU_ENTRY_AREA_PAGES assert precise
	x86/entry/32: Fix FIXUP_ESPFIX_STACK with user CR3
	futex: Prevent robust futex exit race
	ALSA: usb-audio: Fix NULL dereference at parsing BADD
	ALSA: usb-audio: Fix Scarlett 6i6 Gen 2 port data
	media: vivid: Set vid_cap_streaming and vid_out_streaming to true
	media: vivid: Fix wrong locking that causes race conditions on streaming stop
	media: usbvision: Fix invalid accesses after device disconnect
	media: usbvision: Fix races among open, close, and disconnect
	cpufreq: Add NULL checks to show() and store() methods of cpufreq
	futex: Move futex exit handling into futex code
	futex: Replace PF_EXITPIDONE with a state
	exit/exec: Seperate mm_release()
	futex: Split futex_mm_release() for exit/exec
	futex: Set task::futex_state to DEAD right after handling futex exit
	futex: Mark the begin of futex exit explicitly
	futex: Sanitize exit state handling
	futex: Provide state handling for exec() as well
	futex: Add mutex around futex exit
	futex: Provide distinct return value when owner is exiting
	futex: Prevent exit livelock
	media: uvcvideo: Fix error path in control parsing failure
	media: b2c2-flexcop-usb: add sanity checking
	media: cxusb: detect cxusb_ctrl_msg error in query
	media: imon: invalid dereference in imon_touch_event
	media: mceusb: fix out of bounds read in MCE receiver buffer
	ALSA: hda - Disable audio component for legacy Nvidia HDMI codecs
	USBIP: add config dependency for SGL_ALLOC
	usbip: tools: fix fd leakage in the function of read_attr_usbip_status
	usbip: Fix uninitialized symbol 'nents' in stub_recv_cmd_submit()
	usb-serial: cp201x: support Mark-10 digital force gauge
	USB: chaoskey: fix error case of a timeout
	appledisplay: fix error handling in the scheduled work
	USB: serial: mos7840: add USB ID to support Moxa UPort 2210
	USB: serial: mos7720: fix remote wakeup
	USB: serial: mos7840: fix remote wakeup
	USB: serial: option: add support for DW5821e with eSIM support
	USB: serial: option: add support for Foxconn T77W968 LTE modules
	staging: comedi: usbduxfast: usbduxfast_ai_cmdtest rounding error
	powerpc/book3s64: Fix link stack flush on context switch
	KVM: PPC: Book3S HV: Flush link stack on guest exit to host kernel
	Linux 5.4.1

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Id50109953b5638956d150e4fc648a94b6e347fb5
2019-11-29 10:56:00 +01:00
Thomas Gleixner
1bcee23370 futex: Split futex_mm_release() for exit/exec
commit 150d71584b upstream.

To allow separate handling of the futex exit state in the futex exit code
for exit and exec, split futex_mm_release() into two functions and invoke
them from the corresponding exit/exec_mm_release() callsites.

Preparatory only, no functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20191106224556.332094221@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-11-29 10:10:10 +01:00
Thomas Gleixner
7d7e93588f exit/exec: Seperate mm_release()
commit 4610ba7ad8 upstream.

mm_release() contains the futex exit handling. mm_release() is called from
do_exit()->exit_mm() and from exec()->exec_mm().

In the exit_mm() case PF_EXITING and the futex state is updated. In the
exec_mm() case these states are not touched.

As the futex exit code needs further protections against exit races, this
needs to be split into two functions.

Preparatory only, no functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20191106224556.240518241@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-11-29 10:10:10 +01:00
Thomas Gleixner
8012f98f92 futex: Move futex exit handling into futex code
commit ba31c1a485 upstream.

The futex exit handling is #ifdeffed into mm_release() which is not pretty
to begin with. But upcoming changes to address futex exit races need to add
more functionality to this exit code.

Split it out into a function, move it into futex code and make the various
futex exit functions static.

Preparatory only and no functional change.

Folded build fix from Borislav.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20191106224556.049705556@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-11-29 10:10:08 +01:00
Sami Tolvanen
ff9de73a0a FROMLIST: add support for Clang's Shadow Call Stack (SCS)
This change adds generic support for Clang's Shadow Call Stack,
which uses a shadow stack to protect return addresses from being
overwritten by an attacker. Details are available here:

  https://clang.llvm.org/docs/ShadowCallStack.html

Note that security guarantees in the kernel differ from the
ones documented for user space. The kernel must store addresses
of shadow stacks used by other tasks and interrupt handlers in
memory, which means an attacker capable reading and writing
arbitrary memory may be able to locate them and hijack control
flow by modifying shadow stacks that are not currently in use.

Bug: 145210207
Change-Id: I2a8ba6a3decac50c169731c3121c9dcab96621d2
(am from https://lore.kernel.org/patchwork/patch/1149054/)
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
2019-11-27 12:49:09 -08:00
Greg Kroah-Hartman
54e301676a Merge 5.4 into android-mainline
Linux 5.4

Here we go!

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Iee409f3bbf65b93fa419e5b09818efb8e56569fd
2019-11-25 09:57:21 +01:00
Luc Van Oostenryck
9e77716a75 fork: fix pidfd_poll()'s return type
pidfd_poll() is defined as returning 'unsigned int' but the
.poll method is declared as returning '__poll_t', a bitwise type.

Fix this by using the proper return type and using the EPOLL
constants instead of the POLL ones, as required for __poll_t.

Fixes: b53b0b9d9a ("pidfd: add polling support")
Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: stable@vger.kernel.org # 5.3
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Reviewed-by: Christian Brauner <christian.brauner@ubuntu.com>
Link: https://lore.kernel.org/r/20191120003320.31138-1-luc.vanoostenryck@gmail.com
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2019-11-20 11:48:50 +01:00
Greg Kroah-Hartman
682d8bf784 Merge tag 'v5.4-rc7' into android-mainline
Linux 5.4-rc7

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I505207a0a6f68ccc3519d7f190d8faf25d9d479a
2019-11-11 06:10:55 +01:00
Christian Brauner
fa729c4df5 clone3: validate stack arguments
Validate the stack arguments and setup the stack depening on whether or not
it is growing down or up.

Legacy clone() required userspace to know in which direction the stack is
growing and pass down the stack pointer appropriately. To make things more
confusing microblaze uses a variant of the clone() syscall selected by
CONFIG_CLONE_BACKWARDS3 that takes an additional stack_size argument.
IA64 has a separate clone2() syscall which also takes an additional
stack_size argument. Finally, parisc has a stack that is growing upwards.
Userspace therefore has a lot nasty code like the following:

 #define __STACK_SIZE (8 * 1024 * 1024)
 pid_t sys_clone(int (*fn)(void *), void *arg, int flags, int *pidfd)
 {
         pid_t ret;
         void *stack;

         stack = malloc(__STACK_SIZE);
         if (!stack)
                 return -ENOMEM;

 #ifdef __ia64__
         ret = __clone2(fn, stack, __STACK_SIZE, flags | SIGCHLD, arg, pidfd);
 #elif defined(__parisc__) /* stack grows up */
         ret = clone(fn, stack, flags | SIGCHLD, arg, pidfd);
 #else
         ret = clone(fn, stack + __STACK_SIZE, flags | SIGCHLD, arg, pidfd);
 #endif
         return ret;
 }

or even crazier variants such as [3].

With clone3() we have the ability to validate the stack. We can check that
when stack_size is passed, the stack pointer is valid and the other way
around. We can also check that the memory area userspace gave us is fine to
use via access_ok(). Furthermore, we probably should not require
userspace to know in which direction the stack is growing. It is easy
for us to do this in the kernel and I couldn't find the original
reasoning behind exposing this detail to userspace.

/* Intentional user visible API change */
clone3() was released with 5.3. Currently, it is not documented and very
unclear to userspace how the stack and stack_size argument have to be
passed. After talking to glibc folks we concluded that trying to change
clone3() to setup the stack instead of requiring userspace to do this is
the right course of action.
Note, that this is an explicit change in user visible behavior we introduce
with this patch. If it breaks someone's use-case we will revert! (And then
e.g. place the new behavior under an appropriate flag.)
Breaking someone's use-case is very unlikely though. First, neither glibc
nor musl currently expose a wrapper for clone3(). Second, there is no real
motivation for anyone to use clone3() directly since it does not provide
features that legacy clone doesn't. New features for clone3() will first
happen in v5.5 which is why v5.4 is still a good time to try and make that
change now and backport it to v5.3. Searches on [4] did not reveal any
packages calling clone3().

[1]: https://lore.kernel.org/r/CAG48ez3q=BeNcuVTKBN79kJui4vC6nw0Bfq6xc-i0neheT17TA@mail.gmail.com
[2]: https://lore.kernel.org/r/20191028172143.4vnnjpdljfnexaq5@wittgenstein
[3]: 5238e95759/src/basic/raw-clone.h (L31)
[4]: https://codesearch.debian.net
Fixes: 7f192e3cd3 ("fork: add clone3")
Cc: Kees Cook <keescook@chromium.org>
Cc: Jann Horn <jannh@google.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-api@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: <stable@vger.kernel.org> # 5.3
Cc: GNU C Library <libc-alpha@sourceware.org>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Aleksa Sarai <cyphar@cyphar.com>
Link: https://lore.kernel.org/r/20191031113608.20713-1-christian.brauner@ubuntu.com
2019-11-05 15:50:14 +01:00
Greg Kroah-Hartman
630839ac24 Merge 5.4-rc3 into android-mainline
Linux 5.4-rc3

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ia87ba662738dd58ddb917e32c1fbd812861e7a46
2019-10-17 05:28:13 -07:00
Michal Hocko
b0f53dbc4b kernel/sysctl.c: do not override max_threads provided by userspace
Partially revert 16db3d3f11 ("kernel/sysctl.c: threads-max observe
limits") because the patch is causing a regression to any workload which
needs to override the auto-tuning of the limit provided by kernel.

set_max_threads is implementing a boot time guesstimate to provide a
sensible limit of the concurrently running threads so that runaways will
not deplete all the memory.  This is a good thing in general but there
are workloads which might need to increase this limit for an application
to run (reportedly WebSpher MQ is affected) and that is simply not
possible after the mentioned change.  It is also very dubious to
override an admin decision by an estimation that doesn't have any direct
relation to correctness of the kernel operation.

Fix this by dropping set_max_threads from sysctl_max_threads so any
value is accepted as long as it fits into MAX_THREADS which is important
to check because allowing more threads could break internal robust futex
restriction.  While at it, do not use MIN_THREADS as the lower boundary
because it is also only a heuristic for automatic estimation and admin
might have a good reason to stop new threads to be created even when
below this limit.

This became more severe when we switched x86 from 4k to 8k kernel
stacks.  Starting since 6538b8ea88 ("x86_64: expand kernel stack to
16K") (3.16) we use THREAD_SIZE_ORDER = 2 and that halved the auto-tuned
value.

In the particular case

  3.12
  kernel.threads-max = 515561

  4.4
  kernel.threads-max = 200000

Neither of the two values is really insane on 32GB machine.

I am not sure we want/need to tune the max_thread value further.  If
anything the tuning should be removed altogether if proven not useful in
general.  But we definitely need a way to override this auto-tuning.

Link: http://lkml.kernel.org/r/20190922065801.GB18814@dhcp22.suse.cz
Fixes: 16db3d3f11 ("kernel/sysctl.c: threads-max observe limits")
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Heinrich Schuchardt <xypron.glpk@gmx.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-10-07 15:47:19 -07:00
Greg Kroah-Hartman
8e9e0abf99 Merge 5.4-rc2 into android-mainline
Linux 5.4-rc2

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Idfe13500feef5c1095d06c419fa121f751daa459
2019-10-07 07:09:37 +02:00
Linus Torvalds
e524d16e7e Merge tag 'copy-struct-from-user-v5.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux
Pull copy_struct_from_user() helper from Christian Brauner:
 "This contains the copy_struct_from_user() helper which got split out
  from the openat2() patchset. It is a generic interface designed to
  copy a struct from userspace.

  The helper will be especially useful for structs versioned by size of
  which we have quite a few. This allows for backwards compatibility,
  i.e. an extended struct can be passed to an older kernel, or a legacy
  struct can be passed to a newer kernel. For the first case (extended
  struct, older kernel) the new fields in an extended struct can be set
  to zero and the struct safely passed to an older kernel.

  The most obvious benefit is that this helper lets us get rid of
  duplicate code present in at least sched_setattr(), perf_event_open(),
  and clone3(). More importantly it will also help to ensure that users
  implementing versioning-by-size end up with the same core semantics.

  This point is especially crucial since we have at least one case where
  versioning-by-size is used but with slighly different semantics:
  sched_setattr(), perf_event_open(), and clone3() all do do similar
  checks to copy_struct_from_user() while rt_sigprocmask(2) always
  rejects differently-sized struct arguments.

  With this pull request we also switch over sched_setattr(),
  perf_event_open(), and clone3() to use the new helper"

* tag 'copy-struct-from-user-v5.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
  usercopy: Add parentheses around assignment in test_copy_struct_from_user
  perf_event_open: switch to copy_struct_from_user()
  sched_setattr: switch to copy_struct_from_user()
  clone3: switch to copy_struct_from_user()
  lib: introduce copy_struct_from_user() helper
2019-10-04 10:36:31 -07:00
Christian Brauner
501bd0166e fork: add kernel-doc for clone3
Add kernel-doc for the clone3() syscall.

Link: https://lore.kernel.org/r/20191001114701.24661-2-christian.brauner@ubuntu.com
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2019-10-03 21:18:06 +02:00
Greg Kroah-Hartman
cb33d78781 Merge 5.4-rc1 into android-mainline
Linux 5.4-rc1

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I15eec52df70f829acf81ff614a1c2a5fb443a4e0
2019-10-02 19:10:07 +02:00
Greg Kroah-Hartman
94139142d9 Merge 5.4-rc1-prelrease into android-mainline
To make the 5.4-rc1 merge easier, merge at a prerelease point in time
before the final release happens.

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: If613d657fd0abf9910c5bf3435a745f01b89765e
2019-10-02 17:58:47 +02:00
Aleksa Sarai
f14c234b4b clone3: switch to copy_struct_from_user()
Switch clone3() syscall from it's own copying struct clone_args from
userspace to the new dedicated copy_struct_from_user() helper.

The change is very straightforward, and helps unify the syscall
interface for struct-from-userspace syscalls. Additionally, explicitly
define CLONE_ARGS_SIZE_VER0 to match the other users of the
struct-extension pattern.

Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Christian Brauner <christian.brauner@ubuntu.com>
[christian.brauner@ubuntu.com: improve commit message]
Link: https://lore.kernel.org/r/20191001011055.19283-3-cyphar@cyphar.com
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2019-10-01 15:45:10 +02:00
Linus Torvalds
9c5efe9ae7 Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:

 - Apply a number of membarrier related fixes and cleanups, which fixes
   a use-after-free race in the membarrier code

 - Introduce proper RCU protection for tasks on the runqueue - to get
   rid of the subtle task_rcu_dereference() interface that was easy to
   get wrong

 - Misc fixes, but also an EAS speedup

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/fair: Avoid redundant EAS calculation
  sched/core: Remove double update_max_interval() call on CPU startup
  sched/core: Fix preempt_schedule() interrupt return comment
  sched/fair: Fix -Wunused-but-set-variable warnings
  sched/core: Fix migration to invalid CPU in __set_cpus_allowed_ptr()
  sched/membarrier: Return -ENOMEM to userspace on memory allocation failure
  sched/membarrier: Skip IPIs when mm->mm_users == 1
  selftests, sched/membarrier: Add multi-threaded test
  sched/membarrier: Fix p->mm->membarrier_state racy load
  sched/membarrier: Call sync_core only before usermode for same mm
  sched/membarrier: Remove redundant check
  sched/membarrier: Fix private expedited registration check
  tasks, sched/core: RCUify the assignment of rq->curr
  tasks, sched/core: With a grace period after finish_task_switch(), remove unnecessary code
  tasks, sched/core: Ensure tasks are available for a grace period after leaving the runqueue
  tasks: Add a count of task RCU users
  sched/core: Convert vcpu_is_preempted() from macro to an inline function
  sched/fair: Remove unused cfs_rq_clock_task() function
2019-09-28 12:39:07 -07:00
Sai Praneeth Prakhya
8495f7e673 fork: improve error message for corrupted page tables
When a user process exits, the kernel cleans up the mm_struct of the user
process and during cleanup, check_mm() checks the page tables of the user
process for corruption (E.g: unexpected page flags set/cleared).  For
corrupted page tables, the error message printed by check_mm() isn't very
clear as it prints the loop index instead of page table type (E.g:
Resident file mapping pages vs Resident shared memory pages).  The loop
index in check_mm() is used to index rss_stat[] which represents
individual memory type stats.  Hence, instead of printing index, print
memory type, thereby improving error message.

Without patch:
--------------
[  204.836425] mm/pgtable-generic.c:29: bad p4d 0000000089eb4e92(800000025f941467)
[  204.836544] BUG: Bad rss-counter state mm:00000000f75895ea idx:0 val:2
[  204.836615] BUG: Bad rss-counter state mm:00000000f75895ea idx:1 val:5
[  204.836685] BUG: non-zero pgtables_bytes on freeing mm: 20480

With patch:
-----------
[   69.815453] mm/pgtable-generic.c:29: bad p4d 0000000084653642(800000025ca37467)
[   69.815872] BUG: Bad rss-counter state mm:00000000014a6c03 type:MM_FILEPAGES val:2
[   69.815962] BUG: Bad rss-counter state mm:00000000014a6c03 type:MM_ANONPAGES val:5
[   69.816050] BUG: non-zero pgtables_bytes on freeing mm: 20480

Also, change print function (from printk(KERN_ALERT, ..) to pr_alert()) so
that it matches the other print statement.

Link: http://lkml.kernel.org/r/da75b5153f617f4c5739c08ee6ebeb3d19db0fbc.1565123758.git.sai.praneeth.prakhya@intel.com
Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Suggested-by: Dave Hansen <dave.hansen@intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-25 17:51:40 -07:00
Eric W. Biederman
0ff7b2cfba tasks, sched/core: Ensure tasks are available for a grace period after leaving the runqueue
In the ordinary case today the RCU grace period for a task_struct is
triggered when another process wait's for it's zombine and causes the
kernel to call release_task().  As the waiting task has to receive a
signal and then act upon it before this happens, typically this will
occur after the original task as been removed from the runqueue.

Unfortunaty in some cases such as self reaping tasks it can be shown
that release_task() will be called starting the grace period for
task_struct long before the task leaves the runqueue.

Therefore use put_task_struct_rcu_user() in finish_task_switch() to
guarantee that the there is a RCU lifetime after the task
leaves the runqueue.

Besides the change in the start of the RCU grace period for the
task_struct this change may cause perf_event_delayed_put and
trace_sched_process_free.  The function perf_event_delayed_put boils
down to just a WARN_ON for cases that I assume never show happen.  So
I don't see any problem with delaying it.

The function trace_sched_process_free is a trace point and thus
visible to user space.  Occassionally userspace has the strangest
dependencies so this has a miniscule chance of causing a regression.
This change only changes the timing of when the tracepoint is called.
The change in timing arguably gives userspace a more accurate picture
of what is going on.  So I don't expect there to be a regression.

In the case where a task self reaps we are pretty much guaranteed that
the RCU grace period is delayed.  So we should get quite a bit of
coverage in of this worst case for the change in a normal threaded
workload.  So I expect any issues to turn up quickly or not at all.

I have lightly tested this change and everything appears to work
fine.

Inspired-by: Linus Torvalds <torvalds@linux-foundation.org>
Inspired-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Kirill Tkhai <tkhai@yandex.ru>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King - ARM Linux admin <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/87r24jdpl5.fsf_-_@x220.int.ebiederm.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-09-25 17:42:29 +02:00
Eric W. Biederman
3fbd7ee285 tasks: Add a count of task RCU users
Add a count of the number of RCU users (currently 1) of the task
struct so that we can later add the scheduler case and get rid of the
very subtle task_rcu_dereference(), and just use rcu_dereference().

As suggested by Oleg have the count overlap rcu_head so that no
additional space in task_struct is required.

Inspired-by: Linus Torvalds <torvalds@linux-foundation.org>
Inspired-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Kirill Tkhai <tkhai@yandex.ru>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King - ARM Linux admin <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/87woebdplt.fsf_-_@x220.int.ebiederm.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-09-25 17:42:29 +02:00
Greg Kroah-Hartman
896be8f44d Merge 5.4-rc1-prereleae into android-mainline
To make the 5.4-rc1 merge easier, merge at a prerelease point in time
before the final release happens.

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I29b683c837ed1a3324644dbf9bf863f30740cd0b
2019-09-23 14:14:08 +02:00
Linus Torvalds
84da111de0 Merge tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull hmm updates from Jason Gunthorpe:
 "This is more cleanup and consolidation of the hmm APIs and the very
  strongly related mmu_notifier interfaces. Many places across the tree
  using these interfaces are touched in the process. Beyond that a
  cleanup to the page walker API and a few memremap related changes
  round out the series:

   - General improvement of hmm_range_fault() and related APIs, more
     documentation, bug fixes from testing, API simplification &
     consolidation, and unused API removal

   - Simplify the hmm related kconfigs to HMM_MIRROR and DEVICE_PRIVATE,
     and make them internal kconfig selects

   - Hoist a lot of code related to mmu notifier attachment out of
     drivers by using a refcount get/put attachment idiom and remove the
     convoluted mmu_notifier_unregister_no_release() and related APIs.

   - General API improvement for the migrate_vma API and revision of its
     only user in nouveau

   - Annotate mmu_notifiers with lockdep and sleeping region debugging

  Two series unrelated to HMM or mmu_notifiers came along due to
  dependencies:

   - Allow pagemap's memremap_pages family of APIs to work without
     providing a struct device

   - Make walk_page_range() and related use a constant structure for
     function pointers"

* tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (75 commits)
  libnvdimm: Enable unit test infrastructure compile checks
  mm, notifier: Catch sleeping/blocking for !blockable
  kernel.h: Add non_block_start/end()
  drm/radeon: guard against calling an unpaired radeon_mn_unregister()
  csky: add missing brackets in a macro for tlb.h
  pagewalk: use lockdep_assert_held for locking validation
  pagewalk: separate function pointers from iterator data
  mm: split out a new pagewalk.h header from mm.h
  mm/mmu_notifiers: annotate with might_sleep()
  mm/mmu_notifiers: prime lockdep
  mm/mmu_notifiers: add a lockdep map for invalidate_range_start/end
  mm/mmu_notifiers: remove the __mmu_notifier_invalidate_range_start/end exports
  mm/hmm: hmm_range_fault() infinite loop
  mm/hmm: hmm_range_fault() NULL pointer bug
  mm/hmm: fix hmm_range_fault()'s handling of swapped out pages
  mm/mmu_notifiers: remove unregister_no_release
  RDMA/odp: remove ib_ucontext from ib_umem
  RDMA/odp: use mmu_notifier_get/put for 'struct ib_ucontext_per_mm'
  RDMA/mlx5: Use odp instead of mr->umem in pagefault_mr
  RDMA/mlx5: Use ib_umem_start instead of umem.address
  ...
2019-09-21 10:07:42 -07:00
Greg Kroah-Hartman
bfa0399bc8 Merge Linus's 5.4-rc1-prerelease branch into android-mainline
This merges Linus's tree as of commit b41dae061b ("Merge tag
'xfs-5.4-merge-7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux")
into android-mainline.

This "early" merge makes it easier to test and handle merge conflicts
instead of having to wait until the "end" of the merge window and handle
all 10000+ commits at once.

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I6bebf55e5e2353f814e3c87f5033607b1ae5d812
2019-09-20 16:07:54 -07:00
Linus Torvalds
7f2444d38f Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core timer updates from Thomas Gleixner:
 "Timers and timekeeping updates:

   - A large overhaul of the posix CPU timer code which is a preparation
     for moving the CPU timer expiry out into task work so it can be
     properly accounted on the task/process.

     An update to the bogus permission checks will come later during the
     merge window as feedback was not complete before heading of for
     travel.

   - Switch the timerqueue code to use cached rbtrees and get rid of the
     homebrewn caching of the leftmost node.

   - Consolidate hrtimer_init() + hrtimer_init_sleeper() calls into a
     single function

   - Implement the separation of hrtimers to be forced to expire in hard
     interrupt context even when PREEMPT_RT is enabled and mark the
     affected timers accordingly.

   - Implement a mechanism for hrtimers and the timer wheel to protect
     RT against priority inversion and live lock issues when a (hr)timer
     which should be canceled is currently executing the callback.
     Instead of infinitely spinning, the task which tries to cancel the
     timer blocks on a per cpu base expiry lock which is held and
     released by the (hr)timer expiry code.

   - Enable the Hyper-V TSC page based sched_clock for Hyper-V guests
     resulting in faster access to timekeeping functions.

   - Updates to various clocksource/clockevent drivers and their device
     tree bindings.

   - The usual small improvements all over the place"

* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (101 commits)
  posix-cpu-timers: Fix permission check regression
  posix-cpu-timers: Always clear head pointer on dequeue
  hrtimer: Add a missing bracket and hide `migration_base' on !SMP
  posix-cpu-timers: Make expiry_active check actually work correctly
  posix-timers: Unbreak CONFIG_POSIX_TIMERS=n build
  tick: Mark sched_timer to expire in hard interrupt context
  hrtimer: Add kernel doc annotation for HRTIMER_MODE_HARD
  x86/hyperv: Hide pv_ops access for CONFIG_PARAVIRT=n
  posix-cpu-timers: Utilize timerqueue for storage
  posix-cpu-timers: Move state tracking to struct posix_cputimers
  posix-cpu-timers: Deduplicate rlimit handling
  posix-cpu-timers: Remove pointless comparisons
  posix-cpu-timers: Get rid of 64bit divisions
  posix-cpu-timers: Consolidate timer expiry further
  posix-cpu-timers: Get rid of zero checks
  rlimit: Rewrite non-sensical RLIMIT_CPU comment
  posix-cpu-timers: Respect INFINITY for hard RTTIME limit
  posix-cpu-timers: Switch thread group sampling to array
  posix-cpu-timers: Restructure expiry array
  posix-cpu-timers: Remove cputime_expires
  ...
2019-09-17 12:35:15 -07:00
Linus Torvalds
76f0f227cf Merge tag 'please-pull-ia64_for_5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux
Pull ia64 updates from Tony Luck:
 "The big change here is removal of support for SGI Altix"

* tag 'please-pull-ia64_for_5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux: (33 commits)
  genirq: remove the is_affinity_mask_valid hook
  ia64: remove CONFIG_SWIOTLB ifdefs
  ia64: remove support for machvecs
  ia64: move the screen_info setup to common code
  ia64: move the ROOT_DEV setup to common code
  ia64: rework iommu probing
  ia64: remove the unused sn_coherency_id symbol
  ia64: remove the SGI UV simulator support
  ia64: remove the zx1 swiotlb machvec
  ia64: remove CONFIG_ACPI ifdefs
  ia64: remove CONFIG_PCI ifdefs
  ia64: remove the hpsim platform
  ia64: remove now unused machvec indirections
  ia64: remove support for the SGI SN2 platform
  drivers: remove the SGI SN2 IOC4 base support
  drivers: remove the SGI SN2 IOC3 base support
  qla2xxx: remove SGI SN2 support
  qla1280: remove SGI SN2 support
  misc/sgi-xp: remove SGI SN2 support
  char/mspec: remove SGI SN2 support
  ...
2019-09-16 15:32:01 -07:00
Linus Torvalds
c17112a5c4 Merge tag 'core-process-v5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux
Pull pidfd/waitid updates from Christian Brauner:
 "This contains two features and various tests.

  First, it adds support for waiting on process through pidfds by adding
  the P_PIDFD type to the waitid() syscall. This completes the basic
  functionality of the pidfd api (cf. [1]). In the meantime we also have
  a new adition to the userspace projects that make use of the pidfd
  api. The qt project was nice enough to send a mail pointing out that
  they have a pr up to switch to the pidfd api (cf. [2]).

  Second, this tag contains an extension to the waitid() syscall to make
  it possible to wait on the current process group in a race free manner
  (even though the actual problem is very unlikely) by specifing 0
  together with the P_PGID type. This extension traces back to a
  discussion on the glibc development mailing list.

  There are also a range of tests for the features above. Additionally,
  the test-suite which detected the pidfd-polling race we fixed in [3]
  is included in this tag"

[1] https://lwn.net/Articles/794707/
[2] https://codereview.qt-project.org/c/qt/qtbase/+/108456
[3] commit b191d6491b ("pidfd: fix a poll race when setting exit_state")

* tag 'core-process-v5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
  waitid: Add support for waiting for the current process group
  tests: add pidfd poll tests
  tests: move common definitions and functions into pidfd.h
  pidfd: add pidfd_wait tests
  pidfd: add P_PIDFD to waitid()
2019-09-16 09:28:19 -07:00
Greg Kroah-Hartman
bc378eba65 Merge 5.3 into android-mainline
Linux 5.3

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I2d3f5eea4589c23da4dec57af3f9e9d74b151eca
2019-09-16 08:26:05 +02:00
Eugene Syromiatnikov
a0eb9abd8a fork: block invalid exit signals with clone3()
Previously, higher 32 bits of exit_signal fields were lost when copied
to the kernel args structure (that uses int as a type for the respective
field). Moreover, as Oleg has noted, exit_signal is used unchecked, so
it has to be checked for sanity before use; for the legacy syscalls,
applying CSIGNAL mask guarantees that it is at least non-negative;
however, there's no such thing is done in clone3() code path, and that
can break at least thread_group_leader.

This commit adds a check to copy_clone_args_from_user() to verify that
the exit signal is limited by CSIGNAL as with legacy clone() and that
the signal is valid. With this we don't get the legacy clone behavior
were an invalid signal could be handed down and would only be detected
and ignored in do_notify_parent(). Users of clone3() will now get a
proper error when they pass an invalid exit signal. Note, that this is
not user-visible behavior since no kernel with clone3() has been
released yet.

The following program will cause a splat on a non-fixed clone3() version
and will fail correctly on a fixed version:

 #define _GNU_SOURCE
 #include <linux/sched.h>
 #include <linux/types.h>
 #include <sched.h>
 #include <stdio.h>
 #include <stdlib.h>
 #include <sys/syscall.h>
 #include <sys/wait.h>
 #include <unistd.h>

 int main(int argc, char *argv[])
 {
        pid_t pid = -1;
        struct clone_args args = {0};
        args.exit_signal = -1;

        pid = syscall(__NR_clone3, &args, sizeof(struct clone_args));
        if (pid < 0)
                exit(EXIT_FAILURE);

        if (pid == 0)
                exit(EXIT_SUCCESS);

        wait(NULL);

        exit(EXIT_SUCCESS);
 }

Fixes: 7f192e3cd3 ("fork: add clone3")
Reported-by: Oleg Nesterov <oleg@redhat.com>
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Suggested-by: Dmitry V. Levin <ldv@altlinux.org>
Signed-off-by: Eugene Syromiatnikov <esyr@redhat.com>
Link: https://lore.kernel.org/r/4b38fa4ce420b119a4c6345f42fe3cec2de9b0b5.1568223594.git.esyr@redhat.com
[christian.brauner@ubuntu.com: simplify check and rework commit message]
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2019-09-12 14:56:33 +02:00
Thomas Gleixner
244d49e306 posix-cpu-timers: Move state tracking to struct posix_cputimers
Put it where it belongs and clean up the ifdeffery in fork completely.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190821192922.743229404@linutronix.de
2019-08-28 11:50:42 +02:00
Thomas Gleixner
3a245c0f11 posix-cpu-timers: Move expiry cache into struct posix_cputimers
The expiry cache belongs into the posix_cputimers container where the other
cpu timers information is.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lkml.kernel.org/r/20190821192921.014444012@linutronix.de
2019-08-28 11:50:35 +02:00
Thomas Gleixner
2b69942f90 posix-cpu-timers: Create a container struct
Per task/process data of posix CPU timers is all over the place which
makes the code hard to follow and requires ifdeffery.

Create a container to hold all this information in one place, so data is
consolidated and the ifdeffery can be confined to the posix timer header
file and removed from places like fork.

As a first step, move the cpu_timers list head array into the new struct
and clean up the initializers and simplify fork. The remaining #ifdef in
fork will be removed later.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lkml.kernel.org/r/20190821192920.819418976@linutronix.de
2019-08-28 11:50:33 +02:00
Jason Gunthorpe
daa138a58c Merge branch 'odp_fixes' into hmm.git
From rdma.git

Jason Gunthorpe says:

====================
This is a collection of general cleanups for ODP to clarify some of the
flows around umem creation and use of the interval tree.
====================

The branch is based on v5.3-rc5 due to dependencies, and is being taken
into hmm.git due to dependencies in the next patches.

* odp_fixes:
  RDMA/mlx5: Use odp instead of mr->umem in pagefault_mr
  RDMA/mlx5: Use ib_umem_start instead of umem.address
  RDMA/core: Make invalidate_range a device operation
  RDMA/odp: Use kvcalloc for the dma_list and page_list
  RDMA/odp: Check for overflow when computing the umem_odp end
  RDMA/odp: Provide ib_umem_odp_release() to undo the allocs
  RDMA/odp: Split creating a umem_odp from ib_umem_get
  RDMA/odp: Make the three ways to create a umem_odp clear
  RMDA/odp: Consolidate umem_odp initialization
  RDMA/odp: Make it clearer when a umem is an implicit ODP umem
  RDMA/odp: Iterate over the whole rbtree directly
  RDMA/odp: Use the common interval tree library instead of generic
  RDMA/mlx5: Fix MR npages calculation for IB_ACCESS_HUGETLB

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-08-21 20:58:18 -03:00
Jason Gunthorpe
c7d8b7824f hmm: use mmu_notifier_get/put for 'struct hmm'
This is a significant simplification, it eliminates all the remaining
'hmm' stuff in mm_struct, eliminates krefing along the critical notifier
paths, and takes away all the ugly locking and abuse of page_table_lock.

mmu_notifier_get() provides the single struct hmm per struct mm which
eliminates mm->hmm.

It also directly guarantees that no mmu_notifier op callback is callable
while concurrent free is possible, this eliminates all the krefs inside
the mmu_notifier callbacks.

The remaining krefs in the range code were overly cautious, drivers are
already not permitted to free the mirror while a range exists.

Link: https://lore.kernel.org/r/20190806231548.25242-6-jgg@ziepe.ca
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ralph Campbell <rcampbell@nvidia.com>
Tested-by: Ralph Campbell <rcampbell@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-08-20 09:35:02 -03:00
Christoph Hellwig
4189ff2348 kernel: only define task_struct_whitelist conditionally
If CONFIG_ARCH_TASK_STRUCT_ALLOCATOR is set task_struct_whitelist is
never called, and thus generates a compiler warning.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lkml.kernel.org/r/20190812065524.19959-5-hch@lst.de
Signed-off-by: Tony Luck <tony.luck@intel.com>
2019-08-12 09:53:28 -07:00
Christian Brauner
3695eae5fe pidfd: add P_PIDFD to waitid()
This adds the P_PIDFD type to waitid().
One of the last remaining bits for the pidfd api is to make it possible
to wait on pidfds. With P_PIDFD added to waitid() the parts of userspace
that want to use the pidfd api to exclusively manage processes can do so
now.

One of the things this will unblock in the future is the ability to make
it possible to retrieve the exit status via waitid(P_PIDFD) for
non-parent processes if handed a _suitable_ pidfd that has this feature
set. This is similar to what you can do on FreeBSD with kqueue(). It
might even end up being possible to wait on a process as a non-parent if
an appropriate property is enabled on the pidfd.

With P_PIDFD no scoping of the process identified by the pidfd is
possible, i.e. it explicitly blocks things such as wait4(-1), wait4(0),
waitid(P_ALL), waitid(P_PGID) etc. It only allows for semantics
equivalent to wait4(pid), waitid(P_PID). Users that need scoping should
rely on pid-based wait*() syscalls for now.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Andy Lutomirsky <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Link: https://lore.kernel.org/r/20190727222229.6516-2-christian@brauner.io
2019-08-01 21:49:46 +02:00
Greg Kroah-Hartman
bea0791583 Merge 5.3-rc2 into android-mainline
Linux 5.3-rc2

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I4d36fd27ccc8cd773ba1b97dc3bd382e99a4dd7a
2019-07-29 08:40:17 +02:00
Jann Horn
16d51a590a sched/fair: Don't free p->numa_faults with concurrent readers
When going through execve(), zero out the NUMA fault statistics instead of
freeing them.

During execve, the task is reachable through procfs and the scheduler. A
concurrent /proc/*/sched reader can read data from a freed ->numa_faults
allocation (confirmed by KASAN) and write it back to userspace.
I believe that it would also be possible for a use-after-free read to occur
through a race between a NUMA fault and execve(): task_numa_fault() can
lead to task_numa_compare(), which invokes task_weight() on the currently
running task of a different CPU.

Another way to fix this would be to make ->numa_faults RCU-managed or add
extra locking, but it seems easier to wipe the NUMA fault statistics on
execve.

Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Fixes: 82727018b0 ("sched/numa: Call task_numa_free() from do_execve()")
Link: https://lkml.kernel.org/r/20190716152047.14424-1-jannh@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-07-25 15:37:04 +02:00
Greg Kroah-Hartman
37766c2946 Merge 5.3.0-rc1 into android-mainline
Linus 5.3-rc1 release

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ic171e37d4c21ffa495240c5538852bbb5a9dcce8
2019-07-23 16:21:59 -07:00
Linus Torvalds
3c69914b4c Merge tag 'for-linus-20190715' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux
Pull pidfd and clone3 fixes from Christian Brauner:
 "This contains a bugfix for CLONE_PIDFD when used with the legacy clone
  syscall, two fixes to ensure that syscall numbering and clone3
  entrypoint implementations will stay consistent, and an update for the
  maintainers file:

   - The addition of clone3 broke CLONE_PIDFD for legacy clone on all
     architectures that use do_fork() directly instead of calling the
     clone syscall itself. (Fwiw, cleaning do_fork() up is on my todo.)

     The reason this happened was that during conversion of _do_fork()
     to use struct kernel_clone_args we missed that do_fork() is called
     directly by various architectures. This is fixed by making sure
     that the pidfd argument in struct kernel_clone_args is correctly
     initialized with the parent_tidptr argument passed down from
     do_fork(). Additionally, do_fork() missed a check to make
     CLONE_PIDFD and CLONE_PARENT_SETTID mutually exclusive just a
     clone() does. This is now fixed too.

   - When clone3() was introduced we skipped architectures that require
     special handling for fork-like syscalls. Their syscall tables did
     not contain any mention of clone3().

     To make sure that Arnd's work to make syscall numbers on all
     architectures identical (minus alpha) was not for naught we are
     placing a comment in all syscall tables that do not yet implement
     clone3(). The comment makes it clear that 435 is reserved for
     clone3 and should not be used.

   - Also, this contains a patch to make the clone3() syscall definition
     in asm-generic/unist.h conditional on __ARCH_WANT_SYS_CLONE3. This
     lets us catch new architectures that implicitly make use of clone3
     without setting __ARCH_WANT_SYS_CLONE3 which is a good indicator
     that they did not check whether it needs special treatment or not.

   - Finally, this contains a patch to add me as maintainer for pidfd
     stuff so people can start blaming me (more)"

* tag 'for-linus-20190715' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
  MAINTAINERS: add new entry for pidfd api
  unistd: protect clone3 via __ARCH_WANT_SYS_CLONE3
  arch: mark syscall number 435 reserved for clone3
  clone: fix CLONE_PIDFD support
2019-07-16 11:30:07 -07:00
Linus Torvalds
fec88ab0af Merge tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull HMM updates from Jason Gunthorpe:
 "Improvements and bug fixes for the hmm interface in the kernel:

   - Improve clarity, locking and APIs related to the 'hmm mirror'
     feature merged last cycle. In linux-next we now see AMDGPU and
     nouveau to be using this API.

   - Remove old or transitional hmm APIs. These are hold overs from the
     past with no users, or APIs that existed only to manage cross tree
     conflicts. There are still a few more of these cleanups that didn't
     make the merge window cut off.

   - Improve some core mm APIs:
       - export alloc_pages_vma() for driver use
       - refactor into devm_request_free_mem_region() to manage
         DEVICE_PRIVATE resource reservations
       - refactor duplicative driver code into the core dev_pagemap
         struct

   - Remove hmm wrappers of improved core mm APIs, instead have drivers
     use the simplified API directly

   - Remove DEVICE_PUBLIC

   - Simplify the kconfig flow for the hmm users and core code"

* tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (42 commits)
  mm: don't select MIGRATE_VMA_HELPER from HMM_MIRROR
  mm: remove the HMM config option
  mm: sort out the DEVICE_PRIVATE Kconfig mess
  mm: simplify ZONE_DEVICE page private data
  mm: remove hmm_devmem_add
  mm: remove hmm_vma_alloc_locked_page
  nouveau: use devm_memremap_pages directly
  nouveau: use alloc_page_vma directly
  PCI/P2PDMA: use the dev_pagemap internal refcount
  device-dax: use the dev_pagemap internal refcount
  memremap: provide an optional internal refcount in struct dev_pagemap
  memremap: replace the altmap_valid field with a PGMAP_ALTMAP_VALID flag
  memremap: remove the data field in struct dev_pagemap
  memremap: add a migrate_to_ram method to struct dev_pagemap_ops
  memremap: lift the devmap_enable manipulation into devm_memremap_pages
  memremap: pass a struct dev_pagemap to ->kill and ->cleanup
  memremap: move dev_pagemap callbacks into a separate structure
  memremap: validate the pagemap type passed to devm_memremap_pages
  mm: factor out a devm_request_free_mem_region helper
  mm: export alloc_pages_vma
  ...
2019-07-14 19:42:11 -07:00
Dmitry V. Levin
028b6e8a89 clone: fix CLONE_PIDFD support
The introduction of clone3 syscall accidentally broke CLONE_PIDFD
support in traditional clone syscall on compat x86 and those
architectures that use do_fork to implement clone syscall.

This bug was found by strace test suite.

Link: https://strace.io/logs/strace/2019-07-12
Fixes: 7f192e3cd3 ("fork: add clone3")
Bisected-and-tested-by: Anatoly Pugachev <matorola@gmail.com>
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Link: https://lore.kernel.org/r/20190714162047.GB10389@altlinux.org
Signed-off-by: Christian Brauner <christian@brauner.io>
2019-07-14 20:36:12 +02:00
Linus Torvalds
8f6ccf6159 Merge tag 'clone3-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux
Pull clone3 system call from Christian Brauner:
 "This adds the clone3 syscall which is an extensible successor to clone
  after we snagged the last flag with CLONE_PIDFD during the 5.2 merge
  window for clone(). It cleanly supports all of the flags from clone()
  and thus all legacy workloads.

  There are few user visible differences between clone3 and clone.
  First, CLONE_DETACHED will cause EINVAL with clone3 so we can reuse
  this flag. Second, the CSIGNAL flag is deprecated and will cause
  EINVAL to be reported. It is superseeded by a dedicated "exit_signal"
  argument in struct clone_args thus freeing up even more flags. And
  third, clone3 gives CLONE_PIDFD a dedicated return argument in struct
  clone_args instead of abusing CLONE_PARENT_SETTID's parent_tidptr
  argument.

  The clone3 uapi is designed to be easy to handle on 32- and 64 bit:

    /* uapi */
    struct clone_args {
            __aligned_u64 flags;
            __aligned_u64 pidfd;
            __aligned_u64 child_tid;
            __aligned_u64 parent_tid;
            __aligned_u64 exit_signal;
            __aligned_u64 stack;
            __aligned_u64 stack_size;
            __aligned_u64 tls;
    };

  and a separate kernel struct is used that uses proper kernel typing:

    /* kernel internal */
    struct kernel_clone_args {
            u64 flags;
            int __user *pidfd;
            int __user *child_tid;
            int __user *parent_tid;
            int exit_signal;
            unsigned long stack;
            unsigned long stack_size;
            unsigned long tls;
    };

  The system call comes with a size argument which enables the kernel to
  detect what version of clone_args userspace is passing in. clone3
  validates that any additional bytes a given kernel does not know about
  are set to zero and that the size never exceeds a page.

  A nice feature is that this patchset allowed us to cleanup and
  simplify various core kernel codepaths in kernel/fork.c by making the
  internal _do_fork() function take struct kernel_clone_args even for
  legacy clone().

  This patch also unblocks the time namespace patchset which wants to
  introduce a new CLONE_TIMENS flag.

  Note, that clone3 has only been wired up for x86{_32,64}, arm{64}, and
  xtensa. These were the architectures that did not require special
  massaging.

  Other architectures treat fork-like system calls individually and
  after some back and forth neither Arnd nor I felt confident that we
  dared to add clone3 unconditionally to all architectures. We agreed to
  leave this up to individual architecture maintainers. This is why
  there's an additional patch that introduces __ARCH_WANT_SYS_CLONE3
  which any architecture can set once it has implemented support for
  clone3. The patch also adds a cond_syscall(clone3) for architectures
  such as nios2 or h8300 that generate their syscall table by simply
  including asm-generic/unistd.h. The hope is to get rid of
  __ARCH_WANT_SYS_CLONE3 and cond_syscall() rather soon"

* tag 'clone3-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
  arch: handle arches who do not yet define clone3
  arch: wire-up clone3() syscall
  fork: add clone3
2019-07-11 10:09:44 -07:00
Linus Torvalds
5450e8a316 Merge tag 'pidfd-updates-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux
Pull pidfd updates from Christian Brauner:
 "This adds two main features.

   - First, it adds polling support for pidfds. This allows process
     managers to know when a (non-parent) process dies in a race-free
     way.

     The notification mechanism used follows the same logic that is
     currently used when the parent of a task is notified of a child's
     death. With this patchset it is possible to put pidfds in an
     {e}poll loop and get reliable notifications for process (i.e.
     thread-group) exit.

   - The second feature compliments the first one by making it possible
     to retrieve pollable pidfds for processes that were not created
     using CLONE_PIDFD.

     A lot of processes get created with traditional PID-based calls
     such as fork() or clone() (without CLONE_PIDFD). For these
     processes a caller can currently not create a pollable pidfd. This
     is a problem for Android's low memory killer (LMK) and service
     managers such as systemd.

  Both patchsets are accompanied by selftests.

  It's perhaps worth noting that the work done so far and the work done
  in this branch for pidfd_open() and polling support do already see
  some adoption:

   - Android is in the process of backporting this work to all their LTS
     kernels [1]

   - Service managers make use of pidfd_send_signal but will need to
     wait until we enable waiting on pidfds for full adoption.

   - And projects I maintain make use of both pidfd_send_signal and
     CLONE_PIDFD [2] and will use polling support and pidfd_open() too"

[1] https://android-review.googlesource.com/q/topic:%22pidfd+polling+support+4.9+backport%22
    https://android-review.googlesource.com/q/topic:%22pidfd+polling+support+4.14+backport%22
    https://android-review.googlesource.com/q/topic:%22pidfd+polling+support+4.19+backport%22

[2] aab6e3eb73/src/lxc/start.c (L1753)

* tag 'pidfd-updates-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
  tests: add pidfd_open() tests
  arch: wire-up pidfd_open()
  pid: add pidfd_open()
  pidfd: add polling selftests
  pidfd: add polling support
2019-07-10 22:17:21 -07:00
Linus Torvalds
dad1c12ed8 Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:

 - Remove the unused per rq load array and all its infrastructure, by
   Dietmar Eggemann.

 - Add utilization clamping support by Patrick Bellasi. This is a
   refinement of the energy aware scheduling framework with support for
   boosting of interactive and capping of background workloads: to make
   sure critical GUI threads get maximum frequency ASAP, and to make
   sure background processing doesn't unnecessarily move to cpufreq
   governor to higher frequencies and less energy efficient CPU modes.

 - Add the bare minimum of tracepoints required for LISA EAS regression
   testing, by Qais Yousef - which allows automated testing of various
   power management features, including energy aware scheduling.

 - Restructure the former tsk_nr_cpus_allowed() facility that the -rt
   kernel used to modify the scheduler's CPU affinity logic such as
   migrate_disable() - introduce the task->cpus_ptr value instead of
   taking the address of &task->cpus_allowed directly - by Sebastian
   Andrzej Siewior.

 - Misc optimizations, fixes, cleanups and small enhancements - see the
   Git log for details.

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (33 commits)
  sched/uclamp: Add uclamp support to energy_compute()
  sched/uclamp: Add uclamp_util_with()
  sched/cpufreq, sched/uclamp: Add clamps for FAIR and RT tasks
  sched/uclamp: Set default clamps for RT tasks
  sched/uclamp: Reset uclamp values on RESET_ON_FORK
  sched/uclamp: Extend sched_setattr() to support utilization clamping
  sched/core: Allow sched_setattr() to use the current policy
  sched/uclamp: Add system default clamps
  sched/uclamp: Enforce last task's UCLAMP_MAX
  sched/uclamp: Add bucket local max tracking
  sched/uclamp: Add CPU's clamp buckets refcounting
  sched/fair: Rename weighted_cpuload() to cpu_runnable_load()
  sched/debug: Export the newly added tracepoints
  sched/debug: Add sched_overutilized tracepoint
  sched/debug: Add new tracepoint to track PELT at se level
  sched/debug: Add new tracepoints to track PELT at rq level
  sched/debug: Add a new sched_trace_*() helper functions
  sched/autogroup: Make autogroup_path() always available
  sched/wait: Deduplicate code with do-while
  sched/topology: Remove unused 'sd' parameter from arch_scale_cpu_capacity()
  ...
2019-07-08 16:39:53 -07:00
Linus Torvalds
e192832869 Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
 "The main changes in this cycle are:

   - rwsem scalability improvements, phase #2, by Waiman Long, which are
     rather impressive:

       "On a 2-socket 40-core 80-thread Skylake system with 40 reader
        and writer locking threads, the min/mean/max locking operations
        done in a 5-second testing window before the patchset were:

         40 readers, Iterations Min/Mean/Max = 1,807/1,808/1,810
         40 writers, Iterations Min/Mean/Max = 1,807/50,344/151,255

        After the patchset, they became:

         40 readers, Iterations Min/Mean/Max = 30,057/31,359/32,741
         40 writers, Iterations Min/Mean/Max = 94,466/95,845/97,098"

     There's a lot of changes to the locking implementation that makes
     it similar to qrwlock, including owner handoff for more fair
     locking.

     Another microbenchmark shows how across the spectrum the
     improvements are:

       "With a locking microbenchmark running on 5.1 based kernel, the
        total locking rates (in kops/s) on a 2-socket Skylake system
        with equal numbers of readers and writers (mixed) before and
        after this patchset were:

        # of Threads   Before Patch      After Patch
        ------------   ------------      -----------
             2            2,618             4,193
             4            1,202             3,726
             8              802             3,622
            16              729             3,359
            32              319             2,826
            64              102             2,744"

     The changes are extensive and the patch-set has been through
     several iterations addressing various locking workloads. There
     might be more regressions, but unless they are pathological I
     believe we want to use this new implementation as the baseline
     going forward.

   - jump-label optimizations by Daniel Bristot de Oliveira: the primary
     motivation was to remove IPI disturbance of isolated RT-workload
     CPUs, which resulted in the implementation of batched jump-label
     updates. Beyond the improvement of the real-time characteristics
     kernel, in one test this patchset improved static key update
     overhead from 57 msecs to just 1.4 msecs - which is a nice speedup
     as well.

   - atomic64_t cross-arch type cleanups by Mark Rutland: over the last
     ~10 years of atomic64_t existence the various types used by the
     APIs only had to be self-consistent within each architecture -
     which means they became wildly inconsistent across architectures.
     Mark puts and end to this by reworking all the atomic64
     implementations to use 's64' as the base type for atomic64_t, and
     to ensure that this type is consistently used for parameters and
     return values in the API, avoiding further problems in this area.

   - A large set of small improvements to lockdep by Yuyang Du: type
     cleanups, output cleanups, function return type and othr cleanups
     all around the place.

   - A set of percpu ops cleanups and fixes by Peter Zijlstra.

   - Misc other changes - please see the Git log for more details"

* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (82 commits)
  locking/lockdep: increase size of counters for lockdep statistics
  locking/atomics: Use sed(1) instead of non-standard head(1) option
  locking/lockdep: Move mark_lock() inside CONFIG_TRACE_IRQFLAGS && CONFIG_PROVE_LOCKING
  x86/jump_label: Make tp_vec_nr static
  x86/percpu: Optimize raw_cpu_xchg()
  x86/percpu, sched/fair: Avoid local_clock()
  x86/percpu, x86/irq: Relax {set,get}_irq_regs()
  x86/percpu: Relax smp_processor_id()
  x86/percpu: Differentiate this_cpu_{}() and __this_cpu_{}()
  locking/rwsem: Guard against making count negative
  locking/rwsem: Adaptive disabling of reader optimistic spinning
  locking/rwsem: Enable time-based spinning on reader-owned rwsem
  locking/rwsem: Make rwsem->owner an atomic_long_t
  locking/rwsem: Enable readers spinning on writer
  locking/rwsem: Clarify usage of owner's nonspinaable bit
  locking/rwsem: Wake up almost all readers in wait queue
  locking/rwsem: More optimal RT task handling of null owner
  locking/rwsem: Always release wait_lock before waking up tasks
  locking/rwsem: Implement lock handoff to prevent lock starvation
  locking/rwsem: Make rwsem_spin_on_owner() return owner state
  ...
2019-07-08 16:12:03 -07:00
Linus Torvalds
927ba67a63 Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer updates from Thomas Gleixner:
 "The timer and timekeeping departement delivers:

  Core:

   - The consolidation of the VDSO code into a generic library including
     the conversion of x86 and ARM64. Conversion of ARM and MIPS are en
     route through the relevant maintainer trees and should end up in
     5.4.

     This gets rid of the unnecessary different copies of the same code
     and brings all architectures on the same level of VDSO
     functionality.

   - Make the NTP user space interface more robust by restricting the
     TAI offset to prevent undefined behaviour. Includes a selftest.

   - Validate user input in the compat settimeofday() syscall to catch
     invalid values which would be turned into valid values by a
     multiplication overflow

   - Consolidate the time accessors

   - Small fixes, improvements and cleanups all over the place

  Drivers:

   - Support for the NXP system counter, TI davinci timer

   - Move the Microsoft HyperV clocksource/events code into the
     drivers/clocksource directory so it can be shared between x86 and
     ARM64.

   - Overhaul of the Tegra driver

   - Delay timer support for IXP4xx

   - Small fixes, improvements and cleanups as usual"

* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (71 commits)
  time: Validate user input in compat_settimeofday()
  timer: Document TIMER_PINNED
  clocksource/drivers: Continue making Hyper-V clocksource ISA agnostic
  clocksource/drivers: Make Hyper-V clocksource ISA agnostic
  MAINTAINERS: Fix Andy's surname and the directory entries of VDSO
  hrtimer: Use a bullet for the returns bullet list
  arm64: vdso: Fix compilation with clang older than 8
  arm64: compat: Fix __arch_get_hw_counter() implementation
  arm64: Fix __arch_get_hw_counter() implementation
  lib/vdso: Make delta calculation work correctly
  MAINTAINERS: Add entry for the generic VDSO library
  arm64: compat: No need for pre-ARMv7 barriers on an ARMv8 system
  arm64: vdso: Remove unnecessary asm-offsets.c definitions
  vdso: Remove superfluous #ifdef __KERNEL__ in vdso/datapage.h
  clocksource/drivers/davinci: Add support for clocksource
  clocksource/drivers/davinci: Add support for clockevents
  clocksource/drivers/tegra: Set up maximum-ticks limit properly
  clocksource/drivers/tegra: Cycles can't be 0
  clocksource/drivers/tegra: Restore base address before cleanup
  clocksource/drivers/tegra: Add verbose definition for 1MHz constant
  ...
2019-07-08 11:06:29 -07:00
Greg Kroah-Hartman
a4bbf3df04 Merge 5.2 into android-common
Linux 5.2

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2019-07-08 08:24:40 +02:00