Pull NVMe updates from Jens Axboe:
"Last branch for this series is the nvme changes. It's in a separate
branch to avoid splitting too much between core and NVMe changes,
since NVMe is still helping drive some blk-mq changes. That said, not
a huge amount of core changes in here. The grunt of the work is the
continued split of the code"
* 'for-4.5/nvme' of git://git.kernel.dk/linux-block: (67 commits)
uapi: update install list after nvme.h rename
NVMe: Export NVMe attributes to sysfs group
NVMe: Shutdown controller only for power-off
NVMe: IO queue deletion re-write
NVMe: Remove queue freezing on resets
NVMe: Use a retryable error code on reset
NVMe: Fix admin queue ring wrap
nvme: make SG_IO support optional
nvme: fixes for NVME_IOCTL_IO_CMD on the char device
nvme: synchronize access to ctrl->namespaces
nvme: Move nvme_freeze/unfreeze_queues to nvme core
PCI/AER: include header file
NVMe: Export namespace attributes to sysfs
NVMe: Add pci error handlers
block: remove REQ_NO_TIMEOUT flag
nvme: merge iod and cmd_info
nvme: meta_sg doesn't have to be an array
nvme: properly free resources for cancelled command
nvme: simplify completion handling
nvme: special case AEN requests
...
Pull lightnvm fixes and updates from Jens Axboe:
"This should have been part of the drivers branch, but it arrived a bit
late and wasn't based on the official core block driver branch. So
they got a small scolding, but got a pass since it's still new. Hence
it's in a separate branch.
This is mostly pure fixes, contained to lightnvm/, and minor feature
additions"
* 'for-4.5/lightnvm' of git://git.kernel.dk/linux-block: (26 commits)
lightnvm: ensure that nvm_dev_ops can be used without CONFIG_NVM
lightnvm: introduce factory reset
lightnvm: use system block for mm initialization
lightnvm: introduce ioctl to initialize device
lightnvm: core on-disk initialization
lightnvm: introduce mlc lower page table mappings
lightnvm: add mccap support
lightnvm: manage open and closed blocks separately
lightnvm: fix missing grown bad block type
lightnvm: reference rrpc lun in rrpc block
lightnvm: introduce nvm_submit_ppa
lightnvm: move rq->error to nvm_rq->error
lightnvm: support multiple ppas in nvm_erase_ppa
lightnvm: move the pages per block check out of the loop
lightnvm: sectors first in ppa list
lightnvm: fix locking and mempool in rrpc_lun_gc
lightnvm: put block back to gc list on its reclaim fail
lightnvm: check bi_error in gc
lightnvm: return the get_bb_tbl return value
lightnvm: refactor end_io functions for sync
...
Pull block driver updates from Jens Axboe:
"This is the block driver pull request for 4.5, with the exception of
NVMe, which is in a separate branch and will be posted after this one.
This pull request contains:
- A set of bcache stability fixes, which have been acked by Kent.
These have been used and tested for more than a year by the
community, so it's about time that they got in.
- A set of drbd updates from the drbd team (Andreas, Lars, Philipp)
and Markus Elfring, Oleg Drokin.
- A set of fixes for xen blkback/front from the usual suspects, (Bob,
Konrad) as well as community based fixes from Kiri, Julien, and
Peng.
- A 2038 time fix for sx8 from Shraddha, with a fix from me.
- A small mtip32xx cleanup from Zhu Yanjun.
- A null_blk division fix from Arnd"
* 'for-4.5/drivers' of git://git.kernel.dk/linux-block: (71 commits)
null_blk: use sector_div instead of do_div
mtip32xx: restrict variables visible in current code module
xen/blkfront: Fix crash if backend doesn't follow the right states.
xen/blkback: Fix two memory leaks.
xen/blkback: make st_ statistics per ring
xen/blkfront: Handle non-indirect grant with 64KB pages
xen-blkfront: Introduce blkif_ring_get_request
xen-blkback: clear PF_NOFREEZE for xen_blkif_schedule()
xen/blkback: Free resources if connect_ring failed.
xen/blocks: Return -EXX instead of -1
xen/blkback: make pool of persistent grants and free pages per-queue
xen/blkback: get the number of hardware queues/rings from blkfront
xen/blkback: pseudo support for multi hardware queues/rings
xen/blkback: separate ring information out of struct xen_blkif
xen/blkfront: correct setting for xen_blkif_max_ring_order
xen/blkfront: make persistent grants pool per-queue
xen/blkfront: Remove duplicate setting of ->xbdev.
xen/blkfront: Cleanup of comments, fix unaligned variables, and syntax errors.
xen/blkfront: negotiate number of queues/rings to be used with backend
xen/blkfront: split per device io_lock
...
After THP refcounting rework we have only two possible return values
from pmd_trans_huge_lock(): success and failure. Return-by-pointer for
ptl doesn't make much sense in this case.
Let's convert pmd_trans_huge_lock() to return ptl on success and NULL on
failure.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Minchan Kim <minchan@kernel.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Adding these pr_info and pr_err definitions so as to allow code to be
compiled successfully for testing in userspace, since the printk has
been replaced by pr_info and pr_err in algos.c
Absence of these definitions result in the compilation errors
such as ' undefined reference to `pr_info' ' ' undefined reference to
`pr_err' '
Cc: NeilBrown <neilb@suse.com>
Cc: Anton Blanchard <anton@samba.org>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Gayatri Kammela <gayatri.kammela@intel.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Expose an interface to allow users to mark several accesses together as
being user space accesses, allowing batching of the surrounding user
space access markers (SMAP on x86, PAN on arm64, domain register
switching on arm).
This is currently only used for the user string lenth and copying
functions, where the SMAP overhead on x86 drowned the actual user
accesses (only noticeable on newer microarchitectures that support SMAP
in the first place, of course).
* user access batching branch:
Use the new batched user accesses in generic user string handling
Add 'unsafe' user access functions for batched accesses
x86: reorganize SMAP handling in user space accesses
Merge third patch-bomb from Andrew Morton:
"I'm pretty much done for -rc1 now:
- the rest of MM, basically
- lib/ updates
- checkpatch, epoll, hfs, fatfs, ptrace, coredump, exit
- cpu_mask simplifications
- kexec, rapidio, MAINTAINERS etc, etc.
- more dma-mapping cleanups/simplifications from hch"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (109 commits)
MAINTAINERS: add/fix git URLs for various subsystems
mm: memcontrol: add "sock" to cgroup2 memory.stat
mm: memcontrol: basic memory statistics in cgroup2 memory controller
mm: memcontrol: do not uncharge old page in page cache replacement
Documentation: cgroup: add memory.swap.{current,max} description
mm: free swap cache aggressively if memcg swap is full
mm: vmscan: do not scan anon pages if memcg swap limit is hit
swap.h: move memcg related stuff to the end of the file
mm: memcontrol: replace mem_cgroup_lruvec_online with mem_cgroup_online
mm: vmscan: pass memcg to get_scan_count()
mm: memcontrol: charge swap to cgroup2
mm: memcontrol: clean up alloc, online, offline, free functions
mm: memcontrol: flatten struct cg_proto
mm: memcontrol: rein in the CONFIG space madness
net: drop tcp_memcontrol.c
mm: memcontrol: introduce CONFIG_MEMCG_LEGACY_KMEM
mm: memcontrol: allow to disable kmem accounting for cgroup2
mm: memcontrol: account "kmem" consumers in cgroup2 memory controller
mm: memcontrol: move kmem accounting code to CONFIG_MEMCG
mm: memcontrol: separate kmem code from legacy tcp accounting code
...
Pull pwm updates from Thierry Reding:
"This set of changes contains a new driver for OMAP (using the
dual-mode timers) as well as an assortment of fixes all across the
board"
* tag 'pwm/for-4.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm:
pwm: Mark all devices as "might sleep"
pwm: omap-dmtimer: Potential NULL dereference on error
pwm: add HAS_IOMEM dependency to PWM_FSL_FTM
pwm: Add PWM driver for OMAP using dual-mode timers
pwm: rcar: Improve accuracy of frequency division setting
pwm: lpc32xx: return ERANGE, if requested period is not supported
pwm: lpc32xx: fix and simplify duty cycle and period calculations
pwm: lpc32xx: make device usable with common clock framework
pwm: lpc32xx: correct number of PWM channels from 2 to 1
dt: lpc32xx: pwm: update documentation of LPC32xx PWM device
dt: lpc32xx: pwm: correct LPC32xx PWM device node example
pwm: fsl-ftm: Fix clock enable/disable when using PM
pwm: lpss: Rework the sequence of programming PWM_SW_UPDATE
pwm: lpss: Select core part automatically
pwm: lpss: Update PWM setting for Broxton
pwm: bcm2835: Fix email address specification
pwm: bcm2835: Prevent division by zero
pwm: bcm2835: Calculate scaler in ->config()
pwm: lpss: Remove ->free() callback
There are a number of problems with revoking a "was sending" message:
(1) We never make any attempt to revoke data - only kvecs contibute to
con->out_skip. However, once the header (envelope) is written to the
socket, our peer learns data_len and sets itself to expect at least
data_len bytes to follow front or front+middle. If ceph_msg_revoke()
is called while the messenger is sending message's data portion,
anything we send after that call is counted by the OSD towards the now
revoked message's data portion. The effects vary, the most common one
is the eventual hang - higher layers get stuck waiting for the reply to
the message that was sent out after ceph_msg_revoke() returned and
treated by the OSD as a bunch of data bytes. This is what Matt ran
into.
(2) Flat out zeroing con->out_kvec_bytes worth of bytes to handle kvecs
is wrong. If ceph_msg_revoke() is called before the tag is sent out or
while the messenger is sending the header, we will get a connection
reset, either due to a bad tag (0 is not a valid tag) or a bad header
CRC, which kind of defeats the purpose of revoke. Currently the kernel
client refuses to work with header CRCs disabled, but that will likely
change in the future, making this even worse.
(3) con->out_skip is not reset on connection reset, leading to one or
more spurious connection resets if we happen to get a real one between
con->out_skip is set in ceph_msg_revoke() and before it's cleared in
write_partial_skip().
Fixing (1) and (3) is trivial. The idea behind fixing (2) is to never
zero the tag or the header, i.e. send out tag+header regardless of when
ceph_msg_revoke() is called. That way the header is always correct, no
unnecessary resets are induced and revoke stands ready for disabled
CRCs. Since ceph_msg_revoke() rips out con->out_msg, introduce a new
"message out temp" and copy the header into it before sending.
Cc: stable@vger.kernel.org # 4.0+
Reported-by: Matt Conner <matt.conner@keepertech.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Tested-by: Matt Conner <matt.conner@keepertech.com>
Reviewed-by: Sage Weil <sage@redhat.com>
This patch makes ceph_frag_contains_value return bool to improve
readability due to this particular function only using either one or
zero as its return value.
No functional change.
Signed-off-by: Yaowei Bai <baiyaowei@cmss.chinamobile.com>
Signed-off-by: Yan, Zheng <zyan@redhat.com>
These functions were introduced in commit 3d14c5d2b ("ceph: factor
out libceph from Ceph file system"). Howover, there's no user of
these functions since then, so remove them for simplicity.
Signed-off-by: Yaowei Bai <baiyaowei@cmss.chinamobile.com>
Signed-off-by: Yan, Zheng <zyan@redhat.com>
There is one common bug left in all the event_function_call() users,
between loading ctx->task and getting to the remote_function(),
ctx->task can already have been changed.
Therefore we need to double check and retry if ctx->task != current.
Insert another trampoline specific to event_function_call() that
checks for this and further validates state. This also allows getting
rid of the active/inactive functions.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
When modifying a QP, the desired operation was determined in
the mlx5_core using a transition table that takes the current
state, the final state, and returns the desired operation.
Since this logic will be used for Raw Packet QP, move the
operation table to the mlx5_ib.
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
When the user changes the Address Vector(AV) in the modify QP, he
provides an SL. This SL should be translated to Ethernet Priority
by taking the 3 LSB bits, and modify the QP's TIS according to this
Ethernet priority.
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Since Raw Packet QP is composed of RQ and SQ, the IB QP's
state is derived from the sub-objects. Therefore we need
to query each one of the sub-objects, and decide on the
IB QP's state.
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
RQ/SQ will be used to implement IB verbs QPs, so the IB QP affiliated
events are affiliated also with SQs and RQs.
Since SQ, RQ and QP resource numbers do not share the same name
space, a queue type field was added to the event data to specify
the SW object that the event is affiliated with.
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
To be used by mlx5_ib in the following patches for implementing
RAW PACKET QP.
Add mlx5_core_ prefix to alloc and delloc transport_domain since
they are exposed now.
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Pull more power management and ACPI updates from Rafael Wysocki:
"This includes fixes on top of the previous batch of PM+ACPI updates
and some new material as well.
From the new material perspective the most significant are the driver
core changes that should allow USB devices to stay suspended over
system suspend/resume cycles if they have been runtime-suspended
already beforehand. Apart from that, ACPICA is updated to upstream
revision 20160108 (cosmetic mostly, but including one fixup on top of
the previous ACPICA update) and there are some devfreq updates the
didn't make it before (due to timing).
A few recent regressions are fixed, most importantly in the cpuidle
menu governor and in the ACPI backlight driver and some x86 platform
drivers depending on it.
Some more bugs are fixed and cleanups are made on top of that.
Specifics:
- Modify the driver core and the USB subsystem to allow USB devices
to stay suspended over system suspend/resume cycles if they have
been runtime-suspended already beforehand and fix some bugs on top
of these changes (Tomeu Vizoso, Rafael Wysocki).
- Update ACPICA to upstream revision 20160108, including updates of
the ACPICA's copyright notices, a code fixup resulting from a
regression fix that was necessary in the upstream code only (the
regression fixed by it has never been present in Linux) and a
compiler warning fix (Bob Moore, Lv Zheng).
- Fix a recent regression in the cpuidle menu governor that broke it
on practically all architectures other than x86 and make a couple
of optimizations on top of that fix (Rafael Wysocki).
- Clean up the selection of cpuidle governors depending on whether or
not the kernel is configured for tickless systems (Jean Delvare).
- Revert a recent commit that introduced a regression in the ACPI
backlight driver, address the problem it attempted to fix in a
different way and revert one more cosmetic change depending on the
problematic commit (Hans de Goede).
- Add two more ACPI backlight quirks (Hans de Goede).
- Fix a few minor problems in the core devfreq code, clean it up a
bit and update the MAINTAINERS information related to it (Chanwoo
Choi, MyungJoo Ham).
- Improve an error message in the ACPI fan driver (Andy Lutomirski).
- Fix a recent build regression in the cpupower tool (Shreyas
Prabhu)"
* tag 'pm+acpi-4.5-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (32 commits)
cpuidle: menu: Avoid pointless checks in menu_select()
sched / idle: Drop default_idle_call() fallback from call_cpuidle()
cpupower: Fix build error in cpufreq-info
cpuidle: Don't enable all governors by default
cpuidle: Default to ladder governor on ticking systems
time: nohz: Expose tick_nohz_enabled
ACPICA: Update version to 20160108
ACPICA: Silence a -Wbad-function-cast warning when acpi_uintptr_t is 'uintptr_t'
ACPICA: Additional 2016 copyright changes
ACPICA: Reduce regression fix divergence from upstream ACPICA
ACPI / video: Add disable_backlight_sysfs_if quirk for the Toshiba Satellite R830
ACPI / video: Revert "thinkpad_acpi: Use acpi_video_handles_brightness_key_presses()"
ACPI / video: Document acpi_video_handles_brightness_key_presses() a bit
ACPI / video: Fix using an uninitialized mutex / list_head in acpi_video_handles_brightness_key_presses()
ACPI / video: Revert "ACPI / video: driver must be registered before checking for keypresses"
ACPI / fan: Improve acpi_device_update_power error message
ACPI / video: Add disable_backlight_sysfs_if quirk for the Toshiba Portege R700
cpuidle: menu: Fix menu_select() for CPUIDLE_DRIVER_STATE_START == 0
MAINTAINERS: Add devfreq-event entry
MAINTAINERS: Add missing git repository and directory for devfreq
...
Pull SH driver updates from Simon Horman:
"Clean up the clock API on legacy SH to make it more similar to the
Common Clock Framework. This will avoid different behaviour in
drivers shared between legacy and CCF-based platforms (e.g. SCIF)"
* tag 'renesas-sh-drivers-for-v4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas:
drivers: sh: clk: Avoid crashes when passing NULL clocks
drivers: sh: clk: Remove obsolete and unused clk_round_parent()
Pull ARM SoC driver updates from Olof Johansson:
"Driver updates for ARM SoCs. Some for SoC-family code under
drivers/soc, but also some other driver updates that don't belong
anywhere else. We also bring in the drivers/reset code through
arm-soc.
Some of the larger updates:
- Qualcomm support for SMEM, SMSM, SMP2P. All used to communicate
with other parts of the chip/board on these platforms, all
proprietary protocols that don't fit into other subsystems and live
in drivers/soc for now.
- System bus driver for UniPhier
- Driver for the TI Wakeup M3 IPC device
- Power management for Raspberry PI
+ Again a bunch of other smaller updates and patches"
* tag 'armsoc-drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (38 commits)
bus: uniphier: allow only built-in driver
ARM: bcm2835: clarify RASPBERRYPI_FIRMWARE dependency
MAINTAINERS: Drop Kumar Gala from QCOM
bus: uniphier-system-bus: add UniPhier System Bus driver
ARM: bcm2835: add rpi power domain driver
dt-bindings: add rpi power domain driver bindings
ARM: bcm2835: Define two new packets from the latest firmware.
drivers/soc: make mediatek/mtk-scpsys.c explicitly non-modular
soc: mediatek: SCPSYS: Add regulator support
MAINTAINERS: Change QCOM entries
soc: qcom: smd-rpm: Add existing platform support
memory/tegra: Add number of TLB lines for Tegra124
reset: hi6220: fix modular build
soc: qcom: Introduce WCNSS_CTRL SMD client
ARM: qcom: select ARM_CPU_SUSPEND for power management
MAINTAINERS: Add rules for Qualcomm dts files
soc: qcom: enable smsm/smp2p modular build
serial: msm_serial: Make config tristate
soc: qcom: smp2p: Qualcomm Shared Memory Point to Point
soc: qcom: smsm: Add driver for Qualcomm SMSM
...
Pull ARM SoC platform updates from Olof Johansson:
"Updates for new platform support:
- New platform: Tango4 from Sigma Designs.
- Broadcom BCM2836 (Raspberry Pi 2 SoC)
- Enable cpufreq on Freescale i.MX7D
- Rockchip: SMP support for rk3036, general support for rk3228
- SMP support on Broadcom Kona and NSP
- Cleanups for OMAP removing legacy IOMMU data
+ a bunch of misc fixes and tweaks for various platforms"
* tag 'armsoc-soc' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (46 commits)
ARM: tango: Fix UP build issues
ARM: tango: pass ARM arch level for smc.S
ARM: bcm2835: Add Kconfig support for bcm2836
ARM: OMAP2+: Add support for dm814x and dra62x usb
ARM: OMAP2+: Add mmc hwmod entries for dm814x
ARM: OMAP2+: Update 81xx clock and power domains for default, active and sgx
ARM: OMAP2+: Fix SoC detection for dra62x j5-eco
ARM: tango4: Initial platform support
ARM: bcm2835: Add a compat string for bcm2836 machine probe
dt-bindings: Add root properties for Raspberry Pi 2
ARM: imx: select SRC for i.MX7
ARM: uniphier: select PINCTRL
ARM: OMAP2+: Remove device creation for omap-pcm-audio
ARM: OMAP1: Remove device creation for omap-pcm-audio
ARM: rockchip: enable support for RK3228 SoCs
ARM: rockchip: use const and __initconst for rk3036 smp_operations
ARM: zynq: Select ARCH_HAS_RESET_CONTROLLER
ARM: BCM: Add SMP support for Broadcom 4708
ARM: BCM: Add SMP support for Broadcom NSP
ARM: BCM: Clean up SMP support for Broadcom Kona
...
Pull ARM SoC multiplatform code updates from Arnd Bergmann:
"This branch is the culmination of 5 years of effort to bring the ARMv6
and ARMv7 platforms together such that they can all be enabled and
boot the same kernel. It has been a tremendous amount of cleanup and
refactoring by a huge number of people, and creation of several new
(and major) subsystems to better abstract out all the platform details
in an appropriate manner.
The bulk of this branch is a large patchset from Arnd that brings
several of the more minor and older platforms we have closer to
multiplatform support. Among these are MMP, S3C64xx, Orion5x, mv78xx0
and realview Much of this is moving around header files from old mach
directories, but there are also some cleanup patches of debug_ll
(lowlevel debug per-platform options) and other parts.
Linus Walleij also has some patchs to clean up the older ARM Realview
platforms by finally introducing DT support, and Rob Herring has some
for ARM Versatile which is now DT-only. Both of these platforms are
now multiplatform.
Finally, a couple of patches from Russell for Dove PMU, and a fix from
Valentin Rothberg for Exynos ADC, which were rebased on top of the
series to avoid conflicts"
* tag 'armsoc-multiplatform' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (75 commits)
ARM: realview: don't select SMP_ON_UP for UP builds
ARM: s3c: simplify s3c_irqwake_{e,}intallow definition
ARM: s3c64xx: fix pm-debug compilation
iio: exynos-adc: fix irqf_oneshot.cocci warnings
ARM: realview: build realview-dt SMP support only when used
ARM: realview: select apropriate targets
ARM: realview: clean up header files
ARM: realview: make all header files local
ARM: no longer make CPU targets visible separately
ARM: integrator: use explicit core module options
ARM: realview: enable multiplatform
ARM: make default platform work for NOMMU
ARM: debug-ll: move DEBUG_LL_UART_EFM32 to correct Kconfig location
ARM: defconfig: use correct debug_ll settings
ARM: versatile: convert to multi-platform
ARM: versatile: merge mach code into a single file
ARM: versatile: switch to DT only booting and remove legacy code
ARM: versatile: add DT based PCI detection
ARM: pxa: mark ezx structures as __maybe_unused
ARM: pxa: mark raumfeld init functions as __maybe_unused
...
Pull ARM SoC cleanups from Olof Johansson:
"A smallish number of general cleanup commits this release cycle. Some
of these are minor tweaks:
- shmobile change of binding for their GIC (using arm,pl390 now)
- ARCH_RENESAS introduction
- Misc other renesas updates
There's also a couple of treewide commits from Masahiro Yamada
cleaning up const/__initconst for SMP operation structs and a switch
to using "depends on" instead of if-constructs on most of the Kconfig
platform targets"
* tag 'armsoc-cleanup' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
staging: board: armadillo800eva: Use "arm,pl390"
staging: board: kzm9d: Use "arm,pl390"
ARM: shmobile: r8a7778 dtsi: Use "arm,pl390" for GIC
ARM: shmobile: emev2 dtsi: Use "arm,pl390" for GIC
ARM: shmobile: r8a7740 dtsi: Use "arm,pl390" for GIC
ARM: shmobile: r7s72100 dtsi: Use "arm,pl390" for GIC
ARM: use "depends on" for SoC configs instead of "if" after prompt
ARM/clocksource: use automatic DT probing for ux500 PRCMU
ARM: use const and __initconst for smp_operations
ARM: hisi: do not export smp_operations structures
ARM: mvebu: remove unused mach/gpio.h
ARM: shmobile: Remove legacy mach/irqs.h
ARM: shmobile: Introduce ARCH_RENESAS
MAINTAINERS: Remove link to oss.renesas.com which is closed
Pull non-urgent ARM SoC fixes from Olof Johansson:
"As usual, we queue up a few fixes that don't seem urgent enough to go
in through -rc.
- MAINTAINERS updates to add a list for brcmstb and fix a typo
- A handful of fixes for OMAP 81xx, a recently resurrected platform
so these can't be considered real regressions and thus got queued.
- A couple of other small fixes for scoop, sa1100 and davinci"
* tag 'armsoc-fixes-nc' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
ARM: OMAP2+: Fix randconfig build warning for dm814_pllss_data
ARM: sa1100/simpad: Be sure to clamp return value
ARM: scoop: Be sure to clamp return value
ARM: davinci: fix a problematic usage of WARN()
ARM: davinci: only select WT cache if cache is enabled
ARM: OMAP2+: Remove useless check for legacy booting for dm814x
ARM: OMAP2+: Enable GPIO for dm814x
ARM: dts: Fix dm814x pinctrl address and mask
ARM: dts: Fix dm8148 control modules ranges
ARM: OMAP2+: Fix timer entries for dm814x
ARM: dts: Fix some mux and divider clocks to get dm814x-evm booting
ARM: OMAP2+: Add DPPLS clock manager for dm814x
clk: ti: Add few dm814x clock aliases
ARM: dts: Fix dm814x entries for pllss and prcm
MAINTAINERS: gpio-brcmstb: Remove stray '>'
MAINTAINERS: brcmstb: Include Broadcom internal mailing-list
Pull SCSI target updates from Nicholas Bellinger:
"The highlights this round include:
- Introduce configfs support for unlocked configfs_depend_item()
(krzysztof + andrezej)
- Conversion of usb-gadget target driver to new function registration
interface (andrzej + sebastian)
- Enable qla2xxx FC target mode support for Extended Logins (himansu +
giridhar)
- Enable qla2xxx FC target mode support for Exchange Offload (himansu +
giridhar)
- Add qla2xxx FC target mode irq affinity notification + selective
command queuing. (quinn + himanshu)
- Fix iscsi-target deadlock in se_node_acl configfs deletion (sagi +
nab)
- Convert se_node_acl configfs deletion + se_node_acl->queue_depth to
proper se_session->sess_kref + target_get_session() usage. (hch +
sagi + nab)
- Fix long-standing race between se_node_acl->acl_kref get and
get_initiator_node_acl() lookup. (hch + nab)
- Fix target/user block-size handling, and make sure netlink reaches
all network namespaces (sheng + andy)
Note there is an outstanding bug-fix series for remote I_T nexus port
TMR LUN_RESET has been posted and still being tested, and will likely
become post -rc1 material at this point"
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (56 commits)
scsi: qla2xxxx: avoid type mismatch in comparison
target/user: Make sure netlink would reach all network namespaces
target: Obtain se_node_acl->acl_kref during get_initiator_node_acl
target: Convert ACL change queue_depth se_session reference usage
iscsi-target: Fix potential dead-lock during node acl delete
ib_srpt: Convert acl lookup to modern get_initiator_node_acl usage
tcm_fc: Convert acl lookup to modern get_initiator_node_acl usage
tcm_fc: Wait for command completion before freeing a session
target: Fix a memory leak in target_dev_lba_map_store()
target: Support aborting tasks with a 64-bit tag
usb/gadget: Remove set-but-not-used variables
target: Remove an unused variable
target: Fix indentation in target_core_configfs.c
target/user: Allow user to set block size before enabling device
iser-target: Fix non negative ERR_PTR isert_device_get usage
target/fcoe: Add tag support to tcm_fc
qla2xxx: Check for online flag instead of active reset when transmitting responses
qla2xxx: Set all queues to 4k
qla2xxx: Disable ZIO at start time.
qla2xxx: Move atioq to a different lock to reduce lock contention
...
Swap cache pages are freed aggressively if swap is nearly full (>50%
currently), because otherwise we are likely to stop scanning anonymous
when we near the swap limit even if there is plenty of freeable swap cache
pages. We should follow the same trend in case of memory cgroup, which
has its own swap limit.
Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The following patches will add more functions to the memcg section of
include/linux/swap.h. Some of them will need values defined below the
current location of the section. So let's move the section to the end of
the file. No functional changes intended.
Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
mem_cgroup_lruvec_online() takes lruvec, but it only needs memcg. Since
get_scan_count(), which is the only user of this function, now possesses
pointer to memcg, let's pass memcg directly to mem_cgroup_online() instead
of picking it out of lruvec and rename the function accordingly.
Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patchset introduces swap accounting to cgroup2.
This patch (of 7):
In the legacy hierarchy we charge memsw, which is dubious, because:
- memsw.limit must be >= memory.limit, so it is impossible to limit
swap usage less than memory usage. Taking into account the fact that
the primary limiting mechanism in the unified hierarchy is
memory.high while memory.limit is either left unset or set to a very
large value, moving memsw.limit knob to the unified hierarchy would
effectively make it impossible to limit swap usage according to the
user preference.
- memsw.usage != memory.usage + swap.usage, because a page occupying
both swap entry and a swap cache page is charged only once to memsw
counter. As a result, it is possible to effectively eat up to
memory.limit of memory pages *and* memsw.limit of swap entries, which
looks unexpected.
That said, we should provide a different swap limiting mechanism for
cgroup2.
This patch adds mem_cgroup->swap counter, which charges the actual number
of swap entries used by a cgroup. It is only charged in the unified
hierarchy, while the legacy hierarchy memsw logic is left intact.
The swap usage can be monitored using new memory.swap.current file and
limited using memory.swap.max.
Note, to charge swap resource properly in the unified hierarchy, we have
to make swap_entry_free uncharge swap only when ->usage reaches zero, not
just ->count, i.e. when all references to a swap entry, including the one
taken by swap cache, are gone. This is necessary, because otherwise
swap-in could result in uncharging swap even if the page is still in swap
cache and hence still occupies a swap entry. At the same time, this
shouldn't break memsw counter logic, where a page is never charged twice
for using both memory and swap, because in case of legacy hierarchy we
uncharge swap on commit (see mem_cgroup_commit_charge).
Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The creation and teardown of struct mem_cgroup is fairly messy and
that has attracted mistakes and subtle bugs before.
The main cause for this is that there is no clear model about what
needs to happen when, and that attracts more chaos. So create one:
1. mem_cgroup_alloc() should allocate struct mem_cgroup and its
auxiliary members and initialize work items, locks etc. so that the
object it returns is fully initialized and in a neutral state.
2. mem_cgroup_css_alloc() will use mem_cgroup_alloc() to obtain a new
memcg object and configure it and the system according to the role
of the new memory-controlled cgroup in the hierarchy.
3. mem_cgroup_css_online() is no longer needed to synchronize with
iterators, but it verifies css->id which isn't available earlier.
4. mem_cgroup_css_offline() implements stuff that needs to happen upon
the user-visible destruction of a cgroup, which includes stopping
all user interfacing as well as releasing certain structures when
continued memory consumption would be unexpected at that point.
5. mem_cgroup_css_free() prepares the system and the memcg object for
the object's disappearance, neutralizes its state, and then gives
it back to mem_cgroup_free().
6. mem_cgroup_free() releases struct mem_cgroup and auxiliary memory.
[arnd@arndb.de: fix SLOB build regression]
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There are no more external users of struct cg_proto, flatten the
structure into struct mem_cgroup.
Since using those struct members doesn't stand out as much anymore,
add cgroup2 static branches to make it clearer which code is legacy.
Suggested-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
What CONFIG_INET and CONFIG_LEGACY_KMEM guard inside the memory
controller code is insignificant, having these conditionals is not
worth the complication and fragility that comes with them.
[akpm@linux-foundation.org: rework mem_cgroup_css_free() statement ordering]
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Acked-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Let the user know that CONFIG_MEMCG_KMEM does not apply to the cgroup2
interface. This also makes legacy-only code sections stand out better.
[arnd@arndb.de: mm: memcontrol: only manage socket pressure for CONFIG_INET]
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Tejun Heo <tj@kernel.org>
Acked-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
On any given memcg, the kmem accounting feature has three separate
states: not initialized, structures allocated, and actively accounting
slab memory. These are represented through a combination of the
kmem_acct_activated and kmem_acct_active flags, which is confusing.
Convert to a kmem_state enum with the states NONE, ALLOCATED, and
ONLINE. Then rename the functions to modify the state accordingly.
This follows the nomenclature of css object states more closely.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Tejun Heo <tj@kernel.org>
Acked-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Move the generic implementation to <linux/dma-mapping.h> now that all
architectures support it and remove the HAVE_DMA_ATTR Kconfig symbol now
that everyone supports them.
[valentinrothberg@gmail.com: remove leftovers in Kconfig]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Aurelien Jacquiot <a-jacquiot@ti.com>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Cc: Helge Deller <deller@gmx.de>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
Cc: Ley Foon Tan <lftan@altera.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Steven Miao <realmz6@gmail.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Valentin Rothberg <valentinrothberg@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The current lz4 compress buffer is 16kb on 32-bits, 32kb on 64-bits
system. But, lz4 needs only 16kb on both. On 64-bits, this causes
wasted cpu cycles for additional memset during every compression.
In case of lz4hc, the current buffer size is (256kb + 8) on 32-bits,
(512kb + 16) on 64-bits. But, lz4hc needs only (256kb + 2 * pointer) on
both.
This patch fixes these wrong compress buffer sizes for 64-bits.
Signed-off-by: Bongkyu Kim <bongkyu.kim@lge.com>
Cc: Chanho Min <chanho.min@lge.com>
Cc: Yann Collet <yann.collet.73@gmail.com>
Cc: Kyungsik Lee <kyungsik.lee@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
With commit d72da4a4d9 ("rbtree: Make lockless searches non-fatal") our
rbtrees provide weak guarantees that allows us to do lockless (and very
speculative) reads of the tree. Such readers cannot see partial stores
on nodes, ie left/right as well as root. As such, similar to the
WRITE_ONCE semantics when doing rotations, use READ_ONCE when checking
the root node in RB_EMPTY_ROOT.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Michel Lespinasse <walken@google.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Move the stuff currently only used by the kexec file code within
CONFIG_KEXEC_FILE (and CONFIG_KEXEC_VERIFY_SIG).
Also move internal "struct kexec_sha_region" and "struct kexec_buf" into
"kexec_internal.h".
Signed-off-by: Xunlei Pang <xlpang@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Young <dyoung@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
By checking the effective credentials instead of the real UID / permitted
capabilities, ensure that the calling process actually intended to use its
credentials.
To ensure that all ptrace checks use the correct caller credentials (e.g.
in case out-of-tree code or newly added code omits the PTRACE_MODE_*CREDS
flag), use two new flags and require one of them to be set.
The problem was that when a privileged task had temporarily dropped its
privileges, e.g. by calling setreuid(0, user_uid), with the intent to
perform following syscalls with the credentials of a user, it still passed
ptrace access checks that the user would not be able to pass.
While an attacker should not be able to convince the privileged task to
perform a ptrace() syscall, this is a problem because the ptrace access
check is reused for things in procfs.
In particular, the following somewhat interesting procfs entries only rely
on ptrace access checks:
/proc/$pid/stat - uses the check for determining whether pointers
should be visible, useful for bypassing ASLR
/proc/$pid/maps - also useful for bypassing ASLR
/proc/$pid/cwd - useful for gaining access to restricted
directories that contain files with lax permissions, e.g. in
this scenario:
lrwxrwxrwx root root /proc/13020/cwd -> /root/foobar
drwx------ root root /root
drwxr-xr-x root root /root/foobar
-rw-r--r-- root root /root/foobar/secret
Therefore, on a system where a root-owned mode 6755 binary changes its
effective credentials as described and then dumps a user-specified file,
this could be used by an attacker to reveal the memory layout of root's
processes or reveal the contents of files he is not allowed to access
(through /proc/$pid/cwd).
[akpm@linux-foundation.org: fix warning]
Signed-off-by: Jann Horn <jann@thejh.net>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Casey Schaufler <casey@schaufler-ca.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Morris <james.l.morris@oracle.com>
Cc: "Serge E. Hallyn" <serge.hallyn@ubuntu.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Willy Tarreau <w@1wt.eu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>