Most of the PM code needed for am335x and am437x can be moved into a
module under drivers but some core code must remain in mach-omap2 at the
moment. This includes some internal clockdomain APIs and low-level ARM
APIs which are also not exported for use by modules.
Implement a few functions that handle these low-level platform
operations can be passed to the pm33xx module through the use of
platform data.
In addition to this, to be able to share data structures between C and
the sleep33xx and sleep43xx assembly code, we can automatically generate
all of the C struct member offsets and sizes as macros by processing
pm-asm-offsets.c into assembly code and then extracting the relevant
data as is done for the generated platform asm-offsets.h files.
Finally, add amx3_common_pm_init to create a dummy platform_device for
pm33xx so that our soon to be introduced pm33xx module can probe on
am335x and am437x platforms to enable basic suspend to mem and standby
support.
Signed-off-by: Dave Gerlach <d-gerlach@ti.com>
Acked-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Currently if notification event is lost due to event allocation failing
we ENOMEM, we just silently continue (except for fanotify permission
events where we deny the access). This is undesirable as userspace has
no way of knowing whether the notifications it got are complete or not.
Treat lost events due to ENOMEM the same way as lost events due to queue
overflow so that userspace knows something bad happened and it likely
needs to rescan the filesystem.
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Reading the function declarations for the console callbacks lacks any
hints as to what the arguments are. Instead of going and digging around in
various implementations that may each only have a subset of the callbacks,
name all the arguments in the declaration. This has no functional change.
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Now that ti-sysc can manage child devices, we must also be backwards
compatible with the current omap_device code. With omap_device, we
assume that the child device manages the interconnect target module
directly.
The drivers needing special handling are the ones that still set
pm_runtime_irq_safe(). In the long run we want to update those drivers
as otherwise they will cause problems with genpd as a permanent PM
runtime usage count is set on the parent device.
We can handle omap_device these devices by improving the ti-sysc quirk
handling to detect the devices needing special handling based on
register map and revision register if usable. We also need to implement
dev_pm_domain for these child devices just like omap_device does.
Signed-off-by: Tony Lindgren <tony@atomide.com>
We want to pass the device tree configuration for interconnect target
modules from ti-sysc driver to the existing platform hwmod code.
This allows us to first validate the dts data against the existing
platform data before we start dropping the platform data in favor of
device tree data.
To do this, let's add platform data callbacks for PM runtime functions
to call for the interconnect target modules if platform data is
available.
Note that as ti-sysc driver can rebind, omap_auxdata_lookup and related
functions can no longer be __init.
Signed-off-by: Tony Lindgren <tony@atomide.com>
Gerd reports that ->i_mode may contain other bits besides S_IFCHR. Use
S_ISCHR() instead. Otherwise, get_user_pages_longterm() may fail on
device-dax instances when those are meant to be explicitly allowed.
Fixes: 2bb6d28370 ("mm: introduce get_user_pages_longterm")
Cc: <stable@vger.kernel.org>
Reported-by: Gerd Rausch <gerd.rausch@oracle.com>
Acked-by: Jane Chu <jane.chu@oracle.com>
Reported-by: Haozhong Zhang <haozhong.zhang@intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Pull x86 fixes from Thomas Gleixner:
"Yet another pile of melted spectrum related changes:
- sanitize the array_index_nospec protection mechanism: Remove the
overengineered array_index_nospec_mask_check() magic and allow
const-qualified types as index to avoid temporary storage in a
non-const local variable.
- make the microcode loader more robust by properly propagating error
codes. Provide information about new feature bits after micro code
was updated so administrators can act upon.
- optimizations of the entry ASM code which reduce code footprint and
make the code simpler and faster.
- fix the {pmd,pud}_{set,clear}_flags() implementations to work
properly on paravirt kernels by removing the address translation
operations.
- revert the harmful vmexit_fill_RSB() optimization
- use IBRS around firmware calls
- teach objtool about retpolines and add annotations for indirect
jumps and calls.
- explicitly disable jumplabel patching in __init code and handle
patching failures properly instead of silently ignoring them.
- remove indirect paravirt calls for writing the speculation control
MSR as these calls are obviously proving the same attack vector
which is tried to be mitigated.
- a few small fixes which address build issues with recent compiler
and assembler versions"
* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (38 commits)
KVM/VMX: Optimize vmx_vcpu_run() and svm_vcpu_run() by marking the RDMSR path as unlikely()
KVM/x86: Remove indirect MSR op calls from SPEC_CTRL
objtool, retpolines: Integrate objtool with retpoline support more closely
x86/entry/64: Simplify ENCODE_FRAME_POINTER
extable: Make init_kernel_text() global
jump_label: Warn on failed jump_label patching attempt
jump_label: Explicitly disable jump labels in __init code
x86/entry/64: Open-code switch_to_thread_stack()
x86/entry/64: Move ASM_CLAC to interrupt_entry()
x86/entry/64: Remove 'interrupt' macro
x86/entry/64: Move the switch_to_thread_stack() call to interrupt_entry()
x86/entry/64: Move ENTER_IRQ_STACK from interrupt macro to interrupt_entry
x86/entry/64: Move PUSH_AND_CLEAR_REGS from interrupt macro to helper function
x86/speculation: Move firmware_restrict_branch_speculation_*() from C to CPP
objtool: Add module specific retpoline rules
objtool: Add retpoline validation
objtool: Use existing global variables for options
x86/mm/sme, objtool: Annotate indirect call in sme_encrypt_execute()
x86/boot, objtool: Annotate indirect jump in secondary_startup_64()
x86/paravirt, objtool: Annotate indirect calls
...
Pull KVM fixes from Paolo Bonzini:
"s390:
- optimization for the exitless interrupt support that was merged in 4.16-rc1
- improve the branch prediction blocking for nested KVM
- replace some jump tables with switch statements to improve expoline performance
- fixes for multiple epoch facility
ARM:
- fix the interaction of userspace irqchip VMs with in-kernel irqchip VMs
- make sure we can build 32-bit KVM/ARM with gcc-8.
x86:
- fixes for AMD SEV
- fixes for Intel nested VMX, emulated UMIP and a dump_stack() on VM startup
- fixes for async page fault migration
- small optimization to PV TLB flush (new in 4.16-rc1)
- syzkaller fixes
Generic:
- compiler warning fixes
- syzkaller fixes
- more improvements to the kvm_stat tool
Two more small Spectre fixes are going to reach you via Ingo"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (40 commits)
KVM: SVM: Fix SEV LAUNCH_SECRET command
KVM: SVM: install RSM intercept
KVM: SVM: no need to call access_ok() in LAUNCH_MEASURE command
include: psp-sev: Capitalize invalid length enum
crypto: ccp: Fix sparse, use plain integer as NULL pointer
KVM: X86: Avoid traversing all the cpus for pv tlb flush when steal time is disabled
x86/kvm: Make parse_no_xxx __init for kvm
KVM: x86: fix backward migration with async_PF
kvm: fix warning for non-x86 builds
kvm: fix warning for CONFIG_HAVE_KVM_EVENTFD builds
tools/kvm_stat: print 'Total' line for multiple events only
tools/kvm_stat: group child events indented after parent
tools/kvm_stat: separate drilldown and fields filtering
tools/kvm_stat: eliminate extra guest/pid selection dialog
tools/kvm_stat: mark private methods as such
tools/kvm_stat: fix debugfs handling
tools/kvm_stat: print error on invalid regex
tools/kvm_stat: fix crash when filtering out all non-child trace events
tools/kvm_stat: avoid 'is' for equality checks
tools/kvm_stat: use a more pythonic way to iterate over dictionaries
...
When two blkdev_open() calls for a partition race with device removal
and recreation, we can hit BUG_ON(!bd_may_claim(bdev, whole, holder)) in
blkdev_open(). The race can happen as follows:
CPU0 CPU1 CPU2
del_gendisk()
bdev_unhash_inode(part1);
blkdev_open(part1, O_EXCL) blkdev_open(part1, O_EXCL)
bdev = bd_acquire() bdev = bd_acquire()
blkdev_get(bdev)
bd_start_claiming(bdev)
- finds old inode 'whole'
bd_prepare_to_claim() -> 0
bdev_unhash_inode(whole);
<device removed>
<new device under same
number created>
blkdev_get(bdev);
bd_start_claiming(bdev)
- finds new inode 'whole'
bd_prepare_to_claim()
- this also succeeds as we have
different 'whole' here...
- bad things happen now as we
have two exclusive openers of
the same bdev
The problem here is that block device opens can see various intermediate
states while gendisk is shutting down and then being recreated.
We fix the problem by introducing new lookup_sem in gendisk that
synchronizes gendisk deletion with get_gendisk() and furthermore by
making sure that get_gendisk() does not return gendisk that is being (or
has been) deleted. This makes sure that once we ever manage to look up
newly created bdev inode, we are also guaranteed that following
get_gendisk() will either return failure (and we fail open) or it
returns gendisk for the new device and following bdget_disk() will
return new bdev inode (i.e., blkdev_open() follows the path as if it is
completely run after new device is created).
Reported-and-analyzed-by: Hou Tao <houtao1@huawei.com>
Tested-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Add a proper counterpart to get_disk_and_module() -
put_disk_and_module(). Currently it is opencoded in several places.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Rename get_disk() to get_disk_and_module() to make sure what the
function does. It's not a great name but at least it is now clear that
put_disk() is not it's counterpart.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Daniel Borkmann says:
====================
pull-request: bpf-next 2018-02-26
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) Various improvements for BPF kselftests: i) skip unprivileged tests
when kernel.unprivileged_bpf_disabled sysctl knob is set, ii) count
the number of skipped tests from unprivileged, iii) when a test case
had an unexpected error then print the actual but also the unexpected
one for better comparison, from Joe.
2) Add a sample program for collecting CPU state statistics with regards
to how long the CPU resides in cstate and pstate levels. Based on
cpu_idle and cpu_frequency trace points, from Leo.
3) Various x64 BPF JIT optimizations to further shrink the generated
image size in order to make it more icache friendly. When tested on
the Cilium generated programs, image size reduced by approx 4-5% in
best case mainly due to how LLVM emits unsigned 32 bit constants,
from Daniel.
4) Improvements and fixes on the BPF sockmap sample programs: i) fix
the sockmap's Makefile to include nlattr.o for libbpf, ii) detach
the sock ops programs from the cgroup before exit, from Prashant.
5) Avoid including xdp.h in filter.h by just forward declaring the
struct xdp_rxq_info in filter.h, from Jesper.
6) Fix the BPF kselftests Makefile for cgroup_helpers.c by only declaring
it a dependency for test_dev_cgroup.c but not every other test case
where it is not needed, from Jesper.
7) Adjust rlimit RLIMIT_MEMLOCK for test_tcpbpf_user selftest since the
default is insufficient for creating the 'global_map' used in the
corresponding BPF program, from Yonghong.
8) Likewise, for the xdp_redirect sample, Tushar ran into the same when
invoking xdp_redirect and xdp_monitor at the same time, therefore
in order to have the sample generically work bump the limit here,
too. Fix from Tushar.
9) Avoid an unnecessary NULL check in BPF_CGROUP_RUN_PROG_INET_SOCK()
since sk is always guaranteed to be non-NULL, from Yafang.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
We want the IIO/Staging fixes in here, and to resolve a merge problem
with the move of the fsl-mc code.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
regmap_init_mmio_clk allows to specify a clock that needs to be enabled
while accessing the registers.
However, that clock is retrieved through its clock ID, which means it will
lookup that clock based on the current device that registers the regmap,
and, in the DT case, will only look in that device OF node.
This might be problematic if the clock to enable is stored in another node.
Let's add a function that allows to attach a clock that has already been
retrieved to a regmap in order to fix this.
Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Pull locking fixes from Thomas Gleixner:
"Three patches to fix memory ordering issues on ALPHA and a comment to
clarify the usage scope of a mutex internal function"
* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/xchg/alpha: Fix xchg() and cmpxchg() memory ordering bugs
locking/xchg/alpha: Clean up barrier usage by using smp_mb() in place of __ASM__MB
locking/xchg/alpha: Add unconditional memory barrier to cmpxchg()
locking/mutex: Add comment to __mutex_owner() to deter usage
<linux/mutex.h> does not use nor need <linux/linkage.h>, so drop that
one header file from mutex.h.
<linux/mutex.h> is currently #included in around 1250 C source files
(oops, I didn't count other header files that #include it), making it
the 27th most-used header file.
Build tested on i386 and x86_64 * (allnoconfig, tiny.config, defconfig,
allyesconfig, and allmodconfig) and x64_64 allmodconfig + SMP=disabled.
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/582b3892-4e4c-06b2-a368-5c2d439de7fc@infradead.org
<linux/irq.h> does not use nor need several of its #included files,
so drop those header files from irq.h.
<linux/irq.h> is currently #included in around 1135 C source files
(oops, I didn't count other header files that #include it), making it
the 29th most-used header file.
Build tested on i386 and x86_64 * (allnoconfig, tiny.config, defconfig,
allyesconfig, and allmodconfig) and x64_64 allmodconfig + SMP=disabled.
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/02745e91-c117-74b5-d043-dceb3d4bb4e0@infradead.org
Fix the following sparse warning by moving the prototype
of kvm_arch_mmu_notifier_invalidate_range() to linux/kvm_host.h .
CHECK arch/s390/kvm/../../../virt/kvm/kvm_main.c
arch/s390/kvm/../../../virt/kvm/kvm_main.c:138:13: warning: symbol 'kvm_arch_mmu_notifier_invalidate_range' was not declared. Should it be static?
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Move the kvm_arch_irq_routing_update() prototype outside of
ifdef CONFIG_HAVE_KVM_EVENTFD guards to fix the following sparse warning:
arch/s390/kvm/../../../virt/kvm/irqchip.c:171:28: warning: symbol 'kvm_arch_irq_routing_update' was not declared. Should it be static?
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Create a new representor type: REP_IB. which will be initialized by an IB
device that is used as a logical representor of a eswitch vport (VF or
uplink) just like we have a net device today in switchdev mode.
Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
In preparation for IB representors, move representors structs to a global
scope, also expose functions needed for registration, unregistration,
eswitch mode and creating a flow rule to direct traffic from SQs to the
right VF.
Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Pull drm fixes from Dave Airlie:
"A bunch of fixes for rc3:
Exynos:
- fixes for using monotonic timestamps
- register definitions
- removal of unused file
ipu-v3L
- minor changes
- make some register arrays const+static
- fix some leaks
meson:
- fix for vsync
atomic:
- fix for memory leak
EDID parser:
- add quirks for some more non-desktop devices
- 6-bit panel fix.
drm_mm:
- fix a bug in the core drm mm hole handling
cirrus:
- fix lut loading regression
Lastly there is a deadlock fix around runtime suspend for secondary
GPUs.
There was a deadlock between one thread trying to wait for a workqueue
job to finish in the runtime suspend path, and the workqueue job it
was waiting for in turn waiting for a runtime_get_sync to return.
The fixes avoids it by not doing the runtime sync in the workqueue as
then we always wait for all those tasks to complete before we runtime
suspend"
* tag 'drm-fixes-for-v4.16-rc3' of git://people.freedesktop.org/~airlied/linux: (25 commits)
drm/tve200: fix kernel-doc documentation comment include
drm/edid: quirk Sony PlayStation VR headset as non-desktop
drm/edid: quirk Windows Mixed Reality headsets as non-desktop
drm/edid: quirk Oculus Rift headsets as non-desktop
drm/meson: fix vsync buffer update
drm: Handle unexpected holes in color-eviction
drm: exynos: Use proper macro definition for HDMI_I2S_PIN_SEL_1
drm/exynos: remove exynos_drm_rotator.h
drm/exynos: g2d: Delete an error message for a failed memory allocation in two functions
drm/exynos: fix comparison to bitshift when dealing with a mask
drm/exynos: g2d: use monotonic timestamps
drm/edid: Add 6 bpc quirk for CPT panel in Asus UX303LA
gpu: ipu-csi: add 10/12-bit grayscale support to mbus_code_to_bus_cfg
gpu: ipu-cpmem: add 16-bit grayscale support to ipu_cpmem_set_image
gpu: ipu-v3: prg: fix device node leak in ipu_prg_lookup_by_phandle
gpu: ipu-v3: pre: fix device node leak in ipu_pre_lookup_by_phandle
drm/amdgpu: Fix deadlock on runtime suspend
drm/radeon: Fix deadlock on runtime suspend
drm/nouveau: Fix deadlock on runtime suspend
drm: Allow determining if current task is output poll worker
...
sk is already allocated in inet_create/inet6_create, hence when
BPF_CGROUP_RUN_PROG_INET_SOCK is executed sk will never be NULL.
The logic is as bellow,
sk = sk_alloc();
if (!sk)
goto out;
BPF_CGROUP_RUN_PROG_INET_SOCK(sk);
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Now that arch/metag/ has been removed, remove the two metag irqchip
drivers. They are of no value without the architecture code.
- irq-metag: Meta internal (HWSTATMETA) interrupt code.
- irq-metag-ext: Meta External interrupt code.
Signed-off-by: James Hogan <jhogan@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jason Cooper <jason@lakedaemon.net>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: linux-metag@vger.kernel.org
The Ampere Computing PCIe root port does not support ACS at this point.
However, the hardware provides isolation and source validation through the
SMMU. The stream ID generated by the PCIe ports contain both the
bus/device/function number as well as the port ID in its 3 most significant
bits. Turn on ACS but disable all the peer-to-peer features.
APM is being rebranded to Ampere. The Vendor and Device IDs change, but
the functionality stays the same.
Signed-off-by: Feng Kan <fkan@apm.com>
Signed-off-by: Bjorn Helgaas <helgaas@kernel.org>
Move pcieport_if.h from include/linux to drivers/pci/pcie/pcieport_if.h
because the interfaces there are only used by the PCI core.
Replace all uses of #include<linux/pcieport_if.h> with relative paths to
the new file location, e.g., #include "../pcieport_if.h"
Signed-off-by: Frederick Lawler <fred@fredlawl.com>
Signed-off-by: Bjorn Helgaas <helgaas@kernel.org>
There's no reason pci_uevent_ers() needs to be inline in pci.h, so move it
out to a C file.
Given it's used by AER the obvious location would be somewhere in
drivers/pci/pcie/aer, but because it's also used by powerpc EEH code
unfortunately that doesn't work in the case where EEH is enabled but
PCIEPORTBUS is not.
So for now put it in pci-driver.c, next to pci_uevent(), with an
appropriate #ifdef so it's not built if AER and EEH are both disabled.
While we're moving it also fix up the kernel doc comment for @pdev to be
accurate.
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Bjorn Helgaas <helgaas@kernel.org>
Reviewed-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
We are still initializing smartreflex with platform data using
omap_device_build(). We can instead pass the platform data in
with auxdata in pdata-quirks.c and make the driver use that
in later patches.
Note that we cannot enable the auxdata use yet, this is done
in the last patch of the series.
Signed-off-by: Tony Lindgren <tony@atomide.com>
Johannes Berg says:
====================
Various updates across wireless.
One thing to note: I've included a new ethertype
that wireless uses (ETH_P_PREAUTH) in if_ether.h.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Saeed Mahameed says:
====================
mlx5-updates-2018-02-21
This series includes shared code updates for mlx5 core driver for both
netdev and rdma subsystems.
By Saeed,
First six patches of the series are meant to address a performance issue
and should provide a performance boost for multi core IRQ interrupt hungry
workloads. The issue is fixed in the first patch, all other patches are
meant to refactor the code in light of this fix.
The problem it comes to fix, is a shared spinlock accessed across all HCA
IRQs which protects the CQ database. To solve this we simply move the CQ
database and its spinlock to be per EQ (IRQ), thus per core.
By Yonatan,
Fragmented completion queue (CQ) for RDMA,
core driver implementation to create fragmented CQ buffers rather than
one large contiguous memory buffer, the implementation scheme already
exist and used by the netdev CQs, the patch shares that code with the
rdma CQ creation flow and makes use of the new API in mlx5_ib driver.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Merge misc fixes from Andrew Morton:
"16 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm: don't defer struct page initialization for Xen pv guests
lib/Kconfig.debug: enable RUNTIME_TESTING_MENU
vmalloc: fix __GFP_HIGHMEM usage for vmalloc_32 on 32b systems
selftests/memfd: add run_fuse_test.sh to TEST_FILES
bug.h: work around GCC PR82365 in BUG()
mm/swap.c: make functions and their kernel-doc agree (again)
mm/zpool.c: zpool_evictable: fix mismatch in parameter name and kernel-doc
ida: do zeroing in ida_pre_get()
mm, swap, frontswap: fix THP swap if frontswap enabled
certs/blacklist_nohashes.c: fix const confusion in certs blacklist
kernel/relay.c: limit kmalloc size to KMALLOC_MAX_SIZE
mm, mlock, vmscan: no more skipping pagevecs
mm: memcontrol: fix NR_WRITEBACK leak in memcg and system stats
Kbuild: always define endianess in kconfig.h
include/linux/sched/mm.h: re-inline mmdrop()
tools: fix cross-compile var clobbering
Each read from a file in efivarfs results in two calls to EFI
(one to get the file size, another to get the actual data).
On X86 these EFI calls result in broadcast system management
interrupts (SMI) which affect performance of the whole system.
A malicious user can loop performing reads from efivarfs bringing
the system to its knees.
Linus suggested per-user rate limit to solve this.
So we add a ratelimit structure to "user_struct" and initialize
it for the root user for no limit. When allocating user_struct for
other users we set the limit to 100 per second. This could be used
for other places that want to limit the rate of some detrimental
user action.
In efivarfs if the limit is exceeded when reading, we take an
interruptible nap for 50ms and check the rate limit again.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The header files for some structures could get included in such a way
that struct attributes (specifically __randomize_layout from path.h) would
be parsed as variable names instead of attributes. This could lead to
some instances of a structure being unrandomized, causing nasty GPFs, etc.
This patch makes sure the compiler_types.h header is included in
kconfig.h so that we've always got types and struct attributes defined,
since kconfig.h is included from the compiler command line.
Reported-by: Patrick McLean <chutzpah@gentoo.org>
Root-caused-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Tested-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name>
Fixes: 3859a271a0 ("randstruct: Mark various structs for randomization")
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Move the source files out of staging into their final locations:
-mc.h include file in drivers/staging/fsl-mc/include go to include/linux/fsl
-source files in drivers/staging/fsl-mc/bus go to drivers/bus/fsl-mc
-overview.rst, providing an overview of DPAA2, goes to
Documentation/networking/dpaa2/overview.rst
Update or delete other remaining staging files -- Makefile, Kconfig, TODO.
Update dpaa2_eth and dpio staging drivers.
Add integration bits for the documentation build system.
Signed-off-by: Stuart Yoder <stuyoder@gmail.com>
[rebased, add dpaa2_eth and dpio #include updates]
Signed-off-by: Laurentiu Tudor <laurentiu.tudor@nxp.com>
[rebased, split irqchip to separate patch]
Signed-off-by: Bogdan Purcareata <bogdan.purcareata@nxp.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jason Cooper <jason@lakedaemon.net>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Looking at functions with large stack frames across all architectures
led me discovering that BUG() suffers from the same problem as
fortify_panic(), which I've added a workaround for already.
In short, variables that go out of scope by calling a noreturn function
or __builtin_unreachable() keep using stack space in functions
afterwards.
A workaround that was identified is to insert an empty assembler
statement just before calling the function that doesn't return. I'm
adding a macro "barrier_before_unreachable()" to document this, and
insert calls to that in all instances of BUG() that currently suffer
from this problem.
The files that saw the largest change from this had these frame sizes
before, and much less with my patch:
fs/ext4/inode.c:82:1: warning: the frame size of 1672 bytes is larger than 800 bytes [-Wframe-larger-than=]
fs/ext4/namei.c:434:1: warning: the frame size of 904 bytes is larger than 800 bytes [-Wframe-larger-than=]
fs/ext4/super.c:2279:1: warning: the frame size of 1160 bytes is larger than 800 bytes [-Wframe-larger-than=]
fs/ext4/xattr.c:146:1: warning: the frame size of 1168 bytes is larger than 800 bytes [-Wframe-larger-than=]
fs/f2fs/inode.c:152:1: warning: the frame size of 1424 bytes is larger than 800 bytes [-Wframe-larger-than=]
net/netfilter/ipvs/ip_vs_core.c:1195:1: warning: the frame size of 1068 bytes is larger than 800 bytes [-Wframe-larger-than=]
net/netfilter/ipvs/ip_vs_core.c:395:1: warning: the frame size of 1084 bytes is larger than 800 bytes [-Wframe-larger-than=]
net/netfilter/ipvs/ip_vs_ftp.c:298:1: warning: the frame size of 928 bytes is larger than 800 bytes [-Wframe-larger-than=]
net/netfilter/ipvs/ip_vs_ftp.c:418:1: warning: the frame size of 908 bytes is larger than 800 bytes [-Wframe-larger-than=]
net/netfilter/ipvs/ip_vs_lblcr.c:718:1: warning: the frame size of 960 bytes is larger than 800 bytes [-Wframe-larger-than=]
drivers/net/xen-netback/netback.c:1500:1: warning: the frame size of 1088 bytes is larger than 800 bytes [-Wframe-larger-than=]
In case of ARC and CRIS, it turns out that the BUG() implementation
actually does return (or at least the compiler thinks it does),
resulting in lots of warnings about uninitialized variable use and
leaving noreturn functions, such as:
block/cfq-iosched.c: In function 'cfq_async_queue_prio':
block/cfq-iosched.c:3804:1: error: control reaches end of non-void function [-Werror=return-type]
include/linux/dmaengine.h: In function 'dma_maxpq':
include/linux/dmaengine.h:1123:1: error: control reaches end of non-void function [-Werror=return-type]
This makes them call __builtin_trap() instead, which should normally
dump the stack and kill the current process, like some of the other
architectures already do.
I tried adding barrier_before_unreachable() to panic() and
fortify_panic() as well, but that had very little effect, so I'm not
submitting that patch.
Vineet said:
: For ARC, it is double win.
:
: 1. Fixes 3 -Wreturn-type warnings
:
: | ../net/core/ethtool.c:311:1: warning: control reaches end of non-void function
: [-Wreturn-type]
: | ../kernel/sched/core.c:3246:1: warning: control reaches end of non-void function
: [-Wreturn-type]
: | ../include/linux/sunrpc/svc_xprt.h:180:1: warning: control reaches end of
: non-void function [-Wreturn-type]
:
: 2. bloat-o-meter reports code size improvements as gcc elides the
: generated code for stack return.
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82365
Link: http://lkml.kernel.org/r/20171219114112.939391-1-arnd@arndb.de
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Vineet Gupta <vgupta@synopsys.com> [arch/arc]
Tested-by: Vineet Gupta <vgupta@synopsys.com> [arch/arc]
Cc: Mikael Starvik <starvik@axis.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Christopher Li <sparse@chrisli.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When a thread mlocks an address space backed either by file pages which
are currently not present in memory or swapped out anon pages (not in
swapcache), a new page is allocated and added to the local pagevec
(lru_add_pvec), I/O is triggered and the thread then sleeps on the page.
On I/O completion, the thread can wake on a different CPU, the mlock
syscall will then sets the PageMlocked() bit of the page but will not be
able to put that page in unevictable LRU as the page is on the pagevec
of a different CPU. Even on drain, that page will go to evictable LRU
because the PageMlocked() bit is not checked on pagevec drain.
The page will eventually go to right LRU on reclaim but the LRU stats
will remain skewed for a long time.
This patch puts all the pages, even unevictable, to the pagevecs and on
the drain, the pages will be added on their LRUs correctly by checking
their evictability. This resolves the mlocked pages on pagevec of other
CPUs issue because when those pagevecs will be drained, the mlocked file
pages will go to unevictable LRU. Also this makes the race with munlock
easier to resolve because the pagevec drains happen in LRU lock.
However there is still one place which makes a page evictable and does
PageLRU check on that page without LRU lock and needs special attention.
TestClearPageMlocked() and isolate_lru_page() in clear_page_mlock().
#0: __pagevec_lru_add_fn #1: clear_page_mlock
SetPageLRU() if (!TestClearPageMlocked())
return
smp_mb() // <--required
// inside does PageLRU
if (!PageMlocked()) if (isolate_lru_page())
move to evictable LRU putback_lru_page()
else
move to unevictable LRU
In '#1', TestClearPageMlocked() provides full memory barrier semantics
and thus the PageLRU check (inside isolate_lru_page) can not be
reordered before it.
In '#0', without explicit memory barrier, the PageMlocked() check can be
reordered before SetPageLRU(). If that happens, '#0' can put a page in
unevictable LRU and '#1' might have just cleared the Mlocked bit of that
page but fails to isolate as PageLRU fails as '#0' still hasn't set
PageLRU bit of that page. That page will be stranded on the unevictable
LRU.
There is one (good) side effect though. Without this patch, the pages
allocated for System V shared memory segment are added to evictable LRUs
even after shmctl(SHM_LOCK) on that segment. This patch will correctly
put such pages to unevictable LRU.
Link: http://lkml.kernel.org/r/20171121211241.18877-1-shakeelb@google.com
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Greg Thelen <gthelen@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Shaohua Li <shli@fb.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
After commit a983b5ebee ("mm: memcontrol: fix excessive complexity in
memory.stat reporting"), we observed slowly upward creeping NR_WRITEBACK
counts over the course of several days, both the per-memcg stats as well
as the system counter in e.g. /proc/meminfo.
The conversion from full per-cpu stat counts to per-cpu cached atomic
stat counts introduced an irq-unsafe RMW operation into the updates.
Most stat updates come from process context, but one notable exception
is the NR_WRITEBACK counter. While writebacks are issued from process
context, they are retired from (soft)irq context.
When writeback completions interrupt the RMW counter updates of new
writebacks being issued, the decs from the completions are lost.
Since the global updates are routed through the joint lruvec API, both
the memcg counters as well as the system counters are affected.
This patch makes the joint stat and event API irq safe.
Link: http://lkml.kernel.org/r/20180203082353.17284-1-hannes@cmpxchg.org
Fixes: a983b5ebee ("mm: memcontrol: fix excessive complexity in memory.stat reporting")
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Debugged-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Rik van Riel <riel@surriel.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Build testing with LTO found a couple of files that get compiled
differently depending on whether asm/byteorder.h gets included early
enough or not. In particular, include/asm-generic/qrwlock_types.h is
affected by this, but there are probably others as well.
The symptom is a series of LTO link time warnings, including these:
net/netlabel/netlabel_unlabeled.h:223: error: type of 'netlbl_unlhsh_add' does not match original declaration [-Werror=lto-type-mismatch]
int netlbl_unlhsh_add(struct net *net,
net/netlabel/netlabel_unlabeled.c:377: note: 'netlbl_unlhsh_add' was previously declared here
include/net/ipv6.h:360: error: type of 'ipv6_renew_options_kern' does not match original declaration [-Werror=lto-type-mismatch]
ipv6_renew_options_kern(struct sock *sk,
net/ipv6/exthdrs.c:1162: note: 'ipv6_renew_options_kern' was previously declared here
net/core/dev.c:761: note: 'dev_get_by_name_rcu' was previously declared here
struct net_device *dev_get_by_name_rcu(struct net *net, const char *name)
net/core/dev.c:761: note: code may be misoptimized unless -fno-strict-aliasing is used
drivers/gpu/drm/i915/i915_drv.h:3377: error: type of 'i915_gem_object_set_to_wc_domain' does not match original declaration [-Werror=lto-type-mismatch]
i915_gem_object_set_to_wc_domain(struct drm_i915_gem_object *obj, bool write);
drivers/gpu/drm/i915/i915_gem.c:3639: note: 'i915_gem_object_set_to_wc_domain' was previously declared here
include/linux/debugfs.h:92:9: error: type of 'debugfs_attr_read' does not match original declaration [-Werror=lto-type-mismatch]
ssize_t debugfs_attr_read(struct file *file, char __user *buf,
fs/debugfs/file.c:318: note: 'debugfs_attr_read' was previously declared here
include/linux/rwlock_api_smp.h:30: error: type of '_raw_read_unlock' does not match original declaration [-Werror=lto-type-mismatch]
void __lockfunc _raw_read_unlock(rwlock_t *lock) __releases(lock);
kernel/locking/spinlock.c:246:26: note: '_raw_read_unlock' was previously declared here
include/linux/fs.h:3308:5: error: type of 'simple_attr_open' does not match original declaration [-Werror=lto-type-mismatch]
int simple_attr_open(struct inode *inode, struct file *file,
fs/libfs.c:795: note: 'simple_attr_open' was previously declared here
All of the above are caused by include/asm-generic/qrwlock_types.h
failing to include asm/byteorder.h after commit e0d02285f1
("locking/qrwlock: Use 'struct qrwlock' instead of 'struct __qrwlock'")
in linux-4.15.
Similar bugs may or may not exist in older kernels as well, but there is
no easy way to test those with link-time optimizations, and kernels
before 4.14 are harder to fix because they don't have Babu's patch
series
We had similar issues with CONFIG_ symbols in the past and ended up
always including the configuration headers though linux/kconfig.h. This
works around the issue through that same file, defining either
__BIG_ENDIAN or __LITTLE_ENDIAN depending on CONFIG_CPU_BIG_ENDIAN,
which is now always set on all architectures since commit 4c97a0c8fe
("arch: define CPU_BIG_ENDIAN for all fixed big endian archs").
Link: http://lkml.kernel.org/r/20180202154104.1522809-2-arnd@arndb.de
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Babu Moger <babu.moger@amd.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Nicolas Pitre <nico@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>