Some PWM drivers are testing the PWMF_ENABLED flag. Create a helper
function to hide the logic behind enabled test. This will allow us to
smoothly move from the current approach to an atomic PWM update
approach.
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
Pull x86 fixes from Ingo Molnar:
"Two families of fixes:
- Fix an FPU context related boot crash on newer x86 hardware with
larger context sizes than what most people test. To fix this
without ugly kludges or extensive reverts we had to touch core task
allocator, to allow x86 to determine the task size dynamically, at
boot time.
I've tested it on a number of x86 platforms, and I cross-built it
to a handful of architectures:
(warns) (warns)
testing x86-64: -git: pass ( 0), -tip: pass ( 0)
testing x86-32: -git: pass ( 0), -tip: pass ( 0)
testing arm: -git: pass ( 1359), -tip: pass ( 1359)
testing cris: -git: pass ( 1031), -tip: pass ( 1031)
testing m32r: -git: pass ( 1135), -tip: pass ( 1135)
testing m68k: -git: pass ( 1471), -tip: pass ( 1471)
testing mips: -git: pass ( 1162), -tip: pass ( 1162)
testing mn10300: -git: pass ( 1058), -tip: pass ( 1058)
testing parisc: -git: pass ( 1846), -tip: pass ( 1846)
testing sparc: -git: pass ( 1185), -tip: pass ( 1185)
... so I hope the cross-arch impact 'none', as intended.
(by Dave Hansen)
- Fix various NMI handling related bugs unearthed by the big asm code
rewrite and generally make the NMI code more robust and more
maintainable while at it. These changes are a bit late in the
cycle, I hope they are still acceptable.
(by Andy Lutomirski)"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/fpu, sched: Introduce CONFIG_ARCH_WANTS_DYNAMIC_TASK_STRUCT and use it on x86
x86/fpu, sched: Dynamically allocate 'struct fpu'
x86/entry/64, x86/nmi/64: Add CONFIG_DEBUG_ENTRY NMI testing code
x86/nmi/64: Make the "NMI executing" variable more consistent
x86/nmi/64: Minor asm simplification
x86/nmi/64: Use DF to avoid userspace RSP confusing nested NMI detection
x86/nmi/64: Reorder nested NMI checks
x86/nmi/64: Improve nested NMI comments
x86/nmi/64: Switch stacks on userspace NMI entry
x86/nmi/64: Remove asm code that saves CR2
x86/nmi: Enable nested do_nmi() handling for 64-bit kernels
Currently, we set wrong gfp_mask to page_owner info in case of isolated
freepage by compaction and split page. It causes incorrect mixed
pageblock report that we can get from '/proc/pagetypeinfo'. This metric
is really useful to measure fragmentation effect so should be accurate.
This patch fixes it by setting correct information.
Without this patch, after kernel build workload is finished, number of
mixed pageblock is 112 among roughly 210 movable pageblocks.
But, with this fix, output shows that mixed pageblock is just 57.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Using __printf attributes helps to detect several format string issues
at compile time (even though -Wformat-security is currently disabled in
Makefile). For example it can detect when formatting a pointer as a
number, like the issue fixed in commit a3fa71c40f ("wl18xx: show
rx_frames_per_rates as an array as it really is"), or when the arguments
do not match the format string, c.f. for example commit 5ce1aca814
("reiserfs: fix __RASSERT format string").
To prevent similar bugs in the future, add a __printf attribute to every
function prototype which needs one in include/linux/ and lib/. These
functions were mostly found by using gcc's -Wsuggest-attribute=format
flag.
Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Felipe Balbi <balbi@ti.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull staging and IIO driver fixes from Greg KH:
"Here's some staging and IIO driver fixes for 4.2-rc3.
Nothing major, the majority are IIO issues that were reported, with a
few other minor staging driver fixes. All have been in linux-next for
a while with no reported issues"
* tag 'staging-4.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (25 commits)
staging: vt6656: check ieee80211_bss_conf bssid not NULL
staging: vt6655: check ieee80211_bss_conf bssid not NULL
staging:lustre: remove irq.h from socklnd.h
staging: make board support depend on OF_IRQ and CLKDEV_LOOKUP
iio: tmp006: Check channel info on write
iio: sx9500: Add missing init in sx9500_buffer_pre{en,dis}able()
iio:light:ltr501: fix regmap dependency
iio:light:ltr501: fix variable in ltr501_init
iio: sx9500: fix bug in compensation code
iio: sx9500: rework error handling of raw readings
iio: magnetometer: mmc35240: fix available sampling frequencies
iio:light:stk3310: Fix REGMAP_I2C dependency
iio: light: STK3310: un-invert proximity values
iio:adc:cc10001_adc: fix Kconfig dependency
iio: light: tcs3414: Fix bug preventing to set integration time
iio:accel:bmc150-accel: fix counting direction
iio:light:cm3323: clear bitmask before set
iio: adc: at91_adc: allow to use full range of startup time
iio: DAC: ad5624r_spi: fix bit shift of output data value
iio: proximity: sx9500: Fix proximity value
...
regmap: Create a new struct reg_sequence for register sequences
In order to allow us to start adding extra annotations for sequences
without bloating register default tables duplicate the structure under
the new name reg_sequence and update the APIs to use that instead of
reg_default.
Pull GPIO fixes from Linus Walleij:
"This is a first set of GPIO fixes for the v4.2 series, all hitting
individual drivers and nothing else (except for a documentation
oneliner. I intended to send a request earlier but life intervened)"
* tag 'gpio-v4.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
gpio: pca953x: fix nested irqs rescheduling
gpio: omap: prevent module from being unloaded while in use
gpio: max732x: Add missing dev reference to gpiochip
gpio/xilinx: Use correct address when setting initial values.
gpio: zynq: Fix problem with unbalanced pm_runtime_enable
gpio: omap: add missed spin_unlock_irqrestore in omap_gpio_irq_type
gpio: brcmstb: fix null ptr dereference in driver remove
gpio: Remove double "base" in comment
Lots of devices support huge discard sizes these days. Depending
on how the device handles them internally, huge discards can
introduce massive latencies (hundreds of msec) on the device side.
We have a sysfs file, discard_max_bytes, that advertises the max
hardware supported discard size. Make this writeable, and split
the settings into a soft and hard limit. This can be set from
'discard_granularity' and up to the hardware limit.
Add a new sysfs file, 'discard_max_hw_bytes', that shows the hw
set limit.
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Percpu refcount is the perfect match for partition's case,
and the conversion is quite straight.
With the convertion, one pair of atomic inc/dec can be saved
for accounting block I/O, which is run in hot path of block I/O.
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
So the helper can be used in both generic partition
case and part0 case.
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
The last patch in this series makes the flags parameter for the various
gpiod_get* functions mandatory and so allows to remove an ugly cpp hack
introduced in commit 39b2bbe3d7 (gpio: add flags argument to gpiod_get*()
functions) for v3.17-rc1.
The other nine commits fix the last remaining users of these functions that
don't pass flags yet. (Only etraxfs-uart wasn't fixed; this driver's use of the
gpiod functions needs fixing anyhow.)
x86s NMI backtrace implementation (for arch_trigger_all_cpu_backtrace())
is fairly generic in nature - the only architecture specific bits are
the act of raising the NMI to other CPUs, and reporting the status of
the NMI handler.
These are fairly simple to factor out, and produce a generic
implementation which can be shared between ARM and x86.
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Pull block fixes from Jens Axboe:
"A collection of fixes from the last few weeks that should go into the
current series. This contains:
- Various fixes for the per-blkcg policy data, fixing regressions
since 4.1. From Arianna and Tejun
- Code cleanup for bcache closure macros from me. Really just
flushing this out, it's been sitting in another branch for months
- FIELD_SIZEOF cleanup from Maninder Singh
- bio integrity oops fix from Mike
- Timeout regression fix for blk-mq from Ming Lei"
* 'for-linus' of git://git.kernel.dk/linux-block:
blk-mq: set default timeout as 30 seconds
NVMe: Reread partitions on metadata formats
bcache: don't embed 'return' statements in closure macros
blkcg: fix blkcg_policy_data allocation bug
blkcg: implement all_blkcgs list
blkcg: blkcg_css_alloc() should grab blkcg_pol_mutex while iterating blkcg_policy[]
blkcg: allow blkcg_pol_mutex to be grabbed from cgroup [file] methods
block/blk-cgroup.c: free per-blkcg data when freeing the blkcg
block: use FIELD_SIZEOF to calculate size of a field
bio integrity: do not assume bio_integrity_pool exists if bioset exists
Add an optional delay_us field in reg_sequence to allow the client to
specify a delay (in microseconds) to be applied after any given write
in a sequence of writes.
We treat a delay in a sequence the same way we treat a page change as
they are logically similar in that you can coalesce all write before
a delay (in the same way you can coalesce all writes before a page
change is needed)
Signed-off-by: Nariman Poushin <nariman@opensource.wolfsonmicro.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Separate the functionality using sequences of register writes from the
functions that take register defaults. This change renames the arguments
in order to support the extension of reg_sequence to take an optional
delay to be applied after any given register in a sequence is written.
This avoids adding an int to all register defaults, which could
substantially increase memory usage for regmaps with large default tables.
This also updates all the clients of multi_reg_write/register_patch.
Signed-off-by: Nariman Poushin <nariman@opensource.wolfsonmicro.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Add MAX77843_MUIC prefix to some of the defines used in max77843 extcon
driver so the max77693-private.h can be included simultaneously with
max77843-private.h.
Signed-off-by: Krzysztof Kozlowski <k.kozlowski.k@gmail.com>
Acked-by: Lee Jones <lee.jones@linaro.org>
Acked-by: Chanwoo Choi <cw00.choi@samsung.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Add MAX77693 prefix to some of the defines used in max77693 extcon
driver so the max77693-private.h can be included simultaneously with
max77843-private.h.
Additionally use BIT() macro in header.
Signed-off-by: Krzysztof Kozlowski <k.kozlowski.k@gmail.com>
Acked-by: Lee Jones <lee.jones@linaro.org>
Acked-by: Chanwoo Choi <cw00.choi@samsung.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
This prepares for merging some of the drivers between max77693 and
max77843 so the child MFD driver can be attached to any parent MFD main
driver.
Move the state container to common header file. Additionally add
consistent 'i2c' prefixes to its members (of 'struct i2c_client' type).
Signed-off-by: Krzysztof Kozlowski <k.kozlowski.k@gmail.com>
Acked-by: Sebastian Reichel <sre@kernel.org>
Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Acked-by: Lee Jones <lee.jones@linaro.org>
Acked-by: Chanwoo Choi <cw00.choi@samsung.com>
Acked-by: Jacek Anaszewski <j.anaszewski@samsung.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Store the device type (obtained from i2c_device_id) as an enum and add a
default type of unknown to distinguish from case when this is not set
at all.
Signed-off-by: Krzysztof Kozlowski <k.kozlowski.k@gmail.com>
Acked-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
Clean up the max77693 private header file by removing:
1. Left-overs from previous way of interrupt handling (driver uses
regmap_irq_chip).
2. Unused members of struct 'max77693_dev' related to interrupts in
extcon driver.
Signed-off-by: Krzysztof Kozlowski <k.kozlowski.k@gmail.com>
Acked-by: Chanwoo Choi <cw00.choi@samsung.com>
Acked-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
A big problem with the current i8042 debugging option is that it outputs
data going to and from the keyboard by default. As a result, many dmesg
logs uploaded by users will unintentionally contain sensitive information
such as their password, as such it's probably a good idea not to output
data coming from the keyboard unless specifically enabled by the user.
Signed-off-by: Stephen Chandler Paul <cpaul@redhat.com>
Reviewed-by: Andreas Mohr <andim2@users.sf.net>
Reviewed-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Since commit 59032702ea ("ARM: shmobile: Remove legacy platform
devices from EMEV2 SoC code"), EMMA Mobile SoCs are only supported in
generic DT-only ARM multi-platform builds. The driver doesn't need to
use platform data anymore, hence remove platform data configuration.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Simon Horman <horms+renesas@verge.net.au>
Tested-by: Niklas Söderlund <niso@kth.se>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
This patch introduces the proto_down flag that can be used by user space
applications to notify switch drivers that errors have been detected on the
device.
The switch driver can react to protodown notification by doing a phys down
on the associated switch port.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: Wilson Kok <wkok@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
DEBUG_LOCK_ALLOC=y is not a production setting, but it is
not very unusual either. Many developers routinely
use kernels built with it enabled.
Apart from being selected by hand, it is also auto-selected by
PROVE_LOCKING "Lock debugging: prove locking correctness" and
LOCK_STAT "Lock usage statistics" config options.
LOCK STAT is necessary for "perf lock" to work.
I wouldn't spend too much time optimizing it, but this particular
function has a very large cost in code size: when it is deinlined,
code size decreases by 830,000 bytes:
text data bss dec hex filename
85674192 22294776 20627456 128596424 7aa39c8 vmlinux.before
84837612 22294424 20627456 127759492 79d7484 vmlinux
(with this config: http://busybox.net/~vda/kernel_config)
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
CC: Josh Triplett <josh@joshtriplett.org>
CC: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Lai Jiangshan <laijs@cn.fujitsu.com>
CC: Tejun Heo <tj@kernel.org>
CC: Oleg Nesterov <oleg@redhat.com>
CC: linux-kernel@vger.kernel.org
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Pull file locking updates from Jeff Layton:
"I had thought that I was going to get away without a pull request this
cycle. There was a NFSv4 file locking problem that cropped up that I
tried to fix in the NFSv4 code alone, but that fix has turned out to
be problematic. These patches fix this in the correct way.
Note that this touches some NFSv4 code as well. Ordinarily I'd wait
for Trond to ACK this, but he's on holiday right now and the bug is
rather nasty. So I suggest we merge this and if he raises issues with
it we can sort it out when he gets back"
Acked-by: Bruce Fields <bfields@fieldses.org>
Acked-by: Dan Williams <dan.j.williams@intel.com>
[ +1 to this series fixing a 100% reproducible slab corruption +
general protection fault in my nfs-root test environment. - Dan ]
Acked-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* tag 'locks-v4.2-1' of git://git.samba.org/jlayton/linux:
locks: inline posix_lock_file_wait and flock_lock_file_wait
nfs4: have do_vfs_lock take an inode pointer
locks: new helpers - flock_lock_inode_wait and posix_lock_inode_wait
locks: have flock_lock_file take an inode pointer instead of a filp
Revert "nfs: take extra reference to fl->fl_file when running a LOCKU operation"
Jonathan writes:
First round of new drivers, cleanups and functionality for IIO in the 4.3 cycle.
Core and tools new stuff
* Allow explicit flush of hardware fifo by using an non blocking read.
This is needed to support some of the Android requirements for HW fifo
devices - also makes sense generally and clarifies a corner of the ABI.
* Add some missing modifier names. Mostly these exist for weird and
wonderful event types, but should still be present in the name array.
* Update iio_event_monitor to cope with new channel types.
* generic_buffer gains support for single byte scan elements (no idea
how this never got implemented before!)
New device support
* ROHM rpr0521 light and proximity sensor driver.
* bmc150 gains bmc156 support.
* ms5611 gains ms5607 temperature and pressure sensor support.
Driver functionality
* inv-mpu - add scale_available attributes to aid userspace in
configuring these devices.
* isl29125 - add scale_available attributes.
* stk8ba50 - sampling frequency control, triggered buffer support.
* stk8312 - sampling frequency control, triggered buffer support.
* cc10001 - ensure ADC powered up at probe time if shared by non linux
running CPUs.
* bmc150-magn - decouple the buffer and trigger allowing other triggers
to be used to drive this device's sampling.
Documentation
* Add some previously missed *scale_available attributes to the ABI docs.
Cleanups
* Clarify some crazy naming in iio_triggered_buffer_setup that seems to
have somehow ended up backwards (dates back a long way). Avoid the top
half and bottom half naming entirely given we are how dealing with a
handler and a thread in all cases.
* Tools cleanup including coding style, variable naming improvements, also
a new sanity check on a full event having been read.
* stk8ba50 - replace the scale table with a struct for clarity. Also suspend
the sensor if an error occurs in init.
* hid-sensor-prox - drop uneeded line break.
* mma9551 - use size in words for word read / write avoiding accidental
sending of an odd number of bytes.
* mma9553 - fix code alignment and document the use of a mutex.
* light/Kconfig - typo fix in commment.
* cm3323 - don't eat an error value, replace an unneeded local variable with
a generic local variable with the same use, add some blank lines for clarity.
* pressure/Kconfig - typo in Measurement Specialties name.
* bmc150-accel - actually use a mask definition rather than repeating the
value inline, code style cleanup.
* adc/Kconfig - general help description cleanup.
* ssp_sensors - drop redundant spi driver bus initialization (done in the
spi core)
* tmp006 - use genmask rather than hand generated masks.
* ms5611 - drop IIO_CHAN_INFO_SCALE as this driver provides a processed
output and as such the read only scale adds nothing useful.
* kxcjk-1013, adf4350, dummy - drop unwanted blank lines.
* Drop all owner assignments from i2c_drivers and this is done in the
i2c core.
For clarity, if CONFIG_SECCOMP isn't defined, seccomp_mode() is returning
"disabled". This makes that more clear, along with another 0-use, and
results in no operational change.
Signed-off-by: Kees Cook <keescook@chromium.org>
This patch is the first step in enabling checkpoint/restore of processes
with seccomp enabled.
One of the things CRIU does while dumping tasks is inject code into them
via ptrace to collect information that is only available to the process
itself. However, if we are in a seccomp mode where these processes are
prohibited from making these syscalls, then what CRIU does kills the task.
This patch adds a new ptrace option, PTRACE_O_SUSPEND_SECCOMP, that enables
a task from the init user namespace which has CAP_SYS_ADMIN and no seccomp
filters to disable (and re-enable) seccomp filters for another task so that
they can be successfully dumped (and restored). We restrict the set of
processes that can disable seccomp through ptrace because although today
ptrace can be used to bypass seccomp, there is some discussion of closing
this loophole in the future and we would like this patch to not depend on
that behavior and be future proofed for when it is removed.
Note that seccomp can be suspended before any filters are actually
installed; this behavior is useful on criu restore, so that we can suspend
seccomp, restore the filters, unmap our restore code from the restored
process' address space, and then resume the task by detaching and have the
filters resumed as well.
v2 changes:
* require that the tracer have no seccomp filters installed
* drop TIF_NOTSC manipulation from the patch
* change from ptrace command to a ptrace option and use this ptrace option
as the flag to check. This means that as soon as the tracer
detaches/dies, seccomp is re-enabled and as a corrollary that one can not
disable seccomp across PTRACE_ATTACHs.
v3 changes:
* get rid of various #ifdefs everywhere
* report more sensible errors when PTRACE_O_SUSPEND_SECCOMP is incorrectly
used
v4 changes:
* get rid of may_suspend_seccomp() in favor of a capable() check in ptrace
directly
v5 changes:
* check that seccomp is not enabled (or suspended) on the tracer
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
CC: Will Drewry <wad@chromium.org>
CC: Roland McGrath <roland@hack.frob.com>
CC: Pavel Emelyanov <xemul@parallels.com>
CC: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Andy Lutomirski <luto@amacapital.net>
[kees: access seccomp.mode through seccomp_mode() instead]
Signed-off-by: Kees Cook <keescook@chromium.org>
Pull final init.h/module.h code relocation from Paul Gortmaker:
"With the release of 4.2-rc2 done, we should not be seeing any new code
added that gets upset by this small code move, and we've banked yet
another complete week of testing with this move in place on top of
4.2-rc1 via linux-next to ensure that remained true.
Given that, I'd like to put it in now so that people formulating new
work for 4.3-rc1 will be exposed to the ever so slightly stricter (but
sensible) requirements wrt. whether they are needing init.h vs.
module.h macros, even if they are not using linux-next.
The diffstat of the move is slightly asymmetrical due to needing to
leave behind a couple #ifdef in the old location and add the same ones
to the new location, but other than that, it is a 1:1 move, complete
with the module_init/exit trailing semicolon that we can't fix. That
is, until/unless someone does a tree-wide sed fix of all the
approximately 800 currently in tree users relying on it"
* tag 'module-final-v4.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux:
module: relocate module_init from init.h to module.h
Don't bother testing if we need to switch to alternate stack
unless TEE target is used.
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
In most cases there is no reentrancy into ip/ip6tables.
For skbs sent by REJECT or SYNPROXY targets, there is one level
of reentrancy, but its not relevant as those targets issue an absolute
verdict, i.e. the jumpstack can be clobbered since its not used
after the target issues absolute verdict (ACCEPT, DROP, STOLEN, etc).
So the only special case where it is relevant is the TEE target, which
returns XT_CONTINUE.
This patch changes ip(6)_do_table to always use the jump stack starting
from 0.
When we detect we're operating on an skb sent via TEE (percpu
nf_skb_duplicated is 1) we switch to an alternate stack to leave
the original one alone.
Since there is no TEE support for arptables, it doesn't need to
test if tee is active.
The jump stack overflow tests are no longer needed as well --
since ->stacksize is the largest call depth we cannot exceed it.
A much better alternative to the external jumpstack would be to just
declare a jumps[32] stack on the local stack frame, but that would mean
we'd have to reject iptables rulesets that used to work before.
Another alternative would be to start rejecting rulesets with a larger
call depth, e.g. 1000 -- in this case it would be feasible to allocate the
entire stack in the percpu area which would avoid one dereference.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This prepares for a TEE like expression in nftables.
We want to ensure only one duplicate is sent, so both will
use the same percpu variable to detect duplication.
The other use case is detection of recursive call to xtables, but since
we don't want dependency from nft to xtables core its put into core.c
instead of the x_tables core.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
- Add a new set of functions for registering and unregistering per
network namespace hooks.
- Modify the old global namespace hook functions to use the per
network namespace hooks in their implementation, so their remains a
single list that needs to be walked for any hook (this is important
for keeping the hook priority working and for keeping the code
walking the hooks simple).
- Only allow registering the per netdevice hooks in the network
namespace where the network device lives.
- Dynamically allocate the structures in the per network namespace
hook list in nf_register_net_hook, and unregister them in
nf_unregister_net_hook.
Dynamic allocate is required somewhere as the number of network
namespaces are not fixed so we might as well allocate them in the
registration function.
The chain of registered hooks on any list is expected to be small so
the cost of walking that list to find the entry we are unregistering
should also be small.
Performing the management of the dynamically allocated list entries
in the registration and unregistration functions keeps the complexity
from spreading.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
The function obscures what is going on in nf_hook_thresh and it's existence
requires computing the hook list twice.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Since no longer limiting max_sectors to BLK_DEF_MAX_SECTORS (commit 34b48db66e),
data corruption may occur on ST380013AS drive configured on 82801JI (ICH10 Family)
SATA controller. This patch will allow the driver to limit max_sectors as before
# cat /sys/block/sdb/queue/max_sectors_kb
512
I was able to double the max_sectors_kb value up to 16384 on linux-4.2.0-rc2
before seeing corruption, but seems safer to use previous limit. Without this
patch max_sectors_kb will be 32767.
tj: Minor comment update.
Reported-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: David Milburn <dmilburn@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: stable@vger.kernel.org # v3.19 and later
Fixes: 34b48db66e ("block: remove artifical max_hw_sectors cap")
Some devices lose data on TRIM whether queued or not. This patch adds
a horkage to disable TRIM.
tj: Collapsed unnecessary if() nesting.
Signed-off-by: Arne Fitzenreiter <arne_f@ipfire.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: stable@vger.kernel.org
The memory error record structure includes as its first field a
bitmask of which subsequent fields are valid. The allows new fields
to be added to the structure while keeping compatibility with older
software that parses these records. This mechanism was used between
versions 2.2 and 2.3 to add four new fields, growing the size of the
structure from 73 bytes to 80. But Linux just added all the new
fields so this test:
if (gdata->error_data_length >= sizeof(*mem_err))
cper_print_mem(newpfx, mem_err);
else
goto err_section_too_small;
now make Linux complain about old format records being too short.
Add a definition for the old format of the structure and use that
for the minimum size check. Pass the actual size to cper_print_mem()
so it can sanity check the validation_bits field to ensure that if
a BIOS using the old format sets bits as if it were new, we won't
access fields beyond the end of the structure.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
this_cpu_*() ops have been protected against both preemption and
interrupts for quite a while now. We apparently forgot to update the
comment. Fix it.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Christoph Lameter <cl@linux-foundation.org>
Adds a new single-purpose PIDs subsystem to limit the number of
tasks that can be forked inside a cgroup. Essentially this is an
implementation of RLIMIT_NPROC that applies to a cgroup rather than a
process tree.
However, it should be noted that organisational operations (adding and
removing tasks from a PIDs hierarchy) will *not* be prevented. Rather,
the number of tasks in the hierarchy cannot exceed the limit through
forking. This is due to the fact that, in the unified hierarchy, attach
cannot fail (and it is not possible for a task to overcome its PIDs
cgroup policy limit by attaching to a child cgroup -- even if migrating
mid-fork it must be able to fork in the parent first).
PIDs are fundamentally a global resource, and it is possible to reach
PID exhaustion inside a cgroup without hitting any reasonable kmemcg
policy. Once you've hit PID exhaustion, you're only in a marginally
better state than OOM. This subsystem allows PID exhaustion inside a
cgroup to be prevented.
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Add a new cgroup subsystem callback can_fork that conditionally
states whether or not the fork is accepted or rejected by a cgroup
policy. In addition, add a cancel_fork callback so that if an error
occurs later in the forking process, any state modified by can_fork can
be reverted.
Allow for a private opaque pointer to be passed from cgroup_can_fork to
cgroup_post_fork, allowing for the fork state to be stored by each
subsystem separately.
Also add a tagging system for cgroup_subsys.h to allow for CGROUP_<TAG>
enumerations to be be defined and used. In addition, explicitly add a
CGROUP_CANFORK_COUNT macro to make arrays easier to define.
This is in preparation for implementing the pids cgroup subsystem.
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
Signed-off-by: Tejun Heo <tj@kernel.org>