This patch refactors and reorders the FW API files in preparation of
upgrading the code to support new FW.
- Make use of the BIT macro in appropriate places.
- Whitespace changes to align values and code blocks.
- Comments are updated (spelling mistakes, removed if not clear).
- Group together code blocks which are related or deal with similar
matters.
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a helper to map a device node to a logical CPU number to avoid
duplication. Currently this is open coded in different places (e.g
gic-v3, coresight). The helper tries to map device node to a "possible"
logical CPU id, which may not be online yet. It is the responsibility
of the user to make sure that the CPU is online. The helper uses
of_cpu_device_node_get() to retrieve the device node for a given CPU
(which uses per_cpu data if available else falls back to slower
of_get_cpu_node()).
Cc: devicetree@vger.kernel.org
Cc: Frank Rowand <frowand.list@gmail.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
When running consumer and/or producer operations and empty checks in
parallel its possible to have the empty check run past the end of the
array. The scenario occurs when an empty check is run while
__ptr_ring_discard_one() is in progress. Specifically after the
consumer_head is incremented but before (consumer_head >= ring_size)
check is made and the consumer head is zeroe'd.
To resolve this, without having to rework how consumer/producer ops
work on the array, simply add an extra dummy slot to the end of the
array. Even if we did a rework to avoid the extra slot it looks
like the normal case checks would suffer some so best to just
allocate an extra pointer.
Reported-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Fixes: c5ad119fb6 ("net: sched: pfifo_fast use skb_array")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
i7300_idle.h is not being called by any source file and contains calls to
pci_get_bus_and_slot() that we are trying to deprecate. Remove unused file.
Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The cgroup_subsys structure references a documentation file that has been
renamed after the v1/v2 split. Since the v2 documentation doesn't
currently contain any information on kernel interfaces for controllers,
point the user to the v1 docs.
Cc: Tejun Heo <tj@kernel.org>
Cc: linux-doc@vger.kernel.org
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
If a CPU is entering a low power idle state where it doesn't lose any
context, then there is no need to call cpu_pm_enter()/cpu_pm_exit().
Add a new macro(CPU_PM_CPU_IDLE_ENTER_RETENTION) to be used by cpuidle
drivers when they are entering retention state. By not calling
cpu_pm_enter and cpu_pm_exit we reduce the latency involved in
entering and exiting the retention idle states.
CPU_PM_CPU_IDLE_ENTER_RETENTION assumes that no state is lost and
hence CPU PM notifiers will not be called. We may need a broader
change if we need to support partial retention states effeciently.
On ARM64 based Qualcomm Server Platform we measured below overhead for
for calling cpu_pm_enter and cpu_pm_exit for retention states.
workload: stress --hdd #CPUs --hdd-bytes 32M -t 30
Average overhead of cpu_pm_enter - 1.2us
Average overhead of cpu_pm_exit - 3.1us
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Prashanth Prakash <pprakash@codeaurora.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Fix the default for fscache_maybe_release_page() for when the cookie isn't
valid or the page isn't cached. It mustn't return false as that indicates
the page cannot yet be freed.
The problem with the default is that if, say, there's no cache, but a
network filesystem's pages are using up almost all the available memory, a
system can OOM because the filesystem ->releasepage() op will not allow
them to be released as fscache_maybe_release_page() incorrectly prevents
it.
This can be tested by writing a sequence of 512MiB files to an AFS mount.
It does not affect NFS or CIFS because both of those wrap the call in a
check of PG_fscache and it shouldn't bother Ceph as that only has
PG_private set whilst writeback is in progress. This might be an issue for
9P, however.
Note that the pages aren't entirely stuck. Removing a file or unmounting
will clear things because that uses ->invalidatepage() instead.
Fixes: 201a15428b ("FS-Cache: Handle pages pending storage that get evicted under OOM conditions")
Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Tested-by: Marc Dionne <marc.dionne@auristor.com>
cc: stable@vger.kernel.org # 2.6.32+
This driver was merged in 2011 as a tool for detecting the orientation
of a screen. The device driver assumes board file setup using the
platform data from <linux/input/gpio_tilt.h>. But no boards in the
kernel tree defines this platform data.
As I am faced with refactoring drivers to use GPIO descriptors and
pass decriptor tables from boards, or use the device tree device
drivers like these creates a serious problem: I cannot fix them and
cannot test them, not even compile-test them with a system actually
using it (no in-tree boardfile).
I suggest to delete this driver and rewrite it using device tree if
it is still in use on actively maintained systems.
I can also offer to rewrite it out of the blue using device tree if
someone promise to test it and help me iterate it.
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Heiko Stuebner <heiko@sntech.de>
Patchwork-Id: 10133609
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
This gets rid of the deprecated do_gettimeofday() call in favor
of ktime_get(), which is also more reliable as it uses monotonic
times. The code now gets a bit simpler.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Patchwork-Id: 10076621
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
struct timeval is not y2038 safe, and what mlc->instart do is
scheduling a task in a fixed timeout, so jiffies is the
simplest choice here.
In hilse_donode(), the expires in mod_timer equals
jiffies + intimeout - (now - instart)
If we use jiffies in 'now', the expires equals
instart + intimeout
So, all we need to do is that making sure expires is a future
timestamp before passed it to mod_timer.
[arnd: slightly simplified patch further]
Link: https://lists.linaro.org/pipermail/y2038/2015-October/000937.html
Signed-off-by: WEN Pingbo <pingbo.wen@linaro.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Patchwork-Id: 10076615
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Since mlc->lcv_t is only interested in seconds, directly using time64_t
here.
This gets rid of the deprecated do_gettimeofday() and avoids problems
with time going backwards since we now use the monotonic clocksource.
Signed-off-by: WEN Pingbo <pingbo.wen@linaro.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Patchwork-Id: 10076611
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
- Move errseq.rst into core-api
- Add errseq to the core-api index
- Promote the header to a more prominent header type, otherwise we get three
entries in the table of contents.
- Reformat the table to look nicer and be a little more proportional in
terms of horizontal width per bit (the SF bit is still disproportionately
large, but there's no way to fix that).
- Include errseq kernel-doc in the errseq.rst
- Neaten some kernel-doc markup
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Replace spaces with tabs in the definition of AT24_FLAG_NO_RDROL.
Fixes: 9d404411091c ("eeprom: at24: support eeproms that do not auto-rollover reads")
Signed-off-by: Bartosz Golaszewski <brgl@bgdev.pl>
Some multi-address eeproms in the at24 family may not automatically
roll-over reads to the next slave address. On those eeproms, reads
that straddle slave boundaries will not work correctly.
Solution:
Mark such eeproms with a flag that prevents reads straddling
slave boundaries. Add the AT24_FLAG_NO_RDROL flag to the eeprom
entry in the device_id table, or add 'no-read-rollover' to the
eeprom devicetree entry.
Note that I have not personally enountered an at24 chip that
does not support read rollovers. They may or may not exist.
However, my hardware requires this functionality because of
a quirk.
Signed-off-by: Sven Van Asbroeck <svendev@arcx.com>
Signed-off-by: Bartosz Golaszewski <brgl@bgdev.pl>
Pull timer fixes from Thomas Gleixner:
"A pile of fixes for long standing issues with the timer wheel and the
NOHZ code:
- Prevent timer base confusion accross the nohz switch, which can
cause unlocked access and data corruption
- Reinitialize the stale base clock on cpu hotplug to prevent subtle
side effects including rollovers on 32bit
- Prevent an interrupt storm when the timer softirq is already
pending caused by tick_nohz_stop_sched_tick()
- Move the timer start tracepoint to a place where it actually makes
sense
- Add documentation to timerqueue functions as they caused confusion
several times now"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
timerqueue: Document return values of timerqueue_add/del()
timers: Invoke timer_start_debug() where it makes sense
nohz: Prevent a timer interrupt storm in tick_nohz_stop_sched_tick()
timers: Reinitialize per cpu bases on hotplug
timers: Use deferrable base independent of base::nohz_active
Pull irq fixes from Thomas Gleixner:
"A rather large update after the kaisered maintainer finally found time
to handle regression reports.
- The larger part addresses a regression caused by the x86 vector
management rework.
The reservation based model does not work reliably for MSI
interrupts, if they cannot be masked (yes, yet another hw
engineering trainwreck). The reason is that the reservation mode
assigns a dummy vector when the interrupt is allocated and switches
to a real vector when the interrupt is requested.
If the MSI entry cannot be masked then the initialization might
raise an interrupt before the interrupt is requested, which ends up
as spurious interrupt and causes device malfunction and worse. The
fix is to exclude MSI interrupts which do not support masking from
reservation mode and assign a real vector right away.
- Extend the extra lockdep class setup for nested interrupts with a
class for the recently added irq_desc::request_mutex so lockdep can
differeniate and does not emit false positive warnings.
- A ratelimit guard for the bad irq printout so in case a bad irq
comes back immediately the system does not drown in dmesg spam"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genirq/msi, x86/vector: Prevent reservation mode for non maskable MSI
genirq/irqdomain: Rename early argument of irq_domain_activate_irq()
x86/vector: Use IRQD_CAN_RESERVE flag
genirq: Introduce IRQD_CAN_RESERVE flag
genirq/msi: Handle reactivation only on success
gpio: brcmstb: Make really use of the new lockdep class
genirq: Guard handle_bad_irq log messages
kernel/irq: Extend lockdep class for request mutex
Report to the user ifindex and namespace information of offloaded
programs. If device has disappeared return -ENODEV. Specify the
namespace using dev/inode combination.
CC: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
ns_get_path() takes struct task_struct and proc_ns_ops as its
parameters. For path resolution directly from a namespace,
e.g. based on a networking device's net name space, we need
more flexibility. Add a ns_get_path_cb() helper which will
allow callers to use any method of obtaining the name space
reference. Convert ns_get_path() to use ns_get_path_cb().
Following patches will bring a networking user.
CC: Eric W. Biederman <ebiederm@xmission.com>
Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Bound programs are quite useless after their device disappears.
They are simply waiting for reference count to go to zero,
don't list them in BPF_PROG_GET_NEXT_ID by freeing their ID
early.
Note that orphaned offload programs will return -ENODEV on
BPF_OBJ_GET_INFO_BY_FD so user will never see ID 0.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
To allow verifier instruction callbacks without any extra locking
NETDEV_UNREGISTER notification would wait on a waitqueue for verifier
to finish. This design decision was made when rtnl lock was providing
all the locking. Use the read/write lock instead and remove the
workqueue.
Verifier will now call into the offload code, so dev_ops are moved
to offload structure. Since verifier calls are all under
bpf_prog_is_dev_bound() we no longer need static inline implementations
to please builds with CONFIG_NET=n.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
We currently use aux->offload to indicate that program is bound
to a specific device. This forces us to keep the offload structure
around even after the device is gone. Add a bool member to
struct bpf_prog_aux to indicate if offload was requested.
Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Pull x86 page table isolation updates from Thomas Gleixner:
"This is the final set of enabling page table isolation on x86:
- Infrastructure patches for handling the extra page tables.
- Patches which map the various bits and pieces which are required to
get in and out of user space into the user space visible page
tables.
- The required changes to have CR3 switching in the entry/exit code.
- Optimizations for the CR3 switching along with documentation how
the ASID/PCID mechanism works.
- Updates to dump pagetables to cover the user space page tables for
W+X scans and extra debugfs files to analyze both the kernel and
the user space visible page tables
The whole functionality is compile time controlled via a config switch
and can be turned on/off on the command line as well"
* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (32 commits)
x86/ldt: Make the LDT mapping RO
x86/mm/dump_pagetables: Allow dumping current pagetables
x86/mm/dump_pagetables: Check user space page table for WX pages
x86/mm/dump_pagetables: Add page table directory to the debugfs VFS hierarchy
x86/mm/pti: Add Kconfig
x86/dumpstack: Indicate in Oops whether PTI is configured and enabled
x86/mm: Clarify the whole ASID/kernel PCID/user PCID naming
x86/mm: Use INVPCID for __native_flush_tlb_single()
x86/mm: Optimize RESTORE_CR3
x86/mm: Use/Fix PCID to optimize user/kernel switches
x86/mm: Abstract switching CR3
x86/mm: Allow flushing for future ASID switches
x86/pti: Map the vsyscall page if needed
x86/pti: Put the LDT in its own PGD if PTI is on
x86/mm/64: Make a full PGD-entry size hole in the memory map
x86/events/intel/ds: Map debug buffers in cpu_entry_area
x86/cpu_entry_area: Add debugstore entries to cpu_entry_area
x86/mm/pti: Map ESPFIX into user space
x86/mm/pti: Share entry text PMD
x86/entry: Align entry text section to PMD boundary
...
net/ipv6/ip6_gre.c is a case of parallel adds.
include/trace/events/tcp.h is a little bit more tricky. The removal
of in-trace-macro ifdefs in 'net' paralleled with moving
show_tcp_state_name and friends over to include/trace/events/sock.h
in 'net-next'.
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull power management fix from Rafael Wysocki:
"This fixes a schedutil cpufreq governor regression from the 4.14 cycle
that may cause a CPU idleness check to return incorrect results in
some cases which leads to suboptimal decisions (Joel Fernandes)"
* tag 'pm-4.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: schedutil: Use idle_calls counter of the remote CPU
Pull rdma fixes from Jason Gunthorpe:
"This is the next batch of for-rc patches from RDMA. It includes the
fix for the ipoib regression I mentioned last time, and the result of
a fairly major debugging effort to get iser working reliably on cxgb4
hardware - it turns out the cxgb4 driver was not handling QP error
flushing properly causing iser to fail.
- cxgb4 fix for an iser testing failure as debugged by Steve and
Sagi. The problem was a driver bug in the handling of shutting down
a QP.
- Various vmw_pvrdma fixes for bogus WARN_ON, missed resource free on
error unwind and a use after free bug
- Improper congestion counter values on mlx5 when link aggregation is
enabled
- ipoib lockdep regression introduced in this merge window
- hfi1 regression supporting the device in a VM introduced in a
recent patch
- Typo that breaks future uAPI compatibility in the verbs core
- More SELinux related oops fixing
- Fix an oops during error unwind in mlx5"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
IB/mlx5: Fix mlx5_ib_alloc_mr error flow
IB/core: Verify that QP is security enabled in create and destroy
IB/uverbs: Fix command checking as part of ib_uverbs_ex_modify_qp()
IB/mlx5: Serialize access to the VMA list
IB/hfi: Only read capability registers if the capability exists
IB/ipoib: Fix lockdep issue found on ipoib_ib_dev_heavy_flush
IB/mlx5: Fix congestion counters in LAG mode
RDMA/vmw_pvrdma: Avoid use after free due to QP/CQ/SRQ destroy
RDMA/vmw_pvrdma: Use refcount_dec_and_test to avoid warning
RDMA/vmw_pvrdma: Call ib_umem_release on destroy QP path
iw_cxgb4: when flushing, complete all wrs in a chain
iw_cxgb4: reflect the original WR opcode in drain cqes
iw_cxgb4: Only validate the MSN for successful completions
Saeed Mahameed says:
====================
Mellanox, mlx5 E-Switch updates 2017-12-19
This series includes updates for mlx5 E-Switch infrastructures,
to be merged into net-next and rdma-next trees.
Mark's patches provide E-Switch refactoring that generalize the mlx5
E-Switch vf representors interfaces and data structures. The serious is
mainly focused on moving ethernet (netdev) specific representors logic out
of E-Switch (eswitch.c) into mlx5e representor module (en_rep.c), which
provides better separation and allows future support for other types of vf
representors (e.g. RDMA).
Gal's patches at the end of this serious, provide a simple syntax fix and
two other patches that handles vport ingress/egress ACL steering name
spaces to be aligned with the Firmware/Hardware specs.
V1->V2:
- Addressed coding style comments in patches #1 and #7
- The series is still based on rc4, as now I see net-next is also @rc4.
V2->V3:
- Fixed compilation warning, reported by Dave.
Please pull and let me know if there's any problem.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
divider_recalc_rate() is an helper function used by clock divider of
different types, so the structure containing the 'hw' pointer is not
always a 'struct clk_divider'
At the following line:
> div = _get_div(table, val, flags, divider->width);
in several cases, the value of 'divider->width' is garbage as the actual
structure behind this memory is not a 'struct clk_divider'
Fortunately, this width value is used by _get_val() only when
CLK_DIVIDER_MAX_AT_ZERO flag is set. This has never been the case so
far when the structure is not a 'struct clk_divider'. This is probably
why we did not notice this bug before
Fixes: afe76c8fd0 ("clk: allow a clk divider with max divisor when zero")
Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
Acked-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Acked-by: Sylvain Lemieux <slemieux.tyco@gmail.com>
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Each vport has its own root flow table for the ACL flow tables and root
flow table is per namespace, therefore we should create a namespace for
each vport.
Fixes: efdc810ba3 ("net/mlx5: Flow steering, Add vport ACL support")
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
This patch extends the alloc context flow to be prepared for working
with dynamic UAR allocations.
Currently upon alloc context there is some fix size of UARs that are
allocated (named 'static allocation') and there is no option to user
application to ask for more or control which UAR will be used by which
QP.
In this patch the driver prepares its data structures to manage both the
static and the dynamic allocations and let the user driver knows about
the max value of dynamic blue-flame registers that are allowed.
Downstream patches from this series will enable the dynamic allocation
and the association as part of QP creation.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Since the recent remote cpufreq callback work, its possible that a cpufreq
update is triggered from a remote CPU. For single policies however, the current
code uses the local CPU when trying to determine if the remote sg_cpu entered
idle or is busy. This is incorrect. To remedy this, compare with the nohz tick
idle_calls counter of the remote CPU.
Fixes: 674e75411f (sched: cpufreq: Allow remote cpufreq callbacks)
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Joel Fernandes <joelaf@google.com>
Cc: 4.14+ <stable@vger.kernel.org> # 4.14+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Adds a start argument to the shm_register callback to allow the callback
to check memory type of the passed pages.
Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>
Patches for 4.16 that are dependent on patches sent to 4.15-rc.
These are small clean ups for the vmw_pvrdma and i40iw drivers.
* 'from-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git:
RDMA/vmw_pvrdma: Remove usage of BIT() from UAPI header
RDMA/vmw_pvrdma: Use refcount_t instead of atomic_t
RDMA/vmw_pvrdma: Use more specific sizeof in kcalloc
RDMA/vmw_pvrdma: Clarify QP and CQ is_kernel logic
RDMA/vmw_pvrdma: Add UAR SRQ macros in ABI header file
i40iw: Change accelerated flag to bool
Daniel Borkmann says:
====================
pull-request: bpf-next 2017-12-28
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) Fix incorrect state pruning related to recognition of zero initialized
stack slots, where stacksafe exploration would mistakenly return a
positive pruning verdict too early ignoring other slots, from Gianluca.
2) Various BPF to BPF calls related follow-up fixes. Fix an off-by-one
in maximum call depth check, and rework maximum stack depth tracking
logic to fix a bypass of the total stack size check reported by Jann.
Also fix a bug in arm64 JIT where prog->jited_len was uninitialized.
Addition of various test cases to BPF selftests, from Alexei.
3) Addition of a BPF selftest to test_verifier that is related to BPF to
BPF calls which demonstrates a late caller stack size increase and
thus out of bounds access. Fixed above in 2). Test case from Jann.
4) Addition of correlating BPF helper calls, BPF to BPF calls as well
as BPF maps to bpftool xlated dump in order to allow for better
BPF program introspection and debugging, from Daniel.
5) Fixing several bugs in BPF to BPF calls kallsyms handling in order
to get it actually to work for subprogs, from Daniel.
6) Extending sparc64 JIT support for BPF to BPF calls and fix a couple
of build errors for libbpf on sparc64, from David.
7) Allow narrower context access for BPF dev cgroup typed programs in
order to adapt to LLVM code generation. Also adjust memlock rlimit
in the test_dev_cgroup BPF selftest, from Yonghong.
8) Add netdevsim Kconfig entry to BPF selftests since test_offload.py
relies on netdevsim device being available, from Jakub.
9) Reduce scope of xdp_do_generic_redirect_map() to being static,
from Xiongwei.
10) Minor cleanups and spelling fixes in BPF verifier, from Colin.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of computing max stack depth for current call chain
during the main verifier pass track stack depth of each
function independently and after do_check() is done do
another pass over all instructions analyzing depth
of all possible call stacks.
Fixes: f4d7e40a5b ("bpf: introduce function calls (verification)")
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Steffen Klassert says:
====================
pull request (net-next): ipsec-next 2017-12-22
1) Separate ESP handling from segmentation for GRO packets.
This unifies the IPsec GSO and non GSO codepath.
2) Add asynchronous callbacks for xfrm on layer 2. This
adds the necessary infrastructure to core networking.
3) Allow to use the layer2 IPsec GSO codepath for software
crypto, all infrastructure is there now.
4) Also allow IPsec GSO with software crypto for local sockets.
5) Don't require synchronous crypto fallback on IPsec offloading,
it is not needed anymore.
6) Check for xdo_dev_state_free and only call it if implemented.
From Shannon Nelson.
7) Check for the required add and delete functions when a driver
registers xdo_dev_ops. From Shannon Nelson.
8) Define xfrmdev_ops only with offload config.
From Shannon Nelson.
9) Update the xfrm stats documentation.
From Shannon Nelson.
Please pull or let me know if there are problems.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Since the hwlock id 0 is valid for hardware spinlock core, but now id 0
is treated as one invalid value for regmap. Thus we should add one extra
flag for regmap config to indicate if a hardware spinlock should be used,
then id 0 can be valid for regmap to request.
Signed-off-by: Baolin Wang <baolin.wang@linaro.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
ASSERT_RTNL() macro is actual open-coded variant of WARN_ONCE() with
two exceptions. First, it prints stack for multiple hits and not only
once as WARN_ONCE() does. Second, the user can disable prints of
WARN_ONCE by setting CONFIG_BUG to N.
The multiple prints of dump stack are actually not needed, because calls
without rtnl lock are programming errors and user can't do anything
about them except to complain to the mailing list after first occurrence
of such failure.
The user who disabled BUG/WARN prints did it explicitly because by default
in upstream kernel and distributions this option is enabled. It means
that user doesn't want to see prints about missing locks too.
This patch replaces open-coded variant in favor of already existing
macro and change error prints to be once only.
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The original purpose of the per-superblock d_anon list was to
keep disconnected dentries in the cache between consecutive
requests to the NFS server. Dentries can be disconnected if
a client holds a file open and repeatedly performs IO on it,
and if the server drops the dentry, whether due to memory
pressure, server restart, or "echo 3 > /proc/sys/vm/drop_caches".
This purpose was thwarted by commit 75a6f82a0d ("freeing unlinked
file indefinitely delayed") which caused disconnected dentries
to be freed as soon as their refcount reached zero.
This means that, when a dentry being used by nfsd gets disconnected, a
new one needs to be allocated for every request (unless requests
overlap). As the dentry has no name, no parent, and no children,
there is little of value to cache. As small memory allocations are
typically fast (from per-cpu free lists) this likely has little cost.
This means that the original purpose of s_anon is no longer relevant:
there is no longer any need to keep disconnected dentries on a list so
they appear to be hashed.
However, s_anon now has a new use. When you mount an NFS filesystem,
the dentry stored in s_root is just a placebo. The "real" root dentry
is allocated using d_obtain_root() and so it kept on the s_anon list.
I don't know the reason for this, but suspect it related to NFSv4
where a mount of "server:/some/path" require NFS to look up the root
filehandle on the server, then walk down "/some" and "/path" to get
the filehandle to mount.
Whatever the reason, NFS depends on the s_anon list and on
shrink_dcache_for_umount() pruning all dentries on this list. So we
cannot simply remove s_anon.
We could just leave the code unchanged, but apart from that being
potentially confusing, the (unfair) bit-spin-lock which protects
s_anon can become a bottle neck when lots of disconnected dentries are
being created.
So this patch renames s_anon to s_roots, and stops storing
disconnected dentries on the list. Only dentries obtained with
d_obtain_root() are now stored on this list. There are many fewer of
these (only NFS and NILFS2 use the call, and only during filesystem
mount) so contention on the bit-lock will not be a problem.
Possibly an alternate solution should be found for NFS and NILFS2, but
that would require understanding their needs first.
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>