introduce BPF_PROG_QUERY command to retrieve a set of either
attached programs to given cgroup or a set of effective programs
that will execute for events within a cgroup
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
for cgroup bits
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
introduce BPF_F_ALLOW_MULTI flag that can be used to attach multiple
bpf programs to a cgroup.
The difference between three possible flags for BPF_PROG_ATTACH command:
- NONE(default): No further bpf programs allowed in the subtree.
- BPF_F_ALLOW_OVERRIDE: If a sub-cgroup installs some bpf program,
the program in this cgroup yields to sub-cgroup program.
- BPF_F_ALLOW_MULTI: If a sub-cgroup installs some bpf program,
that cgroup program gets run in addition to the program in this cgroup.
NONE and BPF_F_ALLOW_OVERRIDE existed before. This patch doesn't
change their behavior. It only clarifies the semantics in relation
to new flag.
Only one program is allowed to be attached to a cgroup with
NONE or BPF_F_ALLOW_OVERRIDE flag.
Multiple programs are allowed to be attached to a cgroup with
BPF_F_ALLOW_MULTI flag. They are executed in FIFO order
(those that were attached first, run first)
The programs of sub-cgroup are executed first, then programs of
this cgroup and then programs of parent cgroup.
All eligible programs are executed regardless of return code from
earlier programs.
To allow efficient execution of multiple programs attached to a cgroup
and to avoid penalizing cgroups without any programs attached
introduce 'struct bpf_prog_array' which is RCU protected array
of pointers to bpf programs.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
for cgroup bits
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The rework of the core hotplug code triggers the WARN_ON in start_wd_cpu()
on powerpc because it is called multiple times for the boot CPU.
The first call is via:
start_wd_on_cpu+0x80/0x2f0
watchdog_nmi_reconfigure+0x124/0x170
softlockup_reconfigure_threads+0x110/0x130
lockup_detector_init+0xbc/0xe0
kernel_init_freeable+0x18c/0x37c
kernel_init+0x2c/0x160
ret_from_kernel_thread+0x5c/0xbc
And then again via the CPU hotplug registration:
start_wd_on_cpu+0x80/0x2f0
cpuhp_invoke_callback+0x194/0x620
cpuhp_thread_fun+0x7c/0x1b0
smpboot_thread_fn+0x290/0x2a0
kthread+0x168/0x1b0
ret_from_kernel_thread+0x5c/0xbc
This can be avoided by setting up the cpu hotplug state with nocalls and
move the initialization to the watchdog_nmi_probe() function. That
initializes the hotplug callbacks without invoking the callback and the
following core initialization function then configures the watchdog for the
online CPUs (in this case CPU0) via softlockup_reconfigure_threads().
Reported-and-tested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: linuxppc-dev@lists.ozlabs.org
In may, Steven sent a patch deleting the bounce buffer handling
and the CONFIG_MMC_BLOCK_BOUNCE option.
I chose the less invasive path of making it a runtime config
option, and we merged that successfully for kernel v4.12.
The code is however just standing in the way and taking up
space for seemingly no gain on any systems in wide use today.
Pierre says the code was there to improve speed on TI SDHCI
controllers on certain HP laptops and possibly some Ricoh
controllers as well. Early SDHCI controllers lacked the
scatter-gather feature, which made software bounce buffers
a significant speed boost.
We are clearly talking about the list of SDHCI PCI-based
MMC/SD card readers found in the pci_ids[] list in
drivers/mmc/host/sdhci-pci-core.c.
The TI SDHCI derivative is not supported by the upstream
kernel. This leaves the Ricoh.
What we can however notice is that the x86 defconfigs in the
kernel did not enable CONFIG_MMC_BLOCK_BOUNCE option, which
means that any such laptop would have to have a custom
configured kernel to actually take advantage of this
bounce buffer speed-up. It simply seems like there was
a speed optimization for the Ricoh controllers that noone
was using. (I have not checked the distro defconfigs but
I am pretty sure the situation is the same there.)
Bounce buffers increased performance on the OMAP HSMMC
at one point, and was part of the original submission in
commit a45c6cb816 ("[ARM] 5369/1: omap mmc: Add new
omap hsmmc controller for 2430 and 34xx, v3")
This optimization was removed in
commit 0ccd76d4c2 ("omap_hsmmc: Implement scatter-gather
emulation")
which found that scatter-gather emulation provided even
better performance.
The same was introduced for SDHCI in
commit 2134a922c6 ("sdhci: scatter-gather (ADMA) support")
I am pretty positively convinced that software
scatter-gather emulation will do for any host controller what
the bounce buffers were doing. Essentially, the bounce buffer
was a reimplementation of software scatter-gather-emulation in
the MMC subsystem, and it should be done away with.
Cc: Pierre Ossman <pierre@ossman.eu>
Cc: Juha Yrjola <juha.yrjola@solidboot.com>
Cc: Steven J. Hill <Steven.Hill@cavium.com>
Cc: Shawn Lin <shawn.lin@rock-chips.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Suggested-by: Steven J. Hill <Steven.Hill@cavium.com>
Suggested-by: Shawn Lin <shawn.lin@rock-chips.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
pfn_to_section_nr() and section_nr_to_pfn() are defined as macro.
pfn_to_section_nr() has no issue even if it is defined as macro. But
section_nr_to_pfn() has overflow issue if sec is defined as int.
section_nr_to_pfn() just shifts sec by PFN_SECTION_SHIFT. If sec is
defined as unsigned long, section_nr_to_pfn() returns pfn as 64 bit value.
But if sec is defined as int, section_nr_to_pfn() returns pfn as 32 bit
value.
__remove_section() calculates start_pfn using section_nr_to_pfn() and
scn_nr defined as int. So if hot-removed memory address is over 16TB,
overflow issue occurs and section_nr_to_pfn() does not calculate correct
pfn.
To make callers use proper arg, the patch changes the macros to inline
functions.
Fixes: 815121d2b5 ("memory_hotplug: clear zone when removing the memory")
Link: http://lkml.kernel.org/r/e643a387-e573-6bbf-d418-c60c8ee3d15e@gmail.com
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Cc: Reza Arbab <arbab@linux.vnet.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "exec: binfmt_misc: fix use-after-free, kill
iname[BINPRM_BUF_SIZE]".
It looks like this code was always wrong, then commit 948b701a60
("binfmt_misc: add persistent opened binary handler for containers")
added more problems.
This patch (of 6):
load_script() can simply use i_name instead, it points into bprm->buf[]
and nobody can change this memory until we call prepare_binprm().
The only complication is that we need to also change the signature of
bprm_change_interp() but this change looks good too.
While at it, do whitespace/style cleanups.
NOTE: the real motivation for this change is that people want to
increase BINPRM_BUF_SIZE, we need to change load_misc_binary() too but
this looks more complicated because afaics it is very buggy.
Link: http://lkml.kernel.org/r/20170918163446.GA26793@redhat.com
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Travis Gummels <tgummels@redhat.com>
Cc: Ben Woodard <woodard@redhat.com>
Cc: Jim Foraker <foraker1@llnl.gov>
Cc: <tdhooge@llnl.gov>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrea has noticed that the oom_reaper doesn't invalidate the range via
mmu notifiers (mmu_notifier_invalidate_range_start/end) and that can
corrupt the memory of the kvm guest for example.
tlb_flush_mmu_tlbonly already invokes mmu notifiers but that is not
sufficient as per Andrea:
"mmu_notifier_invalidate_range cannot be used in replacement of
mmu_notifier_invalidate_range_start/end. For KVM
mmu_notifier_invalidate_range is a noop and rightfully so. A MMU
notifier implementation has to implement either ->invalidate_range
method or the invalidate_range_start/end methods, not both. And if you
implement invalidate_range_start/end like KVM is forced to do, calling
mmu_notifier_invalidate_range in common code is a noop for KVM.
For those MMU notifiers that can get away only implementing
->invalidate_range, the ->invalidate_range is implicitly called by
mmu_notifier_invalidate_range_end(). And only those secondary MMUs
that share the same pagetable with the primary MMU (like AMD iommuv2)
can get away only implementing ->invalidate_range"
As the callback is allowed to sleep and the implementation is out of
hand of the MM it is safer to simply bail out if there is an mmu
notifier registered. In order to not fail too early make the
mm_has_notifiers check under the oom_lock and have a little nap before
failing to give the current oom victim some more time to exit.
[akpm@linux-foundation.org: coding-style fixes]
Link: http://lkml.kernel.org/r/20170913113427.2291-1-mhocko@kernel.org
Fixes: aac4536355 ("mm, oom: introduce oom reaper")
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Andrea Arcangeli <aarcange@redhat.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Device alias can be set by either rtnetlink (rtnl is held) or sysfs.
rtnetlink hold the rtnl mutex, sysfs acquires it for this purpose.
Add an extra mutex for it and use rcu to protect concurrent accesses.
This allows the sysfs path to not take rtnl and would later allow
to not hold it when dumping ifalias.
Based on suggestion from Eric Dumazet.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to allow the ipmr module to do partial multicast forwarding
according to the device parent ID, add the device parent ID field to the
VIF struct. This way, the forwarding path can use the parent ID field
without invoking switchdev calls, which requires the RTNL lock.
When a new VIF is added, set the device parent ID field in it by invoking
the switchdev_port_attr_get call.
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Similarly to the offload_fwd_mark field, the offload_mr_fwd_mark field is
used to allow partial offloading of MFC multicast routes.
Switchdev drivers can offload MFC multicast routes to the hardware by
registering to the FIB notification chain. When one of the route output
interfaces is not offload-able, i.e. has different parent ID, the route
cannot be fully offloaded by the hardware. Examples to non-offload-able
devices are a management NIC, dummy device, pimreg device, etc.
Similar problem exists in the bridge module, as one bridge can hold
interfaces with different parent IDs. At the bridge, the problem is solved
by the offload_fwd_mark skb field.
Currently, when a route cannot go through full offload, the only solution
for a switchdev driver is not to offload it at all and let the packet go
through slow path.
Using the offload_mr_fwd_mark field, a driver can indicate that a packet
was already forwarded by hardware to all the devices with the same parent
ID as the input device. Further patches in this patch-set are going to
enhance ipmr to skip multicast forwarding to devices with the same parent
ID if a packets is marked with that field.
The reason why the already existing "offload_fwd_mark" bit cannot be used
is that a switchdev driver would want to make the distinction between a
packet that has already gone through L2 forwarding but did not go through
multicast forwarding, and a packet that has already gone through both L2
and multicast forwarding.
For example: when a packet is ingressing from a switchport enslaved to a
bridge, which is configured with multicast forwarding, the following
scenarios are possible:
- The packet can be trapped to the CPU due to exception while multicast
forwarding (for example, MTU error). In that case, it had already gone
through L2 forwarding in the hardware, thus A switchdev driver would
want to set the skb->offload_fwd_mark and not the
skb->offload_mr_fwd_mark.
- The packet can also be trapped due to a pimreg/dummy device used as one
of the output interfaces. In that case, it can go through both L2 and
(partial) multicast forwarding inside the hardware, thus a switchdev
driver would want to set both the skb->offload_fwd_mark and
skb->offload_mr_fwd_mark.
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellaox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull staging/IIO fixes from Greg KH:
"Here are some small staging/IIO driver fixes for 4.14-rc4
Most of these have been in my tree for a while due to travels, sorry
for the delay. They resolve a number of small issues reported by
people, mostly for the iio drivers. Nothing major in here, full
details are in the shortlog.
All have been linux-next for a few weeks with no reported issues"
* tag 'staging-4.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (23 commits)
staging: iio: ad7192: Fix - use the dedicated reset function avoiding dma from stack.
iio: core: Return error for failed read_reg
iio: ad7793: Fix the serial interface reset
iio: ad_sigma_delta: Implement a dedicated reset function
IIO: BME280: Updates to Humidity readings need ctrl_reg write!
iio: adc: mcp320x: Fix readout of negative voltages
iio: adc: mcp320x: Fix oops on module unload
iio: adc: stm32: fix bad error check on max_channels
iio: trigger: stm32-timer: fix a corner case to write preset
iio: trigger: stm32-timer: preset shouldn't be buffered
iio: adc: twl4030: Return an error if we can not enable the vusb3v1 regulator in 'twl4030_madc_probe()'
iio: adc: twl4030: Disable the vusb3v1 rugulator in the error handling path of 'twl4030_madc_probe()'
iio: adc: twl4030: Fix an error handling path in 'twl4030_madc_probe()'
staging: rtl8723bs: avoid null pointer dereference on pmlmepriv
staging: rtl8723bs: add missing range check on id
staging: vchiq_2835_arm: Fix NULL ptr dereference in free_pagelist
staging: speakup: fix speakup-r empty line lockup
staging: pi433: Move limit check to switch default to kill warning
staging: r8822be: fix null pointer dereferences with a null driver_adapter
staging: mt29f_spinand: Enable the read ECC before program the page
...
Pull driver core fixes from Greg KH:
"Here are a few small fixes for 4.14-rc4.
The removal of DRIVER_ATTR() was almost completed by 4.14-rc1, but one
straggler made it in through some other tree (odds are, one of
mine...) So there's a simple removal of the last user, and then
finally the macro is removed from the tree.
There's a fix for old crazy udev instances that insist on reloading a
module when it is removed from the kernel due to the new uevents for
bind/unbind. This fixes the reported regression, hopefully some year
in the future we can drop the workaround, once users update to the
latest version, but I'm not holding my breath.
And then there's a build fix for a linker warning, and a buffer
overflow fix to match the PCI fixes you took through the PCI tree in
the same area.
All of these have been in linux-next for a few weeks while I've been
traveling, sorry for the delay"
* tag 'driver-core-4.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
driver core: remove DRIVER_ATTR
fpga: altera-cvp: remove DRIVER_ATTR() usage
driver core: platform: Don't read past the end of "driver_override" buffer
base: arch_topology: fix section mismatch build warnings
driver core: suppress sending MODALIAS in UNBIND uevents
In order to support things like networking over Thunderbolt cable, there
needs to be a way to switch the ring to a mode where it can be polled
with the interrupt masked. We implement such mode so that the caller can
allocate a ring by passing pointer to a function that is then called
when an interrupt is triggered. Completed frames can be fetched using
tb_ring_poll() and the interrupt can be re-enabled when the caller is
finished with polling by using tb_ring_poll_complete().
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Michael Jamet <michael.jamet@intel.com>
Reviewed-by: Yehezkel Bernat <yehezkel.bernat@intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
These are used by Thunderbolt services to send and receive frames over
the high-speed DMA rings.
We also put the functions to tb_ namespace to make sure we do not
collide with others and add missing kernel-doc comments for the exported
functions.
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Michael Jamet <michael.jamet@intel.com>
Reviewed-by: Yehezkel Bernat <yehezkel.bernat@intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When two hosts are connected over a Thunderbolt cable, there is a
protocol they can use to communicate capabilities supported by the host.
The discovery protocol uses automatically configured control channel
(ring 0) and is build on top of request/response transactions using
special XDomain primitives provided by the Thunderbolt base protocol.
The capabilities consists of a root directory block of basic properties
used for identification of the host, and then there can be zero or more
directories each describing a Thunderbolt service and its capabilities.
Once both sides have discovered what is supported the two hosts can
setup high-speed DMA paths and transfer data to the other side using
whatever protocol was agreed based on the properties. The software
protocol used to communicate which DMA paths to enable is service
specific.
This patch adds support for the XDomain discovery protocol to the
Thunderbolt bus. We model each remote host connection as a Linux XDomain
device. For each Thunderbolt service found supported on the XDomain
device, we create Linux Thunderbolt service device which Thunderbolt
service drivers can then bind to based on the protocol identification
information retrieved from the property directory describing the
service.
This code is based on the work done by Amir Levy and Michael Jamet.
Signed-off-by: Michael Jamet <michael.jamet@intel.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Yehezkel Bernat <yehezkel.bernat@intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A Thunderbolt service might need to find the physical port from a link
the cable is connected to. For instance networking driver uses this
information to generate MAC address according the Apple ThunderboltIP
protocol.
Move this function to thunderbolt.h and rename it to
tb_phy_port_from_link() to reflect the fact that it does not take switch
as parameter.
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Michael Jamet <michael.jamet@intel.com>
Reviewed-by: Yehezkel Bernat <yehezkel.bernat@intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Thunderbolt XDomain discovery protocol uses directories which contain
properties and other directories to exchange information about what
capabilities the remote host supports. This also includes identification
information like device ID and name.
This adds support for parsing and formatting these properties and
establishes an API drivers can use in addition to the core Thunderbolt
driver. This API is exposed in a new header: include/linux/thunderbolt.h.
This code is based on the work done by Amir Levy and Michael Jamet.
Signed-off-by: Michael Jamet <michael.jamet@intel.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Yehezkel Bernat <yehezkel.bernat@intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull timer fixes from Thomas Gleixner:
"This adds a new timer wheel function which is required for the
conversion of the timer callback function from the 'unsigned long
data' argument to 'struct timer_list *timer'. This conversion has two
benefits:
1) It makes struct timer_list smaller
2) Many callers hand in a pointer to the timer or to the structure
containing the timer, which happens via type casting both at setup
and in the callback. This change gets rid of the typecasts.
Once the conversion is complete, which is planned for 4.15, the old
setup function and the intermediate typecast in the new setup function
go away along with the data field in struct timer_list.
Merging this now into mainline allows a smooth queueing of the actual
conversion in the affected maintainer trees without creating
dependencies"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
um/time: Fixup namespace collision
timer: Prepare to change timer callback argument type
Pull smp/hotplug fixes from Thomas Gleixner:
"This addresses the fallout of the new lockdep mechanism which covers
completions in the CPU hotplug code.
The lockdep splats are false positives, but there is no way to
annotate that reliably. The solution is to split the completions for
CPU up and down, which requires some reshuffling of the failure
rollback handling as well"
* 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
smp/hotplug: Hotplug state fail injection
smp/hotplug: Differentiate the AP completion between up and down
smp/hotplug: Differentiate the AP-work lockdep class between up and down
smp/hotplug: Callback vs state-machine consistency
smp/hotplug: Rewrite AP state machine core
smp/hotplug: Allow external multi-instance rollback
smp/hotplug: Add state diagram
Pull scheduler fixes from Thomas Gleixner:
"The scheduler pull request comes with the following updates:
- Prevent a divide by zero issue by validating the input value of
sysctl_sched_time_avg
- Make task state printing consistent all over the place and have
explicit state characters for IDLE and PARKED so they wont be
displayed as 'D' state which confuses tools"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/sysctl: Check user input value of sysctl_sched_time_avg
sched/debug: Add explicit TASK_PARKED printing
sched/debug: Ignore TASK_IDLE for SysRq-W
sched/debug: Add explicit TASK_IDLE printing
sched/tracing: Use common task-state helpers
sched/tracing: Fix trace_sched_switch task-state printing
sched/debug: Remove unused variable
sched/debug: Convert TASK_state to hex
sched/debug: Implement consistent task-state printing
With DSA, a master net device (CPU facing interface) has a dsa_ptr
pointer to which hangs a dsa_switch_tree. This is not correct because a
master interface is wired to a dedicated switch port, and because we can
theoretically have several master interfaces pointing to several CPU
ports of the same switch fabric.
Change the master interface's dsa_ptr for the CPU dsa_port pointer.
This is a step towards supporting multiple CPU ports.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently the PF allocates a default number of queues for each VF and
cannot be changed. This patch enables the VF to request a different
number of queues allocated to it. This patch also adds a new virtchnl
op and capability flag to facilitate this negotiation.
After the PF receives a request message, it will set a requested number
of queues for that VF. Then when the VF resets, its VSI will get a new
number of queues allocated to it.
This is a best effort request and since we only allocate a guaranteed
default number, if the VF tries to ask for more than the guaranteed
number, there may not be enough in HW to accommodate it unless other
queues for other VFs are freed. It should also be noted decreasing the
number queues allocated to a VF to below the default will NOT enable the
allocation of more than 32 VFs per PF and will not free queues guaranteed
to each VF by default.
Signed-off-by: Alan Brady <alan.brady@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Pull IOMMU fixes from Joerg Roedel:
- A comment fix for 'struct iommu_ops'
- Format string fixes for AMD IOMMU, unfortunatly I missed that during
review.
- Limit mediatek physical addresses to 32 bit for v7s to fix a warning
triggered in io-page-table code.
- Fix dma-sync in io-pgtable-arm-v7s code
* tag 'iommu-fixes-v4.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu: Fix comment for iommu_ops.map_sg
iommu/amd: pr_err() strings should end with newlines
iommu/mediatek: Limit the physical address in 32bit for v7s
iommu/io-pgtable-arm-v7s: Need dma-sync while there is no QUIRK_NO_DMA
It is possible for ebt_in_hook to be triggered before ebt_table is assigned
resulting in a NULL-pointer dereference. Make sure hooks are
registered as the last step.
Fixes: aee12a0a37 ("ebtables: remove nf_hook_register usage")
Signed-off-by: Artem Savkov <asavkov@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Currently TASK_PARKED is masqueraded as TASK_INTERRUPTIBLE, give it
its own print state because it will not in fact get woken by regular
wakeups and is a long-term state.
This requires moving TASK_PARKED into the TASK_REPORT mask, and since
that latter needs to be a contiguous bitmask, we need to shuffle the
bits around a bit.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This patch allows userspace to specify a name for a map
during BPF_MAP_CREATE.
The map's name can later be exported to user space
via BPF_OBJ_GET_INFO_BY_FD.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The patch adds name and load_time to struct bpf_prog_aux. They
are also exported to bpf_prog_info.
The bpf_prog's name is passed by userspace during BPF_PROG_LOAD.
The kernel only stores the first (BPF_PROG_NAME_LEN - 1) bytes
and the name stored in the kernel is always \0 terminated.
The kernel will reject name that contains characters other than
isalnum() and '_'. It will also reject name that is not null
terminated.
The existing 'user->uid' of the bpf_prog_aux is also exported to
the bpf_prog_info as created_by_uid.
The existing 'used_maps' of the bpf_prog_aux is exported to
the newly added members 'nr_map_ids' and 'map_ids' of
the bpf_prog_info. On the input, nr_map_ids tells how
big the userspace's map_ids buffer is. On the output,
nr_map_ids tells the exact user_map_cnt and it will only
copy up to the userspace's map_ids buffer is allowed.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Negative ARP header length are not a thing.
Constify arguments while I'm at it.
Space savings:
add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-3 (-3)
function old new delta
arpt_do_table 1163 1160 -3
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Modern kernel callback systems pass the structure associated with a
given callback to the callback function. The timer callback remains one
of the legacy cases where an arbitrary unsigned long argument continues
to be passed as the callback argument. This has several problems:
- This bloats the timer_list structure with a normally redundant
.data field.
- No type checking is being performed, forcing callbacks to do
explicit type casts of the unsigned long argument into the object
that was passed, rather than using container_of(), as done in most
of the other callback infrastructure.
- Neighboring buffer overflows can overwrite both the .function and
the .data field, providing attackers with a way to elevate from a buffer
overflow into a simplistic ROP-like mechanism that allows calling
arbitrary functions with a controlled first argument.
- For future Control Flow Integrity work, this creates a unique function
prototype for timer callbacks, instead of allowing them to continue to
be clustered with other void functions that take a single unsigned long
argument.
This adds a new timer initialization API, which will ultimately replace
the existing setup_timer(), setup_{deferrable,pinned,etc}_timer() family,
named timer_setup() (to mirror hrtimer_setup(), making instances of its
use much easier to grep for).
In order to support the migration of existing timers into the new
callback arguments, timer_setup() casts its arguments to the existing
legacy types, and explicitly passes the timer pointer as the legacy
data argument. Once all setup_*timer() callers have been replaced with
timer_setup(), the casts can be removed, and the data argument can be
dropped with the timer expiration code changed to just pass the timer
to the callback directly.
Since the regular pattern of using container_of() during local variable
declaration repeats the need for the variable type declaration
to be included, this adds a helper modeled after other from_*()
helpers that wrap container_of(), named from_timer(). This helper uses
typeof(*variable), removing the type redundancy and minimizing the need
for line wraps in forthcoming conversions from "unsigned data long" to
"struct timer_list *" in the timer callbacks:
-void callback(unsigned long data)
+void callback(struct timer_list *t)
{
- struct some_data_structure *local = (struct some_data_structure *)data;
+ struct some_data_structure *local = from_timer(local, t, timer);
Finally, in order to support the handful of timer users that perform
open-coded assignments of the .function (and .data) fields, provide
cast macros (TIMER_FUNC_TYPE and TIMER_DATA_TYPE) that can be used
temporarily. Once conversion has been completed, these can be globally
trivially removed.
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20170928133817.GA113410@beast
Added check for the maximal number of flow counters attached
to rule (FTE).
Fixes: bd5251dbf1 ('net/mlx5_core: Introduce flow steering destination of type counter')
Signed-off-by: Raed Salem <raeds@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Currently, FPGA capability is located in (mdev)->caps.hca_cur,
change the location to be (mdev)->caps.fpga,
since hca_cur is reserved for HCA device capabilities.
Fixes: e29341fb3a ("net/mlx5: FPGA, Add basic support for Innova")
Signed-off-by: Inbar Karmy <inbark@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>