KVM devices were manipulating list data structures without any form of
synchronization, and some implementations of the create operations also
suffered from a lack of synchronization.
Now that we've split the xics create operation into create and init, we
can hold the kvm->lock mutex while calling the create operation and while
manipulating the devices list.
The error path in the generic code gets slightly ugly because we have to
take the mutex again and delete the device from the list, but holding
the mutex during anon_inode_getfd or releasing/locking the mutex in the
common non-error path seemed wrong.
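Concretely, the create path and its error handling now look roughly like
this (a simplified sketch of the generic code; details may differ):

  mutex_lock(&kvm->lock);
  ret = ops->create(dev, type);
  if (ret < 0) {
          mutex_unlock(&kvm->lock);
          kfree(dev);
          return ret;
  }
  list_add(&dev->vm_node, &kvm->devices);
  mutex_unlock(&kvm->lock);

  ret = anon_inode_getfd(ops->name, &kvm_device_fops, dev, O_RDWR);
  if (ret < 0) {
          /* the slightly ugly part: re-take the lock to undo the list add */
          mutex_lock(&kvm->lock);
          list_del(&dev->vm_node);
          mutex_unlock(&kvm->lock);
          ops->destroy(dev);
          return ret;
  }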
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
As we are about to hold the kvm->lock during the create operation on KVM
devices, we should move the call to xics_debugfs_init into its own
function, since holding a mutex over extended amounts of time might not
be a good idea.
Introduce an init operation on the kvm_device_ops struct which cannot
fail and call this, if configured, after the device has been created.
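Sketched (simplified; member layout of the real struct may differ):

  struct kvm_device_ops {
          const char *name;
          int (*create)(struct kvm_device *dev, u32 type);
          /* called after a successful create, outside kvm->lock;
           * must not fail */
          void (*init)(struct kvm_device *dev);
          void (*destroy)(struct kvm_device *dev);
  };

  /* in the generic create ioctl, after dropping kvm->lock: */
  if (ops->init)
          ops->init(dev);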
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Create a new header file for TI DA8XX SoC CFGCHIPx registers.
This will be used by a number of planned drivers including a new USB
PHY driver and common clock framework drivers.
The same defines *will* be removed from the platform_data header,
once all the users start using the new syscon device header.
This also fixes the following compiler error, caused by a dependent
patch not yet being merged:
drivers/phy/phy-da8xx-usb.c:19:37:
fatal error: linux/mfd/da8xx-cfgchip.h: No such file or directory
#include <linux/mfd/da8xx-cfgchip.h>
Signed-off-by: David Lechner <david@lechnology.com>
Acked-by: Lee Jones <lee.jones@linaro.org>
Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
Due to the (indirect) nesting of min(..., min(...)), sparse will
show a variable shadowing warning whenever bvec.h is included.
Avoid that by assigning the inner min() to a temporary variable first.
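A generic illustration of the pattern (not the exact bvec.h code):

  /* before: each min() expands to a ({ ... }) block with its own
   * temporaries, and the nested ones shadow the outer ones */
  len = min(a, min(b, c));

  /* after: hoist the inner min() into a temporary first */
  size_t inner = min(b, c);
  len = min(a, inner);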
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
The Nomadik variant has a few special quirks that need to be respected
to make the driver work:
- The block needs to be clocked while writing the TIMn registers,
or the bus will stall.
- Special bits in the control register select how many of the output
display lines get activated.
- Special bits in the control register select how to manage the
different 565 and 5551 modes.
- There is a packed 24bit graphics mode, i.e. 888 pixels can be stored
in memory as three consecutive bytes, not evenly aligned to a 32bit
word.
This patch uses the vendor data pointer from the AMBA matching mechanism
to track the quirks for this variant, and adds two hooks that variants
can use to initialize boards and panels during start-up. These will
later be used to adopt a Nomadik board profile.
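As a sketch, the vendor data might carry the quirks like this (field
names here are illustrative, not necessarily the final ones):

  struct clcd_vendor_data {
          bool    clock_timregs;          /* keep block clocked for TIMn writes */
          bool    packed_24_bit_pixels;   /* 888 packed into 3 bytes */
          bool    st_bitmux_control;      /* ctrl bits for 565/5551 muxing */
          int     (*init_board)(struct amba_device *adev,
                                struct clcd_board *board);
          int     (*init_panel)(struct clcd_fb *fb,
                                struct device_node *panel);
  };

  /* selected through the AMBA id table's .data pointer, e.g.: */
  { .id = 0x00041110, .mask = 0x000ffffe, .data = &vendor_nomadik },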
Cc: Pawel Moll <pawel.moll@arm.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Russell King <linux@arm.linux.org.uk>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
There are CLCDs connected with the pads in BGR rather than RGB
order. It really doesn't matter since the CLCD has a flag and
a bit to switch the position of the RGB and BGR components.
This is needed to put something logical into the
arm,pl11x,tft-r0g0b0-pads property of the device tree on the
Nomadik which will then be <16 8 0>.
Cc: Pawel Moll <pawel.moll@arm.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Russell King <linux@arm.linux.org.uk>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
If the device is probed from device tree, we can support
backlight. This is used with some systems such as the
ST Microelectronics Nomadik.
We have to add HAS_IOMEM to the dependencies of CLCD since
the backlight class device will now be selected, and if it
gets selected on an arch that does not have IOMEM,
compilation will fail.
Cc: Pawel Moll <pawel.moll@arm.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Russell King <linux@arm.linux.org.uk>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
Signalling doesn't need to be enabled at sync_file creation; it is only
required if userspace is waiting for the fence to signal through poll().
Thus we delay fence_add_callback() until poll() is called, and only add
the callback the first time poll() is called. This avoids re-adding the
same callback multiple times.
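The poll path then becomes roughly this (simplified sketch; POLL_ENABLED
names the user bit taken from the fence flags):

  static unsigned int sync_file_poll(struct file *file, poll_table *wait)
  {
          struct sync_file *sync_file = file->private_data;

          poll_wait(file, &sync_file->wq, wait);

          /* enable signalling only on the first poll() */
          if (!test_and_set_bit(POLL_ENABLED, &sync_file->fence->flags)) {
                  if (fence_add_callback(sync_file->fence, &sync_file->cb,
                                         fence_check_cb_func) < 0)
                          wake_up_all(&sync_file->wq);
          }

          return fence_is_signaled(sync_file->fence) ? POLLIN : 0;
  }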
v2: rebase and update to work with new fence support for sync_file
v3: use atomic operation to set enabled and protect fence_add_callback()
v4: use user bit from fence flags (comment from Chris Wilson)
v5: use ternary if on poll return (comment from Chris Wilson)
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org>
[sumits: remove unused var status]
Link: http://patchwork.freedesktop.org/patch/msgid/1470404378-27961-1-git-send-email-gustavo@padovan.org
Create a function that, given a sync file descriptor, returns a
fence containing all fences in the sync_file.
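A sketch of its shape (simplified, assuming the file's internal
sync_file_fdget() helper):

  struct fence *sync_file_get_fence(int fd)
  {
          struct sync_file *sync_file;
          struct fence *fence;

          sync_file = sync_file_fdget(fd);
          if (!sync_file)
                  return NULL;

          /* hold a reference for the fence we return */
          fence = fence_get(sync_file->fence);
          fput(sync_file->file);

          return fence;
  }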
v2: Comments by Daniel Vetter
- Adapt to new version of fence_collection_init()
- Hold a reference for the fence we return
v3:
- Adapt to use fput() directly
- rename to sync_file_get_fence() as we always return one fence
v4: Adapt to use fence_array
v5: set fence through fence_get()
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org>
Create sync_file->fence to abstract the type of fence we are using for
each sync_file. If only one fence is present we use a normal struct fence,
but if there are more fences to be added to the sync_file a fence_array
is created.
This change cleans up sync_file a bit. We don't need to have sync_file_cb
array anymore. Instead, as we always have one fence, only one fence
callback is registered per sync_file.
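The selection logic, roughly (simplified sketch):

  /* one fence: just take a reference */
  sync_file->fence = fence_get(fence);

  /* several fences: wrap them in a fence_array */
  array = fence_array_create(num_fences, fences,
                             fence_context_alloc(1), 1, false);
  if (!array)
          return -ENOMEM;
  sync_file->fence = &array->base;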
v2: Comments from Chris Wilson and Christian König
- Not using fence_ops anymore
- fence_is_array() was created to differentiate fence from fence_array
- fence_array_teardown() is now exported and used under fence_is_array()
- struct sync_file lost num_fences member
v3: Comments from Chris Wilson and Christian König
- struct sync_file lost status member in favor of fence_is_signaled()
- drop use of fence_array_teardown()
- use sizeof(*fence) to allocate only an array on fence pointers
v4: Comments from Chris Wilson
- use sizeof(*fence) to reallocate array
- fix typo in comments
- protect num_fences sum against overflows
- use array->base instead of casting the to struct fence
v5: fixes checkpatch warnings
v6: fix case where all fences are signaled.
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org>
This patch adds the driver implementation for ethtool link_ksettings
callbacks. qed driver now defines/uses the qed specific masks for
representing link capability values. The qede driver maps these values
to the new link modes defined by the kernel implementation of link_ksettings.
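The get path follows the usual link_ksettings pattern; a hedged sketch
(the struct fields and QED_LM_* mask names here are illustrative):

  static int qede_get_link_ksettings(struct net_device *ndev,
                                     struct ethtool_link_ksettings *cmd)
  {
          struct qede_dev *edev = netdev_priv(ndev);

          ethtool_link_ksettings_zero_link_mode(cmd, supported);
          /* translate the qed-specific capability bits */
          if (edev->supported_caps & QED_LM_10000baseKR_Full_BIT)
                  ethtool_link_ksettings_add_link_mode(cmd, supported,
                                                       10000baseKR_Full);
          cmd->base.speed = edev->speed;
          cmd->base.duplex = DUPLEX_FULL;
          return 0;
  }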
Please consider applying this to 'net-next' branch.
Signed-off-by: Sudarsana Reddy Kalluru <sudarsana.kalluru@qlogic.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Convert the per-device linked list into a hashtable. The primary
motivation for this change is that currently, we're not tracking all the
qdiscs in hierarchy (e.g. excluding default qdiscs), as the lookup
performed over the linked list by qdisc_match_from_root() is rather
expensive.
The ultimate goal is to get rid of hidden qdiscs completely, which will
bring much more determinism in user experience.
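Conceptually, lookups move from a list walk to a hash lookup (simplified
sketch using the generic hashtable helpers):

  /* registration: hash on the qdisc handle */
  hash_add_rcu(qdisc_dev(q)->qdisc_hash, &q->hash, q->handle);

  /* lookup: average O(1) instead of walking every qdisc */
  hash_for_each_possible_rcu(qdisc_dev(root)->qdisc_hash, q,
                             hash, handle) {
          if (q->handle == handle)
                  return q;
  }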
Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some versions of tps65218 do not seem to support poweroff modes properly
if the DCDC3 regulator is shut down. Thus, keep it enabled even during
poweroff if the version info matches the broken silicon revision.
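A minimal sketch of the check (identifier names illustrative):

  /* DCDC3 must stay on during poweroff on the broken revision */
  if (rdev_get_id(rdev) == TPS65218_DCDC_3 &&
      tps->rev == TPS65218_REV_2_1)
          return 0;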
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Dave Gerlach <d-gerlach@ti.com>
Signed-off-by: Keerthy <j-keerthy@ti.com>
Acked-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
Version information will be needed to handle some error cases under the
regulator driver, so store the information once during MFD probe.
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Dave Gerlach <d-gerlach@ti.com>
Signed-off-by: Keerthy <j-keerthy@ti.com>
Acked-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
cgroup_path() and friends used to format the path from the end and
thus the resulting path usually didn't start at the start of the
passed in buffer. Also, when the buffer was too small, the partial
result was truncated from the head rather than tail and there was no
way to tell how long the full path would be. These make the functions
less robust and more awkward to use.
With recent updates to kernfs_path(), cgroup_path() and friends can be
made to behave in strlcpy() style.
* cgroup_path(), cgroup_path_ns[_locked]() and task_cgroup_path() now
always return the length of the full path. If buffer is too small,
it contains nul terminated truncated output.
* All users updated accordingly.
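Callers can now detect truncation strlcpy()-style (illustrative):

  char buf[PATH_MAX];
  int len = cgroup_path(cgrp, buf, PATH_MAX);

  if (len >= PATH_MAX)
          /* buf holds a nul-terminated, truncated path */
          pr_warn("cgroup path truncated, need %d bytes\n", len + 1);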
v2: cgroup_path() usage in kernel/sched/debug.c converted.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Serge Hallyn <serge.hallyn@ubuntu.com>
Cc: Peter Zijlstra <peterz@infradead.org>
It doesn't have any in-kernel user and the same result can be obtained
from kernfs_path(@kn, NULL, 0). Remove it.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Serge Hallyn <serge.hallyn@ubuntu.com>
kernfs_path*() functions always return the length of the full path but
the path content is undefined if the length is larger than the
provided buffer. This makes its behavior different from strlcpy() and
requires error handling in all its users even when they don't care
about truncation. In addition, the implementation can actually be
simplified by making it behave properly in strlcpy() style.
* Update kernfs_path_from_node_locked() to always fill up the buffer
with path. If the buffer is not large enough, the output is
truncated and terminated.
* kernfs_path() no longer needs error handling. Make it a simple
inline wrapper around kernfs_path_from_node().
* sysfs_warn_dup()'s use of kernfs_path() doesn't need error handling.
Updated accordingly.
* cgroup_path()'s use of kernfs_path() updated to retain the old
behavior.
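The wrapper mentioned above then becomes trivial (sketch):

  static inline int kernfs_path(struct kernfs_node *kn, char *buf,
                                size_t buflen)
  {
          return kernfs_path_from_node(kn, NULL, buf, buflen);
  }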
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Serge Hallyn <serge.hallyn@ubuntu.com>
The dummy version of kernfs_path_from_node() was missing. This
currently doesn't break anything. Let's add it for consistency and to
ease adding wrappers around it.
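For reference, the dummy is just a static inline stub in the
!CONFIG_KERNFS branch (sketch):

  static inline int kernfs_path_from_node(struct kernfs_node *root_kn,
                                          struct kernfs_node *kn,
                                          char *buf, size_t buflen)
  { return -ENOSYS; }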
v2: Removed stray ';' which was causing build failures.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This message is currently really useless since it always prints a value
that comes from the printk() we just did, e.g.:
BUG: sleeping function called from invalid context at mm/slab.h:388
in_atomic(): 0, irqs_disabled(): 0, pid: 31996, name: trinity-c1
Preemption disabled at:[<ffffffff8119db33>] down_trylock+0x13/0x80
BUG: sleeping function called from invalid context at include/linux/freezer.h:56
in_atomic(): 0, irqs_disabled(): 0, pid: 31996, name: trinity-c1
Preemption disabled at:[<ffffffff811aaa37>] console_unlock+0x2f7/0x930
Here, both down_trylock() and console_unlock() are somewhere in the
printk() path.
We should save the value before calling printk() and use the saved value
instead. That immediately reveals the offending callsite:
BUG: sleeping function called from invalid context at mm/slab.h:388
in_atomic(): 0, irqs_disabled(): 0, pid: 14971, name: trinity-c2
Preemption disabled at:[<ffffffff819bcd46>] rhashtable_walk_start+0x46/0x150
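A sketch of the change inside ___might_sleep() (the accessor name here
is illustrative):

  unsigned long preempt_disable_ip;

  /* snapshot before our own printk()s re-disable preemption and
   * overwrite the recorded IP */
  preempt_disable_ip = get_preempt_disable_ip(current);

  printk(KERN_ERR
         "BUG: sleeping function called from invalid context at %s:%d\n",
         file, line);
  printk(KERN_ERR "Preemption disabled at:");
  print_ip_sym(preempt_disable_ip);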
Bug report:
http://marc.info/?l=linux-netdev&m=146925979821849&w=2
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rusty Russel <rusty@rustcorp.com.au>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Currently the percpu-rwsem switches to (global) atomic ops while a
writer is waiting; which could be quite a while and slows down
releasing the readers.
This patch cures this problem by ordering the reader-state vs
reader-count (see the comments in __percpu_down_read() and
percpu_down_write()). This changes a global atomic op into a full
memory barrier, which doesn't have the global cacheline contention.
This also enables using the percpu-rwsem with rcu_sync disabled in order
to bias the implementation differently, reducing the writer latency by
adding some cost to readers.
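Usage is unchanged; only the implementation's ordering scheme changes.
For reference, a trivial usage sketch:

  static DEFINE_STATIC_PERCPU_RWSEM(my_sem);

  /* reader: per-CPU count plus a full barrier, no global atomic */
  percpu_down_read(&my_sem);
  /* ... read-side critical section ... */
  percpu_up_read(&my_sem);

  /* writer: flips readers to the slow path, then excludes them */
  percpu_down_write(&my_sem);
  /* ... write-side critical section ... */
  percpu_up_write(&my_sem);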
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
[ Fixed modular build. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
For perf record -b, which requires the pmu::sched_task callback, the
current code is rather expensive:
7.68% sched-pipe [kernel.vmlinux] [k] perf_pmu_sched_task
5.95% sched-pipe [kernel.vmlinux] [k] __switch_to
5.20% sched-pipe [kernel.vmlinux] [k] __intel_pmu_disable_all
3.95% sched-pipe perf [.] worker_thread
The problem is that it will iterate all registered PMUs, most of which
will not have anything to do. Avoid this by keeping an explicit list
of PMUs that have requested the callback.
The perf_sched_cb_{inc,dec}() functions already take the required pmu
argument, and now that these functions are no longer called from NMI
context we can use them to manage a list.
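Conceptually (simplified sketch, locking elided):

  void perf_sched_cb_inc(struct pmu *pmu)
  {
          struct perf_cpu_context *cpuctx =
                  this_cpu_ptr(pmu->pmu_cpu_context);

          if (!cpuctx->sched_cb_usage++)
                  list_add(&cpuctx->sched_cb_entry,
                           this_cpu_ptr(&sched_cb_list));
  }

  /* at context-switch time, walk only the PMUs that asked for it */
  list_for_each_entry(cpuctx, this_cpu_ptr(&sched_cb_list),
                      sched_cb_entry)
          cpuctx->ctx.pmu->sched_task(cpuctx->task_ctx, sched_in);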
With this patch applied the function doesn't show up in the top 4
anymore (it dropped to 18th place).
6.67% sched-pipe [kernel.vmlinux] [k] __switch_to
6.18% sched-pipe [kernel.vmlinux] [k] __intel_pmu_disable_all
3.92% sched-pipe [kernel.vmlinux] [k] switch_mm_irqs_off
3.71% sched-pipe perf [.] worker_thread
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
There's a perf stat bug that is easy to observe on a machine with only one cgroup:
$ perf stat -e cycles -I 1000 -C 0 -G /
# time counts unit events
1.000161699 <not counted> cycles /
2.000355591 <not counted> cycles /
3.000565154 <not counted> cycles /
4.000951350 <not counted> cycles /
We'd expect some output there.
The underlying problem is that there is an optimization in
perf_cgroup_sched_{in,out}() that skips the switch of cgroup events
if the old and new cgroups in a task switch are the same.
This optimization interacts with the current code in two ways
that cause a CPU context's cgroup (cpuctx->cgrp) to be NULL even if a
cgroup event matches the current task. These are:
1. On creation of the first cgroup event in a CPU: In the current code,
cpuctx->cgrp is only set in perf_cgroup_sched_in, but due to the
aforesaid optimization, perf_cgroup_sched_in will not run until the next
cgroup switch in that CPU. This may happen late or never, depending on
the system's number of cgroups, CPU load, etc.
2. On deletion of the last cgroup event in a cpuctx: In list_del_event,
cpuctx->cgrp is set NULL. Any new cgroup event will not be sched in
because cpuctx->cgrp == NULL until a cgroup switch occurs and
perf_cgroup_sched_in is executed (updating cpuctx->cgrp).
This patch fixes both problems by setting cpuctx->cgrp in list_add_event,
mirroring what list_del_event does when removing a cgroup event from CPU
context, as introduced in:
commit 68cacd2916 ("perf_events: Fix stale ->cgrp pointer in update_cgrp_time_from_cpuctx()")
With this patch, cpuctx->cgrp is always set/clear when installing/removing
the first/last cgroup event in/from the CPU context. With cpuctx->cgrp
correctly set, event_filter_match works as intended when events are
sched in/out.
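A simplified sketch of the resulting add/remove symmetry (the check that
the event's cgroup matches the current task is elided):

  /* on adding the first / removing the last cgroup event in a ctx */
  if (is_cgroup_event(event)) {
          struct perf_cpu_context *cpuctx = __get_cpu_context(ctx);

          if (add && !ctx->nr_cgroups++)
                  cpuctx->cgrp = event->cgrp;
          else if (!add && !--ctx->nr_cgroups)
                  cpuctx->cgrp = NULL;
  }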
After the fix, the output is as expected:
$ perf stat -e cycles -I 1000 -a -G /
# time counts unit events
1.004699159 627342882 cycles /
2.007397156 615272690 cycles /
3.010019057 616726074 cycles /
Signed-off-by: David Carrillo-Cisneros <davidcc@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vegard Nossum <vegard.nossum@gmail.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1470124092-113192-1-git-send-email-davidcc@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This update allows the register map to be used as follows:
regs[reg_index + offset] instead of
regs[reg_index] + offset
This makes the code clearer and will facilitate the addition of the
STMPE1600, on which the LSB and MSB registers are located at addr and
addr + 1 respectively, whereas on all other STMPE variants the LSB and
MSB registers are located in reverse order, at addr + 1 and addr.
For variants which have banks of 3 registers, we use the LSB, CSB and
MSB indexes, which contain respectively the LSB (or LOW), CSB (or MID)
and MSB (or HIGH) register addresses (STMPE1801/STMPE24xx).
For variants which have banks of 2 registers, we use the LSB and CSB
indexes only. In this case the CSB index contains the MSB register
address (STMPE1601).
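E.g. (illustrative):

  /* before: bank base plus a raw address offset */
  reg = stmpe->regs[STMPE_IDX_GPMR_LSB] + regoffset;

  /* after: the table holds one entry per register, so variants whose
   * LSB/MSB order differs simply fill the table differently */
  reg = stmpe->regs[STMPE_IDX_GPMR_LSB + regoffset];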
Signed-off-by: Patrice Chotard <patrice.chotard@st.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
As the STMPE1801/1601/24xx have a SYS_CTRL register, and the
STMPE1601/2403 also have a SYS_CTRL2 register, add
STMPE_IDX_SYS_CTRL/2 and update the driver code accordingly.
This update prepares the ground for the not yet supported STMPE1600,
which has a similar REG_SYS_CTRL register.
Signed-off-by: Patrice Chotard <patrice.chotard@st.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
This reverts commit 874f9c7da9.
Geert Uytterhoeven reports:
"This change seems to have an (unintendent?) side-effect.
Before, pr_*() calls without a trailing newline characters would be
printed with a newline character appended, both on the console and in
the output of the dmesg command.
After this commit, no new line character is appended, and the output
of the next pr_*() call of the same type may be appended, like in:
- Truncating RAM at 0x0000000040000000-0x00000000c0000000 to -0x0000000070000000
- Ignoring RAM at 0x0000000200000000-0x0000000240000000 (!CONFIG_HIGHMEM)
+ Truncating RAM at 0x0000000040000000-0x00000000c0000000 to -0x0000000070000000Ignoring RAM at 0x0000000200000000-0x0000000240000000 (!CONFIG_HIGHMEM)"
Joe Perches says:
"No, that is not intentional.
The newline handling code inside vprintk_emit is a bit involved and
for now I suggest a revert until this has all the same behavior as
earlier"
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Requested-by: Joe Perches <joe@perches.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
According to the KVM API documentation a successful MSI injection
should return a value > 0 on success.
Return possible errors in vgic_its_trigger_msi() and report a
successful injection back to userland, while also reporting the
case where the MSI could not be delivered due to the guest not
having the LPI mapped, for instance.
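Sketch of the intended mapping at the call site (the internal error
handling shown here is illustrative):

  ret = vgic_its_trigger_msi(kvm, its, msi->devid, msi->data);
  if (ret < 0)
          return ret;             /* a real injection error */
  /* KVM_SIGNAL_MSI semantics: > 0 delivered, 0 silently not
   * delivered (e.g. the guest has no matching LPI mapped) */
  return ret ? 0 : 1;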
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Let's follow other driver registration functions and
automatically set the driver's owner member to THIS_MODULE when
ulpi_driver_register() is called. This allows ulpi driver writers
to forget about this boiler plate detail and avoids common bugs
in the process.
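The usual pattern for this (sketch):

  #define ulpi_register_driver(drv) __ulpi_register_driver(drv, THIS_MODULE)

  int __ulpi_register_driver(struct ulpi_driver *drv, struct module *module)
  {
          if (!drv->probe)
                  return -EINVAL;

          /* the caller's module, captured by the macro above */
          drv->driver.owner = module;
          return driver_register(&drv->driver);
  }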
Signed-off-by: Stephen Boyd <stephen.boyd@linaro.org>
Acked-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Enable equivalent functionality on a v5 CCP. Add support for a
version 5 CCP, which enables AES/XTS/SHA services. Also, do more
work on the data structures to virtualize functionality.
Signed-off-by: Gary R Hook <gary.hook@amd.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Bharat Kumar Gogada reported issues with the generic MSI code, where the
end-point ended up with garbage in its MSI configuration (both for the vector
and the message).
It turns out that the two MSI paths in the kernel are doing slightly different
things:
generic MSI: disable MSI -> allocate MSI -> enable MSI -> setup EP
PCI MSI: disable MSI -> allocate MSI -> setup EP -> enable MSI
And it turns out that end-points are allowed to latch the content of the MSI
configuration registers as soon as MSIs are enabled. In Bharat's case, the
end-point ends up using whatever was there already, which is not what you
want.
In order to make things converge, we introduce a new MSI domain flag
(MSI_FLAG_ACTIVATE_EARLY) that is unconditionally set for PCI/MSI. When set,
this flag forces the programming of the end-point as soon as the MSIs are
allocated.
A consequence of this is that we have an extra activate in irq_startup, but
that should be without much consequence.
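Sketch of the flag's effect in the generic MSI domain code (simplified):

  if (info->flags & MSI_FLAG_ACTIVATE_EARLY) {
          struct irq_data *irq_data;

          /* program the end-point now, before MSI gets enabled */
          irq_data = irq_domain_get_irq_data(domain, desc->irq);
          irq_domain_activate_irq(irq_data);
  }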
tglx:
- Several people reported a VMWare regression with PCI/MSI-X passthrough. It
turns out that the patch also cures that issue.
- We need to have a look at the MSI disable interrupt path, where we write
the msg to all zeros without disabling MSI in the PCI device. Is that
correct?
Fixes: 52f518a3a7 "x86/MSI: Use hierarchical irqdomains to manage MSI interrupts"
Reported-and-tested-by: Bharat Kumar Gogada <bharat.kumar.gogada@xilinx.com>
Reported-and-tested-by: Foster Snowhill <forst@forstwoof.ru>
Reported-by: Matthias Prager <linux@matthiasprager.de>
Reported-by: Jason Taylor <jason.taylor@simplivity.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: linux-pci@vger.kernel.org
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1468426713-31431-1-git-send-email-marc.zyngier@arm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
MFW now supports the Selection field for IEEE mode. Add driver changes to
use the newer MFW masks to read/write the port-id value.
Signed-off-by: Sudarsana Reddy Kalluru <sudarsana.kalluru@qlogic.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
During new file creation we need to make sure the new file is created
with the right label. The new file is created in upper/, so effectively
the file should get the label as if the task had created the file in
upper/.
We switched to the mounter's creds for actual file creation. Also, if
there is a whiteout present, the file will be created in the work/ dir
first and then renamed into upper/. In neither case will the file be
labeled as we want it to be.
This patch introduces a new hook, dentry_create_files_as(), which
determines the label/context the dentry would get if it had been created
by the task in upper/, and modifies the passed set of creds
appropriately. The caller then uses these new creds for file creation.
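Sketched, the hook's shape (simplified):

  /* compute the label @dentry would get if created by the task that
   * owns @old, and install it into @new for the caller to use */
  int security_dentry_create_files_as(struct dentry *dentry, int mode,
                                      struct qstr *name,
                                      const struct cred *old,
                                      struct cred *new);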
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
[PM: fix whitespace issues found with checkpatch.pl]
[PM: changes to use stat->mode in ovl_create_or_link()]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Provide a security hook which is called when the xattrs of a file are
being copied up. This hook is called once for each xattr, and the LSM
can return: 0 if the security module wants the xattr to be copied up,
1 if the security module wants the xattr to be discarded on the copy,
-EOPNOTSUPP if the security module does not handle/manage the xattr,
or -errno upon an error.
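Callers dispatch on the return value like so (sketch of a copy-up loop
body):

  error = security_inode_copy_up_xattr(name);
  if (error < 0 && error != -EOPNOTSUPP)
          break;          /* real error: abort the copy-up */
  if (error == 1) {
          error = 0;
          continue;       /* the LSM asked us to discard this xattr */
  }
  /* 0 or -EOPNOTSUPP: copy the xattr as usual */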
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
[PM: whitespace cleanup for checkpatch.pl]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Provide a security hook to label a new file correctly when a file is
copied up from the lower layer to the upper layer of an overlay/union
mount.
This hook can prepare a new set of creds which are suitable for new file
creation during copy up. The caller will use the new creds to create the
file, then revert to the old creds and release the new creds.
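The caller pattern described above, as a sketch:

  struct cred *new_creds = NULL;
  const struct cred *old_creds = NULL;
  int err;

  err = security_inode_copy_up(dentry, &new_creds);
  if (err < 0)
          return err;
  if (new_creds)
          old_creds = override_creds(new_creds);

  /* ... create the file on the upper layer ... */

  if (new_creds) {
          revert_creds(old_creds);
          put_cred(new_creds);
  }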
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
[PM: whitespace cleanup to appease checkpatch.pl]
Signed-off-by: Paul Moore <paul@paul-moore.com>
bpf_skb_store_bytes() invocations above L2 header need BPF_F_RECOMPUTE_CSUM
flag for updates, so that CHECKSUM_COMPLETE will be fixed up along the way.
Where we ran into an issue with bpf_skb_store_bytes() is when we did a
single-byte update on the IPv6 hoplimit despite using BPF_F_RECOMPUTE_CSUM
flag; simple ping via ICMPv6 triggered a hw csum failure as a result. The
underlying issue has been tracked down to a buffer alignment issue.
Meaning, that csum_partial() computations via skb_postpull_rcsum() and
skb_postpush_rcsum() pair invoked had a wrong result since they operated on
an odd address for the hoplimit, while other computations were done on an
even address. This mix doesn't work as-is with skb_postpull_rcsum(),
skb_postpush_rcsum() pair as it always expects at least half-word alignment
of input buffers, which is normally the case. Thus, instead of these helpers
using csum_sub() and (implicitly) csum_add(), we need to use csum_block_sub(),
csum_block_add(), respectively. For unaligned offsets, they rotate the sum
to align it to a half-word boundary again, otherwise they work the same as
csum_sub() and csum_add().
Adding __skb_postpull_rcsum(), __skb_postpush_rcsum() variants that take the
offset as an input and adapting bpf_skb_store_bytes() to them fixes the hw
csum failures again. The skb_postpull_rcsum(), skb_postpush_rcsum() helpers
use a 0 constant for offset so that the compiler optimizes the offset & 1
test away and generates the same code as with csum_sub()/_add().
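The new variants then look roughly like this:

  static inline void __skb_postpull_rcsum(struct sk_buff *skb,
                                          const void *start,
                                          unsigned int len,
                                          unsigned int off)
  {
          if (skb->ip_summed == CHECKSUM_COMPLETE)
                  skb->csum = csum_block_sub(skb->csum,
                                             csum_partial(start, len, 0),
                                             off);
  }

  /* existing helper: constant 0 offset, folded by the compiler */
  static inline void skb_postpull_rcsum(struct sk_buff *skb,
                                        const void *start,
                                        unsigned int len)
  {
          __skb_postpull_rcsum(skb, start, len, 0);
  }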
Fixes: 608cd71a9c ("tc: bpf: generalize pedit action")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
When I initially added the unsafe_[get|put]_user() helpers in commit
5b24a7a2aa ("Add 'unsafe' user access functions for batched
accesses"), I made the mistake of modeling the interface on our
traditional __[get|put]_user() functions, which return zero on success,
or -EFAULT on failure.
That interface is fairly easy to use, but it's actually fairly nasty for
good code generation, since it essentially forces the caller to check
the error value for each access.
In particular, since the error handling is already internally
implemented with an exception handler, and we already use "asm goto" for
various other things, we could fairly easily make the error cases just
jump directly to an error label instead, and avoid the need for explicit
checking after each operation.
So switch the interface to pass in an error label, rather than checking
the error value in the caller. Best do it now before we start growing
more users (the signal handling code in particular would be a good place
to use the new interface).
So rather than
if (unsafe_get_user(x, ptr))
... handle error ..
the interface is now
unsafe_get_user(x, ptr, label);
where an error during the user mode fetch will now just cause a jump to
'label' in the caller.
Right now the actual _implementation_ of this all still ends up being a
"if (err) goto label", and does not take advantage of any exception
label tricks, but for "unsafe_put_user()" in particular it should be
fairly straightforward to convert to using the exception table model.
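For now, then, a sketch of what the macro boils down to (the real
implementation is per-architecture):

  #define unsafe_put_user(x, ptr, label)                \
  do {                                                  \
          if (unlikely(__put_user(x, ptr)))             \
                  goto label;                           \
  } while (0)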
Note that "unsafe_get_user()" is much harder to convert to a clever
exception table model, because current versions of gcc do not allow the
use of "asm goto" (for the exception) with output values (for the actual
value to be fetched). But that is hopefully not a limitation in the
long term.
[ Also note that it might be a good idea to switch unsafe_get_user() to
actually _return_ the value it fetches from user space, but this
commit only changes the error handling semantics ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is required to correctly interpret INET_DIAG_INFO messages exported
by sctp_diag module.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
The same kind of recursive sane default limit and policy
control that has been implemented for the user namespace
is desirable for the other namespaces, so generalize
the user namespace reference count into a ucount.
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Add a structure that is per user and per user ns and use it to hold
the count of user namespaces. This prevents one user from denying
service to another user by creating the maximum number of user
namespaces.
Rename the sysctl export of the maximum count from
/proc/sys/userns/max_user_namespaces to /proc/sys/user/max_user_namespaces
to reflect that the count is now per user.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Export the maximum number of user namespaces as
/proc/sys/userns/max_user_namespaces.
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>