hda_intel.c:azx_probe() defers initialization of an audio controller
on the discrete GPU if the GPU is powered off. The power state of the
GPU is determined by calling vga_switcheroo_get_client_state().
vga_switcheroo_get_client_state() returns VGA_SWITCHEROO_INIT if
vga_switcheroo is not enabled, i.e. if no second GPU or no handler
has registered.
This can go wrong in the following scenario:
- Driver for the integrated GPU is not loaded.
- Driver for the discrete GPU registers with vga_switcheroo, uses driver
power control to power down the GPU, handler cuts power to the GPU.
- Driver for the audio controller gets loaded after the GPU was powered
down, calls vga_switcheroo_get_client_state() which returns
VGA_SWITCHEROO_INIT instead of VGA_SWITCHEROO_OFF.
- Consequence: azx_probe() tries to initialize the audio controller even
though the GPU is powered down.
The power state VGA_SWITCHEROO_INIT was introduced by c8e9cf7bb2
("vga_switcheroo: Add a helper function to get the client state").
It is not apparent what its benefit might be. The idea seems to
be to initialize the audio controller even if the power state is
VGA_SWITCHEROO_OFF (were vga_switcheroo enabled), but as shown
above this can fail.
Drop VGA_SWITCHEROO_INIT to solve this.
Acked-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Dave Airlie <airlied@redhat.com>
- Updated register headers for GFX 8.1 for Stoney
- Add some new CZ revisions
- minor pageflip optimizations
- Fencing clean up
- Warning fix
- More fence cleanup
- oops fix
- Fiji fixes
* 'drm-next-4.4' of git://people.freedesktop.org/~agd5f/linux: (29 commits)
drm/amdgpu: group together common fence implementation
drm/amdgpu: remove AMDGPU_FENCE_OWNER_MOVE
drm/amdgpu: remove now unused fence functions
drm/amdgpu: fix fence fallback check
drm/amdgpu: fix stoping the scheduler timeout
drm/amdgpu: cleanup on error in amdgpu_cs_ioctl()
drm/amdgpu: update Fiji's Golden setting
drm/amdgpu: update Fiji's rev id
drm/amdgpu: extract common code in vi_common_early_init
drm/amd/scheduler: don't oops on failure to load
drm/amdgpu: don't oops on failure to load (v2)
drm/amdgpu: don't VT switch on suspend
drm/amdgpu: Make amdgpu_mn functions inline
drm/amdgpu: remove amdgpu_fence_ref/unref
drm/amdgpu: use common fence for sync
drm/amdgpu: use the new fence_is_later
drm/amdgpu: use common fences for VMID management v2
drm/amdgpu: move ring_from_fence to common code
drm/amdgpu: switch to common fence_wait_any_timeout v2
drm/amdgpu: remove unneeded fence functions
...
Pull s390 updates from Martin Schwidefsky:
"There is only one new feature in this pull for the 4.4 merge window,
most of it is small enhancements, cleanup and bug fixes:
- Add the s390 backend for the software dirty bit tracking. This
adds two new pgtable functions pte_clear_soft_dirty and
pmd_clear_soft_dirty which is why there is a hit to
arch/x86/include/asm/pgtable.h in this pull request.
- A series of cleanup patches for the AP bus, this includes the
removal of the support for two outdated crypto cards (PCICC and
PCICA).
- The irq handling / signaling on buffer full in the runtime
instrumentation code is dropped.
- Some micro optimizations: remove unnecessary memory barriers for a
couple of functions: [smb_]rmb, [smb_]wmb, atomics, bitops, and for
spin_unlock. Use the builtin bswap if available and make
test_and_set_bit_lock more cache friendly.
- Statistics and a tracepoint for the diagnose calls to the
hypervisor.
- The CPU measurement facility support to sample KVM guests is
improved.
- The vector instructions are now always enabled for user space
processes if the hardware has the vector facility. This simplifies
the FPU handling code. The fpu-internal.h header is split into fpu
internals, api and types just like x86.
- Cleanup and improvements for the common I/O layer.
- Rework udelay to solve a problem with kprobe. udelay has busy loop
semantics but still uses an idle processor state for the wait"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (66 commits)
s390: remove runtime instrumentation interrupts
s390/cio: de-duplicate subchannel validation
s390/css: unneeded initialization in for_each_subchannel
s390/Kconfig: use builtin bswap
s390/dasd: fix disconnected device with valid path mask
s390/dasd: fix invalid PAV assignment after suspend/resume
s390/dasd: fix double free in dasd_eckd_read_conf
s390/kernel: fix ptrace peek/poke for floating point registers
s390/cio: move ccw_device_stlck functions
s390/cio: move ccw_device_call_handler
s390/topology: reduce per_cpu() invocations
s390/nmi: reduce size of percpu variable
s390/nmi: fix terminology
s390/nmi: remove casts
s390/nmi: remove pointless error strings
s390: don't store registers on disabled wait anymore
s390: get rid of __set_psw_mask()
s390/fpu: split fpu-internal.h into fpu internals, api, and type headers
s390/dasd: fix list_del corruption after lcu changes
s390/spinlock: remove unneeded serializations at unlock
...
The value returned from iommu_tbl_range_alloc() (and the one passed
in as a fourth argument to iommu_tbl_range_free) is not a DMA address,
it is rather an index into the IOMMU page table.
Therefore using DMA_ERROR_CODE is not appropriate.
Use a more type matching error code define, IOMMU_ERROR_CODE, and
update all users of this interface.
Reported-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull parisc updates from Helge Deller:
"The most important change is that we reduce L1_CACHE_BYTES to 16
bytes, for which a trivial patch for XPS in the network layer was
needed. Then we wire up the sys_membarrier and userfaultfd syscalls
and added two other small cleanups"
* 'parisc-4.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Change L1_CACHE_BYTES to 16
net/xps: Fix calculation of initial number of xps queues
parisc: reduce syslog debug output
parisc: serial/mux: Convert to uart_console_device instead of open-coded
parisc: Wire up userfaultfd syscall
parisc: allocate sys_membarrier system call number
Pull networking updates from David Miller:
Changes of note:
1) Allow to schedule ICMP packets in IPVS, from Alex Gartrell.
2) Provide FIB table ID in ipv4 route dumps just as ipv6 does, from
David Ahern.
3) Allow the user to ask for the statistics to be filtered out of
ipv4/ipv6 address netlink dumps. From Sowmini Varadhan.
4) More work to pass the network namespace context around deep into
various packet path APIs, starting with the netfilter hooks. From
Eric W Biederman.
5) Add layer 2 TX/RX checksum offloading to qeth driver, from Thomas
Richter.
6) Use usec resolution for SYN/ACK RTTs in TCP, from Yuchung Cheng.
7) Support Very High Throughput in wireless MESH code, from Bob
Copeland.
8) Allow setting the ageing_time in switchdev/rocker. From Scott
Feldman.
9) Properly autoload L2TP type modules, from Stephen Hemminger.
10) Fix and enable offload features by default in 8139cp driver, from
David Woodhouse.
11) Support both ipv4 and ipv6 sockets in a single vxlan device, from
Jiri Benc.
12) Fix CWND limiting of thin streams in TCP, from Bendik Rønning
Opstad.
13) Fix IPSEC flowcache overflows on large systems, from Steffen
Klassert.
14) Convert bridging to track VLANs using rhashtable entries rather than
a bitmap. From Nikolay Aleksandrov.
15) Make TCP listener handling completely lockless, this is a major
accomplishment. Incoming request sockets now live in the
established hash table just like any other socket too.
From Eric Dumazet.
15) Provide more bridging attributes to netlink, from Nikolay
Aleksandrov.
16) Use hash based algorithm for ipv4 multipath routing, this was very
long overdue. From Peter Nørlund.
17) Several y2038 cures, mostly avoiding timespec. From Arnd Bergmann.
18) Allow non-root execution of EBPF programs, from Alexei Starovoitov.
19) Support SO_INCOMING_CPU as setsockopt, from Eric Dumazet. This
influences the port binding selection logic used by SO_REUSEPORT.
20) Add ipv6 support to VRF, from David Ahern.
21) Add support for Mellanox Spectrum switch ASIC, from Jiri Pirko.
22) Add rtl8xxxu Realtek wireless driver, from Jes Sorensen.
23) Implement RACK loss recovery in TCP, from Yuchung Cheng.
24) Support multipath routes in MPLS, from Roopa Prabhu.
25) Fix POLLOUT notification for listening sockets in AF_UNIX, from Eric
Dumazet.
26) Add new QED Qlogic river, from Yuval Mintz, Manish Chopra, and
Sudarsana Kalluru.
27) Don't fetch timestamps on AF_UNIX sockets, from Hannes Frederic
Sowa.
28) Support ipv6 geneve tunnels, from John W Linville.
29) Add flood control support to switchdev layer, from Ido Schimmel.
30) Fix CHECKSUM_PARTIAL handling of potentially fragmented frames, from
Hannes Frederic Sowa.
31) Support persistent maps and progs in bpf, from Daniel Borkmann.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1790 commits)
sh_eth: use DMA barriers
switchdev: respect SKIP_EOPNOTSUPP flag in case there is no recursion
net: sched: kill dead code in sch_choke.c
irda: Delete an unnecessary check before the function call "irlmp_unregister_service"
net: dsa: mv88e6xxx: include DSA ports in VLANs
net: dsa: mv88e6xxx: disable SA learning for DSA and CPU ports
net/core: fix for_each_netdev_feature
vlan: Invoke driver vlan hooks only if device is present
arcnet/com20020: add LEDS_CLASS dependency
bpf, verifier: annotate verbose printer with __printf
dp83640: Only wait for timestamps for packets with timestamping enabled.
ptp: Change ptp_class to a proper bitmask
dp83640: Prune rx timestamp list before reading from it
dp83640: Delay scheduled work.
dp83640: Include hash in timestamp/packet matching
ipv6: fix tunnel error handling
net/mlx5e: Fix LSO vlan insertion
net/mlx5e: Re-eanble client vlan TX acceleration
net/mlx5e: Return error in case mlx5e_set_features() fails
net/mlx5e: Don't allow more than max supported channels
...
Pull crypto update from Herbert Xu:
"API:
- Add support for cipher output IVs in testmgr
- Add missing crypto_ahash_blocksize helper
- Mark authenc and des ciphers as not allowed under FIPS.
Algorithms:
- Add CRC support to 842 compression
- Add keywrap algorithm
- A number of changes to the akcipher interface:
+ Separate functions for setting public/private keys.
+ Use SG lists.
Drivers:
- Add Intel SHA Extension optimised SHA1 and SHA256
- Use dma_map_sg instead of custom functions in crypto drivers
- Add support for STM32 RNG
- Add support for ST RNG
- Add Device Tree support to exynos RNG driver
- Add support for mxs-dcp crypto device on MX6SL
- Add xts(aes) support to caam
- Add ctr(aes) and xts(aes) support to qat
- A large set of fixes from Russell King for the marvell/cesa driver"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (115 commits)
crypto: asymmetric_keys - Fix unaligned access in x509_get_sig_params()
crypto: akcipher - Don't #include crypto/public_key.h as the contents aren't used
hwrng: exynos - Add Device Tree support
hwrng: exynos - Fix missing configuration after suspend to RAM
hwrng: exynos - Add timeout for waiting on init done
dt-bindings: rng: Describe Exynos4 PRNG bindings
crypto: marvell/cesa - use __le32 for hardware descriptors
crypto: marvell/cesa - fix missing cpu_to_le32() in mv_cesa_dma_add_op()
crypto: marvell/cesa - use memcpy_fromio()/memcpy_toio()
crypto: marvell/cesa - use gfp_t for gfp flags
crypto: marvell/cesa - use dma_addr_t for cur_dma
crypto: marvell/cesa - use readl_relaxed()/writel_relaxed()
crypto: caam - fix indentation of close braces
crypto: caam - only export the state we really need to export
crypto: caam - fix non-block aligned hash calculation
crypto: caam - avoid needlessly saving and restoring caam_hash_ctx
crypto: caam - print errno code when hash registration fails
crypto: marvell/cesa - fix memory leak
crypto: marvell/cesa - fix first-fragment handling in mv_cesa_ahash_dma_last_req()
crypto: marvell/cesa - rearrange handling for sw padded hashes
...
There is really no way to safely give a user full access to a DMA
capable device without an IOMMU to protect the host system. There is
also no way to provide DMA translation, for use cases such as device
assignment to virtual machines. However, there are still those users
that want userspace drivers even under those conditions. The UIO
driver exists for this use case, but does not provide the degree of
device access and programming that VFIO has. In an effort to avoid
code duplication, this introduces a No-IOMMU mode for VFIO.
This mode requires building VFIO with CONFIG_VFIO_NOIOMMU and enabling
the "enable_unsafe_noiommu_mode" option on the vfio driver. This
should make it very clear that this mode is not safe. Additionally,
CAP_SYS_RAWIO privileges are necessary to work with groups and
containers using this mode. Groups making use of this support are
named /dev/vfio/noiommu-$GROUP and can only make use of the special
VFIO_NOIOMMU_IOMMU for the container. Use of this mode, specifically
binding a device without a native IOMMU group to a VFIO bus driver
will taint the kernel and should therefore not be considered
supported. This patch includes no-iommu support for the vfio-pci bus
driver only.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
The function is not used outside device assignment, and
kvm_arch_set_irq_inatomic has a different prototype. Move it here and
make it static to avoid confusion.
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
We do not want to do too much work in atomic context, in particular
not walking all the VCPUs of the virtual machine. So we want
to distinguish the architecture-specific injection function for irqfd
from kvm_set_msi. Since it's still empty, reuse the newly added
kvm_arch_set_irq and rename it to kvm_arch_set_irq_inatomic.
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
KVM/ARM Changes for v4.4-rc1
Includes a number of fixes for the arch-timer, introducing proper
level-triggered semantics for the arch-timers, a series of patches to
synchronously halt a guest (prerequisite for IRQ forwarding), some tracepoint
improvements, a tweak for the EL2 panic handlers, some more VGIC cleanups
getting rid of redundant state, and finally a stylistic change that gets rid of
some ctags warnings.
Conflicts:
arch/x86/include/asm/kvm_host.h
This patch makes audit_string_contains_control return bool to improve
readability due to this particular function only using either one or
zero as its return value.
Signed-off-by: Yaowei Bai <bywxiaobai@163.com>
[PM: tweaked subject line]
Signed-off-by: Paul Moore <pmoore@redhat.com>
This patch makes audit_dummy_context return bool due to this
particular function only using either one or zero as its return
value.
No functional change.
Signed-off-by: Yaowei Bai <bywxiaobai@163.com>
[PM: subject line tweak]
Signed-off-by: Paul Moore <pmoore@redhat.com>
Pull scheduler changes from Ingo Molnar:
"The main changes in this cycle were:
- sched/fair load tracking fixes and cleanups (Byungchul Park)
- Make load tracking frequency scale invariant (Dietmar Eggemann)
- sched/deadline updates (Juri Lelli)
- stop machine fixes, cleanups and enhancements for bugs triggered by
CPU hotplug stress testing (Oleg Nesterov)
- scheduler preemption code rework: remove PREEMPT_ACTIVE and related
cleanups (Peter Zijlstra)
- Rework the sched_info::run_delay code to fix races (Peter Zijlstra)
- Optimize per entity utilization tracking (Peter Zijlstra)
- ... misc other fixes, cleanups and smaller updates"
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (57 commits)
sched: Don't scan all-offline ->cpus_allowed twice if !CONFIG_CPUSETS
sched: Move cpu_active() tests from stop_two_cpus() into migrate_swap_stop()
sched: Start stopper early
stop_machine: Kill cpu_stop_threads->setup() and cpu_stop_unpark()
stop_machine: Kill smp_hotplug_thread->pre_unpark, introduce stop_machine_unpark()
stop_machine: Change cpu_stop_queue_two_works() to rely on stopper->enabled
stop_machine: Introduce __cpu_stop_queue_work() and cpu_stop_queue_two_works()
stop_machine: Ensure that a queued callback will be called before cpu_stop_park()
sched/x86: Fix typo in __switch_to() comments
sched/core: Remove a parameter in the migrate_task_rq() function
sched/core: Drop unlikely behind BUG_ON()
sched/core: Fix task and run queue sched_info::run_delay inconsistencies
sched/numa: Fix task_tick_fair() from disabling numa_balancing
sched/core: Add preempt_count invariant check
sched/core: More notrace annotations
sched/core: Kill PREEMPT_ACTIVE
sched/core, sched/x86: Kill thread_info::saved_preempt_count
sched/core: Simplify preempt_count tests
sched/core: Robustify preemption leak checks
sched/core: Stop setting PREEMPT_ACTIVE
...
Pull perf updates from Ingo Molnar:
"Kernel side changes:
- Improve accuracy of perf/sched clock on x86. (Adrian Hunter)
- Intel DS and BTS updates. (Alexander Shishkin)
- Intel cstate PMU support. (Kan Liang)
- Add group read support to perf_event_read(). (Peter Zijlstra)
- Branch call hardware sampling support, implemented on x86 and
PowerPC. (Stephane Eranian)
- Event groups transactional interface enhancements. (Sukadev
Bhattiprolu)
- Enable proper x86/intel/uncore PMU support on multi-segment PCI
systems. (Taku Izumi)
- ... misc fixes and cleanups.
The perf tooling team was very busy again with 200+ commits, the full
diff doesn't fit into lkml size limits. Here's an (incomplete) list
of the tooling highlights:
New features:
- Change the default event used in all tools (record/top): use the
most precise "cycles" hw counter available, i.e. when the user
doesn't specify any event, it will try using cycles:ppp, cycles:pp,
etc and fall back transparently until it finds a working counter.
(Arnaldo Carvalho de Melo)
- Integration of perf with eBPF that, given an eBPF .c source file
(or .o file built for the 'bpf' target with clang), will get it
automatically built, validated and loaded into the kernel via the
sys_bpf syscall, which can then be used and seen using 'perf trace'
and other tools.
(Wang Nan)
Various user interface improvements:
- Automatic pager invocation on long help output. (Namhyung Kim)
- Search for more options when passing args to -h, e.g.: (Arnaldo
Carvalho de Melo)
$ perf report -h interface
Usage: perf report [<options>]
--gtk Use the GTK2 interface
--stdio Use the stdio interface
--tui Use the TUI interface
- Show ordered command line options when -h is used or when an
unknown option is specified. (Arnaldo Carvalho de Melo)
- If options are passed after -h, show just its descriptions, not all
options. (Arnaldo Carvalho de Melo)
- Implement column based horizontal scrolling in the hists browser
(top, report), making it possible to use the TUI for things like
'perf mem report' where there are many more columns than can fit in
a terminal. (Arnaldo Carvalho de Melo)
- Enhance the error reporting of tracepoint event parsing, e.g.:
$ oldperf record -e sched:sched_switc usleep 1
event syntax error: 'sched:sched_switc'
\___ unknown tracepoint
Run 'perf list' for a list of valid events
Now we get the much nicer:
$ perf record -e sched:sched_switc ls
event syntax error: 'sched:sched_switc'
\___ can't access trace events
Error: No permissions to read /sys/kernel/debug/tracing/events/sched/sched_switc
Hint: Try 'sudo mount -o remount,mode=755 /sys/kernel/debug'
And after we have those mount point permissions fixed:
$ perf record -e sched:sched_switc ls
event syntax error: 'sched:sched_switc'
\___ unknown tracepoint
Error: File /sys/kernel/debug/tracing/events/sched/sched_switc not found.
Hint: Perhaps this kernel misses some CONFIG_ setting to enable this feature?.
I.e. basically now the event parsing routing uses the strerror_open()
routines introduced by and used in 'perf trace' work. (Jiri Olsa)
- Fail properly when pattern matching fails to find a tracepoint,
i.e. '-e non:existent' was being correctly handled, with a proper
error message about that not being a valid event, but '-e
non:existent*' wasn't, fix it. (Jiri Olsa)
- Do event name substring search as last resort in 'perf list'.
(Arnaldo Carvalho de Melo)
E.g.:
# perf list clock
List of pre-defined events (to be used in -e):
cpu-clock [Software event]
task-clock [Software event]
uncore_cbox_0/clockticks/ [Kernel PMU event]
uncore_cbox_1/clockticks/ [Kernel PMU event]
kvm:kvm_pvclock_update [Tracepoint event]
kvm:kvm_update_master_clock [Tracepoint event]
power:clock_disable [Tracepoint event]
power:clock_enable [Tracepoint event]
power:clock_set_rate [Tracepoint event]
syscalls:sys_enter_clock_adjtime [Tracepoint event]
syscalls:sys_enter_clock_getres [Tracepoint event]
syscalls:sys_enter_clock_gettime [Tracepoint event]
syscalls:sys_enter_clock_nanosleep [Tracepoint event]
syscalls:sys_enter_clock_settime [Tracepoint event]
syscalls:sys_exit_clock_adjtime [Tracepoint event]
syscalls:sys_exit_clock_getres [Tracepoint event]
syscalls:sys_exit_clock_gettime [Tracepoint event]
syscalls:sys_exit_clock_nanosleep [Tracepoint event]
syscalls:sys_exit_clock_settime [Tracepoint event]
Intel PT hardware tracing enhancements:
- Accept a zero --itrace period, meaning "as often as possible". In
the case of Intel PT that is the same as a period of 1 and a unit
of 'instructions' (i.e. --itrace=i1i). (Adrian Hunter)
- Harmonize itrace's synthesized callchains with the existing
--max-stack tool option. (Adrian Hunter)
- Allow time to be displayed in nanoseconds in 'perf script'.
(Adrian Hunter)
- Fix potential infinite loop when handling Intel PT timestamps.
(Adrian Hunter)
- Slighly improve Intel PT debug logging. (Adrian Hunter)
- Warn when AUX data has been lost, just like when processing
PERF_RECORD_LOST. (Adrian Hunter)
- Further document export-to-postgresql.py script. (Adrian Hunter)
- Add option to synthesize branch stack from auxtrace data. (Adrian
Hunter)
Misc notable changes:
- Switch the default callchain output mode to 'graph,0.5,caller', to
make it look like the default for other tools, reducing the
learning curve for people used to 'caller' based viewing. (Arnaldo
Carvalho de Melo)
- various call chain usability enhancements. (Namhyung Kim)
- Introduce the 'P' event modifier, meaning 'max precision level,
please', i.e.:
$ perf record -e cycles:P usleep 1
Is now similar to:
$ perf record usleep 1
Useful, for instance, when specifying multiple events. (Jiri Olsa)
- Add 'socket' sort entry, to sort by the processor socket in 'perf
top' and 'perf report'. (Kan Liang)
- Introduce --socket-filter to 'perf report', for filtering by
processor socket. (Kan Liang)
- Add new "Zoom into Processor Socket" operation in the perf hists
browser, used in 'perf top' and 'perf report'. (Kan Liang)
- Allow probing on kmodules without DWARF. (Masami Hiramatsu)
- Fix 'perf probe -l' for probes added to kernel module functions.
(Masami Hiramatsu)
- Preparatory work for the 'perf stat record' feature that will allow
generating perf.data files with counting data in addition to the
sampling mode we have now (Jiri Olsa)
- Update libtraceevent KVM plugin. (Paolo Bonzini)
- ... plus lots of other enhancements that I failed to list properly,
by: Adrian Hunter, Alexander Shishkin, Andi Kleen, Andrzej Hajda,
Arnaldo Carvalho de Melo, Dima Kogan, Don Zickus, Geliang Tang, He
Kuang, Huaitong Han, Ingo Molnar, Jan Stancek, Jiri Olsa, Kan
Liang, Kirill Tkhai, Masami Hiramatsu, Matt Fleming, Namhyung Kim,
Paolo Bonzini, Peter Zijlstra, Rabin Vincent, Scott Wood, Stephane
Eranian, Sukadev Bhattiprolu, Taku Izumi, Vaishali Thakkar, Wang
Nan, Yang Shi and Yunlong Song"
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (260 commits)
perf unwind: Pass symbol source to libunwind
tools build: Fix libiberty feature detection
perf tools: Compile scriptlets to BPF objects when passing '.c' to --event
perf record: Add clang options for compiling BPF scripts
perf bpf: Attach eBPF filter to perf event
perf tools: Make sure fixdep is built before libbpf
perf script: Enable printing of branch stack
perf trace: Add cmd string table to decode sys_bpf first arg
perf bpf: Collect perf_evsel in BPF object files
perf tools: Load eBPF object into kernel
perf tools: Create probe points for BPF programs
perf tools: Enable passing bpf object file to --event
perf ebpf: Add the libbpf glue
perf tools: Make perf depend on libbpf
perf symbols: Fix endless loop in dso__split_kallsyms_for_kcore
perf tools: Enable pre-event inherit setting by config terms
perf symbols: we can now read separate debug-info files based on a build ID
perf symbols: Fix type error when reading a build-id
perf tools: Search for more options when passing args to -h
perf stat: Cache aggregated map entries in extra cpumap
...
This seems to be a mis-reading of how alpha memory ordering works, and
is not backed up by the alpha architecture manual. The helper functions
don't do anything special on any other architectures, and the arguments
that support them being safe on other architectures also argue that they
are safe on alpha.
Basically, the "control dependency" is between a previous read and a
subsequent write that is dependent on the value read. Even if the
subsequent write is actually done speculatively, there is no way that
such a speculative write could be made visible to other cpu's until it
has been committed, which requires validating the speculation.
Note that most weakely ordered architectures (very much including alpha)
do not guarantee any ordering relationship between two loads that depend
on each other on a control dependency:
read A
if (val == 1)
read B
because the conditional may be predicted, and the "read B" may be
speculatively moved up to before reading the value A. So we require the
user to insert a smp_rmb() between the two accesses to be correct:
read A;
if (A == 1)
smp_rmb()
read B
Alpha is further special in that it can break that ordering even if the
*address* of B depends on the read of A, because the cacheline that is
read later may be stale unless you have a memory barrier in between the
pointer read and the read of the value behind a pointer:
read ptr
read offset(ptr)
whereas all other weakly ordered architectures guarantee that the data
dependency (as opposed to just a control dependency) will order the two
accesses. As a result, alpha needs a "smp_read_barrier_depends()" in
between those two reads for them to be ordered.
The coontrol dependency that "READ_ONCE_CTRL()" and "atomic_read_ctrl()"
had was a control dependency to a subsequent *write*, however, and
nobody can finalize such a subsequent write without having actually done
the read. And were you to write such a value to a "stale" cacheline
(the way the unordered reads came to be), that would seem to lose the
write entirely.
So the things that make alpha able to re-order reads even more
aggressively than other weak architectures do not seem to be relevant
for a subsequent write. Alpha memory ordering may be strange, but
there's no real indication that it is *that* strange.
Also, the alpha architecture reference manual very explicitly talks
about the definition of "Dependence Constraints" in section 5.6.1.7,
where a preceding read dominates a subsequent write.
Such a dependence constraint admittedly does not impose a BEFORE (alpha
architecture term for globally visible ordering), but it does guarantee
that there can be no "causal loop". I don't see how you could avoid
such a loop if another cpu could see the stored value and then impact
the value of the first read. Put another way: the read and the write
could not be seen as being out of order wrt other cpus.
So I do not see how these "x_ctrl()" functions can currently be necessary.
I may have to eat my words at some point, but in the absense of clear
proof that alpha actually needs this, or indeed even an explanation of
how alpha could _possibly_ need it, I do not believe these functions are
called for.
And if it turns out that alpha really _does_ need a barrier for this
case, that barrier still should not be "smp_read_barrier_depends()".
We'd have to make up some new speciality barrier just for alpha, along
with the documentation for why it really is necessary.
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul E McKenney <paulmck@us.ibm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull locking changes from Ingo Molnar:
"The main changes in this cycle were:
- More gradual enhancements to atomic ops: new atomic*_read_ctrl()
ops, synchronize atomic_{read,set}() ordering requirements between
architectures, add atomic_long_t bitops. (Peter Zijlstra)
- Add _{relaxed|acquire|release}() variants for inc/dec atomics and
use them in various locking primitives: mutex, rtmutex, mcs, rwsem.
This enables weakly ordered architectures (such as arm64) to make
use of more locking related optimizations. (Davidlohr Bueso)
- Implement atomic[64]_{inc,dec}_relaxed() on ARM. (Will Deacon)
- Futex kernel data cache footprint micro-optimization. (Rasmus
Villemoes)
- pvqspinlock runtime overhead micro-optimization. (Waiman Long)
- misc smaller fixlets"
* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
ARM, locking/atomics: Implement _relaxed variants of atomic[64]_{inc,dec}
locking/rwsem: Use acquire/release semantics
locking/mcs: Use acquire/release semantics
locking/rtmutex: Use acquire/release semantics
locking/mutex: Use acquire/release semantics
locking/asm-generic: Add _{relaxed|acquire|release}() variants for inc/dec atomics
atomic: Implement atomic_read_ctrl()
atomic, arch: Audit atomic_{read,set}()
atomic: Add atomic_long_t bitops
futex: Force hot variables into a single cache line
locking/pvqspinlock: Kick the PV CPU unconditionally when _Q_SLOW_VAL
locking/osq: Relax atomic semantics
locking/qrwlock: Rename ->lock to ->wait_lock
locking/Documentation/lockstat: Fix typo - lokcing -> locking
locking/atomics, cmpxchg: Privatize the inclusion of asm/cmpxchg.h
Pull RCU changes from Ingo Molnar:
"The main changes in this cycle were:
- Improvements to expedited grace periods (Paul E McKenney)
- Performance improvements to and locktorture tests for percpu-rwsem
(Oleg Nesterov, Paul E McKenney)
- Torture-test changes (Paul E McKenney, Davidlohr Bueso)
- Documentation updates (Paul E McKenney)
- Miscellaneous fixes (Paul E McKenney, Boqun Feng, Oleg Nesterov,
Patrick Marlier)"
* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (54 commits)
fs/writeback, rcu: Don't use list_entry_rcu() for pointer offsetting in bdi_split_work_to_wbs()
rcu: Better hotplug handling for synchronize_sched_expedited()
rcu: Enable stall warnings for synchronize_rcu_expedited()
rcu: Add tasks to expedited stall-warning messages
rcu: Add online/offline info to expedited stall warning message
rcu: Consolidate expedited CPU selection
rcu: Prepare for consolidating expedited CPU selection
cpu: Remove try_get_online_cpus()
rcu: Stop excluding CPU hotplug in synchronize_sched_expedited()
rcu: Stop silencing lockdep false positive for expedited grace periods
rcu: Switch synchronize_sched_expedited() to IPI
locktorture: Fix module unwind when bad torture_type specified
torture: Forgive non-plural arguments
rcutorture: Fix unused-function warning for torturing_tasks()
rcutorture: Fix module unwind when bad torture_type specified
rcu_sync: Cleanup the CONFIG_PROVE_RCU checks
locking/percpu-rwsem: Clean up the lockdep annotations in percpu_down_read()
locking/percpu-rwsem: Fix the comments outdated by rcu_sync
locking/percpu-rwsem: Make use of the rcu_sync infrastructure
locking/percpu-rwsem: Make percpu_free_rwsem() after kzalloc() safe
...
Pull EFI changes from Ingo Molnar:
"The main changes in this cycle were:
- further EFI code generalization to make it more workable for ARM64
- various extensions, such as 64-bit framebuffer address support,
UEFI v2.5 EFI_PROPERTIES_TABLE support
- code modularization simplifications and cleanups
- new debugging parameters
- various fixes and smaller additions"
* 'core-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
efi: Fix warning of int-to-pointer-cast on x86 32-bit builds
efi: Use correct type for struct efi_memory_map::phys_map
x86/efi: Fix kernel panic when CONFIG_DEBUG_VIRTUAL is enabled
efi: Add "efi_fake_mem" boot option
x86/efi: Rename print_efi_memmap() to efi_print_memmap()
efi: Auto-load the efi-pstore module
efi: Introduce EFI_NX_PE_DATA bit and set it from properties table
efi: Add support for UEFIv2.5 Properties table
efi: Add EFI_MEMORY_MORE_RELIABLE support to efi_md_typeattr_format()
efifb: Add support for 64-bit frame buffer addresses
efi/arm64: Clean up efi_get_fdt_params() interface
arm64: Use core efi=debug instead of uefi_debug command line parameter
efi/x86: Move efi=debug option parsing to core
drivers/firmware: Make efi/esrt.c driver explicitly non-modular
efi: Use the generic efi.memmap instead of 'memmap'
acpi/apei: Use appropriate pgprot_t to map GHES memory
arm64, acpi/apei: Implement arch_apei_get_mem_attributes()
arm64/mm: Add PROT_DEVICE_nGnRnE and PROT_NORMAL_WT
acpi, x86: Implement arch_apei_get_mem_attributes()
efi, x86: Rearrange efi_mem_attributes()
...
Pull irq updates from Thomas Gleixner:
"The irq departement delivers:
- Rework the irqdomain core infrastructure to accomodate ACPI based
systems. This is required to support ARM64 without creating
artificial device tree nodes.
- Sanitize the ACPI based ARM GIC initialization by making use of the
new firmware independent irqdomain core
- Further improvements to the generic MSI management
- Generalize the irq migration on CPU hotplug
- Improvements to the threaded interrupt infrastructure
- Allow the migration of "chained" low level interrupt handlers
- Allow optional force masking of interrupts in disable_irq[_nosysnc]
- Support for two new interrupt chips - Sigh!
- A larger set of errata fixes for ARM gicv3
- The usual pile of fixes, updates, improvements and cleanups all
over the place"
* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (71 commits)
Document that IRQ_NONE should be returned when IRQ not actually handled
PCI/MSI: Allow the MSI domain to be device-specific
PCI: Add per-device MSI domain hook
of/irq: Use the msi-map property to provide device-specific MSI domain
of/irq: Split of_msi_map_rid to reuse msi-map lookup
irqchip/gic-v3-its: Parse new version of msi-parent property
PCI/MSI: Use of_msi_get_domain instead of open-coded "msi-parent" parsing
of/irq: Use of_msi_get_domain instead of open-coded "msi-parent" parsing
of/irq: Add support code for multi-parent version of "msi-parent"
irqchip/gic-v3-its: Add handling of PCI requester id.
PCI/MSI: Add helper function pci_msi_domain_get_msi_rid().
of/irq: Add new function of_msi_map_rid()
Docs: dt: Add PCI MSI map bindings
irqchip/gic-v2m: Add support for multiple MSI frames
irqchip/gic-v3: Fix translation of LPIs after conversion to irq_fwspec
irqchip/mxs: Add Alphascale ASM9260 support
irqchip/mxs: Prepare driver for hardware with different offsets
irqchip/mxs: Panic if ioremap or domain creation fails
irqdomain: Documentation updates
irqdomain/msi: Use fwnode instead of of_node
...
Pull timer updates from Thomas Gleixner:
"The timer departement provides:
- More y2038 work in the area of ntp and pps.
- Optimization of posix cpu timers
- New time related selftests
- Some new clocksource drivers
- The usual pile of fixes, cleanups and improvements"
* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits)
timeconst: Update path in comment
timers/x86/hpet: Type adjustments
clocksource/drivers/armada-370-xp: Implement ARM delay timer
clocksource/drivers/tango_xtal: Add new timer for Tango SoCs
clocksource/drivers/imx: Allow timer irq affinity change
clocksource/drivers/exynos_mct: Use container_of() instead of this_cpu_ptr()
clocksource/drivers/h8300_*: Remove unneeded memset()s
clocksource/drivers/sh_cmt: Remove unneeded memset() in sh_cmt_setup()
clocksource/drivers/em_sti: Remove unneeded memset()s
clocksource/drivers/mediatek: Use GPT as sched clock source
clockevents/drivers/mtk: Fix spurious interrupt leading to crash
posix_cpu_timer: Reduce unnecessary sighand lock contention
posix_cpu_timer: Convert cputimer->running to bool
posix_cpu_timer: Check thread timers only when there are active thread timers
posix_cpu_timer: Optimize fastpath_timer_check()
timers, kselftest: Add 'adjtick' test to validate adjtimex() tick adjustments
timers: Use __fls in apply_slack()
clocksource: Remove return statement from void functions
net: sfc: avoid using timespec
ntp/pps: use y2038 safe types in pps_event_time
...
Pull ARM updates from Russell King:
"In this ARM merge, we remove more lines than we add. Changes include:
- Enable imprecise aborts early, so that bus errors aren't masked
until later in the boot. This has the side effect that boot
loaders which provoke these aborts can cause the kernel to crash
early in boot, so we install a handler to report this event around
the site where these are enabled.
- Remove the buggy but impossible to enable cmpxchg syscall code.
- Add unwinding annotations to some assembly code.
- Add support for atomic half-word exchange for ARMv6k+.
- Reduce ioremap() alignment for SMP/LPAE cases where we don't need
the large alignment.
- Addition of an "optimal" 3G configuration for systems with 1G of
RAM.
- Increase vmalloc space by 128M.
- Constify some SMP operations structures, which have never been
writable.
- Improve ARMs dma_mmap() support for mapping DMA coherent mappings
into userspace.
- Fix to the NMI backtrace code in the IPI case on ARM where the
failing CPU gets stuck for 10s waiting for its own IPI to be
delivered.
- Removal of legacy PM support from the AMBA bus driver.
- Another fix for the previous fix of vdsomunge"
* 'for-linus' of git://ftp.arm.linux.org.uk/~rmk/linux-arm: (23 commits)
ARM: 8449/1: fix bug in vdsomunge swab32 macro
arm: add missing of_node_put
ARM: 8447/1: catch pending imprecise abort on unmask
ARM: 8446/1: amba: Remove unused callbacks for legacy system PM
ARM: 8443/1: Adding support for atomic half word exchange
ARM: clean up TWD after previous patch
ARM: 8441/2: twd: Don't set CLOCK_EVT_FEAT_C3STOP unconditionally
ARM: 8440/1: remove obsolete documentation
ARM: make highpte an expert option
ARM: 8433/1: add a VMSPLIT_3G_OPT config option
ARM: 8439/1: Fix backtrace generation when IPI is masked
ARM: 8428/1: kgdb: Fix registers on sleeping tasks
ARM: 8427/1: dma-mapping: add support for offset parameter in dma_mmap()
ARM: 8426/1: dma-mapping: add missing range check in dma_mmap()
ARM: remove user cmpxchg syscall
ARM: 8438/1: Add unwinding to __clear_user_std()
ARM: 8436/1: hw_breakpoint: remove unnecessary header
ARM: 8434/2: Revert "7655/1: smp_twd: make twd_local_timer_of_register() no-op for nosmp"
ARM: 8432/1: move VMALLOC_END from 0xff000000 to 0xff800000
ARM: 8430/1: use default ioremap alignment for SMP or LPAE
...
Pull LED updates from Jacek Anaszewski:
- Move the out-of-LED-tree led-sead3 driver to the LED subsystem.
- Add 'invert' sysfs attribute to the heartbeat trigger.
- Add Device Tree support to the leds-netxbig driver and add related DT
nodes to the kirkwood-netxbig.dtsi and kirkwood-net5big.dts files.
Remove static LED setup from the related board files.
- Remove redundant brightness conversion operation from leds-netxbig.
- Improve leds-bcm6328 driver: improve default-state handling, add more
init configuration options, print invalid LED instead of warning only
about maximum LED value.
- Add a shutdown function for setting gpio-leds into off state when
shutting down.
- Fix DT flash timeout property naming in leds-aat1290.txt.
- Switch to using devm prefixed version of led_classdev_register()
(leds-cobalt-qube, leds-hp6xx, leds-ot200, leds-ipaq-micro,
leds-netxbig, leds-locomo, leds-menf21bmc, leds-net48xx, leds-wrap).
- Add missing of_node_put (leds-powernv, leds-bcm6358, leds-bcm6328,
leds-88pm860x).
- Coding style fixes and cleanups: led-class/led-core, leds-ipaq-micro.
* tag 'leds_for_4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds: (27 commits)
leds: 88pm860x: add missing of_node_put
leds: bcm6328: add missing of_node_put
leds: bcm6358: add missing of_node_put
powerpc/powernv: add missing of_node_put
leds: leds-wrap.c: Use devm_led_classdev_register
leds: aat1290: Fix property naming of flash-timeout-us
leds: leds-net48xx: Use devm_led_classdev_register
leds: leds-menf21bmc.c: Use devm_led_class_register
leds: leds-locomo.c: Use devm_led_classdev_register
leds: leds-gpio: add shutdown function
Documentation: leds: update DT bindings for leds-bcm6328
leds-bcm6328: add more init configuration options
leds-bcm6328: simplify and improve default-state handling
leds-bcm6328: print invalid LED
leds: netxbig: set led_classdev max_brightness
leds: netxbig: convert to use the devm_ functions
ARM: mvebu: remove static LED setup for netxbig boards
ARM: Kirkwood: add LED DT entries for netxbig boards
leds: netxbig: add device tree binding
leds: triggers: add invert to heartbeat
...
Now that max_stack_lock is a global variable, it requires a naming
convention that is unlikely to collide. Rename it to the same naming
convention that the other stack_trace variables have.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
A stack frame may be used in a different way depending on cpu architecture.
Thus it is not always appropriate to slurp the stack contents, as current
check_stack() does, in order to calcurate a stack index (height) at a given
function call. At least not on arm64.
In addition, there is a possibility that we will mistakenly detect a stale
stack frame which has not been overwritten.
This patch makes check_stack() a weak function so as to later implement
arch-specific version.
Link: http://lkml.kernel.org/r/1446182741-31019-5-git-send-email-takahiro.akashi@linaro.org
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The arguments passed around for getacl and setacl xdr encoding, struct
nfs_setaclargs and struct nfs_getaclargs, both contain an array of
pages, an offset into the first page, and the length of the page data.
The offset is unused as it is always zero; remove it.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
For cases where CONFIG_LBDAF is not set. The struct ppa_addr exceeds its
type on 32 bit architectures. ppa_addr requires a 64bit integer to hold
the generic ppa format. We therefore refactor it to u64 and
replaces the sector_t usages with u64 for physical addresses.
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
As pointed out by Nikolay and further explained by Geert, the initial
for_each_netdev_feature macro was broken, as feature would get set outside
of the block of code it was intended to run in, thus only ever working for
the first feature bit in the mask. While less pretty this way, this is
tested and confirmed functional with multiple feature bits set in
NETIF_F_UPPER_DISABLES.
[root@dell-per730-01 ~]# ethtool -K bond0 lro off
...
[ 242.761394] bond0: Disabling feature 0x0000000000008000 on lower dev p5p2.
[ 243.552178] bnx2x 0000:06:00.1 p5p2: using MSI-X IRQs: sp 74 fp[0] 76 ... fp[7] 83
[ 244.353978] bond0: Disabling feature 0x0000000000008000 on lower dev p5p1.
[ 245.147420] bnx2x 0000:06:00.0 p5p1: using MSI-X IRQs: sp 62 fp[0] 64 ... fp[7] 71
[root@dell-per730-01 ~]# ethtool -K bond0 gro off
...
[ 251.925645] bond0: Disabling feature 0x0000000000004000 on lower dev p5p2.
[ 252.713693] bnx2x 0000:06:00.1 p5p2: using MSI-X IRQs: sp 74 fp[0] 76 ... fp[7] 83
[ 253.499085] bond0: Disabling feature 0x0000000000004000 on lower dev p5p1.
[ 254.290922] bnx2x 0000:06:00.0 p5p1: using MSI-X IRQs: sp 62 fp[0] 64 ... fp[7] 71
Fixes: fd867d51f ("net/core: generic support for disabling netdev features down stack")
CC: "David S. Miller" <davem@davemloft.net>
CC: Eric Dumazet <edumazet@google.com>
CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@gmail.com>
CC: Andy Gospodarek <gospo@cumulusnetworks.com>
CC: Jiri Pirko <jiri@resnulli.us>
CC: Nikolay Aleksandrov <razor@blackwall.org>
CC: Michal Kubecek <mkubecek@suse.cz>
CC: Alexander Duyck <alexander.duyck@gmail.com>
CC: Geert Uytterhoeven <geert@linux-m68k.org>
CC: netdev@vger.kernel.org
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change the definition of PTP_CLASS_L2 to not have any bits overlapping with
the other defined protocol values, allowing the PTP_CLASS_* definitions to
be for simple filtering on packet type.
Signed-off-by: Stefan Sørensen <stefan.sorensen@spectralink.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* pci/host-altera:
PCI: altera: Add Altera PCIe MSI driver
PCI: altera: Add Altera PCIe host controller driver
ARM: Add msi.h to Kbuild
* pci/host-designware:
PCI: designware: Make "clocks" and "clock-names" optional DT properties
PCI: designware: Make driver arch-agnostic
ARM/PCI: Replace pci_sys_data->align_resource with global function pointer
PCI: designware: Use of_pci_get_host_bridge_resources() to parse DT
Revert "PCI: designware: Program ATU with untranslated address"
PCI: designware: Move calculation of bus addresses to DRA7xx
PCI: designware: Make "num-lanes" an optional DT property
PCI: designware: Require config accesses to be naturally aligned
PCI: designware: Simplify dw_pcie_cfg_read/write() interfaces
PCI: designware: Use exact access size in dw_pcie_cfg_read()
PCI: spear: Fix dw_pcie_cfg_read/write() usage
PCI: designware: Set up high part of MSI target address
PCI: designware: Make get_msi_addr() return phys_addr_t, not u32
PCI: designware: Implement multivector MSI IRQ setup
PCI: designware: Factor out MSI msg setup
PCI: Add msi_controller setup_irqs() method for special multivector setup
PCI: designware: Fix PORT_LOGIC_LINK_WIDTH_MASK
* pci/host-generic:
PCI: generic: Fix address window calculation for non-zero starting bus
PCI: generic: Pass starting bus number to pci_scan_root_bus()
PCI: generic: Allow multiple hosts with different map_bus() methods
arm64: dts: Drop linux,pci-probe-only from the Seattle DTS
powerpc/PCI: Fix lookup of linux,pci-probe-only property
PCI: generic: Fix lookup of linux,pci-probe-only property
of/pci: Add of_pci_check_probe_only to parse "linux,pci-probe-only"
* pci/host-imx6:
PCI: imx6: Add PCIE_PHY_RX_ASIC_OUT_VALID definition
PCI: imx6: Return real error code from imx6_add_pcie_port()
* pci/host-iproc:
PCI: iproc: Fix header comment "Corporation" misspelling
PCI: iproc: Add outbound mapping support
PCI: iproc: Update PCIe device tree bindings
PCI: iproc: Improve link detection logic
PCI: iproc: Fix PCIe reset logic
PCI: iproc: Call pci_fixup_irqs() for ARM64 as well as ARM
PCI: iproc: Remove unused struct iproc_pcie.irqs[]
PCI: iproc: Fix code comment to match code
* pci/host-mvebu:
PCI: mvebu: Remove code restricting accesses to slot 0
PCI: mvebu: Add PCI Express root complex capability block
PCI: mvebu: Improve clock/reset handling
PCI: mvebu: Use gpio_desc to carry around gpio
PCI: mvebu: Use devm_kcalloc() to allocate an array
PCI: mvebu: Use gpio_set_value_cansleep()
PCI: mvebu: Split port parsing and resource claiming from port setup
PCI: mvebu: Fix memory leaks and refcount leaks
PCI: mvebu: Move port parsing and resource claiming to separate function
PCI: mvebu: Use port->name rather than "PCIe%d.%d"
PCI: mvebu: Report full node name when reporting a DT error
PCI: mvebu: Use for_each_available_child_of_node() to walk child nodes
PCI: mvebu: Use of_get_available_child_count()
PCI: mvebu: Use exact config access size; don't read/modify/write
PCI: mvebu: Return zero for reserved or unimplemented config space
* pci/host-rcar:
PCI: rcar: Fix I/O offset for multiple host bridges
PCI: rcar: Set root bus nr to that provided in DT
PCI: rcar: Remove dependency on ARM-specific struct hw_pci
PCI: rcar: Make PCI aware of the I/O resources
PCI: rcar: Build pcie-rcar.c only on ARM
PCI: rcar: Build pci-rcar-gen2.c only on ARM
* pci/host-tegra:
PCI: tegra: Wrap static pgprot_t initializer with __pgprot()
* pci/host-xgene:
PCI/MSI: xgene: Remove msi_controller assignment
Up to now, a new timeout value is only evaluated against min_timeout
if max_timeout is provided. This does not really make sense; a driver
can have a minimum timeout even if it does not have a maximum timeout.
Ensure that it is not smaller than min_timeout, even if max_timeout
is not set.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Acked-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
This patch adds device tree support for the netxbig LEDs.
This also introduces a additionnal DT binding for the GPIO extension bus
(netxbig-gpio-ext) used to configure the LEDs. Since this bus could also
be used to control other devices, then it seems more suitable to have it
in a separate DT binding.
Signed-off-by: Simon Guinot <simon.guinot@sequanux.org>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Jacek Anaszewski <j.anaszewski@samsung.com>
There are some netdev features, which when disabled on an upper device,
such as a bonding master or a bridge, must be disabled and cannot be
re-enabled on underlying devices.
This is a rework of an earlier more heavy-handed appraoch, which simply
disables and prevents re-enabling of netdev features listed in a new
define in include/net/netdev_features.h, NETIF_F_UPPER_DISABLES. Any upper
device that disables a flag in that feature mask, the disabling will
propagate down the stack, and any lower device that has any upper device
with one of those flags disabled should not be able to enable said flag.
Initially, only LRO is included for proof of concept, and because this
code effectively does the same thing as dev_disable_lro(), though it will
also activate from the ethtool path, which was one of the goals here.
[root@dell-per730-01 ~]# ethtool -k bond0 |grep large
large-receive-offload: on
[root@dell-per730-01 ~]# ethtool -k p5p1 |grep large
large-receive-offload: on
[root@dell-per730-01 ~]# ethtool -K bond0 lro off
[root@dell-per730-01 ~]# ethtool -k bond0 |grep large
large-receive-offload: off
[root@dell-per730-01 ~]# ethtool -k p5p1 |grep large
large-receive-offload: off
dmesg dump:
[ 1033.277986] bond0: Disabling feature 0x0000000000008000 on lower dev p5p2.
[ 1034.067949] bnx2x 0000:06:00.1 p5p2: using MSI-X IRQs: sp 74 fp[0] 76 ... fp[7] 83
[ 1034.753612] bond0: Disabling feature 0x0000000000008000 on lower dev p5p1.
[ 1035.591019] bnx2x 0000:06:00.0 p5p1: using MSI-X IRQs: sp 62 fp[0] 64 ... fp[7] 71
This has been successfully tested with bnx2x, qlcnic and netxen network
cards as slaves in a bond interface. Turning LRO on or off on the master
also turns it on or off on each of the slaves, new slaves are added with
LRO in the same state as the master, and LRO can't be toggled on the
slaves.
Also, this should largely remove the need for dev_disable_lro(), and most,
if not all, of its call sites can be replaced by simply making sure
NETIF_F_LRO isn't included in the relevant device's feature flags.
Note that this patch is driven by bug reports from users saying it was
confusing that bonds and slaves had different settings for the same
features, and while it won't be 100% in sync if a lower device doesn't
support a feature like LRO, I think this is a good step in the right
direction.
CC: "David S. Miller" <davem@davemloft.net>
CC: Eric Dumazet <edumazet@google.com>
CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@gmail.com>
CC: Andy Gospodarek <gospo@cumulusnetworks.com>
CC: Jiri Pirko <jiri@resnulli.us>
CC: Nikolay Aleksandrov <razor@blackwall.org>
CC: Michal Kubecek <mkubecek@suse.cz>
CC: Alexander Duyck <alexander.duyck@gmail.com>
CC: netdev@vger.kernel.org
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This work adds support for "persistent" eBPF maps/programs. The term
"persistent" is to be understood that maps/programs have a facility
that lets them survive process termination. This is desired by various
eBPF subsystem users.
Just to name one example: tc classifier/action. Whenever tc parses
the ELF object, extracts and loads maps/progs into the kernel, these
file descriptors will be out of reach after the tc instance exits.
So a subsequent tc invocation won't be able to access/relocate on this
resource, and therefore maps cannot easily be shared, f.e. between the
ingress and egress networking data path.
The current workaround is that Unix domain sockets (UDS) need to be
instrumented in order to pass the created eBPF map/program file
descriptors to a third party management daemon through UDS' socket
passing facility. This makes it a bit complicated to deploy shared
eBPF maps or programs (programs f.e. for tail calls) among various
processes.
We've been brainstorming on how we could tackle this issue and various
approches have been tried out so far, which can be read up further in
the below reference.
The architecture we eventually ended up with is a minimal file system
that can hold map/prog objects. The file system is a per mount namespace
singleton, and the default mount point is /sys/fs/bpf/. Any subsequent
mounts within a given namespace will point to the same instance. The
file system allows for creating a user-defined directory structure.
The objects for maps/progs are created/fetched through bpf(2) with
two new commands (BPF_OBJ_PIN/BPF_OBJ_GET). I.e. a bpf file descriptor
along with a pathname is being passed to bpf(2) that in turn creates
(we call it eBPF object pinning) the file system nodes. Only the pathname
is being passed to bpf(2) for getting a new BPF file descriptor to an
existing node. The user can use that to access maps and progs later on,
through bpf(2). Removal of file system nodes is being managed through
normal VFS functions such as unlink(2), etc. The file system code is
kept to a very minimum and can be further extended later on.
The next step I'm working on is to add dump eBPF map/prog commands
to bpf(2), so that a specification from a given file descriptor can
be retrieved. This can be used by things like CRIU but also applications
can inspect the meta data after calling BPF_OBJ_GET.
Big thanks also to Alexei and Hannes who significantly contributed
in the design discussion that eventually let us end up with this
architecture here.
Reference: https://lkml.org/lkml/2015/10/15/925
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a bpf_map_get() function that we're going to use later on and
align/clean the remaining helpers a bit so that we have them a bit
more consistent:
- __bpf_map_get() and __bpf_prog_get() that both work on the fd
struct, check whether the descriptor is eBPF and return the
pointer to the map/prog stored in the private data.
Also, we can return f.file->private_data directly, the function
signature is enough of a documentation already.
- bpf_map_get() and bpf_prog_get() that both work on u32 user fd,
call their respective __bpf_map_get()/__bpf_prog_get() variants,
and take a reference.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull regmap updates from Mark Brown:
"Quite a few new features for regmap this time, mostly expanding things
around the edges of the existing functionality to cover more devices
rather than thinsg with wide applicability:
- Support for offload of the update_bits() operation to hardware
where devices implement bit level access.
- Support for a few extra operations that need scratch buffers on
fast_io devices where we can't sleep.
- Expanded the feature set of regmap_irq to cope with some extra
register layouts.
- Cleanups to the debugfs code"
* tag 'regmap-v4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
regmap: Allow installing custom reg_update_bits function
regmap: debugfs: simplify regmap_reg_ranges_read_file() slightly
regmap: debugfs: use memcpy instead of snprintf
regmap: debugfs: use snprintf return value in regmap_reg_ranges_read_file()
regmap: Add generic macro to define regmap_irq
regmap: debugfs: Remove scratch buffer for register length calculation
regmap: irq: add ack_invert flag for chips using cleared bits as ack
regmap: irq: add support for chips who have separate unmask registers
regmap: Allocate buffers with GFP_ATOMIC when fast_io == true
Support for message signing was merged into 3.19, along with
nocephx_require_signatures option. But, all that option does is allow
the kernel client to talk to clusters that don't support MSG_AUTH
feature bit. That's pretty useless, given that it's been supported
since bobtail.
Meanwhile, if one disables message signing on the server side with
"cephx sign messages = false", it becomes impossible to use the kernel
client since it expects messages to be signed if MSG_AUTH was
negotiated. Add nocephx_sign_messages option to support this use case.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
supported_features and required_features serve no purpose at all, while
nocrc and tcp_nodelay belong to ceph_options::flags.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
We can use msg->con instead - at the point we sign an outgoing message
or check the signature on the incoming one, msg->con is always set. We
wouldn't know how to sign a message without an associated session (i.e.
msg->con == NULL) and being able to sign a message using an explicitly
provided authorizer is of no use.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>