Pull nfsd updates from Bruce Fields:
"Various bugfixes, a RDMA update from Chuck Lever, and support for a
new pnfs layout type from Christoph Hellwig. The new layout type is a
variant of the block layout which uses SCSI features to offer improved
fencing and device identification.
(Also: note this pull request also includes the client side of SCSI
layout, with Trond's permission.)"
* tag 'nfsd-4.6' of git://linux-nfs.org/~bfields/linux:
sunrpc/cache: drop reference when sunrpc_cache_pipe_upcall() detects a race
nfsd: recover: fix memory leak
nfsd: fix deadlock secinfo+readdir compound
nfsd4: resfh unused in nfsd4_secinfo
svcrdma: Use new CQ API for RPC-over-RDMA server send CQs
svcrdma: Use new CQ API for RPC-over-RDMA server receive CQs
svcrdma: Remove close_out exit path
svcrdma: Hook up the logic to return ERR_CHUNK
svcrdma: Use correct XID in error replies
svcrdma: Make RDMA_ERROR messages work
rpcrdma: Add RPCRDMA_HDRLEN_ERR
svcrdma: svc_rdma_post_recv() should close connection on error
svcrdma: Close connection when a send error occurs
nfsd: Lower NFSv4.1 callback message size limit
svcrdma: Do not send Write chunk XDR pad with inline content
svcrdma: Do not write xdr_buf::tail in a Write chunk
svcrdma: Find client-provided write and reply chunks once per reply
nfsd: Update NFS server comments related to RDMA support
nfsd: Fix a memory leak when meeting unsupported state_protect_how4
nfsd4: fix bad bounds checking
Pull perf fixes from Ingo Molnar:
"This tree contains various perf fixes on the kernel side, plus three
hw/event-enablement late additions:
- Intel Memory Bandwidth Monitoring events and handling
- the AMD Accumulated Power Mechanism reporting facility
- more IOMMU events
... and a final round of perf tooling updates/fixes"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (44 commits)
perf llvm: Use strerror_r instead of the thread unsafe strerror one
perf llvm: Use realpath to canonicalize paths
perf tools: Unexport some methods unused outside strbuf.c
perf probe: No need to use formatting strbuf method
perf help: Use asprintf instead of adhoc equivalents
perf tools: Remove unused perf_pathdup, xstrdup functions
perf tools: Do not include stringify.h from the kernel sources
tools include: Copy linux/stringify.h from the kernel
tools lib traceevent: Remove redundant CPU output
perf tools: Remove needless 'extern' from function prototypes
perf tools: Simplify die() mechanism
perf tools: Remove unused DIE_IF macro
perf script: Remove lots of unused arguments
perf thread: Rename perf_event__preprocess_sample_addr to thread__resolve
perf machine: Rename perf_event__preprocess_sample to machine__resolve
perf tools: Add cpumode to struct perf_sample
perf tests: Forward the perf_sample in the dwarf unwind test
perf tools: Remove misplaced __maybe_unused
perf list: Fix documentation of :ppp
perf bench numa: Fix assertion for nodes bitfield
...
Pull networking bugfixes from David Miller:
"Several bug fixes rolling in, some for changes introduced in this
merge window, and some for problems that have existed for some time:
1) Fix prepare_to_wait() handling in AF_VSOCK, from Claudio Imbrenda.
2) The new DST_CACHE should be a silent config option, from Dave
Jones.
3) inet_current_timestamp() unintentionally truncates timestamps to
16-bit, from Deepa Dinamani.
4) Missing reference to netns in ppp, from Guillaume Nault.
5) Free memory reference in hv_netvsc driver, from Haiyang Zhang.
6) Missing kernel doc documentation for function arguments in various
spots around the networking, from Luis de Bethencourt.
7) UDP stopped receiving broadcast packets properly, due to
overzealous multicast checks, fix from Paolo Abeni"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (59 commits)
net: ping: make ping_v6_sendmsg static
hv_netvsc: Fix the order of num_sc_offered decrement
net: Fix typos and whitespace.
hv_netvsc: Fix the array sizes to be max supported channels
hv_netvsc: Fix accessing freed memory in netvsc_change_mtu()
ppp: take reference on channels netns
net: Reset encap_level to avoid resetting features on inner IP headers
net: mediatek: fix checking for NULL instead of IS_ERR() in .probe
net: phy: at803x: Request 'reset' GPIO only for AT8030 PHY
at803x: fix reset handling
AF_VSOCK: Shrink the area influenced by prepare_to_wait
Revert "vsock: Fix blocking ops call in prepare_to_wait"
macb: fix PHY reset
ipv4: initialize flowi4_flags before calling fib_lookup()
fsl/fman: Workaround for Errata A-007273
ipv4: fix broadcast packets reception
net: hns: bug fix about the overflow of mss
net: hns: adds limitation for debug port mtu
net: hns: fix the bug about mtu setting
net: hns: fixes a bug of RSS
...
Bunch of small fixupes all over. Plus a dma-buf patch that Sumit asked me
to cherry-pick since that's the only one he had in his tree.
There's a sparse issue outstanding in the color mgr stuff, but Lionel is
still working on something that actually appeases sparse.
* tag 'topic/drm-misc-2016-03-22' of git://anongit.freedesktop.org/drm-intel:
dma-buf/fence: fix fence_is_later v2
dma-buf: Update docs for SYNC ioctl
drm: remove excess description
dma-buf, drm, ion: Propagate error code from dma_buf_start_cpu_access()
drm/atmel-hlcdc: use helper to get crtc state
drm/atomic: use helper to get crtc state
Fix typos. Capitalize CPU, NAPI, RCU consistently. Align structure
indentation. No functional change intended; only comment and whitespace
changes.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On parisc and metag the stack grows upwards, so for those we need to
scan the stack downwards in order to calculate how much stack a process
has used.
Tested on a 64bit parisc kernel.
Signed-off-by: Helge Deller <deller@gmx.de>
Pull more mailbox updates from Jassi Brar:
"Device tree bindings and driver for TI's Message-Manager controller.
Due to some last minute cosmetic changes, the driver was not included
in the first pull request, otherwise the driver has been reviewed
twice"
* 'mailbox-for-next' of git://git.linaro.org/landing-teams/working/fujitsu/integration:
mailbox: Introduce TI message manager driver
Documentation: dt: mailbox: Add TI Message Manager
Pull clk updates from Stephen Boyd:
"The clk changes for this release cycle are mostly dominated by new
device support in terms of LoC, but there has been some cleanup in the
core as well as the usual minor clk additions to various drivers.
Core:
- parent tracking has been simplified
- CLK_IS_ROOT is now a no-op flag, cleaning up drivers has started
- of_clk_init() doesn't consider disabled DT nodes anymore
- clk_unregister() had an error path bug squashed
- of_clk_get_parent_count() has been fixed to only return unsigned ints
- HAVE_MACH_CLKDEV is removed now that the last arch user (ARM) is gone
New Drivers:
- NXP LPC18xx creg
- QCOM IPQ4019 GCC
- TI dm814x ADPLL
- i.MX6QP
Updates:
- Cyngus audio clks found on Broadcom iProc devices
- Non-critical fixes for BCM2385 PLLs
- Samsung exynos5433 updates for clk id errors, HDMI support,
suspend/resume simplifications
- USB, CAN, LVDS, and FCP clks on shmobile devices
- sunxi got support for more clks on new SoCs and went through a
minor refactoring/rewrite to use a simpler factor clk construct
- rockchip added some more clk ids and added suport for fraction
dividers
- QCOM GDSCs in msm8996
- A new devm helper to make adding custom actions simpler (acked by Greg)"
* tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: (197 commits)
clk: bcm2835: fix check of error code returned by devm_ioremap_resource()
clk: renesas: div6: use RENESAS for #define
clk: renesas: Rename header file renesas.h
clk: max77{686,802}: Remove CLK_IS_ROOT
clk: versatile: Remove CLK_IS_ROOT
clk: sunxi: Remove use of variable length array
clk: fixed-rate: Remove CLK_IS_ROOT
clk: qcom: Remove CLK_IS_ROOT
doc: dt: add documentation for lpc1850-creg-clk driver
clk: add lpc18xx creg clk driver
clk: lpc32xx: fix compilation warning
clk: xgene: Add missing parenthesis when clearing divider value
clk: mb86s7x: Remove CLK_IS_ROOT
clk: x86: Remove clkdev.h and clk.h includes
clk: x86: Remove CLK_IS_ROOT
clk: mvebu: Remove CLK_IS_ROOT
clk: renesas: move drivers to renesas directory
clk: si5{14,351,70}: Remove CLK_IS_ROOT
clk: scpi: Remove CLK_IS_ROOT
clk: s2mps11: Remove CLK_IS_ROOT
...
Merge third patch-bomb from Andrew Morton:
- more ocfs2 changes
- a few hotfixes
- Andy's compat cleanups
- misc fixes to fatfs, ptrace, coredump, cpumask, creds, eventfd,
panic, ipmi, kgdb, profile, kfifo, ubsan, etc.
- many rapidio updates: fixes, new drivers.
- kcov: kernel code coverage feature. Like gcov, but not
"prohibitively expensive".
- extable code consolidation for various archs
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (81 commits)
ia64/extable: use generic search and sort routines
x86/extable: use generic search and sort routines
s390/extable: use generic search and sort routines
alpha/extable: use generic search and sort routines
kernel/...: convert pr_warning to pr_warn
drivers: dma-coherent: use memset_io for DMA_MEMORY_IO mappings
drivers: dma-coherent: use MEMREMAP_WC for DMA_MEMORY_MAP
memremap: add MEMREMAP_WC flag
memremap: don't modify flags
kernel/signal.c: add compile-time check for __ARCH_SI_PREAMBLE_SIZE
mm/mprotect.c: don't imply PROT_EXEC on non-exec fs
ipc/sem: make semctl setting sempid consistent
ubsan: fix tree-wide -Wmaybe-uninitialized false positives
kfifo: fix sparse complaints
scripts/gdb: account for changes in module data structure
scripts/gdb: add cmdline reader command
scripts/gdb: add version command
kernel: add kcov code coverage
profile: hide unused functions when !CONFIG_PROC_FS
hpwdt: use nmi_panic() when kernel panics in NMI handler
...
Pull more rdma updates from Doug Ledford:
"Round two of 4.6 merge window patches.
This is a monster pull request. I held off on the hfi1 driver updates
(the hfi1 driver is intimately tied to the qib driver and the new
rdmavt software library that was created to help both of them) in my
first pull request. The hfi1/qib/rdmavt update is probably 90% of
this pull request. The hfi1 driver is being left in staging so that
it can be fixed up in regards to the API that Al and yourself didn't
like. Intel has agreed to do the work, but in the meantime, this
clears out 300+ patches in the backlog queue and brings my tree and
their tree closer to sync.
This also includes about 10 patches to the core and a few to mlx5 to
create an infrastructure for configuring SRIOV ports on IB devices.
That series includes one patch to the net core that we sent to netdev@
and Dave Miller with each of the three revisions to the series. We
didn't get any response to the patch, so we took that as implicit
approval.
Finally, this series includes Intel's new iWARP driver for their x722
cards. It's not nearly the beast as the hfi1 driver. It also has a
linux-next merge issue, but that has been resolved and it now passes
just fine.
Summary:
- A few minor core fixups needed for the next patch series
- The IB SRIOV series. This has bounced around for several versions.
Of note is the fact that the first patch in this series effects the
net core. It was directed to netdev and DaveM for each iteration
of the series (three versions total). Dave did not object, but did
not respond either. I've taken this as permission to move forward
with the series.
- The new Intel X722 iWARP driver
- A huge set of updates to the Intel hfi1 driver. Of particular
interest here is that we have left the driver in staging since it
still has an API that people object to. Intel is working on a fix,
but getting these patches in now helps keep me sane as the upstream
and Intel's trees were over 300 patches apart"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (362 commits)
IB/ipoib: Allow mcast packets from other VFs
IB/mlx5: Implement callbacks for manipulating VFs
net/mlx5_core: Implement modify HCA vport command
net/mlx5_core: Add VF param when querying vport counter
IB/ipoib: Add ndo operations for configuring VFs
IB/core: Add interfaces to control VF attributes
IB/core: Support accessing SA in virtualized environment
IB/core: Add subnet prefix to port info
IB/mlx5: Fix decision on using MAD_IFC
net/core: Add support for configuring VF GUIDs
IB/{core, ulp} Support above 32 possible device capability flags
IB/core: Replace setting the zero values in ib_uverbs_ex_query_device
net/mlx5_core: Introduce offload arithmetic hardware capabilities
net/mlx5_core: Refactor device capability function
net/mlx5_core: Fix caching ATOMIC endian mode capability
ib_srpt: fix a WARN_ON() message
i40iw: Replace the obsolete crypto hash interface with shash
IB/hfi1: Add SDMA cache eviction algorithm
IB/hfi1: Switch to using the pin query function
IB/hfi1: Specify mm when releasing pages
...
Add a flag to memremap() for writecombine mappings. Mappings satisfied
by this flag will not be cached, however writes may be delayed or
combined into more efficient bursts. This is most suitable for buffers
written sequentially by the CPU for use by other DMA devices.
Signed-off-by: Brian Starkey <brian.starkey@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch fix complaints by the sparse tool when using kfifo_put() with
non scalar types like structures (i.e.
drivers/iio/industrialio-event.c).
Casting a pointer to the value and read this pointer instead of directly
casting the value will fix this.
The generated code is equal.
Signed-off-by: Stefani Seibold <stefani@seibold.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
kcov provides code coverage collection for coverage-guided fuzzing
(randomized testing). Coverage-guided fuzzing is a testing technique
that uses coverage feedback to determine new interesting inputs to a
system. A notable user-space example is AFL
(http://lcamtuf.coredump.cx/afl/). However, this technique is not
widely used for kernel testing due to missing compiler and kernel
support.
kcov does not aim to collect as much coverage as possible. It aims to
collect more or less stable coverage that is function of syscall inputs.
To achieve this goal it does not collect coverage in soft/hard
interrupts and instrumentation of some inherently non-deterministic or
non-interesting parts of kernel is disbled (e.g. scheduler, locking).
Currently there is a single coverage collection mode (tracing), but the
API anticipates additional collection modes. Initially I also
implemented a second mode which exposes coverage in a fixed-size hash
table of counters (what Quentin used in his original patch). I've
dropped the second mode for simplicity.
This patch adds the necessary support on kernel side. The complimentary
compiler support was added in gcc revision 231296.
We've used this support to build syzkaller system call fuzzer, which has
found 90 kernel bugs in just 2 months:
https://github.com/google/syzkaller/wiki/Found-Bugs
We've also found 30+ bugs in our internal systems with syzkaller.
Another (yet unexplored) direction where kcov coverage would greatly
help is more traditional "blob mutation". For example, mounting a
random blob as a filesystem, or receiving a random blob over wire.
Why not gcov. Typical fuzzing loop looks as follows: (1) reset
coverage, (2) execute a bit of code, (3) collect coverage, repeat. A
typical coverage can be just a dozen of basic blocks (e.g. an invalid
input). In such context gcov becomes prohibitively expensive as
reset/collect coverage steps depend on total number of basic
blocks/edges in program (in case of kernel it is about 2M). Cost of
kcov depends only on number of executed basic blocks/edges. On top of
that, kernel requires per-thread coverage because there are always
background threads and unrelated processes that also produce coverage.
With inlined gcov instrumentation per-thread coverage is not possible.
kcov exposes kernel PCs and control flow to user-space which is
insecure. But debugfs should not be mapped as user accessible.
Based on a patch by Quentin Casasnovas.
[akpm@linux-foundation.org: make task_struct.kcov_mode have type `enum kcov_mode']
[akpm@linux-foundation.org: unbreak allmodconfig]
[akpm@linux-foundation.org: follow x86 Makefile layout standards]
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: syzkaller <syzkaller@googlegroups.com>
Cc: Vegard Nossum <vegard.nossum@oracle.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Tavis Ormandy <taviso@google.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Kees Cook <keescook@google.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: David Drysdale <drysdale@google.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 1717f2096b ("panic, x86: Fix re-entrance problem due to panic
on NMI") and commit 58c5661f21 ("panic, x86: Allow CPUs to save
registers even if looping in NMI context") introduced nmi_panic() which
prevents concurrent/recursive execution of panic(). It also saves
registers for the crash dump on x86.
However, there are some cases where NMI handlers still use panic().
This patch set partially replaces them with nmi_panic() in those cases.
Even this patchset is applied, some NMI or similar handlers (e.g. MCE
handler) continue to use panic(). This is because I can't test them
well and actual problems won't happen. For example, the possibility
that normal panic and panic on MCE happen simultaneously is very low.
This patch (of 3):
Convert nmi_panic() to a proper function and export it instead of
exporting internal implementation details to modules, for obvious
reasons.
Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Acked-by: Borislav Petkov <bp@suse.de>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Cc: Javi Merino <javi.merino@arm.com>
Cc: Gobinda Charan Maji <gobinda.cemk07@gmail.com>
Cc: "Steven Rostedt (Red Hat)" <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The current_user_ns() macro currently returns &init_user_ns when user
namespaces are disabled, and that causes several warnings when building
with gcc-6.0 in code that compares the result of the macro to
&init_user_ns itself:
fs/xfs/xfs_ioctl.c: In function 'xfs_ioctl_setattr_check_projid':
fs/xfs/xfs_ioctl.c:1249:22: error: self-comparison always evaluates to true [-Werror=tautological-compare]
if (current_user_ns() == &init_user_ns)
This is a legitimate warning in principle, but here it isn't really
helpful, so I'm reprasing the definition in a way that shuts up the
warning. Apparently gcc only warns when comparing identical literals,
but it can figure out that the result of an inline function can be
identical to a constant expression in order to optimize a condition yet
not warn about the fact that the condition is known at compile time.
This is exactly what we want here, and it looks reasonable because we
generally prefer inline functions over macros anyway.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Yaowei Bai <baiyaowei@cmss.chinamobile.com>
Cc: James Morris <james.l.morris@oracle.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add mport character device driver to provide user space interface to
basic RapidIO subsystem operations.
See included Documentation/rapidio/mport_cdev.txt for more details.
[akpm@linux-foundation.org: fix printk warning on i386]
[dan.carpenter@oracle.com: mport_cdev: fix some error codes]
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Tested-by: Barry Wood <barry.wood@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Aurelien Jacquiot <a-jacquiot@ti.com>
Cc: Andre van Herk <andre.van.herk@prodrive-technologies.com>
Cc: Barry Wood <barry.wood@idt.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add new Port Write handler registration interfaces that attach PW
handlers to local mport device objects. This is different from old
interface that attaches PW callback to individual RapidIO device. The
new interfaces are intended for use for common event handling (e.g.
hot-plug notifications) while the old interface is available for
individual device drivers.
This patch is based on patch proposed by Andre van Herk but preserves
existing per-device interface and adds lock protection for list
handling.
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Aurelien Jacquiot <a-jacquiot@ti.com>
Cc: Andre van Herk <andre.van.herk@prodrive-technologies.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add common mport removal support functions into the RapidIO subsystem
core.
Changes to the existing mport registration process have been made to
avoid race conditions with active subsystem interfaces immediately after
mport device registration: part of initialization code from
rio_register_mport() have been moved into separate function
rio_mport_initialize() to allow to perform mport registration as the
final step of setup process.
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Aurelien Jacquiot <a-jacquiot@ti.com>
Cc: Andre van Herk <andre.van.herk@prodrive-technologies.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Make net allocation/release routines available to all components of
RapidIO subsystem by moving code from rio-scan enumerator.
Make destination ID allocation method private to existing enumerator
because other enumeration methods can use their own algorithm.
Setup net device object as a parent of all RapidIO devices residing in
it and register net as a child of active mport device.
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Aurelien Jacquiot <a-jacquiot@ti.com>
Cc: Andre van Herk <andre.van.herk@prodrive-technologies.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This commit fixes the following security hole affecting systems where
all of the following conditions are fulfilled:
- The fs.suid_dumpable sysctl is set to 2.
- The kernel.core_pattern sysctl's value starts with "/". (Systems
where kernel.core_pattern starts with "|/" are not affected.)
- Unprivileged user namespace creation is permitted. (This is
true on Linux >=3.8, but some distributions disallow it by
default using a distro patch.)
Under these conditions, if a program executes under secure exec rules,
causing it to run with the SUID_DUMP_ROOT flag, then unshares its user
namespace, changes its root directory and crashes, the coredump will be
written using fsuid=0 and a path derived from kernel.core_pattern - but
this path is interpreted relative to the root directory of the process,
allowing the attacker to control where a coredump will be written with
root privileges.
To fix the security issue, always interpret core_pattern for dumps that
are written under SUID_DUMP_ROOT relative to the root directory of init.
Signed-off-by: Jann Horn <jann@thejh.net>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The trace_printk() code will allocate extra buffers if the compile detects
that a trace_printk() is used. To do this, the format of the trace_printk()
is saved to the __trace_printk_fmt section, and if that section is bigger
than zero, the buffers are allocated (along with a message that this has
happened).
If trace_printk() uses a format that is not a constant, and thus something
not guaranteed to be around when the print happens, the compiler optimizes
the fmt out, as it is not used, and the __trace_printk_fmt section is not
filled. This means the kernel will not allocate the special buffers needed
for the trace_printk() and the trace_printk() will not write anything to the
tracing buffer.
Adding a "__used" to the variable in the __trace_printk_fmt section will
keep it around, even though it is set to NULL. This will keep the string
from being printed in the debugfs/tracing/printk_formats section as it is
not needed.
Reported-by: Vlastimil Babka <vbabka@suse.cz>
Fixes: 07d777fe8c "tracing: Add percpu buffers for trace_printk()"
Cc: stable@vger.kernel.org # v3.5+
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Pull NFS client updates from Trond Myklebust:
"Highlights include:
Features:
- Add support for multiple NFSv4.1 callbacks in flight
- Initial patchset for RPC multipath support
- Adapt RPC/RDMA to use the new completion queue API
Bugfixes and cleanups:
- nfs4: nfs4_ff_layout_prepare_ds should return NULL if connection failed
- Cleanups to remove nfs_inode_dio_wait and nfs4_file_fsync
- Fix RPC/RDMA credit accounting
- Properly handle RDMA_ERROR replies
- xprtrdma: Do not wait if ib_post_send() fails
- xprtrdma: Segment head and tail XDR buffers on page boundaries
- xprtrdma cleanups for dprintk, physical_op_map and unused macros"
* tag 'nfs-for-4.6-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (35 commits)
nfs/blocklayout: make sure making a aligned read request
nfs4: nfs4_ff_layout_prepare_ds should return NULL if connection failed
nfs: remove nfs_inode_dio_wait
nfs: remove nfs4_file_fsync
xprtrdma: Use new CQ API for RPC-over-RDMA client send CQs
xprtrdma: Use an anonymous union in struct rpcrdma_mw
xprtrdma: Use new CQ API for RPC-over-RDMA client receive CQs
xprtrdma: Serialize credit accounting again
xprtrdma: Properly handle RDMA_ERROR replies
rpcrdma: Add RPCRDMA_HDRLEN_ERR
xprtrdma: Do not wait if ib_post_send() fails
xprtrdma: Segment head and tail XDR buffers on page boundaries
xprtrdma: Clean up dprintk format string containing a newline
xprtrdma: Clean up physical_op_map()
xprtrdma: Clean up unused RPCRDMA_INLINE_PAD_THRESH macro
NFS add callback_ops to nfs4_proc_bind_conn_to_session_callback
pnfs/NFSv4.1: Add multipath capabilities to pNFS flexfiles servers over NFSv3
SUNRPC: Allow addition of new transports to a struct rpc_clnt
NFSv4.1: nfs4_proc_bind_conn_to_session must iterate over all connections
SUNRPC: Make NFS swap work with multipath
...
.//include/linux/netdevice.h:1826: warning: No description found for parameter 'ptype_all'
.//include/linux/netdevice.h:1826: warning: No description found for parameter 'ptype_specific'
Introduced by commit 7866a62104 ("dev: add per net_device packet type
chains")
Cc: Salam Noureddine <noureddine@arista.com>
Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The flags IFF_XMIT_DST_RELEASE_PERM, IFF_IPVLAN_MASTER and
IFF_IPVLAN_SLAVE are missing descriptions for the Documentation. Adding
them.
Signed-off-by: Luis de Bethencourt <luisbg@osg.samsung.com>
Suggested-by: Benjamin Poirier <benjamin.poirier@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The functions ntb_peer_db_addr and ntb_peer_spad_addr were required by
the api. The functions already support returning an error, so any
existing calling code should already check for it. Any existing code
using drivers that implement the functions will not be affected.
Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Pull MMC updates from Ulf Hansson:
"MMC core:
- Fix ABI regression of MMC BLK ioctl
- Remove the unused MMC_DATA_STREAM flag
- Enable asynchronous system PM for the host device
- Minor fixes and clean-ups
SDHCI host:
Throughout the years, the numbers of SDHCI variants have increased
and so has also the numbers of SDHCI callbacks/quirks. The purpose
of these callbacks/quirks were to enable SDHCI to deal with variant
specific requirements, but unfortunate this method didn't scale.
Instead we have ended up with a mess. Not only did the code become
suboptimal but also highly fragile.
Lately many discussions of how to move forward with SDHCI has taken
place at the MMC mailing list. Step by step, we aim to turn
SDHCI's common code into a set of library functions. This will
enable for optimizations and allow some of the existing callbacks
and quirks to be removed, which also should help to make the code
less fragile.
Therefore I am also really pleased to announce that Adrian Hunter
(Intel) has volunteered to step in as the maintainer for SDHCI.
Future wise, I hope the community around SDHCI will continue to
grow and that this release cycle can be the starting point of
moving SDHCI into a better shape. As a matter of fact, already in
this cycle the re-factoring has begun, but of course there are also
fixes and new features included. Some highlights:
- sdhci-iproc: Add support for Broadcom's BCM2835 eMMC IP
- sdhci-acpi: Add support for QCOM controllers
- sdhci-pic32: Add new SDHCI variant for PIC32MZDA
Other hosts:
- atmel-mci: Fix a NULL pointer dereference
- mediatek: Add SD write-protect support
- mmc_spi: Fix card detect in GPIO case
- tmio/sdhi: Add r8a7795 support
- tmio/sdhi: Some fixes and clean-ups
- dw_mmc: Add HW reset support
- dw_mmc: Some fixes and clean-ups
- sunxi: Add support for MMC DDR52 mode"
* tag 'mmc-v4.6' of git://git.linaro.org/people/ulf.hansson/mmc: (123 commits)
mmc: sdhci-of-at91: fix wake-up issue when using runtime pm
mmc: sdhci-pci: Do not set DMA mask in enable_dma()
mmc: sdhci-acpi: Remove enable_dma() hook
mmc: sdhci: Set DMA mask when adding host
mmc: block: fix ABI regression of mmc_blk_ioctl
mmc: atmel-mci: Check pdata for NULL before dereferencing it at DMA config
mmc: core: remove redundant memset of sdio_read_cccr
mmc: core: remove redundant memset of mmc_decode_cid
mmc: of_mmc_spi: fix unused warning
mmc: sdhci-of-arasan: add phy support for sdhci-of-arasan
mmc: sdhci-of-arasan: fix missing sdhci_pltfm_free for err handling
mmc: sdhci-of-arasan: remove disable clk_ahb from sdhci_arasan_resume
Documentation: bindings: add description of phy for sdhci-of-arasan
mmc: sdhci: Fix override of timeout clk wrt max_busy_timeout
mmc: mmci: Remove unnecessary header file
mmc: sdhci-acpi: add QCOM controllers
mmc: tegra: implement memcomp pad calibration
mmc: mediatek: Use mmc_regulator_set_vqmmc in start_signal_voltage_switch
mmc: mediatek: Change signal voltage error to dev_dbg()
mmc: sh_mmcif, tmio: Use ARCH_RENESAS
...
Implement the IB defined callbacks used to manipulate the policy for the
link state, set GUIDs or get statistics information. This functionality
is added into a new file that will be used to add any SRIOV related
functionality to the mlx5 IB layer.
The following callbacks have been added:
mlx5_ib_get_vf_config
mlx5_ib_set_vf_link_state
mlx5_ib_get_vf_stats
mlx5_ib_set_vf_guid
In addition, publish whether this device is based on a virtual function.
In mlx5 supported devices, virtual functions are implemented as vHCAs.
vHCAs have their own QP number space so it is possible that two vHCAs
will use a QP with the same number at the same time.
Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Implement the modify HCA vport commands used to modify the parameters of
virtual HCA's ports.
Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Add a vf parameter to mlx5_core_query_vport_counter so we can call it to
query counters of virtual functions. Also update current users of the
API.
PFs may call mlx5_core_query_vport_counter with other_vport set to
indicate that they are querying a virtual function. The virtual
function to be queried is given by the vf parameter. Virtual function
numbering is zero based so the first VF is 0 and so on. When a PF
queries its own function, the other_vport parameter is cleared.
Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Pull drm updates from Dave Airlie:
"This is the main drm pull request for 4.6 kernel.
Overall the coolest thing here for me is the nouveau maxwell signed
firmware support from NVidia, it's taken a long while to extract this
from them.
I also wish the ARM vendors just designed one set of display IP, ARM
display block proliferation is definitely increasing.
Core:
- drm_event cleanups
- Internal API cleanup making mode_fixup optional.
- Apple GMUX vga switcheroo support.
- DP AUX testing interface
Panel:
- Refactoring of DSI core for use over more transports.
New driver:
- ARM hdlcd driver
i915:
- FBC/PSR (framebuffer compression, panel self refresh) enabled by default.
- Ongoing atomic display support work
- Ongoing runtime PM work
- Pixel clock limit checks
- VBT DSI description support
- GEM fixes
- GuC firmware scheduler enhancements
amdkfd:
- Deferred probing fixes to avoid make file or link ordering.
amdgpu/radeon:
- ACP support for i2s audio support.
- Command Submission/GPU scheduler/GPUVM optimisations
- Initial GPU reset support for amdgpu
vmwgfx:
- Support for DX10 gen mipmaps
- Pageflipping and other fixes.
exynos:
- Exynos5420 SoC support for FIMD
- Exynos5422 SoC support for MIPI-DSI
nouveau:
- GM20x secure boot support - adds acceleration for Maxwell GPUs.
- GM200 support
- GM20B clock driver support
- Power sensors work
etnaviv:
- Correctness fixes for GPU cache flushing
- Better support for i.MX6 systems.
imx-drm:
- VBlank IRQ support
- Fence support
- OF endpoint support
msm:
- HDMI support for 8996 (snapdragon 820)
- Adreno 430 support
- Timestamp queries support
virtio-gpu:
- Fixes for Android support.
rockchip:
- Add support for Innosilicion HDMI
rcar-du:
- Support for 4 crtcs
- R8A7795 support
- RCar Gen 3 support
omapdrm:
- HDMI interlace output support
- dma-buf import support
- Refactoring to remove a lot of legacy code.
tilcdc:
- Rewrite of pageflipping code
- dma-buf support
- pinctrl support
vc4:
- HDMI modesetting bug fixes
- Significant 3D performance improvement.
fsl-dcu (FreeScale):
- Lots of fixes
tegra:
- Two small fixes
sti:
- Atomic support for planes
- Improved HDMI support"
* 'drm-next' of git://people.freedesktop.org/~airlied/linux: (1063 commits)
drm/amdgpu: release_pages requires linux/pagemap.h
drm/sti: restore mode_fixup callback
drm/amdgpu/gfx7: add MTYPE definition
drm/amdgpu: removing BO_VAs shouldn't be interruptible
drm/amd/powerplay: show uvd/vce power gate enablement for tonga.
drm/amd/powerplay: show uvd/vce power gate info for fiji
drm/amdgpu: use sched fence if possible
drm/amdgpu: move ib.fence to job.fence
drm/amdgpu: give a fence param to ib_free
drm/amdgpu: include the right version of gmc header files for iceland
drm/radeon: fix indentation.
drm/amd/powerplay: add uvd/vce dpm enabling flag to fix the performance issue for CZ
drm/amdgpu: switch back to 32bit hw fences v2
drm/amdgpu: remove amdgpu_fence_is_signaled
drm/amdgpu: drop the extra fence range check v2
drm/amdgpu: signal fences directly in amdgpu_fence_process
drm/amdgpu: cleanup amdgpu_fence_wait_empty v2
drm/amdgpu: keep all fences in an RCU protected array v2
drm/amdgpu: add number of hardware submissions to amdgpu_fence_driver_init_ring
drm/amdgpu: RCU protected amd_sched_fence_release
...
Add two new NLAs to support configuration of Infiniband node or port
GUIDs. New applications can choose to use this interface to configure
GUIDs with iproute2 with commands such as:
ip link set dev ib0 vf 0 node_guid 00:02:c9:03:00:21:6e:70
ip link set dev ib0 vf 0 port_guid 00:02:c9:03:00:21:6e:78
A new ndo, ndo_sef_vf_guid is introduced to notify the net device of the
request to change the GUID.
Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Define the necessary hardware structures for the offload
arithmetic capabilities and read/cache them on driver load.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Device capability function was called similar in all places.
It was called twice for every queried parameter, while the
difference between calls was in HCA capability mode only.
The change proposed unify these calls into one function.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Pull arm[64] perf updates from Will Deacon:
"I have another mixed bag of ARM-related perf patches here.
It's about 25% CPU and 75% interconnect, but with drivers/bus/
languishing without an obvious maintainer or tree, Olof and I agreed
to keep all of these PMU patches together. I suspect a whole load of
code from drivers/bus/arm-* can be moved under drivers/perf/, so
that's on the radar for the future.
Summary:
- Initial support for ARMv8.1 CPU PMUs
- Support for the CPU PMU in Cavium ThunderX
- CPU PMU support for systems running 32-bit Linux in secure mode
- Support for the system PMU in ARM CCI-550 (Cache Coherent Interconnect)"
* tag 'arm64-perf' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (26 commits)
drivers/perf: arm_pmu: avoid NULL dereference when not using devicetree
arm64: perf: Extend ARMV8_EVTYPE_MASK to include PMCR.LC
arm-cci: remove unused variable
arm-cci: don't return value from void function
arm-cci: make private functions static
arm-cci: CoreLink CCI-550 PMU driver
arm-cci500: Rearrange PMU driver for code sharing with CCI-550 PMU
arm-cci: CCI-500: Work around PMU counter writes
arm-cci: Provide hook for writing to PMU counters
arm-cci: Add helper to enable PMU without synchornising counters
arm-cci: Add routines to save/restore all counters
arm-cci: Get the status of a counter
arm-cci: write_counter: Remove redundant check
arm-cci: Delay PMU counter writes to pmu::pmu_enable
arm-cci: Refactor CCI PMU enable/disable methods
arm-cci: Group writes to counter
arm-cci: fix handling cpumask_any_but return value
arm-cci: simplify sysfs attr handling
drivers/perf: arm_pmu: implement CPU_PM notifier
arm64: dts: Add Cavium ThunderX specific PMU
...
Pull UDF and quota updates from Jan Kara:
"This contains a rewrite of UDF handling of filename encoding to fix
remaining overflow issues from Andrew Gabbasov and quota changes to
support new Q_[X]GETNEXTQUOTA quotactl for VFS quota formats"
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
quota: Fix possible GPF due to uninitialised pointers
ext4: Make Q_GETNEXTQUOTA work for quota in hidden inodes
quota: Forbid Q_GETQUOTA and Q_GETNEXTQUOTA for frozen filesystem
quota: Fix possible races during quota loading
ocfs2: Implement get_next_id()
quota_v2: Implement get_next_id() for V2 quota format
quota: Add support for ->get_nextdqblk() for VFS quota
udf: Merge linux specific translation into CS0 conversion function
udf: Remove struct ustr as non-needed intermediate storage
udf: Use separate buffer for copying split names
udf: Adjust UDF_NAME_LEN to better reflect actual restrictions
udf: Join functions for UTF8 and NLS conversions
udf: Parameterize output length in udf_put_filename
quota: Allow Q_GETQUOTA for frozen filesystem
quota: Fixup comments about return value of Q_[X]GETNEXTQUOTA
Pull xfs updates from Dave Chinner:
"There's quite a lot in this request, and there's some cross-over with
ext4, dax and quota code due to the nature of the changes being made.
As for the rest of the XFS changes, there are lots of little things
all over the place, which add up to a lot of changes in the end.
The major changes are that we've reduced the size of the struct
xfs_inode by ~100 bytes (gives an inode cache footprint reduction of
>10%), the writepage code now only does a single set of mapping tree
lockups so uses less CPU, delayed allocation reservations won't
overrun under random write loads anymore, and we added compile time
verification for on-disk structure sizes so we find out when a commit
or platform/compiler change breaks the on disk structure as early as
possible.
Change summary:
- error propagation for direct IO failures fixes for both XFS and
ext4
- new quota interfaces and XFS implementation for iterating all the
quota IDs in the filesystem
- locking fixes for real-time device extent allocation
- reduction of duplicate information in the xfs and vfs inode, saving
roughly 100 bytes of memory per cached inode.
- buffer flag cleanup
- rework of the writepage code to use the generic write clustering
mechanisms
- several fixes for inode flag based DAX enablement
- rework of remount option parsing
- compile time verification of on-disk format structure sizes
- delayed allocation reservation overrun fixes
- lots of little error handling fixes
- small memory leak fixes
- enable xfsaild freezing again"
* tag 'xfs-for-linus-4.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs: (66 commits)
xfs: always set rvalp in xfs_dir2_node_trim_free
xfs: ensure committed is initialized in xfs_trans_roll
xfs: borrow indirect blocks from freed extent when available
xfs: refactor delalloc indlen reservation split into helper
xfs: update freeblocks counter after extent deletion
xfs: debug mode forced buffered write failure
xfs: remove impossible condition
xfs: check sizes of XFS on-disk structures at compile time
xfs: ioends require logically contiguous file offsets
xfs: use named array initializers for log item dumping
xfs: fix computation of inode btree maxlevels
xfs: reinitialise per-AG structures if geometry changes during recovery
xfs: remove xfs_trans_get_block_res
xfs: fix up inode32/64 (re)mount handling
xfs: fix format specifier , should be %llx and not %llu
xfs: sanitize remount options
xfs: convert mount option parsing to tokens
xfs: fix two memory leaks in xfs_attr_list.c error paths
xfs: XFS_DIFLAG2_DAX limited by PAGE_SIZE
xfs: dynamically switch modes when XFS_DIFLAG2_DAX is set/cleared
...
Pull f2fs updates from Jaegeuk Kim:
"New Features:
- uplift filesystem encryption into fs/crypto/
- give sysfs entries to control memroy consumption
Enhancements:
- aio performance by preallocating blocks in ->write_iter
- use writepages lock for only WB_SYNC_ALL
- avoid redundant inline_data conversion
- enhance forground GC
- use wait_for_stable_page as possible
- speed up SEEK_DATA and fiiemap
Bug Fixes:
- corner case in terms of -ENOSPC for inline_data
- hung task caused by long latency in shrinker
- corruption between atomic write and f2fs_trace_pid
- avoid garbage lengths in dentries
- revoke atomicly written pages if an error occurs
In addition, there are various minor bug fixes and clean-ups"
* tag 'for-f2fs-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (81 commits)
f2fs: submit node page write bios when really required
f2fs: add missing argument to f2fs_setxattr stub
f2fs: fix to avoid unneeded unlock_new_inode
f2fs: clean up opened code with f2fs_update_dentry
f2fs: declare static functions
f2fs: use cryptoapi crc32 functions
f2fs: modify the readahead method in ra_node_page()
f2fs crypto: sync ext4_lookup and ext4_file_open
fs crypto: move per-file encryption from f2fs tree to fs/crypto
f2fs: mutex can't be used by down_write_nest_lock()
f2fs: recovery missing dot dentries in root directory
f2fs: fix to avoid deadlock when merging inline data
f2fs: introduce f2fs_flush_merged_bios for cleanup
f2fs: introduce f2fs_update_data_blkaddr for cleanup
f2fs crypto: fix incorrect positioning for GCing encrypted data page
f2fs: fix incorrect upper bound when iterating inode mapping tree
f2fs: avoid hungtask problem caused by losing wake_up
f2fs: trace old block address for CoWed page
f2fs: try to flush inode after merging inline data
f2fs: show more info about superblock recovery
...
Pull cgroup namespace support from Tejun Heo:
"These are changes to implement namespace support for cgroup which has
been pending for quite some time now. It is very straight-forward and
only affects what part of cgroup hierarchies are visible.
After unsharing, mounting a cgroup fs will be scoped to the cgroups
the task belonged to at the time of unsharing and the cgroup paths
exposed to userland would be adjusted accordingly"
* 'for-4.6-ns' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup: fix and restructure error handling in copy_cgroup_ns()
cgroup: fix alloc_cgroup_ns() error handling in copy_cgroup_ns()
Add FS_USERNS_FLAG to cgroup fs
cgroup: Add documentation for cgroup namespaces
cgroup: mount cgroupns-root when inside non-init cgroupns
kernfs: define kernfs_node_dentry
cgroup: cgroup namespace setns support
cgroup: introduce cgroup namespaces
sched: new clone flag CLONE_NEWCGROUP for cgroup namespace
kernfs: Add API to generate relative kernfs path
Support for TI Message Manager Module. This hardware block manages a
bunch of hardware queues meant for communication between processor
entities.
Clients sitting on top of this would manage the required protocol
for communicating with the counterpart entities.
For more details on TI Message Manager hardware block, see documentation
that will is available here: http://www.ti.com/lit/ug/spruhy8/spruhy8.pdf
Chapter 8.1(Message Manager)
Signed-off-by: Nishanth Menon <nm@ti.com>
Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
Introduce an AMD accumlated power reporting mechanism for the Family
15h, Model 60h processor that can be used to calculate the average
power consumed by a processor during a measurement interval. The
feature support is indicated by CPUID Fn8000_0007_EDX[12].
This feature will be implemented both in hwmon and perf. The current
design provides one event to report per package/processor power
consumption by counting each compute unit power value.
Here the gory details of how the computation is done:
* Tsample: compute unit power accumulator sample period
* Tref: the PTSC counter period (PTSC: performance timestamp counter)
* N: the ratio of compute unit power accumulator sample period to the
PTSC period
* Jmax: max compute unit accumulated power which is indicated by
MSR_C001007b[MaxCpuSwPwrAcc]
* Jx/Jy: compute unit accumulated power which is indicated by
MSR_C001007a[CpuSwPwrAcc]
* Tx/Ty: the value of performance timestamp counter which is indicated
by CU_PTSC MSR_C0010280[PTSC]
* PwrCPUave: CPU average power
i. Determine the ratio of Tsample to Tref by executing CPUID Fn8000_0007.
N = value of CPUID Fn8000_0007_ECX[CpuPwrSampleTimeRatio[15:0]].
ii. Read the full range of the cumulative energy value from the new
MSR MaxCpuSwPwrAcc.
Jmax = value returned.
iii. At time x, software reads CpuSwPwrAcc and samples the PTSC.
Jx = value read from CpuSwPwrAcc and Tx = value read from PTSC.
iv. At time y, software reads CpuSwPwrAcc and samples the PTSC.
Jy = value read from CpuSwPwrAcc and Ty = value read from PTSC.
v. Calculate the average power consumption for a compute unit over
time period (y-x). Unit of result is uWatt:
if (Jy < Jx) // Rollover has occurred
Jdelta = (Jy + Jmax) - Jx
else
Jdelta = Jy - Jx
PwrCPUave = N * Jdelta * 1000 / (Ty - Tx)
Simple example:
root@hr-zp:/home/ray/tip# ./tools/perf/perf stat -a -e 'power/power-pkg/' make -j4
CHK include/config/kernel.release
CHK include/generated/uapi/linux/version.h
CHK include/generated/utsrelease.h
CHK include/generated/timeconst.h
CHK include/generated/bounds.h
CHK include/generated/asm-offsets.h
CALL scripts/checksyscalls.sh
CHK include/generated/compile.h
SKIPPED include/generated/compile.h
Building modules, stage 2.
Kernel: arch/x86/boot/bzImage is ready (#40)
MODPOST 4225 modules
Performance counter stats for 'system wide':
183.44 mWatts power/power-pkg/
341.837270111 seconds time elapsed
root@hr-zp:/home/ray/tip# ./tools/perf/perf stat -a -e 'power/power-pkg/' sleep 10
Performance counter stats for 'system wide':
0.18 mWatts power/power-pkg/
10.012551815 seconds time elapsed
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Suggested-by: Ingo Molnar <mingo@kernel.org>
Suggested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Robert Richter <rric@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: jacob.w.shin@gmail.com
Link: http://lkml.kernel.org/r/1457502306-2559-1-git-send-email-ray.huang@amd.com
[ Fixed the modular build. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>