New versions of OPAL have a device node /ibm,opal/firmware/exports, each
property of which describes a range of memory in OPAL that Linux might
want to export to userspace for debugging.
This patch adds a sysfs file under 'opal/exports' for each property
found there, and makes it read-only by root.
Signed-off-by: Matt Brown <matthew.brown.dev@gmail.com>
[mpe: Drop counting of props, rename to attr, free on sysfs error, c'log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
When booting very large systems with a large initrd, we run out of
space early in boot for either RTAS or the flattened device tree (FDT).
Boot fails with messages like:
Could not allocate memory for RTAS
or
No memory for flatten_device_tree (no room)
Increasing the minimum RMA size to 512MB fixes the problem. This
should not have an impact on smaller LPARs (with 256MB memory),
as the firmware will cap the RMA to the memory assigned to the LPAR.
Fix is based on input/discussions with Michael Ellerman. Thanks to
Praveen K. Pandey for testing on a large system.
Reported-by: Praveen K. Pandey <preveen.pandey@in.ibm.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Nvlink2 supports address translation services (ATS) allowing devices
to request address translations from an mmu known as the nest MMU
which is setup to walk the CPU page tables.
To access this functionality certain firmware calls are required to
setup and manage hardware context tables in the nvlink processing unit
(NPU). The NPU also manages forwarding of TLB invalidates (known as
address translation shootdowns/ATSDs) to attached devices.
This patch exports several methods to allow device drivers to register
a process id (PASID/PID) in the hardware tables and to receive
notification of when a device should stop issuing address translation
requests (ATRs). It also adds a fault handler to allow device drivers
to demand fault pages in.
Signed-off-by: Alistair Popple <alistair@popple.id.au>
[mpe: Fix up comment formatting, use flush_tlb_mm()]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Pull Xtensa fixes from Max Filippov:
- make __pa work with uncached KSEG addresses, it fixes DMA memory
mmapping and DMA debug
- fix torn stack dump output
- wire up statx syscall
* tag 'xtensa-20170403' of git://github.com/jcmvbkbc/linux-xtensa:
xtensa: wire up statx system call
xtensa: fix stack dump output
xtensa: make __pa work with uncached KSEG addresses
Instead of expecting that GPIO is always interrupt source, let's use
interrupt specified in I2C client.
Reviewed-by: Daniel Mack <daniel@zonque.org>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Pull s390 fixes from Martin Schwidefsky:
"Four bug fixes, two of them for stable:
- avoid initrd corruptions in the kernel decompressor
- prevent inconsistent dumps if the boot CPU does not have address
zero
- fix the new pkey interface added with the merge window for 4.11
- a fix for a fix, another issue with user copy zero padding"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/uaccess: get_user() should zero on failure (again)
s390/pkey: Fix wrong handling of secure key with old MKVP
s390/smp: fix ipl from cpu with non-zero address
s390/decompressor: fix initrd corruption caused by bss clear
Pull RAS fix from Thomas Gleixner:
"Prevent dmesg from being spammed when MCE logging is active"
* 'ras-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mce: Don't print MCEs when mcelog is active
The pnv_pci_get_{gpu|npu}_dev functions are used to find associations
between nvlink PCIe devices and standard PCIe devices. However they
lacked basic sanity checking which results in NULL pointer
dereferencing if they are incorrect called can be harder to spot than
an explicit WARN_ON.
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The code to fix the problem it describes was removed in commit
c40785ad30 ("powerpc/dart: Use a cachable DART"), and it uses the
stupid comment style. Away it goooooooooooooes!
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Early on in do_page_fault() we call store_updates_sp(), regardless of
the type of exception. For an instruction miss this doesn't make
sense, because we only use this information to detect if a data miss
is the result of a stack expansion instruction or not.
Worse still, it results in a data miss within every userspace
instruction miss handler, because we try and load the very instruction
we are about to install a pte for!
A simple exec microbenchmark runs 6% faster on POWER8 with this fix:
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
int main(int argc, char *argv[])
{
unsigned long left = atol(argv[1]);
char leftstr[16];
if (left-- == 0)
return 0;
sprintf(leftstr, "%ld", left);
execlp(argv[0], argv[0], leftstr, NULL);
perror("exec failed\n");
return 0;
}
Pass the number of iterations on the command line (eg 10000) and time
how long it takes to execute.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The Cubietruck has an AXP209 PMIC and can be power-supplied by ACIN via
the CHG-IN pin or by USB.
This enables the ACIN and the USB power supply subnode in the DT.
Signed-off-by: Alexander Syring <alex@asyring.de>
Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Change-recording override (CO) was never implemented in any
machine. According to the architecture it is unpredictable if a
translation-specification exception will be recognized if the bit is
set and EDAT1 does not apply.
Therefore the easiest solution is to simply ignore the bit.
This also fixes commit cd1836f583 ("KVM: s390:
instruction-execution-protection support"). A guest may enable
instruction-execution-protection (IEP) but not EDAT1. In such a case
the guest_translate() function (arch/s390/kvm/gaccess.c) will report a
specification exception on pages that have the IEP bit set while it
should not.
It might make sense to add full IEP support to guest_translate() and
the GACC_IFETCH case. However, as far as I can tell the GACC_IFETCH
case is currently only used after an instruction was executed in order
to fetch the failing instruction. So there is no additional problem
*currently*.
Fixes: cd1836f583 ("KVM: s390: instruction-execution-protection support")
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Add the Z2 clock (Cortex-A7 CPU core clock), which uses a fixed divider,
and link the first CPU node to it.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
Unlike other R-Car Gen2 SoCs with Cortex-A15 CPU cores, R-Car V2H does
not have a programmable Z clock (Cortex-A15 CPU core clock), but uses a
fixed divider.
This is similar to the Z2 clock (Cortex-A7 CPU core clock) on R-Car E2.
Hence:
- Remove the Z clock output from the cpg_clocks node, as this implied
a programmable clock,
- Add the Z clock as a fixed factor clock,
- Let the first CPU node point to the new Z clock,
- Remove the Z clock index from the bindings (this definition was used
by r8a7792.dtsi only, and was not a contract between DT and driver).
Fixes: 7c4163aae3 ("ARM: dts: r8a7792: initial SoC device tree")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
The SSI-ALL gate clock is located in between the P clock and the
individual SSI[0-9] clocks, hence the former should be listed as their
parent.
Fixes: 072d326542 ("ARM: dts: r8a7793: add MSTP10 clocks to device tree")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
The SSI-ALL gate clock is located in between the P clock and the
individual SSI[0-9] clocks, hence the former should be listed as their
parent.
Fixes: ee9141522d ("ARM: shmobile: r8a7791: add MSTP10 support on DTSI")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
The SSI-ALL gate clock is located in between the P clock and the
individual SSI[0-9] clocks, hence the former should be listed as their
parent.
Fixes: bcde372254 ("ARM: shmobile: r8a7790: add MSTP10 support on DTSI")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
Technically, the Ethernet block is run off the 133MHz Bus (B) clock, not
the 33MHz Peripheral 0 (P0) clock.
Fixes: 969244f9c7 ("ARM: dts: r7s72100: add ethernet clock to device tree")
Signed-off-by: Chris Brandt <chris.brandt@renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
numa_nodemask_from_meminfo() generates a nodemask of nodes which have
memory according to a meminfo descriptor.
The two callsites of that function both set bits in copies of the
numa_nodes_parsed nodemask. In both cases, the information in supplied
numa_meminfo is a subset of numa_nodes_parsed. So setting those bits
again is not really necessary.
Here are the three call paths which show that the supplied numa_meminfo
argument describes memory regions in nodes which are already in
numa_nodes_parsed:
x86_numa_init()
numa_init()
Case 1:
acpi_numa_init()
acpi_parse_memory_affinity()
numa_add_memblk()
node_set(numa_nodes_parsed)
acpi_parse_slit()
acpi_numa_slit_init()
numa_set_distance()
numa_alloc_distance()
numa_nodemask_from_meminfo()
Case 2:
amd_numa_init()
numa_add_memblk()
node_set(numa_nodes_parsed)
Case 3
dummy_numa_init()
node_set(numa_nodes_parsed)
numa_add_memblk()
numa_register_memblks()
numa_nodemask_from_meminfo()
Thus, in all three cases, the respective bit in numa_nodes_parsed is
set, which means it is not necessary to set it again in a copy of
numa_nodes_parsed.
So remove that function.
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/20170314030801.13656-2-richard.weiyang@gmail.com
[ Heavily massage commit message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch enables USB HS working in FS mode on stm32f429-disco
with 5V VBUS enable.
Signed-off-by: Bruno Herrera <bruherrera@gmail.com>
Signed-off-by: Alexandre TORGUE <alexandre.torgue@st.com>
This patch enables USB FS on stm32f469-disco with 5V VBUS enable.
Signed-off-by: Bruno Herrera <bruherrera@gmail.com>
Signed-off-by: Alexandre TORGUE <alexandre.torgue@st.com>
The kbuild test robot reported this build failure on a number
of architectures:
> make.cross ARCH=arm
> lib/lib.a(bug.o): In function `find_bug':
> >> lib/bug.c:135: undefined reference to `__start___bug_table'
> >> lib/bug.c:135: undefined reference to `__stop___bug_table'
Caused by:
19d436268d ("debug: Add _ONCE() logic to report_bug()")
Which moved the BUG_TABLE from RO_DATA_SECTION() to RW_DATA_SECTION(),
but a number of architectures don't use RW_DATA_SECTION(), so they
ended up with no __bug_table[] ...
Ideally all those would use RW_DATA_SECTION() in their linker scripts,
but that's for another day.
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kbuild test robot <fengguang.wu@intel.com>
Cc: kbuild-all@01.org
Cc: tipbuild@zytor.com
Link: http://lkml.kernel.org/r/20170330154927.o6qmgfp4bdhrajbm@hirez.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Update the Alpine clock-frequency values with valid default values. The
bootloader can still update these values if needed, but at least we can
boot if it does not.
Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com>
Cosmetic cleanup to have consistent node definitions. Add a space before
the node units which do not have one.
Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com>
For an MCE (Machine Check Exception) that hits while in user mode
MSR(PR=1), print the task info to the console MCE error log. This may
help to identify an application that triggered the MCE.
After this patch the MCE console looks like:
Severe Machine check interrupt [Recovered]
NIP: [0000000010039778] PID: 762 Comm: ebizzy
Initiator: CPU
Error type: SLB [Multihit]
Effective address: 0000000010039778
Severe Machine check interrupt [Not recovered]
NIP: [0000000010039778] PID: 763 Comm: ebizzy
Initiator: CPU
Error type: UE [Page table walk ifetch]
Effective address: 0000000010039778
ebizzy[763]: unhandled signal 7 at 0000000010039778 nip 0000000010039778 lr 0000000010001b44 code 30004
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
For D-side errors we print the load/store address that caused the
machine check as 'Effective address'. But the instruction that may have
caused the machine check can also be helpful, so in addition to printing
the NIP, also print the kernel function name as well.
After this patch the MCE console log would look like:
Severe Machine check interrupt [Recovered]
NIP [d00000001bc70194]: init_module+0x194/0x2b0 [bork_kernel]
Initiator: CPU
Error type: SLB [Parity]
Effective address: d000000026de0000
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Make sure to reserve the boot memory for the flattened device tree.
Otherwise it might get overwritten, e.g. when initial_boot_params is
copied, leading to a corrupted FDT and a boot hang/crash:
bootconsole [early0] enabled
Early console on uart16650 initialized at 0xf8001600
OF: fdt: Error -11 processing FDT
Kernel panic - not syncing: setup_cpuinfo: No CPU found in devicetree!
---[ end Kernel panic - not syncing: setup_cpuinfo: No CPU found in devicetree!
Guenter Roeck says:
> I think I found the problem. In unflatten_and_copy_device_tree(), with added
> debug information:
>
> OF: fdt: initial_boot_params=c861e400, dt=c861f000 size=28874 (0x70ca)
>
> ... and then initial_boot_params is copied to dt, which results in corrupted
> fdt since the memory overlaps. Looks like the initial_boot_params memory
> is not reserved and (re-)allocated by early_init_dt_alloc_memory_arch().
Cc: stable@vger.kernel.org
Reported-by: Guenter Roeck <linux@roeck-us.net>
Reference: http://lkml.kernel.org/r/20170226210338.GA19476@roeck-us.net
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Acked-by: Ley Foon Tan <ley.foon.tan@intel.com>
Pull x86 fixes from Thomas Gleixner:
"This update provides:
- prevent KASLR from randomizing EFI regions
- restrict the usage of -maccumulate-outgoing-args and document when
and why it is required.
- make the Global Physical Address calculation for UV4 systems work
correctly.
- address a copy->paste->forgot-edit problem in the MCE exception
table entries.
- assign a name to AMD MCA bank 3, so the sysfs file registration
works.
- add a missing include in the boot code"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/boot: Include missing header file
x86/mce/AMD: Give a name to MCA bank 3 when accessed with legacy MSRs
x86/build: Mostly disable '-maccumulate-outgoing-args'
x86/mm/KASLR: Exclude EFI region from KASLR VA space randomization
x86/mce: Fix copy/paste error in exception table entries
x86/platform/uv: Fix calculation of Global Physical Address
Pull scheduler fixes from Thomas Gleixner:
"This update provides:
- make the scheduler clock switch to unstable mode smooth so the
timestamps stay at microseconds granularity instead of switching to
tick granularity.
- unbreak perf test tsc by taking the new offset into account which
was added in order to proveide better sched clock continuity
- switching sched clock to unstable mode runs all clock related
computations which affect the sched clock output itself from a work
queue. In case of preemption sched clock uses half updated data and
provides wrong timestamps. Keep the math in the protected context
and delegate only the static key switch to workqueue context.
- remove a duplicate header include"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/headers: Remove duplicate #include <linux/sched/debug.h> line
sched/clock: Fix broken stable to unstable transfer
sched/clock, x86/perf: Fix "perf test tsc"
sched/clock: Fix clear_sched_clock_stable() preempt wobbly
Pull parisc fixes from Helge Deller:
"Al Viro reported that - in case of read faults - our copy_from_user()
implementation may claim to have copied more bytes than it actually
did. In order to fix this bug and because of the way how gcc optimizes
register usage for inline assembly in C code, we had to replace our
pa_memcpy() function with a pure assembler implementation.
While fixing the memcpy bug we noticed some other issues with our
get_user() and put_user() functions, e.g. nested faults may return
wrong data. This is now fixed by a common fixup handler for
get_user/put_user in the exception handler which additionally makes
generated code smaller and faster.
The third patch is a trivial one-line fix for a patch which went in
during 4.11-rc and which avoids stalled CPU warnings after power
shutdown (for parisc machines which can't plug power off themselves).
Due to the rewrite of pa_memcpy() into assembly this patch got bigger
than what I wanted to have sent at this stage.
Those patches have been running in production during the last few days
on our debian build servers without any further issues"
* 'parisc-4.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Avoid stalled CPU warnings after system shutdown
parisc: Clean up fixup routines for get_user()/put_user()
parisc: Fix access fault handling in pa_memcpy()
Merge misc fixes from Andrew Morton:
"11 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
kasan: do not sanitize kexec purgatory
drivers/rapidio/devices/tsi721.c: make module parameter variable name unique
mm/hugetlb.c: don't call region_abort if region_chg fails
kasan: report only the first error by default
hugetlbfs: initialize shared policy as part of inode allocation
mm: fix section name for .data..ro_after_init
mm, hugetlb: use pte_present() instead of pmd_present() in follow_huge_pmd()
mm: workingset: fix premature shadow node shrinking with cgroups
mm: rmap: fix huge file mmap accounting in the memcg stats
mm: move mm_percpu_wq initialization earlier
mm: migrate: fix remove_migration_pte() for ksm pages
Pull ARC fixes from Vineet Gupta:
"Accumulated fixes for ARC which I've been been sitting on for a while:
- reading clk from driver vs device tree [Vlad]
- fix support for UIO in VDK platform [Alexey]
- SLC busy bit reading workaround
- build warning with kprobes header reorg"
* tag 'arc-4.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
ARC: fix build warnings with !CONFIG_KPROBES
ARCv2: SLC: Make sure busy bit is set properly on SLC flushing
ARC: vdk: Fix support of UIO
ARCv2: make unimplemented vectors as no-ops rather than halt core
ARC: get rate from clk driver instead of reading device tree
ARC: [dts] add cpu nodes to ARCHS SMP device tree
ARC: [dts] add input clocks for cpu nodes
Not all user space application is ready to handle wide addresses. It's
known that at least some JIT compilers use higher bits in pointers to
encode their information. It collides with valid pointers with 512TB
addresses and leads to crashes.
To mitigate this, we are not going to allocate virtual address space
above 128TB by default.
But userspace can ask for allocation from full address space by
specifying hint address (with or without MAP_FIXED) above 128TB.
If hint address set above 128TB, but MAP_FIXED is not specified, we try
to look for unmapped area by specified address. If it's already
occupied, we look for unmapped area in *full* address space, rather than
from 128TB window.
This approach helps to easily make application's memory allocator aware
about large address space without manually tracking allocated virtual
address space.
This is going to be a per mmap decision. ie, we can have some mmaps with
larger addresses and other that do not.
A sample memory layout looks like:
10000000-10010000 r-xp 00000000 fc:00 9057045 /home/max_addr_512TB
10010000-10020000 r--p 00000000 fc:00 9057045 /home/max_addr_512TB
10020000-10030000 rw-p 00010000 fc:00 9057045 /home/max_addr_512TB
10029630000-10029660000 rw-p 00000000 00:00 0 [heap]
7fff834a0000-7fff834b0000 rw-p 00000000 00:00 0
7fff834b0000-7fff83670000 r-xp 00000000 fc:00 9177190 /lib/powerpc64le-linux-gnu/libc-2.23.so
7fff83670000-7fff83680000 r--p 001b0000 fc:00 9177190 /lib/powerpc64le-linux-gnu/libc-2.23.so
7fff83680000-7fff83690000 rw-p 001c0000 fc:00 9177190 /lib/powerpc64le-linux-gnu/libc-2.23.so
7fff83690000-7fff836a0000 rw-p 00000000 00:00 0
7fff836a0000-7fff836c0000 r-xp 00000000 00:00 0 [vdso]
7fff836c0000-7fff83700000 r-xp 00000000 fc:00 9177193 /lib/powerpc64le-linux-gnu/ld-2.23.so
7fff83700000-7fff83710000 r--p 00030000 fc:00 9177193 /lib/powerpc64le-linux-gnu/ld-2.23.so
7fff83710000-7fff83720000 rw-p 00040000 fc:00 9177193 /lib/powerpc64le-linux-gnu/ld-2.23.so
7fffdccf0000-7fffdcd20000 rw-p 00000000 00:00 0 [stack]
1000000000000-1000000010000 rw-p 00000000 00:00 0
1ffff83710000-1ffff83720000 rw-p 00000000 00:00 0
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>