fault_info[] and debug_fault_info[] are static arrays defining memory abort
exception handling functions looking into ESR fault status code encodings.
As esr_to_fault_info() is already available providing fault_info[] array
lookup, it really makes sense to have a corresponding debug_fault_info[]
array lookup function as well. This just adds an equivalent helper function
esr_to_debug_fault_info().
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Most memory abort exception handling related functions have the arguments
in the order (addr, esr, regs) except is_el1_permission_fault(). This
changes the argument order in this function as (addr, esr, regs) like
others.
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Just replace hard code value of 63 (0x111111) with an existing macro
ESR_ELx_FSC when parsing for the status code during fault exception.
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Add arm64_force_sig_mceerr for consistency with arm64_force_sig_fault,
and use it in the one location that can take advantage of it.
This removes the fiddly filling out of siginfo before sending a signal
reporting an memory error to userspace.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Wrap force_sig_fault with a helper that calls arm64_show_signal
and call arm64_force_sig_fault where appropraite.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
As this work is truly common between all of the signal sending cases
there is no need to repeat it between the different cases.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Not all of the signals passed to __do_user_fault can be handled
the same way so expand the now tiny __do_user_fault in it's callers
and remove it.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
It gets easy to confuse what is going on when some code is shared and some not
so stop sharing the trivial bits of signal generation to make future updates
easier to understand.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
These two cases are practically the same and use siginfo differently
from the other signals sent from do_page_fault. So consolidate them
to make future changes easier.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Every caller passes in current for tsk so there is no need to pass
tsk. Instead make tsk a local variable initialized to current.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Instead of generating a struct siginfo before calling arm64_notify_die
pass the signal number, tne sicode and the fault address into
arm64_notify_die and have it call force_sig_fault instead of
force_sig_info to let the generic code generate the struct siginfo.
This keeps code passing just the needed information into
siginfo generating code, making it easier to see what
is happening and harder to get wrong. Further by letting
the generic code handle the generation of struct siginfo
it reduces the number of sites generating struct siginfo
making it possible to review them and verify that all
of the fiddly details for a structure passed to userspace
are handled properly.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
This reverts commit 46053c7368.
This change breaks architectures setting up dma_ops in their own magic
way and not using arch_setup_dma_ops, so revert it.
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Once swapper_pg_dir is in the rodata section, it will not be possible to
modify it directly, but we will need to modify it in some cases.
To enable this, we can use the fixmap when deliberately modifying
swapper_pg_dir. As the pgd is only transiently mapped, this provides
some resilience against illicit modification of the pgd, e.g. for
Kernel Space Mirror Attack (KSMA).
Signed-off-by: Jun Yao <yaojun8558363@gmail.com>
Reviewed-by: James Morse <james.morse@arm.com>
[Mark: simplify ifdeffery, commit message]
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Since the address of swapper_pg_dir is fixed for a given kernel image,
it is an attractive target for manipulation via an arbitrary write. To
mitigate this we'd like to make it read-only by moving it into the
rodata section.
We require that swapper_pg_dir is at a fixed offset from tramp_pg_dir
and reserved_ttbr0, so these will also need to move into rodata.
However, swapper_pg_dir is allocated along with some transient page
tables used for boot which we do not want to move into rodata.
As a step towards this, this patch separates the boot-time page tables
into a new init_pg_dir, and reduces swapper_pg_dir to the single page it
needs to be. This allows us to retain the relationship between
swapper_pg_dir, tramp_pg_dir, and swapper_pg_dir, while cleanly
separating these from the boot-time page tables.
The init_pg_dir holds all of the pgd/pud/pmd/pte levels needed during
boot, and all of these levels will be freed when we switch to the
swapper_pg_dir, which is initialized by the existing code in
paging_init(). Since we start off on the init_pg_dir, we no longer need
to allocate a transient page table in paging_init() in order to ensure
that swapper_pg_dir isn't live while we initialize it.
There should be no functional change as a result of this patch.
Signed-off-by: Jun Yao <yaojun8558363@gmail.com>
Reviewed-by: James Morse <james.morse@arm.com>
[Mark: place init_pg_dir after BSS, fold mm changes, commit message]
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Whilst the symmetry of deferring to the existing sync callback in
__iommu_map_page() is nice, taking a round-trip through
iommu_iova_to_phys() is a pretty heavyweight way to get an address we
can trivially compute from the page we already have. Tweaking it to just
perform the cache maintenance directly when appropriate doesn't really
make the code any more complicated, and the runtime efficiency gain can
only be a benefit.
Furthermore, the sync operations themselves know they can only be
invoked on a managed DMA ops domain, so can use the fast specific domain
lookup to avoid excessive manipulation of the group refcount
(particularly in the scatterlist cases).
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Tested-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
For contiguous hugetlb, huge_ptep_set_access_flags performs a
get_clear_flush (which then flushes the TLBs) even when no change of ptes
is necessary.
Unfortunately, this behaviour can lead to back-to-back page faults being
generated when running with multiple threads that access the same
contiguous huge page.
Thread 1 | Thread 2
-----------------------------+------------------------------
hugetlb_fault |
huge_ptep_set_access_flags |
-> invalidate pte range | hugetlb_fault
continue processing | wait for hugetlb_fault_mutex
release mutex and return | huge_ptep_set_access_flags
| -> invalidate pte range
hugetlb_fault
...
This patch changes huge_ptep_set_access_flags s.t. we first read the
contiguous range of ptes (whilst preserving dirty information); the pte
range is only then invalidated where necessary and this prevents further
spurious page faults.
Fixes: d8bdcff287 ("arm64: hugetlb: Add break-before-make logic for contiguous entries")
Reported-by: Lei Zhang <zhang.lei@jp.fujitsu.com>
Signed-off-by: Steve Capper <steve.capper@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
In the contiguous bit hugetlb break-before-make code we assume that all
hugetlb pages are young.
In fact, remove_migration_pte is able to place an old hugetlb pte so
this assumption is not valid.
This patch fixes the contiguous hugetlb scanning code to preserve young
ptes.
Fixes: d8bdcff287 ("arm64: hugetlb: Add break-before-make logic for contiguous entries")
Signed-off-by: Steve Capper <steve.capper@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
include/linux/mmzone.h describes ARCH_HAS_HOLES_MEMORYMODEL as
relevant when parts the memmap have been free()d. This would
happen on systems where memory is smaller than a sparsemem-section,
and the extra struct pages are expensive. pfn_valid() on these
systems returns true for the whole sparsemem-section, so an extra
memmap_valid_within() check is needed.
On arm64 we have nomap memory, so always provide pfn_valid() to test
for nomap pages. This means ARCH_HAS_HOLES_MEMORYMODEL's extra checks
are already rolled up into pfn_valid().
Remove it.
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Common Not Private (CNP) is a feature of ARMv8.2 extension which
allows translation table entries to be shared between different PEs in
the same inner shareable domain, so the hardware can use this fact to
optimise the caching of such entries in the TLB.
CNP occupies one bit in TTBRx_ELy and VTTBR_EL2, which advertises to
the hardware that the translation table entries pointed to by this
TTBR are the same as every PE in the same inner shareable domain for
which the equivalent TTBR also has CNP bit set. In case CNP bit is set
but TTBR does not point at the same translation table entries for a
given ASID and VMID, then the system is mis-configured, so the results
of translations are UNPREDICTABLE.
For kernel we postpone setting CNP till all cpus are up and rely on
cpufeature framework to 1) patch the code which is sensitive to CNP
and 2) update TTBR1_EL1 with CNP bit set. TTBR1_EL1 can be
reprogrammed as result of hibernation or cpuidle (via __enable_mmu).
For these two cases we restore CnP bit via __cpu_suspend_exit().
There are a few cases we need to care of changes in TTBR0_EL1:
- a switch to idmap
- software emulated PAN
we rule out latter via Kconfig options and for the former we make
sure that CNP is set for non-zero ASIDs only.
Reviewed-by: James Morse <james.morse@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
[catalin.marinas@arm.com: default y for CONFIG_ARM64_CNP]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The cpu errata and feature enable callbacks are only called via their
respective arm64_cpu_capabilities structure and therefore shouldn't
exist in the global namespace.
Move the PAN, RAS and cache maintenance emulation enable callbacks into
the same files as their corresponding arm64_cpu_capabilities structures,
making them static in the process.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Being consistent in our capitalisation for page-table dumps helps when
grepping for things like "end".
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
There is no reason to leave the per-device dma_ops around when
deconfiguring a device, so move this code from arm64 into the
common code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
In pmd_free_pte_page() and pud_free_pmd_page() we try to warn if they
hit a present non-table entry. In both cases we'll warn for non-present
entries, as the VM_WARN_ON() only checks the entry is not a table entry.
This has been observed to result in warnings when booting a v4.19-rc2
kernel under qemu.
Fix this by bailing out earlier for non-present entries.
Fixes: ec28bb9c9b ("arm64: Implement page table free interfaces")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Merge updates from Andrew Morton:
- a few misc things
- a few Y2038 fixes
- ntfs fixes
- arch/sh tweaks
- ocfs2 updates
- most of MM
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (111 commits)
mm/hmm.c: remove unused variables align_start and align_end
fs/userfaultfd.c: remove redundant pointer uwq
mm, vmacache: hash addresses based on pmd
mm/list_lru: introduce list_lru_shrink_walk_irq()
mm/list_lru.c: pass struct list_lru_node* as an argument to __list_lru_walk_one()
mm/list_lru.c: move locking from __list_lru_walk_one() to its caller
mm/list_lru.c: use list_lru_walk_one() in list_lru_walk_node()
mm, swap: make CONFIG_THP_SWAP depend on CONFIG_SWAP
mm/sparse: delete old sparse_init and enable new one
mm/sparse: add new sparse_init_nid() and sparse_init()
mm/sparse: move buffer init/fini to the common place
mm/sparse: use the new sparse buffer functions in non-vmemmap
mm/sparse: abstract sparse buffer allocations
mm/hugetlb.c: don't zero 1GiB bootmem pages
mm, page_alloc: double zone's batchsize
mm/oom_kill.c: document oom_lock
mm/hugetlb: remove gigantic page support for HIGHMEM
mm, oom: remove sleep from under oom_lock
kernel/dma: remove unsupported gfp_mask parameter from dma_alloc_from_contiguous()
mm/cma: remove unsupported gfp_mask parameter from cma_alloc()
...
Pull arm64 fixes from Will Deacon:
"A couple of arm64 fixes
- Fix boot on Hikey-960 by avoiding an IPI with interrupts disabled
- Fix address truncation in pfn_valid() implementation"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: mm: check for upper PAGE_SHIFT bits in pfn_valid()
arm64: Avoid calling stop_machine() when patching jump labels
ARM64's pfn_valid() shifts away the upper PAGE_SHIFT bits of the input
before seeing if the PFN is valid. This leads to false positives when
some of the upper bits are set, but the lower bits match a valid PFN.
For example, the following userspace code looks up a bogus entry in
/proc/kpageflags:
int pagemap = open("/proc/self/pagemap", O_RDONLY);
int pageflags = open("/proc/kpageflags", O_RDONLY);
uint64_t pfn, val;
lseek64(pagemap, [...], SEEK_SET);
read(pagemap, &pfn, sizeof(pfn));
if (pfn & (1UL << 63)) { /* valid PFN */
pfn &= ((1UL << 55) - 1); /* clear flag bits */
pfn |= (1UL << 55);
lseek64(pageflags, pfn * sizeof(uint64_t), SEEK_SET);
read(pageflags, &val, sizeof(val));
}
On ARM64 this causes the userspace process to crash with SIGSEGV rather
than reading (1 << KPF_NOPAGE). kpageflags_read() treats the offset as
valid, and stable_page_flags() will try to access an address between the
user and kernel address ranges.
Fixes: c1cc155261 ("arm64: MMU initialisation")
Cc: stable@vger.kernel.org
Signed-off-by: Greg Hackmann <ghackmann@google.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Pull arm64 updates from Will Deacon:
"A bunch of good stuff in here. Worth noting is that we've pulled in
the x86/mm branch from -tip so that we can make use of the core
ioremap changes which allow us to put down huge mappings in the
vmalloc area without screwing up the TLB. Much of the positive
diffstat is because of the rseq selftest for arm64.
Summary:
- Wire up support for qspinlock, replacing our trusty ticket lock
code
- Add an IPI to flush_icache_range() to ensure that stale
instructions fetched into the pipeline are discarded along with the
I-cache lines
- Support for the GCC "stackleak" plugin
- Support for restartable sequences, plus an arm64 port for the
selftest
- Kexec/kdump support on systems booting with ACPI
- Rewrite of our syscall entry code in C, which allows us to zero the
GPRs on entry from userspace
- Support for chained PMU counters, allowing 64-bit event counters to
be constructed on current CPUs
- Ensure scheduler topology information is kept up-to-date with CPU
hotplug events
- Re-enable support for huge vmalloc/IO mappings now that the core
code has the correct hooks to use break-before-make sequences
- Miscellaneous, non-critical fixes and cleanups"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (90 commits)
arm64: alternative: Use true and false for boolean values
arm64: kexec: Add comment to explain use of __flush_icache_range()
arm64: sdei: Mark sdei stack helper functions as static
arm64, kaslr: export offset in VMCOREINFO ELF notes
arm64: perf: Add cap_user_time aarch64
efi/libstub: Only disable stackleak plugin for arm64
arm64: drop unused kernel_neon_begin_partial() macro
arm64: kexec: machine_kexec should call __flush_icache_range
arm64: svc: Ensure hardirq tracing is updated before return
arm64: mm: Export __sync_icache_dcache() for xen-privcmd
drivers/perf: arm-ccn: Use devm_ioremap_resource() to map memory
arm64: Add support for STACKLEAK gcc plugin
arm64: Add stack information to on_accessible_stack
drivers/perf: hisi: update the sccl_id/ccl_id when MT is supported
arm64: fix ACPI dependencies
rseq/selftests: Add support for arm64
arm64: acpi: fix alignment fault in accessing ACPI
efi/arm: map UEFI memory map even w/o runtime services enabled
efi/arm: preserve early mapping of UEFI memory map longer for BGRT
drivers: acpi: add dependency of EFI for arm64
...
Pull ACPI updates from Rafael Wysocki:
"These revert two ACPICA commits that are not needed any more, rework
the property graphs support in ACPI to be more aligned with the
analogous DT code, add some new quirks and remove one that isn't
needed any more, add a special platform driver to enumerate multiple
I2C devices hooked up to the same device object in the ACPI tables and
update the battery and button drivers.
Specifics:
- Revert two ACPICA commits that are not needed any more (Erik
Schmauss).
- Rework property graph support in the ACPI device properties
framework to make it behave more like the analogous DT code and
update the documentation of it (Sakari Ailus).
- Change the default ACPI device status after initialization to
ACPI_STA_DEFAULT instead of 0 (Hans de Goede).
- Add a special platform driver for enumerating multiple I2C devices
hooked up to the same object in the ACPI tables (Hans de Goede).
- Fix the ACPI battery driver to avoid reporting full capacity on
systems without support for that and clean it up (Hans de Goede,
Dmitry Rozhkov, Lucas Rangit Magasweran).
- Add two system wakeup quirks to the ACPI EC driver (Aaron Ma, Mika
Westerberg).
- Add the touchscreen on Dell Venue Pro 7139 to the list of "always
present" devices to make it work (Tristian Celestin).
- Revert a special tables handling quirk for Dell XPS 9570 and
Precision M5530 which is not needed any more (Kai Heng Feng).
- Add support for a new OEM _OSI string to allow system vendors to
work around issues with NVidia HDMI audio (Alex Hung).
- Prevent the ACPI button driver from reporting excessive system
wakeup events and clean it up (Ravi Chandra Sadineni, Randy
Dunlap).
- Clean up two minor code style issues in the ACPI core and GHES
handling on ARM64 (Dongjiu Geng, John Garry, Tom Todd)"
* tag 'acpi-4.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (34 commits)
platform/x86: Add ACPI i2c-multi-instantiate pseudo driver
ACPI / x86: utils: Remove status workaround from acpi_device_always_present()
ACPI / scan: Create platform device for fwnodes with multiple i2c devices
ACPI / scan: Initialize status to ACPI_STA_DEFAULT
ACPI / EC: Add another entry for Thinkpad X1 Carbon 6th
ACPI: bus: Fix a pointer coding style issue
arm64 / ACPI: clean the additional checks before calling ghes_notify_sea()
ACPI / scan: Add static attribute to indirect_io_hosts[]
ACPI / battery: Do not export energy_full[_design] on devices without full_charge_capacity
ACPI / EC: Use ec_no_wakeup on ThinkPad X1 Yoga 3rd
ACPI / battery: get rid of negations in conditions
ACPI / battery: use specialized print macros
ACPI / battery: reorder headers alphabetically
ACPI / battery: drop inclusion of init.h
ACPI: battery: remove redundant old_present check on insertion
ACPI: property: graph: Update graph documentation to use generic references
ACPI: property: graph: Improve graph documentation for port/ep numbering
ACPI: property: graph: Fix graph documentation
ACPI: property: Update documentation for hierarchical data extension 1.1
ACPI: property: Document key numbering for hierarchical data extension refs
...
Pull x86 mm updates from Thomas Gleixner:
- Make lazy TLB mode even lazier to avoid pointless switch_mm()
operations, which reduces CPU load by 1-2% for memcache workloads
- Small cleanups and improvements all over the place
* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm: Remove redundant check for kmem_cache_create()
arm/asm/tlb.h: Fix build error implicit func declaration
x86/mm/tlb: Make clear_asid_other() static
x86/mm/tlb: Skip atomic operations for 'init_mm' in switch_mm_irqs_off()
x86/mm/tlb: Always use lazy TLB mode
x86/mm/tlb: Only send page table free TLB flush to lazy TLB CPUs
x86/mm/tlb: Make lazy TLB mode lazier
x86/mm/tlb: Restructure switch_mm_irqs_off()
x86/mm/tlb: Leave lazy TLB mode at page table free time
mm: Allocate the mm_cpumask (mm->cpu_bitmap[]) dynamically based on nr_cpu_ids
x86/mm: Add TLB purge to free pmd/pte page interfaces
ioremap: Update pgtable free interfaces with addr
x86/mm: Disable ioremap free page handling on x86-PAE
In order to remove the additional check before calling the
ghes_notify_sea(), make stub definition when !CONFIG_ACPI_APEI_SEA.
After this cleanup, we can simply call the ghes_notify_sea() to let
APEI driver handle the SEA notification.
Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Commit 2c4541e24c ("mm: use vma_init() to initialize VMAs on stack and
data segments") tried to initialize various left-over ad-hoc vma's
"properly", but actually made things worse for the temporary vma's used
for TLB flushing.
vma_init() doesn't actually initialize all of the vma, just a few
fields, so doing something like
- struct vm_area_struct vma = { .vm_mm = tlb->mm, };
+ struct vm_area_struct vma;
+
+ vma_init(&vma, tlb->mm);
was actually very bad: instead of having a nicely initialized vma with
every field but "vm_mm" zeroed, you'd have an entirely uninitialized vma
with only a couple of fields initialized. And they weren't even fields
that the code in question mostly cared about.
The flush_tlb_range() function takes a "struct vma" rather than a
"struct mm_struct", because a few architectures actually care about what
kind of range it is - being able to only do an ITLB flush if it's a
range that doesn't have data accesses enabled, for example. And all the
normal users already have the vma for doing the range invalidation.
But a few people want to call flush_tlb_range() with a range they just
made up, so they also end up using a made-up vma. x86 just has a
special "flush_tlb_mm_range()" function for this, but other
architectures (arm and ia64) do the "use fake vma" thing instead, and
thus got caught up in the vma_init() changes.
At the same time, the TLB flushing code really doesn't care about most
other fields in the vma, so vma_init() is just unnecessary and
pointless.
This fixes things by having an explicit "this is just an initializer for
the TLB flush" initializer macro, which is used by the arm/arm64/ia64
people who mis-use this interface with just a dummy vma.
Fixes: 2c4541e24c ("mm: use vma_init() to initialize VMAs on stack and data segments")
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Merge misc fixes from Andrew Morton:
"11 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
kvm, mm: account shadow page tables to kmemcg
zswap: re-check zswap_is_full() after do zswap_shrink()
include/linux/eventfd.h: include linux/errno.h
mm: fix vma_is_anonymous() false-positives
mm: use vma_init() to initialize VMAs on stack and data segments
mm: introduce vma_init()
mm: fix exports that inadvertently make put_page() EXPORT_SYMBOL_GPL
ipc/sem.c: prevent queue.status tearing in semop
mm: disallow mappings that conflict for devm_memremap_pages()
kasan: only select SLUB_DEBUG with SYSFS=y
delayacct: fix crash in delayacct_blkio_end() after delayacct init failure
The xen-privcmd driver, which can be modular, calls set_pte_at()
which in turn may call __sync_icache_dcache().
The call to __sync_icache_dcache() may be optimised out because it is
conditional on !pte_special(), and xen-privcmd calls pte_mkspecial().
But it seems unwise to rely on this optimisation.
Fixes: 3ad0876554 ("xen/privcmd: add IOCTL_PRIVCMD_MMAP_RESOURCE")
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Arnd reports the following arm64 randconfig build error with the PSI
patches that add another page flag:
/git/arm-soc/arch/arm64/mm/init.c: In function 'mem_init':
/git/arm-soc/include/linux/compiler.h:357:38: error: call to
'__compiletime_assert_618' declared with attribute error: BUILD_BUG_ON
failed: sizeof(struct page) > (1 << STRUCT_PAGE_MAX_SHIFT)
The additional page flag causes other information stored in
page->flags to get bumped into their own struct page member:
#if SECTIONS_WIDTH+ZONES_WIDTH+NODES_SHIFT+LAST_CPUPID_SHIFT <=
BITS_PER_LONG - NR_PAGEFLAGS
#define LAST_CPUPID_WIDTH LAST_CPUPID_SHIFT
#else
#define LAST_CPUPID_WIDTH 0
#endif
#if defined(CONFIG_NUMA_BALANCING) && LAST_CPUPID_WIDTH == 0
#define LAST_CPUPID_NOT_IN_PAGE_FLAGS
#endif
which in turn causes the struct page size to exceed the size set in
STRUCT_PAGE_MAX_SHIFT. This value is an an estimate used to size the
VMEMMAP page array according to address space and struct page size.
However, the check is performed - and triggers here - on a !VMEMMAP
config, which consumes an additional 22 page bits for the sparse
section id. When VMEMMAP is enabled, those bits are returned, cpupid
doesn't need its own member, and the page passes the VMEMMAP check.
Restrict that check to the situation it was meant to check: that we
are sizing the VMEMMAP page array correctly.
Says Arnd:
Further experiments show that the build error already existed before,
but was only triggered with larger values of CONFIG_NR_CPU and/or
CONFIG_NODES_SHIFT that might be used in actual configurations but
not in randconfig builds.
With longer CPU and node masks, I could recreate the problem with
kernels as old as linux-4.7 when arm64 NUMA support got added.
Reported-by: Arnd Bergmann <arnd@arndb.de>
Tested-by: Arnd Bergmann <arnd@arndb.de>
Cc: stable@vger.kernel.org
Fixes: 1a2db30034 ("arm64, numa: Add NUMA support for arm64 platforms.")
Fixes: 3e1907d5bf ("arm64: mm: move vmemmap region right below the linear region")
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
lkdtm calls flush_icache_range(), which results in an out-of-line call
to __flush_icache_range(), which is not exported to modules.
Export the symbol to modules to fix this build breakage.
Fixes: 3b8c9f1cdf ("arm64: IPI each CPU after invalidating the I-cache for kernel mappings")
Signed-off-by: Will Deacon <will.deacon@arm.com>
Currently numa_clear_node removes both cpu information from the NUMA
node cpumap as well as the NUMA node id from the cpu. Similarly
numa_store_cpu_info updates both percpu nodeid and NUMA cpumap.
However we need to retain the numa node id for the cpu and only remove
the cpu information from the numa node cpumap during CPU hotplug out.
The same can be extended for hotplugging in the CPU.
This patch separates out numa_{add,remove}_cpu from numa_clear_node and
numa_store_cpu_info.
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Reviewed-by: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com>
Tested-by: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com>
Tested-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
arm64 requires break-before-make. Originally, before
setting up new pmd/pud entry for huge mapping, in few
cases, the modifying pmd/pud entry was still valid
and pointing to next level page table as we only
clear off leaf PTE in unmap leg.
a) This was resulting into stale entry in TLBs (as few
TLBs also cache intermediate mapping for performance
reasons)
b) Also, modifying pmd/pud was the only reference to
next level page table and it was getting lost without
freeing it. So, page leaks were happening.
Implement pud_free_pmd_page() and pmd_free_pte_page() to
enforce BBM and also free the leaking page tables.
Implementation requires,
1) Clearing off the current pud/pmd entry
2) Invalidation of TLB
3) Freeing of the un-used next level page tables
Reviewed-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Chintan Pandya <cpandya@codeaurora.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Pull in core ioremap changes from -tip, since we depend on these for
re-enabling huge I/O mappings on arm64.
Signed-off-by: Will Deacon <will.deacon@arm.com>
When invalidating the instruction cache for a kernel mapping via
flush_icache_range(), it is also necessary to flush the pipeline for
other CPUs so that instructions fetched into the pipeline before the
I-cache invalidation are discarded. For example, if module 'foo' is
unloaded and then module 'bar' is loaded into the same area of memory,
a CPU could end up executing instructions from 'foo' when branching into
'bar' if these instructions were fetched into the pipeline before 'foo'
was unloaded.
Whilst this is highly unlikely to occur in practice, particularly as
any exception acts as a context-synchronizing operation, following the
letter of the architecture requires us to execute an ISB on each CPU
in order for the new instruction stream to be visible.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
When rewriting swapper using nG mappings, we must performance cache
maintenance around each page table access in order to avoid coherency
problems with the host's cacheable alias under KVM. To ensure correct
ordering of the maintenance with respect to Device memory accesses made
with the Stage-1 MMU disabled, DMBs need to be added between the
maintenance and the corresponding memory access.
This patch adds a missing DMB between writing a new page table entry and
performing a clean+invalidate on the same line.
Fixes: f992b4dfd5 ("arm64: kpti: Add ->enable callback to remap swapper using nG mappings")
Cc: <stable@vger.kernel.org> # 4.16.x-
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
dma_alloc_*() buffers might be exposed to userspace via mmap() call, so
they should be cleared on allocation. In case of IOMMU-based dma-mapping
implementation such buffer clearing was missing in the code path for
DMA_ATTR_FORCE_CONTIGUOUS flag handling, because dma_alloc_from_contiguous()
doesn't honor __GFP_ZERO flag. This patch fixes this issue. For more
information on clearing buffers allocated by dma_alloc_* functions,
see commit 6829e274a6 ("arm64: dma-mapping: always clear allocated
buffers").
Fixes: 44176bb38f ("arm64: Add support for DMA_ATTR_FORCE_CONTIGUOUS to IOMMU")
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>