This is part of a larger series that aims at getting rid of the
copy_thread()/copy_thread_tls() split that makes the process creation
codepaths in the kernel more convoluted and error-prone than they need
to be.
It also unblocks implementing clone3() on architectures not support
copy_thread_tls(). Any architecture that wants to implement clone3()
will need to select HAVE_COPY_THREAD_TLS and thus need to implement
copy_thread_tls(). So both goals are connected but independently
beneficial.
HAVE_COPY_THREAD_TLS means that a given architecture supports
CLONE_SETTLS and not setting it should usually mean that the
architectures doesn't implement it but that's not how things are. In
fact all architectures support CLONE_TLS it's just that they don't
follow the calling convention that HAVE_COPY_THREAD_TLS implies. That
means all architectures can be switched over to select
HAVE_COPY_THREAD_TLS. Once that is done we can remove that macro (yay,
less code), the unnecessary do_fork() export in kernel/fork.c, and also
rename copy_thread_tls() back to copy_thread(). At this point
copy_thread() becomes the main architecture specific part of process
creation but it will be the same layout and calling convention for all
architectures. (Once that is done we can probably cleanup each
copy_thread() function even more but that's for the future.)
Since sparc does support CLONE_SETTLS there's no reason to not select
HAVE_COPY_THREAD_TLS. This brings us one step closer to getting rid of
the copy_thread()/copy_thread_tls() split we still have and ultimately
the HAVE_COPY_THREAD_TLS define in general. A lot of architectures have
already converted and sparc is one of the few hat haven't yet. This also
unblocks implementing the clone3() syscall on sparc which I will follow
up later (if no one gets there before me). Once that is done we can get
of another ARCH_WANTS_* macro.
This patch just switches sparc64 over to HAVE_COPY_THREAD_TLS but not
sparc32 which will be done in the next patch. Once Any architecture that
supports HAVE_COPY_THREAD_TLS cannot call the do_fork() helper anymore.
This is fine and intended since it should be removed in favor of the
new, cleaner _do_fork() calling convention based on struct
kernel_clone_args. In fact, most architectures have already switched.
With this patch, sparc joins the other arches which can't use the
fork(), vfork(), clone(), clone3() syscalls directly and who follow the
new process creation calling convention that is based on struct
kernel_clone_args which we introduced a while back. This means less
custom assembly in the architectures entry path to set up the registers
before calling into the process creation helper and it is easier to to
support new features without having to adapt calling conventions. It
also unifies all process creation paths between fork(), vfork(),
clone(), and clone3(). (We can't fix the ABI nightmare that legacy
clone() is but we can prevent stuff like this happening in the future.)
Note that sparc can't easily call into the syscalls directly because of
its return value conventions when a new process is created which
needs to clobber the UREG_I1 register in copy_thread{_tls()} and it
needs to restore it if process creation fails. That's not a big deal
since the new process creation calling convention makes things simpler.
This removes sparc_do_fork() and replaces it with 3 clean helpers,
sparc_fork(), sparc_vfork(), and sparc_clone(). That means a little more
C code until the next patch unifies sparc 32bit and sparc64. It has the
advantage that we can remove quite a bit of assembler and it makes the
whole syscall.S process creation bits easier to read.
The follow-up patch will remove the custom sparc_do_fork() helper for
32bi sparc and move sparc_fork(), sparc_vfork(), and sparc_clone() into
a common process.c file. This allows us to remove quite a bit of
custom assembly form 32bit sparc's entry.S file too and allows to remove
even more code because now all helpers are shared between 32bit sparc
and sparc64 instead of having to maintain two separate sparc_do_fork()
implementations.
For some more context, please see:
commit 606e9ad200
Merge: ac61145a72457677c70c
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date: Sat Jan 11 15:33:48 2020 -0800
Merge tag 'clone3-tls-v5.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux
Pull thread fixes from Christian Brauner:
"This contains a series of patches to fix CLONE_SETTLS when used with
clone3().
The clone3() syscall passes the tls argument through struct clone_args
instead of a register. This means, all architectures that do not
implement copy_thread_tls() but still support CLONE_SETTLS via
copy_thread() expecting the tls to be located in a register argument
based on clone() are currently unfortunately broken. Their tls value
will be garbage.
The patch series fixes this on all architectures that currently define
__ARCH_WANT_SYS_CLONE3. It also adds a compile-time check to ensure
that any architecture that enables clone3() in the future is forced to
also implement copy_thread_tls().
My ultimate goal is to get rid of the copy_thread()/copy_thread_tls()
split and just have copy_thread_tls() at some point in the not too
distant future (Maybe even renaming copy_thread_tls() back to simply
copy_thread() once the old function is ripped from all arches). This
is dependent now on all arches supporting clone3().
While all relevant arches do that now there are still four missing:
ia64, m68k, sh and sparc. They have the system call reserved, but not
implemented. Once they all implement clone3() we can get rid of
ARCH_WANT_SYS_CLONE3 and HAVE_COPY_THREAD_TLS.
Note that in the meantime, m68k has already switched to the new calling
convention.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Acked-by: David S. Miller <davem@davemloft.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Guo Ren <guoren@kernel.org>
Cc: linux-csky@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: sparclinux@vger.kernel.org
See: d95b56c77e ("openrisc: Cleanup copy_thread_tls docs and comments")
See: 0b9f386c4b ("csky: Implement copy_thread_tls")
Link: https://lore.kernel.org/r/20200512171527.570109-2-christian.brauner@ubuntu.com
Since commit 84af7a6194 ("checkpatch: kconfig: prefer 'help' over
'---help---'"), the number of '---help---' has been gradually
decreasing, but there are still more than 2400 instances.
This commit finishes the conversion. While I touched the lines,
I also fixed the indentation.
There are a variety of indentation styles found.
a) 4 spaces + '---help---'
b) 7 spaces + '---help---'
c) 8 spaces + '---help---'
d) 1 space + 1 tab + '---help---'
e) 1 tab + '---help---' (correct indentation)
f) 1 tab + 1 space + '---help---'
g) 1 tab + 2 spaces + '---help---'
In order to convert all of them to 1 tab + 'help', I ran the
following commend:
$ find . -name 'Kconfig*' | xargs sed -i 's/^[[:space:]]*---help---/\thelp/'
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
"mm: consolidate pte_index() and pte_offset_*() definitions" was supposed
to remove arch/sparc/mm/srmmu.c:pte_offset_kernel().
Fixes: 974b9b2c68 ("mm: consolidate pte_index() and pte_offset_*() definitions")
Reported-by: kernel test robot <lkp@intel.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The powerpc 32-bit implementation of pgtable has nice shortcuts for
accessing kernel PMD and PTE for a given virtual address. Make these
helpers available for all architectures.
[rppt@linux.ibm.com: microblaze: fix page table traversal in setup_rt_frame()]
Link: http://lkml.kernel.org/r/20200518191511.GD1118872@kernel.org
[akpm@linux-foundation.org: s/pmd_ptr_k/pmd_off_k/ in various powerpc places]
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Cain <bcain@codeaurora.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Greg Ungerer <gerg@linux-m68k.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Guo Ren <guoren@kernel.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ley Foon Tan <ley.foon.tan@intel.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Nick Hu <nickhu@andestech.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vincent Chen <deanbo422@gmail.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: http://lkml.kernel.org/r/20200514170327.31389-9-rppt@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently, the log-level of show_stack() depends on a platform
realization. It creates situations where the headers are printed with
lower log level or higher than the stacktrace (depending on a platform or
user).
Furthermore, it forces the logic decision from user to an architecture
side. In result, some users as sysrq/kdb/etc are doing tricks with
temporary rising console_loglevel while printing their messages. And in
result it not only may print unwanted messages from other CPUs, but also
omit printing at all in the unlucky case where the printk() was deferred.
Introducing log-level parameter and KERN_UNSUPPRESSED [1] seems an easier
approach than introducing more printk buffers. Also, it will consolidate
printings with headers.
Introduce show_stack_loglvl(), that eventually will substitute
show_stack().
[1]: https://lore.kernel.org/lkml/20190528002412.1625-1-dima@arista.com/T/#u
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: David S. Miller <davem@davemloft.net>
Link: http://lkml.kernel.org/r/20200418201944.482088-35-dima@arista.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Merge still more updates from Andrew Morton:
"Various trees. Mainly those parts of MM whose linux-next dependents
are now merged. I'm still sitting on ~160 patches which await merges
from -next.
Subsystems affected by this patch series: mm/proc, ipc, dynamic-debug,
panic, lib, sysctl, mm/gup, mm/pagemap"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (52 commits)
doc: cgroup: update note about conditions when oom killer is invoked
module: move the set_fs hack for flush_icache_range to m68k
nommu: use flush_icache_user_range in brk and mmap
binfmt_flat: use flush_icache_user_range
exec: use flush_icache_user_range in read_code
exec: only build read_code when needed
m68k: implement flush_icache_user_range
arm: rename flush_cache_user_range to flush_icache_user_range
xtensa: implement flush_icache_user_range
sh: implement flush_icache_user_range
asm-generic: add a flush_icache_user_range stub
mm: rename flush_icache_user_range to flush_icache_user_page
arm,sparc,unicore32: remove flush_icache_user_range
riscv: use asm-generic/cacheflush.h
powerpc: use asm-generic/cacheflush.h
openrisc: use asm-generic/cacheflush.h
m68knommu: use asm-generic/cacheflush.h
microblaze: use asm-generic/cacheflush.h
ia64: use asm-generic/cacheflush.h
hexagon: use asm-generic/cacheflush.h
...
Pull sparc updates from David Miller:
- Rework the sparc32 page tables so that READ_ONCE(*pmd), as done by
generic code, operates on a word sized element. From Will Deacon.
- Some scnprintf() conversions, from Chen Zhou.
- A pin_user_pages() conversion from John Hubbard.
- Several 32-bit ptrace register handling fixes and such from Al Viro.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-next:
fix a braino in "sparc32: fix register window handling in genregs32_[gs]et()"
sparc32: mm: Only call ctor()/dtor() functions for first and last user
sparc32: mm: Disable SPLIT_PTLOCK_CPUS
sparc32: mm: Don't try to free page-table pages if ctor() fails
sparc32: register memory occupied by kernel as memblock.memory
sparc: remove unused header file nfs_fs.h
sparc32: fix register window handling in genregs32_[gs]et()
sparc64: fix misuses of access_process_vm() in genregs32_[sg]et()
oradax: convert get_user_pages() --> pin_user_pages()
sparc: use scnprintf() in show_pciobppath_attr() in vio.c
sparc: use scnprintf() in show_pciobppath_attr() in pci.c
tty: vcc: Fix error return code in vcc_probe()
sparc32: mm: Reduce allocation size for PMD and PTE tables
sparc32: mm: Change pgtable_t type to pte_t * instead of struct page *
sparc32: mm: Restructure sparc32 MMU page-table layout
sparc32: mm: Fix argument checking in __srmmu_get_nocache()
sparc64: Replace zero-length array with flexible-array
sparc: mm: return true,false in kern_addr_valid()
Pull tty/serial driver updates from Greg KH:
"Here is the tty and serial driver updates for 5.8-rc1
Nothing huge at all, just a lot of little serial driver fixes, updates
for new devices and features, and other small things. Full details are
in the shortlog.
All of these have been in linux-next with no issues for a while"
* tag 'tty-5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (67 commits)
tty: serial: qcom_geni_serial: Add 51.2MHz frequency support
tty: serial: imx: clear Ageing Timer Interrupt in handler
serial: 8250_fintek: Add F81966 Support
sc16is7xx: Add flag to activate IrDA mode
dt-bindings: sc16is7xx: Add flag to activate IrDA mode
serial: 8250: Support rs485 bus termination GPIO
serial: 8520_port: Fix function param documentation
dt-bindings: serial: Add binding for rs485 bus termination GPIO
vt: keyboard: avoid signed integer overflow in k_ascii
serial: 8250: Enable 16550A variants by default on non-x86
tty: hvc_console, fix crashes on parallel open/close
serial: imx: Initialize lock for non-registered console
sc16is7xx: Read the LSR register for basic device presence check
sc16is7xx: Allow sharing the IRQ line
sc16is7xx: Use threaded IRQ
sc16is7xx: Always use falling edge IRQ
tty: n_gsm: Fix bogus i++ in gsm_data_kick
tty: n_gsm: Remove unnecessary test in gsm_print_packet()
serial: stm32: add no_console_suspend support
tty: serial: fsl_lpuart: Use __maybe_unused instead of #if CONFIG_PM_SLEEP
...
lost npc in PTRACE_SETREGSET, breaking PTRACE_SETREGS as well
Fixes: cf51e129b9 "sparc32: fix register window handling in genregs32_[gs]et()"
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Most architectures define kmap_prot to be PAGE_KERNEL.
Let sparc and xtensa define there own and define PAGE_KERNEL as the
default if not overridden.
[akpm@linux-foundation.org: coding style fixes]
Suggested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christian König <christian.koenig@amd.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Helge Deller <deller@gmx.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20200507150004.1423069-16-ira.weiny@intel.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Every single architecture (including !CONFIG_HIGHMEM) calls...
pagefault_enable();
preempt_enable();
... before returning from __kunmap_atomic(). Lift this code into the
kunmap_atomic() macro.
While we are at it rename __kunmap_atomic() to kunmap_atomic_high() to
be consistent.
[ira.weiny@intel.com: don't enable pagefault/preempt twice]
Link: http://lkml.kernel.org/r/20200518184843.3029640-1-ira.weiny@intel.com
[akpm@linux-foundation.org: coding style fixes]
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christian König <christian.koenig@amd.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Helge Deller <deller@gmx.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Guenter Roeck <linux@roeck-us.net>
Link: http://lkml.kernel.org/r/20200507150004.1423069-8-ira.weiny@intel.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Every arch has the same code to ensure atomic operations and a check for
!HIGHMEM page.
Remove the duplicate code by defining a core kmap_atomic() which only
calls the arch specific kmap_atomic_high() when the page is high memory.
[akpm@linux-foundation.org: coding style fixes]
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christian König <christian.koenig@amd.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Helge Deller <deller@gmx.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20200507150004.1423069-7-ira.weiny@intel.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
All architectures do exactly the same thing for kunmap(); remove all the
duplicate definitions and lift the call to the core.
This also has the benefit of changing kmap_unmap() on a number of
architectures to be an inline call rather than an actual function.
[akpm@linux-foundation.org: fix CONFIG_HIGHMEM=n build on various architectures]
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christian König <christian.koenig@amd.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Helge Deller <deller@gmx.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20200507150004.1423069-5-ira.weiny@intel.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
sparc32 never registered the memory occupied by the kernel image with
memblock_add() and it only reserved this memory with meblock_reserve().
With openbios as system firmware, the memory occupied by the kernel is
reserved in openbios and removed from mem.available. The prom setup code
in the kernel uses mem.available to set up the memory banks and
essentially there is a hole for the memory occupied by the kernel image.
Later in bootmem_init() this memory is memblock_reserve()d.
Up until recently, memmap initialization would call __init_single_page()
for the pages in that hole, the free_low_memory_core_early() would mark
them as reserved and everything would be Ok.
After the change in memmap initialization introduced by the commit "mm:
memmap_init: iterate over memblock regions rather that check each PFN",
the hole is skipped and the page structs for it are not initialized. And
when they are passed from memblock to page allocator as reserved, the
latter gets confused.
Simply registering the memory occupied by the kernel with memblock_add()
resolves this issue.
Tested on qemu-system-sparc with Debian Etch [1] userspace.
[1] https://people.debian.org/~aurel32/qemu/sparc/debian_etch_sparc_small.qcow2
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: David S. Miller <davem@davemloft.net>
Cc: Guenter Roeck <linux@roeck-us.net>
Link: https://lkml.kernel.org/r/20200517000050.GA87467@roeck-us.nlllllet/
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The SRMMU page-table allocator allocates multiple PTE tables per page,
since they are only 1K in size. However, this means that calls to
pgtable_pte_page_{ctor,dtor}() must be serialised and performed only by
the first and last page-table allocation for the page respectively.
Use the page reference count to track how many PTE tables we have
allocated for a given page returned by the SRMMU allocator and only
call the ctor()/dtor() functions for the first and last user respectively.
Cc: David S. Miller <davem@davemloft.net>
Fixes: 8c8f3156dd ("sparc32: mm: Reduce allocation size for PMD and PTE tables")
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The pages backing page-table allocations for SRMMU are allocated via
memblock as part of the "nocache" region initialisation during
srmmu_paging_init() and should not be freed even if a later call to
pgtable_pte_page_ctor() fails.
Remove the broken call to __free_page().
Cc: David S. Miller <davem@davemloft.net>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Fixes: 1ae9ae5f7d ("sparc: handle pgtable_page_ctor() fail")
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
sparc32 never registered the memory occupied by the kernel image with
memblock_add() and it only reserved this memory with meblock_reserve().
With openbios as system firmware, the memory occupied by the kernel is
reserved in openbios and removed from mem.available. The prom setup code in
the kernel uses mem.available to set up the memory banks and essentially
there is a hole for the memory occupied by the kernel image.
Later in bootmem_init() this memory is memblock_reserve()d.
Up until recently, memmap initialization would call __init_single_page()
for the pages in that hole, the free_low_memory_core_early() would mark
them as reserved and everything would be Ok.
After the change in memmap initialization introduced by the commit "mm:
memmap_init: iterate over memblock regions rather that check each PFN", the
hole is skipped and the page structs for it are not initialized. And when
they are passed from memblock to page allocator as reserved, the latter
gets confused.
Simply registering the memory occupied by the kernel with memblock_add()
resolves this issue.
Tested on qemu-system-sparc with Debian Etch [1] userspace.
[1] https://people.debian.org/~aurel32/qemu/sparc/debian_etch_sparc_small.qcow2
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Link: https://lkml.kernel.org/r/20200517000050.GA87467@roeck-us.nlllllet/
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull block driver updates from Jens Axboe:
"On top of the core changes, here are the block driver changes for this
merge window:
- NVMe changes:
- NVMe over Fibre Channel protocol updates, which also reach
over to drivers/scsi/lpfc (James Smart)
- namespace revalidation support on the target (Anthony
Iliopoulos)
- gcc zero length array fix (Arnd Bergmann)
- nvmet cleanups (Chaitanya Kulkarni)
- misc cleanups and fixes (me, Keith Busch, Sagi Grimberg)
- use a SRQ per completion vector (Max Gurtovoy)
- fix handling of runtime changes to the queue count (Weiping
Zhang)
- t10 protection information support for nvme-rdma and
nvmet-rdma (Israel Rukshin and Max Gurtovoy)
- target side AEN improvements (Chaitanya Kulkarni)
- various fixes and minor improvements all over, icluding the
nvme part of the lpfc driver"
- Floppy code cleanup series (Willy, Denis)
- Floppy contention fix (Jiri)
- Loop CONFIGURE support (Martijn)
- bcache fixes/improvements (Coly, Joe, Colin)
- q->queuedata cleanups (Christoph)
- Get rid of ioctl_by_bdev (Christoph, Stefan)
- md/raid5 allocation fixes (Coly)
- zero length array fixes (Gustavo)
- swim3 task state fix (Xu)"
* tag 'for-5.8/drivers-2020-06-01' of git://git.kernel.dk/linux-block: (166 commits)
bcache: configure the asynchronous registertion to be experimental
bcache: asynchronous devices registration
bcache: fix refcount underflow in bcache_device_free()
bcache: Convert pr_<level> uses to a more typical style
bcache: remove redundant variables i and n
lpfc: Fix return value in __lpfc_nvme_ls_abort
lpfc: fix axchg pointer reference after free and double frees
lpfc: Fix pointer checks and comments in LS receive refactoring
nvme: set dma alignment to qword
nvmet: cleanups the loop in nvmet_async_events_process
nvmet: fix memory leak when removing namespaces and controllers concurrently
nvmet-rdma: add metadata/T10-PI support
nvmet: add metadata support for block devices
nvmet: add metadata/T10-PI support
nvme: add Metadata Capabilities enumerations
nvmet: rename nvmet_check_data_len to nvmet_check_transfer_len
nvmet: rename nvmet_rw_len to nvmet_rw_data_len
nvmet: add metadata characteristics for a namespace
nvme-rdma: add metadata/T10-PI support
nvme-rdma: introduce nvme_rdma_sgl structure
...
Pull vfs updates from Al Viro:
"Assorted patches from Miklos.
An interesting part here is /proc/mounts stuff..."
The "/proc/mounts stuff" is using a cursor for keeeping the location
data while traversing the mount listing.
Also probably worth noting is the addition of faccessat2(), which takes
an additional set of flags to specify how the lookup is done
(AT_EACCESS, AT_SYMLINK_NOFOLLOW, AT_EMPTY_PATH).
* 'from-miklos' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
vfs: add faccessat2 syscall
vfs: don't parse "silent" option
vfs: don't parse "posixacl" option
vfs: don't parse forbidden flags
statx: add mount_root
statx: add mount ID
statx: don't clear STATX_ATIME on SB_RDONLY
uapi: deprecate STATX_ALL
utimensat: AT_EMPTY_PATH support
vfs: split out access_override_creds()
proc/mounts: add cursor
aio: fix async fsync creds
vfs: allow unprivileged whiteout creation
Pull uaccess/csum updates from Al Viro:
"Regularize the sitation with uaccess checksum primitives:
- fold csum_partial_... into csum_and_copy_..._user()
- on x86 collapse several access_ok()/stac()/clac() into
user_access_begin()/user_access_end()"
* 'uaccess.csum' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
default csum_and_copy_to_user(): don't bother with access_ok()
take the dummy csum_and_copy_from_user() into net/checksum.h
arm: switch to csum_and_copy_from_user()
sh32: convert to csum_and_copy_from_user()
m68k: convert to csum_and_copy_from_user()
xtensa: switch to providing csum_and_copy_from_user()
sparc: switch to providing csum_and_copy_from_user()
parisc: turn csum_partial_copy_from_user() into csum_and_copy_from_user()
alpha: turn csum_partial_copy_from_user() into csum_and_copy_from_user()
ia64: turn csum_partial_copy_from_user() into csum_and_copy_from_user()
ia64: csum_partial_copy_nocheck(): don't abuse csum_partial_copy_from_user()
x86: switch 32bit csum_and_copy_to_user() to user_access_{begin,end}()
x86: switch both 32bit and 64bit to providing csum_and_copy_from_user()
x86_64: csum_..._copy_..._user(): switch to unsafe_..._user()
get rid of csum_partial_copy_to_user()