Create the loader.bin bootable image file that can be loaded into
Kendryte K210 based boards using the kflash.py tool with the command:
kflash.py/kflash.py -t arch/riscv/boot/loader.bin
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Reviewed-by: Palmer Dabbelt <palmerdabbelt@google.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
This patch adds a defconfig file to build No-MMU kernels meant for
boards based on the Kendryte K210 SoC.
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Add a generic device tree for Kendryte K210 SoC based boards. This is
for now a very simple device tree describing the core elements of the
SoC. This is suitable (and tested) for the Kendryte KD233 development
board, the Sipeed MAIX M1 Dan Dock board and the Sipeed MAIXDUINO board.
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Sean Anderson <seanga2@gmail.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
This patch selects drivers required for the Kendryte K210 SOC.
Since K210 SoC based boards do not provide a device tree, this patch
also enables the BUILTIN_DTB option.
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Reviewed-by: Palmer Dabbelt <palmerdabbelt@google.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Add support for the Kendryte K210 RISC-V SoC. For now, this support
only provides a simple sysctl driver allowing to setup the CPU and
uart clock. This support is enabled through the new Kconfig option
SOC_KENDRYTE and defines the config option CONFIG_K210_SYSCTL
to enable the K210 SoC sysctl driver compilation.
The sysctl driver also registers an early SoC initialization function
allowing enabling the general purpose use of the 2MB of SRAM normally
reserved for the SoC AI engine. This initialization function is
automatically called before the dt early initialization using the flat
dt root node compatible property matching the value "kendryte,k210".
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
[Palmer: Add missing endmenu in Kconfig.socs]
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Add a mechanism for early SoC initialization for platforms that need
additional hardware initialization not possible through the regular
device tree and drivers mechanism. With this, a SoC specific
initialization function can be called very early, before DTB parsing
is done by parse_dtb() in Linux RISC-V kernel setup code.
This can be very useful for early hardware initialization for No-MMU
kernels booted directly in M-mode because it is quite likely that no
other booting stage exist prior to the No-MMU kernel.
Example use of a SoC early initialization is as follows:
static void vendor_abc_early_init(const void *fdt)
{
/*
* some early init code here that can use simple matches
* against the flat device tree file.
*/
}
SOC_EARLY_INIT_DECLARE("vendor,abc", abc_early_init);
This early initialization function is executed only if the flat device
tree for the board has a 'compatible = "vendor,abc"' entry;
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Anup Patel <anup.patel@wdc.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Add handlers for unaligned load and store traps that may be generated
by applications. Code heavily inspired from the OpenSBI project.
Handling of the unaligned access traps is suitable for applications
compiled with or without compressed instructions and is independent of
the kernel CONFIG_RISCV_ISA_C option value.
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Anup Patel <anup.patel@wdc.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
The compiler (GCC) does not like the situation, where there is inline
assembly block that clobbers all available machine registers in the
middle of the function. This situation can be found in function
svm_vcpu_run in file kvm/svm.c and results in many register spills and
fills to/from stack frame.
This patch fixes the issue with the same approach as was done for
VMX some time ago. The big inline assembly is moved to a separate
assembly .S file, taking into account all ABI requirements.
There are two main benefits of the above approach:
* elimination of several register spills and fills to/from stack
frame, and consequently smaller function .text size. The binary size
of svm_vcpu_run is lowered from 2019 to 1626 bytes.
* more efficient access to a register save array. Currently, register
save array is accessed as:
7b00: 48 8b 98 28 02 00 00 mov 0x228(%rax),%rbx
7b07: 48 8b 88 18 02 00 00 mov 0x218(%rax),%rcx
7b0e: 48 8b 90 20 02 00 00 mov 0x220(%rax),%rdx
and passing ia pointer to a register array as an argument to a function one gets:
12: 48 8b 48 08 mov 0x8(%rax),%rcx
16: 48 8b 50 10 mov 0x10(%rax),%rdx
1a: 48 8b 58 18 mov 0x18(%rax),%rbx
As a result, the total size, considering that the new function size is 229
bytes, gets lowered by 164 bytes.
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This reverts commit ebb37cf3ff.
That commit does not play well with soft-masked irq state
manipulations in idle, interrupt replay, and possibly others due to
tracing code sometimes using irq_work_queue (e.g., in
trace_hardirqs_on()). That can cause PACA_IRQ_DEC to become set when
it is not expected, and be ignored or cleared or cause warnings.
The net result seems to be missing an irq_work until the next timer
interrupt in the worst case which is usually not going to be noticed,
however it could be a long time if the tick is disabled, which is
against the spirit of irq_work and might cause real problems.
The idea is still solid, but it would need more work. It's not really
clear if it would be worth added complexity, so revert this for
now (not a straight revert, but replace with a comment explaining why
we might see interrupts happening, and gives git blame something to
find).
Fixes: ebb37cf3ff ("powerpc/64: irq_work avoid interrupt when called with hardware irqs enabled")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200402120401.1115883-1-npiggin@gmail.com
For the memory size ( > 512MB, < 1GB), the MSA setting is:
- SSEG0: PHY_START , PHY_START + 512MB
- SSEG1: PHY_START + 512MB, PHY_START + 1GB
But the real memory is no more than 1GB, there is a gap between the
end size of memory and border of 1GB. CPU could speculatively
execute to that gap and if the gap of the bus couldn't respond to
the CPU request, then the crash will happen.
Now make the setting with:
- SSEG0: PHY_START , PHY_START + 512MB (no change)
- SSEG1: Disabled (We use highmem to use the memory of 512MB~1GB)
We also deprecated zhole_szie[] settings, it's only used by arm
style CPUs. All memory gap should use Reserved setting of dts in
csky system.
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
This patch adds support for uprobes on csky architecture.
Just like kprobe, it support single-step and simulate instructions.
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
This patch enable kprobes, kretprobes, ftrace interface. It utilized
software breakpoint and single step debug exceptions, instructions
simulation on csky.
We use USR_BKPT replace origin instruction, and the kprobe handler
prepares an excutable memory slot for out-of-line execution with a
copy of the original instruction being probed. Most of instructions
could be executed by single-step, but some instructions need origin
pc value to execute and we need software simulate these instructions.
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
- Introduce 'zero_page_range' as a dax operation. This facilitates
filesystem-dax operation without a block-device.
- Advertise a persistence-domain for of_pmem and papr_scm. The
persistence domain indicates where cpu-store cycles need to reach in
the platform-memory subsystem before the platform will consider them
power-fail protected.
- Fixup some flexible-array declarations.
- Promote numa_map_to_online_node() to a cross-kernel generic facility.
- Save x86 numa information to allow for node-id lookups for reserved
memory ranges, deploy that capability for the e820-pmem driver.
- Introduce phys_to_target_node() to facilitate drivers that want to
know resulting numa node if a given reserved address range was
onlined.
Unlike normal memory ("memory" compatible type in the FDT), the
persistent memory ("ibm,pmemory" in the FDT) can be mapped anywhere in
the guest physical space and it can be used for DMA.
In order to maintain 1:1 mapping via the huge DMA window, we need to
know the maximum physical address at the time of the window setup. So
far we've been looking at "memory" nodes but "ibm,pmemory" does not
have fixed addresses and the persistent memory may be mapped
afterwards.
Since the persistent memory is still backed with page structs, use
MAX_PHYSMEM_BITS as the upper limit.
This effectively disables huge DMA window in LPAR under pHyp if
persistent memory is present but this is the best we can do for the
moment.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Tested-by: Wen Xiong<wenxiong@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200331012338.23773-1-aik@ozlabs.ru
sparc32 is the last platform making dynamic decisions in
get_arch_dma_ops based on the bus passed in. Instead set the
iommu dma_ops at iommu probing and propagate them in
of_propagate_archdata, falling back to the NULL ops for the
direct mapping in the Leon or PCI case.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull devicetree updates from Rob Herring:
- Unit test for overlays with GPIO hogs
- Improve dma-ranges parsing to handle dma-ranges with multiple entries
- Update dtc to upstream version v1.6.0-2-g87a656ae5ff9
- Improve overlay error reporting
- Device link support for power-domains and hwlocks bindings
- Add vendor prefixes for Beacon, Topwise, ENE, Dell, SG Micro, Elida,
PocketBook, Xiaomi, Linutronix, OzzMaker, Waveshare Electronics, and
ITE Tech
- Add deprecated Marvell vendor prefix 'mrvl'
- A bunch of binding conversions to DT schema continues. Of note, the
common serial and USB connector bindings are converted.
- Add more Arm CPU compatibles
- Drop Mark Rutland as DT maintainer :(
* tag 'devicetree-for-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (106 commits)
MAINTAINERS: drop an old reference to stm32 pwm timers doc
MAINTAINERS: dt: update etnaviv file reference
dt-bindings: usb: dwc2: fix bindings for amlogic, meson-gxbb-usb
dt-bindings: uniphier-system-bus: fix warning in the example
dt-bindings: display: meson-vpu: fix indentation of reg-names' "items"
dt-bindings: iio: Fix adi, ltc2983 uint64-matrix schema constraints
dt-bindings: power: Fix example for power-domain
dt-bindings: arm: Add some constraints for PSCI nodes
of: some unittest overlays not untracked
of: gpio unittest kfree() wrong object
dt-bindings: phy: convert phy-rockchip-inno-usb2 bindings to yaml
dt-bindings: serial: sh-sci: Convert to json-schema
dt-bindings: serial: Document serialN aliases
dt-bindings: thermal: tsens: Set 'additionalProperties: false'
dt-bindings: thermal: tsens: Fix nvmem-cell-names schema
dt-bindings: vendor-prefixes: Add Beacon vendor prefix
dt-bindings: vendor-prefixes: Add Topwise
of: of_private.h: Replace zero-length array with flexible-array member
docs: dt: fix a broken reference to input.yaml
docs: dt: fix references to ap806-system-controller.txt
...
Pull SCSI updates from James Bottomley:
"This series has a huge amount of churn because it pulls in Mauro's doc
update changing all our txt files to rst ones.
Excluding that, we have the usual driver updates (qla2xxx, ufs, lpfc,
zfcp, ibmvfc, pm80xx, aacraid), a treewide update for scnprintf and
some other minor updates.
The major core change is Hannes moving functions out of the aacraid
driver and into the core"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (223 commits)
scsi: aic7xxx: aic97xx: Remove FreeBSD-specific code
scsi: ufs: Do not rely on prefetched data
scsi: dc395x: remove dc395x_bios_param
scsi: libiscsi: Fix error count for active session
scsi: hpsa: correct race condition in offload enabled
scsi: message: fusion: Replace zero-length array with flexible-array member
scsi: qedi: Add PCI shutdown handler support
scsi: qedi: Add MFW error recovery process
scsi: ufs: Enable block layer runtime PM for well-known logical units
scsi: ufs-qcom: Override devfreq parameters
scsi: ufshcd: Let vendor override devfreq parameters
scsi: ufshcd: Update the set frequency to devfreq
scsi: ufs: Resume ufs host before accessing ufs device
scsi: ufs-mediatek: customize the delay for enabling host
scsi: ufs: make HCE polling more compact to improve initialization latency
scsi: ufs: allow custom delay prior to host enabling
scsi: ufs-mediatek: use common delay function
scsi: ufs: introduce common and flexible delay function
scsi: ufs: use an enum for host capabilities
scsi: ufs: fix uninitialized tx_lanes in ufshcd_disable_tx_lcc()
...
Pull MTD updates from Miquel Raynal:
"MTD core changes:
- Fix issue where write_cached_data() fails but write() still returns
success
- maps: sa1100-flash: Replace zero-length array with flexible-array
member
- phram: Fix a double free issue in error path
- Convert fallthrough comments into statements
- MAINTAINERS: Add the IRC channel to the MTD related subsystems
Raw NAND core changes:
- Add support for manufacturer specific suspend/resume operation
- Add support for manufacturer specific lock/unlock operation
- Replace zero-length array with flexible-array member
- Fix a typo ("manufecturer")
- Ensure nand_soft_waitrdy wait period is enough
Raw NAND controller driver changes:
- Brcmnand:
* Add support for flash-edu for dma transfers (+ bindings)
- Cadence:
* Reinit completion before executing a new command
* Change bad block marker size
* Fix the calculation of the avaialble OOB size
* Get meta data size from registers
- Qualcom:
* Use dma_request_chan() instead dma_request_slave_channel()
* Release resources on failure within qcom_nandc_alloc()
- Allwinner:
* Use dma_request_chan() instead dma_request_slave_channel()
- Marvell:
* Use dma_request_chan() instead dma_request_slave_channel()
* Release DMA channel on error
- Freescale:
* Use dma_request_chan() instead dma_request_slave_channel()
- Macronix:
* Add support for Macronix NAND randomizer (+ bindings)
- Ams-delta:
* Rename structures and functions to gpio_nand*
* Make the driver custom I/O ready
* Drop useless local variable
* Support custom driver initialisation
* Add module device tables
* Handle more GPIO pins as optional
* Make read pulses optional
* Don't hardcode read/write pulse widths
* Push inversion handling to gpiolib
* Enable OF partition info support
* Drop board specific partition info
* Use struct gpio_nand_platdata
* Write protect device during probe
- Ingenic:
* Use devm_platform_ioremap_resource()
* Add dependency on MIPS || COMPILE_TEST
- Denali:
* Deassert write protect pin
- ST:
* Use dma_request_chan() instead dma_request_slave_channel()
Raw NAND chip driver changes:
- Toshiba:
* Support reading the number of bitflips for BENAND (Built-in ECC NAND)
- Macronix:
* Add support for deep power down mode
* Add support for block protection
SPI-NAND core changes:
- Do not erase the block before writing a bad block marker
- Explicitly use MTD_OPS_RAW to write the bad block marker to OOB
- Stop using spinand->oobbuf for buffering bad block markers
- Rework detect procedure for different READ_ID operation
SPI-NAND driver changes:
- Toshiba:
* Support for new Kioxia Serial NAND
* Rename function name to change suffix and prefix (8Gbit)
* Add comment about Kioxia ID
- Micron:
* Add new Micron SPI NAND devices with multiple dies
* Add M70A series Micron SPI NAND devices
* identify SPI NAND device with Continuous Read mode
* Add new Micron SPI NAND devices
* Describe the SPI NAND device MT29F2G01ABAGD
* Generalize the OOB layout structure and function names
SPI NOR core changes:
- Move all the manufacturer specific quirks/code out of the core, to
make the core logic more readable and thus ease maintenance.
- Move the SFDP logic out of the core, it provides a better
separation between the SFDP parsing and core logic.
- Trim what is exposed in spi-nor.h. The SPI NOR controllers drivers
must not be able to use structures that are meant just for the SPI
NOR core.
- Use the spi-mem direct mapping API to let advanced controllers
optimize the read/write operations when they support direct
mapping.
- Add generic formula for the Status Register block protection
handling. It fixes some long standing locking limitations and eases
the addition of the 4bit block protection support.
- Add block protection support for flashes with 4 block protection
bits in the Status Register.
SPI NOR controller drivers changes:
- The mtk-quadspi driver is replaced by the new spi-mem spi-mtk-nor
driver.
- Merge tag 'mtk-mtd-spi-move' into spi-nor/next to avoid conflicts.
HyperBus changes:
- Print error msg when compatible is wrong or missing
- Move mapping of direct access window from core to individual
drivers"
* tag 'mtd/for-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux: (103 commits)
mtd: Convert fallthrough comments into statements
mtd: rawnand: toshiba: Support reading the number of bitflips for BENAND (Built-in ECC NAND)
MAINTAINERS: Add the IRC channel to the MTD related subsystems
mtd: Fix issue where write_cached_data() fails but write() still returns success
mtd: maps: sa1100-flash: Replace zero-length array with flexible-array member
mtd: phram: fix a double free issue in error path
mtd: spinand: toshiba: Support for new Kioxia Serial NAND
mtd: spinand: toshiba: Rename function name to change suffix and prefix (8Gbit)
mtd: rawnand: macronix: Add support for deep power down mode
mtd: rawnand: Add support for manufacturer specific suspend/resume operation
mtd: spi-nor: Enable locking for n25q512ax3/n25q512a
mtd: spi-nor: Add SR 4bit block protection support
mtd: spi-nor: Add generic formula for SR block protection handling
mtd: spi-nor: Set all BP bits to one when lock_len == mtd->size
mtd: spi-nor: controllers: aspeed-smc: Replace zero-length array with flexible-array member
mtd: spi-nor: Clear WEL bit when erase or program errors occur
MAINTAINERS: update entry after SPI NOR controller move
mtd: spi-nor: Trim what is exposed in spi-nor.h
mtd: spi-nor: Drop the MFR definitions
mtd: spi-nor: Get rid of the now empty spi_nor_ids[] table
...
The BPF JIT fails to build for kernels configured to !MMU. Without an
MMU, the BPF JIT does not make much sense, therefore this patch
disables the JIT for nommu builds.
This was reported by the kbuild test robot:
All errors (new ones prefixed by >>):
arch/riscv/net/bpf_jit_comp64.c: In function 'bpf_jit_alloc_exec':
>> arch/riscv/net/bpf_jit_comp64.c:1094:47: error: 'BPF_JIT_REGION_START' undeclared (first use in this function)
1094 | return __vmalloc_node_range(size, PAGE_SIZE, BPF_JIT_REGION_START,
| ^~~~~~~~~~~~~~~~~~~~
arch/riscv/net/bpf_jit_comp64.c:1094:47: note: each undeclared identifier is reported only once for each function it appears in
>> arch/riscv/net/bpf_jit_comp64.c:1095:9: error: 'BPF_JIT_REGION_END' undeclared (first use in this function)
1095 | BPF_JIT_REGION_END, GFP_KERNEL,
| ^~~~~~~~~~~~~~~~~~
arch/riscv/net/bpf_jit_comp64.c:1098:1: warning: control reaches end of non-void function [-Wreturn-type]
1098 | }
| ^
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Björn Töpel <bjorn.topel@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Luke Nelson <luke.r.nels@gmail.com>
Link: https://lore.kernel.org/bpf/20200331101046.23252-1-bjorn.topel@gmail.com
Pull kvm updates from Paolo Bonzini:
"ARM:
- GICv4.1 support
- 32bit host removal
PPC:
- secure (encrypted) using under the Protected Execution Framework
ultravisor
s390:
- allow disabling GISA (hardware interrupt injection) and protected
VMs/ultravisor support.
x86:
- New dirty bitmap flag that sets all bits in the bitmap when dirty
page logging is enabled; this is faster because it doesn't require
bulk modification of the page tables.
- Initial work on making nested SVM event injection more similar to
VMX, and less buggy.
- Various cleanups to MMU code (though the big ones and related
optimizations were delayed to 5.8). Instead of using cr3 in
function names which occasionally means eptp, KVM too has
standardized on "pgd".
- A large refactoring of CPUID features, which now use an array that
parallels the core x86_features.
- Some removal of pointer chasing from kvm_x86_ops, which will also
be switched to static calls as soon as they are available.
- New Tigerlake CPUID features.
- More bugfixes, optimizations and cleanups.
Generic:
- selftests: cleanups, new MMU notifier stress test, steal-time test
- CSV output for kvm_stat"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (277 commits)
x86/kvm: fix a missing-prototypes "vmread_error"
KVM: x86: Fix BUILD_BUG() in __cpuid_entry_get_reg() w/ CONFIG_UBSAN=y
KVM: VMX: Add a trampoline to fix VMREAD error handling
KVM: SVM: Annotate svm_x86_ops as __initdata
KVM: VMX: Annotate vmx_x86_ops as __initdata
KVM: x86: Drop __exit from kvm_x86_ops' hardware_unsetup()
KVM: x86: Copy kvm_x86_ops by value to eliminate layer of indirection
KVM: x86: Set kvm_x86_ops only after ->hardware_setup() completes
KVM: VMX: Configure runtime hooks using vmx_x86_ops
KVM: VMX: Move hardware_setup() definition below vmx_x86_ops
KVM: x86: Move init-only kvm_x86_ops to separate struct
KVM: Pass kvm_init()'s opaque param to additional arch funcs
s390/gmap: return proper error code on ksm unsharing
KVM: selftests: Fix cosmetic copy-paste error in vm_mem_region_move()
KVM: Fix out of range accesses to memslots
KVM: X86: Micro-optimize IPI fastpath delay
KVM: X86: Delay read msr data iff writes ICR MSR
KVM: PPC: Book3S HV: Add a capability for enabling secure guests
KVM: arm64: GICv4.1: Expose HW-based SGIs in debugfs
KVM: arm64: GICv4.1: Allow non-trapping WFI when using HW SGIs
...
Pull x86 fix from Ingo Molnar:
"A single fix addressing Sparse warnings. <asm/bitops.h> is changed
non-trivially to avoid the warnings, but generated code is not
supposed to be affected"
* tag 'x86-urgent-2020-04-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86: Fix bitops.h warning with a moved cast
Pull integrity updates from Mimi Zohar:
"Just a couple of updates for linux-5.7:
- A new Kconfig option to enable IMA architecture specific runtime
policy rules needed for secure and/or trusted boot, as requested.
- Some message cleanup (eg. pr_fmt, additional error messages)"
* 'next-integrity' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
ima: add a new CONFIG for loading arch-specific policies
integrity: Remove duplicate pr_fmt definitions
IMA: Add log statements for failure conditions
IMA: Update KBUILD_MODNAME for IMA files to ima
Merge updates from Andrew Morton:
"A large amount of MM, plenty more to come.
Subsystems affected by this patch series:
- tools
- kthread
- kbuild
- scripts
- ocfs2
- vfs
- mm: slub, kmemleak, pagecache, gup, swap, memcg, pagemap, mremap,
sparsemem, kasan, pagealloc, vmscan, compaction, mempolicy,
hugetlbfs, hugetlb"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (155 commits)
include/linux/huge_mm.h: check PageTail in hpage_nr_pages even when !THP
mm/hugetlb: fix build failure with HUGETLB_PAGE but not HUGEBTLBFS
selftests/vm: fix map_hugetlb length used for testing read and write
mm/hugetlb: remove unnecessary memory fetch in PageHeadHuge()
mm/hugetlb.c: clean code by removing unnecessary initialization
hugetlb_cgroup: add hugetlb_cgroup reservation docs
hugetlb_cgroup: add hugetlb_cgroup reservation tests
hugetlb: support file_region coalescing again
hugetlb_cgroup: support noreserve mappings
hugetlb_cgroup: add accounting for shared mappings
hugetlb: disable region_add file_region coalescing
hugetlb_cgroup: add reservation accounting for private mappings
mm/hugetlb_cgroup: fix hugetlb_cgroup migration
hugetlb_cgroup: add interface for charge/uncharge hugetlb reservations
hugetlb_cgroup: add hugetlb_cgroup reservation counter
hugetlbfs: Use i_mmap_rwsem to address page fault/truncate race
hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization
mm/memblock.c: remove redundant assignment to variable max_addr
mm: mempolicy: require at least one nodeid for MPOL_PREFERRED
mm: mempolicy: use VM_BUG_ON_VMA in queue_pages_test_walk()
...
- Fix memory leak in hv probe path (Dexuan Cui)
- Add support for Hyper-V protocol 1.3 (Long Li)
- Replace zero-length array with flexible-array member (Gustavo A. R.
Silva)
- Move hypercall definitions to <asm/hyperv-tlfs.h> (Boqun Feng)
- Move retarget definitions to <asm/hyperv-tlfs.h> and make them packed
(Boqun Feng)
- Add struct hv_msi_entry and hv_set_msi_entry_from_desc() to prepare for
future virtual PCI on non-x86 (Boqun Feng)
* remotes/lorenzo/pci/hv:
PCI: hv: Introduce hv_msi_entry
PCI: hv: Move retarget related structures into tlfs header
PCI: hv: Move hypercall related definitions into tlfs header
PCI: hv: Replace zero-length array with flexible-array member
PCI: hv: Add support for protocol 1.3 and support PCI_BUS_RELATIONS2
PCI: hv: Decouple the func definition in hv_dr_state from VSP message
PCI: hv: Add missing kfree(hbus) in hv_pci_probe()'s error handling path
PCI: hv: Remove unnecessary type casting from kzalloc
The commit 842f4be958 ("KVM: VMX: Add a trampoline to fix VMREAD error
handling") removed the declaration of vmread_error() causes a W=1 build
failure with KVM_WERROR=y. Fix it by adding it back.
arch/x86/kvm/vmx/vmx.c:359:17: error: no previous prototype for 'vmread_error' [-Werror=missing-prototypes]
asmlinkage void vmread_error(unsigned long field, bool fault)
^~~~~~~~~~~~
Signed-off-by: Qian Cai <cai@lca.pw>
Message-Id: <20200402153955.1695-1-cai@lca.pw>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Pull exec/proc updates from Eric Biederman:
"This contains two significant pieces of work: the work to sort out
proc_flush_task, and the work to solve a deadlock between strace and
exec.
Fixing proc_flush_task so that it no longer requires a persistent
mount makes improvements to proc possible. The removal of the
persistent mount solves an old regression that that caused the hidepid
mount option to only work on remount not on mount. The regression was
found and reported by the Android folks. This further allows Alexey
Gladkov's work making proc mount options specific to an individual
mount of proc to move forward.
The work on exec starts solving a long standing issue with exec that
it takes mutexes of blocking userspace applications, which makes exec
extremely deadlock prone. For the moment this adds a second mutex with
a narrower scope that handles all of the easy cases. Which makes the
tricky cases easy to spot. With a little luck the code to solve those
deadlocks will be ready by next merge window"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (25 commits)
signal: Extend exec_id to 64bits
pidfd: Use new infrastructure to fix deadlocks in execve
perf: Use new infrastructure to fix deadlocks in execve
proc: io_accounting: Use new infrastructure to fix deadlocks in execve
proc: Use new infrastructure to fix deadlocks in execve
kernel/kcmp.c: Use new infrastructure to fix deadlocks in execve
kernel: doc: remove outdated comment cred.c
mm: docs: Fix a comment in process_vm_rw_core
selftests/ptrace: add test cases for dead-locks
exec: Fix a deadlock in strace
exec: Add exec_update_mutex to replace cred_guard_mutex
exec: Move exec_mmap right after de_thread in flush_old_exec
exec: Move cleanup of posix timers on exec out of de_thread
exec: Factor unshare_sighand out of de_thread and call it separately
exec: Only compute current once in flush_old_exec
pid: Improve the comment about waiting in zap_pid_ns_processes
proc: Remove the now unnecessary internal mount of proc
uml: Create a private mount of proc for mconsole
uml: Don't consult current to find the proc_mnt in mconsole_proc
proc: Use a list of inodes to flush from proc
...
The idea comes from a discussion between Linus and Andrea [1].
Before this patch we only allow a page fault to retry once. We achieved
this by clearing the FAULT_FLAG_ALLOW_RETRY flag when doing
handle_mm_fault() the second time. This was majorly used to avoid
unexpected starvation of the system by looping over forever to handle the
page fault on a single page. However that should hardly happen, and after
all for each code path to return a VM_FAULT_RETRY we'll first wait for a
condition (during which time we should possibly yield the cpu) to happen
before VM_FAULT_RETRY is really returned.
This patch removes the restriction by keeping the FAULT_FLAG_ALLOW_RETRY
flag when we receive VM_FAULT_RETRY. It means that the page fault handler
now can retry the page fault for multiple times if necessary without the
need to generate another page fault event. Meanwhile we still keep the
FAULT_FLAG_TRIED flag so page fault handler can still identify whether a
page fault is the first attempt or not.
Then we'll have these combinations of fault flags (only considering
ALLOW_RETRY flag and TRIED flag):
- ALLOW_RETRY and !TRIED: this means the page fault allows to
retry, and this is the first try
- ALLOW_RETRY and TRIED: this means the page fault allows to
retry, and this is not the first try
- !ALLOW_RETRY and !TRIED: this means the page fault does not allow
to retry at all
- !ALLOW_RETRY and TRIED: this is forbidden and should never be used
In existing code we have multiple places that has taken special care of
the first condition above by checking against (fault_flags &
FAULT_FLAG_ALLOW_RETRY). This patch introduces a simple helper to detect
the first retry of a page fault by checking against both (fault_flags &
FAULT_FLAG_ALLOW_RETRY) and !(fault_flag & FAULT_FLAG_TRIED) because now
even the 2nd try will have the ALLOW_RETRY set, then use that helper in
all existing special paths. One example is in __lock_page_or_retry(), now
we'll drop the mmap_sem only in the first attempt of page fault and we'll
keep it in follow up retries, so old locking behavior will be retained.
This will be a nice enhancement for current code [2] at the same time a
supporting material for the future userfaultfd-writeprotect work, since in
that work there will always be an explicit userfault writeprotect retry
for protected pages, and if that cannot resolve the page fault (e.g., when
userfaultfd-writeprotect is used in conjunction with swapped pages) then
we'll possibly need a 3rd retry of the page fault. It might also benefit
other potential users who will have similar requirement like userfault
write-protection.
GUP code is not touched yet and will be covered in follow up patch.
Please read the thread below for more information.
[1] https://lore.kernel.org/lkml/20171102193644.GB22686@redhat.com/
[2] https://lore.kernel.org/lkml/20181230154648.GB9832@redhat.com/
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Suggested-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Brian Geffon <bgeffon@google.com>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Martin Cracauer <cracauer@cons.org>
Cc: Marty McFadden <mcfadden8@llnl.gov>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Maya Gokhale <gokhale2@llnl.gov>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Link: http://lkml.kernel.org/r/20200220160246.9790-1-peterx@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change a header to mandatory-y if both of the following are met:
[1] At least one architecture (except um) specifies it as generic-y in
arch/*/include/asm/Kbuild
[2] Every architecture (except um) either has its own implementation
(arch/*/include/asm/*.h) or specifies it as generic-y in
arch/*/include/asm/Kbuild
This commit was generated by the following shell script.
----------------------------------->8-----------------------------------
arches=$(cd arch; ls -1 | sed -e '/Kconfig/d' -e '/um/d')
tmpfile=$(mktemp)
grep "^mandatory-y +=" include/asm-generic/Kbuild > $tmpfile
find arch -path 'arch/*/include/asm/Kbuild' |
xargs sed -n 's/^generic-y += \(.*\)/\1/p' | sort -u |
while read header
do
mandatory=yes
for arch in $arches
do
if ! grep -q "generic-y += $header" arch/$arch/include/asm/Kbuild &&
! [ -f arch/$arch/include/asm/$header ]; then
mandatory=no
break
fi
done
if [ "$mandatory" = yes ]; then
echo "mandatory-y += $header" >> $tmpfile
for arch in $arches
do
sed -i "/generic-y += $header/d" arch/$arch/include/asm/Kbuild
done
fi
done
sed -i '/^mandatory-y +=/d' include/asm-generic/Kbuild
LANG=C sort $tmpfile >> include/asm-generic/Kbuild
----------------------------------->8-----------------------------------
One obvious benefit is the diff stat:
25 files changed, 52 insertions(+), 557 deletions(-)
It is tedious to list generic-y for each arch that needs it.
So, mandatory-y works like a fallback default (by just wrapping
asm-generic one) when arch does not have a specific header
implementation.
See the following commits:
def3f7cefea1b39bae16
It is tedious to convert headers one by one, so I processed by a shell
script.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Simek <michal.simek@xilinx.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Link: http://lkml.kernel.org/r/20200210175452.5030-1-masahiroy@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Merge the 32bit and 64bit version.
Halve the check constants on 32bit.
Use STACK_TOP since it is defined.
Passing is_64 is now redundant since is_32bit_task() is used to
determine which callchain variant should be used. Use STACK_TOP and
is_32bit_task() directly.
This removes a page from the valid 32bit area on 64bit:
#define TASK_SIZE_USER32 (0x0000000100000000UL - (1 * PAGE_SIZE))
#define STACK_TOP_USER32 TASK_SIZE_USER32
Change return value to bool. It is inverted by users anyway.
Change to invalid_user_sp to avoid inverting the return value twice.
Signed-off-by: Michal Suchanek <msuchanek@suse.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/be8e40fc0737fb28ad08b198552dee7cac1c5ce2.1584699455.git.msuchanek@suse.de