Most of the capabilities are common to both arm and arm64, but
we still need to handle the exceptions.
Introduce kvm_arch_dev_ioctl_check_extension, which both architectures
implement (in the 32bit case, it just returns 0).
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <cdall@cs.columbia.edu>
Now that we have the necessary infrastructure to boot a hotplugged CPU
at any point in time, wire a CPU notifier that will perform the HYP
init for the incoming CPU.
Note that this depends on the platform code and/or firmware to boot the
incoming CPU with HYP mode enabled and return to the kernel by following
the normal boot path (HYP stub installed).
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <cdall@cs.columbia.edu>
Our HYP init code suffers from two major design issues:
- it cannot support CPU hotplug, as we tear down the idmap very early
- it cannot perform a TLB invalidation when switching from init to
runtime mappings, as pages are manipulated from PL1 exclusively
The hotplug problem mandates that we keep two sets of page tables
(boot and runtime). The TLB problem mandates that we're able to
transition from one PGD to another while in HYP, invalidating the TLBs
in the process.
To be able to do this, we need to share a page between the two page
tables. A page that will have the same VA in both configurations. All we
need is a VA that has the following properties:
- This VA can't be used to represent a kernel mapping.
- This VA will not conflict with the physical address of the kernel text
The vectors page seems to satisfy this requirement:
- The kernel never maps anything else there
- The kernel text being copied at the beginning of the physical memory,
it is unlikely to use the last 64kB (I doubt we'll ever support KVM
on a system with something like 4MB of RAM, but patches are very
welcome).
Let's call this VA the trampoline VA.
Now, we map our init page at 3 locations:
- idmap in the boot pgd
- trampoline VA in the boot pgd
- trampoline VA in the runtime pgd
The init scenario is now the following:
- We jump in HYP with four parameters: boot HYP pgd, runtime HYP pgd,
runtime stack, runtime vectors
- Enable the MMU with the boot pgd
- Jump to a target into the trampoline page (remember, this is the same
physical page!)
- Now switch to the runtime pgd (same VA, and still the same physical
page!)
- Invalidate TLBs
- Set stack and vectors
- Profit! (or eret, if you only care about the code).
Note that we keep the boot mapping permanently (it is not strictly an
idmap anymore) to allow for CPU hotplug in later patches.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <cdall@cs.columbia.edu>
There is no point in freeing HYP page tables differently from Stage-2.
They now have the same requirements, and should be dealt with the same way.
Promote unmap_stage2_range to be The One True Way, and get rid of a number
of nasty bugs in the process (good thing we never actually called free_hyp_pmds
before...).
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <cdall@cs.columbia.edu>
After the HYP page table rework, it is pretty easy to let the KVM
code provide its own idmap, rather than expecting the kernel to
provide it. It takes actually less code to do so.
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <cdall@cs.columbia.edu>
In order to be able to correctly profile what is happening on the
host, we need to be able to identify when we're running on the guest,
and log these events differently.
Perf offers a simple way to register callbacks into KVM. Mimic what
x86 does and enjoy being able to profile your KVM host.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <cdall@cs.columbia.edu>
Merging in fixes since there's a conflict in the omap4 clock tables caused by
it.
* fixes: (245 commits)
ARM: highbank: fix cache flush ordering for cpu hotplug
ARM: OMAP4: hwmod data: make 'ocp2scp_usb_phy_phy_48m" as the main clock
arm: mvebu: Fix the irq map function in SMP mode
Fix GE0/GE1 init on ix2-200 as GE0 has no PHY
ARM: S3C24XX: Fix interrupt pending register offset of the EINT controller
ARM: S3C24XX: Correct NR_IRQS definition for s3c2440
ARM i.MX6: Fix ldb_di clock selection
ARM: imx: provide twd clock lookup from device tree
ARM: imx35 Bugfix admux clock
ARM: clk-imx35: Bugfix iomux clock
+ Linux 3.9-rc6
Signed-off-by: Olof Johansson <olof@lixom.net>
Conflicts:
arch/arm/mach-omap2/cclock44xx_data.c
ARM processors with LPAE enabled use 3 levels of page tables, with an
entry in the top level (pgd) covering 1GB of virtual space. Because of
the branch relocation limitations on ARM, the loadable modules are
mapped 16MB below PAGE_OFFSET, making the corresponding 1GB pgd shared
between kernel modules and user space.
If free_pgtables() is called with the default ceiling 0,
free_pgd_range() (and subsequently called functions) also frees the page
table shared between user space and kernel modules (which is normally
handled by the ARM-specific pgd_free() function). This patch changes
defines the ARM USER_PGTABLES_CEILING to TASK_SIZE when CONFIG_ARM_LPAE
is enabled.
Note that the pgd_free() function already checks the presence of the
shared pmd page allocated by pgd_alloc() and frees it, though with
ceiling 0 this wasn't necessary.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: <stable@vger.kernel.org> # 3.3+
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
This provides helper methods to coordinate between CPUs coming down
and CPUs going up, as well as documentation on the used algorithms,
so that cluster teardown and setup
operations are not done for a cluster simultaneously.
For use in the power_down() implementation:
* __mcpm_cpu_going_down(unsigned int cluster, unsigned int cpu)
* __mcpm_outbound_enter_critical(unsigned int cluster)
* __mcpm_outbound_leave_critical(unsigned int cluster)
* __mcpm_cpu_down(unsigned int cluster, unsigned int cpu)
The power_up_setup() helper should do platform-specific setup in
preparation for turning the CPU on, such as invalidating local caches
or entering coherency. It must be assembler for now, since it must
run before the MMU can be switched on. It is passed the affinity level
for which initialization should be performed.
Because the mcpm_sync_struct content is looked-up and modified
with the cache enabled or disabled depending on the code path, it is
crucial to always ensure proper cache maintenance to update main memory
right away. The sync_cache_*() helpers are used to that end.
Also, in order to prevent a cached writer from interfering with an
adjacent non-cached writer, we ensure each state variable is located to
a separate cache line.
Thanks to Nicolas Pitre and Achin Gupta for the help with this
patch.
Signed-off-by: Dave Martin <dave.martin@linaro.org>
Signed-off-by: Nicolas Pitre <nico@linaro.org>
Reviewed-by: Will Deacon <will.deacon@arm.com>
This is the basic API used to handle the powering up/down of individual
CPUs in a (multi-)cluster system. The platform specific backend
implementation has the responsibility to also handle the cluster level
power as well when the first/last CPU in a cluster is brought up/down.
Signed-off-by: Nicolas Pitre <nico@linaro.org>
Reviewed-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Reviewed-by: Will Deacon <will.deacon@arm.com>
CPUs in cluster based systems, such as big.LITTLE, have special needs
when entering the kernel due to a hotplug event, or when resuming from
a deep sleep mode.
This is vectorized so multiple CPUs can enter the kernel in parallel
without serialization.
The mcpm prefix stands for "multi cluster power management", however
this is usable on single cluster systems as well. Only the basic
structure is introduced here. This will be extended with later patches.
In order not to complexify things more than they currently have to,
the planned work to make runtime adjusted MPIDR based indexing and
dynamic memory allocation for cluster states is postponed to a later
cycle. The MAX_NR_CLUSTERS and MAX_CPUS_PER_CLUSTER static definitions
should be sufficient for those systems expected to be available in the
near future.
Signed-off-by: Nicolas Pitre <nico@linaro.org>
Reviewed-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Algorithms used by the MCPM layer rely on state variables which are
accessed while the cache is either active or inactive, depending
on the code path and the active state.
This patch introduces generic cache maintenance helpers to provide the
necessary cache synchronization for such state variables to always hit
main memory in an ordered way.
Signed-off-by: Nicolas Pitre <nico@linaro.org>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Acked-by: Dave Martin <dave.martin@linaro.org>
Pull ARM fixes from Russell King:
"A set of fixes from various people - Will Deacon gets a prize for
removing code this time around. The biggest fix in this lot is
sorting out the ARM740T mess. The rest are relatively small fixes."
* 'fixes' of git://git.linaro.org/people/rmk/linux-arm:
ARM: 7699/1: sched_clock: Add more notrace to prevent recursion
ARM: 7698/1: perf: fix group validation when using enable_on_exec
ARM: 7697/1: hw_breakpoint: do not use __cpuinitdata for dbg_cpu_pm_nb
ARM: 7696/1: Fix kexec by setting outer_cache.inv_all for Feroceon
ARM: 7694/1: ARM, TCM: initialize TCM in paging_init(), instead of setup_arch()
ARM: 7692/1: iop3xx: move IOP3XX_PERIPHERAL_VIRT_BASE
ARM: modules: don't export cpu_set_pte_ext when !MMU
ARM: mm: remove broken condition check for v4 flushing
ARM: mm: fix numerous hideous errors in proc-arm740.S
ARM: cache: remove ARMv3 support code
ARM: tlbflush: remove ARMv3 support
This patch adds the base support for the ARMv7-M
architecture. It consists of the corresponding arch/arm/mm/ files and
various #ifdef's around the kernel. Exception handling is implemented by
a subsequent patch.
[ukleinek: squash in some changes originating from commit
b5717ba (Cortex-M3: Add support for the Microcontroller Prototyping System)
from the v2.6.33-arm1 patch stack, port to post 3.6, drop zImage
support, drop reorganisation of pt_regs, assert CONFIG_CPU_V7M doesn't
leak into installed headers and a few cosmetic changes]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Jonathan Austin <jonathan.austin@arm.com>
Tested-by: Jonathan Austin <jonathan.austin@arm.com>
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
This is the 2nd part of ARM timer clean-ups for 3.10. This series has
the following changes:
- Add sched_clock selection logic to select the highest frequency clock
- Use full 64-bit arch timer counter for sched_clock
- Convert arch timer, sp804 and integrator-cp timers to CLKSRC_OF and
adapt all users to use clocksource_of_init
* tag 'clksrc-cleanup-for-3.10-part2' of git://sources.calxeda.com/kernel/linux:
devtree: add binding documentation for sp804
ARM: integrator-cp: convert use CLKSRC_OF for timer init
ARM: versatile: use OF init for sp804 timer
ARM: versatile: add versatile dtbs to dtbs target
ARM: vexpress: remove extra timer-sp control register clearing
ARM: dts: vexpress: disable CA9 core tile sp804 timer
ARM: vexpress: remove sp804 OF init
ARM: highbank: use OF init for sp804 timer
ARM: timer-sp: convert to use CLKSRC_OF init
OF: add empty of_device_is_available for !OF
ARM: convert arm/arm64 arch timer to use CLKSRC_OF init
ARM: make machine_desc->init_time default to clocksource_of_init
ARM: arch_timer: use full 64-bit counter for sched_clock
ARM: make sched_clock just call a function pointer
ARM: sched_clock: allow changing to higher frequency counter
Signed-off-by: Olof Johansson <olof@lixom.net>
This has a nasty set of conflicts with the exynos MCT code, which was
moved in a separate branch, and then fixed up when merged in, but still
conflicts a bit here. It should have been sorted out by this merge though.
Looks like our L_PTE_S2_RDWR definition is slightly wrong,
and is actually write only (see ARM ARM Table B3-9, Stage 2 control
of access permissions). Didn't make a difference for normal pages,
as we OR the flags together, but I'm still wondering how it worked
for Stage-2 mapped devices, such as the GIC.
Brown paper bag time, again.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <cdall@cs.columbia.edu>
This adds CLKSRC_OF based init for sp804 timer. The clock initialization is
refactored to support retrieving the clock(s) from the DT.
Signed-off-by: Rob Herring <rob.herring@calxeda.com>
This converts sched_clock to simply a call to a function pointer in order
to allow overriding it. This will allow for use with 64-bit counters where
overflow handling is not needed.
Signed-off-by: Rob Herring <rob.herring@calxeda.com>
Some boards are running with secure firmware running in TrustZone secure
world, which changes the way some things have to be initialized.
This patch adds an interface for platforms to specify available firmware
operations and call them.
A wrapper macro, call_firmware_op(), checks if the operation is provided
and calls it if so, otherwise returns -ENOSYS to allow fallback to
legacy operation..
By default no operations are provided.
Example of use:
In code using firmware ops:
__raw_writel(virt_to_phys(exynos4_secondary_startup),
CPU1_BOOT_REG);
/* Call Exynos specific smc call */
if (call_firmware_op(cpu_boot, cpu) == -ENOSYS)
cpu_boot_legacy(...); /* Try legacy way */
gic_raise_softirq(cpumask_of(cpu), 1);
In board-/platform-specific code:
static int platformX_do_idle(void)
{
/* tell platformX firmware to enter idle */
return 0;
}
static int platformX_cpu_boot(int i)
{
/* tell platformX firmware to boot CPU i */
return 0;
}
static const struct firmware_ops platformX_firmware_ops = {
.do_idle = exynos_do_idle,
.cpu_boot = exynos_cpu_boot,
/* other operations not available on platformX */
};
static void __init board_init_early(void)
{
register_firmware_ops(&platformX_firmware_ops);
}
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Tomasz Figa <t.figa@samsung.com>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
From Michal Simek <michal.simek@xilinx.com>:
* 'zynq/clksrc/cleanup' of git://git.xilinx.com/linux-xlnx:
arm: zynq: Move timer to generic location
arm: zynq: Do not use xilinx specific function names
arm: zynq: Move timer to clocksource interface
arm: zynq: Use standard timer binding
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Many ARMv7 cores have hardware page table walkers that can read the L1
cache. This is discoverable from the ID_MMFR3 register, although this
can be expensive to access from the low-level set_pte functions and is a
pain to cache, particularly with multi-cluster systems.
A useful observation is that the multi-processing extensions for ARMv7
require coherent table walks, meaning that we can make use of ALT_SMP
patching in proc-v7-* to patch away the cache flush safely for these
cores.
Reported-by: Albin Tonnerre <Albin.Tonnerre@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
commit 91d1aa43 (context_tracking: New context tracking susbsystem)
generalized parts of the RCU userspace extended quiescent state into
the context tracking subsystem. Context tracking is then used
to implement adaptive tickless (a.k.a extended nohz)
To support the new context tracking subsystem on ARM, the user/kernel
boundary transtions need to be instrumented.
For exceptions and IRQs in usermode, the existing usr_entry macro is
used to instrument the user->kernel transition. For the return to
usermode path, the ret_to_user* path is instrumented. Using the
usr_entry macro, this covers interrupts in userspace, data abort and
prefetch abort exceptions in userspace as well as undefined exceptions
in userspace (which is where FP emulation and VFP are handled.)
For syscalls, the slow return path is covered by instrumenting the
ret_to_user path. In addition, the syscall entry point is
instrumented which covers the user->kernel transition for both fast
and slow syscalls, and an additional instrumentation point is added
for the fast syscall return path (ret_fast_syscall).
Cc: Mats Liljegren <mats.liljegren@enea.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Kevin Hilman <khilman@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
To ease page table updates with 64-bit descriptors, CPUs implementing
LPAE are required to implement ldrd/strd as atomic operations.
This patch uses these accessors instead of the exclusive variants when
performing atomic64_{read,set} on LPAE systems.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
The PCI specifications says that an I/O region must be aligned on a 4
KB boundary, and a memory region aligned on a 1 MB boundary.
However, the Marvell PCIe interfaces rely on address decoding windows
(which allow to associate a range of physical addresses with a given
device). For PCIe memory windows, those windows are defined with a 1
MB granularity (which matches the PCI specs), but PCIe I/O windows can
only be defined with a 64 KB granularity, so they have to be 64 KB
aligned. We therefore need to tell the PCI core about this special
alignement requirement.
The PCI core already calls pcibios_align_resource() in the ARM PCI
core, specifically for such purposes. So this patch extends the ARM
PCI core so that it calls a ->align_resource() hook registered by the
PCI driver, exactly like the existing ->map_irq() and ->swizzle()
hooks.
A particular PCI driver can register a align_resource() hook, and do
its own specific alignement, depending on the specific constraints of
the underlying hardware.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Commit 70264367a2 ("ARM: 7653/2: do not scale loops_per_jiffy when
using a constant delay clock") fixed a problem with our timer-based
delay loop, where loops_per_jiffy is scaled by cpufreq yet used directly
by the timer delay ops.
This patch fixes the problem in a more elegant way by keeping a private
ticks_per_jiffy field in the delay ops, independent of loops_per_jiffy
and therefore not subject to scaling. The loop-based delay continues to
use loops_per_jiffy directly, as it should.
Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
On Cortex-A15 (r0p0..r3p2) the TLBI/DSB are not adequately shooting down
all use of the old entries. This patch implements the erratum workaround
which consists of:
1. Dummy TLBIMVAIS and DSB on the CPU doing the TLBI operation.
2. Send IPI to the CPUs that are running the same mm (and ASID) as the
one being invalidated (or all the online CPUs for global pages).
3. CPU receiving the IPI executes a DMB and CLREX (part of the exception
return code already).
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
* 'gic' of git://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/linux-aarch64:
irqchip: gic: Perform the gic_secondary_init() call via CPU notifier
irqchip: gic: Call handle_bad_irq() directly
arm: Move chained_irq_(enter|exit) to a generic file
arm: Move the set_handle_irq and handle_arch_irq declarations to asm/irq.h
+ Linux 3.9-rc3
Signed-off-by: Olof Johansson <olof@lixom.net>
These functions have been introduced by commit 10a8c383 (irq: introduce
entry and exit functions for chained handlers) in asm/mach/irq.h. This
patch moves them to linux/irqchip/chained_irq.h so that generic irqchip
drivers do not rely on architecture specific header files.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Marc Zyngier <marc.zyngier@arm.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Rob Herring <rob.herring@calxeda.com>
This is only used by 740t, which is a v4 core and (by my reading of the
datasheet for the CPU) ignores CRm for the cp15 cache flush operation,
making the v4 cache implementation in cache-v4.S sufficient for this
CPU.
Tested with 740T core-tile on Integrator/AP baseboard.
Acked-by: Hyok S. Choi <hyok.choi@samsung.com>
Acked-by: Greg Ungerer <gerg@uclinux.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Pull Xen fixes from Konrad Rzeszutek Wilk:
- Compile warnings and errors (one on x86, two on ARM)
- WARNING in xen-pciback
- Use the acpi_processor_get_performance_info instead of the 'register'
version
* tag 'stable/for-linus-3.9-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
xen/acpi: remove redundant acpi/acpi_drivers.h include
xen: arm: mandate EABI and use generic atomic operations.
acpi: Export the acpi_processor_get_performance_info
xen/pciback: Don't disable a PCI device that is already disabled.
Rob Herring has observed that c81611c4e9 "xen: event channel arrays are
xen_ulong_t and not unsigned long" introduced a compile failure when building
without CONFIG_AEABI:
/tmp/ccJaIZOW.s: Assembler messages:
/tmp/ccJaIZOW.s:831: Error: even register required -- `ldrexd r5,r6,[r4]'
Will Deacon pointed out that this is because OABI does not require even base
registers for 64-bit values. We can avoid this by simply using the existing
atomic64_xchg operation and the same containerof trick as used by the cmpxchg
macros. However since this code is used on memory which is shared with the
hypervisor we require proper atomic instructions and cannot use the generic
atomic64 callbacks (which are based on spinlocks), therefore add a dependency
on !GENERIC_ATOMIC64. Since we already depend on !CPU_V6 there isn't much
downside to this.
While thinking about this we also observed that OABI has different struct
alignment requirements to EABI, which is a problem for hypercall argument
structs which are shared with the hypervisor and which must be in EABI layout.
Since I don't expect people to want to run OABI kernels on Xen depend on
CONFIG_AEABI explicitly too (although it also happens to be enforced by the
!GENERIC_ATOMIC64 requirement too).
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Rob Herring <robherring2@gmail.com>
Acked-by: Stefano Stabellini <Stefano.Stabellini@eu.citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
v8 is capable of invalidating Stage-2 by IPA, but v7 is not.
Change kvm_tlb_flush_vmid() to take an IPA parameter, which is
then ignored by the invalidation code (and nuke the whole TLB
as it always did).
This allows v8 to implement a more optimized strategy.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
arm64 cannot represent the kernel VAs in HYP mode, because of the lack
of TTBR1 at EL2. A way to cope with this situation is to have HYP VAs
to be an offset from the kernel VAs.
Introduce macros to convert a kernel VA to a HYP VA, make the HYP
mapping functions use these conversion macros. Also change the
documentation to reflect the existence of the offset.
On ARM, where we can have an identity mapping between kernel and HYP,
the macros are without any effect.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
In order to keep the VFP allocation code common, use an abstract type
for the VFP containers. Maps onto struct vfp_hard_struct on ARM.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Make the split of the pgd_ptr an implementation specific thing
by moving the init call to an inline function.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>