The nmi parameter indicated if we could do wakeups from the current
context, if not, we would set some state and self-IPI and let the
resulting interrupt do the wakeup.
For the various event classes:
- hardware: nmi=0; PMI is in fact an NMI or we run irq_work_run from
the PMI-tail (ARM etc.)
- tracepoint: nmi=0; since tracepoint could be from NMI context.
- software: nmi=[0,1]; some, like the schedule thing cannot
perform wakeups, and hence need 0.
As one can see, there is very little nmi=1 usage, and the down-side of
not using it is that on some platforms some software events can have a
jiffy delay in wakeup (when arch_irq_work_raise isn't implemented).
The up-side however is that we can remove the nmi parameter and save a
bunch of conditionals in fast paths.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Michael Cree <mcree@orcon.net.nz>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Deng-Cheng Zhu <dengcheng.zhu@gmail.com>
Cc: Anton Blanchard <anton@samba.org>
Cc: Eric B Munson <emunson@mgebm.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: Don Zickus <dzickus@redhat.com>
Link: http://lkml.kernel.org/n/tip-agjev8eu666tvknpb3iaj0fg@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch adds support for the new "jump label" feature.
Unlike x86 and sparc we just merrily patch the code with no locks etc,
as far as I know this is safe, but I'm not really sure what the x86/sparc
code is protecting against so maybe it's not.
I also don't see any reason for us to implement the poke_early() routine,
even though sparc does.
[BenH: Updated the patch to upstream generic changes]
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
A mix of think & mismerge on my side caused a problem where both the
new hvsi_lib and the old hvsi driver gets compiled and try to define
symbols with the same name.
This fixes it by renaming the hvsi_lib exported symbols.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
a9c0f41b3a breaks the build
on some platforms. The extern declaration must be shielded
against assembly.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
On pseries machines, consoles are provided by the hypervisor using
a low level get_chars/put_chars type interface. However, this is
really just a transport to the service processor which implements
them either as "raw" console (networked consoles, HMC, ...) or as
"hvsi" serial ports.
The later is a simple packet protocol on top of the raw character
interface that is supposed to convey additional "serial port" style
semantics. In practice however, all it does is provide a way to
read the CD line and set/clear our DTR line, that's it.
We currently implement the "raw" protocol as an hvc console backend
(/dev/hvcN) and the "hvsi" protocol using a separate tty driver
(/dev/hvsi0).
However this is quite impractical. The arbitrary difference between
the two type of devices has been a major source of user (and distro)
confusion. Additionally, there's an additional mini -hvsi implementation
in the pseries platform code for our low level debug console and early
boot kernel messages, which means code duplication, though that low
level variant is impractical as it's incapable of doing the initial
protocol negociation to establish the link to the FSP.
This essentially replaces the dedicated hvsi driver and the platform
udbg code completely by extending the existing hvc_vio backend used
in "raw" mode so that:
- It now supports HVSI as well
- We add support for hvc backend providing tiocm{get,set}
- It also provides a udbg interface for early debug and boot console
This is overall less code, though this will only be obvious once we
remove the old "hvsi" driver, which is still available for now. When
the old driver is enabled, the new code still kicks in for the low
level udbg console, replacing the old mini implementation in the platform
code, it just doesn't provide the higher level "hvc" interface.
In addition to producing generally simler code, this has several benefits
over our current situation:
- The user/distro only has to deal with /dev/hvcN for the hypervisor
console, avoiding all sort of confusion that has plagued us in the past
- The tty, kernel and low level debug console all use the same code
base which supports the full protocol establishment process, thus the
console is now available much earlier than it used to be with the
old HVSI driver. The kernel console works much earlier and udbg is
available much earlier too. Hackers can enable a hard coded very-early
debug console as well that works with HVSI (previously that was only
supported for the "raw" mode).
I've tried to keep the same semantics as hvsi relative to how I react
to things like CD changes, with some subtle differences though:
- I clear DTR on close if HUPCL is set
- Current hvsi triggers a hangup if it detects a up->down transition
on CD (you can still open a console with CD down). My new implementation
triggers a hangup if the link to the FSP is severed, and severs it upon
detecting a up->down transition on CD.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Embed the struct hvsi_header in the various packet definitions
rather than open coding it multiple times. Will help provide
stronger type checking.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This moves various HVSI protocol definitions from the hvsi.c
driver to a header file that can be used later on by a udbg
implementation
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
On MMUs such as FSL where we can guarantee the entire linear mapping is
bolted, we don't need to worry about linear TLB misses. If on top of
that we do a full table walk, we get rid of all recursive TLB faults, and
can dispense with some state saving. This gains a few percent on
TLB-miss-heavy workloads, and around 50% on a benchmark that had a high
rate of virtual page table faults under the normal handler.
While touching the EX_TLB layout, remove EX_TLB_MMUCR0, EX_TLB_SRR0, and
EX_TLB_SRR1 as they're not used.
[BenH: Fixed build with 64K pages (wsp config)]
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
commit 21a3c96 uses node_start/end_pfn(nid) for detection start/end
of nodes. But, it's not defined in linux/mmzone.h but defined in
/arch/???/include/mmzone.h which is included only under
CONFIG_NEED_MULTIPLE_NODES=y.
Then, we see
mm/page_cgroup.c: In function 'page_cgroup_init':
mm/page_cgroup.c:308: error: implicit declaration of function 'node_start_pfn'
mm/page_cgroup.c:309: error: implicit declaration of function 'node_end_pfn'
So, fixiing page_cgroup.c is an idea...
But node_start_pfn()/node_end_pfn() is a very generic macro and
should be implemented in the same manner for all archs.
(m32r has different implementation...)
This patch removes definitions of node_start/end_pfn() in each archs
and defines a unified one in linux/mmzone.h. It's not under
CONFIG_NEED_MULTIPLE_NODES, now.
A result of macro expansion is here (mm/page_cgroup.c)
for !NUMA
start_pfn = ((&contig_page_data)->node_start_pfn);
end_pfn = ({ pg_data_t *__pgdat = (&contig_page_data); __pgdat->node_start_pfn + __pgdat->node_spanned_pages;});
for NUMA (x86-64)
start_pfn = ((node_data[nid])->node_start_pfn);
end_pfn = ({ pg_data_t *__pgdat = (node_data[nid]); __pgdat->node_start_pfn + __pgdat->node_spanned_pages;});
Changelog:
- fixed to avoid using "nid" twice in node_end_pfn() macro.
Reported-and-acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Reported-and-tested-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The Freescale ePAPR reference hypervisor provides interrupt controller
services via a hypercall interface, instead of emulating the MPIC
controller. This is called the VMPIC.
The ePAPR "virtual interrupt controller" provides interrupt controller
services for external interrupts. External interrupts received by a
partition can come from two sources:
- Hardware interrupts - hardware interrupts come from external
interrupt lines or on-chip I/O devices.
- Virtual interrupts - virtual interrupts are generated by the hypervisor
as part of some hypervisor service or hypervisor-created virtual device.
Both types of interrupts are processed using the same programming model and
same set of hypercalls.
Signed-off-by: Ashish Kalra <ashish.kalra@freescale.com>
Signed-off-by: Timur Tabi <timur@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
ePAPR hypervisors provide operating system services via a "hypercall"
interface. The following steps need to be performed to make an hcall:
1. Load r11 with the hcall number
2. Load specific other registers with parameters
3. Issue instrucion "sc 1"
4. The return code is in r3
5. Other returned parameters are in other registers.
To provide this service to the kernel, these steps are wrapped in inline
assembly functions. Standard ePAPR hcalls are in epapr_hcalls.h, and
Freescale extensions are in fsl_hcalls.h.
Signed-off-by: Timur Tabi <timur@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Move irq_choose_cpu() into arch/powerpc/kernel/irq.c so that it can be used
by other PIC drivers. The function is not MPIC-specific.
Signed-off-by: Stuart Yoder <stuart.yoder@freescale.com>
Signed-off-by: Timur Tabi <timur@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
We expect this is actually faster, and we end up needing more space than we
can get from the SPRGs in some instances. This is also useful when running
as a guest OS - SPRGs4-7 do not have guest versions.
8 slots are allocated in thread_info for this even though we only actually
use 4 of them - this allows space for future code to have more scratch
space (and we know we'll need it for things like hugetlb).
Signed-off-by: Ashish Kalra <Ashish.Kalra@freescale.com>
Signed-off-by: Becky Bruce <beckyb@kernel.crashing.org>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
doorbell type is defined as bits 32:36 so should be shifted by 63-36 =
27 rather than 28.
We never noticed this bug as we've only every used type PPC_DBELL = 0.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
smp_release_cpus() waits for all cpus (including the bootcpu) due to an
off-by-one count on boot_cpu_count (which is all CPUs). This patch replaces
that with spinning_secondaries (which is all secondary CPUs).
Signed-off-by: Matt Evans <matt@ozlabs.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Several fixes as well where the +1 was missing.
Done via coccinelle scripts like:
@@
struct resource *ptr;
@@
- ptr->end - ptr->start + 1
+ resource_size(ptr)
and some grep and typing.
Mostly uncompiled, no cross-compilers.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
All archs do more or less the same thing now, move it into
a single generic place.
I chose pci.h rather than of_pci.h to avoid having to change
all call-sites to include the later.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Michal Simek <monstr@monstr.eu>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
powerpc has two different ways of matching PCI devices to their
corresponding OF node (if any) for historical reasons. The ppc64 one
does a scan looking for matching bus/dev/fn, while the ppc32 one does a
scan looking only for matching dev/fn on each level in order to be
agnostic to busses being renumbered (which Linux does on some
platforms).
This removes both and instead moves the matching code to the PCI core
itself. It's the most logical place to do it: when a pci_dev is created,
we know the parent and thus can do a single level scan for the matching
device_node (if any).
The benefit is that all archs now get the matching for free. There's one
hook the arch might want to provide to match a PHB bus to its device
node. A default weak implementation is provided that looks for the
parent device device node, but it's not entirely reliable on powerpc for
various reasons so powerpc provides its own.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Michal Simek <monstr@monstr.eu>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
arch/powerpc/kernel/built-in.o: In function `machine_check_e500mc':
arch/powerpc/kernel/traps.c:429: undefined reference to `fsl_rio_mcheck_exception'
arch/powerpc/kernel/built-in.o: In function `machine_check_e500':
arch/powerpc/kernel/traps.c:519: undefined reference to `fsl_rio_mcheck_exception'
make: *** [.tmp_vmlinux1] Error 1
Reported-by: Timur Tabi <timur@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
32bit and 64bit on x86 are tested and working. The rest I have looked
at closely and I can't find any problems.
setns is an easy system call to wire up. It just takes two ints so I
don't expect any weird architecture porting problems.
While doing this I have noticed that we have some architectures that are
very slow to get new system calls. cris seems to be the slowest where
the last system calls wired up were preadv and pwritev. avr32 is weird
in that recvmmsg was wired up but never declared in unistd.h. frv is
behind with perf_event_open being the last syscall wired up. On h8300
the last system call wired up was epoll_wait. On m32r the last system
call wired up was fallocate. mn10300 has recvmmsg as the last system
call wired up. The rest seem to at least have syncfs wired up which was
new in the 2.6.39.
v2: Most of the architecture support added by Daniel Lezcano <dlezcano@fr.ibm.com>
v3: ported to v2.6.36-rc4 by: Eric W. Biederman <ebiederm@xmission.com>
v4: Moved wiring up of the system call to another patch
v5: ported to v2.6.39-rc6
v6: rebased onto parisc-next and net-next to avoid syscall conflicts.
v7: ported to Linus's latest post 2.6.39 tree.
> arch/blackfin/include/asm/unistd.h | 3 ++-
> arch/blackfin/mach-common/entry.S | 1 +
Acked-by: Mike Frysinger <vapier@gentoo.org>
Oh - ia64 wiring looks good.
Acked-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6:
PM: Fix PM QOS's user mode interface to work with ASCII input
PM / Hibernate: Update kerneldoc comments in hibernate.c
PM / Hibernate: Remove arch_prepare_suspend()
PM / Hibernate: Update some comments in core hibernate code
The cell iic interrupt controller has enough software caused interrupts
to use a unique interrupt for each of the 4 messages powerpc uses.
This means each interrupt gets its own irq action/data combination.
Use the seperate, optimized, arch common ipi action functions
registered via the helper smp_request_message_ipi instead passing the
message as action data to a single action that then demultipexes to
the required acton via a switch statement.
smp_request_message_ipi will register the action as IRQF_PER_CPU
and IRQF_DISABLED, and WARN if the allocation fails for some reason,
so no need to print on that failure. It will return positive if
the message will not be used by the kernel, in which case we can
free the virq.
In addition to elimiating inefficient code, this also corrects the
error that a kernel built with kexec but without a debugger would
not register the ipi for kdump to notify the other cpus of a crash.
This also restores the debugger action to be static to kernel/smp.c.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
When page coalescing support was added recently, the MAX_HCALL_OPCODE
define was not updated for the newly added H_GET_MPP_X hcall.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This patch implements the raw syscall tracepoints on PowerPC and exports
them for ftrace syscalls to use.
To minimise reworking existing code, I slightly re-ordered the thread
info flags such that the new TIF_SYSCALL_TRACEPOINT bit would still fit
within the 16 bits of the andi. instruction's UI field. The instructions
in question are in /arch/powerpc/kernel/entry_{32,64}.S to and the
_TIF_SYSCALL_T_OR_A with the thread flags to see if system call tracing
is enabled.
In the case of 64bit PowerPC, arch_syscall_addr and
arch_syscall_match_sym_name are overridden to allow ftrace syscalls to
work given the unusual system call table structure and symbol names that
start with a period.
Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
All architectures supporting hibernation define
arch_prepare_suspend() as an empty function, so remove it.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (39 commits)
b43: fix comment typo reqest -> request
Haavard Skinnemoen has left Atmel
cris: typo in mach-fs Makefile
Kconfig: fix copy/paste-ism for dell-wmi-aio driver
doc: timers-howto: fix a typo ("unsgined")
perf: Only include annotate.h once in tools/perf/util/ui/browsers/annotate.c
md, raid5: Fix spelling error in comment ('Ofcourse' --> 'Of course').
treewide: fix a few typos in comments
regulator: change debug statement be consistent with the style of the rest
Revert "arm: mach-u300/gpio: Fix mem_region resource size miscalculations"
audit: acquire creds selectively to reduce atomic op overhead
rtlwifi: don't touch with treewide double semicolon removal
treewide: cleanup continuations and remove logging message whitespace
ath9k_hw: don't touch with treewide double semicolon removal
include/linux/leds-regulator.h: fix syntax in example code
tty: fix typo in descripton of tty_termios_encode_baud_rate
xtensa: remove obsolete BKL kernel option from defconfig
m68k: fix comment typo 'occcured'
arch:Kconfig.locks Remove unused config option.
treewide: remove extra semicolons
...
* 'kvm-updates/2.6.40' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (131 commits)
KVM: MMU: Use ptep_user for cmpxchg_gpte()
KVM: Fix kvm mmu_notifier initialization order
KVM: Add documentation for KVM_CAP_NR_VCPUS
KVM: make guest mode entry to be rcu quiescent state
KVM: x86 emulator: Make jmp far emulation into a separate function
KVM: x86 emulator: Rename emulate_grpX() to em_grpX()
KVM: x86 emulator: Remove unused arg from emulate_pop()
KVM: x86 emulator: Remove unused arg from writeback()
KVM: x86 emulator: Remove unused arg from read_descriptor()
KVM: x86 emulator: Remove unused arg from seg_override()
KVM: Validate userspace_addr of memslot when registered
KVM: MMU: Clean up gpte reading with copy_from_user()
KVM: PPC: booke: add sregs support
KVM: PPC: booke: save/restore VRSAVE (a.k.a. USPRG0)
KVM: PPC: use ticks, not usecs, for exit timing
KVM: PPC: fix exit accounting for SPRs, tlbwe, tlbsx
KVM: PPC: e500: emulate SVR
KVM: VMX: Cache vmcs segment fields
KVM: x86 emulator: consolidate segment accessors
KVM: VMX: Avoid reading %rip unnecessarily when handling exceptions
...
Linux doesn't use USPRG0 (now renamed VRSAVE in the architecture, even
when Altivec isn't involved), but a guest might.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Return the actual host SVR for now, as we already do for PVR. Eventually
we may support Qemu overriding PVR/SVR if the situation is appropriate,
once we implement KVM_SET_SREGS on e500.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1446 commits)
macvlan: fix panic if lowerdev in a bond
tg3: Add braces around 5906 workaround.
tg3: Fix NETIF_F_LOOPBACK error
macvlan: remove one synchronize_rcu() call
networking: NET_CLS_ROUTE4 depends on INET
irda: Fix error propagation in ircomm_lmp_connect_response()
irda: Kill set but unused variable 'bytes' in irlan_check_command_param()
irda: Kill set but unused variable 'clen' in ircomm_connect_indication()
rxrpc: Fix set but unused variable 'usage' in rxrpc_get_transport()
be2net: Kill set but unused variable 'req' in lancer_fw_download()
irda: Kill set but unused vars 'saddr' and 'daddr' in irlan_provider_connect_indication()
atl1c: atl1c_resume() is only used when CONFIG_PM_SLEEP is defined.
rxrpc: Fix set but unused variable 'usage' in rxrpc_get_peer().
rxrpc: Kill set but unused variable 'local' in rxrpc_UDP_error_handler()
rxrpc: Kill set but unused variable 'sp' in rxrpc_process_connection()
rxrpc: Kill set but unused variable 'sp' in rxrpc_rotate_tx_window()
pkt_sched: Kill set but unused variable 'protocol' in tc_classify()
isdn: capi: Use pr_debug() instead of ifdefs.
tg3: Update version to 3.119
tg3: Apply rx_discards fix to 5719/5720
...
Fix up trivial conflicts in arch/x86/Kconfig and net/mac80211/agg-tx.c
as per Davem.
Simultaneous FCM and GPCM or UPM operation may erroneously trigger
bus monitor timeout.
Set the local bus monitor timeout value to the maximum by setting
LBCR[BMT] = 0 and LBCR[BMTPS] = 0xF.
Signed-off-by: Shengzhou Liu <Shengzhou.Liu@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Manual merge of arch/powerpc/kernel/smp.c and add missing scheduler_ipi()
call to arch/powerpc/platforms/cell/interrupt.c
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Commits a5d4f3ad3a ("powerpc: Base support for exceptions using
HSRR0/1") and 673b189a2e ("powerpc: Always use SPRN_SPRG_HSCRATCH0
when running in HV mode") cause compile and link errors for 32-bit
classic Book 3S processors when KVM is enabled. This fixes these
errors.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>