On MIPS64 we have spinlocks that are 32b in size and an efficient
cmpxchg64 implementation, so we qualify to make use of cmpxchg backed
lockrefs. Select the ARCH_USE_CMPXCHG_LOCKREF Kconfig symbol and provide
a trivial implementation of arch_spin_value_unlocked to satisfy the
lockref code.
Using Linus' simple testcase from
http://article.gmane.org/gmane.linux.file-systems/77466 on a dual core
system with an in-development MIPS64 CPU running on FPGA I see around an
8% gain:
Pre-patch:
Total loops: 252698
Total loops: 251482
Total loops: 250806
Total loops: 252885
Total loops: 251666
Post-patch:
Total loops: 273728
Total loops: 269932
Total loops: 269341
Total loops: 275004
Total loops: 270208
[ralf@linux-mips.org: Fixed conflict.]
Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Cc: linux-mips@linux-mips.org
Cc: Steven J. Hill <Steven.Hill@imgtec.com>
Cc: linux-kernel@vger.kernel.org
Cc: Maciej W. Rozycki <macro@codesourcery.com>
Cc: Markos Chandras <markos.chandras@imgtec.com>
Patchwork: https://patchwork.linux-mips.org/patch/10810/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
MIPS was using finish_arch_switch() as a hook to restore and initialize
CPU context for all threads, even newly created kernel and user threads.
This is however entirely solvable within switch_to() so get rid of
finish_arch_switch() which is in the way of scheduler cleanups.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
If pgprot_writecombine is not #defined, asm-generic/pgtable.h will try
to provide a default implementation by #defining it to pgprot_noncached.
However our implementation is an inline function rather than a #define,
so it was never actually used because of the #define in generic code.
Add "#define pgprot_writecombine pgprot_writecombine" to prevent generic
code from re-defining it.
Signed-off-by: Alex Smith <alex.smith@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/10767/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
The context introduced by MSA needs to be saved around signals. However,
we can't increase the size of struct sigcontext because that will change
the offset of the signal mask in struct sigframe or struct ucontext.
This patch instead places the new context immediately after the struct
sigframe for traditional signals, or similarly after struct ucontext for
RT signals. The layout of struct sigframe & struct ucontext is identical
from their sigcontext fields onwards, so the offset from the sigcontext
to the extended context will always be the same regardless of the type
of signal.
Userland will be able to search through the extended context by using
the magic values to detect which types of context are present. Any
unrecognised context can be skipped over using the size field of struct
extcontext. Once the magic value END_EXTCONTEXT_MAGIC is seen it is
known that there are no further extended context structures to examine.
This approach is somewhat similar to that taken by ARM to save VFP &
other context at the end of struct ucontext.
Userland can determine whether extended context is present by checking
for the USED_EXTCONTEXT bit in the sc_used_math field of struct
sigcontext. Whilst this could potentially change the historic semantics
of sc_used_math if further extended context which does not imply FP
context were to be introduced in the future, I have been unable to find
any userland code making use of sc_used_math at all. Using one of the
fields described as unused in struct sigcontext was considered, but the
kernel does not already write to those fields so there would be no
guarantee of the field being clear on older kernels. Other alternatives
would be to have userland check the kernel version, or to have a HWCAP
bit indicating presence of extended context. However there is a desire
to have the context & information required to decode it be self
contained such that, for example, debuggers could decode the saved
context easily.
[ralf@linux-mips.org: Fixed conflict.]
Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Cc: linux-mips@linux-mips.org
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Matthew Fortune <matthew.fortune@imgtec.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Zubair Lutfullah Kakakhel <Zubair.Kakakhel@imgtec.com>
Cc: Alex Smith <alex@alex-smith.me.uk>
Cc: linux-kernel@vger.kernel.org
Cc: Richard Weinberger <richard@nod.at>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Markos Chandras <markos.chandras@imgtec.com>
Cc: Daniel Borkmann <dborkman@redhat.com>
Cc: Maciej W. Rozycki <macro@codesourcery.com>
Patchwork: https://patchwork.linux-mips.org/patch/10795/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
The sc_used_math field of struct sigcontext & its variants has
traditionally been used as a boolean value indicating only whether or
not floating point context is saved within the sigcontext. With various
supported FP modes & the ability to switch between them this information
will no longer be enough to decode the meaning of the data stored in the
sc_fpregs fields of struct sigcontext.
To make that possible 3 bits are defined within sc_used_math:
- Bit 0 (USED_FP) represents whether FP was used, essentially
providing the boolean flag which sc_used_math as a whole provided
previously.
- Bit 1 (USED_FR1) provides the value of the Status.FR bit at the time
the FP context was saved.
- Bit 2 (USED_HYBRID_FPRS) indicates whether the FP context was saved
under the hybrid FPR scheme. Essentially, when set the odd singles
are located in bits 63:32 of the preceding even indexed sc_fpregs
element.
Any userland that tests whether the sc_used_math field is zero or
non-zero will continue to function as expected. Having said that, I
could not find any userland which uses the sc_used_math field at all.
[ralf@linux-mips.org: Fixed rejects.]
Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Cc: linux-mips@linux-mips.org
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Matthew Fortune <matthew.fortune@imgtec.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linux-kernel@vger.kernel.org
Cc: Richard Weinberger <richard@nod.at>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Maciej W. Rozycki <macro@codesourcery.com>
Patchwork: https://patchwork.linux-mips.org/patch/10794/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Make use of the common FP sigcontext code for O32 binaries running on
MIPS64 kernels now that it is taking appropriate offsets into struct
sigcontext(32) from struct mips_abi.
[ralf@linux-mips.org: Fixed reject.]
Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Cc: linux-mips@linux-mips.org
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Matthew Fortune <matthew.fortune@imgtec.com>
Cc: Leonid Yegoshin <Leonid.Yegoshin@imgtec.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linux-kernel@vger.kernel.org
Cc: Richard Weinberger <richard@nod.at>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Markos Chandras <markos.chandras@imgtec.com>
Cc: Manuel Lauss <manuel.lauss@gmail.com>
Cc: Maciej W. Rozycki <macro@codesourcery.com>
Patchwork: https://patchwork.linux-mips.org/patch/10792/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
The generic field definitions (i.e. present before MIPS32/MIPS64) in
mipsregs.h are conventionally not prefixed with MIPS_, so rename the
recently added MIPS_ENTRYLO_* definitions for the G, V, D, and C fields
to ENTRYLO_*. Also rearrange to put the EntryLo and EntryHi definitions
in the right place in the file.
Fixes: 8ab6abcb6a ("MIPS: mipsregs.h: Add EntryLo bit definitions")
Reported-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Maciej W. Rozycki <macro@linux-mips.org>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/10725/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
The TLB registers are dumped in a couble of places:
- sysrq_tlbdump_single() - when dumping TLB state.
- do_mcheck() - in response to a machine check error.
The main TLB registers also differ between r3k and r4k, but r4k appears
to be assumed.
Refactor this code into a dump_tlb_regs() function, implemented for both
r3k and r4k, and used by both of the above functions.
Fixes: d1e9a4f547 ("MIPS: Add SysRq operation to dump TLBs on all CPUs")
Suggested-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Maciej W. Rozycki <macro@linux-mips.org>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/10721/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
These are bitfields and treating them as signed values doesn't make
any sense.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Reported-by: Chris Packham <judge.packham@gmail.com>
Weak header file declarations are error-prone because they make every
definition weak, and the linker chooses one based on link order (see
10629d711e ("PCI: Remove __weak annotation from pcibios_get_phb_of_node
decl")).
mips_cdmm_phys_base() is defined only in arch/mips/mti-malta/malta-memory.c
so there's no problem with multiple definitions. But it works better to
have a weak default implementation and allow a strong function to override
it. Then we don't have to test whether a definition is present, and if
there are ever multiple strong definitions, we get a link error instead of
calling a random definition.
Add a weak mips_cdmm_phys_base() definition and remove the weak annotation
from the declaration in arch/mips/include/asm/cdmm.h.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: James Hogan <james.hogan@imgtec.com>
Cc: Andrew Bresticker <abrestic@chromium.org>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/10688/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Weak header file declarations are error-prone because they make every
definition weak, and the linker chooses one based on link order (see
10629d711e ("PCI: Remove __weak annotation from pcibios_get_phb_of_node
decl")).
The most elegant solution is to have a weak default implementation and
allow a strong function to override it. Then we don't have to test
whether a definition is present, and if there are ever multiple strong
definitions, we get a link error instead of calling a random definition.
Add a weak get_c0_fdc_int() definition with the default code and remove the
weak annotation from the declaration.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: James Hogan <james.hogan@imgtec.com>
Cc: Andrew Bresticker <abrestic@chromium.org>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/10687/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Weak header file declarations are error-prone because they make every
definition weak, and the linker chooses one based on link order (see
10629d711e ("PCI: Remove __weak annotation from pcibios_get_phb_of_node
decl")).
get_c0_compare_int() is defined in several files. Each definition is weak,
so I assume Kconfig prevents two or more from being included. The caller
contains default code used when get_c0_compare_int() isn't defined at all.
Add a weak get_c0_compare_int() definition with the default code and remove
the weak annotation from the declaration.
Then the platform implementations will be strong and will override the weak
default. If multiple platforms are ever configured in, we'll get a link
error instead of calling a random platform's implementation.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: James Hogan <james.hogan@imgtec.com>
Cc: Andrew Bresticker <abrestic@chromium.org>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/10686/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Pull scheduler updates from Ingo Molnar:
"The biggest change in this cycle is the rewrite of the main SMP load
balancing metric: the CPU load/utilization. The main goal was to make
the metric more precise and more representative - see the changelog of
this commit for the gory details:
9d89c257df ("sched/fair: Rewrite runnable load and utilization average tracking")
It is done in a way that significantly reduces complexity of the code:
5 files changed, 249 insertions(+), 494 deletions(-)
and the performance testing results are encouraging. Nevertheless we
need to keep an eye on potential regressions, since this potentially
affects every SMP workload in existence.
This work comes from Yuyang Du.
Other changes:
- SCHED_DL updates. (Andrea Parri)
- Simplify architecture callbacks by removing finish_arch_switch().
(Peter Zijlstra et al)
- cputime accounting: guarantee stime + utime == rtime. (Peter
Zijlstra)
- optimize idle CPU wakeups some more - inspired by Facebook server
loads. (Mike Galbraith)
- stop_machine fixes and updates. (Oleg Nesterov)
- Introduce the 'trace_sched_waking' tracepoint. (Peter Zijlstra)
- sched/numa tweaks. (Srikar Dronamraju)
- misc fixes and small cleanups"
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (44 commits)
sched/deadline: Fix comment in enqueue_task_dl()
sched/deadline: Fix comment in push_dl_tasks()
sched: Change the sched_class::set_cpus_allowed() calling context
sched: Make sched_class::set_cpus_allowed() unconditional
sched: Fix a race between __kthread_bind() and sched_setaffinity()
sched: Ensure a task has a non-normalized vruntime when returning back to CFS
sched/numa: Fix NUMA_DIRECT topology identification
tile: Reorganize _switch_to()
sched, sparc32: Update scheduler comments in copy_thread()
sched: Remove finish_arch_switch()
sched, tile: Remove finish_arch_switch
sched, sh: Fold finish_arch_switch() into switch_to()
sched, score: Remove finish_arch_switch()
sched, avr32: Remove finish_arch_switch()
sched, MIPS: Get rid of finish_arch_switch()
sched, arm: Remove finish_arch_switch()
sched/fair: Clean up load average references
sched/fair: Provide runnable_load_avg back to cfs_rq
sched/fair: Remove task and group entity load when they are dead
sched/fair: Init cfs_rq's sched_entity load average
...
Correct a build failure introduced by be0c37c9 [MIPS: Rearrange PTE bits
into fixed positions.]:
In file included from ./arch/mips/include/asm/io.h:27:0,
from ./arch/mips/include/asm/page.h:176,
from include/linux/mm_types.h:15,
from include/linux/sched.h:27,
from include/linux/ptrace.h:5,
from arch/mips/kernel/cpu-probe.c:16:
./arch/mips/include/asm/pgtable-bits.h:164:0: error: "_PAGE_GLOBAL_SHIFT" redefined [-Werror]
#define _PAGE_GLOBAL_SHIFT (_PAGE_MODIFIED_SHIFT + 1)
^
./arch/mips/include/asm/pgtable-bits.h:141:0: note: this is the location of the previous definition
#define _PAGE_GLOBAL_SHIFT (_PAGE_SPLITTING_SHIFT + 1)
^
cc1: all warnings being treated as errors
make[2]: *** [arch/mips/kernel/cpu-probe.o] Error 1
for 64BIT/CPU_MIPSR1/MIPS_HUGE_TLB_SUPPORT configurations. Remove the
scattered double `_PAGE_NO_EXEC_SHIFT' and `_PAGE_GLOBAL_SHIFT' macro
definitions and rearrange them so that the respective macros these
definitions are based on are also those used for guarding conditionals.
[ralf@linux-mips.org: resolved conflicts and updated commments.]
Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Cc: Steven J. Hill <Steven.Hill@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/9960/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Weak header file declarations are error-prone because they make every
definition weak, and the linker chooses one based on link order (see
10629d711e ("PCI: Remove __weak annotation from pcibios_get_phb_of_node
decl")).
That's not a problem for vpe_run() because Kconfig ensures there's never
more than one definition:
- vpe_run() is defined in arch/mips/kernel/vpe-mt.c if
CONFIG_MIPS_VPE_LOADER_MT=y
- vpe_run() is defined in arch/mips/mti-malta/malta-amon.c if
CONFIG_MIPS_CMP=y
- CONFIG_MIPS_VPE_LOADER_MT cannot be set if CONFIG_MIPS_CMP=y
But it's simpler to verify correctness if we remove "weak" from the picture
and test the config symbols directly.
Remove "weak" from the vpe_run() declaration and use #if to test whether a
definition should be present.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Andrew Bresticker <abrestic@chromium.org>
Cc: linux-mips@linux-mips.org
Cc: James Hogan <james.hogan@imgtec.com>
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/10684/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Weak header file declarations are error-prone because they make every
definition weak, and the linker chooses one based on link order (see
10629d711e ("PCI: Remove __weak annotation from pcibios_get_phb_of_node
decl")).
platform_maar_init() is defined in:
- arch/mips/mm/init.c (where it is marked "weak")
- arch/mips/mti-malta/malta-memory.c (without annotation)
The "weak" attribute on the platform_maar_init() extern declaration applies
to the platform-specific definition in arch/mips/mti-malta/malta-memory.c,
so both definitions are weak, and which one we get depends on link order.
Remove the "weak" attribute from the declaration. That makes the malta
definition strong, so it will always be preferred if it is present.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: linux-mips@linux-mips.org
Reviewed-by: James Hogan <james.hogan@imgtec.com>
Cc: Andrew Bresticker <abrestic@chromium.org>
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/10682/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
There's only one implementation of mips_cpc_phys_base(), and it's only used
within the same file, so it doesn't need to be weak, and it doesn't need an
extern declaration.
Remove the extern mips_cpc_phys_base() declaration and make it static.
[ralf@linux-mips.org: Fixed conflict.]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: linux-mips@linux-mips.org
Cc: Andrew Bresticker <abrestic@chromium.org>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/10681/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Provide a function to trivially return the version of the CM present in
the system, or 0 if no CM is present. The mips_cm_revision() will be
used later on to determine the CM register width, so it must not use
the regular CM accessors to read the revision register since that will
lead to build failures due to recursive inlines.
Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/10655/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
On MIPS the GLOBAL bit of the PTE must have the same value in any
aligned pair of PTEs. These pairs of PTEs are referred to as
"buddies". In a SMP system is is possible for two CPUs to be calling
set_pte() on adjacent PTEs at the same time. There is a race between
setting the PTE and a different CPU setting the GLOBAL bit in its
buddy PTE.
This race can be observed when multiple CPUs are executing
vmap()/vfree() at the same time.
Make setting the buddy PTE's GLOBAL bit an atomic operation to close
the race condition.
The case of CONFIG_64BIT_PHYS_ADDR && CONFIG_CPU_MIPS32 is *not*
handled.
Signed-off-by: David Daney <david.daney@cavium.com>
Cc: <stable@vger.kernel.org>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/10835/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
MIPS was using finish_arch_switch() as a hook to restore and initialize
CPU context for all threads, even newly created kernel and user threads.
This is however entirely solvable within switch_to() so get rid of
finish_arch_switch() which is in the way of scheduler cleanups.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
There are various problems and short-comings with the current
static_key interface:
- static_key_{true,false}() read like a branch depending on the key
value, instead of the actual likely/unlikely branch depending on
init value.
- static_key_{true,false}() are, as stated above, tied to the
static_key init values STATIC_KEY_INIT_{TRUE,FALSE}.
- we're limited to the 2 (out of 4) possible options that compile to
a default NOP because that's what our arch_static_branch() assembly
emits.
So provide a new static_key interface:
DEFINE_STATIC_KEY_TRUE(name);
DEFINE_STATIC_KEY_FALSE(name);
Which define a key of different types with an initial true/false
value.
Then allow:
static_branch_likely()
static_branch_unlikely()
to take a key of either type and emit the right instruction for the
case.
This means adding a second arch_static_branch_jump() assembly helper
which emits a JMP per default.
In order to determine the right instruction for the right state,
encode the branch type in the LSB of jump_entry::key.
This is the final step in removing the naming confusion that has led to
a stream of avoidable bugs such as:
a833581e37 ("x86, perf: Fix static_key bug in load_mm_cr4()")
... but it also allows new static key combinations that will give us
performance enhancements in the subsequent patches.
Tested-by: Rabin Vincent <rabin@rab.in> # arm
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> # ppc
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> # s390
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The majority of SMP platforms handle their IPIs through do_IRQ()
which calls irq_{enter/exit}(). When a call function IPI is received,
smp_call_function_interrupt() is called which also calls
irq_{enter,exit}(), meaning irq_count is raised twice.
When tick broadcasting is used (which is implemented via a call
function IPI), this incorrectly causes all CPU idle time on the core
receiving broadcast ticks to be accounted as time spent servicing
IRQs, as account_process_tick() will account as such if irq_count is
greater than 1. This results in 100% CPU usage being reported on a
core which receives its ticks via broadcast.
This patch removes the SMP smp_call_function_interrupt() wrapper which
calls irq_{enter,exit}(). Platforms which handle their IPIs through
do_IRQ() now call generic_smp_call_function_interrupt() directly to
avoid incrementing irq_count a second time. Platforms which don't
(loongson, sgi-ip27, sibyte) call generic_smp_call_function_interrupt()
wrapped in irq_{enter,exit}().
Signed-off-by: Alex Smith <alex.smith@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/10770/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>