powerpc/booke64: wrap tlb lock and search in htw miss with FTR_SMT

Virtualized environments may expose an e6500 dual-threaded core
as two single-threaded e6500 cores. Take advantage of this
and skip the tlb lock and the trap-causing tlbsx in the htw
miss handler by guarding them with CPU_FTR_SMT, as is already
done in the bolted tlb1 miss handler.

Measurements with the lmbench random memory access latency
test, run under Freescale's Embedded Hypervisor, show a ~34%
reduction in random memory latency (results below).

Memory latencies in nanoseconds - smaller is better
    (WARNING - may not be correct, check graphs)
----------------------------------------------------
Host       Mhz   L1 $   L2 $    Main mem    Rand mem
---------  ---   ----   ----    --------    --------
smt       1665 1.8020   13.2    83.0         1149.7
nosmt     1665 1.8020   13.2    83.0          758.1
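
As a quick check of the ~34% figure, here is a minimal Python sketch that recomputes the relative reduction from the two "Rand mem" values in the table above. The function name `improvement_percent` is made up for illustration; only the two latency numbers come from the measurements.

```python
def improvement_percent(before_ns, after_ns):
    """Relative latency reduction, as a percentage of the 'before' value."""
    return (before_ns - after_ns) / before_ns * 100.0


# "Rand mem" column from the lmbench table above (nanoseconds).
smt_rand_ns = 1149.7    # tlb lock and tlbsx executed (CPU_FTR_SMT set)
nosmt_rand_ns = 758.1   # tlb lock and tlbsx skipped (CPU_FTR_SMT clear)

print(f"{improvement_percent(smt_rand_ns, nosmt_rand_ns):.1f}% lower latency")
# prints "34.1% lower latency"
```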

Signed-off-by: Laurentiu Tudor <Laurentiu.Tudor@freescale.com>
Cc: Scott Wood <scottwood@freescale.com>
[scottwood@freescale.com: commit message tweak]
Signed-off-by: Scott Wood <scottwood@freescale.com>
Author: Laurentiu Tudor
Date: 2014-05-30 17:59:15 +03:00
Committer: Scott Wood
Parent: f24bc2701a
Commit: 7c48005036

@@ -299,6 +299,7 @@ itlb_miss_fault_bolted:
 * r10 = crap (free to use)
 */
 tlb_miss_common_e6500:
+BEGIN_FTR_SECTION
 /*
 * Search if we already have an indirect entry for that virtual
 * address, and if we do, bail out.
@@ -333,6 +334,7 @@ tlb_miss_common_e6500:
 andis. r10,r10,MAS1_VALID@h
 bne tlb_miss_done_e6500
+END_FTR_SECTION_IFSET(CPU_FTR_SMT)
 /* Now, we need to walk the page tables. First check if we are in
 * range.
@@ -393,11 +395,13 @@ tlb_miss_common_e6500:
 tlb_miss_done_e6500:
 .macro tlb_unlock_e6500
+BEGIN_FTR_SECTION
 beq cr1,1f /* no unlock if lock was recursively grabbed */
 li r15,0
 isync
 stb r15,0(r11)
 1:
+END_FTR_SECTION_IFSET(CPU_FTR_SMT)
 .endm
 tlb_unlock_e6500
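
The effect of wrapping code in BEGIN_FTR_SECTION / END_FTR_SECTION_IFSET(CPU_FTR_SMT) can be pictured with a toy model: when the CPU does not report the feature, the wrapped instructions are replaced with nops, so the lock, search, and unlock sequences cost nothing on single-threaded (or virtualized single-thread) cores. The Python sketch below is only an illustration under that assumption. The bit value of `CPU_FTR_SMT`, the `(start, end, required)` tuple layout, and this Python `apply_feature_fixups` are hypothetical and are not the kernel's data structures.

```python
CPU_FTR_SMT = 1 << 0  # hypothetical bit value, for illustration only
NOP = "nop"


def apply_feature_fixups(code, sections, cpu_features):
    """Return a copy of `code` where each section whose required feature
    bits are not all present in `cpu_features` is overwritten with nops.

    `sections` is a list of (start, end, required) tuples, with `end`
    exclusive, loosely modelling an IFSET feature section.
    """
    patched = list(code)
    for start, end, required in sections:
        if (cpu_features & required) != required:
            for i in range(start, end):
                patched[i] = NOP
    return patched


# The unlock sequence guarded by this commit, as plain strings.
unlock = ["beq cr1,1f", "li r15,0", "isync", "stb r15,0(r11)"]
sections = [(0, len(unlock), CPU_FTR_SMT)]

# Without SMT the whole sequence becomes nops; with SMT it is untouched.
print(apply_feature_fixups(unlock, sections, cpu_features=0))
print(apply_feature_fixups(unlock, sections, cpu_features=CPU_FTR_SMT))
```

In this model the decision is made once, when the fixups are applied, rather than by a branch on every TLB miss, which is why guarding the hot path this way adds no runtime test.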