powerpc: Add 64bit optimised memcmp

xiaomi-sm8450/android_kernel_xiaomi_sm8450

I noticed KSM spending quite a lot of time in memcmp on a large
KVM box. The current memcmp loop is very unoptimised: it compares
a byte at a time with no loop unrolling. We can do much better.

Optimise the loop in two ways:

- Unroll the byte-at-a-time loop.

- For large (at least 32-byte) comparisons where both buffers are
  8-byte aligned, use an unrolled, modulo-scheduled loop of 8-byte
  loads. This is similar to our glibc memcmp.
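The two optimisations can be sketched in C. This is a hypothetical illustration, not the assembly added by this commit. The names memcmp_sketch(), memcmp_bytes() and load64() are made up for the example, and unrolling the byte loop by four is an assumption. The long path does four 8-byte loads per 32-byte iteration but does not model true modulo scheduling (overlapping one iteration's loads with the previous iteration's compares). On a word mismatch it falls back to the byte loop to locate the first differing byte, which keeps the result correct on both big and little endian.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: byte-at-a-time compare, unrolled by four. */
static int memcmp_bytes(const unsigned char *a, const unsigned char *b, size_t n)
{
	while (n >= 4) {
		if (a[0] != b[0]) return a[0] - b[0];
		if (a[1] != b[1]) return a[1] - b[1];
		if (a[2] != b[2]) return a[2] - b[2];
		if (a[3] != b[3]) return a[3] - b[3];
		a += 4; b += 4; n -= 4;
	}
	while (n--) {                 /* remaining 0..3 bytes */
		if (*a != *b) return *a - *b;
		a++; b++;
	}
	return 0;
}

/* Hypothetical helper: one 8-byte load without violating strict aliasing. */
static uint64_t load64(const unsigned char *p)
{
	uint64_t v;
	memcpy(&v, p, sizeof(v));
	return v;
}

int memcmp_sketch(const void *s1, const void *s2, size_t n)
{
	const unsigned char *a = s1, *b = s2;

	/* Long path only when n >= 32 and both pointers are 8-byte aligned:
	 * OR the addresses together and test the low three bits once. */
	if (n >= 32 && (((uintptr_t)a | (uintptr_t)b) & 7) == 0) {
		while (n >= 32) {
			/* Four 8-byte loads per iteration (32 bytes). */
			if (load64(a) != load64(b) ||
			    load64(a + 8) != load64(b + 8) ||
			    load64(a + 16) != load64(b + 16) ||
			    load64(a + 24) != load64(b + 24))
				/* Some byte in this block differs: find the first one. */
				return memcmp_bytes(a, b, 32);
			a += 32; b += 32; n -= 32;
		}
	}
	/* Short, unaligned, or tail (< 32 bytes) case. */
	return memcmp_bytes(a, b, n);
}
```

Only the sign of the result is meaningful, as with memcmp itself, so memcmp_sketch(buf1, buf2, len) < 0 means buf1 sorts first.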

A simple microbenchmark running 10,000,000 iterations of an 8192-byte
memcmp was used to measure the performance:

baseline:	29.93 s

modified:	 1.70 s

Roughly 17.6x faster (29.93 s / 1.70 s).
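The benchmark source is not part of the commit. A harness along these lines is a hypothetical sketch: it assumes POSIX clock_gettime(), and bench_memcmp() is an invented name. It times repeated compares of two identical buffers so that every call walks the full length, and calls memcmp through a volatile function pointer so the compiler cannot hoist the loop-invariant call out of the loop.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical harness: seconds taken by `iters` memcmp calls over two
 * equal `len`-byte buffers, or -1.0 if allocation fails. */
double bench_memcmp(size_t len, long iters)
{
	unsigned char *a = malloc(len), *b = malloc(len);
	if (!a || !b) {
		free(a);
		free(b);
		return -1.0;
	}
	memset(a, 0x5a, len);   /* identical contents: no early exit */
	memset(b, 0x5a, len);

	/* Calling through a volatile pointer stops the compiler from treating
	 * the call as pure and computing it only once. */
	int (*volatile cmp)(const void *, const void *, size_t) = memcmp;
	volatile int sink = 0;  /* keeps the results live */
	struct timespec t0, t1;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (long i = 0; i < iters; i++)
		sink += cmp(a, b, len);
	clock_gettime(CLOCK_MONOTONIC, &t1);

	free(a);
	free(b);
	return (double)(t1.tv_sec - t0.tv_sec) +
	       (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

bench_memcmp(8192, 10000000) mirrors the parameters quoted above. At those figures, 29.93 s is about 3 µs per call and 1.70 s is about 170 ns per call.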

v2: Incorporated some suggestions from Segher:

- Use andi. instead of rldicl.

- Convert bdnzt eq, to bdnz. It's just duplicating the earlier compare
  and was a relic from a previous version.

- Don't use cr5; we plan to use that CR field for fast local
  atomics.
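The bdnzt eq to bdnz change can be illustrated with a hypothetical C model of a CTR-counted compare loop. The function names are invented and this is not the commit's assembly. bdnz decrements CTR and branches while it is non-zero. bdnzt eq additionally requires the eq bit of cr0 to be set. Because the loop body already branches out on a mismatch, eq is always set when the loop branch is reached, so the two forms behave identically.

```c
#include <stddef.h>
#include <stdint.h>

/* Model of the bdnz form. ctr is the number of 8-byte words (>= 1).
 * Returns 1 if all words match, 0 otherwise. */
int word_loop_equal(const uint64_t *a, const uint64_t *b, size_t ctr)
{
	size_t i = 0;
	do {
		if (a[i] != b[i])   /* the earlier compare: exit on mismatch */
			return 0;
		i++;
	} while (--ctr != 0);       /* bdnz: decrement CTR, loop while non-zero */
	return 1;
}

/* Model of the bdnzt eq form: the extra `&& eq` can never be false
 * at the loop branch, so it only duplicates the compare above. */
int word_loop_equal_bdnzt(const uint64_t *a, const uint64_t *b, size_t ctr)
{
	size_t i = 0;
	int eq;
	do {
		eq = (a[i] == b[i]);
		if (!eq)
			return 0;
		i++;
	} while (--ctr != 0 && eq); /* bdnzt eq: CTR != 0 and eq set */
	return 1;
}
```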

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


Authored by Anton Blanchard on 2015-01-21 12:27:38 +11:00, committed by Michael Ellerman
parent a113de373b
commit 15c2d45d17
3 changed files with 237 additions and 1 deletion

arch/powerpc/lib/string.S

@@ -93,6 +93,7 @@ _GLOBAL(strlen)
 	subf	r3,r3,r4
 	blr
 
+#ifdef CONFIG_PPC32
 _GLOBAL(memcmp)
 	PPC_LCMPI 0,r5,0
 	beq-	2f
@@ -106,6 +107,7 @@ _GLOBAL(memcmp)
 	blr
 2:	li	r3,0
 	blr
+#endif
 
 _GLOBAL(memchr)
 	PPC_LCMPI 0,r5,0
