Jussi Kivilinna
f94a73f8dd
crypto: twofish-avx - tune assembler code for more performance
...
Patch replaces 'movb' instructions with 'movzbl' to break false register
dependencies and interleaves instructions better for out-of-order scheduling.
Tested on Intel Core i5-2450M and AMD FX-8100.
tcrypt ECB results:
Intel Core i5-2450M:
size old-vs-new new-vs-3way old-vs-3way
enc dec enc dec enc dec
256 1.12x 1.13x 1.36x 1.37x 1.21x 1.22x
1k 1.14x 1.14x 1.48x 1.49x 1.29x 1.31x
8k 1.14x 1.14x 1.50x 1.52x 1.32x 1.33x
AMD FX-8100:
size old-vs-new new-vs-3way old-vs-3way
enc dec enc dec enc dec
256 1.10x 1.11x 1.01x 1.01x 0.92x 0.91x
1k 1.11x 1.12x 1.08x 1.07x 0.97x 0.96x
8k 1.11x 1.13x 1.10x 1.08x 0.99x 0.97x
[v2]
- Do instruction interleaving another way to avoid adding new FPU<=>CPU
register moves as these cause performance drop on Bulldozer.
- Further interleaving improvements for better out-of-order scheduling.
Tested-by: Borislav Petkov <bp@alien8.de >
Cc: Johannes Goetzfried <Johannes.Goetzfried@informatik.stud.uni-erlangen.de >
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi >
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au >
2012-09-07 04:17:04 +08:00
Jussi Kivilinna
023af60825
crypto: aesni_intel - improve lrw and xts performance by utilizing parallel AES-NI hardware pipelines
...
Use parallel LRW and XTS encryption facilities to better utilize AES-NI
hardware pipelines and gain extra performance.
Tcrypt benchmark results (async), old vs new ratios:
Intel Core i5-2450M CPU (fam: 6, model: 42, step: 7)
aes:128bit
lrw:256bit xts:256bit
size lrw-enc lrw-dec xts-dec xts-dec
16B 0.99x 1.00x 1.22x 1.19x
64B 1.38x 1.50x 1.58x 1.61x
256B 2.04x 2.02x 2.27x 2.29x
1024B 2.56x 2.54x 2.89x 2.92x
8192B 2.85x 2.99x 3.40x 3.23x
aes:192bit
lrw:320bit xts:384bit
size lrw-enc lrw-dec xts-dec xts-dec
16B 1.08x 1.08x 1.16x 1.17x
64B 1.48x 1.54x 1.59x 1.65x
256B 2.18x 2.17x 2.29x 2.28x
1024B 2.67x 2.67x 2.87x 3.05x
8192B 2.93x 2.84x 3.28x 3.33x
aes:256bit
lrw:348bit xts:512bit
size lrw-enc lrw-dec xts-dec xts-dec
16B 1.07x 1.07x 1.18x 1.19x
64B 1.56x 1.56x 1.70x 1.71x
256B 2.22x 2.24x 2.46x 2.46x
1024B 2.76x 2.77x 3.13x 3.05x
8192B 2.99x 3.05x 3.40x 3.30x
Cc: Huang Ying <ying.huang@intel.com >
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi >
Reviewed-by: Kim Phillips <kim.phillips@freescale.com >
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au >
2012-08-20 16:28:10 +08:00
Johannes Goetzfried
4ea1277d30
crypto: cast6 - add x86_64/avx assembler implementation
...
This patch adds a x86_64/avx assembler implementation of the Cast6 block
cipher. The implementation processes eight blocks in parallel (two 4 block
chunk AVX operations). The table-lookups are done in general-purpose registers.
For small blocksizes the functions from the generic module are called. A good
performance increase is provided for blocksizes greater or equal to 128B.
Patch has been tested with tcrypt and automated filesystem tests.
Tcrypt benchmark results:
Intel Core i5-2500 CPU (fam:6, model:42, step:7)
cast6-avx-x86_64 vs. cast6-generic
128bit key: (lrw:256bit) (xts:256bit)
size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec
16B 0.97x 1.00x 1.01x 1.01x 0.99x 0.97x 0.98x 1.01x 0.96x 0.98x
64B 0.98x 0.99x 1.02x 1.01x 0.99x 1.00x 1.01x 0.99x 1.00x 0.99x
256B 1.77x 1.84x 0.99x 1.85x 1.77x 1.77x 1.70x 1.74x 1.69x 1.72x
1024B 1.93x 1.95x 0.99x 1.96x 1.93x 1.93x 1.84x 1.85x 1.89x 1.87x
8192B 1.91x 1.95x 0.99x 1.97x 1.95x 1.91x 1.86x 1.87x 1.93x 1.90x
256bit key: (lrw:384bit) (xts:512bit)
size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec
16B 0.97x 0.99x 1.02x 1.01x 0.98x 0.99x 1.00x 1.00x 0.98x 0.98x
64B 0.98x 0.99x 1.01x 1.00x 1.00x 1.00x 1.01x 1.01x 0.97x 1.00x
256B 1.77x 1.83x 1.00x 1.86x 1.79x 1.78x 1.70x 1.76x 1.71x 1.69x
1024B 1.92x 1.95x 0.99x 1.96x 1.93x 1.93x 1.83x 1.86x 1.89x 1.87x
8192B 1.94x 1.95x 0.99x 1.97x 1.95x 1.95x 1.87x 1.87x 1.93x 1.91x
Signed-off-by: Johannes Goetzfried <Johannes.Goetzfried@informatik.stud.uni-erlangen.de >
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au >
2012-08-01 17:47:30 +08:00
Johannes Goetzfried
4d6d6a2c85
crypto: cast5 - add x86_64/avx assembler implementation
...
This patch adds a x86_64/avx assembler implementation of the Cast5 block
cipher. The implementation processes sixteen blocks in parallel (four 4 block
chunk AVX operations). The table-lookups are done in general-purpose registers.
For small blocksizes the functions from the generic module are called. A good
performance increase is provided for blocksizes greater or equal to 128B.
Patch has been tested with tcrypt and automated filesystem tests.
Tcrypt benchmark results:
Intel Core i5-2500 CPU (fam:6, model:42, step:7)
cast5-avx-x86_64 vs. cast5-generic
64bit key:
size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec
16B 0.99x 0.99x 1.00x 1.00x 1.02x 1.01x
64B 1.00x 1.00x 0.98x 1.00x 1.01x 1.02x
256B 2.03x 2.01x 0.95x 2.11x 2.12x 2.13x
1024B 2.30x 2.24x 0.95x 2.29x 2.35x 2.35x
8192B 2.31x 2.27x 0.95x 2.31x 2.39x 2.39x
128bit key:
size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec
16B 0.99x 0.99x 1.00x 1.00x 1.01x 1.01x
64B 1.00x 1.00x 0.98x 1.01x 1.02x 1.01x
256B 2.17x 2.13x 0.96x 2.19x 2.19x 2.19x
1024B 2.29x 2.32x 0.95x 2.34x 2.37x 2.38x
8192B 2.35x 2.32x 0.95x 2.35x 2.39x 2.39x
Signed-off-by: Johannes Goetzfried <Johannes.Goetzfried@informatik.stud.uni-erlangen.de >
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au >
2012-08-01 17:47:30 +08:00
Jussi Kivilinna
7af6c24568
crypto: arch/x86 - cleanup - remove unneeded crypto_alg.cra_list initializations
...
Initialization of cra_list is currently mixed, most ciphers initialize this
field and most shashes do not. Initialization however is not needed at all
since cra_list is initialized/overwritten in __crypto_register_alg() with
list_add(). Therefore perform cleanup to remove all unneeded initializations
of this field in 'arch/x86/crypto/'.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi >
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au >
2012-08-01 17:47:27 +08:00
Johannes Goetzfried
a43478863b
crypto: twofish-avx - remove useless instruction
...
The register %rdx is written, but never read till the end of the encryption
routine. Therefore let's delete the useless instruction.
Signed-off-by: Johannes Goetzfried <Johannes.Goetzfried@informatik.stud.uni-erlangen.de >
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au >
2012-07-11 11:08:30 +08:00
Milan Broz
bf084d8f6e
crypto: aesni-intel - fix wrong kfree pointer
...
kfree(new_key_mem) in rfc4106_set_key() should be called on malloced pointer,
not on aligned one, otherwise it can cause invalid pointer on free.
(Seen at least once when running tcrypt tests with debug kernel.)
Signed-off-by: Milan Broz <mbroz@redhat.com >
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au >
2012-07-11 11:06:13 +08:00
Jussi Kivilinna
70ef2601fe
crypto: move arch/x86/include/asm/aes.h to arch/x86/include/asm/crypto/
...
Move AES header to the new asm/crypto directory.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi >
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au >
2012-06-27 14:42:03 +08:00
Jussi Kivilinna
d4af0e9d6e
crypto: move arch/x86/include/asm/serpent-{sse2|avx}.h to arch/x86/include/asm/crypto/
...
Move serpent crypto headers to the new asm/crypto/ directory.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi >
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au >
2012-06-27 14:42:02 +08:00
Jussi Kivilinna
a7378d4e55
crypto: twofish-avx - remove duplicated glue code and use shared glue code from glue_helper
...
Now that shared glue code is available, convert twofish-avx to use it.
Cc: Johannes Goetzfried <Johannes.Goetzfried@informatik.stud.uni-erlangen.de >
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi >
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au >
2012-06-27 14:42:02 +08:00
Jussi Kivilinna
414cb5e7cc
crypto: twofish-x86_64-3way - remove duplicated glue code and use shared glue code from glue_helper
...
Now that shared glue code is available, convert twofish-x86_64-3way to use it.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi >
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au >
2012-06-27 14:42:02 +08:00
Jussi Kivilinna
964263afdc
crypto: camellia-x86_64 - remove duplicated glue code and use shared glue code from glue_helper
...
Now that shared glue code is available, convert camellia-x86_64 to use it.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi >
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au >
2012-06-27 14:42:02 +08:00
Jussi Kivilinna
1d0debbd46
crypto: serpent-avx: remove duplicated glue code and use shared glue code from glue_helper
...
Now that shared glue code is available, convert serpent-avx to use it.
Cc: Johannes Goetzfried <Johannes.Goetzfried@informatik.stud.uni-erlangen.de >
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi >
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au >
2012-06-27 14:42:01 +08:00
Jussi Kivilinna
596d875052
crypto: serpent-sse2 - split generic glue code to new helper module
...
Now that serpent-sse2 glue code has been made generic, it can be split to
separate module.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi >
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au >
2012-06-27 14:42:01 +08:00
Jussi Kivilinna
e81792fbc2
crypto: serpent-sse2 - prepare serpent-sse2 glue code into generic x86 glue code for 128bit block ciphers
...
Block cipher implementations in arch/x86/crypto/ contain common glue code that
is currently duplicated in each module (camellia-x86_64, twofish-x86_64-3way,
twofish-avx, serpent-sse2 and serpent-avx). This patch prepares serpent-sse2
glue into generic glue code for all 128bit block ciphers to use in
arch/x86/crypto.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi >
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au >
2012-06-27 14:42:01 +08:00
Jussi Kivilinna
a9629d7142
crypto: aes_ni - change to use shared ablk_* functions
...
Remove duplicate ablk_* functions and make use of ablk_helper module instead.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi >
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au >
2012-06-27 14:42:01 +08:00
Jussi Kivilinna
30a0400882
crypto: twofish-avx - change to use shared ablk_* functions
...
Remove duplicate ablk_* functions and make use of ablk_helper module instead.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi >
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au >
2012-06-27 14:42:01 +08:00
Jussi Kivilinna
ffaf915632
crypto: ablk_helper - move ablk_* functions from serpent-sse2/avx glue code to shared module
...
Move ablk-* functions to separate module to share common code between cipher
implementations.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi >
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au >
2012-06-27 14:42:00 +08:00
Jussi Kivilinna
3387e7d690
crypto: serpent-sse2/avx - allow both to be built into kernel
...
Rename serpent-avx assembler functions so that they do not collide with
serpent-sse2 assembler functions when linking both versions in to same
kernel image.
Reported-by: Randy Dunlap <rdunlap@xenotime.net >
Cc: Johannes Goetzfried <Johannes.Goetzfried@informatik.stud.uni-erlangen.de >
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi >
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au >
2012-06-14 10:09:03 +08:00
Johannes Goetzfried
7efe407672
crypto: serpent - add x86_64/avx assembler implementation
...
This patch adds a x86_64/avx assembler implementation of the Serpent block
cipher. The implementation is very similar to the sse2 implementation and
processes eight blocks in parallel. Because of the new non-destructive three
operand syntax all move-instructions can be removed and therefore a little
performance increase is provided.
Patch has been tested with tcrypt and automated filesystem tests.
Tcrypt benchmark results:
Intel Core i5-2500 CPU (fam:6, model:42, step:7)
serpent-avx-x86_64 vs. serpent-sse2-x86_64
128bit key: (lrw:256bit) (xts:256bit)
size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec
16B 1.03x 1.01x 1.01x 1.01x 1.00x 1.00x 1.00x 1.00x 1.00x 1.01x
64B 1.00x 1.00x 1.00x 1.00x 1.00x 0.99x 1.00x 1.01x 1.00x 1.00x
256B 1.05x 1.03x 1.00x 1.02x 1.05x 1.06x 1.05x 1.02x 1.05x 1.02x
1024B 1.05x 1.02x 1.00x 1.02x 1.05x 1.06x 1.05x 1.03x 1.05x 1.02x
8192B 1.05x 1.02x 1.00x 1.02x 1.06x 1.06x 1.04x 1.03x 1.04x 1.02x
256bit key: (lrw:384bit) (xts:512bit)
size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec
16B 1.01x 1.00x 1.01x 1.01x 1.00x 1.00x 0.99x 1.03x 1.01x 1.01x
64B 1.00x 1.00x 1.00x 1.00x 1.00x 1.00x 1.00x 1.01x 1.00x 1.02x
256B 1.05x 1.02x 1.00x 1.02x 1.05x 1.02x 1.04x 1.05x 1.05x 1.02x
1024B 1.06x 1.02x 1.00x 1.02x 1.07x 1.06x 1.05x 1.04x 1.05x 1.02x
8192B 1.05x 1.02x 1.00x 1.02x 1.06x 1.06x 1.04x 1.05x 1.05x 1.02x
serpent-avx-x86_64 vs aes-asm (8kB block):
128bit 256bit
ecb-enc 1.26x 1.73x
ecb-dec 1.20x 1.64x
cbc-enc 0.33x 0.45x
cbc-dec 1.24x 1.67x
ctr-enc 1.32x 1.76x
ctr-dec 1.32x 1.76x
lrw-enc 1.20x 1.60x
lrw-dec 1.15x 1.54x
xts-enc 1.22x 1.64x
xts-dec 1.17x 1.57x
Signed-off-by: Johannes Goetzfried <Johannes.Goetzfried@informatik.stud.uni-erlangen.de >
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au >
2012-06-12 16:47:43 +08:00
Johannes Goetzfried
107778b592
crypto: twofish - add x86_64/avx assembler implementation
...
This patch adds a x86_64/avx assembler implementation of the Twofish block
cipher. The implementation processes eight blocks in parallel (two 4 block
chunk AVX operations). The table-lookups are done in general-purpose registers.
For small blocksizes the 3way-parallel functions from the twofish-x86_64-3way
module are called. A good performance increase is provided for blocksizes
greater or equal to 128B.
Patch has been tested with tcrypt and automated filesystem tests.
Tcrypt benchmark results:
Intel Core i5-2500 CPU (fam:6, model:42, step:7)
twofish-avx-x86_64 vs. twofish-x86_64-3way
128bit key: (lrw:256bit) (xts:256bit)
size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec
16B 0.96x 0.97x 1.00x 0.95x 0.97x 0.97x 0.96x 0.95x 0.95x 0.98x
64B 0.99x 0.99x 1.00x 0.99x 0.98x 0.98x 0.99x 0.98x 0.99x 0.98x
256B 1.20x 1.21x 1.00x 1.19x 1.15x 1.14x 1.19x 1.20x 1.18x 1.19x
1024B 1.29x 1.30x 1.00x 1.28x 1.23x 1.24x 1.26x 1.28x 1.26x 1.27x
8192B 1.31x 1.32x 1.00x 1.31x 1.25x 1.25x 1.28x 1.29x 1.28x 1.30x
256bit key: (lrw:384bit) (xts:512bit)
size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec
16B 0.96x 0.96x 1.00x 0.96x 0.97x 0.98x 0.95x 0.95x 0.95x 0.96x
64B 1.00x 0.99x 1.00x 0.98x 0.98x 1.01x 0.98x 0.98x 0.98x 0.98x
256B 1.20x 1.21x 1.00x 1.21x 1.15x 1.15x 1.19x 1.20x 1.18x 1.19x
1024B 1.29x 1.30x 1.00x 1.28x 1.23x 1.23x 1.26x 1.27x 1.26x 1.27x
8192B 1.31x 1.33x 1.00x 1.31x 1.26x 1.26x 1.29x 1.29x 1.28x 1.30x
twofish-avx-x86_64 vs aes-asm (8kB block):
128bit 256bit
ecb-enc 1.19x 1.63x
ecb-dec 1.18x 1.62x
cbc-enc 0.75x 1.03x
cbc-dec 1.23x 1.67x
ctr-enc 1.24x 1.65x
ctr-dec 1.24x 1.65x
lrw-enc 1.15x 1.53x
lrw-dec 1.14x 1.52x
xts-enc 1.16x 1.56x
xts-dec 1.16x 1.56x
Signed-off-by: Johannes Goetzfried <Johannes.Goetzfried@informatik.stud.uni-erlangen.de >
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au >
2012-06-12 16:46:07 +08:00
Mathias Krause
65df577439
crypto: sha1 - use Kbuild supplied flags for AVX test
...
Commit ea4d26ae
("raid5: add AVX optimized RAID5 checksumming")
introduced x86/ arch wide defines for AFLAGS and CFLAGS indicating AVX
support in binutils based on the same test we have in x86/crypto/ right
now. To minimize duplication drop our implementation in favour to the
one in x86/.
Signed-off-by: Mathias Krause <minipli@googlemail.com >
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au >
2012-06-12 16:37:16 +08:00
Mathias Krause
7c8d51848a
crypto: aesni-intel - fix unaligned cbc decrypt for x86-32
...
The 32 bit variant of cbc(aes) decrypt is using instructions requiring
128 bit aligned memory locations but fails to ensure this constraint in
the code. Fix this by loading the data into intermediate registers with
load unaligned instructions.
This fixes reported general protection faults related to aesni.
References: https://bugzilla.kernel.org/show_bug.cgi?id=43223
Reported-by: Daniel <garkein@mailueberfall.de >
Cc: stable@kernel.org [v2.6.39+]
Signed-off-by: Mathias Krause <minipli@googlemail.com >
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au >
2012-05-31 20:53:22 +10:00
Jussi Kivilinna
ef45b83431
crypto: aesni-intel - move more common code to ablk_init_common
...
ablk_*_init functions share more common code than what is currently in
ablk_init_common. Move all of the common code to ablk_init_common.
Cc: Huang Ying <ying.huang@intel.com >
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi >
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au >
2012-05-15 17:25:33 +10:00
Jussi Kivilinna
fa46ccb8eb
crypto: aesni-intel - use crypto_[un]register_algs
...
Combine all crypto_alg to be registered and use new crypto_[un]register_algs
functions. Simplifies init/exit code and reduce object size.
Cc: Huang Ying <ying.huang@intel.com >
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi >
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au >
2012-05-15 17:25:33 +10:00
Linus Torvalds
6a76a69923
Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
...
Pull crypto fixes from Herbert Xu:
"This fixes a build problem where two crypto modules both try to export
the same symbols (which shouldn't have been exported in the first
place)."
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: twofish-x86_64-3way - module init/exit functions should be static
crypto: camellia-x86_64 - module init/exit functions should be static
2012-03-22 20:19:30 -07:00
Jussi Kivilinna
ff0a70fe05
crypto: twofish-x86_64-3way - module init/exit functions should be static
...
This caused conflict with camellia-x86_64 when compiled into kernel, same
function names and not static.
Reported-by: Randy Dunlap <rdunlap@xenotime.net >
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi >
Acked-by: Randy Dunlap <rdunlap@xenotime.net >
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au >
2012-03-22 09:17:45 +08:00
Jussi Kivilinna
676a38046f
crypto: camellia-x86_64 - module init/exit functions should be static
...
This caused conflict with twofish-x86_64-3way when compiled into kernel,
same function names and not static.
Reported-by: Randy Dunlap <rdunlap@xenotime.net >
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi >
Acked-by: Randy Dunlap <rdunlap@xenotime.net >
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au >
2012-03-22 09:17:44 +08:00
Linus Torvalds
b8716614a7
Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
...
Pull crypto update from Herbert Xu:
"* sha512 bug fixes (already in your tree).
* SHA224/SHA384 AEAD support in caam.
* X86-64 optimised version of Camellia.
* Tegra AES support.
* Bulk algorithm registration interface to make driver registration easier.
* padata race fixes.
* Misc fixes."
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (31 commits)
padata: Fix race on sequence number wrap
padata: Fix race in the serialization path
crypto: camellia - add assembler implementation for x86_64
crypto: camellia - rename camellia.c to camellia_generic.c
crypto: camellia - fix checkpatch warnings
crypto: camellia - rename camellia module to camellia_generic
crypto: tcrypt - add more camellia tests
crypto: testmgr - add more camellia test vectors
crypto: camellia - simplify key setup and CAMELLIA_ROUNDSM macro
crypto: twofish-x86_64/i586 - set alignmask to zero
crypto: blowfish-x86_64 - set alignmask to zero
crypto: serpent-sse2 - combine ablk_*_init functions
crypto: blowfish-x86_64 - use crypto_[un]register_algs
crypto: twofish-x86_64-3way - use crypto_[un]register_algs
crypto: serpent-sse2 - use crypto_[un]register_algs
crypto: serpent-sse2 - remove dead code from serpent_sse2_glue.c::serpent_sse2_init()
crypto: twofish-x86 - Remove dead code from twofish_glue_3way.c::init()
crypto: In crypto_add_alg(), 'exact' wants to be initialized to 0
crypto: caam - fix gcc 4.6 warning
crypto: Add bulk algorithm registration interface
...
2012-03-21 13:20:43 -07:00
Linus Torvalds
9f3938346a
Merge branch 'kmap_atomic' of git://github.com/congwang/linux
...
Pull kmap_atomic cleanup from Cong Wang.
It's been in -next for a long time, and it gets rid of the (no longer
used) second argument to k[un]map_atomic().
Fix up a few trivial conflicts in various drivers, and do an "evil
merge" to catch some new uses that have come in since Cong's tree.
* 'kmap_atomic' of git://github.com/congwang/linux: (59 commits)
feature-removal-schedule.txt: schedule the deprecated form of kmap_atomic() for removal
highmem: kill all __kmap_atomic() [swarren@nvidia.com: highmem: Fix ARM build break due to __kmap_atomic rename]
drbd: remove the second argument of k[un]map_atomic()
zcache: remove the second argument of k[un]map_atomic()
gma500: remove the second argument of k[un]map_atomic()
dm: remove the second argument of k[un]map_atomic()
tomoyo: remove the second argument of k[un]map_atomic()
sunrpc: remove the second argument of k[un]map_atomic()
rds: remove the second argument of k[un]map_atomic()
net: remove the second argument of k[un]map_atomic()
mm: remove the second argument of k[un]map_atomic()
lib: remove the second argument of k[un]map_atomic()
power: remove the second argument of k[un]map_atomic()
kdb: remove the second argument of k[un]map_atomic()
udf: remove the second argument of k[un]map_atomic()
ubifs: remove the second argument of k[un]map_atomic()
squashfs: remove the second argument of k[un]map_atomic()
reiserfs: remove the second argument of k[un]map_atomic()
ocfs2: remove the second argument of k[un]map_atomic()
ntfs: remove the second argument of k[un]map_atomic()
...
2012-03-21 09:40:26 -07:00
Cong Wang
8fd75e1216
x86: remove the second argument of k[un]map_atomic()
...
Acked-by: Avi Kivity <avi@redhat.com >
Acked-by: Herbert Xu <herbert@gondor.apana.org.au >
Signed-off-by: Cong Wang <amwang@redhat.com >
2012-03-20 21:48:15 +08:00
Jussi Kivilinna
0b95ec56ae
crypto: camellia - add assembler implementation for x86_64
...
Patch adds x86_64 assembler implementation of Camellia block cipher. Two set of
functions are provided. First set is regular 'one-block at time' encrypt/decrypt
functions. Second is 'two-block at time' functions that gain performance increase
on out-of-order CPUs. Performance of 2-way functions should be equal to 1-way
functions with in-order CPUs.
Patch has been tested with tcrypt and automated filesystem tests.
Tcrypt benchmark results:
AMD Phenom II 1055T (fam:16, model:10):
camellia-asm vs camellia_generic:
128bit key: (lrw:256bit) (xts:256bit)
size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec
16B 1.27x 1.22x 1.30x 1.42x 1.30x 1.34x 1.19x 1.05x 1.23x 1.24x
64B 1.74x 1.79x 1.43x 1.87x 1.81x 1.87x 1.48x 1.38x 1.55x 1.62x
256B 1.90x 1.87x 1.43x 1.94x 1.94x 1.95x 1.63x 1.62x 1.67x 1.70x
1024B 1.96x 1.93x 1.43x 1.95x 1.98x 2.01x 1.67x 1.69x 1.74x 1.80x
8192B 1.96x 1.96x 1.39x 1.93x 2.01x 2.03x 1.72x 1.64x 1.71x 1.76x
256bit key: (lrw:384bit) (xts:512bit)
size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec
16B 1.23x 1.23x 1.33x 1.39x 1.34x 1.38x 1.04x 1.18x 1.21x 1.29x
64B 1.72x 1.69x 1.42x 1.78x 1.81x 1.89x 1.57x 1.52x 1.56x 1.65x
256B 1.85x 1.88x 1.42x 1.86x 1.93x 1.96x 1.69x 1.65x 1.70x 1.75x
1024B 1.88x 1.86x 1.45x 1.95x 1.96x 1.95x 1.77x 1.71x 1.77x 1.78x
8192B 1.91x 1.86x 1.42x 1.91x 2.03x 1.98x 1.73x 1.71x 1.78x 1.76x
camellia-asm vs aes-asm (8kB block):
128bit 256bit
ecb-enc 1.15x 1.22x
ecb-dec 1.16x 1.16x
cbc-enc 0.85x 0.90x
cbc-dec 1.20x 1.23x
ctr-enc 1.28x 1.30x
ctr-dec 1.27x 1.28x
lrw-enc 1.12x 1.16x
lrw-dec 1.08x 1.10x
xts-enc 1.11x 1.15x
xts-dec 1.14x 1.15x
Intel Core2 T8100 (fam:6, model:23, step:6):
camellia-asm vs camellia_generic:
128bit key: (lrw:256bit) (xts:256bit)
size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec
16B 1.10x 1.12x 1.14x 1.16x 1.16x 1.15x 1.02x 1.02x 1.08x 1.08x
64B 1.61x 1.60x 1.17x 1.68x 1.67x 1.66x 1.43x 1.42x 1.44x 1.42x
256B 1.65x 1.73x 1.17x 1.77x 1.81x 1.80x 1.54x 1.53x 1.58x 1.54x
1024B 1.76x 1.74x 1.18x 1.80x 1.85x 1.85x 1.60x 1.59x 1.65x 1.60x
8192B 1.77x 1.75x 1.19x 1.81x 1.85x 1.86x 1.63x 1.61x 1.66x 1.62x
256bit key: (lrw:384bit) (xts:512bit)
size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec
16B 1.10x 1.07x 1.13x 1.16x 1.11x 1.16x 1.03x 1.02x 1.08x 1.07x
64B 1.61x 1.62x 1.15x 1.66x 1.63x 1.68x 1.47x 1.46x 1.47x 1.44x
256B 1.71x 1.70x 1.16x 1.75x 1.69x 1.79x 1.58x 1.57x 1.59x 1.55x
1024B 1.78x 1.72x 1.17x 1.75x 1.80x 1.80x 1.63x 1.62x 1.65x 1.62x
8192B 1.76x 1.73x 1.17x 1.78x 1.80x 1.81x 1.64x 1.62x 1.68x 1.64x
camellia-asm vs aes-asm (8kB block):
128bit 256bit
ecb-enc 1.17x 1.21x
ecb-dec 1.17x 1.20x
cbc-enc 0.80x 0.82x
cbc-dec 1.22x 1.24x
ctr-enc 1.25x 1.26x
ctr-dec 1.25x 1.26x
lrw-enc 1.14x 1.18x
lrw-dec 1.13x 1.17x
xts-enc 1.14x 1.18x
xts-dec 1.14x 1.17x
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi >
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au >
2012-03-14 17:25:56 +08:00
Jussi Kivilinna
8940426489
crypto: twofish-x86_64/i586 - set alignmask to zero
...
x86 has fast unaligned accesses, so twofish-x86_64/i586 does not need to enforce
alignment.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi >
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au >
2012-02-25 17:20:24 +08:00
Jussi Kivilinna
919e2c3249
crypto: blowfish-x86_64 - set alignmask to zero
...
x86 has fast unaligned accesses, so blowfish-x86_64 does not need to enforce
alignment.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi >
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au >
2012-02-25 17:20:24 +08:00
Jussi Kivilinna
435d3e51af
crypto: serpent-sse2 - combine ablk_*_init functions
...
Driver name in ablk_*_init functions can be constructed runtime. Therefore
use single function ablk_init to reduce object size.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi >
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au >
2012-02-25 17:20:23 +08:00
Jussi Kivilinna
d433208cfc
crypto: blowfish-x86_64 - use crypto_[un]register_algs
...
Combine all crypto_alg to be registered and use new crypto_[un]register_algs
functions. Simplifies init/exit code and reduce object size.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi >
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au >
2012-02-25 17:20:23 +08:00
Jussi Kivilinna
53709ddee3
crypto: twofish-x86_64-3way - use crypto_[un]register_algs
...
Combine all crypto_alg to be registered and use new crypto_[un]register_algs
functions. Simplifies init/exit code and reduce object size.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi >
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au >
2012-02-25 17:20:22 +08:00
Jussi Kivilinna
35474c3bb7
crypto: serpent-sse2 - use crypto_[un]register_algs
...
Combine all crypto_alg to be registered and use new crypto_[un]register_algs
functions. Simplifies init/exit code and reduce object size.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi >
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au >
2012-02-25 17:20:22 +08:00
Jesper Juhl
6e77fe8c11
crypto: serpent-sse2 - remove dead code from serpent_sse2_glue.c::serpent_sse2_init()
...
We cannot reach the line after 'return err'. Remove it.
Signed-off-by: Jesper Juhl <jj@chaosbits.net >
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au >
2012-02-14 16:34:19 +08:00
Jesper Juhl
8d21190e22
crypto: twofish-x86 - Remove dead code from twofish_glue_3way.c::init()
...
We can never reach the line just after the 'return 0'
statement. Remove it.
Signed-off-by: Jesper Juhl <jj@chaosbits.net >
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au >
2012-02-14 16:34:18 +08:00
Andi Kleen
3bd391f056
crypto: Add support for x86 cpuid auto loading for x86 crypto drivers
...
Add support for auto-loading of crypto drivers based on cpuid features.
This enables auto-loading of the VIA and Intel specific drivers
for AES, hashing and CRCs.
Requires the earlier infrastructure patch to add x86 modinfo.
I kept it all in a single patch for now.
I dropped the printks when the driver cpuid doesn't match (imho
drivers never should print anything in such a case)
One drawback is that udev doesn't know if the drivers are used or not,
so they will be unconditionally loaded at boot up. That's better
than not loading them at all, like it often happens.
Cc: Dave Jones <davej@redhat.com >
Cc: Kay Sievers <kay.sievers@vrfy.org >
Cc: Jen Axboe <axboe@kernel.dk >
Cc: Herbert Xu <herbert@gondor.apana.org.au >
Cc: Huang Ying <ying.huang@intel.com >
Signed-off-by: Andi Kleen <ak@linux.intel.com >
Signed-off-by: Thomas Renninger <trenn@suse.de >
Acked-by: H. Peter Anvin <hpa@zytor.com >
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de >
2012-01-26 16:48:10 -08:00
Jussi Kivilinna
847cb7ef56
crypto: serpent-sse2 - change transpose_4x4 to only use integer instructions
...
Matrix transpose macro in serpent-sse2 uses mix of SSE2 integer and SSE floating
point instructions, which might cause performance penality on some CPUs.
This patch replaces transpose_4x4 macro with version that uses only SSE2
integer instructions.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi >
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au >
2012-01-13 16:38:40 +11:00
Jussi Kivilinna
4c58464b80
crypto: blowfish-x86_64 - blacklist Pentium 4
...
Implementation in blowfish-x86_64 uses 64bit rotations which are slow on P4,
making blowfish-x86_64 slower than generic C implementation. Therefore
blacklist P4.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi >
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au >
2012-01-13 16:38:39 +11:00
Jussi Kivilinna
a522ee85ba
crypto: twofish-x86_64-3way - blacklist pentium4 and atom
...
Performance of twofish-x86_64-3way on Intel Pentium 4 and Atom is lower than
of twofish-x86_64 module. So blacklist these CPUs.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi >
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au >
2012-01-13 16:38:39 +11:00
Jussi Kivilinna
7ba8babf84
crypto: serpent-sse2 - remove unneeded LRW/XTS #ifdefs
...
Since LRW & XTS are selected by serpent-sse2, we don't need these #ifdefs
anymore.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi >
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au >
2011-12-20 15:20:08 +08:00
Jussi Kivilinna
88715b9ade
crypto: twofish-x86_64-3way - remove unneeded LRW/XTS #ifdefs
...
Since LRW & XTS are selected by twofish-x86_64-3way, we don't need these
#ifdefs anymore.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi >
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au >
2011-12-20 15:20:07 +08:00
Jussi Kivilinna
d356433856
crypto: serpent-sse2 - clear CRYPTO_TFM_REQ_MAY_SLEEP in lrw and xts modes
...
LRW/XTS patches for serpent-sse2 forgot to add this. CRYPTO_TFM_REQ_MAY_SLEEP
should be cleared as sleeping between kernel_fpu_begin()/kernel_fpu_end() is
not allowed.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi >
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au >
2011-11-21 16:13:25 +08:00
Jussi Kivilinna
5962f8b66d
crypto: serpent-sse2 - add xts support
...
Patch adds XTS support for serpent-sse2 by using xts_crypt(). Patch has been
tested with tcrypt and automated filesystem tests.
Tcrypt benchmarks results (serpent-sse2/serpent_generic speed ratios):
Intel Celeron T1600 (x86_64) (fam:6, model:15, step:13):
size xts-enc xts-dec
16B 0.98x 1.00x
64B 1.00x 1.01x
256B 2.78x 2.75x
1024B 3.30x 3.26x
8192B 3.39x 3.30x
AMD Phenom II 1055T (x86_64) (fam:16, model:10):
size xts-enc xts-dec
16B 1.05x 1.02x
64B 1.04x 1.03x
256B 2.10x 2.05x
1024B 2.34x 2.35x
8192B 2.34x 2.40x
Intel Atom N270 (i586):
size xts-enc xts-dec
16B 0.95x 0.96x
64B 1.53x 1.50x
256B 1.72x 1.75x
1024B 1.88x 1.87x
8192B 1.86x 1.83x
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi >
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au >
2011-11-21 16:13:24 +08:00
Jussi Kivilinna
18482053f9
crypto: serpent-sse2 - add lrw support
...
Patch adds LRW support for serpent-sse2 by using lrw_crypt(). Patch has been
tested with tcrypt and automated filesystem tests.
Tcrypt benchmarks results (serpent-sse2/serpent_generic speed ratios):
Benchmark results with tcrypt:
Intel Celeron T1600 (x86_64) (fam:6, model:15, step:13):
size lrw-enc lrw-dec
16B 1.00x 0.96x
64B 1.01x 1.01x
256B 3.01x 2.97x
1024B 3.39x 3.33x
8192B 3.35x 3.33x
AMD Phenom II 1055T (x86_64) (fam:16, model:10):
size lrw-enc lrw-dec
16B 0.98x 1.03x
64B 1.01x 1.04x
256B 2.10x 2.14x
1024B 2.28x 2.33x
8192B 2.30x 2.33x
Intel Atom N270 (i586):
size lrw-enc lrw-dec
16B 0.97x 0.97x
64B 1.47x 1.50x
256B 1.72x 1.69x
1024B 1.88x 1.81x
8192B 1.84x 1.79x
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi >
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au >
2011-11-21 16:13:24 +08:00
Jussi Kivilinna
251496dbfc
crypto: serpent - add 4-way parallel i586/SSE2 assembler implementation
...
Patch adds i586/SSE2 assembler implementation of serpent cipher. Assembler
functions crypt data in four block chunks.
Patch has been tested with tcrypt and automated filesystem tests.
Tcrypt benchmarks results (serpent-sse2/serpent_generic speed ratios):
Intel Atom N270:
size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec
16 0.95x 1.12x 1.02x 1.07x 0.97x 0.98x
64 1.73x 1.82x 1.08x 1.82x 1.72x 1.73x
256 2.08x 2.00x 1.04x 2.07x 1.99x 2.01x
1024 2.28x 2.18x 1.05x 2.23x 2.17x 2.20x
8192 2.28x 2.13x 1.05x 2.23x 2.18x 2.20x
Full output:
http://koti.mbnet.fi/axh/kernel/crypto/atom-n270/serpent-generic.txt
http://koti.mbnet.fi/axh/kernel/crypto/atom-n270/serpent-sse2.txt
Userspace test results:
Encryption/decryption of sse2-i586 vs generic on Intel Atom N270:
encrypt: 2.35x
decrypt: 2.54x
Encryption/decryption of sse2-i586 vs generic on AMD Phenom II:
encrypt: 1.82x
decrypt: 2.51x
Encryption/decryption of sse2-i586 vs generic on Intel Xeon E7330:
encrypt: 2.99x
decrypt: 3.48x
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi >
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au >
2011-11-21 16:13:23 +08:00