Bug 362357 - add INTELAES=y option to sys-fs/loop-aes for an 8x speedup
Summary: add INTELAES=y option to sys-fs/loop-aes for an 8x speedup
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: New packages
Hardware: AMD64 Linux
Importance: Normal normal
Assignee: Crypto team [DISABLED]
Reported: 2011-04-06 22:23 UTC by Hank Leininger
Modified: 2011-06-08 13:14 UTC

Attachments
add aes-ni USE flag to loop-aes (loop-aes-3.6b-aes-ni.patch,708 bytes, patch)
2011-04-06 22:23 UTC, Hank Leininger

Description Hank Leininger 2011-04-06 22:23:02 UTC
On newer Intel CPUs, loop-aes can get an 8x (or greater) speedup by using the AES-NI instruction set.  As a rule of thumb, Intel i5 / i7 / Xeon CPUs built on the 32nm process support AES-NI.

loop-aes builds AES-NI support when the 'INTELAES=y' parameter is passed to make.  I will attach a patch to the current loop-aes-3.6b.ebuild that adds an 'aes-ni' USE flag.
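
As a rough illustration of how a USE flag could map onto that make parameter, here is a minimal sketch. The helper name loop_aes_make_args and its yes/no argument are hypothetical and are not taken from the attached patch; only INTELAES=y comes from the text above.

```shell
# Hypothetical helper, not the code from the attached patch.
# $1: "yes" when the aes-ni USE flag is enabled, anything else otherwise.
loop_aes_make_args() {
    if [ "$1" = "yes" ]; then
        # The only parameter taken from the report itself.
        printf '%s\n' "INTELAES=y"
    fi
}

# Show the make command line that would result; nothing is built here.
echo make $(loop_aes_make_args yes)
```

With the flag disabled the helper prints nothing, so the make invocation stays exactly as it is today.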

Running a loop.ko compiled with INTELAES=y appears to be completely harmless; it works just fine in my limited testing.

Reproducible: Always

Steps to Reproduce:
1. Build normally

2. On a decent CPU you will max out at ~200 MB/s with 100% CPU load, which is not fast enough to keep up with any decent RAID setup.

3. Build with INTELAES=y

4. Get over 1 GB/s encryption and nearly 2 GB/s decryption.  This lowers the CPU cost for any setup and keeps up with just about any RAID system.

Actual Results:  
Here are some tests performed against a brd-based ramdisk (to eliminate disk speeds from the equation):

# losetup -e AES128 /dev/loop0 /dev/ram0
# mke2fs /dev/loop0
# mount /dev/loop0 /mnt/ramdisk/

# Typical write (encrypt) performance
# dd if=/dev/zero of=/mnt/ramdisk/zeros.dd bs=1024k count=4096 conv=fsync iflag=sync oflag=sync,direct
4096+0 records in
4096+0 records out
4294967296 bytes (4.3 GB) copied, 24.074 s, 178 MB/s

# Typical read (decrypt) performance
# dd if=/mnt/ramdisk/zeros.dd of=/dev/null bs=1024k count=4096 conv=fsync iflag=sync,direct
dd: fsync failed for `/dev/null': Invalid argument
4096+0 records in
4096+0 records out
4294967296 bytes (4.3 GB) copied, 23.3257 s, 184 MB/s

Decryption gets faster when the loop module is loaded with multiple parallel threads:
# modprobe loop lo_threads=4
...
# dd if=/mnt/ramdisk/zeros.dd of=/dev/null bs=1024k count=4096 conv=fsync iflag=sync,direct
dd: fsync failed for `/dev/null': Invalid argument
4096+0 records in
4096+0 records out
4294967296 bytes (4.3 GB) copied, 6.4834 s, 662 MB/s

...but only at a commensurate CPU cost.
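
The MB/s figures dd prints can be cross-checked from the byte count and the elapsed time. This is a small sketch, assuming dd counts 1 MB as 10^6 bytes; the helper name throughput_mbs is made up for illustration.

```shell
# Hypothetical helper: throughput in MB/s, with 1 MB = 10^6 bytes.
# $1: bytes copied, $2: elapsed seconds
throughput_mbs() {
    awk -v bytes="$1" -v secs="$2" \
        'BEGIN { printf "%.0f\n", bytes / secs / 1000000 }'
}

throughput_mbs 4294967296 24.074    # write, 1 thread  -> 178
throughput_mbs 4294967296 23.3257   # read, 1 thread   -> 184
throughput_mbs 4294967296 6.4834    # read, 4 threads  -> 662
```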

Expected Results:  
Here are the same tests with a loop-aes compiled with INTELAES=y (enabled using the 'aes-ni' USE flag from my patch):

# ramdisk + loop-aes AES128, hw-accel, one thread

# Write (encrypt)
# dd if=/dev/zero of=/mnt/ramdisk/zeros.dd bs=1024k count=4096 conv=fsync iflag=sync oflag=sync,direct
4096+0 records in
4096+0 records out
4294967296 bytes (4.3 GB) copied, 4.07551 s, 1.1 GB/s

# Read (decrypt)
# dd if=/mnt/ramdisk/zeros.dd of=/dev/null bs=1024k count=4096 conv=fsync iflag=sync,direct oflag=sync
dd: fsync failed for `/dev/null': Invalid argument
4096+0 records in
4096+0 records out
4294967296 bytes (4.3 GB) copied, 3.36718 s, 1.3 GB/s

# ramdisk + loop-aes AES128, hw-accel, 4 threads
# Write performance is unchanged

# Read performance gets a nice boost
# dd if=/mnt/ramdisk/zeros.dd of=/dev/null bs=1024k count=4096 conv=fsync iflag=sync,direct oflag=sync
dd: fsync failed for `/dev/null': Invalid argument
4096+0 records in
4096+0 records out
4294967296 bytes (4.3 GB) copied, 2.4677 s, 1.7 GB/s
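
For a like-for-like comparison, the speedup of each test is the software-only time divided by the hardware-accelerated time. The sketch below does that arithmetic on the timings reported in this bug; the helper name speedup is hypothetical, and these are single runs on one machine.

```shell
# Hypothetical helper: ratio of baseline time to accelerated time.
# $1: seconds without INTELAES=y, $2: seconds with INTELAES=y
speedup() {
    awk -v base="$1" -v accel="$2" \
        'BEGIN { printf "%.1f\n", base / accel }'
}

speedup 24.074 4.07551    # write, 1 thread  -> 5.9
speedup 23.3257 3.36718   # read, 1 thread   -> 6.9
speedup 6.4834 2.4677     # read, 4 threads  -> 2.6
```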


You can check for AES-NI support by looking for the 'aes' flag in /proc/cpuinfo:

# egrep ' aes ' /proc/cpuinfo | head -1
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt aes lahf_lm ida arat epb dts tpr_shadow vnmi flexpriority ept vpid
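
A pattern like ' aes ' relies on the flag being surrounded by spaces, so it would miss 'aes' if it were the last flag on the line. A slightly more defensive check might look like the sketch below; the function name has_aes_flag and its optional file argument are hypothetical and exist only to make the check easy to try against a sample file.

```shell
# Hypothetical helper: succeeds if the first "flags" line lists "aes".
# $1: cpuinfo-style file to read (defaults to /proc/cpuinfo)
has_aes_flag() {
    cpuinfo=${1:-/proc/cpuinfo}
    awk -F: '
        /^flags/ {
            # Split the flag list on whitespace and compare whole words.
            n = split($2, flag, " ")
            for (i = 1; i <= n; i++)
                if (flag[i] == "aes") found = 1
            exit
        }
        END { exit !found }
    ' "$cpuinfo"
}

if has_aes_flag; then
    echo "AES-NI available"
else
    echo "AES-NI not available"
fi
```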
Comment 1 Hank Leininger 2011-04-06 22:23:51 UTC
Created attachment 268779
add aes-ni USE flag to loop-aes
Comment 2 Dane Smith (RETIRED) gentoo-dev 2011-06-08 13:14:29 UTC
Done in CVS. Thanks for reporting!

  08 Jun 2011; Dane Smith <c1pher@gentoo.org> +loop-aes-3.6b-r1.ebuild,
  metadata.xml:
  Revision bump. Add support for AES-NI via new use flag aes-ni. Closes bug
  362357.

Closing!