loop-aes on newer Intel CPUs using the AES-NI instruction set can get an 8x (or more) speedup. As a rule of thumb, Intel i5 / i7 / Xeon CPUs with 32nm manufacturing have support for the AES-NI instruction set. loop-aes builds AES-NI support when the 'INTELAES=y' parameter is passed to make.

I will attach a patch to the current loop-aes-3.6b.ebuild that adds an 'aes-ni' USE flag. Running a loop.ko compiled with INTELAES=y appears to be completely harmless; it works just fine in my limited testing.

Reproducible: Always

Steps to Reproduce:
1. Build normally
2. On a decent CPU you will max out at ~200 MB/s with 100% CPU load, which is not fast enough to keep up with any decent RAID setup.
3. Build with INTELAES=y
4. Get over 1 GB/s encryption, and nearly 2 GB/s decryption. Lower CPU cost for any setup, and keep up with just about any RAID system.

Actual Results:
Here are some tests performed against a brd-based ramdisk (to eliminate disk speeds from the equation):

# losetup -e AES128 /dev/loop0 /dev/ram0
# mke2fs /dev/loop0
# mount /dev/loop0 /mnt/ramdisk/

# Typical write (encrypt) performance
# dd if=/dev/zero of=/mnt/ramdisk/zeros.dd bs=1024k count=4096 conv=fsync iflag=sync oflag=sync,direct
4096+0 records in
4096+0 records out
4294967296 bytes (4.3 GB) copied, 24.074 s, 178 MB/s

# Typical read (decrypt) performance
gkrack ~ # dd if=/mnt/ramdisk/zeros.dd of=/dev/null bs=1024k count=4096 conv=fsync iflag=sync,direct
dd: fsync failed for `/dev/null': Invalid argument
4096+0 records in
4096+0 records out
4294967296 bytes (4.3 GB) copied, 23.3257 s, 184 MB/s

One can get faster decrypts by running multiple parallel threads:

# modprobe loop lo_threads=4
...
# dd if=/mnt/ramdisk/zeros.dd of=/dev/null bs=1024k count=4096 conv=fsync iflag=sync,direct
dd: fsync failed for `/dev/null': Invalid argument
4096+0 records in
4096+0 records out
4294967296 bytes (4.3 GB) copied, 6.4834 s, 662 MB/s

...But only with commensurate CPU cost.
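
For anyone comparing runs: the rate dd prints is simply bytes copied divided by elapsed seconds, in decimal megabytes (1 MB = 10^6 bytes). A minimal sketch that recomputes the rates from the byte counts and times above; the helper name `mbps` is illustrative only and not part of dd or loop-aes.

```shell
#!/bin/sh
# mbps BYTES SECONDS: print throughput in decimal MB/s (1 MB = 1,000,000 bytes).
# Hypothetical helper for cross-checking dd output.
mbps() {
    awk -v b="$1" -v s="$2" 'BEGIN { printf "%.0f MB/s\n", b / s / 1000000 }'
}

mbps 4294967296 24.074    # single-thread write (encrypt)
mbps 4294967296 23.3257   # single-thread read (decrypt)
mbps 4294967296 6.4834    # read with lo_threads=4
```

The three calls should reproduce the 178, 184 and 662 MB/s figures that dd reported.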
Expected Results:
Here are the same tests with a loop-aes compiled with INTELAES=y (enabled using the 'aes-ni' USE flag from my patch)

# ramdisk + loop-aes AES128, hw-accel, one thread
# Write (encrypt)
# dd if=/dev/zero of=/mnt/ramdisk/zeros.dd bs=1024k count=4096 conv=fsync iflag=sync oflag=sync,direct
4096+0 records in
4096+0 records out
4294967296 bytes (4.3 GB) copied, 4.07551 s, 1.1 GB/s

# Read (decrypt)
# dd if=/mnt/ramdisk/zeros.dd of=/dev/null bs=1024k count=4096 conv=fsync iflag=sync,direct oflag=sync
dd: fsync failed for `/dev/null': Invalid argument
4096+0 records in
4096+0 records out
4294967296 bytes (4.3 GB) copied, 3.36718 s, 1.3 GB/s

# ramdisk + loop-aes AES128, hw-accel, 4 threads
# Write performance is unchanged
# Read performance gets a nice boost
# dd if=/mnt/ramdisk/zeros.dd of=/dev/null bs=1024k count=4096 conv=fsync iflag=sync,direct oflag=sync
dd: fsync failed for `/dev/null': Invalid argument
4096+0 records in
4096+0 records out
4294967296 bytes (4.3 GB) copied, 2.4677 s, 1.7 GB/s

You can check for the aes-ni instructions by looking for the 'aes' flag in /proc/cpuinfo:

# egrep ' aes ' /proc/cpuinfo | head -1
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt aes lahf_lm ida arat epb dts tpr_shadow vnmi flexpriority ept vpid
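
The /proc/cpuinfo check can also be wrapped in a small script, for instance to decide whether building with INTELAES=y is worthwhile on a given box. This is only a sketch: the function name `has_aes_flag` and its optional file argument are made up for illustration and are not part of loop-aes or the ebuild. It matches 'aes' as a whole word, so it should also catch the flag when it is the last entry on the line, which the ' aes ' pattern would miss.

```shell
#!/bin/sh
# has_aes_flag [FILE]: succeed if a "flags" line in FILE lists 'aes' as a whole word.
# FILE defaults to /proc/cpuinfo; the argument is a hypothetical convenience for testing.
has_aes_flag() {
    grep -E '^flags[[:space:]]*:' "${1:-/proc/cpuinfo}" 2>/dev/null | grep -qw aes
}

if has_aes_flag; then
    echo "'aes' CPU flag present: INTELAES=y should be usable"
else
    echo "no 'aes' CPU flag found"
fi
```

Note that this only reports what the kernel sees; it says nothing about whether the loop module was actually built with INTELAES=y.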
Created attachment 268779
add aes-ni USE flag to loop-aes
Done in CVS. Thanks for reporting!

08 Jun 2011; Dane Smith <c1pher@gentoo.org> +loop-aes-3.6b-r1.ebuild, metadata.xml:
Revision bump. Add support for AES-NI via new use flag aes-ni. Closes bug 362357.

Closing!