New version is out
ATLAS 3.9.12 released 08/06/09, changes from 3.9.11
   * Complete rewrite of GER, SYR and SYR2:
      - New tuning mechanism tunes GER for in-L1, in-L2, and out-of-cache
         * Call ATL_<pre>ger_L1 if data known to be in L1 cache
         * Call ATL_<pre>ger_L2 if data known to be in L2 cache
      - Most architectures now lack GER arch defs
         * Provided GER archdefs for 64-bit K10h and Core2
      - atlas_devel not yet updated
   * Relatively untested standard timing/tester code available for all tuned kernels (GER fairly well tested)
      - atlas_[mv,r1,mm]parse.h reads standard input/output files
      - atlas_[mv,r1,mm]testtime.h provides tester/timer calls for kernels
   * Can compile both lapack 3.2 and 3.1 with --with-netlib-lapack-tarfile
      - Removed support for other ways of building lapack
      - atlas_install mostly updated
   * Bug fixes
      - Fixed BETA=0 SCAL NaN-propagation bug (no more call to ATL_set)
      - Fixed C/Z GEMM JITcp bug where C was read when BETA=0
      - Fixed threaded LAPACK calling serial ilaenv (QR speedup)
ATLAS 3.9.11 released 04/07/09, changes from 3.9.10
   * Added flags -Si [omp,antthr] 0/1/2 to make it easy to build ATLAS with alternative threading implementations
   * Fixed prototypes in atlas_f77wrap.h so that all thread interfaces are properly prototyped when they are selected by the above flags
   * Fixed missing TRMM prototype in atlas_tlvl3.h that caused STRSM to fail tests in xsl3blastst_pt
ATLAS 3.9.10 released 03/11/09, changes from 3.9.9
   * Rewrote tgemm's combine routine to work on arbitrary partitionings combined in arbitrary orders (necessary for non-power-of-2 processors)
      - Restricted fix for SYRK (not general, as it isn't needed yet)
   * Fixed bug in EnforceNonPwr2LO caused by failure to rename moved structure in the Cinfp array
   * Fixed makefile problem that caused ATLAS to re-archive the L3BLAS for every tester compile
   * On windows, added -lkernel32 to LIBS macro to enable shared lib build
ATLAS 3.9.9 released 02/26/09, changes from 3.9.8
   * Fixed bug in Xtsyrk's ATL_tsyrkdecomp_K, both in deciding when the algorithm is used, and in correctness when K is not large enough to give all processors NB of work.
   * Fixed bug in lanbtst, where single precision (S/C) used double values rather than single values when determining workspace requirements
   * Changed atlas_install to have a final library build phase
      - Was not rebuilding lib after post-build tuning
         -> Caused lapack and poss other files to be untuned unless user rebuilds by invoking tester/timer for each subpiece
         -> Caused dynamic libs to be built from badly tuned libs
   * Added missing lapack arch defs for Corei764 and MIPSICE9
ATLAS 3.9.8 released 02/23/09, changes from 3.9.7
   * Fixed bug in ATL_Xtgemm where ATL_thrdecompMM failed to return the number of processors on non-power-of-2 processor systems
   * Fixed bug in ATL_tsyrk where I was calling the K-splitting routine when the required workspace was large, rather than when it was small.
   * Fixed the problem in ATL_tsyrk analogous to the one 3.9.7 fixed in ATL_tgemm; however, the tsyrk bug could not have been exercised by the current decomposition.
   * Introduced some fixes & workarounds for SiCortex/MIPSICE9:
      - Changed default MIPSICE9 compiler back to gcc, since pathcc produces bad ATL_tsyrk when optimization is above -O1 (confirmed compiler error)
   * Added dependence on atlas_ptalias3.h in cblas interface Makefile.
ATLAS 3.9.7 released 02/20/09, changes from 3.9.6:
   * Fixed bug in ATL_tgemm that caused seg faults for some small-M tGEMMs
   * Added architectural defaults for K7323DNow (Athlon "classic")
ATLAS 3.9.6 released 02/01/09, Changes from 3.9.5:
   * Made it so LAPACK is tuned specifically for threading as well as for serial
      - Added threaded lapack arch defs for:
         + Core264SSE3, P4E64SSE3, Corei764SSE3
   * Made it so LAPACK NB-tuning is mu/nu aware
   * MIPSICE9 (sicortex) improvements:
      - added pathcc arch defs
      - updated gcc arch defs to better values
         --> Still getting errors on this platform
   * Some bug fixes:
      - Detect model 29 as Core2
      - Rewrote ptFlushAreasByCL to use new thread framework
      - Fixed handling of non-power-of-2 number of threads
      - Better dependencies for building ilaenv
ATLAS 3.9.5 released 12/11/08, Changes from 3.9.4:
   * Complete rewrite of ATLAS threading system:
      - Now supports native windows threads in addition to pthreads
      - Use of master-last and affinity increases threaded performance, with an advantage that grows with P (almost no advantage for P=2, but for instance LU is more than 60% faster asymptotically on a P=8 Core2)
         + OS X and FreeBSD don't support processor affinity, and so their performance is still bad
      - Cacheedge specifically tuned for threading (another 5%)
   * Changed emit_buildinfo so that it replaces all control characters with spaces (prevents errors under windows).
   * Added dependency info for ATL_ilaenv so that it is recompiled once lapack tuning is complete
   * Fixed error in configure where it issues commands in wrong directory when the user builds lapack directly from a tarfile
   * Fixed typos in config.c where I used 'comp' rather than 'comps'.
   * Added mmtime_pt.c, which can allow us to find kernels that do well in parallel operation.
   * Various small configure fixes for windows
ATLAS 3.9.4 released 09/06/08, Changes from 3.9.3:
   * Improved Windows/cygwin configure with addition of archinfo_win.c
   * Added basic support for Windows/interix
      - Did not pursue much due to widespread seg fault in gcc, hundreds of hard-to-get "hot fixes", and ancient gnu tools that can't assemble SSE3
   * Removed special "no-need-to-copy" cases from ATLmm_JIK/IJK.c, since they occasionally seem to cause large performance drops.
   * Changed it so JIK matmul is always called for rank-K update, in order to reduce access costs on C.
   * Fixed several errors in ATLAS's ILAENV.
   * Fixed several errors in configure
   * Fixed error when -Ss lasrc is given as relative rather than absolute path
   * Added BETA support for auto-building shared/dynamic libraries when the user passes --shared to configure (no need to explicitly set compiler flags [eg., -fPIC] for any of the known compilers):
      - Not fully tested, but appears to work for Windows, OS X and Linux
      - Now referenced in make install, but present process is crude
      - with --nof77, get clapack rather than lapack; eventually probably want a logical link of lapack
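The 3.9.12 fix for C/Z GEMM reading C when BETA=0 rests on a standard BLAS rule: when beta is zero, C need not be set on input, so an implementation must overwrite it rather than scale it. Scaling would turn any NaN or Inf left in an uninitialized C into NaN, since 0.0 * NaN and 0.0 * Inf are both NaN. Below is a minimal sketch of that C-scaling step, shown in real double precision for brevity. The helper name `scale_c` and its column-major signature are hypothetical, not ATLAS's actual code.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not part of ATLAS): apply the "beta*C" part of
 * C := alpha*op(A)*op(B) + beta*C to an m-by-n column-major matrix C
 * with leading dimension ldc (ldc >= m).
 *
 * Reference BLAS semantics: when beta == 0, C need not be set on input,
 * so its old contents are never read. Multiplying instead would turn any
 * NaN or Inf already in C into NaN, because 0.0 * NaN and 0.0 * Inf are NaN.
 */
void scale_c(size_t m, size_t n, double beta, double *c, size_t ldc)
{
    if (beta == 1.0)
        return;                          /* C is left exactly as it is */
    for (size_t j = 0; j < n; j++) {
        double *col = c + j * ldc;       /* start of column j */
        if (beta == 0.0) {
            for (size_t i = 0; i < m; i++)
                col[i] = 0.0;            /* overwrite: C is write-only here */
        } else {
            for (size_t i = 0; i < m; i++)
                col[i] *= beta;
        }
    }
}
```

A GEMM structured this way then accumulates alpha*op(A)*op(B) into the freshly zeroed C, so garbage in an uninitialized output buffer cannot leak into the result. Entries below row m in each column (the padding when ldc > m) are not touched.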
*** Bug 277199 has been marked as a duplicate of this bug. ***
(In reply to comment #0)
> New version is out

The version is now 3.9.21, and it's becoming urgent for me, as I have run into things that were broken in 3.9.3 that are supposedly fixed. Unfortunately the size of the shared library patch is pretty daunting, though I may give it a try anyway.
(In reply to comment #3)
> (In reply to comment #0)
> > New version is out
> >
>
> The version is now 3.9.21, and it's becoming urgent for me, as I have run into
> things that were broken in 3.9.3 that are supposedly fixed.
>

I'll try to have a look at it soon, but past experience has shown that it may take a while before all bits are working properly ;)

cheers,
Markus
Just FYI. I am mostly done with this. Unfortunately, there's a bug in ATLAS's build system that keeps it from building the threaded version of ILAENV, which is new in 3.9.21. This might take me some time to track down. cheers, Markus
Thanks! I will wait for that.
Thanks for looking at this, Markus. I did get far enough in the installation to notice that the shared-library stuff seems to all be incorporated now. The bug in the old code that is bothering me is failure of some numpy SVD routines which the newer ATLAS supposedly has fixed.
I've added 3.9.21 versions for blas/lapack-atlas to the tree and both seem to be working (at least for the tests I've tried). @Joel: I hope these fix your issues. cheers, Markus