Bug 567938 - dev-python/numpy-1.10.1-r1 Ebuild Relies on Removed Feature?
Summary: dev-python/numpy-1.10.1-r1 Ebuild Relies on Removed Feature?
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Library
Hardware: All Linux
Importance: Normal normal
Assignee: Gentoo Science Related Packages
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-12-10 18:35 UTC by Mark Fenner
Modified: 2015-12-18 17:44 UTC
CC List: 3 users

See Also:
Package list:
Runtime testing required: ---


Attachments
Output of 'emerge -DuqN world' (error,10.65 KB, text/plain)
2015-12-16 04:23 UTC, Sebastian Pucilowski
Output of 'emerge --info' (file_567938.txt,4.71 KB, text/plain)
2015-12-16 04:24 UTC, Sebastian Pucilowski

Description Mark Fenner 2015-12-10 18:35:32 UTC
It appears that the 1.10.1-r1 ebuild is mostly a version bump from 1.9.x.  

Line 77 says "#make sure _dotblas.so gets built".  Unfortunately, _dotblas.so was dropped in NumPy 1.10 (see Dropped Support here http://docs.scipy.org/doc/numpy-1.10.1/release.html#numpy-1-10-0-release-notes).  At minimum, this needs to be cleaned up -- but there may be other issues.
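A build-time check such as the one on line 77 only makes sense for NumPy releases that still ship `_dotblas`. A minimal sketch of that version gate, written in Python for illustration; the helpers `parse_version` and `expects_dotblas` are hypothetical and not part of the ebuild. Comparing integer tuples avoids the string-comparison trap where `"1.10" < "1.9"` is true:

```python
# Hypothetical helpers: should a given NumPy version still build _dotblas?
# Per the NumPy 1.10.0 release notes, _dotblas was dropped in 1.10.


def parse_version(version):
    """Turn '1.10.1' or a Gentoo-style '1.10.1-r1' into (1, 10, 1)."""
    base = version.split("-")[0]  # drop a Gentoo revision such as -r1
    parts = []
    for piece in base.split("."):
        digits = ""
        for ch in piece:  # keep only the leading digits of each component
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def expects_dotblas(version):
    """True if this NumPy version predates the removal of _dotblas."""
    return parse_version(version) < (1, 10)


if __name__ == "__main__":
    for v in ("1.9.2", "1.10.1-r1", "1.10.2"):
        print(v, expects_dotblas(v))
```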

On my system, when building against OpenBLAS (from the science overlay) for BLAS support, the 1.10.1-r1 build results in an unoptimized (very slow) np.dot(). On the same system, the 1.9.2 ebuild gives an optimized (fast) np.dot() call. I'm not sure whether this is a side effect of the site.cfg hack I reference above, or whether there are other differences in the NumPy build process.

Admittedly, OpenBLAS comes from the science overlay (not the main tree), but the regression in functionality may point to other issues (namely, problems with other BLAS/LAPACK libraries).
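One way to see which BLAS a NumPy build actually pulled in at run time (as opposed to what `numpy.show_config()` reports from build time) is to inspect the shared libraries mapped into the Python process. A Linux-only sketch, assuming `/proc` is mounted; the list of name fragments treated as "BLAS-like" and the helper name `blas_libs_from_maps` are assumptions:

```python
# Linux-only sketch: which BLAS-like shared libraries are mapped into this
# Python process? The name fragments below are an assumption, not a full list.
import os

BLAS_HINTS = ("blas", "lapack", "atlas", "mkl")


def blas_libs_from_maps(maps_text):
    """Return sorted, de-duplicated basenames of BLAS-like libraries
    found in the text of a /proc/<pid>/maps file."""
    found = set()
    for line in maps_text.splitlines():
        fields = line.split()
        # Anonymous mappings have no pathname; file-backed ones carry it
        # as the sixth whitespace-separated field.
        if len(fields) < 6:
            continue
        name = os.path.basename(fields[5])
        lowered = name.lower()
        if ".so" in lowered and any(hint in lowered for hint in BLAS_HINTS):
            found.add(name)
    return sorted(found)


if __name__ == "__main__":
    try:
        import numpy  # noqa: F401  (importing loads NumPy's compiled extensions)
    except ImportError:
        print("NumPy is not importable in this interpreter")
    else:
        if os.path.exists("/proc/self/maps"):
            with open("/proc/self/maps") as fh:
                print(blas_libs_from_maps(fh.read()))
```

An empty list after importing NumPy would be consistent with the fallback (non-BLAS) `np.dot` described above.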
Comment 1 Justin Lecher (RETIRED) gentoo-dev 2015-12-13 09:02:48 UTC
(In reply to Mark Fenner from comment #0)
> It appears that the 1.10.1-r1 ebuild is mostly a version bump from 1.9.x.  
> 
> Line 77 says "#make sure _dotblas.so gets built".  Unfortunately,
> _dotblas.so was dropped in NumPy 1.10 (see Dropped Support here
> http://docs.scipy.org/doc/numpy-1.10.1/release.html#numpy-1-10-0-release-
> notes).  At minimum, this needs to be cleaned up -- but there may be other
> issues.
> 

Thanks for pointing this out. I will take a closer look into this.

> On my system, when building against OpenBlas (science overlay) for blas
> support, the 1.10.1-r1 build results in an unoptimized (very slow) np.dot().
> Same system, with the 1.9.2 ebuild gives an optimized (fast) np.dot() call. 
> I'm not sure if this is a side-effect of the site.cfg hack I reference
> above, or if there are other differences in the numpy build process.

numpy-1.10 has a huge performance problem, which upstream has already worked on; it is independent of our ebuilds. But could you please provide some code that reproduces the speed problem you are describing? I will look into a workaround.
Comment 2 Justin Lecher (RETIRED) gentoo-dev 2015-12-13 14:06:14 UTC
commit 4c00600c36fd20bd7f64a2a0ca4487c6773f15ca
Author: Justin Lecher <jlec@gentoo.org>
Date:   Sun Dec 13 13:46:51 2015 +0100

    dev-python/numpy: Drop obsolete dotblas handling

    Gentoo-Bug: https://bugs.gentoo.org/show_bug.cgi?id=567938

    Package-Manager: portage-2.2.26
    Signed-off-by: Justin Lecher <jlec@gentoo.org>

    https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=4c00600c36fd20bd7f64a2a0ca4487c6773f15ca
Comment 3 Mark Fenner 2015-12-14 14:46:21 UTC
Justin,

The performance difference is the difference between (1) fast: using BLAS dgemm under the hood, and (2) slow: using NumPy's fallback dot product. Run the same code on a good build and on a bad build and you'll see run times that differ by orders of magnitude.

When I said "optimized", I should have said "uses BLAS routine" to be more specific.

For example, in an IPython notebook (for easy access to %timeit):

```python
import numpy as np

# (2**16, 2**10) keeps X.T.dot(X) at 1024 x 1024, matching the later test
X = np.random.uniform(0, 10, (2**16, 2**10))
%timeit X.T.dot(X)
```

This takes a few seconds when the BLAS routine (dgemm) is used, but runs for a very, very long time (I stopped it before completion) without BLAS support.
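For a float64 array `X` of shape `(m, n)`, `X.T.dot(X)` produces an `n x n` result and needs roughly `2*m*n*n` floating-point operations, so which axis is the long one matters a great deal. A small sketch of that arithmetic; the helper name `gram_cost` is hypothetical:

```python
# Back-of-the-envelope cost of X.T.dot(X) for a float64 array X of shape (m, n):
# the result is n x n, and a plain matrix product needs about 2*m*n*n flops.


def gram_cost(m, n, itemsize=8):
    """Return (result_bytes, approx_flops) for X.T.dot(X) with X of shape (m, n)."""
    result_bytes = n * n * itemsize
    flops = 2 * m * n * n
    return result_bytes, flops


if __name__ == "__main__":
    for shape in [(2**16, 2**10), (2**10, 2**16)]:
        nbytes, flops = gram_cost(*shape)
        print(f"{shape}: result {nbytes / 2**20:,.0f} MiB, ~{flops:.2e} flops")
```

For `(2**16, 2**10)` this gives an 8 MiB result and about 1.4e11 flops; for `(2**10, 2**16)` it gives a 32 GiB result and about 8.8e12 flops, far more in both memory and time.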


Also, after digging into it a bit, I discovered that the NumPy build system (for 1.10.1) doesn't seem to honor site.cfg properly in some configurations. I filed a bug regarding LAPACK (https://github.com/numpy/numpy/issues/6810), but I think I had the same issues with OpenBLAS (namely, driving the NumPy build with environment variables produced an np.dot properly linked to dgemm, while using site.cfg produced the fallback np.dot).
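`site.cfg` is an INI-style file that `numpy.distutils` reads at build time, with one section per library (for example `[openblas]`) holding `library_dirs` and `libraries` keys. A minimal sketch of a hypothetical helper that turns a pkg-config style link line into such a section; the section name, paths and flags are assumptions, and this does not reproduce what the ebuild itself writes:

```python
# Hypothetical helper: turn a pkg-config style link line into a NumPy
# site.cfg section. The section name and sample flags are assumptions.
import shlex


def site_cfg_section(section, link_line):
    """Build an INI-style site.cfg section from -L/-l flags."""
    lib_dirs, libs = [], []
    for token in shlex.split(link_line):
        if token.startswith("-L"):
            lib_dirs.append(token[2:])  # -L/usr/lib64 -> /usr/lib64
        elif token.startswith("-l"):
            libs.append(token[2:])  # -lopenblas -> openblas
    lines = ["[%s]" % section]
    if lib_dirs:
        # site.cfg separates directories with os.pathsep; ":" (Linux) is assumed
        lines.append("library_dirs = " + ":".join(lib_dirs))
    if libs:
        lines.append("libraries = " + ", ".join(libs))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # e.g. what `pkg-config --libs <module>` might print on a Gentoo box
    print(site_cfg_section("openblas", "-L/usr/lib64 -lopenblas"))
```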

Best,
Mark
Comment 4 Justin Lecher (RETIRED) gentoo-dev 2015-12-15 15:47:02 UTC
Please test 1.10.2; it works better here.

commit 0280d4a259b2cb4393da0fb377422a667e611d73
Author: Justin Lecher <jlec@gentoo.org>
Date:   Tue Dec 15 16:46:10 2015 +0100

    dev-python/numpy: Version Bump

    fixes performance regressions

    Gentoo-Bug: https://bugs.gentoo.org/show_bug.cgi?id=567938

    Package-Manager: portage-2.2.26
    Signed-off-by: Justin Lecher <jlec@gentoo.org>

    https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=0280d4a259b2cb4393da0fb377422a667e611d73
Comment 5 Mark Fenner 2015-12-15 17:36:28 UTC
1.10.2 hadn't propagated to my sync yet, so I copied it into my local Portage overlay.

I had one problem with the 1.10.2 ebuild and I also noticed one additional possible fix:  

********************  Problem
The problem was with trying to apply a 1.9.2 patch:

/usr/local/portage/dev-python/numpy/files/numpy-1.9.2-no-hardcode-blas.patch

This probably works in my main repo (with the 1.10.1-r1 ebuild) because I have 1.9.2 installed. I worked around it by manually copying the patch file into the numpy/files directory of my local overlay.
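Portage resolves `${FILESDIR}` to the `files/` directory next to the ebuild, so an ebuild copied on its own into an overlay cannot see patches that only exist in the main tree's `files/`. A sketch of a hypothetical checker that lists the `${FILESDIR}` references an ebuild makes that are missing locally; the function name and the simple `${VAR}` expansion are assumptions:

```python
# Hypothetical checker for a local overlay: which ${FILESDIR} files does an
# ebuild reference that are missing from the files/ directory beside it?
import re
from pathlib import Path

FILESDIR_REF = re.compile(r'\$\{FILESDIR\}"?/(\S+)')
VAR_REF = re.compile(r"\$\{(\w+)\}")


def missing_filesdir_refs(ebuild, variables=None):
    """Return sorted names referenced via ${FILESDIR} that do not exist.

    `variables` maps names such as PN to values; unknown ${VAR}s are left
    unexpanded, so they will show up as missing."""
    variables = variables or {}
    ebuild = Path(ebuild)
    files_dir = ebuild.parent / "files"
    wanted = set()
    for raw in FILESDIR_REF.findall(ebuild.read_text()):
        token = raw.strip("\"')")  # drop quoting and a closing paren
        name = VAR_REF.sub(lambda m: variables.get(m.group(1), m.group(0)), token)
        wanted.add(name)
    return sorted(n for n in wanted if not (files_dir / n).is_file())
```

For example, `missing_filesdir_refs("/usr/local/portage/dev-python/numpy/numpy-1.10.2.ebuild", {"PN": "numpy"})` would list the patches that still need copying.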

It seems to have worked (well done and thank you!):

```ipython
In [1]: import numpy as np
In [2]: X = np.random.uniform(0,10,(2**16,2**10))
In [3]: %timeit X = np.random.uniform(0,10,(2**16,2**10))
1 loops, best of 3: 799 ms per loop
```
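Note that the `%timeit` line above times the construction of `X` (`np.random.uniform`) rather than the product itself. A minimal sketch that times `X.T.dot(X)` with the standard `timeit` module, so it also runs outside IPython; the function name `best_gram_time` is hypothetical, and the defaults mirror the `(2**16, 2**10)` shape used above:

```python
# Time X.T.dot(X) (the product, not the construction of X) with the stdlib
# timeit module. Sizes are parameters; defaults mirror the thread's shape.
import timeit

import numpy as np


def best_gram_time(rows=2**16, cols=2**10, repeat=3):
    """Return the best wall-clock time in seconds for one X.T.dot(X)."""
    X = np.random.uniform(0, 10, (rows, cols))
    timer = timeit.Timer(lambda: X.T.dot(X))
    # number=1: each repetition runs the product once; keep the fastest run
    return min(timer.repeat(repeat=repeat, number=1))
```

Calling `best_gram_time()` on a BLAS-linked build and on a fallback build should make the gap described in this thread visible; the actual numbers depend on the machine and the BLAS.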

******************** Potential to fix another issue
The potential fix concerns the docs. The 1.9.x and 1.10.x ebuilds all seem to reference the same 1.9.x docs (with a comment about referencing the 1.8.x docs). At least the 1.10.x series can probably be updated to the 1.10.1 docs (the latest as of today). The following URLs appear to be valid and point to the doc files:
http://docs.scipy.org/doc/numpy-1.10.1/numpy-html-1.10.1.zip
http://docs.scipy.org/doc/numpy-1.10.1/numpy-ref-1.10.1.pdf
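Both URLs follow the same version-based pattern, so deriving them from the version (as an ebuild can do) would let a version bump pick up matching docs automatically. A sketch under the assumption that other releases follow the same layout; the pattern is inferred from these two URLs only, and `numpy_doc_urls` is a hypothetical name:

```python
# Rebuild the two documentation URLs listed above from a version string.
# The pattern is inferred from those two URLs; other releases may differ.
DOC_BASE = "http://docs.scipy.org/doc/numpy-{v}/"


def numpy_doc_urls(version):
    """Return the HTML-archive and reference-PDF URLs for a NumPy version."""
    base = DOC_BASE.format(v=version)
    return {
        "html": base + "numpy-html-%s.zip" % version,
        "pdf": base + "numpy-ref-%s.pdf" % version,
    }


if __name__ == "__main__":
    for url in numpy_doc_urls("1.10.1").values():
        print(url)
```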
Comment 6 Sebastian Pucilowski 2015-12-16 04:23:27 UTC
Created attachment 419306
Output of 'emerge -DuqN world'
Comment 7 Sebastian Pucilowski 2015-12-16 04:24:17 UTC
Created attachment 419308
Output of 'emerge --info'
Comment 8 Sebastian Pucilowski 2015-12-16 04:25:52 UTC
dev-python/numpy-1.10.2 has propagated to my local mirror. Emerging the update from dev-python/numpy-1.10.1-r1 fails for me: the build appears to be missing the dot functions or BLAS.
Comment 9 Justin Lecher (RETIRED) gentoo-dev 2015-12-16 08:49:46 UTC
commit e5ce90a04e79f6413604e96e4803cb95ada7c859
Author: Justin Lecher <jlec@gentoo.org>
Date:   Wed Dec 16 09:48:45 2015 +0100

    dev-python/numpy: Fix linking to cblas and update docs

    Gentoo-Bug: https://bugs.gentoo.org/show_bug.cgi?id=567938

    Package-Manager: portage-2.2.26
    Signed-off-by: Justin Lecher <jlec@gentoo.org>

    https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=e5ce90a04e79f6413604e96e4803cb95ada7c859
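According to the NumPy 1.10.0 release notes linked in the description, `_dotblas` was removed and CBLAS support moved into the core `multiarray` module, so the build has to find a library that actually exports CBLAS symbols. A quick runtime probe, as a sketch: it asks `ctypes` whether a locatable library exports `cblas_dgemm`. The candidate names `cblas`, `openblas` and `blas` are assumptions about a typical Linux system, and the probe says nothing about what NumPy itself was linked against:

```python
# Sketch: does a shared library that ctypes can locate export cblas_dgemm?
# The candidate library names are assumptions about a typical Linux system.
import ctypes
import ctypes.util

CANDIDATES = ("cblas", "openblas", "blas")


def find_cblas_dgemm(candidates=CANDIDATES):
    """Return (library, True) for the first candidate exporting cblas_dgemm,
    else (None, False)."""
    for name in candidates:
        path = ctypes.util.find_library(name)  # e.g. 'libopenblas.so.0' or None
        if path is None:
            continue
        try:
            lib = ctypes.CDLL(path)
        except OSError:
            continue
        # Attribute lookup on a CDLL raises AttributeError for missing symbols
        if hasattr(lib, "cblas_dgemm"):
            return path, True
    return None, False


if __name__ == "__main__":
    print(find_cblas_dgemm())
```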
Comment 10 Mark Fenner 2015-12-18 17:44:27 UTC
For the record,

I had no problems emerging 1.10.2-r1 from my synced Portage tree. And my `np.dot` was nice and quick :).

This was with openblas (0.2.15::science) and lapack-reference (3.6.0-r1).

Best,
Mark