Package Details: openblas-lapack 0.3.0-1

Git Clone URL: https://aur.archlinux.org/openblas-lapack.git (read-only)
Package Base: openblas-lapack
Description: Optimized BLAS library based on GotoBLAS2 1.13 BSD (providing blas, lapack, and cblas)
Upstream URL: http://www.openblas.net/
Licenses: BSD
Conflicts: blas, cblas, lapack, lapacke, openblas
Provides: blas=3.8.0, cblas=3.8.0, lapack=3.8.0, lapacke=3.8.0, openblas
Submitter: sftrytry
Maintainer: eolianoe
Last Packager: eolianoe
Votes: 75
Popularity: 4.458204
First Submitted: 2013-11-20 23:53
Last Updated: 2018-06-04 19:20

Latest Comments

jerry73204 commented on 2018-05-10 17:41

The gcc-libs dependency changed the version of the shared library libgfortran from libgfortran.so.4 to libgfortran.so.5. I suggest increasing the pkgrel to force a rebuild of this package.
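One way to confirm that a rebuild is needed is to look for unresolved shared-library dependencies with ldd. The sketch below is a minimal check; the helper name has_missing_libs and the library path in the usage comment are assumptions for illustration, not taken from the package.

```shell
# Hypothetical helper: succeeds (exit 0) if ldd reports any unresolved
# dependency for the given binary or shared library.
has_missing_libs() {
    ldd "$1" 2>/dev/null | grep -q 'not found'
}

# Example usage (the path is an assumption; adjust it to the installed library):
#   if has_missing_libs /usr/lib/libopenblas.so; then
#       echo "rebuild needed: a dependency such as libgfortran is unresolved"
#   fi
```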

eolianoe commented on 2018-03-19 21:06

@asitdepends: why do you need static libs? If you want them, you just need to add staticlibs to the options.
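In a PKGBUILD, that suggestion amounts to listing staticlibs in the options array, which tells makepkg to keep static library (.a) files in the package. A minimal fragment is sketched below. Any other entries this package's options array may already have are not shown in the comments, so they are omitted here, and the sketch assumes the build itself still produces the .a files (for example, that NO_STATIC=1 is not passed to make).

```shell
# PKGBUILD fragment (Bash syntax). Merge with any options the PKGBUILD already sets.
options=('staticlibs')
```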

asitdepends commented on 2018-03-11 14:04

Could you update the package to also build the static libraries?

MaartenBaert commented on 2017-10-08 20:05

I know, OMP_NUM_THREADS has the same effect. But I am getting slightly better performance when I compile without multithreading. Probably because it reduces overhead.

adfjjv commented on 2017-10-08 09:27

@MaartenBaert If you're using OpenBLAS in a multi-threaded application, the simplest thing is to disable its threading by setting the environment variable OPENBLAS_NUM_THREADS to 1. No need to recompile.

See FAQ: https://github.com/xianyi/OpenBLAS/wiki/faq#multi-threaded
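As a minimal sketch of that suggestion, the variable can be set for a single command rather than for the whole session. The wrapper name and the program in the usage comment are hypothetical.

```shell
# Hypothetical wrapper: run any command with OpenBLAS limited to one thread.
run_with_single_blas_thread() {
    env OPENBLAS_NUM_THREADS=1 "$@"
}

# Example usage (./my_solver is a placeholder for your own program):
#   run_with_single_blas_thread ./my_solver input.mtx
```

Setting the variable per command keeps other programs on the system free to use OpenBLAS threading.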

MaartenBaert commented on 2017-10-07 03:15

Actually it seems to be more complicated than this. I managed to fix the problem this way on another machine (running CentOS 6), but the same change doesn't work on this machine.

CHOLMOD always launches 3 threads even with OMP_NUM_THREADS=1. On my Arch machine, OpenBLAS launches 3 extra threads and the performance crashes. On the CentOS 6 machine however, OpenBLAS launches 4 extra threads and the performance is only slightly degraded. Both machines have 4 physical cores. Looks like I still don't fully understand it.

So far I'm getting the best performance with multithreading completely disabled, i.e. USE_OPENMP=0 USE_THREAD=0. This still gives me multithreading in CHOLMOD but it solves the problem, and apparently also reduces overhead slightly.

EDIT: After recompiling yet another time with USE_OPENMP=1 (USE_THREAD not defined), the problem is solved on the Arch machine as well. The behaviour is now the same as on the CentOS 6 machine: 3 threads from CHOLMOD, 4 extra threads from OpenBLAS. Performance is still better with multithreading completely disabled, but at least it is usable now.
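The build configurations compared in these comments differ only in the USE_OPENMP and USE_THREAD variables passed to OpenBLAS's make. The sketch below collects them in one place. The helper and its variant names (serial, openmp, pthreads) are hypothetical, and any other make variables the PKGBUILD passes are not shown.

```shell
# Hypothetical helper: print the threading-related make variables for a variant.
openblas_make_flags() {
    case "$1" in
        serial)   echo "USE_OPENMP=0 USE_THREAD=0" ;;  # multithreading fully disabled
        openmp)   echo "USE_OPENMP=1" ;;               # OpenMP threading, USE_THREAD left undefined
        pthreads) echo "USE_OPENMP=0 USE_THREAD=1" ;;  # non-OpenMP threading
        *) echo "unknown variant: $1" >&2; return 1 ;;
    esac
}

# Example usage inside a build step (word splitting of the output is intended):
#   make $(openblas_make_flags openmp)
```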

MaartenBaert commented on 2017-10-07 02:29

I found the problem. The issue here is USE_OPENMP=0 USE_THREAD=1. CHOLMOD uses OpenMP, and this is creating a conflict with the non-OpenMP-based multithreading in this OpenBLAS package.

My CPU has 4 physical cores, so OpenMP would normally create 3 worker threads to occupy the remaining cores. Instead I'm seeing 6 worker threads: 3 with 100% CPU usage (from OpenMP) and 3 with 25% CPU usage (from CHOLMOD). Breaking with a debugger attached at random times suggests that these workers are mostly just busy-waiting, wasting CPU time without doing real computation. The OpenMP threads use spinlocks, which is a terrible idea when you have more threads than physical cores.

I tried testing again with OMP_WAIT_POLICY=PASSIVE (which disables the spinlocks); this results in performance comparable to the OpenMP-enabled package. OMP_NUM_THREADS=1 (which eliminates 3 worker threads) has a similar effect.

Conclusion: if the parent application/library uses OpenMP, OpenBLAS should also be compiled with OpenMP support or the performance will be terrible.
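For anyone wanting to reproduce this kind of diagnosis, the sketch below shows one way to count the threads of a running process and to rerun a program with passive OpenMP waiting. Both helper names are hypothetical, the thread count relies on the Linux /proc filesystem, and the program in the usage comment is a placeholder.

```shell
# Hypothetical helper: number of threads of a running process (Linux only).
# Each entry under /proc/<pid>/task corresponds to one thread.
count_threads() {
    ls "/proc/$1/task" | wc -l
}

# Hypothetical wrapper: run a command with idle OpenMP workers sleeping
# instead of spinning.
run_with_passive_omp() {
    env OMP_WAIT_POLICY=PASSIVE "$@"
}

# Example usage (./my_solver is a placeholder for your own program):
#   run_with_passive_omp ./my_solver input.mtx &
#   count_threads "$!"
```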

adfjjv commented on 2017-10-05 17:12

@MaartenBaert What is your hardware? Did you compile the package on a different machine? The PKGBUILD looks like it doesn't create a hardware-agnostic package. I think it would need at a minimum DYNAMIC_ARCH=1.

MaartenBaert commented on 2017-10-02 02:45

I was comparing the performance of various BLAS/LAPACK packages (through CHOLMOD from SuiteSparse), and was surprised to find that this package is ~40 times slower than the 'openblas' package without LAPACK:

blas + lapack: 10015ms
openblas + lapack: 3721ms
openblas-lapack: 161417ms
atlas-lapack: 4232ms

Does anyone know what could be causing this?
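The "~40 times" figure can be checked directly from the timings above by dividing each result by the 'openblas + lapack' time. The helper name below is hypothetical; the numbers are copied from the list.

```shell
# Timing of the reference configuration (openblas + lapack), in ms.
baseline=3721

# Hypothetical helper: print how many times slower a timing is than the baseline.
slowdown() {
    LC_ALL=C awk -v t="$1" -v b="$baseline" 'BEGIN { printf "%.1f\n", t / b }'
}

slowdown 161417   # openblas-lapack, prints 43.4
```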

eolianoe commented on 2017-07-14 14:42

@solnce: it builds fine in an up-to-date clean chroot. OpenMP is not enabled in this PKGBUILD, so I do not understand why there are references to OpenMP routines.
