Package Details: openblas-lapack 0.2.20-1

Git Clone URL: https://aur.archlinux.org/openblas-lapack.git (read-only)
Package Base: openblas-lapack
Description: Optimized BLAS library based on GotoBLAS2 1.13 BSD (providing blas, lapack, and cblas)
Upstream URL: http://www.openblas.net/
Licenses: BSD
Conflicts: blas, cblas, lapack, lapacke, openblas
Provides: blas=3.7.0, cblas=3.7.0, lapack=3.7.0, lapacke=3.7.0, openblas
Submitter: sftrytry
Maintainer: eolianoe
Last Packager: eolianoe
Votes: 60
Popularity: 2.806535
First Submitted: 2013-11-20 23:53
Last Updated: 2017-07-24 14:29

Latest Comments

MaartenBaert commented on 2017-10-08 20:05

I know; OMP_NUM_THREADS has the same effect. But I am getting slightly better performance when I compile without multithreading, probably because it reduces overhead.

adfjjv commented on 2017-10-08 09:27

@MaartenBaert If you're using OpenBLAS in a multi-threaded application, the simplest thing is to disable threading by setting the environment variable OPENBLAS_NUM_THREADS to 1. No need to recompile.

See FAQ: https://github.com/xianyi/OpenBLAS/wiki/faq#multi-threaded
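
For example (with ./myapp standing in for whatever binary links against this package), the thread count can be limited at run time, no rebuild needed:

OPENBLAS_NUM_THREADS=1 ./myapp    # restrict OpenBLAS's own thread pool to a single thread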

MaartenBaert commented on 2017-10-07 03:15

Actually it seems to be more complicated than this. I managed to fix the problem this way on another machine (running CentOS 6), but the same change doesn't work on this machine.

CHOLMOD always launches 3 threads, even with OMP_NUM_THREADS=1. On my Arch machine, OpenBLAS launches 3 extra threads and the performance collapses. On the CentOS 6 machine, however, OpenBLAS launches 4 extra threads and the performance is only slightly degraded. Both machines have 4 physical cores. It looks like I still don't fully understand it.

So far I'm getting the best performance with multithreading completely disabled, i.e. USE_OPENMP=0 USE_THREAD=0. This still gives me multithreading in CHOLMOD but it solves the problem, and apparently also reduces overhead slightly.

EDIT: After recompiling yet another time with USE_OPENMP=1 (USE_THREAD not defined), the problem is solved on the Arch machine as well. The behaviour is now the same as on the CentOS 6 machine: 3 threads from CHOLMOD, 4 extra threads from OpenBLAS. Performance is still better with multithreading completely disabled, but at least it is usable now.
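
For reference, these are the upstream make options I mean (a sketch of building bare OpenBLAS from its source tree, not the exact invocation used by this PKGBUILD):

make USE_OPENMP=1    # OpenMP-based threading; this is what fixed the conflict here
make USE_THREAD=0    # fully single-threaded build; fastest in my tests so far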

MaartenBaert commented on 2017-10-07 02:29

I found the problem. The issue here is USE_OPENMP=0 USE_THREAD=1. CHOLMOD uses OpenMP, and this creates a conflict with the non-OpenMP-based multithreading in this OpenBLAS package.

My CPU has 4 physical cores, so OpenMP would normally create 3 worker threads to occupy the remaining cores. Instead I'm seeing 6 worker threads: 3 with 100% CPU usage (CHOLMOD's OpenMP workers) and 3 with 25% CPU usage (OpenBLAS's own worker threads). Breaking in with a debugger at random times suggests that these workers are mostly just busy-waiting, wasting CPU time without doing real computation. The OpenMP threads use spinlocks, which is a terrible idea when you have more threads than physical cores.

I tried testing again with OMP_WAIT_POLICY=PASSIVE (which disables the spinlocks); this results in performance comparable to the OpenMP-enabled package. OMP_NUM_THREADS=1 (which eliminates the 3 extra worker threads) has a similar effect.

Conclusion: if the parent application/library uses OpenMP, OpenBLAS should also be compiled with OpenMP support or the performance will be terrible.
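
For anyone hitting the same issue, these are the run-time workarounds I tested (./solver is just a placeholder for the binary that calls CHOLMOD):

OMP_WAIT_POLICY=PASSIVE ./solver    # stop the OpenMP workers from spin-waiting
OMP_NUM_THREADS=1 ./solver          # drop the 3 extra OpenMP worker threads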

adfjjv commented on 2017-10-05 17:12

@MaartenBaert What is your hardware? Did you compile the package on a different machine? It looks like the PKGBUILD doesn't create a hardware-agnostic package; I think it would need DYNAMIC_ARCH=1 at a minimum.
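
(Roughly speaking, that means building with the upstream option below so that the kernel is selected at run time; I have not checked how it would slot into this PKGBUILD:)

make DYNAMIC_ARCH=1    # build kernels for many CPU types and pick one at run time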

MaartenBaert commented on 2017-10-02 02:45

I was comparing the performance of various BLAS/LAPACK packages (through CHOLMOD from SuiteSparse), and was surprised to find that this package is ~40 times slower than the 'openblas' package without LAPACK:

blas + lapack: 10015ms
openblas + lapack: 3721ms
openblas-lapack: 161417ms
atlas-lapack: 4232ms

Does anyone know what could be causing this?

eolianoe commented on 2017-07-14 14:42

@solnce: it builds fine in an up-to-date clean chroot. OpenMP is not enabled in this PKGBUILD, so I do not understand why there are references to OpenMP routines.

solnce commented on 2017-07-14 07:49

Building the most recent version fails for me.


gcc -march=x86-64 -mtune=generic -O2 -pipe -fstack-protector-strong -O2 -DMAX_STACK_ALLOC=2048 -Wall -m64 -DF_INTERFACE_GFORT -fPIC -DSMP_SERVER -DNO_WARMUP -DMAX_CPU_NUMBER=12 -DASMNAME= -DASMFNAME=_ -DNAME=_ -DCNAME= -DCHAR_NAME=\"_\" -DCHAR_CNAME=\"\" -DNO_AFFINITY -I. -O2 -DMAX_STACK_ALLOC=2048 -Wall -m64 -DF_INTERFACE_GFORT -fPIC -DSMP_SERVER -DNO_WARMUP -DMAX_CPU_NUMBER=12 -DASMNAME= -DASMFNAME=_ -DNAME=_ -DCNAME= -DCHAR_NAME=\"_\" -DCHAR_CNAME=\"\" -DNO_AFFINITY -I.. -Wl,-O1,--sort-common,--as-needed,-z,relro -w -o linktest linktest.c ../libopenblas_nehalemp-r0.2.19.so -L/usr/lib/gcc/x86_64-pc-linux-gnu/7.1.1 -L/usr/lib/gcc/x86_64-pc-linux-gnu/7.1.1/../../../../lib -L/lib/../lib -L/usr/lib/../lib -L/usr/lib/gcc/x86_64-pc-linux-gnu/7.1.1/../../.. -lgfortran -lm -lquadmath -lm -lc && echo OK.
../libopenblas_nehalemp-r0.2.19.so: undefined reference to `GOMP_parallel'
../libopenblas_nehalemp-r0.2.19.so: undefined reference to `omp_in_parallel'
../libopenblas_nehalemp-r0.2.19.so: undefined reference to `omp_set_num_threads'
../libopenblas_nehalemp-r0.2.19.so: undefined reference to `omp_get_num_threads'
../libopenblas_nehalemp-r0.2.19.so: undefined reference to `omp_get_max_threads'
../libopenblas_nehalemp-r0.2.19.so: undefined reference to `omp_get_thread_num'
collect2: error: ld returned 1 exit status

richli commented on 2017-07-13 19:00

Could you add another symlink?

ln -sf libopenblas.so libcblas.so.${_lapackver:0:1}

Otherwise, it seems a recent update to python-numpy fails:

$ python -c 'import numpy'
Traceback (most recent call last):
File "/usr/lib/python3.6/site-packages/numpy/core/__init__.py", line 16, in <module>
from . import multiarray
ImportError: libcblas.so.3: cannot open shared object file: No such file or directory
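
For context, a rough sketch of where the extra symlink could sit in package(), next to the existing library symlinks (paths and variable names assumed, not copied from the current PKGBUILD):

cd "${pkgdir}/usr/lib"
ln -sf libopenblas.so "libcblas.so.${_lapackver:0:1}"    # -> libcblas.so.3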

eolianoe commented on 2017-07-10 17:26

@xyproto: I'm fine with the move to [community], but if I'm not mistaken some optimisations depend on the CPU type; not as strongly as with atlas, but it may still decrease performance.
