AUR (en) - leela-zero

Git Clone URL:	https://aur.archlinux.org/leela-zero.git (read-only)
Package Base:	leela-zero
Description:	Go engine with no human-provided knowledge, modeled after the AlphaGo Zero paper.
Upstream URL:	https://github.com/leela-zero/leela-zero
Licenses:	GPLv3
Submitter:	apetresc
Maintainer:	apetresc (algebro)
Last Packager:	apetresc
Votes:	6
Popularity:	0.000888
First Submitted:	2018-04-25 03:12 (UTC)
Last Updated:	2019-04-04 19:46 (UTC)

Latest Comments

Liso commented on 2019-07-28 15:26 (UTC)

@janwil, it seems I have the same problem (in my case with an AMD Radeon RX 550X).

If I run coredumpctl list, I get:

Sun 2019-07-28 17:00:46 CEST 19240 1000 1000 11 present /var/tmp/pamac-build-i/leela-zero/src/leela-zero/build/tests

You can then pass the path from the EXE column to coredumpctl to investigate the problem:

coredumpctl gdb /var/tmp/pamac-build-i/leela-zero/src/leela-zero/build/tests

Then you can type where at the gdb prompt to see the stack. I got this:

(gdb) where
#0  0x0000000000000000 in ?? ()
#1  0x00007f90acde36a2 in ?? () from /usr/lib/libMesaOpenCL.so.1
#2  0x00007f90acdd4e3f in ?? () from /usr/lib/libMesaOpenCL.so.1
#3  0x00007f90acdd5a13 in ?? () from /usr/lib/libMesaOpenCL.so.1
#4  0x00007f90acdd6291 in ?? () from /usr/lib/libMesaOpenCL.so.1
#5  0x00007f90acdd31a5 in ?? () from /usr/lib/libMesaOpenCL.so.1
#6  0x00007f90acdd0cde in ?? () from /usr/lib/libMesaOpenCL.so.1
#7  0x000055c2ad0bf72b in cl::CommandQueue::enqueueWriteBuffer (blocking=0, offset=0, events=0x0, event=0x0, ptr=<optimized out>, size=147456, buffer=..., this=<synthetic pointer>)
    at /var/tmp/pamac-build-i/leela-zero/src/leela-zero/src/CL/cl2.hpp:7166
#8  Tuner<half_float::half>::tune_sgemm[abi:cxx11] (this=0x7ffc804f9b50, m=8, n=25, k=8, batch_size=36, runs=<optimized out>)
    at /var/tmp/pamac-build-i/leela-zero/src/leela-zero/src/Tuner.cpp:491
#9  0x000055c2ad0c0c3f in Tuner<half_float::half>::load_sgemm_tuners[abi:cxx11] (this=0x7ffc804f9b50, m=8, n=25, k=8, batch_size=36)
    at /usr/include/c++/9.1.0/ext/new_allocator.h:89
#10 0x000055c2ad0d6d18 in OpenCL<half_float::half>::initialize (this=0x55c2ae675570, channels=8, batch_size=1) at /var/tmp/pamac-build-i/leela-zero/src/leela-zero/src/Tuner.cpp:722
#11 0x000055c2ad0d738e in OpenCLScheduler<half_float::half>::initialize (this=0x55c2ae675490, channels=8) at /usr/include/c++/9.1.0/bits/unique_ptr.h:357
#12 0x000055c2ad0eaacb in Network::init_net (this=0x7f90ace86010, channels=8, pipe=...) at /usr/include/c++/9.1.0/bits/unique_ptr.h:357
#13 0x000055c2ad0f2c64 in Network::select_precision (this=0x7f90ace86010, channels=8) at /usr/include/c++/9.1.0/bits/move.h:74
#14 0x000055c2ad0f35a8 in Network::initialize (this=0x7f90ace86010, playouts=<optimized out>, weightsfile=...) at /var/tmp/pamac-build-i/leela-zero/src/leela-zero/src/Network.cpp:573
#15 0x000055c2ad1245a5 in LeelaEnv::SetUp (this=<optimized out>) at /var/tmp/pamac-build-i/leela-zero/src/leela-zero/src/tests/gtests.cpp:87
#16 0x000055c2ad1470d1 in testing::internal::UnitTestImpl::RunAllTests() ()
#17 0x000055c2ad15255d in bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) ()
#18 0x000055c2ad147562 in testing::UnitTest::Run() ()
#19 0x000055c2ad07e8f6 in main () at /usr/include/c++/9.1.0/ext/new_allocator.h:89
(gdb)

Do you have the same problem?

janwil commented on 2019-07-23 15:41 (UTC)

@apetresc, I did a reboot and a clean git clone, but makepkg still gives the same error.

How should I debug this?

Best, Jan

apetresc commented on 2019-07-23 15:25 (UTC)

@janwil Hmm, interesting; I'm not sure, as I don't have access to an AMD GPU to test it myself.

One similar problem that is often the culprit - have you upgraded the kernel+headers since your last restart? This sort of thing sometimes occurs in those cases because leela-zero is being compiled against the newly-installed AMD headers, but executing against the old version of the module. If so, just try rebooting and let me know if that helps!

janwil commented on 2019-07-19 18:44 (UTC)

I have an AMD RX 480 GPU, which seems to be recognised during the test phase with OpenCL support working, but then a test fails:

Started OpenCL SGEMM tuner.
Will try 290 valid configurations.
/home/janwil/Documents/install/leela-zero/PKGBUILD: line 39: 2209 Segmentation fault (core dumped) ./tests
==> ERROR: A failure occurred in check().
Aborting...

What am I missing?

Thanks in advance, Jan

sfranchi commented on 2019-01-13 16:07 (UTC)

Ah, success at last!

You were right @apetresc, that was the problem. It took me a few iterations of updating kernel/rebooting/reinstalling nvidia-390xx + opencl-nvidia-390xx but it finally passed the test.

I would suggest adding a note to the PKGBUILD recommending that users make sure their kernel and modules are in sync before installing the program.

sfranchi commented on 2019-01-13 08:09 (UTC)

@apetresc: Unfortunately that was not the problem; I am still getting the same error in the test phase.

Do I have to do anything special to recompile the 390xx driver? I simply rebooted after the updates to the kernel were installed.

apetresc commented on 2019-01-01 19:48 (UTC)

@sfranchi: I've run this PKGBUILD with the nvidia-390xx driver before, so I don't think it's that...

The only times I've encountered your error message before, it's always been for the same reason: I'd recompiled my nvidia driver module after upgrading my kernel but before rebooting into that updated kernel. Since the module compiled against the headers of the installed kernel, not the running one, it would fail to load with exactly that error until I rebooted so that the two matched again.

It's a bit of a long shot, but could that be the cause of your issue here?
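One quick way to check for this kind of mismatch is to compare the running kernel release with the module trees installed on disk. The following is only a minimal sketch: the function name kernel_modules_in_sync is made up for illustration, and it assumes the Arch layout in which modules live under /usr/lib/modules/<release>.

```shell
# Hypothetical helper (not part of leela-zero or its PKGBUILD).
# Succeeds if a module tree exists for the given kernel release.
#   $1: kernel release to look for (default: the running kernel, from uname -r)
#   $2: modules root (default: /usr/lib/modules, an assumption about the Arch layout)
kernel_modules_in_sync() {
    release="${1:-$(uname -r)}"
    modules_root="${2:-/usr/lib/modules}"
    if [ -d "$modules_root/$release" ]; then
        echo "ok: modules for $release are installed"
    else
        echo "mismatch: no modules for $release under $modules_root"
        return 1
    fi
}
```

Calling kernel_modules_in_sync with no arguments before running makepkg checks the running kernel; a "mismatch" result suggests rebooting into the installed kernel first.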

sfranchi commented on 2019-01-01 18:42 (UTC)

It is actually not a permission problem, or at least not only one, even though permissions seem to be part of it. I built leela manually and it fails to start, with errors related to OpenCL. It may be a driver issue: my NVIDIA Quadro 2000 requires the legacy nvidia-390xx driver, so perhaps some tweaks are needed to get leela/OpenCL to work with it?

This is the error thrown by $ leelaz -w weights.txt:

BLAS Core: built-in Eigen 3.3.5 library.
Detecting residual layers...v1...256 channels...40 blocks.
Initializing OpenCL (autodetecting precision).
OpenCL: clGetPlatformIDs
terminate called after throwing an instance of 'cl::Error'
  what(): clGetPlatformIDs
Aborted (core dumped)

sfranchi commented on 2018-12-31 17:32 (UTC)

Building fails on my system at the test stage because no OpenCL platforms are found, even though I have an NVIDIA Quadro 2000 installed, which supports OpenCL 1.1.

I wonder if there is a permission issue, since running clinfo as a regular user gives "Number of platforms 0", while running it as superuser finds the card, the driver, and so on.
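The user-versus-root comparison described above can be made a little more mechanical. This is a rough sketch only: platform_count is a hypothetical helper name, and it assumes clinfo prints a human-readable "Number of platforms" line as quoted above.

```shell
# Hypothetical helper: read clinfo output on stdin and print the platform count.
# Assumes a line of the form "Number of platforms   N"; prints 0 if no such line is found.
platform_count() {
    awk '/^Number of platforms/ { print $NF; found = 1; exit }
         END { if (!found) print 0 }'
}
```

Running clinfo | platform_count as a regular user and sudo clinfo | platform_count as root gives two numbers to compare. If root sees more platforms than the user does, the user probably lacks access to the GPU device nodes; comparing ls -l /dev/nvidia* with the output of id is one way to look into that.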

apetresc commented on 2018-12-04 20:38 (UTC)

@flovo - yup. Did some testing with algebro and this new version should work equally well on both NVIDIA and AMD hardware!

CPU-only version is still in the works, though.