Package Details: leela-zero 0.17-1

Git Clone URL: https://aur.archlinux.org/leela-zero.git (read-only)
Package Base: leela-zero
Description: Go engine with no human-provided knowledge, modeled after the AlphaGo Zero paper.
Upstream URL: https://github.com/leela-zero/leela-zero
Licenses: GPLv3
Submitter: apetresc
Maintainer: apetresc (algebro)
Last Packager: apetresc
Votes: 5
Popularity: 0.000000
First Submitted: 2018-04-25 03:12 (UTC)
Last Updated: 2019-04-04 19:46 (UTC)

Latest Comments

apetresc commented on 2020-02-19 17:29 (UTC)

@jshholland Can you try a clean build? I can no longer reproduce the problem on 5.2.1, it's all good on my end.

jshholland commented on 2020-02-19 17:11 (UTC)

$ makepkg --version
makepkg (pacman) 5.2.1

According to the pacman NEWS file, the gzip bug was fixed in 5.2.0, but I'm still seeing this same bug.

apetresc commented on 2019-09-19 18:37 (UTC)

@UndeadKernel: Ah, figured it out! Eli Schwartz confirmed on IRC that it's a recently-introduced upstream bug: https://git.archlinux.org/pacman.git/commit/?id=99c5809bbf01725829ce67458565b46bce32eaa9

It's fixed in pacman-git but hasn't been released in pacman yet. Once it's released, it will restore the old auto-extraction behaviour.

Given that, I'm not going to push a hotfix, since I'd just have to revert it once pacman is updated.

apetresc commented on 2019-09-19 18:28 (UTC)

@UndeadKernel: Hmm, you're right, something has clearly changed in makepkg. This exact PKGBUILD (plus leela-zero-git's almost-identical one) definitely used to work as-is a few weeks ago.

I think makepkg used to implicitly extract *.gz resources and now it doesn't anymore. I'll try to get to the bottom of this before just pushing out a hotfix.
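
If a workaround were needed in the meantime, explicit decompression could be sketched as a prepare() step like the one below. This is only an illustration, not the packaged PKGBUILD: the prepare() body is hypothetical, and the file name weights.txt.gz is assumed from the error report in this thread.

```shell
# Hypothetical PKGBUILD fragment (not the actual leela-zero PKGBUILD).
# Assumes the downloaded source is named weights.txt.gz.
prepare() {
  cd "${srcdir}" || return 1
  # Decompress explicitly instead of relying on makepkg to extract *.gz;
  # the compressed file is left in place.
  gzip -dc weights.txt.gz > weights.txt
}
```

With a step like this, package() would find src/weights.txt whether or not makepkg extracts plain .gz sources on its own.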

UndeadKernel commented on 2019-09-18 09:42 (UTC)

I'm observing an error while installing:

==> Starting package()...
install: cannot stat '/home/boy/.cache/yay/leela-zero/src/weights.txt': No such file or directory
==> ERROR: A failure occurred in package().
    Aborting...
Error making: leela-zero

I don't see anywhere in the PKGBUILD where the weights.txt.gz file is uncompressed. Am I missing something obvious?

apetresc commented on 2019-08-05 17:53 (UTC)

@janwil You will have more luck posting your findings so far in the ticket linked by @Liso; it's likely this is just a bug with Leela-Zero itself, not this particular package of it.

(You might also want to give leela-zero-git a try, just in case whatever the problem is has already been fixed. Leela-Zero is very slow to tag actual releases, so this is not too unlikely)

Hope that helps!

janwil commented on 2019-08-04 14:03 (UTC)

@Liso, I can confirm that my results from 'coredumpctl gdb' and 'where' are similar to yours. What does this mean, and what can I do to fix it? Is there anything I can do in the first place? I am unfortunately not very good at debugging and fixing C++ myself :(

Best regards, Jan

Liso commented on 2019-07-28 17:57 (UTC)

I have now found that this is the same as this issue: https://github.com/leela-zero/leela-zero/issues/2438

Liso commented on 2019-07-28 15:26 (UTC)

@janwil, it seems I have the same problem (in my case with an AMD Radeon RX 550X).

If I run coredumpctl list, I get:

Sun 2019-07-28 17:00:46 CEST 19240 1000 1000 11 present /var/tmp/pamac-build-i/leela-zero/src/leela-zero/build/tests

You can then use the path in the "exe" column to investigate the problem:

coredumpctl gdb /var/tmp/pamac-build-i/leela-zero/src/leela-zero/build/tests

At the gdb prompt you can then type "where" to see the stack. I got this:

(gdb) where

0 0x0000000000000000 in ?? ()
1 0x00007f90acde36a2 in ?? () from /usr/lib/libMesaOpenCL.so.1
2 0x00007f90acdd4e3f in ?? () from /usr/lib/libMesaOpenCL.so.1
3 0x00007f90acdd5a13 in ?? () from /usr/lib/libMesaOpenCL.so.1
4 0x00007f90acdd6291 in ?? () from /usr/lib/libMesaOpenCL.so.1
5 0x00007f90acdd31a5 in ?? () from /usr/lib/libMesaOpenCL.so.1
6 0x00007f90acdd0cde in ?? () from /usr/lib/libMesaOpenCL.so.1
7 0x000055c2ad0bf72b in cl::CommandQueue::enqueueWriteBuffer (blocking=0, offset=0, events=0x0, event=0x0, ptr=<optimized out>, size=147456, buffer=..., this=<synthetic pointer>)
at /var/tmp/pamac-build-i/leela-zero/src/leela-zero/src/CL/cl2.hpp:7166
8 Tuner<half_float::half>::tune_sgemm[abi:cxx11] (this=0x7ffc804f9b50, m=8, n=25, k=8, batch_size=36, runs=<optimized out>)
at /var/tmp/pamac-build-i/leela-zero/src/leela-zero/src/Tuner.cpp:491
9 0x000055c2ad0c0c3f in Tuner<half_float::half>::load_sgemm_tuners[abi:cxx11] (this=0x7ffc804f9b50, m=8, n=25, k=8, batch_size=36)
at /usr/include/c++/9.1.0/ext/new_allocator.h:89
10 0x000055c2ad0d6d18 in OpenCL<half_float::half>::initialize (this=0x55c2ae675570, channels=8, batch_size=1) at /var/tmp/pamac-build-i/leela-zero/src/leela-zero/src/Tuner.cpp:722
11 0x000055c2ad0d738e in OpenCLScheduler<half_float::half>::initialize (this=0x55c2ae675490, channels=8) at /usr/include/c++/9.1.0/bits/unique_ptr.h:357
12 0x000055c2ad0eaacb in Network::init_net (this=0x7f90ace86010, channels=8, pipe=...) at /usr/include/c++/9.1.0/bits/unique_ptr.h:357
13 0x000055c2ad0f2c64 in Network::select_precision (this=0x7f90ace86010, channels=8) at /usr/include/c++/9.1.0/bits/move.h:74
14 0x000055c2ad0f35a8 in Network::initialize (this=0x7f90ace86010, playouts=<optimized out>, weightsfile=...) at /var/tmp/pamac-build-i/leela-zero/src/leela-zero/src/Network.cpp:573
15 0x000055c2ad1245a5 in LeelaEnv::SetUp (this=<optimized out>) at /var/tmp/pamac-build-i/leela-zero/src/leela-zero/src/tests/gtests.cpp:87
16 0x000055c2ad1470d1 in testing::internal::UnitTestImpl::RunAllTests() ()
17 0x000055c2ad15255d in bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) ()
18 0x000055c2ad147562 in testing::UnitTest::Run() ()
19 0x000055c2ad07e8f6 in main () at /usr/include/c++/9.1.0/ext/new_allocator.h:89

(gdb)
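
The coredumpctl steps above could be scripted roughly as follows. This is a sketch under assumptions: the helper name latest_crashed_exe is made up for illustration, and it relies on the executable path being the 10th whitespace-separated field, as in the listing shown earlier in this comment.

```shell
# Print the executable path (10th whitespace-separated field) of the last
# entry in `coredumpctl list`-style output read from stdin.
# Assumes the column layout shown above; paths containing spaces are not handled.
latest_crashed_exe() {
  awk 'NF >= 10 { exe = $10 } END { if (exe != "") print exe }'
}

# Usage on a systemd machine with stored core dumps:
#   exe="$(coredumpctl list --no-legend | latest_crashed_exe)"
#   coredumpctl gdb "$exe"     # then type `where` at the (gdb) prompt
```

Lines with fewer than ten fields, such as the header row, are skipped, so the helper prints nothing when no core dump is listed.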

Do you have the same problem?

janwil commented on 2019-07-23 15:41 (UTC)

@apetresc, I did a reboot and a clean git clone, but makepkg still gives the same error.

How should I debug this?

Best, Jan

apetresc commented on 2019-07-23 15:25 (UTC)

@janwil Hmm, interesting; I'm not sure, as I don't have access to an AMD GPU to test it myself.

One similar problem that is often the culprit - have you upgraded the kernel+headers since your last restart? This sort of thing sometimes occurs in those cases because leela-zero is being compiled against the newly-installed AMD headers, but executing against the old version of the module. If so, just try rebooting and let me know if that helps!

janwil commented on 2019-07-19 18:44 (UTC)

I have an AMD RX 480 GPU, which seems to be recognised during the test phase, with OpenCL support working and everything, but then a test fails:

Started OpenCL SGEMM tuner.
Will try 290 valid configurations.
/home/janwil/Documents/install/leela-zero/PKGBUILD: line 39: 2209 Segmentation fault (core dumped) ./tests
==> ERROR: A failure occurred in check().
    Aborting...

What am I missing?

Thanks in advance, Jan

sfranchi commented on 2019-01-13 16:07 (UTC)

Ah, success at last!

You were right @apetresc, that was the problem. It took me a few iterations of updating kernel/rebooting/reinstalling nvidia-390xx + opencl-nvidia-390xx but it finally passed the test.

I would suggest adding a note to the PKGBUILD recommending that users make sure their kernel and modules are in sync before installing the program.

sfranchi commented on 2019-01-13 08:09 (UTC)

@apetresc: Unfortunately that was not the problem; I am getting the same error in the test phase.

Do I have to do anything special to recompile the 390xx driver? I simply rebooted after the updates to the kernel were installed.

apetresc commented on 2019-01-01 19:48 (UTC)

@sfranchi: I've run this PKGBUILD with the nvidia-390xx driver before, so I don't think it's that...

The only times I've encountered your error message before, it's always been for the same reason: I'd recompiled my nvidia driver module after upgrading my kernel but before rebooting into that updated kernel. Since the module compiled against the headers of the installed kernel, not the running one, it would fail to load with exactly that error until I rebooted so that the two matched again.
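
On Arch, a kernel upgrade typically replaces the module directory of the previous version, so a missing directory for the running kernel is one hint of the mismatch described above. A minimal sketch of that check follows; the function name kernel_modules_present is made up for illustration, and the default path /usr/lib/modules is assumed to be where the modules live.

```shell
# Succeeds (exit 0) if a module directory exists for the given kernel release.
# $1: kernel release (default: the running kernel, from `uname -r`)
# $2: modules root   (default: /usr/lib/modules, assumed Arch location)
kernel_modules_present() {
  [ -d "${2:-/usr/lib/modules}/${1:-$(uname -r)}" ]
}

if kernel_modules_present; then
  echo "Module tree for the running kernel $(uname -r) is present."
else
  echo "No module tree for $(uname -r); a reboot is probably needed."
fi
```

A passing check does not prove the driver module itself was built for the running kernel; it only rules out the most common case of an upgrade without a reboot.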

It's a bit of a long shot, but could that be the cause of your issue here?

sfranchi commented on 2019-01-01 18:42 (UTC)

It is actually not, or not only, a permission problem, even though that seems to be part of it. I built leela manually and it does fail to start, with errors related to OpenCL. It may be a driver issue: my nVidia Quadro 2000 requires the legacy nvidia-390xx driver, so perhaps some tweaks are required to get leela/OpenCL to work with it?

This is the error thrown by $leelaz -w weights.txt

BLAS Core: built-in Eigen 3.3.5 library.
Detecting residual layers...v1...256 channels...40 blocks.
Initializing OpenCL (autodetecting precision).
OpenCL: clGetPlatformIDs
terminate called after throwing an instance of 'cl::Error'
  what(): clGetPlatformIDs
Aborted (core dumped)

sfranchi commented on 2018-12-31 17:32 (UTC)

Building fails on my system at the test stage, when no OpenCL platforms are found, even though I have an nVidia Quadro 2000 installed, which supports OpenCL 1.1.

I wonder if there is a permission issue, since running clinfo as a user gives "Number of platforms 0" while running it as super user finds the card/driver and so on.
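
The user-versus-root comparison could be made repeatable with a small helper like the one below. It is a sketch only: platform_count is a hypothetical name, and it assumes clinfo prints a line beginning with "Number of platforms", as quoted above.

```shell
# Print the platform count from clinfo-style output read on stdin.
# Assumes a line of the form "Number of platforms   <N>".
platform_count() {
  awk '/^Number of platforms/ { print $NF; exit }'
}

# Usage (requires clinfo to be installed):
#   clinfo | platform_count          # as a regular user
#   sudo clinfo | platform_count     # as root
```

If the two counts differ, the difference points at permissions rather than at the driver installation itself.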

apetresc commented on 2018-12-04 20:38 (UTC)

@flovo - yup. Did some testing with algebro and this new version should work equally well on both NVIDIA and AMD hardware!

CPU-only version is still in the works, though.

flovo commented on 2018-11-29 10:25 (UTC)

Hi, thank you for maintaining this package.

Can you please change the dependency opencl-nvidia -> opencl-driver?

mortimer_mcmire commented on 2018-11-06 15:47 (UTC)

Is there any reason why this depends on libopenblas instead of the [extra] openblas package?

apetresc commented on 2018-05-25 20:11 (UTC)

Glad it worked for you :)

The qt5 dependency is actually only needed for the training component of Leela-Zero (autogtp), not for running the engine. You might be right though - considering how heavy a dependency qt5 is, I might want to split those into separate packages someday.

farnmeier commented on 2018-05-25 19:50 (UTC)

Leela zero rules! :-) This was my first AUR package, and the build worked fine for me, without any problems... However, I think the qt5-base dependency can be left out.