Working as of the current version; I'm not getting the build error anymore. I just want to say thanks to @txtsd.
Package Details: llama.cpp-cuda b9305-1
| Git Clone URL: | https://aur.archlinux.org/llama.cpp-cuda.git (read-only) |
|---|---|
| Package Base: | llama.cpp-cuda |
| Description: | Port of Facebook's LLaMA model in C/C++ (with NVIDIA CUDA optimizations) |
| Upstream URL: | https://github.com/ggml-org/llama.cpp |
| Licenses: | MIT |
| Conflicts: | ggml, libggml, llama.cpp |
| Provides: | llama.cpp |
| Submitter: | txtsd |
| Maintainer: | fabse |
| Last Packager: | fabse |
| Votes: | 17 |
| Popularity: | 1.77 |
| First Submitted: | 2024-10-26 20:17 (UTC) |
| Last Updated: | 2026-05-24 12:33 (UTC) |
Dependencies (18)
- cuda (cuda11.1 [AUR], cuda-12.2 [AUR], cuda12.0 [AUR], cuda11.4 [AUR], cuda-12.5 [AUR], cuda-12.9 [AUR], cuda-12.8 [AUR], cuda-pascal [AUR])
- curl (curl-git [AUR], curl-c-ares [AUR])
- gcc-libs (gcc-libs-git [AUR], gccrs-libs-git [AUR], gcc-libs-snapshot [AUR])
- glibc (glibc-git [AUR], glibc-eac [AUR], glibc-git-native-pgo [AUR])
- nvidia-utils (nvidia-410xx-utils [AUR], nvidia-440xx-utils [AUR], nvidia-430xx-utils [AUR], nvidia-340xx-utils [AUR], nvidia-510xx-utils [AUR], nvidia-utils-tesla [AUR], nvidia-575xx-utils [AUR], nvidia-340xx-utils-macbook [AUR], nvidia-535xx-utils [AUR], nvidia-utils-beta [AUR], nvidia-470xx-utils [AUR], nvidia-390xx-utils [AUR], nvidia-550xx-utils [AUR], nvidia-580xx-utils [AUR], nvidia-vulkan-utils [AUR], nvidia-525xx-utils [AUR])
- python
- cmake (cmake3 [AUR], cmake-git [AUR]) (make)
- cudnn (cudnn9.10-cuda12.9 [AUR], cudnn-pascal [AUR]) (make)
- git (git-git [AUR], git-gl [AUR], git-wd40 [AUR]) (make)
- ninja (ninja-git [AUR], ninja-mem [AUR], ninja-noemacs-git [AUR], ninja-kitware [AUR], ninja-fuchsia-git [AUR], n2-ninja-symlink [AUR]) (make)
- shaderc (shaderc-git [AUR]) (make)
- nccl (nccl-cuda12.9 [AUR], nccl-git [AUR]) (optional) – needed for multi-GPU parallelism
- python-gguf [AUR] (python-gguf-git [AUR]) (optional) – needed for convert_hf_to_gguf.py
- python-numpy (python-numpy-git [AUR], python-numpy-mkl-bin [AUR], python-numpy1 [AUR], python-numpy-mkl-tbb [AUR], python-numpy-mkl [AUR]) (optional) – needed for convert_hf_to_gguf.py
- python-pytorch (python-pytorch-cuda12.9 [AUR], python-pytorch-opt-cuda12.9 [AUR], python-pytorch-cuda, python-pytorch-opt, python-pytorch-opt-cuda, python-pytorch-opt-rocm, python-pytorch-rocm) (optional) – needed for convert_hf_to_gguf.py
- python-safetensors (optional) – needed for convert_hf_to_gguf.py
- python-sentencepiece [AUR] (python-sentencepiece-git [AUR], python-sentencepiece-bin [AUR]) (optional) – needed for convert_hf_to_gguf.py
- python-transformers [AUR] (python-transformers-git [AUR]) (optional) – needed for convert_hf_to_gguf.py
Required by (5)
- llamaman-bin (requires llama.cpp) (optional)
- scmd-bin (requires llama.cpp)
- voxd (requires llama.cpp) (optional)
- voxd-bin (requires llama.cpp) (optional)
- voxd-git (requires llama.cpp) (optional)
Sources (3)
Latest Comments
brewkro commented on 2025-07-09 01:27 (UTC)
JamesMowery commented on 2025-07-06 20:24 (UTC) (edited on 2025-07-06 20:25 (UTC) by JamesMowery)
Update 2: Disregard the "fix" I posted below. Models are no longer loading into memory (seems like the block/layer loading is busted) and my computer is crashing often, so I'm guessing there really is a deeper problem. This is so darn frustrating.
JamesMowery commented on 2025-07-06 19:38 (UTC) (edited on 2025-07-06 19:42 (UTC) by JamesMowery)
Update: I got it working by changing this in the PKGBUILD:
-DLLAMA_USE_SYSTEM_GGML=OFF
and then I had to delete the following files:
llama.cpp-cuda-f16: /usr/include/ggml-alloc.h exists in filesystem (owned by libggml-cuda-f16-git)
llama.cpp-cuda-f16: /usr/include/ggml-backend.h exists in filesystem (owned by libggml-cuda-f16-git)
llama.cpp-cuda-f16: /usr/include/ggml-blas.h exists in filesystem (owned by libggml-cuda-f16-git)
llama.cpp-cuda-f16: /usr/include/ggml-cann.h exists in filesystem (owned by libggml-cuda-f16-git)
llama.cpp-cuda-f16: /usr/include/ggml-cpp.h exists in filesystem (owned by libggml-cuda-f16-git)
llama.cpp-cuda-f16: /usr/include/ggml-cpu.h exists in filesystem (owned by libggml-cuda-f16-git)
llama.cpp-cuda-f16: /usr/include/ggml-cuda.h exists in filesystem (owned by libggml-cuda-f16-git)
llama.cpp-cuda-f16: /usr/include/ggml-metal.h exists in filesystem (owned by libggml-cuda-f16-git)
llama.cpp-cuda-f16: /usr/include/ggml-opt.h exists in filesystem (owned by libggml-cuda-f16-git)
llama.cpp-cuda-f16: /usr/include/ggml-rpc.h exists in filesystem (owned by libggml-cuda-f16-git)
llama.cpp-cuda-f16: /usr/include/ggml-sycl.h exists in filesystem (owned by libggml-cuda-f16-git)
llama.cpp-cuda-f16: /usr/include/ggml-vulkan.h exists in filesystem (owned by libggml-cuda-f16-git)
llama.cpp-cuda-f16: /usr/include/ggml.h exists in filesystem (owned by libggml-cuda-f16-git)
llama.cpp-cuda-f16: /usr/include/gguf.h exists in filesystem (owned by libggml-cuda-f16-git)
llama.cpp-cuda-f16: /usr/lib/cmake/ggml/ggml-config.cmake exists in filesystem (owned by libggml-cuda-f16-git)
llama.cpp-cuda-f16: /usr/lib/cmake/ggml/ggml-version.cmake exists in filesystem (owned by libggml-cuda-f16-git)
llama.cpp-cuda-f16: /usr/lib/libggml-base.so exists in filesystem (owned by libggml-cuda-f16-git)
llama.cpp-cuda-f16: /usr/lib/libggml-cpu.so exists in filesystem (owned by libggml-cuda-f16-git)
llama.cpp-cuda-f16: /usr/lib/libggml.so exists in filesystem (owned by libggml-cuda-f16-git)
After doing that, it appears to be working just fine now.
I don't know how to modify the PKGBUILD to make it delete all those files automatically. I know it's probably very bad to do it this way, but I'm glad I have a temporary fix in place until this gets sorted.
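Editor's sketch: rather than deleting the files by hand, it may help to first see which installed package owns them, since the conflict lines above already name the owner. The function below is a minimal, hypothetical helper (the name conflict_owners is not part of pacman or paru) that reads pacman's "exists in filesystem (owned by ...)" lines on standard input and prints each owning package with a count of conflicting files.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of pacman or paru.
# Reads lines of the form
#   <pkg>: <path> exists in filesystem (owned by <owner>)
# and prints "<owner>: <N> file(s)" once per owning package.
conflict_owners() {
    # Keep only the owner name from each matching line, then count per owner.
    sed -n 's|^.*: \(/[^ ]*\) exists in filesystem (owned by \([^)]*\))$|\2|p' \
        | sort | uniq -c | awk '{ print $2 ": " $1 " file(s)" }'
}
```

If every conflicting file belongs to a single package, as appears to be the case in the list above, removing that package with pacman -R before installing is probably safer than deleting its files directly, because pacman's database then stays consistent. This assumes nothing else on the system still depends on that package.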
txtsd commented on 2025-07-04 02:28 (UTC)
Must be something about recent commits then. llama.cpp is very fast-moving software and is prone to build breakage due to the nature of the package.
brewkro commented on 2025-07-03 22:00 (UTC)
I'm getting the same error that @JamesMowery is getting. I've tried rebuilding libggml and both versions of llama.cpp-cuda as well.
JamesMowery commented on 2025-07-03 19:40 (UTC) (edited on 2025-07-03 19:41 (UTC) by JamesMowery)
@txtsd If it's at all helpful, I went ahead and tried to install the non-fp16 package (as I had never used that; not even sure if fp16 is helpful with a 4090), and I got the exact same error despite using those two completely new packages. So something is definitely a bit wrong.
AI says that some function is being called with a different number of arguments than it expects, likely due to a recent update? Not sure if that's at all accurate, but I hope it helps!
If you need me to provide any additional logging, please let me know! (And if you would like that, please let me know how to do that.) It's very much appreciated!
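Editor's sketch: the compiler output quoted in this thread reports "too many arguments" for ggml_ssm_scan and points at the declaration in /usr/include/ggml.h, which suggests the installed ggml header is older than the one the llama.cpp sources expect. One rough way to check is to count the parameters in a header's declaration. The helper below is hypothetical (count_params is not an existing tool) and assumes the declaration has no parentheses nested inside its parameter list.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the number of parameters in the first
# declaration of function $1 found in header file $2.
# Assumes no nested parentheses inside the parameter list; a "(void)"
# parameter list would be reported as 1.
count_params() {
    awk -v fn="$1" '
        # Start collecting at the first line that contains "<fn>(".
        !found && index($0, fn "(") { found = 1 }
        found {
            buf = buf $0
            if (index($0, ")")) {          # parameter list is closed
                sub(/^[^(]*\(/, "", buf)   # drop everything up to "("
                sub(/\).*$/, "", buf)      # drop ")" and what follows
                print split(buf, parts, ",")
                exit
            }
        }
    ' "$2"
}
```

Running count_params ggml_ssm_scan /usr/include/ggml.h and comparing the result with the same call on the ggml.h shipped inside the llama.cpp source tree (assumed here to live under ggml/include/ in the checkout) would show whether the two headers disagree. In the log, the system declaration takes seven parameters while the call site passes eight.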
JamesMowery commented on 2025-07-03 14:20 (UTC) (edited on 2025-07-03 14:23 (UTC) by JamesMowery)
@txtsd I actually went ahead and deleted and re-added that package (I also deleted the cache) even before posting below and got that error. However, just to be sure, I just ran paru -S libggml-cuda-f16-git --rebuild and then tried to install llama.cpp-cuda-f16 with paru again (also re-ran that with --rebuild), and I got the exact same error. :(
txtsd commented on 2025-07-03 07:32 (UTC)
@JamesMowery Rebuild libggml
JamesMowery commented on 2025-07-03 03:11 (UTC)
This package started working a few days after my prior post. I just went to upgrade today (I usually update every Friday, today was an exception) and I'm getting this error.
[ 31%] Building CXX object examples/gguf-hash/CMakeFiles/llama-gguf-hash.dir/gguf-hash.cpp.o
[ 31%] Linking CXX executable ../../bin/rpc-server
[ 31%] Built target llama-gguf
/mnt/big/.cache/paru/clone/llama.cpp-cuda-f16/src/llama.cpp/src/llama-model.cpp: In function ‘bool weight_buft_supported(const llama_hparams&, ggml_tensor*, ggml_op, ggml_backend_buffer_type_t, ggml_backend_dev_t)’:
/mnt/big/.cache/paru/clone/llama.cpp-cuda-f16/src/llama.cpp/src/llama-model.cpp:231:42: error: too many arguments to function ‘ggml_tensor* ggml_ssm_scan(ggml_context*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*)’
231 | op_tensor = ggml_ssm_scan(ctx, s, x, dt, w, B, C, ids);
| ~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /mnt/big/.cache/paru/clone/llama.cpp-cuda-f16/src/llama.cpp/src/../include/llama.h:4,
from /mnt/big/.cache/paru/clone/llama.cpp-cuda-f16/src/llama.cpp/src/llama-model.h:3,
from /mnt/big/.cache/paru/clone/llama.cpp-cuda-f16/src/llama.cpp/src/llama-model.cpp:1:
/usr/include/ggml.h:2009:35: note: declared here
2009 | GGML_API struct ggml_tensor * ggml_ssm_scan(
| ^~~~~~~~~~~~~
/mnt/big/.cache/paru/clone/llama.cpp-cuda-f16/src/llama.cpp/src/llama-model.cpp: In lambda function:
/mnt/big/.cache/paru/clone/llama.cpp-cuda-f16/src/llama.cpp/src/llama-model.cpp:9922:37: error: too many arguments to function ‘ggml_tensor* ggml_ssm_scan(ggml_context*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*)’
9922 | return ggml_ssm_scan(ctx, ssm, x, dt, A, B, C, ids);
| ~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/include/ggml.h:2009:35: note: declared here
2009 | GGML_API struct ggml_tensor * ggml_ssm_scan(
| ^~~~~~~~~~~~~
/mnt/big/.cache/paru/clone/llama.cpp-cuda-f16/src/llama.cpp/src/llama-model.cpp: In lambda function:
/mnt/big/.cache/paru/clone/llama.cpp-cuda-f16/src/llama.cpp/src/llama-model.cpp:10046:37: error: too many arguments to function ‘ggml_tensor* ggml_ssm_scan(ggml_context*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*)’
10046 | return ggml_ssm_scan(ctx, ssm, x, dt, A, B, C, ids);
| ~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/include/ggml.h:2009:35: note: declared here
2009 | GGML_API struct ggml_tensor * ggml_ssm_scan(
| ^~~~~~~~~~~~~
[ 31%] Built target rpc-server
[ 32%] Linking CXX executable ../../bin/llama-gguf-hash
In function ‘SHA1Update’,
inlined from ‘SHA1Final’ at /mnt/big/.cache/paru/clone/llama.cpp-cuda-f16/src/llama.cpp/examples/gguf-hash/deps/sha1/sha1.c:269:9:
/mnt/big/.cache/paru/clone/llama.cpp-cuda-f16/src/llama.cpp/examples/gguf-hash/deps/sha1/sha1.c:219:13: warning: ‘SHA1Transform’ reading 64 bytes from a region of size 0 [-Wstringop-overread]
219 | SHA1Transform(context->state, &data[i]);
| ^
/mnt/big/.cache/paru/clone/llama.cpp-cuda-f16/src/llama.cpp/examples/gguf-hash/deps/sha1/sha1.c:219:13: note: referencing argument 2 of type ‘const unsigned char[64]’
/mnt/big/.cache/paru/clone/llama.cpp-cuda-f16/src/llama.cpp/examples/gguf-hash/deps/sha1/sha1.c: In function ‘SHA1Final’:
/mnt/big/.cache/paru/clone/llama.cpp-cuda-f16/src/llama.cpp/examples/gguf-hash/deps/sha1/sha1.c:54:6: note: in a call to function ‘SHA1Transform’
54 | void SHA1Transform(
| ^
[ 32%] Built target llama-gguf-hash
make[2]: *** [src/CMakeFiles/llama.dir/build.make:359: src/CMakeFiles/llama.dir/llama-model.cpp.o] Error 1
make[2]: *** Waiting for unfinished jobs....
make[1]: *** [CMakeFiles/Makefile2:1030: src/CMakeFiles/llama.dir/all] Error 2
make: *** [Makefile:136: all] Error 2
==> ERROR: A failure occurred in build().
Aborting...
error: failed to build 'llama.cpp-cuda-f16-b5814-1':
error: packages failed to build: llama.cpp-cuda-f16-b5814-1
Pinned Comments
txtsd commented on 2024-10-26 20:17 (UTC) (edited on 2024-12-06 14:15 (UTC) by txtsd)
Alternate versions
llama.cpp
llama.cpp-vulkan
llama.cpp-sycl-fp16
llama.cpp-sycl-fp32
llama.cpp-cuda
llama.cpp-cuda-f16
llama.cpp-hip