@txtsd I actually went ahead and deleted and re-added that package (I also deleted the cache) even before posting below, and got that error. However, just to be sure, I just ran paru -S libggml-cuda-f16-git --rebuild and then attempted to install llama.cpp-cuda-f16 with paru again (I also re-ran that with --rebuild) and got the exact same error. :(
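For reference, the sequence was roughly this (a sketch of what I ran; the package names are the ones from this thread and may differ on other setups):

    # rebuild the ggml backend first, then the llama.cpp package itself
    paru -S libggml-cuda-f16-git --rebuild
    paru -S llama.cpp-cuda-f16 --rebuild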
Package Details: llama.cpp-cuda b6316-1
Git Clone URL: https://aur.archlinux.org/llama.cpp-cuda.git (read-only)
Package Base: llama.cpp-cuda
Description: Port of Facebook's LLaMA model in C/C++ (with NVIDIA CUDA optimizations)
Upstream URL: https://github.com/ggerganov/llama.cpp
Licenses: MIT
Conflicts: ggml, libggml, llama.cpp
Provides: llama.cpp
Replaces: llama.cpp-cuda-f16
Submitter: txtsd
Maintainer: envolution
Last Packager: envolution
Votes: 10
Popularity: 2.04
First Submitted: 2024-10-26 20:17 (UTC)
Last Updated: 2025-08-29 06:39 (UTC)
Dependencies (11)
- cuda (cuda11.1 [AUR], cuda-12.2 [AUR], cuda12.0 [AUR], cuda11.4 [AUR], cuda11.4-versioned [AUR], cuda12.0-versioned [AUR])
- curl (curl-git [AUR], curl-c-ares [AUR])
- gcc-libs (gcc-libs-git [AUR], gccrs-libs-git [AUR], gcc-libs-snapshot [AUR])
- glibc (glibc-git [AUR], glibc-eac [AUR])
- nvidia-utils (nvidia-410xx-utils [AUR], nvidia-440xx-utils [AUR], nvidia-430xx-utils [AUR], nvidia-340xx-utils [AUR], nvidia-550xx-utils [AUR], nvidia-525xx-utils [AUR], nvidia-510xx-utils [AUR], nvidia-390xx-utils [AUR], nvidia-vulkan-utils [AUR], nvidia-535xx-utils [AUR], nvidia-utils-tesla [AUR], nvidia-utils-beta [AUR], nvidia-470xx-utils [AUR])
- cmake (cmake3 [AUR], cmake-git [AUR]) (make)
- python-numpy (python-numpy-git [AUR], python-numpy1 [AUR], python-numpy-mkl [AUR], python-numpy-mkl-bin [AUR], python-numpy-mkl-tbb [AUR]) (optional) – needed for convert_hf_to_gguf.py
- python-pytorch (python-pytorch-cxx11abi [AUR], python-pytorch-cxx11abi-opt [AUR], python-pytorch-cxx11abi-cuda [AUR], python-pytorch-cxx11abi-opt-cuda [AUR], python-pytorch-cxx11abi-rocm [AUR], python-pytorch-cxx11abi-opt-rocm [AUR], python-pytorch-cuda, python-pytorch-opt, python-pytorch-opt-cuda, python-pytorch-opt-rocm, python-pytorch-rocm) (optional) – needed for convert_hf_to_gguf.py
- python-safetensors [AUR] (python-safetensors-bin [AUR]) (optional) – needed for convert_hf_to_gguf.py
- python-sentencepiece [AUR] (python-sentencepiece-git [AUR]) (optional) – needed for convert_hf_to_gguf.py
- python-transformers [AUR] (optional) – needed for convert_hf_to_gguf.py
Required by (0)
Sources (3)
JamesMowery commented on 2025-07-03 14:20 (UTC) (edited on 2025-07-03 14:23 (UTC) by JamesMowery)
txtsd commented on 2025-07-03 07:32 (UTC)
@JamesMowery Rebuild libggml
JamesMowery commented on 2025-07-03 03:11 (UTC)
This package started working a few days after my prior post. I just went to upgrade today (I usually update every Friday, today was an exception) and I'm getting this error.
[ 31%] Building CXX object examples/gguf-hash/CMakeFiles/llama-gguf-hash.dir/gguf-hash.cpp.o
[ 31%] Linking CXX executable ../../bin/rpc-server
[ 31%] Built target llama-gguf
/mnt/big/.cache/paru/clone/llama.cpp-cuda-f16/src/llama.cpp/src/llama-model.cpp: In function ‘bool weight_buft_supported(const llama_hparams&, ggml_tensor*, ggml_op, ggml_backend_buffer_type_t, ggml_backend_dev_t)’:
/mnt/big/.cache/paru/clone/llama.cpp-cuda-f16/src/llama.cpp/src/llama-model.cpp:231:42: error: too many arguments to function ‘ggml_tensor* ggml_ssm_scan(ggml_context*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*)’
231 | op_tensor = ggml_ssm_scan(ctx, s, x, dt, w, B, C, ids);
| ~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /mnt/big/.cache/paru/clone/llama.cpp-cuda-f16/src/llama.cpp/src/../include/llama.h:4,
from /mnt/big/.cache/paru/clone/llama.cpp-cuda-f16/src/llama.cpp/src/llama-model.h:3,
from /mnt/big/.cache/paru/clone/llama.cpp-cuda-f16/src/llama.cpp/src/llama-model.cpp:1:
/usr/include/ggml.h:2009:35: note: declared here
2009 | GGML_API struct ggml_tensor * ggml_ssm_scan(
| ^~~~~~~~~~~~~
/mnt/big/.cache/paru/clone/llama.cpp-cuda-f16/src/llama.cpp/src/llama-model.cpp: In lambda function:
/mnt/big/.cache/paru/clone/llama.cpp-cuda-f16/src/llama.cpp/src/llama-model.cpp:9922:37: error: too many arguments to function ‘ggml_tensor* ggml_ssm_scan(ggml_context*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*)’
9922 | return ggml_ssm_scan(ctx, ssm, x, dt, A, B, C, ids);
| ~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/include/ggml.h:2009:35: note: declared here
2009 | GGML_API struct ggml_tensor * ggml_ssm_scan(
| ^~~~~~~~~~~~~
/mnt/big/.cache/paru/clone/llama.cpp-cuda-f16/src/llama.cpp/src/llama-model.cpp: In lambda function:
/mnt/big/.cache/paru/clone/llama.cpp-cuda-f16/src/llama.cpp/src/llama-model.cpp:10046:37: error: too many arguments to function ‘ggml_tensor* ggml_ssm_scan(ggml_context*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*)’
10046 | return ggml_ssm_scan(ctx, ssm, x, dt, A, B, C, ids);
| ~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/include/ggml.h:2009:35: note: declared here
2009 | GGML_API struct ggml_tensor * ggml_ssm_scan(
| ^~~~~~~~~~~~~
[ 31%] Built target rpc-server
[ 32%] Linking CXX executable ../../bin/llama-gguf-hash
In function ‘SHA1Update’,
inlined from ‘SHA1Final’ at /mnt/big/.cache/paru/clone/llama.cpp-cuda-f16/src/llama.cpp/examples/gguf-hash/deps/sha1/sha1.c:269:9:
/mnt/big/.cache/paru/clone/llama.cpp-cuda-f16/src/llama.cpp/examples/gguf-hash/deps/sha1/sha1.c:219:13: warning: ‘SHA1Transform’ reading 64 bytes from a region of size 0 [-Wstringop-overread]
219 | SHA1Transform(context->state, &data[i]);
| ^
/mnt/big/.cache/paru/clone/llama.cpp-cuda-f16/src/llama.cpp/examples/gguf-hash/deps/sha1/sha1.c:219:13: note: referencing argument 2 of type ‘const unsigned char[64]’
/mnt/big/.cache/paru/clone/llama.cpp-cuda-f16/src/llama.cpp/examples/gguf-hash/deps/sha1/sha1.c: In function ‘SHA1Final’:
/mnt/big/.cache/paru/clone/llama.cpp-cuda-f16/src/llama.cpp/examples/gguf-hash/deps/sha1/sha1.c:54:6: note: in a call to function ‘SHA1Transform’
54 | void SHA1Transform(
| ^
[ 32%] Built target llama-gguf-hash
make[2]: *** [src/CMakeFiles/llama.dir/build.make:359: src/CMakeFiles/llama.dir/llama-model.cpp.o] Error 1
make[2]: *** Waiting for unfinished jobs....
make[1]: *** [CMakeFiles/Makefile2:1030: src/CMakeFiles/llama.dir/all] Error 2
make: *** [Makefile:136: all] Error 2
==> ERROR: A failure occurred in build().
Aborting...
error: failed to build 'llama.cpp-cuda-f16-b5814-1':
error: packages failed to build: llama.cpp-cuda-f16-b5814-1
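One way to confirm whether this failure comes from a stale system libggml header rather than from this package itself (a diagnostic sketch; the exact ggml package name may differ on your system):

    # which installed package owns the header the compiler picked up?
    pacman -Qo /usr/include/ggml.h
    # compare the installed declaration with the call sites in the llama.cpp checkout
    grep -n -A 8 'ggml_ssm_scan' /usr/include/ggml.h

If the installed declaration lacks the extra ids argument that llama-model.cpp passes, rebuilding libggml against the current sources (as suggested above) should bring the two back in sync.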
arichiardi commented on 2025-07-01 17:16 (UTC)
Hi all, thanks for this package! Wondering if we can open it up to other build env vars.
I would like to build the web UI via LLAMA_BUILD_SERVER=1 - I know I can clone and change it myself, lazy question :)
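In the meantime, a rough sketch of the clone-and-edit route (assuming the upstream CMake flag is LLAMA_BUILD_SERVER and that the PKGBUILD's cmake options can simply be extended):

    git clone https://aur.archlinux.org/llama.cpp-cuda.git
    cd llama.cpp-cuda
    # add -DLLAMA_BUILD_SERVER=ON to the cmake call in the PKGBUILD's build() function
    $EDITOR PKGBUILD
    makepkg -si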
txtsd commented on 2025-06-16 11:54 (UTC)
This package now uses system libggml so it should work alongside whisper.cpp
Tests and examples building has been turned off.
kompute is removed.
JamesMowery commented on 2025-06-12 22:49 (UTC) (edited on 2025-06-12 22:51 (UTC) by JamesMowery)
Fails to build for me unfortunately. Nvidia RTX 4090:
-- Found CUDAToolkit: /opt/cuda/targets/x86_64-linux/include (found version "12.8.93")
-- CUDA Toolkit found
-- Using CUDA architectures: native
CMake Error at /usr/share/cmake/Modules/CMakeDetermineCompilerId.cmake:909 (message):
Compiling the CUDA compiler identification source file
"CMakeCUDACompilerId.cu" failed.
Compiler: /opt/cuda/bin/nvcc
Build flags:
Id flags: --keep;--keep-dir;tmp -v
The output was:
2
nvcc warning : Support for offline compilation for architectures prior to
'<compute/sm/lto>_75' will be removed in a future release (Use
-Wno-deprecated-gpu-targets to suppress warning).
#$ _NVVM_BRANCH_=nvvm
#$ _SPACE_=
#$ _CUDART_=cudart
#$ _HERE_=/opt/cuda/bin
#$ _THERE_=/opt/cuda/bin
#$ _TARGET_SIZE_=
#$ _TARGET_DIR_=
#$ _TARGET_DIR_=targets/x86_64-linux
#$ TOP=/opt/cuda/bin/..
#$ CICC_PATH=/opt/cuda/bin/../nvvm/bin
#$ NVVMIR_LIBRARY_DIR=/opt/cuda/bin/../nvvm/libdevice
#$ LD_LIBRARY_PATH=/opt/cuda/bin/../lib:/opt/cuda/lib64:/opt/cuda/lib64
#$
PATH=/opt/cuda/bin/../nvvm/bin:/opt/cuda/bin:/opt/cuda/bin:/opt/miniconda3/bin:/opt/cuda/bin:/home/james/miniforge3/condabin:/usr/local/bin:/usr/bin:/usr/bin/site_perl:/usr/bin/vendor_perl:/usr/bin/core_perl:/home/james/.lmstudio/bin:/home/james/.lmstudio/bin
#$ INCLUDES="-I/opt/cuda/bin/../targets/x86_64-linux/include"
#$ LIBRARIES= "-L/opt/cuda/bin/../targets/x86_64-linux/lib/stubs"
"-L/opt/cuda/bin/../targets/x86_64-linux/lib"
#$ CUDAFE_FLAGS=
#$ PTXAS_FLAGS=
#$ rm tmp/a_dlink.reg.c
#$ gcc -D__CUDA_ARCH_LIST__=520 -D__NV_LEGACY_LAUNCH -E -x c++ -D__CUDACC__
-D__NVCC__ "-I/opt/cuda/bin/../targets/x86_64-linux/include"
-D__CUDACC_VER_MAJOR__=12 -D__CUDACC_VER_MINOR__=8
-D__CUDACC_VER_BUILD__=93 -D__CUDA_API_VER_MAJOR__=12
-D__CUDA_API_VER_MINOR__=8 -D__NVCC_DIAG_PRAGMA_SUPPORT__=1
-D__CUDACC_DEVICE_ATOMIC_BUILTINS__=1 -include "cuda_runtime.h" -m64
"CMakeCUDACompilerId.cu" -o "tmp/CMakeCUDACompilerId.cpp4.ii"
...
...
...
/usr/include/c++/15.1.1/type_traits(3521): error: type name is not allowed
inline constexpr bool is_volatile_v = __is_volatile(_Tp);
^
/usr/include/c++/15.1.1/type_traits(3663): error: type name is not allowed
inline constexpr size_t rank_v = __array_rank(_Tp);
^
/usr/include/c++/15.1.1/bits/stl_algobase.h(1239): error: type name is not
allowed
|| __is_pointer(_ValueType1)
^
/usr/include/c++/15.1.1/bits/stl_algobase.h(1412): error: type name is not
allowed
&& __is_pointer(_II1) && __is_pointer(_II2)
^
/usr/include/c++/15.1.1/bits/stl_algobase.h(1412): error: type name is not
allowed
&& __is_pointer(_II1) && __is_pointer(_II2)
^
25 errors detected in the compilation of "CMakeCUDACompilerId.cu".
# --error 0x2 --
Call Stack (most recent call first):
/usr/share/cmake/Modules/CMakeDetermineCompilerId.cmake:8 (CMAKE_DETERMINE_COMPILER_ID_BUILD)
/usr/share/cmake/Modules/CMakeDetermineCompilerId.cmake:53 (__determine_compiler_id_test)
/usr/share/cmake/Modules/CMakeDetermineCUDACompiler.cmake:139 (CMAKE_DETERMINE_COMPILER_ID)
ggml/src/ggml-cuda/CMakeLists.txt:43 (enable_language)
-- Configuring incomplete, errors occurred!
==> ERROR: A failure occurred in build().
Aborting...
error: failed to build 'llama.cpp-cuda-f16-b5648-1':
error: packages failed to build: llama.cpp-cuda-f16-b5648-1
AI says this is the reason (not sure if true or not):
The error occurs because your CUDA toolkit version (12.8.93) is incompatible with your GCC version (15.1.1). CUDA has strict requirements for supported host compiler versions, and GCC 15 is too new for CUDA 12.x.
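If that is the cause, one common workaround (a sketch, assuming the gcc14 compatibility package is available in the repos and installs a versioned g++-14 binary) is to point the CUDA host compiler at an older GCC before rebuilding:

    sudo pacman -S gcc14
    # CMake reads CUDAHOSTCXX to choose the CUDA host compiler (CMAKE_CUDA_HOST_COMPILER)
    export CUDAHOSTCXX=/usr/bin/g++-14
    paru -S llama.cpp-cuda-f16 --rebuild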
i2z1 commented on 2025-03-02 11:37 (UTC) (edited on 2025-03-02 11:38 (UTC) by i2z1)
Why are you using Kompute, particularly from the fork (https://github.com/nomic-ai/kompute.git, a fork instead of the mainline https://github.com/KomputeProject/kompute repo), specifically for the llama.cpp CUDA version? I think it would be more convenient to create a separate package for the Kompute backend and have llama.cpp-cuda depend only on CUDA-related dependencies, without the Kompute backend dependencies.
chiz commented on 2025-02-23 10:23 (UTC)
llama.cpp-cuda: /usr/lib/libggml-base.so exists in the file system (owned by whisper.cpp-cuda)
llama.cpp-cuda: /usr/lib/libggml-cpu.so exists in the file system (owned by whisper.cpp-cuda)
llama.cpp-cuda: /usr/lib/libggml-cuda.so exists in the file system (owned by whisper.cpp-cuda)
llama.cpp-cuda: /usr/lib/libggml.so exists in the file system (owned by whisper.cpp-cuda)
An error occurred, and no packages were updated.
-> Error during installation: [/home/chi/.cache/yay/llama.cpp-cuda/llama.cpp-cuda-b4762-1-x86_64.pkg.tar.zst] - exit status 1
Pinned Comments
txtsd commented on 2024-10-26 20:17 (UTC) (edited on 2024-12-06 14:15 (UTC) by txtsd)
Alternate versions
llama.cpp
llama.cpp-vulkan
llama.cpp-sycl-fp16
llama.cpp-sycl-fp32
llama.cpp-cuda
llama.cpp-cuda-f16
llama.cpp-hip