Package Details: llama.cpp-cuda-git b9050.r5.8e52631d55-1

Git Clone URL: https://aur.archlinux.org/llama.cpp-cuda-git.git (read-only)
Package Base: llama.cpp-cuda-git
Description: Port of Facebook's LLaMA model in C/C++ (with NVIDIA CUDA optimizations)
Upstream URL: https://github.com/ggml-org/llama.cpp
Licenses: MIT
Conflicts: ggml, libggml, llama.cpp
Provides: ggml, libggml, libggml-cuda-git, libggml.so, llama.cpp
Submitter: Bink
Maintainer: Bink
Last Packager: Bink
Votes: 3
Popularity: 1.21
First Submitted: 2026-01-08 09:17 (UTC)
Last Updated: 2026-05-08 12:41 (UTC)

Dependencies (19)

Required by (7)

Sources (3)

Pinned Comments

Bink commented on 2026-04-20 01:25 (UTC)

The package now leverages ninja to ensure parallel builds regardless of makepkg.conf settings.
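A minimal sketch of what that build step looks like (flags here are illustrative, not lifted verbatim from the PKGBUILD):

build() {
  # Ninja picks its own job count (roughly nproc + 2) and ignores
  # MAKEFLAGS from makepkg.conf, so the build stays parallel.
  cmake -S llama.cpp -B build -G Ninja -DGGML_CUDA=ON
  cmake --build build
}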

If you have multiple NVIDIA GPUs, be sure to install the optional dependency nccl for multi-GPU parallelism.

To improve rebuild times, install the optional dependency ccache.
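Assuming you want both, they are available from the official repositories:

$ sudo pacman -S nccl ccache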

Latest Comments


jamesb6626 commented on 2026-05-08 12:37 (UTC) (edited on 2026-05-08 12:41 (UTC) by jamesb6626)

On the first try with makepkg -si, the build misses the build number and commit id (current commit a7b116537675, though this isn't a new issue).

The result is that e.g. llama-server --version confidently reports version: 0 (), with similar issues anywhere else a version is expected.

My current workaround is to interrupt the build partway through and then rerun it; somehow this picks up the correct values.

Sticking an echo into the PKGBUILD to check the variables _commit_id/_build_number shows they are set correctly in prepare(), but they have been forgotten and reset to blank/0 by the time build() runs.
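If it's cross-function shell state being lost, one pattern that should sidestep it is to persist the values to a file in prepare() and re-read them in build(). A sketch, not the actual PKGBUILD:

prepare() {
  cd llama.cpp
  # Record the values on disk; shell variables set here aren't
  # guaranteed to survive into later packaging functions.
  printf '%s %s\n' "$(git rev-list --count HEAD)" \
                   "$(git rev-parse --short=12 HEAD)" > "$srcdir/build-info"
}

build() {
  # Re-read the recorded values instead of trusting ambient shell state.
  read -r _build_number _commit_id < "$srcdir/build-info"
  echo "building b${_build_number} (${_commit_id})"
  # ...then pass them to cmake as the PKGBUILD already does...
}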

Anyone else getting this weird behaviour, or just me?

Bink commented on 2026-05-07 11:37 (UTC)

I'd been monitoring local performance too, and had wondered if that introduced a regression. Looks like I might have flown too close to the sun on that one. Thank you for providing a clear analysis @Saluu, I appreciate it.

I've now removed OpenBLAS.

Saluu commented on 2026-05-07 10:55 (UTC) (edited on 2026-05-07 10:57 (UTC) by Saluu)

OpenBLAS was added in commit bc45cc2 (3 May 2026). It causes a severe prompt-processing regression for any model that partially offloads layers to CPU (ngl < all layers). On a 14B Q4_K_M model with 36/49 layers on GPU, pp1024 drops from ~1450 t/s to ~120 t/s, roughly 12x slower.
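For anyone reproducing this, numbers like the above can be taken with llama-bench from this package (the model path and layer count below are placeholders matching my setup):

$ llama-bench -m ./model-14b-q4_k_m.gguf -ngl 36 -p 1024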

The BLAS backend registers before the native CPU backend in ggml's scheduler (ggml-backend-reg.cpp). It claims MUL_MAT ops for CPU-side layers, forcing a dequantize-to-F32 + cblas_sgemm path instead of the native backend's optimized quantized vec_dot kernels. This is fundamentally slower for LLM inference on quantized models.

This is a known issue: https://github.com/ggml-org/llama.cpp/issues/5986

BLAS can benefit pure-CPU F32 inference, but for a CUDA package the common use case is partial GPU offload with quantized models, where BLAS is harmful. Making BLAS opt-in rather than the default would avoid this regression for most users while keeping it available for those who need it.

Suggested fix: remove -DGGML_BLAS=ON, -DGGML_BLAS_VENDOR=OpenBLAS, and the openblas dependency, or make them conditional on an opt-in variable (like the existing aur_llamacpp_build_universal pattern).
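A sketch of the opt-in variant, with aur_llamacpp_enable_blas as a hypothetical variable name following that pattern:

# BLAS stays off unless the user explicitly opts in at makepkg time.
_cmake_args=(-DGGML_CUDA=ON)
if [[ "${aur_llamacpp_enable_blas:-0}" == 1 ]]; then
  _cmake_args+=(-DGGML_BLAS=ON -DGGML_BLAS_VENDOR=OpenBLAS)
fi
cmake -B build -S llama.cpp "${_cmake_args[@]}"

which users would enable with: aur_llamacpp_enable_blas=1 makepkg -si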

Workaround: aur_llamacpp_cmakeopts="-DGGML_BLAS=OFF" makepkg -si

aydintb commented on 2026-05-04 08:04 (UTC)

The dependency build of gcc15 fails.

aydintb commented on 2026-05-04 08:03 (UTC)

gcc/lto1 differs
make[2]: *** [Makefile:25123: compare] Error 1
make[1]: *** [Makefile:25103: stage3-bubble] Error 2
make: *** [Makefile:25166: bootstrap] Error 2
==> ERROR: A failure occurred in build().
    Aborting...
-> error making: gcc15-exit status 4
-> nothing to install for gcc15
==> Making package: llama.cpp-cuda-git b9010.r0.d05fe1d7da-2 (Pzt 04 May 2026 11:01:23)
==> Checking runtime dependencies...
==> Checking buildtime dependencies...
==> Missing dependencies:
-> gcc15
==> ERROR: Could not resolve all dependencies.
-> error making: llama.cpp-cuda-git-exit status 8
-> Failed to install the following packages. Manual intervention is required:
gcc15 - exit status 4
llama.cpp-cuda-git - exit status 8

ultramango commented on 2026-05-03 16:31 (UTC)

@Bink, thanks for the update! I confirm it now compiles without any tricks.

Bink commented on 2026-05-03 04:56 (UTC)

@ultramango, it should now be working again, with the specified gcc15 make dependency. Unfortunately, for now, this means gcc15 needs to be compiled, which takes a long time. I suggest keeping gcc15 installed so you don't have to do that on every update.
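One way to keep it around is to mark it explicitly installed, so routine orphan cleanup won't remove it:

$ sudo pacman -D --asexplicit gcc15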

Hopefully GCC 16 support is implemented upstream soon.

Bink commented on 2026-05-03 02:59 (UTC)

The recent version bump of gcc from gcc15 to gcc16 has tripped this up. I'm still reviewing options. Ideally it would keep using gcc15 for maximum optimisation, but gcc15 doesn't yet have an AUR package.

ultramango commented on 2026-05-02 20:38 (UTC) (edited on 2026-05-02 20:54 (UTC) by ultramango)

In case you encounter nvcc/gcc compilation errors (/usr/include/c++/16.1.1/type_traits(1448): error: identifier "__f" is undefined):

# Note: you need GCC 13 installed
$ export CUDAHOSTCXX=/usr/bin/g++-13
$ makepkg

Your gcc may be too new for nvcc (16 as of writing this comment).