Arch Linux User Repository

Package Details: llama.cpp-cuda b10099-1

Git Clone URL:	https://aur.archlinux.org/llama.cpp-cuda.git (read-only)
Package Base:	llama.cpp-cuda
Description:	Port of Facebook's LLaMA model in C/C++ (with NVIDIA CUDA optimizations)
Upstream URL:	https://github.com/ggml-org/llama.cpp
Licenses:	MIT
Conflicts:	ggml, libggml, llama.cpp
Provides:	llama.cpp
Submitter:	txtsd
Maintainer:	fabse
Last Packager:	fabse
Votes:	22
Popularity:	2.97
First Submitted:	2024-10-26 20:17 (UTC)
Last Updated:	2026-07-24 01:13 (UTC)

Dependencies (19)

cuda (cuda11.1^AUR, cuda-12.2^AUR, cuda12.0^AUR, cuda11.4^AUR, cuda-12.5^AUR, cuda-12.8^AUR, cuda-pascal^AUR, cuda-12.9^AUR)
curl (curl-git^AUR, curl-c-ares^AUR)
gcc-libs (gcc-libs-git^AUR, gcc-libs-fast-optimized^AUR, gccrs-libs-git^AUR, gcc-libs-snapshot^AUR)
glibc (glibc-git^AUR, glibc-git-native-pgo^AUR, glibc-eac^AUR)
nvidia-utils (nvidia-410xx-utils^AUR, nvidia-440xx-utils^AUR, nvidia-430xx-utils^AUR, nvidia-340xx-utils^AUR, nvidia-510xx-utils^AUR, nvidia-utils-tesla^AUR, nvidia-575xx-utils^AUR, nvidia-340xx-utils-macbook^AUR, nvidia-535xx-utils^AUR, nvidia-470xx-utils^AUR, nvidia-390xx-utils^AUR, nvidia-550xx-utils^AUR, nvidia-525xx-utils^AUR, nvidia-580xx-utils^AUR, nvidia-vulkan-utils^AUR, nvidia-utils-beta^AUR)
python
cmake (cmake3^AUR, cmake-git^AUR) (make)
cudnn (cudnn9.10-cuda12.9^AUR, cudnn-pascal^AUR) (make)
git (git-git^AUR, git-gl^AUR, git-wd40^AUR) (make)
ninja (ninja-git^AUR, ninja-mem^AUR, ninja-noemacs-git^AUR, ninja-kitware^AUR, ninja-fuchsia-git^AUR, n2-git^AUR) (make)
npm (npm-corepack^AUR, python-nodejs-wheel^AUR) (make)
shaderc (shaderc-git^AUR) (make)
nccl (nccl-git^AUR, nccl-cuda12.9^AUR) (optional) – needed for multi-GPU parallelism
python-gguf^AUR (python-gguf-git^AUR) (optional) – needed for convert_hf_to_gguf.py
python-numpy (python-numpy-git^AUR, python-numpy-mkl-bin^AUR, python-numpy1^AUR, python-numpy-mkl-tbb^AUR, python-numpy-mkl^AUR) (optional) – needed for convert_hf_to_gguf.py
python-pytorch (python-pytorch-cuda12.9^AUR, python-pytorch-opt-cuda12.9^AUR, python-pytorch-cuda, python-pytorch-opt, python-pytorch-opt-cuda, python-pytorch-opt-rocm, python-pytorch-opt-xpu, python-pytorch-rocm, python-pytorch-xpu) (optional) – needed for convert_hf_to_gguf.py
python-safetensors (optional) – needed for convert_hf_to_gguf.py
python-sentencepiece^AUR (python-sentencepiece-git^AUR, python-sentencepiece-bin^AUR) (optional) – needed for convert_hf_to_gguf.py
python-transformers^AUR (python-transformers-git^AUR) (optional) – needed for convert_hf_to_gguf.py

Required by (13)

assistd (requires llama.cpp) (optional)
assistd-git (requires llama.cpp) (optional)
llamaman-bin (requires llama.cpp) (optional)
llamastash (requires llama.cpp) (optional)
llamastash-bin (requires llama.cpp) (optional)
llamastash-git (requires llama.cpp) (optional)
manboster (requires llama.cpp) (optional)
manboster-bin (requires llama.cpp) (optional)
manboster-git (requires llama.cpp) (optional)
scmd-bin (requires llama.cpp)
voxd (requires llama.cpp) (optional)
voxd-bin (requires llama.cpp) (optional)
voxd-git (requires llama.cpp) (optional)

Sources (3)

Pinned Comments

txtsd commented on 2024-10-26 20:17 (UTC) (edited on 2024-12-06 14:15 (UTC) by txtsd)

Alternate versions

llama.cpp
llama.cpp-vulkan
llama.cpp-sycl-fp16
llama.cpp-sycl-fp32
llama.cpp-cuda
llama.cpp-cuda-f16
llama.cpp-hip

Latest Comments

envolution commented on 2025-08-27 07:29 (UTC) (edited on 2025-08-27 07:31 (UTC) by envolution)

@undefinedmethod - the changes don't affect any of the build flags unless explicitly enabled, so this is possibly due to a recent upstream commit, perhaps https://github.com/ggml-org/llama.cpp/pull/15587

I'd need to see the output of makepkg -L, which creates a build log showing what was detected at build time. You can link to it using a pastebin-type service, or post it at https://github.com/envolution/aur/issues. For now you can try running aur_llamacpp_build_universal=true makepkg -fsi to build the default CUDA architectures. I also have prebuilds at https://github.com/envolution/aur/releases/tag/llama.cpp-cuda if you'd like to try those, as they already have the universal flag set.
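The two suggestions above can be combined into a single run. The following is a minimal sketch, assuming the current directory contains the package's PKGBUILD and that makepkg writes its logs there (the default unless LOGDEST in makepkg.conf points elsewhere). Passing -L together with -fsi in one invocation is an assumption rather than something the comment spells out, and the guard simply skips the build on systems without makepkg.

```shell
# Sketch: rebuild with the universal flag and keep a build log.
# Assumption: run from the directory that holds the llama.cpp-cuda PKGBUILD.
universal_flag='aur_llamacpp_build_universal=true'  # variable named in the comment
makepkg_args='-fsi -L'                              # -L enables makepkg build logging

echo "would run: $universal_flag makepkg $makepkg_args"

# Only start the build where makepkg and a PKGBUILD are actually present.
if command -v makepkg >/dev/null 2>&1 && [ -f PKGBUILD ]; then
    # $makepkg_args is left unquoted on purpose so it splits into two options.
    env "$universal_flag" makepkg $makepkg_args
    ls -1 ./*.log  # log files to share via a pastebin or the issue tracker
fi
```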

undefinedmethod commented on 2025-08-27 07:16 (UTC)

@Davidyz not sure if this has to do with the flags recently added to the build process, but llama-server no longer detects the GPU:

ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 4060, compute capability 8.9, VMM: yes
/home/undefinedmethod/.cache/yay/llama.cpp-cuda/src/llama.cpp/ggml/src/ggml-cuda/../ggml-cuda/common.cuh:112: ggml was not compiled with any CUDA arch <= 750

Tested on separate machines: 4070 Super, 5070
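The log reports compute capability 8.9 for the RTX 4060. As a rough sketch, that value can be turned into the number CMake expects in CMAKE_CUDA_ARCHITECTURES by dropping the dot. The helper name cc_to_arch is hypothetical, and whether this package's build script honours an architecture passed this way is an assumption, not something stated in the thread.

```shell
# Hypothetical helper: convert a compute capability such as "8.9"
# into a CMake CUDA architecture number such as "89".
cc_to_arch() {
    printf '%s\n' "$1" | tr -d '.'
}

# Compute capability taken from the log output above (RTX 4060).
arch=$(cc_to_arch 8.9)
echo "-DCMAKE_CUDA_ARCHITECTURES=$arch"

# On a machine with a recent NVIDIA driver, the capability can be queried with:
#   nvidia-smi --query-gpu=name,compute_cap --format=csv,noheader
```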

envolution commented on 2025-08-26 10:23 (UTC)

Davidyz awesome, thanks for the info

Davidyz commented on 2025-08-26 10:01 (UTC)

Just came across this PR: CUDA: replace GGML_CUDA_F16 with CUDA arch checks

Davidyz commented on 2025-08-26 08:34 (UTC)

@envolution thanks for the quick response. That makes sense. It'll indeed be a good idea to just allow customizing extra build opts from an env var.

envolution commented on 2025-08-26 08:12 (UTC) (edited on 2025-08-26 08:13 (UTC) by envolution)

@Davidyz it'll update shortly with b6280 (just build testing). I've added aur_llamacpp_cmakeopts to the build script. You can prepend it to makepkg, like aur_llamacpp_cmakeopts="-DGGML_CUDA_FA_ALL_QUANTS=ON" makepkg, or just add it to ~/.bashrc or ~/.bash_profile. It should work fine with helpers too.
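If the variable goes into ~/.bashrc, it needs to be exported so that makepkg, which runs as a child process of the shell, can see it. A minimal fragment, assuming bash is the login shell and reusing the option from the comment above:

```shell
# ~/.bashrc or ~/.bash_profile
# export makes the variable visible to makepkg and to AUR helpers started from this shell
export aur_llamacpp_cmakeopts="-DGGML_CUDA_FA_ALL_QUANTS=ON"
```

For a one-off build, prefixing the makepkg command with the assignment, as in the comment, has the same effect without touching the shell configuration.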

envolution commented on 2025-08-26 07:41 (UTC)

@Davidyz I'm open to it, but I'd be more inclined to add it as an optional environment variable, similar to how the recipe sets GGML_NATIVE. Let me think on it, as I could possibly extend it to support multiple customised options that may deviate from standard builds.
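One way such opt-in variables could be wired into a build recipe is sketched below. This is not the package's actual PKGBUILD: the function name build_cmake_args and the mapping of aur_llamacpp_build_universal to GGML_NATIVE are assumptions made for illustration. Only the two variable names come from this thread.

```shell
# Hypothetical helper, NOT the package's real build logic.
# Builds a CMake argument string from optional environment variables.
build_cmake_args() {
    args='-DGGML_CUDA=ON'

    # Assumed mapping: a "universal" build turns off host-specific tuning.
    if [ "${aur_llamacpp_build_universal:-false}" = true ]; then
        args="$args -DGGML_NATIVE=OFF"
    else
        args="$args -DGGML_NATIVE=ON"
    fi

    # Append user-supplied CMake options only when the variable is non-empty.
    if [ -n "${aur_llamacpp_cmakeopts:-}" ]; then
        args="$args $aur_llamacpp_cmakeopts"
    fi

    printf '%s\n' "$args"
}

# Default build: neither variable set by the user.
( unset aur_llamacpp_build_universal aur_llamacpp_cmakeopts; build_cmake_args )

# Opt-in build with one extra option.
( aur_llamacpp_cmakeopts='-DGGML_CUDA_FA_ALL_QUANTS=ON'; build_cmake_args )
```

Keeping every deviation behind a variable that defaults to "off" means a plain makepkg run still produces the standard build.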

Davidyz commented on 2025-08-26 07:34 (UTC)

Hi, thanks for submitting this package (and llama.cpp-cuda-f16 too)!

Is it possible to add GGML_CUDA_FA_ALL_QUANTS=ON to the build options (for this package and llama.cpp-cuda-f16)? This option gives more flexible KV cache quantization options (and combinations). The build time is indeed longer, but I personally don't think it's too bad. It'll be a nice addition for users seeking lower VRAM usage.
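For context, mixed KV cache types are selected at run time with llama-server's --cache-type-k and --cache-type-v options. The sketch below only assembles and prints such a command. The model path is a placeholder, the q8_0/q4_0 pairing is an arbitrary example, and it assumes a build with GGML_CUDA_FA_ALL_QUANTS=ON and flash attention active (how flash attention is enabled differs between llama.cpp versions).

```shell
# Sketch: mixed KV cache quantization with llama-server.
model_path='./model.gguf'  # placeholder path, not from the thread
cache_k='q8_0'             # type used for the K cache
cache_v='q4_0'             # type used for the V cache

# Print the command instead of running it, since the model path is a placeholder.
echo "llama-server -m $model_path --cache-type-k $cache_k --cache-type-v $cache_v"
```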

JamesMowery commented on 2025-08-02 01:27 (UTC) (edited on 2025-08-02 01:28 (UTC) by JamesMowery)

Just wanted to say thank you for getting this working. It's finally building now without me having to edit the PKGBUILD every time to force it to point to gcc-13. :)
