Package Details: llama.cpp-cuda b9305-1
| Git Clone URL: | https://aur.archlinux.org/llama.cpp-cuda.git (read-only) |
|---|---|
| Package Base: | llama.cpp-cuda |
| Description: | Port of Facebook's LLaMA model in C/C++ (with NVIDIA CUDA optimizations) |
| Upstream URL: | https://github.com/ggml-org/llama.cpp |
| Licenses: | MIT |
| Conflicts: | ggml, libggml, llama.cpp |
| Provides: | llama.cpp |
| Submitter: | txtsd |
| Maintainer: | fabse |
| Last Packager: | fabse |
| Votes: | 17 |
| Popularity: | 1.77 |
| First Submitted: | 2024-10-26 20:17 (UTC) |
| Last Updated: | 2026-05-24 12:33 (UTC) |
Dependencies (18)
- cuda (cuda11.1AUR, cuda-12.2AUR, cuda12.0AUR, cuda11.4AUR, cuda-12.5AUR, cuda-12.9AUR, cuda-12.8AUR, cuda-pascalAUR)
- curl (curl-gitAUR, curl-c-aresAUR)
- gcc-libs (gcc-libs-gitAUR, gccrs-libs-gitAUR, gcc-libs-snapshotAUR)
- glibc (glibc-gitAUR, glibc-eacAUR, glibc-git-native-pgoAUR)
- nvidia-utils (nvidia-410xx-utilsAUR, nvidia-440xx-utilsAUR, nvidia-430xx-utilsAUR, nvidia-340xx-utilsAUR, nvidia-510xx-utilsAUR, nvidia-utils-teslaAUR, nvidia-575xx-utilsAUR, nvidia-340xx-utils-macbookAUR, nvidia-535xx-utilsAUR, nvidia-utils-betaAUR, nvidia-470xx-utilsAUR, nvidia-390xx-utilsAUR, nvidia-550xx-utilsAUR, nvidia-580xx-utilsAUR, nvidia-vulkan-utilsAUR, nvidia-525xx-utilsAUR)
- python
- cmake (cmake3AUR, cmake-gitAUR) (make)
- cudnn (cudnn9.10-cuda12.9AUR, cudnn-pascalAUR) (make)
- git (git-gitAUR, git-glAUR, git-wd40AUR) (make)
- ninja (ninja-gitAUR, ninja-memAUR, ninja-noemacs-gitAUR, ninja-kitwareAUR, ninja-fuchsia-gitAUR, n2-ninja-symlinkAUR) (make)
- shaderc (shaderc-gitAUR) (make)
- nccl (nccl-cuda12.9AUR, nccl-gitAUR) (optional) – needed for multi-GPU parallelism
- python-ggufAUR (python-gguf-gitAUR) (optional) – needed for convert_hf_to_gguf.py
- python-numpy (python-numpy-gitAUR, python-numpy-mkl-binAUR, python-numpy1AUR, python-numpy-mkl-tbbAUR, python-numpy-mklAUR) (optional) – needed for convert_hf_to_gguf.py
- python-pytorch (python-pytorch-cuda12.9AUR, python-pytorch-opt-cuda12.9AUR, python-pytorch-cuda, python-pytorch-opt, python-pytorch-opt-cuda, python-pytorch-opt-rocm, python-pytorch-rocm) (optional) – needed for convert_hf_to_gguf.py
- python-safetensors (optional) – needed for convert_hf_to_gguf.py
- python-sentencepieceAUR (python-sentencepiece-gitAUR, python-sentencepiece-binAUR) (optional) – needed for convert_hf_to_gguf.py
- python-transformersAUR (python-transformers-gitAUR) (optional) – needed for convert_hf_to_gguf.py
Required by (5)
- llamaman-bin (requires llama.cpp) (optional)
- scmd-bin (requires llama.cpp)
- voxd (requires llama.cpp) (optional)
- voxd-bin (requires llama.cpp) (optional)
- voxd-git (requires llama.cpp) (optional)
Sources (3)
Latest Comments
envolution commented on 2025-08-26 10:23 (UTC)
Davidyz commented on 2025-08-26 10:01 (UTC)
Just came across this PR: CUDA: replace GGML_CUDA_F16 with CUDA arch checks
Davidyz commented on 2025-08-26 08:34 (UTC)
@envolution thanks for the quick response. That makes sense. It'll indeed be a good idea to just allow customizing extra build opts from an env var.
envolution commented on 2025-08-26 08:12 (UTC) (edited on 2025-08-26 08:13 (UTC) by envolution)
@Davidyz it'll update shortly with b6280 (just build testing). I've added aur_llamacpp_cmakeopts to the build script. You can prepend it to makepkg, like aur_llamacpp_cmakeopts="-DGGML_CUDA_FA_ALL_QUANTS=ON" makepkg, or just add it to ~/.bashrc or ~/.bash_profile. It should work fine with helpers too.
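For readers wondering how an environment variable like aur_llamacpp_cmakeopts can reach the CMake invocation, here is a minimal sketch of one way a build script might do it. It is not taken from the actual PKGBUILD: the function name build_cmake_opts and the baseline option list are hypothetical and only illustrate the pattern of appending user-supplied options to a default set.

```shell
#!/bin/sh
# Hypothetical helper (not from the real PKGBUILD): prints the CMake
# options one per line, with any user-supplied extras appended last.
build_cmake_opts() {
    # Illustrative baseline; the real build script's defaults may differ.
    set -- -DGGML_CUDA=ON

    # Intentionally unquoted so that several space-separated options
    # in the variable are split into separate arguments.
    # An unset or empty variable contributes nothing.
    # shellcheck disable=SC2086
    set -- "$@" ${aur_llamacpp_cmakeopts:-}

    printf '%s\n' "$@"
}
```

With the variable set as in the comment above, the helper prints the baseline option followed by -DGGML_CUDA_FA_ALL_QUANTS=ON. With the variable unset, only the baseline is printed. To make the setting persistent, a line such as export aur_llamacpp_cmakeopts="-DGGML_CUDA_FA_ALL_QUANTS=ON" in ~/.bashrc would serve the same purpose.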
envolution commented on 2025-08-26 07:41 (UTC)
@Davidyz I'm open to it, but I'd be more inclined to add it as an optional environment variable, similar to how the recipe sets GGML_NATIVE. Let me think on it, as I could possibly extend it to support multiple customised options that may deviate from standard builds.
Davidyz commented on 2025-08-26 07:34 (UTC)
Hi, thanks for submitting this package (and llama.cpp-cuda-f16 too)!
Is it possible to add GGML_CUDA_FA_ALL_QUANTS=ON to the build options (for this package and llama.cpp-cuda-f16)? This option gives more flexible KV cache quantization options (and combinations). The build time is indeed longer, but I personally don't think it's too bad. It'll be a nice addition for users seeking lower VRAM usage.
JamesMowery commented on 2025-08-02 01:27 (UTC) (edited on 2025-08-02 01:28 (UTC) by JamesMowery)
Just wanted to say thank you for getting this working. It's finally building now without me having to edit the PKGBUILD every time to force it to point to gcc-13. :)
envolution commented on 2025-08-01 21:46 (UTC)
@AlxQ should be okay now - thanks for the report
envolution commented on 2025-08-01 19:48 (UTC)
@AlxQ it's picking up the commit from the aur repo somehow - I can have a look into this later today.
.git/logs/refs/remotes/origin/HEAD:0000000000000000000000000000000000000000 9ca426da1fdf involution involution@gmail.com 1754071863 -0400 clone: from ssh://aur.archlinux.org/llama.cpp-cuda.git
It's definitely the right upstream code - just not properly reflected in the version strings
AlxQ commented on 2025-08-01 16:30 (UTC)
The current PKGBUILD only produces a b982-9ca426d build:
$ llama-server --version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 4060 Ti, compute capability 8.9, VMM: yes
version: 982 (9ca426d)
built with cc (GCC) 15.1.1 20250729 for x86_64-pc-linux-gnu
Pinned Comments
txtsd commented on 2024-10-26 20:17 (UTC) (edited on 2024-12-06 14:15 (UTC) by txtsd)
Alternate versions
llama.cpp
llama.cpp-vulkan
llama.cpp-sycl-fp16
llama.cpp-sycl-fp32
llama.cpp-cuda
llama.cpp-cuda-f16
llama.cpp-hip