Package Details: llama.cpp-cuda b9305-1

Git Clone URL: https://aur.archlinux.org/llama.cpp-cuda.git
Package Base: llama.cpp-cuda
Description: Port of Facebook's LLaMA model in C/C++ (with NVIDIA CUDA optimizations)
Upstream URL: https://github.com/ggml-org/llama.cpp
Licenses: MIT
Conflicts: ggml, libggml, llama.cpp
Provides: llama.cpp
Submitter: txtsd
Maintainer: fabse
Last Packager: fabse
Votes: 17
Popularity: 1.77
First Submitted: 2024-10-26 20:17 (UTC)
Last Updated: 2026-05-24 12:33 (UTC)

Dependencies (18)

Required by (5)

Sources (3)

Pinned Comments

txtsd commented on 2024-10-26 20:17 (UTC) (edited on 2024-12-06 14:15 (UTC) by txtsd)

Alternate versions

llama.cpp
llama.cpp-vulkan
llama.cpp-sycl-fp16
llama.cpp-sycl-fp32
llama.cpp-cuda
llama.cpp-cuda-f16
llama.cpp-hip

Latest Comments


envolution commented on 2025-08-26 10:23 (UTC)

@Davidyz awesome, thanks for the info

Davidyz commented on 2025-08-26 10:01 (UTC)

Just came across this PR: "CUDA: replace GGML_CUDA_F16 with CUDA arch checks"

Davidyz commented on 2025-08-26 08:34 (UTC)

@envolution thanks for the quick response. That makes sense. It'll indeed be a good idea to just allow customizing extra build opts from an env var.

envolution commented on 2025-08-26 08:12 (UTC) (edited on 2025-08-26 08:13 (UTC) by envolution)

@Davidyz it'll update shortly with b6280 (just build testing). I've added aur_llamacpp_cmakeopts to the build script. You can set it on the makepkg command line, like aur_llamacpp_cmakeopts="-DGGML_CUDA_FA_ALL_QUANTS=ON" makepkg, or export it from ~/.bashrc or ~/.bash_profile. It should work fine with helpers too.
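A minimal sketch of how a build script could fold such a variable into its CMake arguments. The function name `compose_cmake_args` and the two base flags are assumptions made for illustration, not the contents of the actual PKGBUILD; only `aur_llamacpp_cmakeopts` and `-DGGML_CUDA_FA_ALL_QUANTS=ON` come from the comment above.

```shell
# Hypothetical helper: prints the CMake arguments a build script might pass.
# The two base flags are illustrative assumptions, not the real PKGBUILD's list.
compose_cmake_args() {
  # Positional parameters are local to the function, so this is safe to reuse.
  set -- -DGGML_CUDA=ON -DCMAKE_BUILD_TYPE=Release
  # Intentionally unquoted so several space-separated options split into words.
  # shellcheck disable=SC2086
  set -- "$@" ${aur_llamacpp_cmakeopts:-}
  # One argument per line, base flags first, user options last.
  printf '%s\n' "$@"
}
```

With `aur_llamacpp_cmakeopts` unset, the function prints only the base flags; with it set to one or more space-separated `-D...` options, those are appended after the base flags, so a user option can take precedence over an earlier default of the same name.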

envolution commented on 2025-08-26 07:41 (UTC)

@Davidyz I'm open to it, but I'd be more inclined to add it as an optional environment variable, similar to how the recipe sets GGML_NATIVE. Let me think on it, as I could possibly extend it to support multiple customised options that may deviate from standard builds.

Davidyz commented on 2025-08-26 07:34 (UTC)

Hi, thanks for submitting this package (and llama.cpp-cuda-f16 too)!

Is it possible to add GGML_CUDA_FA_ALL_QUANTS=ON to the build options (for this package and llama.cpp-cuda-f16)? This option gives more flexible KV cache quantization options (and combinations). The build time is indeed longer, but I personally don't think it's too bad. It'll be a nice addition for users seeking lower VRAM usage.
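For context, a sketch of the kind of invocation this option is meant to help with. The helper name `kv_cache_cmd` and the model path are placeholders; `--cache-type-k` and `--cache-type-v` are the llama.cpp options for choosing the K and V cache types. The helper only prints the command, since whether a given K/V combination actually runs depends on how the binary was built. A quantized V cache also needs flash attention enabled; the spelling of that flag has varied between releases, so check `llama-server --help`.

```shell
# Hypothetical helper: prints (does not run) a llama-server command line that
# requests separately quantized K and V caches.
#   $1: path to a GGUF model (placeholder supplied by the caller)
#   $2: K cache type, e.g. q8_0
#   $3: V cache type, e.g. q4_0
kv_cache_cmd() {
  printf 'llama-server -m %s --cache-type-k %s --cache-type-v %s\n' \
    "$1" "$2" "$3"
}

# Example: a mixed K/V combination.
kv_cache_cmd ./model.gguf q8_0 q4_0
```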

JamesMowery commented on 2025-08-02 01:27 (UTC) (edited on 2025-08-02 01:28 (UTC) by JamesMowery)

Just wanted to say thank you for getting this working. It's finally building now without me having to edit the PKGBUILD every time to force it to point to gcc-13. :)

envolution commented on 2025-08-01 21:46 (UTC)

@AlxQ should be okay now - thanks for the report

envolution commented on 2025-08-01 19:48 (UTC)

@AlxQ it's picking up the commit from the aur repo somehow - I can have a look into this later today.

.git/logs/refs/remotes/origin/HEAD:0000000000000000000000000000000000000000 9ca426da1fdf involution involution@gmail.com 1754071863 -0400 clone: from ssh://aur.archlinux.org/llama.cpp-cuda.git

It's definitely the right upstream code - just not properly reflected in the version strings

AlxQ commented on 2025-08-01 16:30 (UTC)

The current PKGBUILD only produces a b982-9ca426d build:

$ llama-server --version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 4060 Ti, compute capability 8.9, VMM: yes
version: 982 (9ca426d)
built with cc (GCC) 15.1.1 20250729 for x86_64-pc-linux-gnu
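A mismatch like this can be caught with a small check after installing. The sketch below assumes the `version: N (hash)` line format shown in the output above and a package version of the form `bN-pkgrel`; the function names `build_number` and `matches_pkgver` are hypothetical.

```shell
# Hypothetical check: compare the build number reported by the binary with
# the number embedded in the package version.

# Reads version output on stdin, prints N from the "version: N (hash)" line.
build_number() {
  sed -n 's/^version: \([0-9][0-9]*\) (.*$/\1/p'
}

# $1: package version such as b9305-1; stdin: version output.
# Succeeds only when the reported build number equals the packaged one.
matches_pkgver() {
  expected=${1#b}            # drop the leading "b"
  expected=${expected%%-*}   # drop the "-pkgrel" suffix
  [ "$(build_number)" = "$expected" ]
}
```

Usage would look like `llama-server --version 2>&1 | matches_pkgver b9305-1 || echo "version mismatch"`, with the package version taken from `pacman -Q llama.cpp-cuda`.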