Package Details: llama.cpp-cuda b6774-1
Git Clone URL: https://aur.archlinux.org/llama.cpp-cuda.git (see the install example below)
Package Base: llama.cpp-cuda
Description: Port of Facebook's LLaMA model in C/C++ (with NVIDIA CUDA optimizations)
Upstream URL: https://github.com/ggerganov/llama.cpp
Licenses: MIT
Conflicts: ggml, libggml, llama.cpp
Provides: llama.cpp
Replaces: llama.cpp-cuda-f16
Submitter: txtsd
Maintainer: envolution
Last Packager: envolution
Votes: 11
Popularity: 1.73
First Submitted: 2024-10-26 20:17 (UTC)
Last Updated: 2025-10-16 01:49 (UTC)
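For anyone new to the AUR, a typical manual install using the clone URL above looks like the following (a minimal sketch; it assumes git and the base-devel group are already installed):

# clone the AUR recipe, then build and install it, pulling repo dependencies via pacman
git clone https://aur.archlinux.org/llama.cpp-cuda.git
cd llama.cpp-cuda
makepkg -si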
Dependencies (11)
- cuda (cuda11.1 [AUR], cuda-12.2 [AUR], cuda12.0 [AUR], cuda11.4 [AUR], cuda11.4-versioned [AUR], cuda12.0-versioned [AUR], cuda-12.5 [AUR])
- curl (curl-git [AUR], curl-c-ares [AUR])
- gcc-libs (gcc-libs-git [AUR], gccrs-libs-git [AUR], gcc-libs-snapshot [AUR])
- glibc (glibc-git [AUR], glibc-eac [AUR])
- nvidia-utils (nvidia-410xx-utils [AUR], nvidia-440xx-utils [AUR], nvidia-430xx-utils [AUR], nvidia-340xx-utils [AUR], nvidia-525xx-utils [AUR], nvidia-510xx-utils [AUR], nvidia-535xx-utils [AUR], nvidia-utils-tesla [AUR], nvidia-470xx-utils [AUR], nvidia-550xx-utils [AUR], nvidia-utils-beta [AUR], nvidia-vulkan-utils [AUR], nvidia-390xx-utils [AUR])
- cmake (cmake3 [AUR], cmake-git [AUR]) (make)
- python-numpy (python-numpy-git [AUR], python-numpy1 [AUR], python-numpy-mkl-bin [AUR], python-numpy-mkl [AUR], python-numpy-mkl-tbb [AUR]) (optional) – needed for convert_hf_to_gguf.py
- python-pytorch (python-pytorch-cxx11abi [AUR], python-pytorch-cxx11abi-opt [AUR], python-pytorch-cxx11abi-cuda [AUR], python-pytorch-cxx11abi-opt-cuda [AUR], python-pytorch-cxx11abi-rocm [AUR], python-pytorch-cxx11abi-opt-rocm [AUR], python-pytorch-cuda, python-pytorch-opt, python-pytorch-opt-cuda, python-pytorch-opt-rocm, python-pytorch-rocm) (optional) – needed for convert_hf_to_gguf.py
- python-safetensors [AUR] (python-safetensors-bin [AUR]) (optional) – needed for convert_hf_to_gguf.py
- python-sentencepiece [AUR] (python-sentencepiece-git [AUR], python-sentencepiece-bin [AUR]) (optional) – needed for convert_hf_to_gguf.py
- python-transformers [AUR] (optional) – needed for convert_hf_to_gguf.py (see the example below)
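The optional Python dependencies above are only needed for converting Hugging Face checkpoints to GGUF with convert_hf_to_gguf.py. A hedged sketch of that kind of invocation - the model path, output name, and where the script lives are illustrative, so check the script's --help:

# convert a local Hugging Face model directory to a GGUF file (paths illustrative)
python convert_hf_to_gguf.py /path/to/hf-model --outfile model-f16.gguf --outtype f16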
Required by (0)
Sources (3)
envolution commented on 2025-10-02 08:12 (UTC)
@sserhii you may want to manage that in your /etc/makepkg.conf - MAKEFLAGS
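For context, MAKEFLAGS in /etc/makepkg.conf is the standard place to enable parallel builds for every package you build with makepkg; a minimal sketch, assuming you want to use all available cores:

# /etc/makepkg.conf
MAKEFLAGS="-j$(nproc)"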
sserhii commented on 2025-10-01 23:45 (UTC) (edited on 2025-10-01 23:46 (UTC) by sserhii)
Hey,
Any chance to parallelize the build like this:
❯ git diff
diff --git a/PKGBUILD b/PKGBUILD
index 0a18397..056a382 100644
--- a/PKGBUILD
+++ b/PKGBUILD
@@ -96,7 +96,7 @@ build() {
     _cmake_options+=(${aur_llamacpp_cmakeopts})
   fi
   cmake "${_cmake_options[@]}"
-  cmake --build build
+  cmake --build build -j"$(nproc)"
 }
It provides a noticeable speed-up, at least for me: the build process takes about 2.5 minutes with the patch applied vs. about 15 minutes without.
Thanks!
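A patch-free alternative with the same effect, assuming CMake 3.12 or newer (which honors this environment variable for cmake --build), is to set the parallel level when invoking makepkg:

# build with all cores without editing the PKGBUILD (environment variables pass through makepkg)
CMAKE_BUILD_PARALLEL_LEVEL="$(nproc)" makepkg -si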
envolution commented on 2025-08-27 07:29 (UTC) (edited on 2025-08-27 07:31 (UTC) by envolution)
@undefinedmethod - the changes don't affect any of the build flags unless explicitly enabled, so this seems to be due to a recent upstream commit - possibly https://github.com/ggml-org/llama.cpp/pull/15587
I'd need to see the output of makepkg -L - this creates a build log which would show me what was detected at build time. You can link to it using a pastebin-type service, or post it at https://github.com/envolution/aur/issues. For now you can try running aur_llamacpp_build_universal=true makepkg -fsi to build the default CUDA architectures. I also have prebuilds at https://github.com/envolution/aur/releases/tag/llama.cpp-cuda if you'd like to try those, as they have the universal flag set already.
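Concretely, the requested rebuild-with-logging would look something like the sketch below; -L tells makepkg to write per-stage log files into the build directory:

# force a rebuild with the universal arch flag and keep build logs to share
aur_llamacpp_build_universal=true makepkg -fsiL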
undefinedmethod commented on 2025-08-27 07:16 (UTC)
@Davidyz not sure if this is to do with recent flags added to the build process, but llama-server no longer detects the GPU:
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 4060, compute capability 8.9, VMM: yes
/home/undefinedmethod/.cache/yay/llama.cpp-cuda/src/llama.cpp/ggml/src/ggml-cuda/../ggml-cuda/common.cuh:112: ggml was not compiled with any CUDA arch <= 750
Tested on separate machines: 4070 Super, 5070.
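A quick way to check which compute capability the card actually reports, and therefore whether the arch list baked into the binary covers it (assumes nvidia-utils is installed; the query field requires a reasonably recent driver):

# print GPU name and compute capability, e.g. "NVIDIA GeForce RTX 4060, 8.9"
nvidia-smi --query-gpu=name,compute_cap --format=csv,noheader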
envolution commented on 2025-08-26 10:23 (UTC)
@Davidyz awesome, thanks for the info
Davidyz commented on 2025-08-26 10:01 (UTC)
Just came across this PR: CUDA: replace GGML_CUDA_F16 with CUDA arch checks
Davidyz commented on 2025-08-26 08:34 (UTC)
@envolution thanks for the quick response. That makes sense. It'd indeed be a good idea to just allow customizing extra build opts via an env var.
envolution commented on 2025-08-26 08:12 (UTC) (edited on 2025-08-26 08:13 (UTC) by envolution)
@Davidyz it'll update shortly with b6280 (just build testing). I've added aur_llamacpp_cmakeopts to the build script; you can prepend it to makepkg like aur_llamacpp_cmakeopts="-DGGML_CUDA_FA_ALL_QUANTS=ON" makepkg, or just add it to ~/.bashrc or ~/.bash_profile - it should work fine with helpers too.
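To make that persistent, a small sketch of the ~/.bashrc form - exported so AUR helpers launched from that shell inherit it:

# ~/.bashrc (or ~/.bash_profile)
export aur_llamacpp_cmakeopts="-DGGML_CUDA_FA_ALL_QUANTS=ON"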
envolution commented on 2025-08-26 07:41 (UTC)
@Davidyz I'm open to it, but I'd be more inclined to add it as an optional environment variable, similar to how the recipe sets GGML_NATIVE - let me think on it, as I could possibly extend it to support multiple customised options that deviate from standard builds.
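A hypothetical sketch of the pattern being described - an opt-in environment variable that only alters the CMake options when explicitly set (variable names follow the aur_llamacpp_cmakeopts addition mentioned above; not necessarily the exact PKGBUILD code):

# inside build(): append user-supplied CMake options only when the env var is set
if [[ -n "${aur_llamacpp_cmakeopts:-}" ]]; then
  _cmake_options+=(${aur_llamacpp_cmakeopts})
fi
cmake "${_cmake_options[@]}"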
Pinned Comments
txtsd commented on 2024-10-26 20:17 (UTC) (edited on 2024-12-06 14:15 (UTC) by txtsd)
Alternate versions
- llama.cpp
- llama.cpp-vulkan
- llama.cpp-sycl-fp16
- llama.cpp-sycl-fp32
- llama.cpp-cuda
- llama.cpp-cuda-f16
- llama.cpp-hip