| Git Clone URL: | https://aur.archlinux.org/llama.cpp-cuda.git (read-only) |
|---|---|
| Package Base: | llama.cpp-cuda |
| Description: | Port of Facebook's LLaMA model in C/C++ (with NVIDIA CUDA optimizations) |
| Upstream URL: | https://github.com/ggerganov/llama.cpp |
| Licenses: | MIT |
| Conflicts: | llama.cpp |
| Provides: | llama.cpp |
| Submitter: | txtsd |
| Maintainer: | txtsd |
| Last Packager: | txtsd |
| Votes: | 6 |
| Popularity: | 0.140905 |
| First Submitted: | 2024-10-26 20:17 (UTC) |
| Last Updated: | 2025-06-20 14:59 (UTC) |
This package now uses the system libggml, so it should work alongside whisper.cpp.
Building of tests and examples has been turned off.
Kompute is removed.
Why are you using Kompute, and particularly the fork (https://github.com/nomic-ai/kompute.git instead of the mainline https://github.com/KomputeProject/kompute repo), specifically for the CUDA version of llama.cpp? I think it would be more convenient to create a separate package for the Kompute backend and have llama.cpp-cuda depend only on CUDA-related dependencies, without the Kompute backend dependencies.
llama.cpp-cuda: /usr/lib/libggml-base.so exists in the file system (owned by whisper.cpp-cuda)
llama.cpp-cuda: /usr/lib/libggml-cpu.so exists in the file system (owned by whisper.cpp-cuda)
llama.cpp-cuda: /usr/lib/libggml-cuda.so exists in the file system (owned by whisper.cpp-cuda)
llama.cpp-cuda: /usr/lib/libggml.so exists in the file system (owned by whisper.cpp-cuda)
An error occurred, and no packages were updated.
-> Error during installation: [/home/chi/.cache/yay/llama.cpp-cuda/llama.cpp-cuda-b4762-1-x86_64.pkg.tar.zst] - exit status 1
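A quick way to confirm which package owns the conflicting libraries before retrying (a minimal sketch, assuming pacman manages both packages) is:

    pacman -Qo /usr/lib/libggml.so /usr/lib/libggml-base.so

Per the pinned comment, once both packages link against the system libggml rather than shipping their own copies, this file conflict should go away.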
I recommend including the model template files in the package:
https://github.com/ggml-org/llama.cpp/tree/master/models/templates
so we can choose a model template file directly, with no need to download them again.
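A minimal sketch of how that could look in the PKGBUILD's package() function; the destination /usr/share/llama.cpp/templates and the *.jinja glob are assumptions for illustration, not the packager's actual choices:

    # Hypothetical addition to package(): ship the upstream chat templates
    install -Dm644 "$srcdir/llama.cpp/models/templates/"*.jinja \
      -t "$pkgdir/usr/share/llama.cpp/templates/"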
You should export CUDA_PATH and NVCC_CCBIN.
Check /etc/profile.d/cuda.sh.
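On a stock Arch cuda installation, /etc/profile.d/cuda.sh exports something like the following (the exact host compiler version tracks the cuda package, so check your local file rather than copying these values):

    export CUDA_PATH=/opt/cuda
    export NVCC_CCBIN=/usr/bin/g++-13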
To get this to pass the CMake configure step, I had to edit the PKGBUILD and add these CMake options:
-DCMAKE_CUDA_COMPILER=/opt/cuda/bin/nvcc
-DCMAKE_CUDA_HOST_COMPILER=/usr/bin/gcc-13
I tried pointing it at nvcc via environment variables, but then it ended up using the wrong GCC version, which caused compiler errors in CMakeDetermineCompilerId.cmake:865.
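For reference, a sketch of where those flags end up in the PKGBUILD's build() step; the surrounding options here are illustrative assumptions, not the packaging script's exact flags:

    build() {
      cmake -B build -S llama.cpp \
        -DCMAKE_BUILD_TYPE=Release \
        -DGGML_CUDA=ON \
        -DCMAKE_CUDA_COMPILER=/opt/cuda/bin/nvcc \
        -DCMAKE_CUDA_HOST_COMPILER=/usr/bin/gcc-13
      cmake --build build
    }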
@txtsd, setting CMAKE_CUDA_ARCHITECTURES to my hardware's architecture number fixes this problem.
The error appears at the build stage, so it can be reproduced without a video card.
@ioctl Sorry, I don't have the necessary hardware to test. Does leaving CMAKE_CUDA_ARCHITECTURES unset make it work correctly?
I get errors running this app on the latest Arch Linux with a GeForce RTX 3060.
First, the build emits many copies of the following warning: "nvcc warning : Cannot find valid GPU for '-arch=native', default arch is used"
Then, at run time, there are many errors like: "/home/build/.cache/yay/llama.cpp-cuda/src/llama.cpp/ggml/src/ggml-cuda/mmv.cu:51: ERROR: CUDA kernel mul_mat_vec has no device code compatible with CUDA arch 520. ggml-cuda.cu was compiled for: 520"
Setting the correct number for my hardware instead of "native" in the -DCMAKE_CUDA_ARCHITECTURES CMake option fixes this problem.
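For an RTX 3060 (Ampere, compute capability 8.6) that means passing "86". A minimal sketch of such a configure invocation, with the -S/-B layout assumed for illustration:

    # Pin the CUDA architecture to the local GPU instead of '-arch=native';
    # substitute your own card's compute capability (8.6 -> "86").
    cmake -B build -S llama.cpp -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES=86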
Pinned Comments
txtsd commented on 2024-10-26 20:17 (UTC) (edited on 2024-12-06 14:15 (UTC) by txtsd)
Alternate versions
llama.cpp
llama.cpp-vulkan
llama.cpp-sycl-fp16
llama.cpp-sycl-fp32
llama.cpp-cuda
llama.cpp-cuda-f16
llama.cpp-hip