Git Clone URL: | https://aur.archlinux.org/llama.cpp-cuda.git (read-only) |
---|---|
Package Base: | llama.cpp-cuda |
Description: | Port of Facebook's LLaMA model in C/C++ (with NVIDIA CUDA optimizations) |
Upstream URL: | https://github.com/ggerganov/llama.cpp |
Licenses: | MIT |
Conflicts: | libggml, llama.cpp |
Provides: | llama.cpp |
Submitter: | txtsd |
Maintainer: | txtsd |
Last Packager: | txtsd |
Votes: | 6 |
Popularity: | 1.53 |
First Submitted: | 2024-10-26 20:17 (UTC) |
Last Updated: | 2025-02-21 17:51 (UTC) |
I recommend including the model template files in the package:
https://github.com/ggml-org/llama.cpp/tree/master/models/templates
That way we can select a model template file directly, with no need to download them again.
You should export CUDA_PATH and NVCC_CCBIN.
Check /etc/profile.d/cuda.sh
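One way to check those values, as a minimal sketch: the helper below only prints the lines of the profile script that mention the two variables. The function name `show_cuda_env` is hypothetical, and the exact contents of /etc/profile.d/cuda.sh depend on the installed cuda package.

```shell
# Hypothetical helper: print the lines of a CUDA profile script that
# mention CUDA_PATH or NVCC_CCBIN, so they can be exported before building.
show_cuda_env() {
    profile="${1:-/etc/profile.d/cuda.sh}"
    if [ -r "$profile" ]; then
        grep -E 'CUDA_PATH|NVCC_CCBIN' "$profile"
    else
        echo "not readable: $profile" >&2
        return 1
    fi
}

# Show what the installed script sets; ignore the failure if CUDA is absent.
show_cuda_env || true
```

If the variables are missing from the build environment, exporting the same values by hand before running makepkg should have the same effect.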
To get this to pass cmake I had to edit the PKGBUILD and add cmake options:
-DCMAKE_CUDA_COMPILER=/opt/cuda/bin/nvcc
-DCMAKE_CUDA_HOST_COMPILER=/usr/bin/gcc-13
I tried pointing it to NVCC via environment variables, but then it ended up using the wrong GCC version, which caused compiler errors in CMakeDetermineCompilerId.cmake:865.
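For reference, the two options from this comment could be assembled as below. It is only a sketch: the function name is hypothetical, the default paths are the ones quoted above and may differ on other systems, and the cmake call in the trailing comment is an assumption, not copied from the actual PKGBUILD.

```shell
# Hypothetical helper: print the cmake options that pin nvcc and its host compiler.
# Defaults are the paths quoted in the comment above; adjust them to your system.
cuda_compiler_flags() {
    nvcc_path="${1:-/opt/cuda/bin/nvcc}"
    host_cc="${2:-/usr/bin/gcc-13}"
    printf '%s\n' \
        "-DCMAKE_CUDA_COMPILER=$nvcc_path" \
        "-DCMAKE_CUDA_HOST_COMPILER=$host_cc"
}

cuda_compiler_flags
# Possible use in a build step (assumed, not taken from the PKGBUILD):
#   cmake -B build -S llama.cpp -DGGML_CUDA=ON $(cuda_compiler_flags)
```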
@txtsd, setting CMAKE_CUDA_ARCHITECTURES to the number for my hardware fixes this problem.
The error appears at the build stage, so it can be reproduced without a video card.
@ioctl Sorry, I don't have the necessary hardware to test. Does leaving CMAKE_CUDA_ARCHITECTURES unset make it work correctly?
I get errors running this app on the latest Arch Linux with a GeForce RTX 3060.
First, there are a lot of build warnings like this: "nvcc warning : Cannot find valid GPU for '-arch=native', default arch is used"
Then there are a lot of runtime errors: "/home/build/.cache/yay/llama.cpp-cuda/src/llama.cpp/ggml/src/ggml-cuda/mmv.cu:51: ERROR: CUDA kernel mul_mat_vec has no device code compatible with CUDA arch 520. ggml-cuda.cu was compiled for: 520"
Setting the correct number for my hardware instead of "native" in the -DCMAKE_CUDA_ARCHITECTURES cmake option fixes this problem.
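As a rough sketch of picking that number: CMake expects the compute capability without the dot, so an RTX 3060 (compute capability 8.6) becomes 86. The helper name below is hypothetical, and the `compute_cap` query assumes a reasonably recent nvidia-smi; on a build machine without a GPU the value has to be supplied by hand.

```shell
# Hypothetical helper: turn a compute capability such as "8.6" into the
# dotless form used by CMAKE_CUDA_ARCHITECTURES ("86").
cap_to_arch() {
    printf '%s\n' "$1" | tr -d '.'
}

# RTX 3060 (compute capability 8.6):
echo "-DCMAKE_CUDA_ARCHITECTURES=$(cap_to_arch 8.6)"   # prints -DCMAKE_CUDA_ARCHITECTURES=86

# If a GPU and driver are present, ask nvidia-smi instead of hard-coding.
# Assumption: this nvidia-smi version supports the compute_cap query field.
if command -v nvidia-smi >/dev/null 2>&1; then
    cap=$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader 2>/dev/null | head -n 1)
    if [ -n "$cap" ]; then
        echo "detected: -DCMAKE_CUDA_ARCHITECTURES=$(cap_to_arch "$cap")"
    fi
fi
```

Hard-coding one architecture makes the resulting package specific to that GPU generation, which is fine for a local build but not for a binary meant to be shared.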
@v1993 I've uploaded llama.cpp-cuda-f16. Please let me know if it works as expected!
I'll give it a look later today and see if a newer package is warranted in that case. Thanks for your input!
To be honest, I'm not 100% sure (it's a pretty old option and tracking down its origins is kinda tricky), but I'd expect at least a performance degradation on older GPUs (Nvidia used to be really bad at fp16 on older architectures).
@v1993 Does that have to be a separate package, or will making the change in this package suffice without breaking things for users of older GPUs?
Pinned Comments
txtsd commented on 2024-10-26 20:17 (UTC) (edited on 2024-12-06 14:15 (UTC) by txtsd)
Alternate versions
llama.cpp
llama.cpp-vulkan
llama.cpp-sycl-fp16
llama.cpp-sycl-fp32
llama.cpp-cuda
llama.cpp-cuda-f16
llama.cpp-hip