Package Details: llama.cpp-cuda b5845-1

Git Clone URL: https://aur.archlinux.org/llama.cpp-cuda.git (read-only)
Package Base: llama.cpp-cuda
Description: Port of Facebook's LLaMA model in C/C++ (with NVIDIA CUDA optimizations)
Upstream URL: https://github.com/ggerganov/llama.cpp
Licenses: MIT
Conflicts: ggml, libggml, llama.cpp
Provides: llama.cpp
Submitter: txtsd
Maintainer: txtsd
Last Packager: txtsd
Votes: 8
Popularity: 1.66
First Submitted: 2024-10-26 20:17 (UTC)
Last Updated: 2025-07-08 13:50 (UTC)

Pinned Comments

txtsd commented on 2024-10-26 20:17 (UTC) (edited on 2024-12-06 14:15 (UTC) by txtsd)

Alternate versions

llama.cpp
llama.cpp-vulkan
llama.cpp-sycl-fp16
llama.cpp-sycl-fp32
llama.cpp-cuda
llama.cpp-cuda-f16
llama.cpp-hip

Latest Comments

ioctl commented on 2024-12-14 11:17 (UTC)

I have errors running this app on the latest Arch Linux with a GeForce RTX 3060.

First, there are many instances of the following build warning: "nvcc warning : Cannot find valid GPU for '-arch=native', default arch is used"

Then, at runtime, there are many errors like: "/home/build/.cache/yay/llama.cpp-cuda/src/llama.cpp/ggml/src/ggml-cuda/mmv.cu:51: ERROR: CUDA kernel mul_mat_vec has no device code compatible with CUDA arch 520. ggml-cuda.cu was compiled for: 520"

Setting the correct number for my hardware instead of "native" in the -DCMAKE_CUDA_ARCHITECTURES CMake option fixes this problem.
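
For reference, a rough sketch of the workaround (assuming an RTX 3060, whose CUDA compute capability is 8.6, and assuming the usual llama.cpp CMake flags; adjust the value for your own GPU):

# Hypothetical local configure step: replace -DCMAKE_CUDA_ARCHITECTURES=native
# with the GPU's actual compute capability, e.g. 86 for an RTX 3060
cmake -B build -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES=86
cmake --build build --config Release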

txtsd commented on 2024-12-06 13:37 (UTC)

@v1993 I've uploaded llama.cpp-cuda-f16. Please let me know if it works as expected!

txtsd commented on 2024-12-02 02:25 (UTC)

I'll give it a look later today and see if a newer package is warranted in that case. Thanks for your input!

v1993 commented on 2024-12-01 14:53 (UTC)

To be honest, I'm not 100% sure (it's a pretty old option and tracking down its origins is kinda tricky), but I'd expect at least a performance degradation on older GPUs (Nvidia used to be really bad at fp16 on older architectures).

txtsd commented on 2024-12-01 14:38 (UTC)

@v1993 Does that have to be a separate package, or will making the change in this package suffice without breaking things for users of older GPUs?

v1993 commented on 2024-12-01 14:29 (UTC)

Would it be possible to have a package version with GGML_CUDA_F16 enabled? It's a nice performance boost on newer GPUs. Thank you for your work on this package!
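
A minimal sketch of what enabling it in a local build might look like (assuming the standard llama.cpp CMake build and that GGML_CUDA_F16 is the only extra switch needed):

# Hypothetical configure step enabling fp16 CUDA arithmetic
cmake -B build -DGGML_CUDA=ON -DGGML_CUDA_F16=ON
cmake --build build --config Release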

Poscat commented on 2024-11-28 09:46 (UTC)

@txtsd thank you

txtsd commented on 2024-11-25 07:05 (UTC)

Builds are not static anymore, and the service file has been fixed.

txtsd commented on 2024-11-24 03:16 (UTC)

@Poscat Thank you for your input! The service file was inherited from a previous version and maintainer of the package. I admit that the service was not tested.

The static builds were created to allow for side-by-side installation with whisper.cpp, since they both install libggml files.

Poscat commented on 2024-11-24 03:12 (UTC)

diff --git a/llama.cpp.service b/llama.cpp.service
index 4678d85..be89f9b 100644
--- a/llama.cpp.service
+++ b/llama.cpp.service
@@ -7,7 +7,7 @@ Type=simple
 EnvironmentFile=/etc/conf.d/llama.cpp
 ExecStart=/usr/bin/llama-server $LLAMA_ARGS
 ExecReload=/bin/kill -s HUP $MAINPID
-Restart=never
+Restart=no

 [Install]
 WantedBy=multi-user.target

Also, your systemd service file is wrong. Did you even test your package?