@ioctl Sorry, I don't have the necessary hardware to test. Does it work correctly if CMAKE_CUDA_ARCHITECTURES is left unset?
Package Details: llama.cpp-cuda b4375-1
Git Clone URL: | https://aur.archlinux.org/llama.cpp-cuda.git |
---|---|
Package Base: | llama.cpp-cuda |
Description: | Port of Facebook's LLaMA model in C/C++ (with NVIDIA CUDA optimizations) |
Upstream URL: | https://github.com/ggerganov/llama.cpp |
Licenses: | MIT |
Conflicts: | libggml, llama.cpp |
Provides: | llama.cpp |
Submitter: | txtsd |
Maintainer: | txtsd |
Last Packager: | txtsd |
Votes: | 5 |
Popularity: | 2.40 |
First Submitted: | 2024-10-26 20:17 (UTC) |
Last Updated: | 2024-12-22 11:12 (UTC) |
Dependencies (12)
- blas-openblas
- blas64-openblas
- cuda (cuda11.1 [AUR], cuda-12.2 [AUR], cuda12.0 [AUR], cuda11.4 [AUR], cuda11.4-versioned [AUR], cuda12.0-versioned [AUR])
- curl (curl-quiche-git [AUR], curl-http3-ngtcp2 [AUR], curl-git [AUR], curl-c-ares [AUR])
- gcc-libs (gcc-libs-git [AUR], gccrs-libs-git [AUR], gcc11-libs [AUR], gcc-libs-snapshot [AUR])
- glibc (glibc-git [AUR], glibc-linux4 [AUR], glibc-eac [AUR], glibc-eac-bin [AUR], glibc-eac-roco [AUR])
- openmp
- python (python37 [AUR], python311 [AUR], python310 [AUR])
- python-numpy (python-numpy-flame [AUR], python-numpy-git [AUR], python-numpy1 [AUR], python-numpy-mkl-bin [AUR], python-numpy-mkl-tbb [AUR], python-numpy-mkl [AUR])
- python-sentencepiece [AUR] (python-sentencepiece-git [AUR])
- cmake (cmake-git [AUR]) (make)
- git (git-git [AUR], git-gl [AUR]) (make)
Required by (0)
Sources (4)
txtsd commented on 2024-12-15 15:23 (UTC)
ioctl commented on 2024-12-14 11:17 (UTC)
I get errors running this app on the latest Arch Linux with a GeForce RTX 3060.
First, the build emits many instances of the following warning: "nvcc warning : Cannot find valid GPU for '-arch=native', default arch is used"
Then, at run time, there are many errors like: "/home/build/.cache/yay/llama.cpp-cuda/src/llama.cpp/ggml/src/ggml-cuda/mmv.cu:51: ERROR: CUDA kernel mul_mat_vec has no device code compatible with CUDA arch 520. ggml-cuda.cu was compiled for: 520"
Replacing "native" with the correct number for my hardware in the -DCMAKE_CUDA_ARCHITECTURES cmake option fixes the problem.
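For anyone hitting the same warning: the nvcc message suggests the build ran somewhere the GPU was not visible, so "native" fell back to a default architecture. A minimal Python sketch for working out the value to pass instead is below. It assumes a driver recent enough that `nvidia-smi` supports the `compute_cap` query field; the function names are made up for illustration and are not part of the package. An RTX 3060 has compute capability 8.6, which CMake expects as `86`.

```python
import shutil
import subprocess


def cmake_arch_from_compute_cap(compute_cap: str) -> str:
    """Convert a compute capability such as '8.6' into the CMake form '86'."""
    major, minor = compute_cap.strip().split(".")
    return f"{int(major)}{int(minor)}"


def detect_cuda_architectures() -> list[str]:
    """Return one CMake architecture value per distinct visible GPU.

    Returns an empty list when nvidia-smi is missing, fails, or reports
    nothing parseable (for example inside a build chroot with no GPU).
    """
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=compute_cap", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return []
    archs = set()
    for line in result.stdout.splitlines():
        try:
            archs.add(cmake_arch_from_compute_cap(line))
        except ValueError:
            # Skip lines that are not of the form "<major>.<minor>".
            continue
    return sorted(archs)


if __name__ == "__main__":
    archs = detect_cuda_architectures()
    if archs:
        # CMake accepts a semicolon-separated list of architectures.
        print("-DCMAKE_CUDA_ARCHITECTURES=" + ";".join(archs))
    else:
        print("No GPU visible; set -DCMAKE_CUDA_ARCHITECTURES by hand.")
```

Run it on the machine that has the GPU, then put the printed option in place of `native` in the PKGBUILD's cmake invocation before building.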
txtsd commented on 2024-12-06 13:37 (UTC)
@v1993 I've uploaded llama.cpp-cuda-f16. Please let me know if it works as expected!
txtsd commented on 2024-12-02 02:25 (UTC)
I'll give it a look later today and see if a newer package is warranted in that case. Thanks for your input!
v1993 commented on 2024-12-01 14:53 (UTC)
To be honest, I'm not 100% sure (it's a pretty old option and tracking down its origins is kinda tricky), but I'd expect at least a performance degradation on older GPUs (Nvidia used to be really bad at fp16 on older architectures).
txtsd commented on 2024-12-01 14:38 (UTC)
@v1993 Does that have to be a separate package, or will making the change in this package suffice without breaking things for users of older GPUs?
v1993 commented on 2024-12-01 14:29 (UTC)
Would it be possible to have a package version with GGML_CUDA_F16 enabled? It's a nice performance boost on newer GPUs. Thank you for your work on this package!
Poscat commented on 2024-11-28 09:46 (UTC)
@txtsd thank you
txtsd commented on 2024-11-25 07:05 (UTC)
Builds are not static anymore, and the service file has been fixed.
txtsd commented on 2024-11-24 03:16 (UTC)
@Poscat Thank you for your input! The service file was inherited from a previous version and maintainer of the package. I admit that the service was not tested.
The static builds were created to allow for side-by-side installation with whisper.cpp, since they both install libggml files.
Pinned Comments
txtsd commented on 2024-10-26 20:17 (UTC) (edited on 2024-12-06 14:15 (UTC) by txtsd)
Alternate versions
- llama.cpp
- llama.cpp-vulkan
- llama.cpp-sycl-fp16
- llama.cpp-sycl-fp32
- llama.cpp-cuda
- llama.cpp-cuda-f16
- llama.cpp-hip