Package Details: llama.cpp-cuda-f16 b5753-1

Git Clone URL: https://aur.archlinux.org/llama.cpp-cuda-f16.git (read-only)
Package Base: llama.cpp-cuda-f16
Description: Port of Facebook's LLaMA model in C/C++ (with NVIDIA CUDA optimizations and F16)
Upstream URL: https://github.com/ggerganov/llama.cpp
Licenses: MIT
Conflicts: llama.cpp
Provides: llama.cpp
Submitter: txtsd
Maintainer: txtsd
Last Packager: txtsd
Votes: 3
Popularity: 0.136982
First Submitted: 2024-12-06 13:35 (UTC)
Last Updated: 2025-06-24 19:13 (UTC)

Pinned Comments

txtsd commented on 2024-12-06 14:15 (UTC)

Alternate versions

llama.cpp
llama.cpp-vulkan
llama.cpp-sycl-fp16
llama.cpp-sycl-fp32
llama.cpp-cuda
llama.cpp-cuda-f16
llama.cpp-hip

Latest Comments

txtsd commented on 2025-06-16 11:54 (UTC)

This package now uses the system libggml, so it should work alongside whisper.cpp.

Building of tests and examples has been turned off.
kompute has been removed.
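
For reference, a minimal sketch of the kind of cmake configuration these changes imply. This is not the actual PKGBUILD; LLAMA_USE_SYSTEM_GGML is an assumption based on the upstream option name, and the exact flags may differ:

    # Hypothetical configure step reflecting the changes described above.
    # GGML_CUDA / GGML_CUDA_F16 enable the CUDA backend with F16 math;
    # LLAMA_USE_SYSTEM_GGML (assumed option name) links against system libggml;
    # tests and examples are switched off.
    cmake -B build \
        -DGGML_CUDA=ON \
        -DGGML_CUDA_F16=ON \
        -DLLAMA_USE_SYSTEM_GGML=ON \
        -DLLAMA_BUILD_TESTS=OFF \
        -DLLAMA_BUILD_EXAMPLES=OFF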

JamesMowery commented on 2025-06-12 22:49 (UTC) (edited on 2025-06-12 22:51 (UTC) by JamesMowery)

Fails to build for me, unfortunately. NVIDIA RTX 4090:

-- Found CUDAToolkit: /opt/cuda/targets/x86_64-linux/include (found version "12.8.93")
-- CUDA Toolkit found
-- Using CUDA architectures: native
CMake Error at /usr/share/cmake/Modules/CMakeDetermineCompilerId.cmake:909 (message):
  Compiling the CUDA compiler identification source file
  "CMakeCUDACompilerId.cu" failed.

  Compiler: /opt/cuda/bin/nvcc

  Build flags:

  Id flags: --keep;--keep-dir;tmp -v

  The output was:

  2

  nvcc warning : Support for offline compilation for architectures prior to
  '<compute/sm/lto>_75' will be removed in a future release (Use
  -Wno-deprecated-gpu-targets to suppress warning).

  #$ _NVVM_BRANCH_=nvvm
  #$ _SPACE_=
  #$ _CUDART_=cudart
  #$ _HERE_=/opt/cuda/bin
  #$ _THERE_=/opt/cuda/bin
  #$ _TARGET_SIZE_=
  #$ _TARGET_DIR_=
  #$ _TARGET_DIR_=targets/x86_64-linux
  #$ TOP=/opt/cuda/bin/..
  #$ CICC_PATH=/opt/cuda/bin/../nvvm/bin
  #$ NVVMIR_LIBRARY_DIR=/opt/cuda/bin/../nvvm/libdevice
  #$ LD_LIBRARY_PATH=/opt/cuda/bin/../lib:/opt/cuda/lib64:/opt/cuda/lib64
  #$ PATH=/opt/cuda/bin/../nvvm/bin:/opt/cuda/bin:/opt/cuda/bin:/opt/miniconda3/bin:/opt/cuda/bin:/home/james/miniforge3/condabin:/usr/local/bin:/usr/bin:/usr/bin/site_perl:/usr/bin/vendor_perl:/usr/bin/core_perl:/home/james/.lmstudio/bin:/home/james/.lmstudio/bin
  #$ INCLUDES="-I/opt/cuda/bin/../targets/x86_64-linux/include"
  #$ LIBRARIES= "-L/opt/cuda/bin/../targets/x86_64-linux/lib/stubs" "-L/opt/cuda/bin/../targets/x86_64-linux/lib"
  #$ CUDAFE_FLAGS=
  #$ PTXAS_FLAGS=
  #$ rm tmp/a_dlink.reg.c
  #$ gcc -D__CUDA_ARCH_LIST__=520 -D__NV_LEGACY_LAUNCH -E -x c++ -D__CUDACC__ -D__NVCC__ "-I/opt/cuda/bin/../targets/x86_64-linux/include" -D__CUDACC_VER_MAJOR__=12 -D__CUDACC_VER_MINOR__=8 -D__CUDACC_VER_BUILD__=93 -D__CUDA_API_VER_MAJOR__=12 -D__CUDA_API_VER_MINOR__=8 -D__NVCC_DIAG_PRAGMA_SUPPORT__=1 -D__CUDACC_DEVICE_ATOMIC_BUILTINS__=1 -include "cuda_runtime.h" -m64 "CMakeCUDACompilerId.cu" -o "tmp/CMakeCUDACompilerId.cpp4.ii"

...
...
...

  /usr/include/c++/15.1.1/type_traits(3521): error: type name is not allowed
      inline constexpr bool is_volatile_v = __is_volatile(_Tp);
                                                          ^

  /usr/include/c++/15.1.1/type_traits(3663): error: type name is not allowed
      inline constexpr size_t rank_v = __array_rank(_Tp);
                                                    ^

  /usr/include/c++/15.1.1/bits/stl_algobase.h(1239): error: type name is not allowed
        || __is_pointer(_ValueType1)
                        ^

  /usr/include/c++/15.1.1/bits/stl_algobase.h(1412): error: type name is not allowed
      && __is_pointer(_II1) && __is_pointer(_II2)
                      ^

  /usr/include/c++/15.1.1/bits/stl_algobase.h(1412): error: type name is not allowed
      && __is_pointer(_II1) && __is_pointer(_II2)
                                            ^

  25 errors detected in the compilation of "CMakeCUDACompilerId.cu".

  # --error 0x2 --

Call Stack (most recent call first):
  /usr/share/cmake/Modules/CMakeDetermineCompilerId.cmake:8 (CMAKE_DETERMINE_COMPILER_ID_BUILD)
  /usr/share/cmake/Modules/CMakeDetermineCompilerId.cmake:53 (__determine_compiler_id_test)
  /usr/share/cmake/Modules/CMakeDetermineCUDACompiler.cmake:139 (CMAKE_DETERMINE_COMPILER_ID)
  ggml/src/ggml-cuda/CMakeLists.txt:43 (enable_language)

-- Configuring incomplete, errors occurred!
==> ERROR: A failure occurred in build().
    Aborting...
error: failed to build 'llama.cpp-cuda-f16-b5648-1': 
error: packages failed to build: llama.cpp-cuda-f16-b5648-1

AI says this is the reason (not sure if it's true):

The error occurs because your CUDA toolkit version (12.8.93) is incompatible with your GCC version (15.1.1). CUDA has strict requirements for supported host compiler versions, and GCC 15 is too new for CUDA 12.x.
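
If that diagnosis is correct, the usual workaround is to point nvcc at an older host compiler until CUDA supports GCC 15. A minimal sketch, assuming an older GCC is installed (the g++-14 path assumes Arch's gcc14 package; substitute whichever CUDA-supported GCC you have):

    # Force nvcc to use GCC 14 as the host compiler instead of the default GCC 15.
    # NVCC_CCBIN is read by nvcc itself (CUDA 11.5+).
    export NVCC_CCBIN=/usr/bin/g++-14
    # For CMake-driven builds, CUDAHOSTCXX seeds CMAKE_CUDA_HOST_COMPILER:
    export CUDAHOSTCXX=/usr/bin/g++-14
    makepkg -si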

v1993 commented on 2024-12-27 22:05 (UTC)

Thank you for making this one! It's nice to have.
