Package Details: llama.cpp b5809-1

Git Clone URL: https://aur.archlinux.org/llama.cpp.git (read-only)
Package Base: llama.cpp
Description: Port of Facebook's LLaMA model in C/C++
Upstream URL: https://github.com/ggerganov/llama.cpp
Licenses: MIT
Submitter: txtsd
Maintainer: txtsd
Last Packager: txtsd
Votes: 9
Popularity: 1.51
First Submitted: 2024-10-26 15:38 (UTC)
Last Updated: 2025-07-02 17:35 (UTC)

Pinned Comments

txtsd commented on 2024-10-26 20:14 (UTC) (edited on 2024-12-06 14:14 (UTC) by txtsd)

Alternate versions

llama.cpp
llama.cpp-vulkan
llama.cpp-sycl-fp16
llama.cpp-sycl-fp32
llama.cpp-cuda
llama.cpp-cuda-f16
llama.cpp-hip

Latest Comments


txtsd commented on 2025-06-15 12:03 (UTC) (edited on 2025-06-15 12:03 (UTC) by txtsd)

This package now uses the system libggml, so it should work alongside whisper.cpp.

Building of tests and examples has been turned off.
kompute is removed.
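
A minimal sketch of what the corresponding cmake configuration in the PKGBUILD might look like. LLAMA_BUILD_TESTS and LLAMA_BUILD_EXAMPLES are upstream CMake options; the system-ggml switch shown here (LLAMA_USE_SYSTEM_GGML) is an assumption and may differ from what the PKGBUILD actually uses:

# Hypothetical excerpt from build() reflecting the changes described above
cmake -B build \
  -DCMAKE_BUILD_TYPE=None \
  -DCMAKE_INSTALL_PREFIX=/usr \
  -DLLAMA_BUILD_TESTS=OFF \
  -DLLAMA_BUILD_EXAMPLES=OFF \
  -DLLAMA_USE_SYSTEM_GGML=ON  # assumed name of the system-libggml option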

visad commented on 2025-06-13 15:04 (UTC) (edited on 2025-06-13 15:04 (UTC) by visad)

Conflicts with e.g. the libggml-git AUR package (needed by the whisper.cpp package), since this includes its own version of libggml. Maybe separating the paths (e.g. moving this to /usr/local) could help if the internal libggml is required and the shared version cannot be used, or linking against libggml dynamically? For now, at least the "conflicts" section of the PKGBUILD should list libggml as a first measure, I think :)
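
For illustration only, the change suggested above would be roughly a one-line addition to the PKGBUILD (the exact package names are assumptions based on the comment):

# Hypothetical PKGBUILD fragment declaring the file conflict with libggml
conflicts=('libggml' 'libggml-git')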

NoelJacob commented on 2025-06-05 01:10 (UTC)

Why build tests? Switch them off by default.

cfillion commented on 2025-03-01 01:52 (UTC) (edited on 2025-03-01 02:11 (UTC) by cfillion)

@kelvie

  1. It doesn't actually use it; it's just downloaded for no reason (this package is the CPU backend). The nomic-ai fork is what llama.cpp uses for its Kompute backend (GGML_KOMPUTE=ON).
  2. CMAKE_BUILD_TYPE=None is the standard for building Arch packages; see the CMake package guidelines page in the wiki. makepkg provides its own release flags.

(llama.cpp assertions via GGML_ASSERT are always enabled in all build types.)
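
As a sketch of the build-type point: with CMAKE_BUILD_TYPE=None, CMake adds no build-type flags of its own, so the compiler flags come verbatim from makepkg's environment (example values abridged from a typical /etc/makepkg.conf):

# makepkg exports something like this before running build()
export CFLAGS="-march=x86-64 -mtune=generic -O2 -pipe -fno-plt"
export CXXFLAGS="$CFLAGS"
# ...and with -DCMAKE_BUILD_TYPE=None those flags are used as-is,
# instead of CMake's Release defaults (-O3 -DNDEBUG)
cmake -B build -DCMAKE_BUILD_TYPE=None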

kelvie commented on 2025-03-01 01:20 (UTC)

2 questions:

  1. Why does this use a strange fork of kompute?
  2. Why is CMAKE_BUILD_TYPE set to None rather than Release? I think this leaves asserts enabled.

txtsd commented on 2025-02-04 04:41 (UTC)

@cfillion I got rid of OpenBLAS. Please let me know if it's working as expected.

cfillion commented on 2025-01-27 12:05 (UTC) (edited on 2025-01-27 12:51 (UTC) by cfillion)

Fair enough. Is OpenBLAS actually faster for most users though? It's unbearably slow here.

export TEXT="Arch Linux defines simplicity as without unnecessary additions or modifications. It ships software as released by the original developers (upstream) with minimal distribution-specific (downstream) changes: patches not accepted by upstream are avoided, and Arch's downstream patches consist almost entirely of backported bug fixes that are obsoleted by the project's next release."

# GGML_BLAS=ON GGML_BLAS_VENDOR=OpenBLAS
$ time llama-cli -m Mistral-Small-Instruct-2409-Q6_K_L.gguf -tb 16 -p "$TEXT" -n 1 -no-cnv
llama_perf_context_print: prompt eval time =   10825.25 ms /    76 tokens (  142.44 ms per token,     7.02 tokens per second)
291.99s user 1.91s system 2544% cpu 11.551 total

# GGML_BLAS=OFF
$ time llama-cli -m Mistral-Small-Instruct-2409-Q6_K_L.gguf -tb 16 -p "$TEXT" -n 1 -no-cnv
llama_perf_context_print: prompt eval time =    2248.24 ms /    76 tokens (   29.58 ms per token,    33.80 tokens per second)
40.49s user 0.54s system 1395% cpu 2.940 total

# GGML_BLAS=OFF (w/ extra threads to imitate OpenBLAS's CPU usage, ruling the higher thread count out as the sole cause)
$ time llama-cli -m Mistral-Small-Instruct-2409-Q6_K_L.gguf -tb 32 -p "$TEXT" -n 1 -no-cnv
llama_perf_context_print: prompt eval time =    2350.10 ms /    76 tokens (   30.92 ms per token,    32.34 tokens per second)
85.28s user 0.78s system 2784% cpu 3.091 total

txtsd commented on 2025-01-27 04:26 (UTC)

@cfillion I'm already maintaining several variants of this package. I don't think I'll add another variant unless it's popularly requested.

Thanks for the feedback though!

cfillion commented on 2025-01-26 23:07 (UTC) (edited on 2025-01-26 23:15 (UTC) by cfillion)

GGML_BLAS=ON with OpenBLAS is significantly slower at prompt processing than OFF on my CPU (9950X w/ n_threads = n_threads_batch = 16). Could there be a variant of the package without it?

Also, isn't the kompute submodule unnecessary/unused unless GGML_KOMPUTE=ON is set (which would be for a separate llama.cpp-kompute package)?
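
For context, a sketch of the configure-time difference between the two setups being compared (flags taken from this thread, not from the actual PKGBUILD):

cmake -B build -DGGML_BLAS=ON -DGGML_BLAS_VENDOR=OpenBLAS   # build with OpenBLAS, as shipped at the time
cmake -B build -DGGML_BLAS=OFF                              # requested CPU-only variant without OpenBLAS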

txtsd commented on 2024-10-27 12:19 (UTC)

@abitrolly Because of the maintenance overhead. It's easier to maintain them as separate packages.