Package Details: llama.cpp b4681-1

Git Clone URL: https://aur.archlinux.org/llama.cpp.git (read-only)
Package Base: llama.cpp
Description: Port of Facebook's LLaMA model in C/C++
Upstream URL: https://github.com/ggerganov/llama.cpp
Licenses: MIT
Submitter: txtsd
Maintainer: txtsd
Last Packager: txtsd
Votes: 6
Popularity: 1.23
First Submitted: 2024-10-26 15:38 (UTC)
Last Updated: 2025-02-10 11:02 (UTC)

Pinned Comments

txtsd commented on 2024-10-26 20:14 (UTC) (edited on 2024-12-06 14:14 (UTC) by txtsd)

Alternate versions

llama.cpp
llama.cpp-vulkan
llama.cpp-sycl-fp16
llama.cpp-sycl-fp32
llama.cpp-cuda
llama.cpp-cuda-f16
llama.cpp-hip

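These variants build against different backends and only one of them can be installed at a time, since they conflict with each other. A minimal install example, assuming an AUR helper such as paru:

# pick exactly one variant; the packages conflict with each other
$ paru -S llama.cpp-vulkan
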
Latest Comments

txtsd commented on 2025-02-04 04:41 (UTC)

@cfillion I got rid of OpenBLAS. Please let me know if it's working as expected.

cfillion commented on 2025-01-27 12:05 (UTC) (edited on 2025-01-27 12:51 (UTC) by cfillion)

Fair enough. Is OpenBLAS actually faster for most users though? It's unbearably slow here.

export TEXT="Arch Linux defines simplicity as without unnecessary additions or modifications. It ships software as released by the original developers (upstream) with minimal distribution-specific (downstream) changes: patches not accepted by upstream are avoided, and Arch's downstream patches consist almost entirely of backported bug fixes that are obsoleted by the project's next release."

# GGML_BLAS=ON GGML_BLAS_VENDOR=OpenBLAS
$ time llama-cli -m Mistral-Small-Instruct-2409-Q6_K_L.gguf -tb 16 -p "$TEXT" -n 1 -no-cnv
llama_perf_context_print: prompt eval time =   10825.25 ms /    76 tokens (  142.44 ms per token,     7.02 tokens per second)
291.99s user 1.91s system 2544% cpu 11.551 total

# GGML_BLAS=OFF
$ time llama-cli -m Mistral-Small-Instruct-2409-Q6_K_L.gguf -tb 16 -p "$TEXT" -n 1 -no-cnv
llama_perf_context_print: prompt eval time =    2248.24 ms /    76 tokens (   29.58 ms per token,    33.80 tokens per second)
40.49s user 0.54s system 1395% cpu 2.940 total

# GGML_BLAS=OFF (w/ double the batch threads to imitate OpenBLAS's extra threading, ruling that out as the sole cause)
$ time llama-cli -m Mistral-Small-Instruct-2409-Q6_K_L.gguf -tb 32 -p "$TEXT" -n 1 -no-cnv
llama_perf_context_print: prompt eval time =    2350.10 ms /    76 tokens (   30.92 ms per token,    32.34 tokens per second)
85.28s user 0.78s system 2784% cpu 3.091 total
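
For anyone wanting to reproduce the comparison with a local build, the two configurations above correspond roughly to the following upstream CMake flags (a sketch only; the PKGBUILD's exact invocation may differ):

# OpenBLAS-backed build (what this package used before OpenBLAS was dropped)
$ cmake -B build -DGGML_BLAS=ON -DGGML_BLAS_VENDOR=OpenBLAS
$ cmake --build build --config Release -j

# plain CPU build, no BLAS
$ cmake -B build -DGGML_BLAS=OFF
$ cmake --build build --config Release -j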

txtsd commented on 2025-01-27 04:26 (UTC)

@cfillion I'm already maintaining several variants of this package. I don't think I'll add another variant unless it's widely requested.

Thanks for the feedback though!

cfillion commented on 2025-01-26 23:07 (UTC) (edited on 2025-01-26 23:15 (UTC) by cfillion)

GGML_BLAS=ON with OpenBLAS is significantly slower at prompt processing on my CPU (9950X w/ n_threads = n_threads_batch = 16) than OFF. Could there be a variant of the package without it?

Also, isn't the kompute submodule unnecessary/unused unless GGML_KOMPUTE=ON is set? (which would be for a separate llama.cpp-kompute package?)

txtsd commented on 2024-10-27 12:19 (UTC)

@abitrolly Because of the maintenance overhead. It's easier to maintain as separate packages.

abitrolly commented on 2024-10-27 12:12 (UTC)

@txtsd Why can't one source package produce all these binaries? Why can't the binaries be renamed to avoid conflicts?

txtsd commented on 2024-10-27 09:17 (UTC)

@abitrolly The split packages all conflict with each other. You can only have one installed at any point, so it doesn't make sense to build all just to get one. Additionally, the -hip and -cuda packages take very long to build.
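
For context on how that mutual exclusion works: each variant's PKGBUILD declares the others as conflicts, roughly along these lines (an illustrative sketch, not the actual packaging):

# e.g. in a hypothetical llama.cpp-vulkan PKGBUILD
provides=(llama.cpp)
conflicts=(llama.cpp llama.cpp-cuda llama.cpp-hip)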

abitrolly commented on 2024-10-27 09:14 (UTC)

> I don't think anyone wants to install the 20GB+ dependencies and compile all variants just to get one part of the split package.

@txtsd Actually, it would be nice to get everything necessary for the most optimized build for a system in one go.

txtsd commented on 2024-10-26 15:25 (UTC)

I'm merging this package into llama.cpp since that's the upstream name, and . is allowed in Arch package names.

llama.cpp-* packages will be separate packages. I don't think anyone wants to install the 20GB+ dependencies and compile all variants just to get one part of the split package.
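
For readers unfamiliar with the split-package approach being declined here: a split package would mean one PKGBUILD producing every variant in a single build, roughly like the hypothetical sketch below (structure only; not the actual PKGBUILD):

# hypothetical split PKGBUILD sketch, illustrative only
pkgbase=llama.cpp
pkgname=(llama.cpp llama.cpp-vulkan llama.cpp-cuda llama.cpp-hip)

build() {
  # every backend would be compiled here, pulling in all of their build dependencies
  :
}

package_llama.cpp() {
  # install only the CPU build into this subpackage
  :
}

package_llama.cpp-vulkan() {
  # install only the Vulkan build
  :
}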