Package Details: python-vllm-rocm 0.20.0-1

Git Clone URL: https://aur.archlinux.org/python-vllm-rocm.git (read-only)
Package Base: python-vllm-rocm
Description: high-throughput and memory-efficient inference and serving engine for LLMs (ROCm support)
Upstream URL: https://github.com/vllm-project/vllm
Licenses: Apache-2.0
Submitter: davispuh
Maintainer: davispuh
Last Packager: davispuh
Votes: 3
Popularity: 0.96
First Submitted: 2026-02-24 22:16 (UTC)
Last Updated: 2026-04-30 00:06 (UTC)

Latest Comments

davispuh commented on 2026-04-30 00:10 (UTC)

@Orion-zhen I don't think that's a good idea. All ROCm packages would have this issue, and the fix is different for every shell; your snippet would only work for bash. It's better for your build script to do that.
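For a CI build script, that could look roughly like the sketch below. The helper name ensure_rocm_env is made up for illustration, and it assumes the profile file (by default /etc/profile) is what ends up exporting ROCM_PATH, as in the snippet Orion-zhen posted.

```shell
# Hypothetical helper for a CI build script; not part of the PKGBUILD.
# Sources a profile file only when ROCM_PATH is not already set,
# then reports whether ROCM_PATH is available.
ensure_rocm_env() {
  _profile="${1:-/etc/profile}"
  if [ -z "${ROCM_PATH:-}" ] && [ -r "$_profile" ]; then
    # Assumption: this file (or something it sources) exports ROCM_PATH.
    . "$_profile"
  fi
  [ -n "${ROCM_PATH:-}" ]
}
```

The build script would then run something like `ensure_rocm_env && makepkg -s` so that the PKGBUILD itself stays shell-agnostic.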

@chiz that happens because intel-oneapi-mkl was updated, so you also need to update python-pytorch-opt-rocm (or rebuild your PyTorch).
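To see which shared libraries fail to resolve before rebuilding, something like the following might help. It is only a rough sketch: missing_libs is a made-up helper name, and the libtorch_cpu.so path in the usage comment is an assumption about where the PyTorch package puts its libraries.

```shell
# Hypothetical helper: print the names of shared libraries that ldd
# reports as "not found". Reads ldd output on stdin.
missing_libs() {
  awk '/=> not found/ { print $1 }' | sort -u
}

# Example (the library path is an assumption; adjust it for your install).
# find_spec locates the torch package without importing it, so this works
# even when "import torch" itself fails:
#   torch_dir=$(python -c 'import importlib.util as u; print(u.find_spec("torch").submodule_search_locations[0])')
#   ldd "$torch_dir/lib/libtorch_cpu.so" | missing_libs
```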

chiz commented on 2026-04-26 16:38 (UTC)

There is an error:

OSError: libmkl_intel_lp64.so.2: cannot open shared object file: No such file or directory

Orion-zhen commented on 2026-04-06 13:21 (UTC) (edited on 2026-04-06 13:24 (UTC) by Orion-zhen)

Please add this to build() in case a newly installed ROCm isn't recognized:

  if [[ -z "${ROCM_PATH}" ]]; then
    source /etc/profile
  fi

I failed to build this package in a GitHub Actions runner.

davispuh commented on 2026-04-05 15:41 (UTC)

python-cbor2 is indeed required, added it as dependency.

But I don't have the others installed, and Qwen3.5-4B works fine without any issues. I guess it depends on which model you want to run.

Also, I tested that python-compressed-tensors builds and installs with --nocheck (only the check step fails for me). It's needed for gpt-oss-20b, but I couldn't get that model to run on my RX 7900 XTX (No MXFP4 MoE backend supports the deployment configuration).

bnjbvr commented on 2026-03-30 20:47 (UTC) (edited on 2026-03-30 20:48 (UTC) by bnjbvr)

FWIW, I also had to install the following packages to try to make it work:

  • python-cbor2
  • python-openai-harmony
  • python-model-hosting-container-standards
  • python-jmespath
  • python-compressed-tensors

Otherwise, the command vllm serve … wouldn't work. That being said, python-compressed-tensors didn't seem to install successfully, and the package is marked as orphaned, so I eventually had to drop the whole thing, unfortunately.

davispuh commented on 2026-03-26 20:03 (UTC) (edited on 2026-03-26 20:05 (UTC) by davispuh)

Did an LLM write that...?

Anyway, I fixed it a bit and pushed an updated version.

The main issue was that rocminfo prints gfx in another place as well, so for me the detected value was gfx1100;gfx11 because of:

      Name:                    amdgcn-amd-amdhsa--gfx11-generic   

This updated/fixed version handles that correctly and:

  1. You can specify your GPUs in either PYTORCH_ROCM_ARCH or ROCM_ARCH

  2. You can leave both unset, and auto-detection will build for your GPUs

  3. You can specify empty ROCM_ARCH= to build for all ROCm architectures/all GPUs
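The three cases above could be sketched roughly as follows. This is not the code from the updated PKGBUILD: detect_archs, resolve_archs and _all_archs are made-up names, giving PYTORCH_ROCM_ARCH priority over ROCM_ARCH is an assumption, and so is using the old hard-coded list (from the PKGBUILD line quoted elsewhere on this page) as the "all architectures" value. Detection only accepts Name lines whose whole value is a gfx identifier, so ISA entries such as amdgcn-amd-amdhsa--gfx11-generic are skipped.

```shell
# Assumption: "all architectures" means the list the PKGBUILD used to hard-code.
_all_archs="gfx906;gfx908;gfx90a;gfx942;gfx1100;gfx1101;gfx1200;gfx1201"

# Read rocminfo output on stdin and print the unique agent architectures
# joined by ';'. Lines like "Name: amdgcn-amd-amdhsa--gfx11-generic" do not
# match because the value must be exactly gfx followed by hex digits.
detect_archs() {
  awk '$1 == "Name:" && $2 ~ /^gfx[0-9a-f]+$/ { print $2 }' | sort -u | paste -sd ';' -
}

# Print the architecture list to build for.
resolve_archs() {
  if [ -n "${PYTORCH_ROCM_ARCH:-}" ]; then
    # Case 1a: explicit PYTORCH_ROCM_ARCH (assumed to take priority).
    printf '%s\n' "$PYTORCH_ROCM_ARCH"
  elif [ -n "${ROCM_ARCH+set}" ]; then
    # Case 1b/3: ROCM_ARCH is set; an empty value means "all architectures".
    printf '%s\n' "${ROCM_ARCH:-$_all_archs}"
  elif command -v rocminfo >/dev/null 2>&1; then
    # Case 2: nothing specified, so auto-detect and fall back if nothing is found.
    _detected=$(rocminfo | detect_archs)
    printf '%s\n' "${_detected:-$_all_archs}"
  else
    printf '%s\n' "$_all_archs"
  fi
}
```

In a build() function this would be used along the lines of `export PYTORCH_ROCM_ARCH="$(resolve_archs)"`.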

cmhacks commented on 2026-03-26 13:54 (UTC) (edited on 2026-03-26 13:58 (UTC) by cmhacks)

Hi davispuh, thank you so much for your quick response and for being open to this! Your willingness to collaborate and improve the package is truly appreciated — maintaining AUR packages takes a lot of effort and dedication, and it doesn't go unnoticed.

Here's a patch that implements both the ROCM_ARCH environment variable support and auto-detection as you suggested:

diff --git a/PKGBUILD b/PKGBUILD
index ee537a9..8cd1ac2 100644
--- a/PKGBUILD
+++ b/PKGBUILD
@@ -90,7 +90,19 @@ build() {
   # Limit the number of parallel jobs to avoid OOM
   export MAX_JOBS=$_jobs
   export VLLM_TARGET_DEVICE=rocm
-  export PYTORCH_ROCM_ARCH="gfx906;gfx908;gfx90a;gfx942;gfx1100;gfx1101;gfx1200;gfx1201"
+  if [[ -n "$ROCM_ARCH" ]]; then
+    export PYTORCH_ROCM_ARCH="$ROCM_ARCH"
+  elif command -v rocminfo &>/dev/null; then
+    _detected_archs=$(rocminfo | grep -oP 'gfx\d+' | sort -u | tr '\n' ';' | sed 's/;$//')
+    if [[ -n "$_detected_archs" ]]; then
+      export PYTORCH_ROCM_ARCH="$_detected_archs"
+    else
+      export PYTORCH_ROCM_ARCH="gfx906;gfx908;gfx90a;gfx942;gfx1100;gfx1101;gfx1200;gfx1201"
+    fi
+  else
+    export PYTORCH_ROCM_ARCH="gfx906;gfx908;gfx90a;gfx942;gfx1100;gfx1101;gfx1200;gfx1201"
+  fi
+  echo "Building for PYTORCH_ROCM_ARCH: $PYTORCH_ROCM_ARCH"
   # Build
   python setup.py bdist_wheel --dist-dir=dist

What it does:

  1. If the ROCM_ARCH environment variable is set, it uses that value directly (e.g., ROCM_ARCH="gfx1201" makepkg -si).
  2. If ROCM_ARCH is not set, it auto-detects the installed GPU architectures using rocminfo and builds only for those.
  3. If neither works (no env var, no rocminfo, or detection fails), it falls back to the current full list of architectures.

This is fully backward-compatible. Existing users who don't set ROCM_ARCH and don't have rocminfo available at build time will get the same behavior as before. For everyone else, it can reduce build times by up to 8x.

Thank you again for all the time and effort you put into maintaining this package — it makes ROCm + vLLM accessible to the entire Arch community. Looking forward to your feedback!

davispuh commented on 2026-03-26 12:29 (UTC) (edited on 2026-03-26 12:30 (UTC) by davispuh)

Right now you can simply edit PKGBUILD and set PYTORCH_ROCM_ARCH for your GPUs.
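As a sketch of that manual edit, the export line can also be rewritten with sed. The helper name set_pkgbuild_arch is made up, and this assumes the PKGBUILD contains a single `export PYTORCH_ROCM_ARCH=...` line and that GNU sed is available (for -i).

```shell
# Hypothetical helper: replace the value of the PYTORCH_ROCM_ARCH export
# line in a PKGBUILD.
#   $1 = path to the PKGBUILD
#   $2 = semicolon-separated architecture list, e.g. "gfx1100"
set_pkgbuild_arch() {
  sed -i "s|^\([[:space:]]*export PYTORCH_ROCM_ARCH=\).*|\1\"$2\"|" "$1"
}
```

For example, `set_pkgbuild_arch PKGBUILD gfx1100` followed by `makepkg -si` would build only for gfx1100.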

Also, I use ccache, and with a 24-core CPU it builds quite fast.

But yes, I think it could be implemented so that if the ROCM_ARCH environment variable is set, it is used for PYTORCH_ROCM_ARCH.

And if it's not set, then yeah, auto-detection could be nice.

But my TODO list is already so long that I won't have time to work on this in the near future, so if you send a PKGBUILD patch I can apply it.

cmhacks commented on 2026-03-26 10:52 (UTC)

Hi, thanks for maintaining this package!

Currently, PYTORCH_ROCM_ARCH is set to compile for all 8 GPU architectures (gfx906, gfx908, gfx90a, gfx942, gfx1100, gfx1101, gfx1200, gfx1201), which results in extremely long build times — often over an hour — since every HIP kernel is compiled once per target.

Most users only have one GPU and only need a single architecture. Would it be possible to either:

  1. Split into per-architecture packages (e.g., python-vllm-rocm-gfx906, python-vllm-rocm-gfx1201, etc.) so users can install only the one matching their hardware, or
  2. Auto-detect the system GPU at build time using rocminfo or AMDGPU_TARGETS to compile only for the installed hardware?

This would drastically reduce build times (up to ~8x faster) and resource usage for end users.

Thanks for considering this!