Package Details: python-flash-attn 2.8.3-1

Git Clone URL: https://aur.archlinux.org/python-flash-attn.git (read-only)
Package Base: python-flash-attn
Description: Fast and memory-efficient exact attention
Upstream URL: https://github.com/Dao-AILab/flash-attention
Licenses: BSD-3-Clause
Provides: python-flash-attention
Submitter: Smoolak
Maintainer: Smoolak
Last Packager: Smoolak
Votes: 0
Popularity: 0.000000
First Submitted: 2025-12-11 01:40 (UTC)
Last Updated: 2025-12-11 01:40 (UTC)

Latest Comments

graysky commented on 2026-03-28 15:48 (UTC)

You need swap. I have 96G of RAM and the build consumed all of that too. I created a 256G swap file, and swap usage peaked at 40-50G.

You can also avoid building for every supported architecture by adding:

export FLASH_ATTN_CUDA_ARCHS=x

Where x is the compute capability (CC) of your specific card multiplied by 10. For example, the CC of a 4090 is 8.9, so x = 8.9 x 10 = 89:

export FLASH_ATTN_CUDA_ARCHS=89

See: https://developer.nvidia.com/cuda-gpus for the other architectures.
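The arithmetic above amounts to dropping the decimal point from the compute capability. A minimal sketch of that step, assuming a POSIX shell; `cc_to_arch` and `CC` are hypothetical names used only for illustration and are not part of the package or of flash-attention:

```shell
# Hypothetical helper (not part of the package): turn a compute capability
# such as "8.9" into the arch value "89" by dropping the decimal point.
cc_to_arch() {
    printf '%s\n' "$1" | tr -d '.'
}

# Assumption: the compute capability is filled in by hand from NVIDIA's table.
CC=8.9
FLASH_ATTN_CUDA_ARCHS=$(cc_to_arch "$CC")
export FLASH_ATTN_CUDA_ARCHS

echo "FLASH_ATTN_CUDA_ARCHS=$FLASH_ATTN_CUDA_ARCHS"
```

On recent drivers the compute capability can probably also be read with `nvidia-smi --query-gpu=compute_cap --format=csv,noheader`, but that depends on the installed driver version, so treat it as an assumption and check the table linked above if it fails.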

zwastik commented on 2026-02-14 04:13 (UTC)

I can't compile this package; it swallows all 64GB of my RAM and crashes. I tried putting

MAX_JOBS=4

in the build() section, but it did not make a difference.
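One possible reason, not confirmed for this PKGBUILD: a plain `MAX_JOBS=4` assignment in a shell function stays local to that shell, and child processes such as the Python build only see variables that have been exported. The sketch below shows the difference in a POSIX shell; `child_sees` is a hypothetical helper used only for this demonstration:

```shell
# Hypothetical helper: print what a child process sees for MAX_JOBS.
child_sees() {
    sh -c 'printf "%s\n" "${MAX_JOBS:-unset}"'
}

unset MAX_JOBS            # start from a clean state
MAX_JOBS=4                # plain assignment: visible only in this shell
plain=$(child_sees)

export MAX_JOBS           # exported: inherited by child processes
exported=$(child_sees)

echo "plain assignment -> child sees: $plain"
echo "after export     -> child sees: $exported"
```

If this is the cause, writing `export MAX_JOBS=4` in build() would be the thing to try. Even then, each compile job can use a lot of memory, so a lower value or additional swap may still be needed.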