Arch Linux User Repository

Git Clone URL:	https://aur.archlinux.org/python-flash-attn.git (read-only)
Package Base:	python-flash-attn
Description:	Fast and memory-efficient exact attention
Upstream URL:	https://github.com/Dao-AILab/flash-attention
Licenses:	BSD-3-Clause
Provides:	python-flash-attention
Submitter:	Smoolak
Maintainer:	Smoolak
Last Packager:	Smoolak
Votes:	0
Popularity:	0.000000
First Submitted:	2025-12-11 01:40 (UTC)
Last Updated:	2025-12-11 01:40 (UTC)

You need swap. I have 96G of RAM and the build consumed all of it as well. I created a 256G swap file, and swap usage peaked at 40-50G.

You can also avoid building for all supported architectures by adding:

export FLASH_ATTN_CUDA_ARCHS=x

Where x is the compute capability (CC) of your specific card multiplied by 10. For example, the CC of a 4090 is 8.9, so 8.9 x 10 = 89:

export FLASH_ATTN_CUDA_ARCHS=89
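If you don't want to do the multiplication by hand, the rule above amounts to dropping the dot from the compute capability. A minimal sketch; cc_to_arch is a made-up helper name, not something the package or upstream provides:

```shell
# Hypothetical helper: turn a compute capability such as "8.9"
# into the arch value "89" by removing the dot.
cc_to_arch() {
    printf '%s\n' "$1" | tr -d '.'
}

cc_to_arch 8.9   # prints 89 (the 4090 example above)
```

You could then write export FLASH_ATTN_CUDA_ARCHS="$(cc_to_arch 8.9)". On recent NVIDIA drivers, nvidia-smi --query-gpu=compute_cap --format=csv,noheader should report the value for your card, but I'm assuming a driver new enough to support that query field.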

I can't compile this package: the build swallows all 64GB of my RAM and crashes. I tried putting

MAX_JOBS=4

in the build() section, but it did not make a difference.
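One possible explanation, assuming the line was added exactly as shown without export: a plain assignment inside build() only sets a shell variable, and the compiler processes started by the build never see it. I have not checked whether this PKGBUILD overrides MAX_JOBS itself, so treat this as a sketch of the shell behaviour rather than a confirmed fix:

```shell
unset MAX_JOBS   # start clean for the demonstration

# Plain assignment: a shell variable only, not passed to child processes.
MAX_JOBS=4
sh -c 'echo "plain assignment: ${MAX_JOBS:-unset}"'   # prints "plain assignment: unset"

# Exported: child processes (a stand-in for the build here) inherit it.
export MAX_JOBS=4
sh -c 'echo "exported: ${MAX_JOBS:-unset}"'           # prints "exported: 4"
```

So in build() it may be worth trying export MAX_JOBS=4, possibly with an even lower value if memory is still exhausted.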