Having accidentally discovered this project and seen the results of comparing different optimizations, I wanted to leave a comment. These are purely my own observations; perhaps they will be useful. I am also a fan of forcing compilation flags ;)
I experimented with building gzip using different flags and ran each build on the same test file. The difference in speed was a few percent. My processor only supports up to SSE4.1, but according to a friend, there is a difference between SSE3 and AVX in this experiment, although it is small and in favor of AVX.
When building the Linux kernel, attempts to set the desired compiler flags from outside fail because the build scripts reassign them during the build. The same may apply to Chromium. The only approach that worked for me with the kernel was to replace every -O2 with "-O2 plus the desired flags" in all of the Makefiles. However, when I force SIMD instructions this way, the kernel no longer works. One of the kernel build scripts disables SIMD specifically for this reason and contains a corresponding warning.
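For user-space programs such as gzip (the kernel disables SIMD on purpose, as described above), one way to check whether externally supplied flags actually reached the compiler is to look at the instruction-set macros GCC and Clang predefine. This is a minimal sketch: the function name is hypothetical, and it checks only a few common macros.

```c
#include <stdio.h>

/* Prints the instruction-set extension macros the compiler predefined for
   this translation unit and returns how many were found. These macros
   reflect the flags the compiler actually received (e.g. -msse3, -mavx2,
   -march=native), so a missing entry means the requested flag never
   reached this compile command. */
int report_isa_macros(FILE *out)
{
    int found = 0;
#ifdef __SSE3__
    fputs("SSE3\n", out); found++;
#endif
#ifdef __SSSE3__
    fputs("SSSE3\n", out); found++;
#endif
#ifdef __SSE4_1__
    fputs("SSE4.1\n", out); found++;
#endif
#ifdef __AVX__
    fputs("AVX\n", out); found++;
#endif
#ifdef __AVX2__
    fputs("AVX2\n", out); found++;
#endif
#ifdef __ARM_NEON
    fputs("NEON\n", out); found++;
#endif
#ifdef __OPTIMIZE__
    /* Defined whenever any optimization level is enabled. */
    fputs("optimization enabled (__OPTIMIZE__)\n", out);
#endif
    return found;
}
```

Call report_isa_macros(stdout) from main and compile the file with the same command line the build system uses; if nothing is listed even though -mavx2 was requested, the flag was dropped somewhere. GCC can also dump every predefined macro directly with gcc -mavx2 -dM -E - < /dev/null.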
By themselves, -msse3 and similar flags have little effect. In my experiments, loop unrolling speeds up only some specific loops and slows down the rest of the code. The largest gains come from simplifying loops, moving conditions outside of loops, reducing stack and register usage, -ftracer, prefetching, and auto-vectorization (automatic use of SIMD instructions). The corresponding GCC flags are listed in its manual. With these, vlc, mpv, inkscape, and even ffmpeg (despite its hand-written assembly) run a little faster, at least on my Core 2 Duo-class CPU and my ARMv7 Linux box.
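As an illustration of "moving conditions outside the loop", here is a hand-written before/after of loop unswitching, the transformation GCC applies automatically with -funswitch-loops (enabled at -O3). The function names are made up for this example. Removing the branch from the loop body also leaves simple loops that are easier targets for auto-vectorization (-ftree-vectorize).

```c
#include <stddef.h>

/* Before: the loop-invariant test on `negate` is evaluated on every
   iteration, and the branch sits inside the hot loop. */
void scale_naive(float *dst, const float *src, size_t n, float k, int negate)
{
    for (size_t i = 0; i < n; i++) {
        if (negate)
            dst[i] = -src[i] * k;
        else
            dst[i] = src[i] * k;
    }
}

/* After: the invariant test is hoisted out and the loop is duplicated
   ("unswitched"), giving two branch-free loops with identical results. */
void scale_unswitched(float *dst, const float *src, size_t n, float k, int negate)
{
    if (negate) {
        for (size_t i = 0; i < n; i++)
            dst[i] = -src[i] * k;
    } else {
        for (size_t i = 0; i < n; i++)
            dst[i] = src[i] * k;
    }
}
```

The trade-off is code size: each hoisted condition doubles the loop body, which is one reason compilers only do this automatically at higher optimization levels.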
Pinned Comments
xiota commented on 2024-02-09 01:30 (UTC) (edited on 2024-02-10 05:08 (UTC) by xiota)
This package now uses the SSE3 version because benchmarks on my computers show no performance benefit from using the AVX/AVX2 versions.
xiota commented on 2024-01-18 04:21 (UTC) (edited on 2024-02-21 09:22 (UTC) by xiota)
I made alternate PKGBUILDs: SSE3, AVX, AVX2 (build one with, e.g., makepkg -p PKGBUILD.avx2). However, there is no point making dedicated packages for each because there is no performance benefit from using different versions.
The "normal" version ("AVX") does reference avx2 in the config, but the exact compiler flags are unspecified. All 64-bit versions contain AVX and AVX2 instructions. So too do the Chromium binaries from the official Arch repos. Chromium-based browsers probably detect processor capabilities at runtime, so attempting to target specific instruction sets would not be expected to significantly improve performance.
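As a sketch of the general runtime-detection technique mentioned above (not Chromium's actual code), GCC and Clang on x86 let a program compile a single function for AVX2 with __attribute__((target("avx2"))) and select it at run time with __builtin_cpu_supports. The function and macro names here are hypothetical.

```c
#include <stddef.h>

/* Portable baseline implementation, built with the file's normal flags. */
static long sum_baseline(const int *v, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
/* Same loop, but the compiler may use AVX2 for this one function,
   regardless of the -m flags used for the rest of the file. */
__attribute__((target("avx2")))
static long sum_avx2(const int *v, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}
#define HAVE_X86_DISPATCH 1
#endif

/* Picks an implementation at run time based on what the CPU reports. */
long sum_dispatch(const int *v, size_t n)
{
#ifdef HAVE_X86_DISPATCH
    if (__builtin_cpu_supports("avx2"))
        return sum_avx2(v, n);
#endif
    return sum_baseline(v, n);
}
```

A binary built this way still runs on an SSE-only CPU but uses the AVX2 path where available, which is one reason forcing a higher baseline such as -mavx2 at build time may change little for software that already dispatches like this.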
Here are my results from browserbench.org Speedometer 2.1 (runs per minute; higher is better), using fresh profiles:
SSE3: 83.8 ± 1.0
AVX: 83.8 ± 1.0 (not a mistake; SSE3 and AVX had identical results)
AVX2: 83.6 ± 1.9
xiota commented on 2023-10-10 04:01 (UTC) (edited on 2024-02-10 05:06 (UTC) by xiota)
This is an autoupdating package that attempts to download and package the latest version available. To disable this behavior, set:
_autoupdate=false
Avoid flagging and commenting at the same time for the same issue.