author | Mark Peschel | 2023-08-18 11:21:52 -0400 |
---|---|---|
committer | GitHub | 2023-08-18 11:21:52 -0400 |
commit | 61390636986f3a38ccb7a98e3f4573da77554c4f (patch) | |
tree | 3d453d47a88182d2d6cd4b446c39295b2e4577d2 | |
parent | 6a3bf90d8e6a9e2ee25bbaf622f559ba2f4960e8 (diff) | |
download | aur-61390636986f3a38ccb7a98e3f4573da77554c4f.tar.gz |
README 2.12 -> 2.13. Clarity, typos, and safer instructions.
Replace 2.12 with 2.13 and update links appropriately. Add link to Google's tensorflow/tensorflow. Move "breadcrumbs" section to "how to fix this pkgbuild" section since that's when people will need it.
git is effectively a build dependency. Suggest --syncdeps for n00bs. Remove suggestion to update "_known_good_commit" since that might break our patches.
Move (borked) 2.14 instructions to end and add warnings, libxcrypt-compat dependency.
Replace dd commands with tee cuz I guess it's cleaner.
Misc. typos and clarity fixes.
-rw-r--r-- | README.md | 83 |
1 files changed, 52 insertions, 31 deletions
diff --git a/README.md b/README.md
index 5511cff7693e..2c96d606670c 100644
--- a/README.md
+++ b/README.md
@@ -1,40 +1,20 @@
 # tensorflow-amd-git
-This repository is the PKGBUILD, patches, and test scripts for building the `tensorflow-amd-git` and `python-tensorflow-amd-git` packages on Arch Linux.
-Unlike the `tensorflow-rocm` series of packages, which pull from the official tensorflow/tensorflow repo,
+This repository contains the PKGBUILD, patches, and test scripts for building the `tensorflow-amd-git` and `python-tensorflow-amd-git` [AUR packages](https://aur.archlinux.org/packages/tensorflow-amd-git) for Arch Linux.
+Unlike the `tensorflow-rocm` series of packages, which pull from [Google's tensorflow/tensorflow repo](https://github.com/tensorflow/tensorflow/),
 this package pulls directly from [AMD's upstream fork.](https://github.com/ROCmSoftwarePlatform/tensorflow-upstream/)
-Specifically, as of 2023-07-04 it draws on the "r2.12-rocm-enhanced" branch on the commit of [the last successful build.](http://ml-ci.amd.com:21096/job/tensorflow/job/release-rocmfork-r212-rocm-enhanced/job/release-build-whl/lastSuccessfulBuild/)
-If that link is dead, you may be able to find an equivalent by following these breadcrumbs:
-* [tensorflow/tensorflow](https://github.com/tensorflow/tensorflow/) ->
-* [Community Supported TensorFlow Builds](https://github.com/tensorflow/build#community-supported-tensorflow-builds) ->
-* [Linux AMD ROCm GPU Stable : TF 2.x Build Status Release 2.12](http://ml-ci.amd.com:21096/job/tensorflow/job/nightly-rocmfork-develop-upstream/job/nightly-build-whl/lastSuccessfulBuild/)
+Specifically, as of 2023-08-18 it draws on the "r2.13-rocm-enhanced" branch on the commit of [the last successful build.](http://ml-ci.amd.com:21096/job/tensorflow/job/release-rocmfork-r213-rocm-enhanced/job/release-build-whl/lastSuccessfulBuild/)
 
 ## Build instructions
 ```sh
-sudo pacman -S base-devel
+sudo pacman -S base-devel git
 git clone https://aur.archlinux.org/packages/tensorflow-amd-git
 cd tensorflow-amd-git
-# At this point, you may want to open PKGBUILD and udpate _known_good_commit.
-makepkg
+makepkg --syncdeps
 sudo pacman -U tensorflow*.zst python-tensorflow*.zst
 ```
 
-## Tensorflow 2.14
-You can build TensorFlow 2.14 from the unstable "develop-upstream" branch with two changes: update the "last good commit" hash, and change the branch of the git source link.
-You can find a good commit hash from [AMD's Jenkins CI server.](http://ml-ci.amd.com:21096/job/tensorflow/job/nightly-rocmfork-develop-upstream/job/nightly-build-whl/lastSuccessfulBuild/)
-Then update the PKBUILD like:
-```sh
-...
-#_known_good_commit=de8086e14ae3152906e1137c212d2f7bb8ea463a
- _known_good_commit=1d35245a829159ef76b3a403d704a78dcd672bbf
-...
-#source=('tensorflow-upstream-rocm::git+https://github.com/ROCmSoftwarePlatform/tensorflow-upstream#branch=r2.12-rocm-enhanced'
- source=('tensorflow-upstream-rocm::git+https://github.com/ROCmSoftwarePlatform/tensorflow-upstream#branch=develop-upstream'
-...
-```
-I have not tested this.
-
 ## Building for other machines
-This package uses `-march=native` to enable CPU specific optimizations. If you are building for other people to use do:
+This package uses `-march=native` to enable CPU specific optimizations. If you are building for other people, do:
 ```sh
 sed "s/-march=native/-march=x86-64/" -i PKGBUILD
 ```
@@ -45,12 +25,12 @@ If you are building for a graphics card not installed on your machine, such as i
 you must create the file `/opt/rocm/bin/target.lst` and add each gfx architecture you want to build for.
 Something like:
 ```sh
-echo -e "gfx900\ngfx904\ngfx906\ngfx908\ngfx90a\ngfx1030" | sudo dd of=/opt/rocm/bin/target.lst
+echo -e "gfx900\ngfx904\ngfx906\ngfx908\ngfx90a\ngfx1030" | sudo tee /opt/rocm/bin/target.lst
 ```
 As of 2023-07-05, I have confirmed that RX 580 (gfx803) is NOT supported.
 I have confirmed that you can use RX 6750 XT (gfx1031) by setting `target.lst` before building:
 ```sh
-echo -e "gfx1030" | sudo dd of=/opt/rocm/bin/target.lst
+echo -e "gfx1030" | sudo tee /opt/rocm/bin/target.lst
 ```
 and by setting this environment variable before running:
 ```sh
@@ -59,10 +39,51 @@ export HSA_OVERRIDE_GFX_VERSION=10.3.0
 You can find your gfx by running `/opt/rocm/bin/rocminfo | grep gfx` after installing `rocminfo`.
 
 ## Fixing this PKGBUILD when it inevitably breaks
-To learn what the build environment is supposed to look like, see the official Dockerfile for building tensorflow: [ROCmSoftwarePlatform/tensorflow-upstream in tensorflow/tools/ci_build/Dockerfile.rocm](https://github.com/ROCmSoftwarePlatform/tensorflow-upstream/blob/develop-upstream/tensorflow/tools/ci_build/Dockerfile.rocm). That's where I found out about `target.lst` among other things; hopefully it helps.
+To learn what the build environment is supposed to look like, see the official Dockerfile for building tensorflow: [tensorflow/tools/ci_build/Dockerfile.rocm in ROCmSoftwarePlatform/tensorflow-upstream](https://github.com/ROCmSoftwarePlatform/tensorflow-upstream/blob/develop-upstream/tensorflow/tools/ci_build/Dockerfile.rocm). That's where I found out about `target.lst` among other things; hopefully it helps.
+
+If you are completely unable to build the repository, you may have luck with installing AMD's pre-built python wheels. pip seems to think tensorflow-rocm is only supported up to python 3.10, so you will have to download the wheel [directly from AMD's website](http://ml-ci.amd.com:21096/job/tensorflow/job/release-rocmfork-r213-rocm-enhanced/job/release-build-whl/lastSuccessfulBuild/) and install it: `pip install tensorflow_rocm-2.13*cp311-cp311-manylinux2014_x86_64.whl` When I tried it, it did seem to run on GPU and not CPU.
+
+If AMD's website no longer hosts the python wheels, you may find an equivalent by following these breadcrumbs:
+* [tensorflow/tensorflow](https://github.com/tensorflow/tensorflow/) ->
+* [Community Supported TensorFlow Builds](https://github.com/tensorflow/build#community-supported-tensorflow-builds) ->
+* [Linux AMD ROCm GPU Stable : TF 2.x Build Status Release 2.12](http://ml-ci.amd.com:21096/job/tensorflow/job/release-rocmfork-r212-rocm-enhanced/job/release-build-whl/lastSuccessfulBuild/)
+
+Lastly, it appears the official way to use TensorFlow on ROCm is by a docker image. You may get random permission errors while downloading because it takes forever to download, and your system clock may desynchronize with the server. To fix the permissions errors, `sudo systemctl start systemd-timesyncd`. Otherwise, follow [the official instructions here](https://hub.docker.com/r/rocm/tensorflow) or [my longer tutorial here.](https://github.com/mpeschel10/test-tensorflow-rocm)
 
-If you are unable to build this, you may have luck with installing the python wheels. pip seems to think tensorflow-rocm is [only supported up to python 3.10](https://pypi.org/project/tensorflow-rocm/), so you will have to download the wheel [directly from AMD's website](http://ml-ci.amd.com:21096/job/tensorflow/job/release-rocmfork-r212-rocm-enhanced/job/release-build-whl/lastSuccessfulBuild/) and install it: `pip install tensorflow_rocm-2.12*cp311-cp311-manylinux2014_x86_64.whl`
+## Tensorflow 2.14
+This PKGBUILD is not able to build Tensorflow 2.14 as a drop-in replacement. I suggest three changes to get you started; after that, you're on your own.
+
+ 1. Update the "last good commit" hash.
+You can find a good commit hash from [AMD's Jenkins CI server.](http://ml-ci.amd.com:21096/job/tensorflow/job/nightly-rocmfork-develop-upstream/job/nightly-build-whl/lastSuccessfulBuild/)
+Then update the PKBUILD like:
+
+```sh
+...
+#_known_good_commit=c19cfffe476ec3338fb24ed3ce2baabfc558076e
+ _known_good_commit=81b90075b3309e3c538915d212f0149daf9cd2a6
+...
+```
+
+ 2. Change the branch of the git source link.
+```sh
+...
+#source=('tensorflow-upstream-rocm::git+https://github.com/ROCmSoftwarePlatform/tensorflow-upstream#branch=r2.13-rocm-enhanced'
+ source=('tensorflow-upstream-rocm::git+https://github.com/ROCmSoftwarePlatform/tensorflow-upstream#branch=develop-upstream'
+...
+```
+
+ 3. Add the `libxcrypt-compat` dependency.
+```sh
+...
+#makedepends=('python-numpy' 'git' 'python-wheel' \
+ makedepends=('python-numpy' 'git' 'python-wheel' 'libxcrypt-compat'\
+...
+```
+Then
+```sh
+pacman -S libxcrypt-compat
+```
 
-Lastly, it appears the officially intended way to use TensorFlow on ROCm is by a docker image. It takes forever to download, so your system clock may desynchronize with the server, so you may get random permission errors while downloading. If you do, `sudo systemctl start systemd-timesyncd`. Otherwise, follow [the official instructions here](https://hub.docker.com/r/rocm/tensorflow) or [my longer tutorial here.](https://github.com/mpeschel10/test-tensorflow-rocm)
+Good luck.
 
 ### Pull requests welcome.
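
Reviewer sketch, not part of the commit: the patch swaps `dd of=...` for `tee` when writing `target.lst`. The snippet below is one hedged way to try that pattern without root. `TARGET_LST` and `write_targets` are hypothetical names introduced here for illustration, and the path defaults to a temporary file rather than the README's real `/opt/rocm/bin/target.lst`. It also uses `printf` instead of `echo -e`, which is an assumption about preference, not something the README does.

```sh
#!/bin/sh
# Sketch only: write one gfx architecture per line to a target list file.
# TARGET_LST is a hypothetical override; the README writes to
# /opt/rocm/bin/target.lst, which needs root (hence `sudo tee` there).
TARGET_LST="${TARGET_LST:-$(mktemp)}"

write_targets() {
    # printf reuses the format for every argument, so each architecture
    # lands on its own line without relying on `echo -e` escape handling.
    # tee's copy to stdout is discarded; only the file is wanted here.
    printf '%s\n' "$@" | tee "$TARGET_LST" > /dev/null
}

# Same architecture list as the README example in the diff above.
write_targets gfx900 gfx904 gfx906 gfx908 gfx90a gfx1030

# Show what was written so it can be checked before a real build.
cat "$TARGET_LST"
```

Because `tee` truncates the file by default, rerunning `write_targets` with a shorter list (for example only `gfx1030`) replaces the previous contents, which matches how the README's two `tee` commands are meant to be used.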