python-onnxruntime: re-enable LTO

As mentioned in [1], LTO works again.

Also switch to the new way of configuring nvcc parallelism. Upstream introduced a similar mechanism in onnxruntime 1.10 [2], so my original approach (added in [3]) is always overridden.

[1] https://aur.archlinux.org/packages/python-onnxruntime#comment-886838
[2] https://github.com/microsoft/onnxruntime/pull/8974
[3] https://github.com/archlinuxcn/repo/commit/99c193b303a416811c9cdac6b12c30804ad5acb6
author: Chih-Hsuan Yen 2022-10-29 00:19:50 +0800
committer: Chih-Hsuan Yen 2022-10-29 00:19:50 +0800
commit: 9c392fb542979981fe0026e0fe3cc361a5f00a36 (patch)
tree: 7568dee7109670c8408ab190a23e5cedb92b13d2
parent: 58086ca4650eddd8024f3ad58e7b996160c71bda (diff)
download: aur-9c392fb542979981fe0026e0fe3cc361a5f00a36.tar.gz
2 files changed, 1 insertions, 4 deletions
diff --git a/.SRCINFO b/.SRCINFO
index a7bb02693ce1..9769d708c48a 100644
--- a/.SRCINFO
+++ b/.SRCINFO
@@ -29,7 +29,6 @@ pkgbase = python-onnxruntime
 	depends = re2
 	depends = openmpi
 	depends = libprotobuf-lite.so
-	options = !lto
 	source = git+https://github.com/microsoft/onnxruntime#tag=v1.13.1
 	source = git+https://github.com/onnx/onnx.git
 	source = git+https://github.com/dcleblanc/SafeInt.git
diff --git a/PKGBUILD b/PKGBUILD
index d9fb7fc19a30..05c813211e0d 100644
--- a/PKGBUILD
+++ b/PKGBUILD
@@ -37,8 +37,6 @@ sha512sums=('SKIP'
             '8f0bd7ae59f86f002c88368a8c2852b9613363771aae61f91a90bfc13dcd3173e43d7988a59ccef86657cf6abfcc53837bbf445c216a7994a765a7e0770d0f5f'
             '7d55b0d4232183a81c20a5049f259872150536eed799d81a15e7f10b5c8b5279b443ba96d7b97c0e4338e95fc18c9d6f088e348fc7002256ee7170d25b27d80d'
             'ab48d27be98a88d3c361e1d0aac3b1e078096c0902ba7a543261a1c24faed0f1f44947a1b7ea1f264434cd2199b9d563d2447c14b6afbdf9900e68a65f7d2619')
-# CUDA seems not working with LTO
-options+=('!lto')
 
 if [[ $_ENABLE_CUDA = 1 ]]; then
   pkgname+=(onnxruntime-cuda)
@@ -103,7 +101,6 @@ build() {
     #    total processes may be much larger than the number of cores - let
     #    the scheduler handle it.
     cmake_args+=(
-      -DCMAKE_CUDA_FLAGS="-t0"
       -DCMAKE_CUDA_ARCHITECTURES="$_CUDA_ARCHITECTURES"
       -DCMAKE_CUDA_STANDARD_REQUIRED=ON
       -DCMAKE_CXX_STANDARD_REQUIRED=ON
@@ -112,6 +109,7 @@ build() {
       -DCMAKE_CUDA_COMPILER:PATH=/opt/cuda/bin/nvcc
       -Donnxruntime_CUDNN_HOME=/usr
       -Donnxruntime_USE_NCCL=ON
+      -Donnxruntime_NVCC_THREADS=0
     )
   fi
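
The nvcc parallelism change in the diff above can be sketched in isolation. This is a minimal standalone illustration, not the PKGBUILD itself: the function name `cuda_cmake_args` is hypothetical, and only two of the CUDA-related flags from the diff are reproduced. It assumes that `onnxruntime_NVCC_THREADS` is forwarded to nvcc's `--threads` option, where a value of 0 is understood to mean "use as many threads as there are CPUs", the same meaning the removed `-t0` flag carried.

```shell
#!/bin/sh
# Hypothetical helper (not part of the PKGBUILD): print the CUDA-related
# CMake arguments, one per line, when CUDA is enabled.
cuda_cmake_args() {
  enable_cuda=$1
  if [ "$enable_cuda" = 1 ]; then
    # Old approach, removed by this commit because upstream's own
    # thread setting overrides it:
    #   -DCMAKE_CUDA_FLAGS="-t0"
    # New approach: set the upstream CMake option directly.
    printf '%s\n' \
      "-DCMAKE_CUDA_COMPILER:PATH=/opt/cuda/bin/nvcc" \
      "-Donnxruntime_NVCC_THREADS=0"
  fi
}

# Usage: list the arguments for a CUDA-enabled build.
cuda_cmake_args 1
```

Setting the upstream option rather than injecting a raw compiler flag keeps the thread count in one place, so the two settings cannot conflict.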