Package Details: nvhpc 24.3-1

Git Clone URL: https://aur.archlinux.org/nvhpc.git (read-only)
Package Base: nvhpc
Description: NVIDIA HPC SDK
Upstream URL: https://gitlab.com/badwaik/archlinux/aur/nvhpc
Keywords: compiler cuda fortran pgi portland
Licenses: custom
Conflicts: pgi-compilers
Replaces: pgi-compilers
Submitter: a.kudelin
Maintainer: jayesh
Last Packager: jayesh
Votes: 14
Popularity: 0.003721
First Submitted: 2020-10-20 12:54 (UTC)
Last Updated: 2024-04-03 00:02 (UTC)

Dependencies (5)

Required by (0)

Sources (2)

Latest Comments

ylee commented on 2022-12-22 01:38 (UTC) (edited on 2022-12-22 01:39 (UTC) by ylee)

@jayesh,

I think I messed up my environment variables. I had CUDA_HOME=/opt/cuda in my .zshrc, which I suspect is why the compilers were unable to locate the CUDA toolkit. Deleting CUDA_HOME resolves the issue, and a separate NVHPC_CUDA_HOME is then no longer needed.

For the gcc11 dependency issue (the "unsupported GNU version" message mentioned in my previous comment), I found that Arch's cuda community package resolves it by creating symlinks. I've confirmed that a similar approach eliminates the "unsupported GNU version" message from nvcc in this package as well:
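A minimal sketch of that check, assuming a POSIX-like shell. It only inspects and clears the variable for the current session; the corresponding line in .zshrc still has to be removed by hand.

```shell
# Report whether CUDA_HOME is set, then clear it for the current shell session.
if [ -n "${CUDA_HOME:-}" ]; then
    echo "CUDA_HOME was set to: ${CUDA_HOME}"
    unset CUDA_HOME
fi
echo "CUDA_HOME is now: ${CUDA_HOME:-<unset>}"
```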

ln -s /usr/bin/gcc-11 /opt/nvidia/hpc_sdk/Linux_x86_64/22.11/cuda/11.8/bin/gcc
ln -s /usr/bin/g++-11 /opt/nvidia/hpc_sdk/Linux_x86_64/22.11/cuda/11.8/bin/g++

This approach requires gcc11 as an additional dependency, since the main version of gcc on Arch is now 12. Apparently the upcoming CUDA 12 supports gcc12, but I'm not sure how long it will take for that CUDA version to reach nvhpc.

I'm still unsure how to resolve the nvlink warnings from nvfortran, which I also mentioned before. Maybe they are also related to the gcc11 issue. The code seems to work well, so we could just ignore them for now.

aitzkora commented on 2022-12-21 15:58 (UTC)

@jayesh: thanks for taking the time to post the question on the NVIDIA forum and to post a solution. Now it works 👍

jayesh commented on 2022-12-21 12:47 (UTC)

So, the module file is provided by NVIDIA. Do you mean that, or do you mean the nvhpc.sh which is provided by the AUR package? The second one I can change in the AUR package; for the first one, we will need to contact NVIDIA to make the change (though we can temporarily patch it in the AUR package).

Also, if it is the module file, would it be possible to make an MWE that shows the issue? I can then forward it to NVIDIA. Thanks!

ylee commented on 2022-12-20 23:55 (UTC) (edited on 2022-12-20 23:56 (UTC) by ylee)

Hello @jayesh,

Thank you for updating. Now I can use mpirun without a problem, but I think the modulefile should set the environment variable NVHPC_CUDA_HOME to /opt/nvidia/hpc_sdk/Linux_x86_64/22.11/cuda/11.8. Without this, the nvhpc compilers can't locate the CUDA toolkit.

With NVHPC_CUDA_HOME set, I'm able to compile and run CUDA programs, but with some warnings. I guess this is because Arch's current gcc is version 12, which apparently is not supported by nvhpc. For example, nvcc complains:
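As a per-session workaround until the modulefile sets it, the variable can be exported by hand. This is only a sketch: the path is the one quoted above, and the version directories (22.11, 11.8) are assumptions that will differ on other installs.

```shell
# Path taken from the comment above; adjust the version directories to your install.
export NVHPC_CUDA_HOME=/opt/nvidia/hpc_sdk/Linux_x86_64/22.11/cuda/11.8

# Warn early if the directory is missing, instead of failing later at compile time.
if [ ! -d "${NVHPC_CUDA_HOME}" ]; then
    echo "warning: ${NVHPC_CUDA_HOME} does not exist on this machine" >&2
fi
```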

❯ nvcc hello.cu
In file included from /opt/nvidia/hpc_sdk/Linux_x86_64/22.11/cuda/11.8/include/cuda_runtime.h:83,
                 from <command-line>:
/opt/nvidia/hpc_sdk/Linux_x86_64/22.11/cuda/11.8/include/crt/host_config.h:132:2: error: #error -- unsupported GNU version! gcc versions later than 11 are not supported! The nvcc flag '-allow-unsupported-compiler' can be used to override this version check; however, using an unsupported host compiler may cause compilation failure or incorrect run time execution. Use at your own risk.
  132 | #error -- unsupported GNU version! gcc versions later than 11 are not supported! The nvcc flag '-allow-unsupported-compiler' can be used to override this version check; however, using an unsupported host compiler may cause compilation failure or incorrect run time execution. Use at your own risk.
      |  ^~~~~

And nvfortran prints,

nvlink warning : Skipping incompatible '/usr/lib64/libdl.a' when searching for -ldl
nvlink warning : Skipping incompatible '/usr/lib/gcc/x86_64-pc-linux-gnu/12.2.0/../../../../lib64/libdl.a' when searching for -ldl
nvlink warning : Skipping incompatible '/usr/lib64/libpthread.a' when searching for -lpthread
nvlink warning : Skipping incompatible '/usr/lib/gcc/x86_64-pc-linux-gnu/12.2.0/../../../../lib64/libpthread.a' when searching for -lpthread
nvlink warning : Skipping incompatible '/usr/lib64/librt.a' when searching for -lrt
nvlink warning : Skipping incompatible '/usr/lib/gcc/x86_64-pc-linux-gnu/12.2.0/../../../../lib64/librt.a' when searching for -lrt

Not sure if there's an elegant way to switch back and forth between gcc11 and gcc12.
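One possible way to switch per invocation, sketched under assumptions: nvcc's documented -ccbin (--compiler-bindir) option selects the host compiler, so a small wrapper can point it at g++-11 while the system default stays gcc12. The function name nvcc_gcc11 and the variable HOST_CXX are hypothetical names chosen for illustration, and the /usr/bin/g++-11 path assumes the gcc11 package is installed. Whether this also silences the nvlink warnings is untested.

```shell
# Hypothetical helper: run nvcc with g++-11 as the host compiler.
# HOST_CXX is an illustrative variable name, not something nvcc reads itself.
nvcc_gcc11() {
    nvcc -ccbin "${HOST_CXX:-/usr/bin/g++-11}" "$@"
}
```

Usage would look like `nvcc_gcc11 hello.cu -o hello`; setting HOST_CXX to another compiler path switches the host compiler without touching any symlinks.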

jayesh commented on 2022-12-20 13:04 (UTC) (edited on 2022-12-20 13:08 (UTC) by jayesh)

Based on the discussions here: https://forums.developer.nvidia.com/t/mpi-fortran-on-archlinux/237491

I've now patched nvhpc (22.11-2 now) to use OpenMPI-4.0.x as the default MPI : https://gitlab.com/badwaik/archlinux/aur/nvhpc/-/commit/2ffaa1da97c479cd63d06d3310fd48be74c650e7

This should solve the problem for now, until NVHPC switches to HPC-X as the default MPI. Please let me know if you encounter any problems with the new arrangement.

jayesh commented on 2022-12-19 12:54 (UTC) (edited on 2022-12-19 12:54 (UTC) by jayesh)

All the older versions also make the program crash in the same way, so I'm now sure that the error comes from a mismatch in either libc or a similar library. So, I don't see any advantage in making the older versions available without more information.

jayesh commented on 2022-12-19 12:25 (UTC) (edited on 2022-12-19 12:54 (UTC) by jayesh)

Thank you for the comment. I am not an expert in Fortran, but I have managed to reproduce the issue with the official NVIDIA tarball (not just with the AUR package), so I'm assuming something is incompatible between Arch Linux and NVIDIA's NVHPC compiler itself. I've opened a forum discussion to that effect: https://forums.developer.nvidia.com/t/mpi-fortran-on-archlinux/237491

In the meantime, I'll be making the versions 22.7 and 22.9 available as AUR packages as well so that people can downgrade to them temporarily.

UPDATE: All the older versions also make the program crash in the same way, so I'm now sure that the error comes from a mismatch in either libc or a similar library. So, I don't see any advantage in making the older versions available without more information.

ylee commented on 2022-12-19 05:36 (UTC)

@jayesh I get the same error as @aitzkora, and /usr/bin/mpirun also fails with the following error message:

❯ /usr/bin/mpirun -np 4 ./a.out
[clink:950989] *** Process received signal ***
[clink:950989] Signal: Segmentation fault (11)
[clink:950989] Signal code: Address not mapped (1)
[clink:950989] Failing at address: (nil)
[clink:950989] [ 0] /usr/lib/libc.so.6(+0x38a00)[0x7f4c7ac51a00]
[clink:950989] *** End of error message ***
[1]    950989 segmentation fault (core dumped)  /usr/bin/mpirun -np 4 ./a.out

It was tested with the same hello world program. Let me know if you have any suggestions.

aitzkora commented on 2022-12-16 10:18 (UTC)

Thanks @jayesh for taking the time. I will use the system-wide mpirun as a workaround, but I was surprised that it does not work with the mpirun provided by NVIDIA. Indeed, some hidden init process such as hydra could be preventing MPI programs from running correctly. Best

jayesh commented on 2022-12-14 11:02 (UTC)

@aitzkora

I have seen that using the native mpirun, /usr/bin/mpirun -n 3 ./main, works perfectly for me. Can you confirm whether it works for you as well? If so, I'd say there is some issue with using nvhpc's mpirun. I suspect you need Hydra or a similar runtime to use nvhpc's mpirun correctly.
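To tell which mpirun a plain `mpirun` call would actually run, a quick check like the following may help. It is a sketch only: resolved_mpirun is a hypothetical helper name, and it relies on nothing beyond the standard `command -v` lookup.

```shell
# Hypothetical helper: print what the shell resolves "mpirun" to, or "not found".
resolved_mpirun() {
    command -v mpirun || echo "not found"
}

echo "mpirun on PATH: $(resolved_mpirun)"
echo "system mpirun:  /usr/bin/mpirun"
```

If the first line points into the nvhpc install rather than /usr/bin, calling /usr/bin/mpirun explicitly, as in the command above, avoids the ambiguity.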