Package Details: nvidia-container-toolkit 1.13.5-1

Git Clone URL: https://aur.archlinux.org/nvidia-container-toolkit.git (read-only, click to copy)
Package Base: nvidia-container-toolkit
Description: NVIDIA container runtime toolkit
Upstream URL: https://github.com/NVIDIA/nvidia-container-toolkit
Keywords: docker nvidia nvidia-docker runc
Licenses: Apache
Conflicts: nvidia-container-runtime, nvidia-container-runtime-hook
Replaces: nvidia-container-runtime-hook
Submitter: jshap
Maintainer: kiendang
Last Packager: kiendang
Votes: 40
Popularity: 4.02
First Submitted: 2019-07-28 01:19 (UTC)
Last Updated: 2023-07-19 14:35 (UTC)

Pinned Comments

jshap commented on 2019-07-28 01:43 (UTC) (edited on 2019-07-29 22:32 (UTC) by jshap)

see the release notes here for why this exists: https://github.com/NVIDIA/nvidia-container-runtime/releases/tag/3.1.0

tl;dr: nvidia-docker is deprecated because docker now has native gpu support, which this package is required to use. :)

Latest Comments

1 2 3 4 5 6 .. 8 Next › Last »

r0l1 commented on 2023-11-21 16:15 (UTC) (edited on 2023-11-21 16:17 (UTC) by r0l1)

Here is a shorter up-to-date PKGBUILD. A special patch for go 1.21 is required. Otherwise there is a segfault. Please also keep in mind, that this PKGBUILD is adapted to fit to my custom libnvidia-container PKGBUILD file: https://aur.archlinux.org/pkgbase/libnvidia-container#comment-944659

PKGBUILD
pkgname=nvidia-container-toolkit
pkgver=1.14.3
pkgrel=1
pkgdesc='NVIDIA container runtime toolkit'
arch=('x86_64')
url='https://github.com/NVIDIA/nvidia-container-toolkit'
license=('Apache')

makedepends=('go')
depends=('libnvidia-container>=1.14.0')
options=(!lto)
backup=('etc/nvidia-container-runtime/config.toml')

install="nvidia-container-toolkit.install"
source=("${pkgname}-${pkgver}-${pkgrel}.tar.gz"::"${url}/archive/v${pkgver}.tar.gz"
        'go-nvml-79.patch')
sha256sums=('a8dbb6a8d45fe8cb2ecbb7b5d49c332e0e7270e8988e57d2a8587ab1e004f6dd'
            '769eba6d4f1342904a8aa058f2f2f3f06f81178ebdaab0e95125903a6da75b4d')

build() {
    cd "${pkgname}-${pkgver}"

    # Patch go-nvml to work with go 1.21. Otherwise it will segfault during docker run.
    # https://github.com/NVIDIA/go-nvml/pull/79
    # https://github.com/NVIDIA/go-nvml/issues/36
    patch -d "vendor/github.com/NVIDIA/go-nvml" -p1 < "${srcdir}/go-nvml-79.patch"

    make cmds
}

package() {
    cd "${pkgname}-${pkgver}"

    # Install binaries.
    install -D -m755 nvidia-container-runtime "${pkgdir}/usr/bin/nvidia-container-runtime"
    install -D -m755 nvidia-container-runtime.cdi "${pkgdir}/usr/bin/nvidia-container-runtime.cdi"
    install -D -m755 nvidia-container-runtime-hook "${pkgdir}/usr/bin/nvidia-container-runtime-hook"
    install -D -m755 nvidia-container-runtime.legacy "${pkgdir}/usr/bin/nvidia-container-runtime.legacy"
    install -D -m755 nvidia-ctk "${pkgdir}/usr/bin/nvidia-ctk"

    # Symlink hook.
    ln -sf "nvidia-container-runtime-hook" "${pkgdir}/usr/bin/nvidia-container-toolkit"

    # Create config.
    mkdir -p "${pkgdir}/etc/nvidia-container-runtime"
    ./nvidia-ctk --quiet config --config-file="${pkgdir}/etc/nvidia-container-runtime/config.toml" --in-place
}

nvidia-container-toolkit.install
post_install() {
    echo "Patching '/etc/docker/daemon.json' to include the nvidia-container-runtime"
    /usr/bin/nvidia-ctk runtime configure --runtime=docker
}

post_remove() {
    echo "IMPORTANT: manually remove the nvidia-container-runtime from '/etc/docker/daemon.json'"
}
go-nvml-79.patch
From b8d34ba5dc71c7b5a261bbdfdec63fe337fac2a5 Mon Sep 17 00:00:00 2001
From: braydonk <braydonk@google.com>
Date: Tue, 3 Oct 2023 03:32:50 +0000
Subject: [PATCH] gen/nvml: add --export-dynamic linker flag

Signed-off-by: braydonk <braydonk@google.com>
---
 pkg/nvml/const.go | 2 +-
 pkg/nvml/nvml.go  | 2 +-
 32files changed, 2 insertions(+), 2 deletions(-)

diff --git a/pkg/nvml/const.go b/pkg/nvml/const.go
index 1a0efaf..a9a3a56 100644
--- a/pkg/nvml/const.go
+++ b/pkg/nvml/const.go
@@ -18,7 +18,7 @@
 package nvml

 /*
-#cgo LDFLAGS: -Wl,--unresolved-symbols=ignore-in-object-files
+#cgo LDFLAGS: -Wl,--export-dynamic -Wl,--unresolved-symbols=ignore-in-object-files
 #cgo CFLAGS: -DNVML_NO_UNVERSIONED_FUNC_DEFS=1
 #include "nvml.h"
 #include <stdlib.h>
diff --git a/pkg/nvml/nvml.go b/pkg/nvml/nvml.go
index f63dfe8..bf2d6fc 100644
--- a/pkg/nvml/nvml.go
+++ b/pkg/nvml/nvml.go
@@ -18,7 +18,7 @@
 package nvml

 /*
-#cgo LDFLAGS: -Wl,--unresolved-symbols=ignore-in-object-files
+#cgo LDFLAGS: -Wl,--export-dynamic -Wl,--unresolved-symbols=ignore-in-object-files
 #cgo CFLAGS: -DNVML_NO_UNVERSIONED_FUNC_DEFS=1
 #include "nvml.h"
 #include <stdlib.h>

ghthor commented on 2023-11-09 19:35 (UTC) (edited on 2023-11-12 18:38 (UTC) by ghthor)

@jshap

I removed the following from both

nvidia-container-toolkit

nvidia-container-runtime

diff --git a/PKGBUILD b/PKGBUILD
index 332dd59..0c7b60d 100644
--- a/PKGBUILD
+++ b/PKGBUILD
@@ -39,7 +39,6 @@ build() {
   GOOS=linux \
   go build -v \
     -modcacherw \
-    -buildmode=pie \
     -gcflags "all=-trimpath=${PWD}" \
     -asmflags "all=-trimpath=${PWD}" \
     -o bin \

And now everything is working.

My assumption is that libnvidia cannot handle being loaded into a PIE executable; BUT I don't really know much about this area.

EDIT:

I think to keep buildmode=pie in the these 2 go packages, we'll need to also compile the libnvidia-container-tools[1] using -fpic[2]. And right after that in the documentation we see something talking about the PLT from the following error.

❯ sudo nvidia-ctk runtime configure --runtime=docker
nvidia-ctk: error while loading shared libraries: unexpected PLT reloc type 0x00

EDIT 2:

It appears that libnvidia-container.so IS being built with -fpic[4]. So I'm really unsure how to resolve this issue while keeping -buildmode=pie in nvidia-container-{runtime,toolkit}.

EDIT 3:

The only binary that is dynamically linked to the libnvidia-container.so is nvidia-container-cli, which is installed from the libnvidia-container-toolkit AUR package.

❯ for c in nvidia-container-runtime nvidia-container-toolkit nvidia-container-cli nvidia-container-runtime-hook; do printf '%s\n' $c; ldd $(command -v $c) | grep libnvidia; done
nvidia-container-runtime
nvidia-container-toolkit
nvidia-container-cli
    libnvidia-container.so.1 => /usr/bin/../lib/libnvidia-container.so.1 (0x00007f3690811000)
nvidia-container-runtime-hook

❯ pacman -Qo $(command -v nvidia-container-cli)
/usr/bin/nvidia-container-cli is owned by libnvidia-container-tools 1.13.5-1

jshap commented on 2023-10-04 17:05 (UTC)

Apologies for the delays on keeping this updated. I have been talking to the team that develops these tools upstream to try and best plan for the future of them, but it shouldn't be much longer.

xiota commented on 2023-09-20 09:04 (UTC) (edited on 2023-11-20 19:53 (UTC) by xiota)

Please remove replaces directive, in accordance with AUR submission guidelines.

Obseer commented on 2023-09-20 03:08 (UTC) (edited on 2023-09-20 03:09 (UTC) by Obseer)

@GeorgeRaven After doing that, it works with podman, but not with Docker. How do you make it work with Docker?

$ sudo podman run --rm --device nvidia.com/gpu=all ubuntu nvidia-smi -L
GPU 0: NVIDIA GeForce GTX 1060 3GB

$ docker run --gpus all nvidia/cuda:12.1.1-runtime-ubuntu22.04 nvidia-smi
docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: initialization error: driver rpc error: failed to process request: unknown.
ERRO[0001] error waiting for container:

GeorgeRaven commented on 2023-09-06 08:34 (UTC) (edited on 2023-09-06 08:36 (UTC) by GeorgeRaven)

This is just a note to anyone who is up-to-date and is trying to use the NCT with CDI to pass GPUS via config in /etc/cdi e.g. for pytorch. When trying to use nvidia-ctk resulting in nvidia-ctk: error while loading shared libraries: unexpected PLT reloc type 0x00

You may want to see current discussion at the following, with temporary binary PKGBUILD: https://gitlab.com/nvidia/container-toolkit/container-toolkit/-/issues/17

So that you can properly generate your CDI config:

sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml

After this I can now run rootless podman gpu containers:

> podman run --rm --device nvidia.com/gpu=all ubuntu nvidia-smi -L
GPU 0: NVIDIA GeForce GTX 1080 Ti

and

> podman run --rm --device nvidia.com/gpu=0 --userns keep-id -it docker.io/pytorch/pytorch python -c "import torch; print(torch.cuda.get_device_name(0))"
NVIDIA GeForce GTX 1080 Ti

jshap commented on 2023-07-24 14:57 (UTC) (edited on 2023-07-24 14:58 (UTC) by jshap)

@bitflipper First, that is a build error for libnvidia-container, not for this package. Second, manjaro is not supported on the aur, only archlinux is. I do not know what the output of uname -m produces on your machine, but if it does not match one of x86_64, ppc64le, or aarch64 then it will not be supported by this application as a requirement of the build system from https://github.com/NVIDIA/libnvidia-container, not from this pkgbuild.

bitflipper commented on 2023-07-21 07:43 (UTC)

I'm attempting to install nvidia-container-toolkit via paru, but receiving the following error:

==> Starting prepare()...
patching file Makefile
patching file mk/common.mk
Hunk #1 succeeded at 27 (offset 1 line).
patching file mk/elftoolchain.mk
Hunk #1 succeeded at 42 (offset 1 line).
patching file deps/src/elftoolchain-0.7.1/mk/elftoolchain.lib.mk
patching file modprobe-utils/nvidia-modprobe-utils.c
patching file modprobe-utils/nvidia-modprobe-utils.h
==> Sources are ready.
libnvidia-container-1.13.5-1 (libnvidia-container libnvidia-container-tools): parsing pkg list...
==> Making package: libnvidia-container 1.13.5-1 (Fri 21 Jul 2023 12:05:37 AM PDT)
==> Checking runtime dependencies...
==> Checking buildtime dependencies...
==> WARNING: Using existing $srcdir/ tree
==> Removing existing $pkgdir/ directory...
==> Starting build()...
/home/redacted/.cache/paru/clone/libnvidia-container/src/libnvidia-container-1.13.5/mk/common.mk:55: *** Unsupported architecture.  Stop.
==> ERROR: A failure occurred in build().
    Aborting...
error: failed to build 'libnvidia-container-1.13.5-1 (libnvidia-container libnvidia-container-tools)': 
error: can't build nvidia-container-toolkit-1.13.5-1, deps not satisfied: libnvidia-container-tools>=1.9.0
error: packages failed to build: libnvidia-container-1.13.5-1 (libnvidia-container libnvidia-container-tools)  nvidia-container-toolkit-1.13.5-1

This is really odd, I'm running Manjaro and my pc is x86_64. Not really sure what else to try here. Anybody know what's going on?

kiendang commented on 2023-07-19 14:24 (UTC)

@Sparticuz does this only happen with the latest version (1.13.4)? If so that should not be the problem with this PKGBUILD since we didn't modify the build process at all (and haven't done so for a very long time).

Sparticuz commented on 2023-07-17 19:18 (UTC)

I couldn't install the latest version without upgrading my version of go to 1.20. I had gcc-go installed and had to replace it with just regular go.