Ways to isolate GPU cards in Ubuntu

#GPU #AI #H100 #k8s #Infra

There are a couple of ways to make sure an OS user is assigned a specific GPU (or set of GPUs), so that GPU resources stay isolated between users.

TLDR;

In short, the options are:

  1. Quick User-Specific GPU Assignment (via CUDA_VISIBLE_DEVICES in the user's shell environment)
  2. System-Level Enforcement with Cgroups + systemd
  3. Best practice in a bigger cluster: a resource scheduler such as Slurm or Kubernetes (k8s)

Quick User-Specific GPU Assignment

Given GPUs indexed 0-7, here is the simplest method if you just want a user to see a subset of them (say GPUs 0, 1, and 6): edit the user's shell configuration (e.g., ~/.bashrc or ~/.bash_profile) and add the following line:

export CUDA_VISIBLE_DEVICES=0,1,6

⚠️ Note: This only hides GPUs from CUDA applications. It is not hard isolation: the variable is purely advisory, and any user can override or unset CUDA_VISIBLE_DEVICES in their own session, so you'll need system-level enforcement if strict exclusivity is required.
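A quick sanity check that the mask is taking effect (a sketch: train.py is a placeholder for your own CUDA workload, and the PyTorch one-liner assumes torch is installed):

    # Per-command override; this takes precedence over the value exported in ~/.bashrc
    CUDA_VISIBLE_DEVICES=0 python train.py

    # nvidia-smi ignores CUDA_VISIBLE_DEVICES (it talks to the driver via NVML),
    # so check with a CUDA application instead; with 0,1,6 this should print 3
    python -c "import torch; print(torch.cuda.device_count())"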

Whole-GPU exclusivity with systemd + cgroups

For stricter guarantees, you can enforce GPU exclusivity at the system level with a self-contained Bash script that sets up (or removes) systemd + cgroup v2 device isolation, so that one user can access only the GPUs you specify (e.g., 0,1). The script writes a user@.service drop-in with DevicePolicy=closed and DeviceAllow=/dev/nvidiaX entries for the selected GPUs, plus the NVIDIA control devices. It also has a verify command and an optional flag to add CUDA_VISIBLE_DEVICES to the user's shell.

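The original script is not reproduced here, but below is a minimal sketch of what gpu-assign.sh could look like, assuming cgroup v2, NVIDIA device nodes at /dev/nvidiaN, and per-user drop-ins under /etc/systemd/system/user@<UID>.service.d. Treat it as a starting point rather than the exact implementation; the drop-in filename and option parsing are my own choices.

    #!/usr/bin/env bash
    # gpu-assign.sh -- sketch of per-user GPU isolation via systemd + cgroup v2
    # Subcommands: apply | verify | remove
    set -euo pipefail

    usage() { echo "usage: $0 {apply|verify|remove} -u USER [-g GPU_LIST] [--set-env]" >&2; exit 1; }

    cmd="${1:-}"; [[ -n "$cmd" ]] || usage; shift
    user="" gpus="" set_env=0
    while [[ $# -gt 0 ]]; do
      case "$1" in
        -u) user="$2"; shift 2 ;;
        -g) gpus="$2"; shift 2 ;;
        --set-env) set_env=1; shift ;;
        *) usage ;;
      esac
    done
    [[ -n "$user" ]] || usage

    uid="$(id -u "$user")"
    # Drop-in for the user's systemd user manager. Depending on how the user logs in,
    # the same settings may also be needed on user-${uid}.slice to cover SSH session scopes.
    dropin_dir="/etc/systemd/system/user@${uid}.service.d"
    dropin="${dropin_dir}/90-gpu-isolation.conf"

    case "$cmd" in
      apply)
        [[ -n "$gpus" ]] || usage
        mkdir -p "$dropin_dir"
        {
          echo "[Service]"
          echo "DevicePolicy=closed"        # deny device access except what is allowed below
          echo "DeviceAllow=char-pts rw"    # keep terminals usable
          IFS=',' read -ra idx <<< "$gpus"
          for i in "${idx[@]}"; do          # only the selected GPUs
            echo "DeviceAllow=/dev/nvidia${i} rw"
          done
          # NVIDIA control devices that CUDA needs regardless of GPU index
          for dev in /dev/nvidiactl /dev/nvidia-uvm /dev/nvidia-uvm-tools /dev/nvidia-modeset; do
            if [[ -e "$dev" ]]; then echo "DeviceAllow=${dev} rw"; fi
          done
        } > "$dropin"
        systemctl daemon-reload
        systemctl try-restart "user@${uid}.service" || true  # otherwise applies on next login
        if [[ "$set_env" -eq 1 ]]; then
          echo "export CUDA_VISIBLE_DEVICES=${gpus}" >> "/home/${user}/.bashrc"
        fi
        ;;
      verify)
        systemctl show -p DevicePolicy -p DeviceAllow "user@${uid}.service"
        echo "Run 'nvidia-smi -L' as ${user} inside the restricted unit to confirm visible GPUs."
        ;;
      remove)
        rm -f "$dropin"
        systemctl daemon-reload
        systemctl try-restart "user@${uid}.service" || true
        ;;
      *) usage ;;
    esac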

How to use

  1. Assign GPUs 0 and 1 to user alice and set CUDA env for convenience:
    sudo ./gpu-assign.sh apply -u alice -g 0,1 --set-env
    
  2. Check the applied policy and what nvidia-smi shows for alice:
    sudo ./gpu-assign.sh verify -u alice
    
  3. Remove the restriction for alice:
    sudo ./gpu-assign.sh remove -u alice
    

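If you want to spot-check what systemd actually applied without going through the script (a sketch; 1001 stands in for alice's UID, and the drop-in path matches the sketch above):

    # Effective device policy on alice's user manager unit
    systemctl show -p DevicePolicy -p DeviceAllow user@1001.service

    # Confirm the drop-in landed where expected
    cat /etc/systemd/system/user@1001.service.d/90-gpu-isolation.conf

Since the filter is attached to a cgroup rather than to the UID itself, the definitive test is to run nvidia-smi -L from a process that actually lives inside the restricted unit.
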
Slurm/K8s is a long story

TBD