
Monitor GPU utilization with nvidia-smi

When a training run under-utilizes the GPU, the bottleneck is rarely the model itself; more often the data pipeline is starving the device. With the right flags (--query-gpu, --format=csv, -l 1), nvidia-smi turns into a live dashboard for utilization, memory, power draw and temperature, parseable from any shell loop.

By Igor Moiseev · 4 min read

When training a model, you'd love to know how efficiently the GPU is utilized. NVIDIA ships the nvidia-smi tool with its driver.

Invoking it without any parameters prints a table with basic GPU parameters:

$ nvidia-smi
Fri Dec  2 23:13:41 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.65.01    Driver Version: 515.65.01    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0  On |                  N/A |
| 20%   50C    P5    25W / 250W |    744MiB / 11264MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      4804      G   /usr/lib/xorg/Xorg                292MiB |
|    0   N/A  N/A      4918      G   /usr/bin/gnome-shell              108MiB |
|    0   N/A  N/A     10549      G   ...390539104842029425,131072      340MiB |
+-----------------------------------------------------------------------------+
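The flags mentioned in the intro turn the same data into a machine-readable stream. The query fields below are standard nvidia-smi properties (run `nvidia-smi --help-query-gpu` for the full list); the function names `watch_gpu` and `parse_gpu_util` are just illustrative choices for this sketch.

```shell
# Print selected metrics once per second as CSV, without headers or units,
# so each line can be consumed by a shell loop or appended to a log file.
watch_gpu() {
  nvidia-smi \
    --query-gpu=timestamp,utilization.gpu,utilization.memory,memory.used,power.draw,temperature.gpu \
    --format=csv,noheader,nounits \
    -l 1
}

# Pull a single field out of one CSV line (utilization.gpu is field 2).
parse_gpu_util() {
  printf '%s\n' "$1" | awk -F', ' '{print $2}'
}
```

For example, `watch_gpu | tee gpu.log` keeps a timestamped trace you can plot later.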

But how do we monitor GPU usage continuously? For that, nvidia-smi provides the dmon subcommand; the -s flag selects which metric groups to stream (p = power usage and temperature, u = utilization, c = processor and memory clocks, v = power/thermal violations, m = frame buffer and BAR1 memory, e = ECC and PCIe replay errors, t = PCIe throughput):

$ nvidia-smi dmon -s pucvmet
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk pviol tviol    fb  bar1 sbecc dbecc   pci rxpci txpci
# Idx     W     C     C     %     %     %     %   MHz   MHz     %  bool    MB    MB  errs  errs  errs  MB/s  MB/s
    0    24    46     -     2     4     0     0   810  1151     0     0   791     6     -     -     0    14     0
    0    19    46     -     0     3     0     0   810  1151     0     0   791     6     -     -     0     0     2
    0    20    45     -     0     3     0     0   810  1151     0     0   791     6     -     -     0     0     2
    0    21    46     -     1     3     0     0   810  1151     0     0   791     6     -     -     0     0     2
    0    19    45     -     9    10     0     0   810  1151     0     0   821     6     -     -     0     0     0

The parameters to watch are sm (compute utilization of the streaming multiprocessors) and mem (memory-bandwidth utilization): if sm stays near zero while training, the GPU is being starved rather than worked.
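To act on these numbers programmatically, the dmon stream can be filtered with awk: in the -s pucvmet layout above, sm is the 5th column, and comment lines start with `#`. The 50% threshold and the `low_sm` name are arbitrary example choices, not anything nvidia-smi prescribes.

```shell
# Flag samples where SM (compute) utilization drops below a threshold.
# dmon prints repeating header lines starting with '#', so skip those.
low_sm() {
  awk -v thresh=50 '$1 !~ /^#/ && $5 < thresh {print "low sm utilization: " $5 "%"}'
}
```

Used as `nvidia-smi dmon -s pucvmet -d 5 | low_sm`, this samples every 5 seconds (-d sets the collection interval) and prints a line whenever compute utilization dips.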