Train Resnet50 on ImageNet with PyTorch

Without further ado, here is a one-page script for training ResNet50 on ImageNet in PyTorch:

import torch
import torchvision
import torchvision.transforms as transforms

# Set device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Set hyperparameters
num_epochs = 10
batch_size = 64
learning_rate = 0.001

# Initialize transformations for data augmentation
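transform = transforms.Compose([
    transforms.RandomResizedCrop(224),  # fixed-size crops so images can be batched
    transforms.ColorJitter(brightness=0.5, contrast=0.5, saturation=0.5, hue=0.5),
    transforms.ToTensor(),  # Normalize works on tensors, not PIL images
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])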

# Load the ImageNet Object Localization Challenge dataset
train_dataset = torchvision.datasets.ImageFolder(

train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size, shuffle=True, num_workers=2)

# Load the ResNet50 model
model = torchvision.models.resnet50(pretrained=True)

# Parallelize training across multiple GPUs
model = torch.nn.DataParallel(model)

# Set the model to run on the device
model = model.to(device)

# Define the loss function and optimizer
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

# Train the model...
for epoch in range(num_epochs):
    for inputs, labels in train_loader:
        # Move input and label tensors to the device
        inputs = inputs.to(device)
        labels = labels.to(device)

        # Zero out the optimizer
        optimizer.zero_grad()

        # Forward pass
        outputs = model(inputs)
        loss = criterion(outputs, labels)

        # Backward pass
        loss.backward()
        optimizer.step()

    # Print the loss for every epoch
    print(f'Epoch {epoch+1}/{num_epochs}, Loss: {loss.item():.4f}')

print(f'Finished Training, Loss: {loss.item():.4f}')

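This code trains a ResNet50 model on the ImageNet dataset for 10 epochs using the Adam optimizer with a learning rate of 0.001. Because the model is loaded with pretrained=True, training starts from the pretrained weights rather than from scratch. The model is trained on the GPU if one is available; otherwise it is trained on the CPU.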

Note that the code is adjusted to run with the ImageNet Object Localization Challenge dataset on Kaggle. You can check some results in the notebook Train Resnet50 on Imagenet with PyTorch.

Expanded explanation of each training step

Import the necessary PyTorch modules:

import torch
import torchvision
import torchvision.transforms as transforms
  • torch provides tensors and basic mathematical operations
  • torchvision provides utilities for loading and preprocessing image data
  • torchvision.transforms is a submodule of torchvision that provides functions for performing image preprocessing

Set the device to use for training:

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

This line sets the device variable to “cuda” if a GPU is available, otherwise it sets it to “cpu”. The model and tensors will be moved to this device later in the code.

Prepare transformations for data augmentation

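transform = transforms.Compose([
    transforms.RandomResizedCrop(224),  # fixed-size crops so images can be batched
    transforms.ColorJitter(brightness=0.5, contrast=0.5, saturation=0.5, hue=0.5),
    transforms.ToTensor(),  # Normalize works on tensors, not PIL images
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])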

This block of code makes up the set of transformations that will be applied during training. In particular, the transforms.Normalize takes two arguments:

  • [0.485, 0.456, 0.406] - the mean of the data along each channel (i.e., the red, green, and blue channels for an image).
  • [0.229, 0.224, 0.225] - the standard deviation of the data along each channel.

These exact values are used to normalize the inputs of models that have been pre-trained on the ImageNet dataset. They are based on the statistics of ImageNet, which consists of a large number of natural images.
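As a worked illustration of what transforms.Normalize computes, here is a minimal plain-Python sketch that applies (value - mean) / std per channel to a single RGB pixel. The function name normalize_pixel and the sample pixel are hypothetical; the real transform operates on whole tensors whose values are already scaled to [0, 1].

```python
# Per-channel ImageNet statistics, the same values passed to transforms.Normalize
IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]


def normalize_pixel(rgb, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Apply (value - mean) / std to each channel of one RGB pixel in [0, 1]."""
    return [(v - m) / s for v, m, s in zip(rgb, mean, std)]


if __name__ == "__main__":
    # Hypothetical mid-gray pixel
    print(normalize_pixel([0.5, 0.5, 0.5]))
```

A pixel equal to the dataset mean maps to zero in every channel, which is the point of the operation: the network sees inputs centered around zero with roughly unit spread.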

Load the ImageNet dataset:

train_dataset = torchvision.datasets.ImageFolder(

This line loads the ImageNet dataset in Kaggle’s format and applies all the transformations defined above.

Create a dataloader for the dataset:

train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size, shuffle=True, num_workers=2)

torch.utils.data.DataLoader creates a dataloader for the dataset. The batch_size parameter specifies the number of samples per batch, the shuffle parameter specifies whether to reshuffle the data at every epoch, and the num_workers parameter specifies the number of worker processes used for loading the data.

A rule of thumb for the number of workers is the number of CPU cores minus 1, which leaves one core for the controlling process. os.cpu_count() or multiprocessing.cpu_count() can help with this.
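The rule of thumb can be sketched in a few lines. The helper name suggested_num_workers and its reserved_cores parameter are hypothetical, and the result is only a starting point to tune, not a guaranteed optimum.

```python
import os


def suggested_num_workers(reserved_cores=1):
    """Return the CPU core count minus the reserved cores, but never less than 1."""
    total = os.cpu_count() or 1  # os.cpu_count() may return None
    return max(1, total - reserved_cores)


if __name__ == "__main__":
    print(f"num_workers = {suggested_num_workers()}")
```

The returned value can be passed as num_workers when the dataloader is created.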

Load the Resnet50 model:

model = torchvision.models.resnet50(pretrained=True)

This line uses the torchvision.models.resnet50 function to load the Resnet50 model, with the pretrained parameter set to True to use the pretrained weights.

Parallelize training across multiple GPUs

model = torch.nn.DataParallel(model)

torch.nn.DataParallel wraps a model, replicates it on each available GPU, and splits every input batch across the replicas along the batch dimension. The replicas run in parallel, and their outputs are gathered and concatenated on the first device. With several GPUs this usually speeds up training noticeably, although the PyTorch documentation recommends DistributedDataParallel for the best multi-GPU performance.
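The split, run, and gather idea can be illustrated without any GPU. The following is a plain-Python analogy, not PyTorch's implementation: the names scatter and data_parallel_forward are hypothetical, the "devices" are simulated sequentially, and the ceiling-based chunking is an assumption about how an uneven batch would be divided.

```python
import math


def scatter(batch, num_devices):
    """Split a batch (a list of samples) into at most num_devices contiguous chunks."""
    chunk_size = max(1, math.ceil(len(batch) / num_devices))
    return [batch[i:i + chunk_size] for i in range(0, len(batch), chunk_size)]


def data_parallel_forward(model_fn, batch, num_devices):
    """Run model_fn on every chunk (sequentially here) and concatenate the outputs."""
    chunks = scatter(batch, num_devices)
    # In real DataParallel each chunk is processed by a model replica on its own GPU.
    outputs_per_device = [[model_fn(x) for x in chunk] for chunk in chunks]
    # Gather: concatenate the per-device outputs back into one batch, in order.
    return [y for outputs in outputs_per_device for y in outputs]


if __name__ == "__main__":
    batch = list(range(64))  # stand-in for a batch of 64 samples
    print([len(chunk) for chunk in scatter(batch, 4)])
    print(data_parallel_forward(lambda x: 2 * x, batch, 4) == [2 * x for x in batch])
```

With batch_size = 64 and four devices, each replica would see 16 samples per step, so the per-GPU batch shrinks as GPUs are added.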

Move the model to the device:

model = model.to(device)

This line moves the model and its parameters to the device specified earlier.

Set the loss function and optimizer:

criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

torch.nn.CrossEntropyLoss creates a cross-entropy loss criterion, which is commonly used for classification tasks. torch.optim.Adam creates an Adam optimizer with the specified learning rate. The model.parameters() method returns an iterator over the model’s parameters, which the optimizer will adjust during training.
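To make the loss concrete, here is a minimal plain-Python sketch of cross-entropy for a single sample, computed as the negative log of the softmax probability of the correct class. The function name cross_entropy and the sample logits are hypothetical; the PyTorch criterion does the same computation on batches of logits and, by default, averages over the batch.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy for one sample: -log(softmax(logits)[target])."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


if __name__ == "__main__":
    # Three-class example where class 0 has the largest logit
    print(cross_entropy([2.0, 1.0, 0.1], target=0))
    # Uniform logits over 1000 classes give a loss of log(1000)
    print(cross_entropy([0.0] * 1000, target=42), math.log(1000))
```

The second case is a useful sanity check for ImageNet: a 1000-class model that guesses uniformly has a loss of log(1000), about 6.9, so a loss well below that means the model has learned something.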

Train the model:

for epoch in range(num_epochs):
    for inputs, labels in train_loader:
        # Move input and label tensors to the device
        inputs = inputs.to(device)
        labels = labels.to(device)

        # Zero out the optimizer
        optimizer.zero_grad()

        # Forward pass
        outputs = model(inputs)
        loss = criterion(outputs, labels)

        # Backward pass
        loss.backward()
        optimizer.step()

    # Print the loss for every epoch
    print(f'Epoch {epoch+1}/{num_epochs}, Loss: {loss.item():.4f}')

This block of code trains the model for 10 epochs. An epoch is a complete pass through the training data.

In each epoch, the code iterates over the dataloader, which yields batches of inputs and labels. The inputs and labels are moved to the device, and the gradients are zeroed using the optimizer.zero_grad() method.

During training, the gradients of the model's parameters are computed by backpropagation, which propagates the loss gradient backward through the model's layers. The optimizer then uses these gradients to update the parameters according to its update rule.

However, PyTorch accumulates gradients: each backward pass adds to the values already stored. If you don't zero the gradients before each training step, the update is based on the sum of the gradients from all previous steps. This can cause the model's parameters to oscillate or diverge, leading to poor convergence and poor model performance.

Calling optimizer.zero_grad() before each training step resets the gradients to zero, so that the update is based only on the gradients of the current step.
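The effect of skipping the reset can be seen in a toy setting. The sketch below is a plain-Python analogy with one weight and plain gradient descent (not Adam, and not PyTorch itself); the names grad, train, and grad_buffer are hypothetical, and the target value 3.0 is arbitrary.

```python
def grad(w, x=1.0, y=3.0):
    """Gradient of the squared error (w * x - y) ** 2 with respect to w."""
    return 2 * x * (w * x - y)


def train(zero_each_step, steps=60, lr=0.1):
    """Gradient descent on a single weight; returns the weight after every step."""
    w, grad_buffer, history = 0.0, 0.0, []
    for _ in range(steps):
        if zero_each_step:
            grad_buffer = 0.0   # plays the role of optimizer.zero_grad()
        grad_buffer += grad(w)  # like loss.backward(): adds to the stored gradient
        w -= lr * grad_buffer   # like optimizer.step() for plain SGD
        history.append(w)
    return history


if __name__ == "__main__":
    print("with zeroing:   ", train(zero_each_step=True)[-1])
    print("without zeroing:", train(zero_each_step=False)[-5:])
```

With the reset, the weight settles at the optimum of 3.0. Without it, the stale gradients keep pushing the weight past the optimum and it keeps swinging around 3.0 instead of converging.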

Next, the model performs a forward pass on the inputs, producing output logits. The loss is then computed using output logits and labels, and the model’s gradients are computed using the loss.backward() method. Finally, the optimizer takes a step to update the model’s parameters using gradients.

Training environment with Docker

FROM pytorch/pytorch

# Install additional dependencies
RUN apt-get update && apt-get install -y \
    wget \
    unzip

# Copy Kaggle credentials 
RUN mkdir -p /root/.kaggle
ADD kaggle.json /root/.kaggle

# Install Kaggle toolchain and download imagenet 
# Remember to join the competition first
RUN pip install -q kaggle
RUN kaggle competitions download -c imagenet-object-localization-challenge
RUN unzip -d imagenet-object-localization-challenge

# Download pretrained model and store in the image layer 
# Available at the path /root/.cache/torch/
RUN python -c "import torchvision; model = torchvision.models.resnet50(pretrained=True)"

# Copy the code
COPY /app/

# Set the working directory
WORKDIR /app

# Run the code
CMD ["python", ""]

This Dockerfile is based on the pytorch/pytorch image, which provides all the dependencies needed to run PyTorch programs with GPU acceleration.

The Dockerfile installs the wget and unzip utilities, which are needed to download and extract the ImageNet dataset. It then downloads the dataset and extracts the images to the imagenet-object-localization-challenge directory.

Next, the Dockerfile copies the file that contains the code for training the ResNet50 model to the /app directory. It then sets the working directory to /app and specifies that the python command should be run when the container starts.

Note that you have to provide your own Kaggle credentials in the kaggle.json file to be able to download the data from the ImageNet Object Localization Challenge on Kaggle. Here is the local directory structure:

➜  2022-12-18-one-pager-training-resnet-on-imagenet git:(master) ✗ tree
├── .gitignore
├── Dockerfile
├── kaggle.json

To build the Docker container, you can run the following command:

docker build -t resnet50 .

This will build the Docker container and tag it with the name “resnet50”.

To run the container, you can use the following command:

docker run --gpus all -it resnet50

This will run the container with access to all available GPUs and start the container in interactive mode.

This will start the training of the ResNet50 model on the ImageNet dataset. You should see the running loss printed to the console as the training progresses.

I hope this helps! Let me know if you have any questions.

Find all tables without primary key in PostgreSQL

Search across all schemas for tables without a primary key

The following query lists the tables without a primary key, which tend to hurt database performance:

SELECT t.table_schema, t.table_name
FROM information_schema.tables as t
LEFT JOIN information_schema.table_constraints as tc
ON (
        tc.table_schema = t.table_schema
    AND tc.table_name = t.table_name
    AND tc.constraint_type = 'PRIMARY KEY'
)
WHERE t.table_type = 'BASE TABLE'
AND t.table_schema not in ('pg_catalog', 'information_schema')
AND tc.constraint_name is NULL

In this example, the information_schema.tables view is used to list all tables registered in the current PostgreSQL database, and the LEFT JOIN with table_constraints attaches the primary key constraint of each table, if it has one.

The WHERE clause filters out the system schemas and keeps only the tables that do not have a primary key.

Restrict the search for tables without a primary key to a specific schema

SELECT t.table_schema, t.table_name
FROM information_schema.tables as t
LEFT JOIN information_schema.table_constraints as tc
ON (
        tc.table_schema = t.table_schema
    AND tc.table_name = t.table_name
    AND tc.constraint_type = 'PRIMARY KEY'
)
WHERE t.table_type = 'BASE TABLE'
AND t.table_schema not in ('pg_catalog', 'information_schema')
AND t.table_schema = '<schema name>' -- put the schema name here
AND tc.constraint_name is NULL

A friendly piece of advice: the result of these queries should be an empty set.

Happy querying!

Fixing error 'Matplotlib is currently using agg'

Have you ever tried to show matplotlib graphs, or any other GUI, from inside a Docker container? You will have to overcome two obstacles:

  1. a missing X11 server inside the container
  2. missing X11-related libraries needed to render your GUI elements

For example, take this simple Python program:

import matplotlib.pyplot as plt
import numpy as np

# evenly sampled time at 200ms intervals
t = np.arange(0., 5., 0.2)

# red dashes, blue squares and green triangles
plt.plot(t, t, 'r--', t, t**2, 'bs', t, t**3, 'g^')
plt.show()

First, allow local connections to the X11 server with

xhost local:root

Then I run the program in a Docker container with the following command, which produces the warning shown below it

docker run -it --rm --env="DISPLAY" -v "/tmp/.X11-unix:/tmp/.X11-unix:rw" -v $PWD:/app python-container python /app/

/app/examples/ UserWarning: Matplotlib is currently using agg, which is a non-GUI backend, so cannot show the figure.

To fix the 'Matplotlib is currently using agg' warning, one dependency must be installed. Below is a minimalist Dockerfile that includes the most important X11 library dependency, libx11-6; it works fine on Ubuntu 20.04 to 22.04 and Debian 11+.

FROM python

RUN apt-get update && \
    apt-get install -y \
        python3 \
        python3-setuptools \
        libx11-6

RUN pip3 install numpy matplotlib


Et voilà, we have a minimalist Docker container that is able to connect to a local X11 server and create GUI elements!

Note 1. To build an image run

docker build -t python-container .

Note 2. What does --env="DISPLAY" stand for? --env propagates the $DISPLAY environment variable from the host into the container.

Note 3. What does -v "/tmp/.X11-unix:/tmp/.X11-unix:rw" do? It mounts the host's X11 socket directory at the same path inside the container, where GUI applications use it to reach the X server and render GUI elements.
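When the figure still does not show up, it can help to check from inside the container whether the two things described in Notes 2 and 3 actually arrived. The following is a minimal diagnostic sketch using only the standard library; the function name x11_diagnostics is hypothetical, and it only inspects the environment and the socket directory without trying to open a window.

```python
import os


def x11_diagnostics(env=None, socket_dir="/tmp/.X11-unix"):
    """Report whether DISPLAY is set and whether the X11 socket directory is present."""
    env = os.environ if env is None else env
    display = env.get("DISPLAY", "")
    mounted = os.path.isdir(socket_dir)
    return {
        "display": display,
        "display_set": bool(display),
        "socket_dir_mounted": mounted,
        "sockets": sorted(os.listdir(socket_dir)) if mounted else [],
    }


if __name__ == "__main__":
    for key, value in x11_diagnostics().items():
        print(f"{key}: {value}")
```

An empty display value points to a missing --env="DISPLAY", while an empty socket list points to a missing or wrong -v mount.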

Monitor GPU utilization with nvidia-smi

When training, you want to know how efficiently the GPU is utilized. NVIDIA ships the nvidia-smi tool together with the driver.

Invoking it without any parameters prints a table with the basic GPU parameters:

$ nvidia-smi
Fri Dec  2 23:13:41 2022       
| NVIDIA-SMI 515.65.01    Driver Version: 515.65.01    CUDA Version: 11.7     |
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0  On |                  N/A |
| 20%   50C    P5    25W / 250W |    744MiB / 11264MiB |      0%      Default |
|                               |                      |                  N/A |
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|    0   N/A  N/A      4804      G   /usr/lib/xorg/Xorg                292MiB |
|    0   N/A  N/A      4918      G   /usr/bin/gnome-shell              108MiB |
|    0   N/A  N/A     10549      G   ...390539104842029425,131072      340MiB |

To monitor GPU usage continuously, use the dmon subcommand and select the metric groups with the -s option:

$ nvidia-smi dmon -s pucvmet
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk pviol tviol    fb  bar1 sbecc dbecc   pci rxpci txpci
# Idx     W     C     C     %     %     %     %   MHz   MHz     %  bool    MB    MB  errs  errs  errs  MB/s  MB/s
    0    24    46     -     2     4     0     0   810  1151     0     0   791     6     -     -     0    14     0
    0    19    46     -     0     3     0     0   810  1151     0     0   791     6     -     -     0     0     2
    0    20    45     -     0     3     0     0   810  1151     0     0   791     6     -     -     0     0     2
    0    21    46     -     1     3     0     0   810  1151     0     0   791     6     -     -     0     0     2
    0    19    45     -     9    10     0     0   810  1151     0     0   821     6     -     -     0     0     0

The letters passed to -s select the metric groups to display:

  • p - Power Usage (in Watts) and Gpu/Memory Temperature (in C) if supported
  • u - Utilization (SM, Memory, Encoder and Decoder Utilization in %)
  • c - Proc and Mem Clocks (in MHz)
  • v - Power Violations (in %) and Thermal Violations (as a boolean flag)
  • m - Frame Buffer and Bar1 memory usage (in MB)
  • e - ECC (Number of aggregated single bit, double bit ecc errors) and PCIe Replay errors
  • t - PCIe Rx and Tx Throughput in MB/s (Maxwell and above)
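For logging or plotting, the dmon output can be turned into records with a few lines of Python. This is a minimal sketch that assumes the text layout shown above (a "#" line with column names, a "#" line with units, then whitespace-separated rows); the function names parse_dmon and _to_number are hypothetical, and the sample is copied from the output in this post.

```python
def _to_number(token):
    """Convert a dmon cell to a number; '-' means the metric is not reported."""
    if token == "-":
        return None
    try:
        return int(token)
    except ValueError:
        try:
            return float(token)
        except ValueError:
            return token  # leave unexpected cells as text


def parse_dmon(text):
    """Parse `nvidia-smi dmon` text into a list of {column: value} dicts."""
    columns, rows = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # The first comment line holds the column names; later ones are units or repeats.
            if columns is None:
                columns = line.lstrip("#").split()
            continue
        values = line.split()
        if columns is None or len(values) != len(columns):
            continue  # skip anything that does not match the header
        rows.append({c: _to_number(v) for c, v in zip(columns, values)})
    return rows


SAMPLE = """\
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk pviol tviol    fb  bar1 sbecc dbecc   pci rxpci txpci
# Idx     W     C     C     %     %     %     %   MHz   MHz     %  bool    MB    MB  errs  errs  errs  MB/s  MB/s
    0    24    46     -     2     4     0     0   810  1151     0     0   791     6     -     -     0    14     0
    0    19    45     -     9    10     0     0   810  1151     0     0   821     6     -     -     0     0     0
"""

if __name__ == "__main__":
    for row in parse_dmon(SAMPLE):
        print(row["gpu"], row["sm"], row["fb"])
```

The sm column is the one to watch during training: values that stay low usually mean the GPU is waiting for data rather than computing.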

Cleaning up Docker space

Regularly clean up your stopped containers and dangling images.

Step 1: clean up containers. Don’t worry, docker rm without -f removes only stopped containers and reports an error for running ones

docker ps -aq | xargs docker rm

Step 2: removing dangling images

docker rmi $(docker images -q --filter "dangling=true")

Docker itself offers a number of tools to prune and clean up space:

  1. Inspecting Docker disk usage: docker system df
  2. Pruning stopped containers: docker container prune
  3. Removing all unused local volumes: docker volume prune
  4. docker system prune will remove
    • all stopped containers
    • all networks not used by at least one container
    • all dangling images
    • all dangling build cache
