GradCAM Walkthrough

Date: October 31, 2024
Lecture Duration: 2.5 hours
Topic Overview: This lecture dives into the interpretability of deep learning models, specifically Convolutional Neural Networks (CNNs). We explore Grad-CAM (Gradient-weighted Class Activation Mapping), a powerful technique used to visualize and understand which regions of an image a model relies on to make its predictions.

1. Understanding Model Interpretability

We started by discussing the “black box” nature of deep learning and why interpretability matters in computer vision tasks.

The Need for Visualization: High accuracy alone isn’t enough; we need to verify that our models rely on the right features (e.g., classifying a shark by its fins rather than by the surrounding blue water).
Grad-CAM Intuition: We explored the theory behind Grad-CAM. It uses the gradients of the target class score with respect to the feature maps of the final convolutional layer to produce a coarse localization map highlighting the image regions that matter most for predicting that class.
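
For reference, the intuition above corresponds to the two equations from the Grad-CAM paper, written here in the paper's notation: y^c is the score for class c before the softmax, A^k is the k-th feature map of the final convolutional layer, and Z is the number of spatial positions in that feature map.

```latex
% Channel importance: global average pooling of the gradients
\alpha_k^c = \frac{1}{Z} \sum_{i} \sum_{j} \frac{\partial y^c}{\partial A^k_{ij}}

% Localization map: ReLU of the weighted sum of feature maps
L^c_{\text{Grad-CAM}} = \mathrm{ReLU}\!\left( \sum_{k} \alpha_k^c \, A^k \right)
```

The ReLU keeps only the features with a positive influence on class c, so regions that argue for other classes do not appear in the map.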

2. Implementing Grad-CAM with PyTorch

The core of the lecture was a hands-on walkthrough implementing Grad-CAM from scratch using a pre-trained VGG19 model.

Model and Data Preparation: Loading the VGG19 model, downloading sample images (elephant, shark, iguana), and applying the necessary preprocessing transformations (resizing, tensor conversion, normalization).
The Grad-CAM Pipeline:
- Forward pass to get predictions and feature maps.
- Backward pass to compute gradients with respect to the target class.
- Global average pooling of the gradients over the spatial dimensions to obtain one importance weight per feature-map channel.
- Creating the heatmap by taking a weighted combination of forward activation maps followed by a ReLU activation.
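
The four steps above can be sketched as follows. To keep the example small and runnable without downloading weights, it uses a hypothetical `TinyCNN` in place of VGG19; the `grad_cam` function name and the final normalization to [0, 1] are also assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyCNN(nn.Module):
    """Hypothetical stand-in for VGG19: conv features, then a classifier."""

    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(8, 16, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.classifier = nn.Linear(16, num_classes)

    def forward(self, x):
        feats = self.features(x)  # final conv feature maps
        logits = self.classifier(self.pool(feats).flatten(1))
        return logits, feats


def grad_cam(model, image, target_class=None):
    """Return a heatmap in [0, 1] and the class index it explains."""
    model.eval()
    # 1. Forward pass: predictions and feature maps.
    logits, feats = model(image)
    if target_class is None:
        target_class = logits.argmax(dim=1).item()
    # 2. Backward pass: gradient of the class score w.r.t. the feature maps.
    grads = torch.autograd.grad(logits[0, target_class], feats)[0]
    # 3. Global average pooling over H and W: one weight per channel.
    weights = grads.mean(dim=(2, 3), keepdim=True)
    # 4. Weighted sum of feature maps, followed by ReLU.
    cam = F.relu((weights * feats).sum(dim=1))[0].detach()
    # Scale to [0, 1] for display (epsilon avoids division by zero).
    return cam / (cam.max() + 1e-8), target_class


torch.manual_seed(0)
model = TinyCNN()
image = torch.rand(1, 3, 32, 32)  # random input, batch of one
cam, predicted_class = grad_cam(model, image)
```

With a real VGG19, the same idea applies, but the feature maps usually have to be captured with a forward hook or by splitting the forward pass manually, since the stock model returns only the logits.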
Visualization: Overlaying the resulting heatmap onto the original image to clearly see the model’s areas of focus.
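
One way to build such an overlay, sketched with plain PyTorch tensors. The `overlay_heatmap` name, the `alpha` blend factor, and the simple red-to-blue coloring are assumptions for illustration; the class notebook may use a different colormap or library. The image passed in should be the original, un-normalized picture scaled to [0, 1].

```python
import torch
import torch.nn.functional as F


def overlay_heatmap(image, cam, alpha=0.4):
    """Blend a coarse heatmap onto an image.

    image: (3, H, W) tensor with values in [0, 1]
    cam:   (h, w) tensor with values in [0, 1]
    Returns a (3, H, W) tensor with values in [0, 1].
    """
    height, width = image.shape[1:]
    # Upsample the coarse map to the image resolution.
    cam_up = F.interpolate(
        cam[None, None], size=(height, width),
        mode="bilinear", align_corners=False,
    )[0, 0].clamp(0, 1)
    # Simple two-color map: red where the model focuses, blue elsewhere.
    heat = torch.stack([cam_up, torch.zeros_like(cam_up), 1 - cam_up])
    return ((1 - alpha) * image + alpha * heat).clamp(0, 1)


torch.manual_seed(0)
image = torch.rand(3, 224, 224)  # placeholder for the original image
cam = torch.rand(14, 14)         # placeholder for a Grad-CAM heatmap
overlay = overlay_heatmap(image, cam)
```

To display the result with matplotlib, move the channel axis last first: `plt.imshow(overlay.permute(1, 2, 0))`.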

Lecture Slides

Core Reading

Paper: Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization (Selvaraju et al., IJCV 2020)

Interactive Demonstration

Below is the complete Jupyter Notebook used in class. It contains the step-by-step PyTorch implementation of Grad-CAM applied to various sample images.
