GradCAM Walkthrough
Date: October 31, 2024
Lecture Duration: 2.5 hours
Topic Overview: This lecture covers the interpretability of deep learning models, specifically Convolutional Neural Networks (CNNs). We explore Grad-CAM (Gradient-weighted Class Activation Mapping), a technique for visualizing which regions of an image a model relies on to make its predictions.
1. Understanding Model Interpretability
We started by discussing the “black box” nature of deep learning and why interpretability matters in computer vision tasks.
- The Need for Visualization: Understanding that high accuracy isn’t enough; we need to verify that our models are looking at the right features (e.g., classifying a shark based on its fins, not just the blue water).
- Grad-CAM Intuition: We explored the theory behind Grad-CAM. It uses the gradients of the target class score with respect to the feature maps of the final convolutional layer to produce a coarse localization map that highlights the image regions most important for predicting that class.
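For reference, the intuition above can be written compactly in the notation of the Grad-CAM paper. Here $A^k$ is the $k$-th feature map of the final convolutional layer, $y^c$ is the score for class $c$ before the softmax, and $Z$ is the number of spatial positions in a feature map. The symbols follow the paper, not necessarily the lecture slides.

$$
\alpha_k^c = \frac{1}{Z} \sum_{i} \sum_{j} \frac{\partial y^c}{\partial A^k_{ij}},
\qquad
L^c_{\text{Grad-CAM}} = \mathrm{ReLU}\!\left( \sum_{k} \alpha_k^c \, A^k \right)
$$

The weight $\alpha_k^c$ is the global-average-pooled gradient for channel $k$. The ReLU keeps only the regions that push the score for class $c$ upward.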
2. Implementing Grad-CAM with PyTorch
The core of the lecture was a hands-on walkthrough implementing Grad-CAM from scratch using a pre-trained VGG19 model.
- Model and Data Preparation: Loading the VGG19 model, downloading sample images (elephant, shark, iguana), and applying the necessary preprocessing transformations (resizing, tensor conversion, normalization).
- The Grad-CAM Pipeline:
  - Forward pass to get predictions and feature maps.
  - Backward pass to compute gradients with respect to the target class.
  - Global average pooling of gradients to obtain neuron importance weights.
  - Creating the heatmap by taking a weighted combination of forward activation maps followed by a ReLU activation.
- Visualization: Overlaying the resulting heatmap onto the original image to clearly see the model’s areas of focus.
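The pipeline and overlay steps above can be sketched in a few lines of PyTorch. This is a minimal illustration, not the notebook used in class. To keep it runnable without downloading weights, a tiny randomly initialized CNN and a random tensor stand in for VGG19 and a preprocessed image. The names `grad_cam`, `overlay`, `features`, and `classifier` are hypothetical helpers introduced here, and the red/blue coloring is a simplified stand-in for a real colormap.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def grad_cam(features, classifier, x, class_idx=None):
    """Return (heatmap, class_idx) for one image tensor x of shape (1, C, H, W)."""
    # Forward pass: feature maps of the last conv block, then class scores.
    activations = features(x)            # (1, K, h, w)
    activations.retain_grad()            # keep d(score)/d(activations) after backward
    scores = classifier(activations)     # (1, num_classes)
    if class_idx is None:
        class_idx = int(scores.argmax(dim=1))

    # Backward pass from the target class score.
    features.zero_grad()
    classifier.zero_grad()
    scores[0, class_idx].backward()

    # Global average pooling of the gradients: one weight per channel.
    weights = activations.grad.mean(dim=(2, 3), keepdim=True)   # (1, K, 1, 1)

    # Weighted combination of the activation maps, followed by ReLU.
    cam = F.relu((weights * activations).sum(dim=1, keepdim=True)).detach()
    cam = cam / (cam.max() + 1e-8)       # scale to [0, 1] for display
    return cam, class_idx


def overlay(image, cam, alpha=0.4):
    """Blend a (1, 1, h, w) heatmap onto a (1, 3, H, W) image with values in [0, 1]."""
    cam_up = F.interpolate(cam, size=image.shape[-2:], mode="bilinear",
                           align_corners=False)
    # Simplified coloring (assumption): red where the map is high, blue where low.
    heat = torch.cat([cam_up, torch.zeros_like(cam_up), 1 - cam_up], dim=1)
    return (1 - alpha) * image + alpha * heat


# Stand-in model (assumption): a small random CNN instead of pre-trained VGG19.
torch.manual_seed(0)
features = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(),
)
classifier = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 5))
features.eval()
classifier.eval()

# Stand-in input (assumption): random pixels in [0, 1] instead of a real photo.
image = torch.rand(1, 3, 32, 32)
cam, class_idx = grad_cam(features, classifier, image)
blended = overlay(image, cam)
print(cam.shape, blended.shape, class_idx)
```

With torchvision's VGG19, one common split is `model.features[:36]` for `features` (everything up to the ReLU after the last convolution) and the final max-pool, average pool, flatten, and `model.classifier` for `classifier`. The overlay expects the un-normalized image, so any ImageNet mean/std normalization would need to be undone before blending.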
Lecture Slides
Core Reading
- Paper: Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization (Selvaraju et al., IJCV 2020)
Interactive Demonstration
Below is the complete Jupyter Notebook used in class. It contains the step-by-step PyTorch implementation of Grad-CAM applied to various sample images.