Date: September 19, 2024
Lecture Duration: 1.5 hours
Topic Overview: In previous lectures, we treated images as static grids of pixels. In this lecture, we unlock two critical capabilities: Geometric Manipulation (understanding how to warp, align, and correct images) and Feature Recognition (moving from raw pixels to semantic meaning, like “this is a face”).
1. 2D Planar Transformations
We begin by defining the mathematical rules that allow us to map points from one view to another. This is the foundation for image stabilization, panoramas, and augmented reality.
We will explore the hierarchy of 2D transformations, ordered by their complexity and Degrees of Freedom (DOF):
- Translation (2 DOF): Moving an image on the x-y plane.
- Euclidean (Rigid, 3 DOF): Adding rotation while preserving lengths.
- Similarity (4 DOF): Adding uniform scaling.
- Affine (6 DOF): Adding shear (parallel lines remain parallel).
- Projective (Homography, 8 DOF): Simulating perspective changes (parallel lines may converge).
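The hierarchy above can be sketched with 3×3 homogeneous matrices in NumPy. This is a minimal illustration, not the notebook's implementation: `translation`, `euclidean`, `similarity`, and `apply_h` are names chosen here for clarity; the affine and projective cases are general 3×3 matrices with the constraints noted in the comments.

```python
import numpy as np

def translation(tx, ty):
    # 2 DOF: shift along x and y.
    return np.array([[1.0, 0.0, tx],
                     [0.0, 1.0, ty],
                     [0.0, 0.0, 1.0]])

def euclidean(theta, tx, ty):
    # 3 DOF: rotation + translation; lengths are preserved.
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, tx],
                     [s,  c, ty],
                     [0.0, 0.0, 1.0]])

def similarity(scale, theta, tx, ty):
    # 4 DOF: adds uniform scaling to the rigid transform.
    c, s = scale * np.cos(theta), scale * np.sin(theta)
    return np.array([[c, -s, tx],
                     [s,  c, ty],
                     [0.0, 0.0, 1.0]])

# An affine transform (6 DOF) is any 3x3 matrix with last row [0, 0, 1];
# a homography (8 DOF) is any invertible 3x3 matrix up to scale.

def apply_h(H, pts):
    # pts: (N, 2) array of points; lift to homogeneous coordinates,
    # apply H, then divide by the last coordinate (a no-op for affine).
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    out = (H @ pts_h.T).T
    return out[:, :2] / out[:, 2:3]
```

Because every class in the hierarchy is just a constrained 3×3 matrix, composing transformations reduces to matrix multiplication, e.g. `apply_h(translation(5, 0) @ euclidean(np.pi / 4, 0, 0), pts)`.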
2. Lens Distortions
The pinhole camera model is an idealization. Real-world lenses introduce imperfections, particularly Radial Distortion, where straight lines appear curved. We will simulate and visualize:
- Barrel Distortion: Common in wide-angle lenses (GoPro, Fish-eye).
- Pincushion Distortion: Common in telephoto lenses.
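A first-order radial distortion model is enough to produce both effects. The sketch below is a simplified single-coefficient version of the common polynomial model; the sign convention for k1 (negative for barrel, positive for pincushion) is an assumption here, as conventions vary between references.

```python
import numpy as np

def radial_distort(pts, k1):
    # pts: (N, 2) normalized coordinates centered on the principal point.
    # Each point is scaled by (1 + k1 * r^2), where r is its distance
    # from the center: k1 < 0 pulls points inward (barrel),
    # k1 > 0 pushes them outward (pincushion).
    r2 = np.sum(pts ** 2, axis=1, keepdims=True)
    return pts * (1.0 + k1 * r2)
```

Applying this to the pixel grid of a straight-line test pattern and re-sampling makes the curvature of "straight" lines directly visible.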
3. Feature Representation: From SIFT to Deep Learning
How does a computer recognize a car or a face? We discuss the evolution of feature extraction:
- Bag of Visual Words (BoW): A classical NLP-inspired approach that treats image features (SIFT) like “words” in a document to classify scenes.
- Deep Face Recognition: A modern approach using Deep Learning (Convolutional Neural Networks) to map faces into a 128-dimensional Euclidean space where distance corresponds to facial similarity: small distances indicate the same identity.
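Once faces live in that embedding space, verification reduces to a distance check. A minimal sketch of the idea, assuming L2-normalized 128-dimensional embeddings; the threshold value here is illustrative only, since real systems tune it on a validation set:

```python
import numpy as np

def l2_normalize(v):
    # Embeddings are typically L2-normalized so they lie on the unit hypersphere.
    return v / np.linalg.norm(v)

def same_identity(emb_a, emb_b, threshold=1.0):
    # Two faces are declared a match if their embeddings are closer
    # than the threshold. The value 1.0 is an illustrative placeholder,
    # not a tuned operating point.
    return float(np.linalg.norm(emb_a - emb_b)) < threshold
```

Note that recognition (who is this?) then becomes nearest-neighbor search over a gallery of known embeddings, using the same distance.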
4. Evaluation Metrics: Intersection over Union (IoU)
Finally, we address a critical question: How do we grade an Object Detector? Unlike simple classification (True/False), detecting an object involves drawing a box. We introduce IoU, the standard metric for measuring the overlap between a predicted box and the ground truth.
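IoU is simple enough to state in a few lines: the area of the intersection rectangle divided by the area of the union of the two boxes. A minimal sketch, assuming boxes in (x1, y1, x2, y2) corner format:

```python
def iou(box_a, box_b):
    # Boxes are (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
    # Intersection rectangle: the overlap of the two boxes (may be empty).
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

For example, two 2×2 boxes offset by one pixel in each direction overlap in a 1×1 square, giving IoU = 1 / (4 + 4 - 1) = 1/7; a common convention is to count a detection as correct when IoU ≥ 0.5.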
Interactive Demonstration
Below is the complete Jupyter Notebook used in class. It contains Python implementations for the geometric transformation matrices, lens distortion simulations, and the DeepFace recognition pipeline.