Date: September 26, 2024
Lecture Duration: 1.5 hours
Topic Overview: In this lecture, we move beyond spatial pixel manipulation to explore images in the Frequency Domain. We also tackle the fundamental challenge of matching parts of images using Local Features (Keypoints) and explore how to represent images at multiple scales and in different color spaces to solve specific vision tasks.
1. The Frequency Domain (Fourier Transforms)
We often think of an image as a grid of intensity values indexed by spatial coordinates \((x, y)\). However, images can also be viewed as a sum of waves. We introduced the Fourier Transform (FT), a mathematical tool that decomposes a signal into its constituent frequencies.
- 1D Signals: We visualized how complex waveforms are constructed from simple sine waves.
- 2D Images: We learned to interpret the Magnitude Spectrum. We saw that “low frequencies” correspond to smooth gradients (backgrounds), while “high frequencies” correspond to edges and noise.
- Orientation: Using a checkerboard example, we demonstrated that the Fourier Transform captures not just the scale of texture, but its orientation.
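The ideas above can be sketched with NumPy's FFT routines. This is a minimal, hypothetical example (the striped image and its size are illustrative, not from the lecture): a vertical sinusoidal grating varies only along \(x\), so its spectral energy should concentrate on the horizontal axis of the shifted magnitude spectrum, at the grating's frequency.

```python
import numpy as np

# Hypothetical 64x64 test image: a vertical sinusoidal grating
# with 8 cycles across the width.
size = 64
x = np.arange(size)
row = np.sin(2 * np.pi * 8 * x / size)   # one row of the wave
image = np.tile(row, (size, 1))          # identical rows -> vertical stripes

# 2D FFT; fftshift moves the zero-frequency (DC) term to the center,
# which is the usual convention for viewing a magnitude spectrum.
spectrum = np.fft.fftshift(np.fft.fft2(image))
magnitude = np.abs(spectrum)

# The strongest peaks sit on the horizontal axis at kx = +-8 from center,
# i.e. at (32, 24) and (32, 40) for a 64x64 image.
peak = np.unravel_index(np.argmax(magnitude), magnitude.shape)
print(peak)
```

Rotating the stripes would rotate the pair of peaks in the spectrum by the same angle, which is exactly the orientation-capturing behavior the checkerboard demo illustrates.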
2. Feature Detection and Matching
To stitch panoramas or track objects, we need to find “interesting” points that are unique and stable.
- Harris Corner Detector: We implemented the Harris algorithm, which finds corners by looking for windows where intensity changes significantly in all directions (using the eigenvalues of the structure tensor).
- BRIEF Descriptor: Once a corner is found, we need to describe it. We introduced BRIEF, a binary descriptor that uses simple intensity comparisons to create a fast, compact digital fingerprint (bit-string).
- Feature Matching: We used the Brute-Force Matcher with Hamming Distance and Cross-Checking to robustly link features between two images, even when one is rotated.
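The Harris step above can be sketched in plain NumPy. This is a simplified illustration, not the class implementation: it uses a 3x3 box sum as a stand-in for the Gaussian window, and the toy image (a white square whose four vertices are the corners) is hypothetical. The response is computed with the common shortcut \(R = \det(M) - k\,\mathrm{trace}(M)^2\), which avoids an explicit eigenvalue decomposition.

```python
import numpy as np

def box3(a):
    """3x3 box sum with zero padding (a crude stand-in for Gaussian weighting)."""
    p = np.pad(a, 1)
    return sum(p[i:i + a.shape[0], j:j + a.shape[1]]
               for i in range(3) for j in range(3))

def harris(img, k=0.05):
    """Harris response R = det(M) - k * trace(M)^2 at every pixel."""
    Iy, Ix = np.gradient(img.astype(float))        # image gradients
    Sxx = box3(Ix * Ix)                            # structure tensor entries,
    Syy = box3(Iy * Iy)                            # summed over a local window
    Sxy = box3(Ix * Iy)
    det = Sxx * Syy - Sxy ** 2
    trace = Sxx + Syy
    return det - k * trace ** 2

# Toy image: a bright square on a dark background; the strongest
# responses should land at (or next to) its four vertices.
img = np.zeros((20, 20))
img[5:15, 5:15] = 1.0
R = harris(img)
corner = np.unravel_index(np.argmax(R), R.shape)
print(corner)  # near one of (5,5), (5,14), (14,5), (14,14)
```

Note how edges alone score poorly: along a straight edge one gradient direction dominates, so \(\det(M)\) stays small and \(R\) can even go negative, which is precisely why the detector prefers corners.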
3. Multi-Scale Representations (Pyramids)
Real-world objects appear at different sizes depending on their distance from the camera. To handle this, we introduced Scale Space Theory via Image Pyramids.
- Gaussian Pyramid: A sequence of progressively blurred and downsampled images used to search for objects at different scales (blurring before downsampling prevents aliasing).
- Laplacian Pyramid: Stores the difference between successive Gaussian levels (a band-pass decomposition), effectively isolating edges and details at each scale. This is widely used for seamless image blending.
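Both pyramids can be built in a few lines of NumPy. This is a rough sketch under stated simplifications: a separable [1, 2, 1]/4 binomial filter approximates the Gaussian blur, and upsampling is done by nearest-neighbor pixel repetition rather than the interpolating "expand" step used in standard implementations. The input image is random placeholder data.

```python
import numpy as np

def blur(img):
    """Separable [1, 2, 1]/4 binomial blur, a cheap Gaussian approximation."""
    p = np.pad(img, 1, mode='edge')
    h = (p[:, :-2] + 2 * p[:, 1:-1] + p[:, 2:]) / 4   # horizontal pass
    return (h[:-2, :] + 2 * h[1:-1, :] + h[2:, :]) / 4  # vertical pass

def gaussian_pyramid(img, levels):
    """Repeatedly blur then keep every second pixel in each direction."""
    pyr = [img]
    for _ in range(levels - 1):
        pyr.append(blur(pyr[-1])[::2, ::2])
    return pyr

def laplacian_pyramid(gp):
    """Each level = Gaussian level minus the upsampled next (coarser) level."""
    lap = []
    for fine, coarse in zip(gp[:-1], gp[1:]):
        up = np.repeat(np.repeat(coarse, 2, axis=0), 2, axis=1)
        up = up[:fine.shape[0], :fine.shape[1]]   # crop in case of odd sizes
        lap.append(fine - up)
    lap.append(gp[-1])   # coarsest Gaussian level closes the pyramid
    return lap

img = np.random.rand(32, 32)
gp = gaussian_pyramid(img, 4)
lp = laplacian_pyramid(gp)
print([g.shape for g in gp])  # [(32, 32), (16, 16), (8, 8), (4, 4)]
```

Because each Laplacian level is an exact difference, the original image can be recovered losslessly by upsampling and adding the levels back from coarse to fine, which is what makes the pyramid useful for blending.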
4. Color Spaces
Finally, we moved beyond the standard RGB model. We analyzed why RGB is often poor for computer vision analysis and explored alternatives:
- HSV: Separates color (Hue) from brightness (Value), making it ideal for color-based segmentation (e.g., isolating a red motorcycle).
- Lab: A perceptually uniform space where Euclidean distance between two colors approximates how different they look to a human observer.
- YCbCr: Separates Luminance from Chrominance, the basis for modern image and video compression.
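The HSV segmentation idea can be sketched as follows. This is a hand-rolled, hypothetical example: the RGB-to-HSV conversion is implemented directly from the standard piecewise formula (a library conversion would normally be used), the 2x2 test image is made up, and the hue/saturation thresholds are illustrative. Note that "red" wraps around the hue axis, so the mask must check hues near both 0 and 1.

```python
import numpy as np

def rgb_to_hsv(img):
    """Vectorized RGB (floats in [0, 1]) -> HSV with hue in [0, 1)."""
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    mx = img.max(axis=-1)
    mn = img.min(axis=-1)
    c = mx - mn                              # chroma
    hue = np.zeros_like(mx)
    mask = c > 0
    # Piecewise hue formula, depending on which channel is the maximum.
    rm = mask & (mx == r)
    gm = mask & (mx == g) & ~rm
    bm = mask & ~rm & ~gm
    hue[rm] = ((g - b)[rm] / c[rm]) % 6
    hue[gm] = (b - r)[gm] / c[gm] + 2
    hue[bm] = (r - g)[bm] / c[bm] + 4
    hue /= 6
    sat = np.where(mx > 0, c / np.where(mx > 0, mx, 1), 0)
    return np.stack([hue, sat, mx], axis=-1)

# Toy 2x2 image: red, gray, blue, green pixels (placeholder data).
img = np.array([[[0.9, 0.1, 0.1], [0.5, 0.5, 0.5]],
                [[0.1, 0.1, 0.9], [0.2, 0.8, 0.2]]])
hsv = rgb_to_hsv(img)
h, s, v = hsv[..., 0], hsv[..., 1], hsv[..., 2]

# "Red" = hue near 0 (or wrapping near 1), with enough saturation/brightness.
red_mask = ((h < 0.05) | (h > 0.95)) & (s > 0.5) & (v > 0.3)
print(red_mask)  # True only at the red pixel
```

The same threshold in RGB would be far more fragile: brightness changes move all three channels at once, whereas in HSV they affect only V, leaving the hue threshold intact.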
Interactive Demonstration
Below is the complete Jupyter Notebook used in class. It contains the Python implementations for Fourier Analysis, the Harris Corner Detector, BRIEF matching, and Color Space segmentation.