Computer Vision
Classify, detect, segment, and generate visual content.
Computer vision enables machines to interpret visual content: classifying, detecting, and segmenting images, and generating new ones. The methods below range from CNNs and Vision Transformers through real-time detectors like YOLO to generative diffusion models.
- Use Vision Transformers for large-scale, modern image and multimodal tasks.
- Use CNNs as strong general-purpose baselines and YOLO for real-time detection.
| # | Algorithm | Best for | Common fields |
|---|---|---|---|
| 1 | CNNs | Image classification and feature extraction | |
| 2 | Vision Transformers | Modern image classification and multimodal models | |
| 3 | YOLO-style Detectors | Real-time object detection | |
| 4 | R-CNN / Faster R-CNN / Mask R-CNN | Object detection and segmentation | |
| 5 | U-Net | Pixel-level segmentation | |
| 6 | Diffusion Models | Image generation/editing | |
| 7 | Classical CV + ML: SIFT, HOG, SVM | Smaller/legacy vision systems | |
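
The detection rows in the table (YOLO-style detectors and the R-CNN family) commonly rely on two small building blocks: intersection over union (IoU) to measure how much two boxes overlap, and non-maximum suppression (NMS) to drop duplicate detections of the same object. The sketch below is a minimal pure-Python illustration, not the implementation used by any particular detector. The names `Box`, `iou`, and `nms`, the `(x1, y1, x2, y2)` box layout, and the 0.5 threshold are all assumptions made for this example.

```python
from typing import List, Tuple

# Assumed box layout for this sketch: (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle (if any).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Clamp to zero so disjoint boxes give an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float], iou_threshold: float = 0.5) -> List[int]:
    """Greedy non-maximum suppression; returns indices of the boxes kept."""
    # Visit boxes from highest to lowest confidence score.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    scores = [0.9, 0.8, 0.7]
    print(nms(boxes, scores))  # [0, 2]
```

In the usage example, the second box overlaps the first with an IoU of 81/119 (about 0.68), which exceeds the threshold, so only the higher-scoring first box and the distant third box survive. Production code would normally use a vectorized library routine rather than this quadratic loop.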