Chapter 10

Computer Vision

Classify, detect, segment, and generate visual content.

Computer vision lets machines interpret visual content: classifying, detecting, and segmenting images, and generating new ones. The methods below range from CNNs and Vision Transformers through real-time detectors such as YOLO to generative diffusion models, with classical feature-based pipelines still in use for smaller or legacy systems.

  • Use Vision Transformers for large-scale, modern image and multimodal tasks.
  • Use CNNs as strong general-purpose baselines and YOLO for real-time detection.
| # | Algorithm | Best for | Common fields |
|---|-----------|----------|---------------|
| 1 | CNNs | Image classification and feature extraction | Medical imaging, manufacturing, retail |
| 2 | Vision Transformers | Modern image classification and multimodal models | Research, large-scale vision, document AI |
| 3 | YOLO-style Detectors | Real-time object detection | Surveillance, robotics, retail, autonomous systems |
| 4 | R-CNN / Faster R-CNN / Mask R-CNN | Object detection and segmentation | Medical imaging, autonomous driving |
| 5 | U-Net | Pixel-level segmentation | Medical imaging, satellite imagery |
| 6 | Diffusion Models | Image generation/editing | Design, media, advertising |
| 7 | Classical CV + ML: SIFT, HOG, SVM | Smaller/legacy vision systems | Industrial inspection, embedded systems |
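
To make the "feature extraction" entry for CNNs concrete, here is a minimal sketch of the sliding-window operation a convolutional layer applies, written in plain Python with no dependencies. The function name `conv2d_valid`, the toy image, and the hand-picked edge kernel are all illustrative assumptions. A real CNN learns its kernel values from data and stacks many such filters with nonlinearities. Strictly speaking, the operation shown is cross-correlation (the kernel is not flipped), which is what deep-learning frameworks conventionally call convolution.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` with stride 1 and no padding.

    Both arguments are rectangular lists of lists of numbers.
    Returns the response map as a list of lists.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Weighted sum of the image patch under the kernel
            acc = 0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        out.append(row)
    return out


# Toy 4x4 image (assumed for illustration): dark left half, bright right half
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hand-picked vertical-edge filter (a trained CNN would learn these weights)
kernel = [
    [-1, 1],
    [-1, 1],
]

response = conv2d_valid(image, kernel)
for row in response:
    print(row)
# [0, 2, 0]
# [0, 2, 0]
# [0, 2, 0]
```

The response is nonzero only in the middle column, where the window straddles the dark-to-bright boundary. That is the sense in which a filter "extracts" a feature: it responds strongly where the image matches its pattern and stays near zero elsewhere.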