Chapter 10

Computer Vision

Classify, detect, segment, and generate visual content.

Computer vision enables machines to interpret visual content: classifying, detecting, segmenting, and generating images. The methods below range from convolutional neural networks (CNNs) and Vision Transformers, through real-time detectors such as YOLO, to generative diffusion models, with classical feature-based pipelines covering smaller or legacy systems.

Use Vision Transformers for large-scale, modern image and multimodal tasks.
Use CNNs as strong general-purpose baselines and YOLO for real-time detection.

#	Algorithm	Best for	Common fields
1	CNNs	Image classification and feature extraction	Medical imaging, manufacturing, retail
2	Vision Transformers	Modern image classification and multimodal models	Research, large-scale vision, document AI
3	YOLO-style Detectors	Real-time object detection	Surveillance, robotics, retail, autonomous systems
4	R-CNN / Faster R-CNN / Mask R-CNN	Object detection and instance segmentation	Medical imaging, autonomous driving
5	U-Net	Pixel-level segmentation	Medical imaging, satellite imagery
6	Diffusion Models	Image generation and editing	Design, media, advertising
7	Classical CV + ML: SIFT, HOG, SVM	Smaller or legacy vision systems	Industrial inspection, embedded systems
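The table lists feature extraction as a core CNN use. The building block behind it is the convolution layer: a small filter slides over the image and takes a dot product with each patch, producing a feature map. The sketch below is a minimal pure-Python illustration, assuming a single-channel image stored as nested lists and a hand-picked vertical-edge filter. In a trained CNN the filter weights are learned, and real layers also add many channels, a bias, stride, and padding. The helper names `conv2d_valid` and `relu` are hypothetical, not part of any library.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """Slide `kernel` over `image` with stride 1 and no padding ("valid" mode).

    Like most deep-learning libraries, this computes cross-correlation:
    the kernel is not flipped before sliding.
    """
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = ih - kh + 1, iw - kw + 1
    out: Matrix = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # Dot product between the kernel and the image patch under it.
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    acc += image[r + i][c + j] * kernel[i][j]
            row.append(acc)
        out.append(row)
    return out


def relu(fmap: Matrix) -> Matrix:
    """Element-wise ReLU, a common nonlinearity applied after a conv layer."""
    return [[max(0.0, v) for v in row] for row in fmap]


# Hypothetical 4x4 image: dark left half, bright right half.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hand-picked vertical-edge filter; a trained CNN would learn its own weights.
kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = relu(conv2d_valid(image, kernel))
print(feature_map)  # [[0.0, 2.0, 0.0], [0.0, 2.0, 0.0], [0.0, 2.0, 0.0]]
```

The feature map responds only in the column where dark pixels meet bright ones, which is the sense in which a filter "detects" a local pattern. A CNN stacks many such learned filters, with nonlinearities and usually pooling in between, so later layers respond to increasingly complex patterns.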