R-CNN / Faster R-CNN / Mask R-CNN

Best for: Object detection and segmentation

How it works

$$\text{score}(r)=\text{softmax}\bigl(W\cdot\text{RoIPool}(\phi(x),r)\bigr)$$

Two-stage detectors. A Region Proposal Network (RPN) first proposes candidate boxes $r$ from shared convolutional features $\phi(x)$; features inside each box are cropped and resized by RoIPool/RoIAlign and passed to classification and box-regression heads $\text{score}(r)=\text{softmax}\bigl(W\cdot\text{RoIPool}(\phi(x),r)\bigr)$. Mask R-CNN adds a small per-class segmentation head that predicts a pixel mask for each detected instance. The two-stage design trades speed for higher accuracy on small or dense objects.

Common fields

Medical imaging · autonomous driving