Local Outlier Factor

Best for: Density-based outliers

Aliases: LOF

How it works

$$\text{LOF}_k(x)=\frac{1}{|N_k(x)|}\sum_{o\in N_k(x)}\frac{\text{lrd}(o)}{\text{lrd}(x)}$$

LOF compares the local density of a point $x$ with that of its $k$ nearest neighbours $N_k(x)$, using the ratio above. The local reachability density is the inverse of the mean reachability distance from $x$ to its neighbours, $\text{lrd}(x)=\left(\frac{1}{|N_k(x)|}\sum_{o\in N_k(x)}\text{reach-dist}_k(x,o)\right)^{-1}$, where $\text{reach-dist}_k(x,o)=\max(d_k(o),\|o-x\|)$ and $d_k(o)$ is the distance from $o$ to its $k$-th nearest neighbour. Taking the maximum with the $k$-distance smooths out fluctuations in the distances between very close points. Inliers sit at $\text{LOF}\approx 1$; a point whose neighbourhood is much sparser than those of its neighbours has $\text{LOF}\gg 1$ and is flagged as an outlier.
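The definitions above can be traced step by step in a few lines of Python. This is a minimal brute-force sketch, not a reference implementation: the helper name `lof_scores` and the toy data are made up for illustration, distances are Euclidean, and the points are assumed to be distinct (duplicates would make a reachability distance zero).

```python
import math

def lof_scores(points, k):
    """Return the LOF_k score of every point (brute force, Euclidean distance)."""
    n = len(points)
    # Pairwise Euclidean distances.
    dist = [[math.dist(p, q) for q in points] for p in points]

    k_dist = []      # d_k(x): distance from x to its k-th nearest neighbour
    neighbours = []  # N_k(x): indices of all points within d_k(x), ties included
    for i in range(n):
        others = sorted((dist[i][j], j) for j in range(n) if j != i)
        d_k = others[k - 1][0]
        k_dist.append(d_k)
        neighbours.append([j for d, j in others if d <= d_k])

    # lrd(x) = 1 / mean over o in N_k(x) of max(d_k(o), ||o - x||)
    lrd = []
    for i in range(n):
        reach = [max(k_dist[o], dist[i][o]) for o in neighbours[i]]
        lrd.append(len(reach) / sum(reach))

    # LOF_k(x) = mean over o in N_k(x) of lrd(o) / lrd(x)
    return [
        sum(lrd[o] for o in neighbours[i]) / (len(neighbours[i]) * lrd[i])
        for i in range(n)
    ]

# Toy data (made up): a tight 3x3 grid plus one distant point, listed last.
grid = [(float(x), float(y)) for x in range(3) for y in range(3)]
data = grid + [(10.0, 10.0)]
scores = lof_scores(data, k=3)

for point, score in zip(data, scores):
    print(point, round(score, 2))
```

On this data the grid points score close to 1, while the isolated point scores well above 1. The brute-force distance matrix costs quadratic time and memory, so the sketch is only suitable for small inputs.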

When to use

Use when anomalies are defined relative to their local neighbourhood rather than to the dataset as a whole, for example when the data contain clusters of different densities.

Watch out

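Can struggle where density changes sharply within a single neighbourhood (for example at cluster borders); scores are sensitive to n_neighbors, since too small a value lets small clumps of anomalies pass as inliers and too large a value blurs local structure; scores are not calibrated across datasets, so a default threshold can be unreliable.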
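The effect of n_neighbors can be illustrated on toy data. The sketch below is an assumption-laden illustration rather than a library API: `lof_scores` is a hypothetical brute-force helper using Euclidean distance, and the data (a 5x5 grid plus a clump of three distant points) are made up. With a neighbourhood smaller than the clump, the three anomalies only see each other and look normal; with a larger neighbourhood they are compared against the grid and stand out.

```python
import math

def lof_scores(points, n_neighbors):
    """Brute-force LOF scores (hypothetical helper, Euclidean distance)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    k_dist, neighbours = [], []
    for i in range(n):
        others = sorted((dist[i][j], j) for j in range(n) if j != i)
        d_k = others[n_neighbors - 1][0]          # k-distance of point i
        k_dist.append(d_k)
        neighbours.append([j for d, j in others if d <= d_k])
    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_dist[o], dist[i][o]) for o in neighbours[i]]
        lrd.append(len(reach) / sum(reach))
    # LOF: mean ratio of the neighbours' lrd to the point's own lrd.
    return [
        sum(lrd[o] for o in neighbours[i]) / (len(neighbours[i]) * lrd[i])
        for i in range(n)
    ]

# Made-up data: a dense 5x5 grid and a clump of three far-away points (last).
grid = [(float(x), float(y)) for x in range(5) for y in range(5)]
clump = [(20.0, 20.0), (20.5, 20.0), (20.0, 20.5)]
data = grid + clump

small_k = lof_scores(data, n_neighbors=2)[-3:]  # neighbourhood inside the clump
large_k = lof_scores(data, n_neighbors=5)[-3:]  # neighbourhood reaches the grid

print("n_neighbors=2:", [round(s, 2) for s in small_k])
print("n_neighbors=5:", [round(s, 2) for s in large_k])
```

A common rule of thumb that follows from this is to choose n_neighbors larger than the biggest group of points that should still count as anomalous, then inspect the score distribution before fixing a threshold.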

Common fields

Sensor data · geospatial data · fraud