Algorithm hub

Algorithmic Space Partitioning

The core database dilemma is Exact Nearest Neighbor versus Approximate Nearest Neighbor. One protects recall. The other protects latency.

The Core Dilemma: ENN vs ANN

Exact Nearest Neighbor, often called Flat Search, compares the query vector with every vector in the database. It gives 100% perfect recall under the chosen distance metric. If the true nearest vector exists in the corpus, exact search finds it.

That certainty has a cost. Flat search is an O(N) scan over the corpus, and each comparison reads the vector dimensions. At scale, this saturates CPU and memory bandwidth. A system with millions or billions of vectors cannot treat every query as a full audit and still hit interactive latency.

Approximate Nearest Neighbor changes the contract. ANN intentionally sacrifices perfect recall to reach much lower lookup times. It prunes, routes, clusters, compresses, or walks a graph. The database tradeoff is direct: ENN gives certainty and pays with latency. ANN gives speed and pays with measured recall risk.

ANN Pros

Massive latency reduction compared with Flat Search.
Can fit larger corpora in standard RAM when paired with compression.
Scales search patterns toward billions of vectors.

ANN Cons

Recall is not deterministic across all queries.
Large indexes can have high build and rebuild times.
Requires parameter tuning, such as nprobe.

Navigation grid

Choose the Algorithm Chapter

1D BST Collapse

Why scalar tree ordering breaks when vector similarity depends on distance across hundreds or thousands of dimensions.

Open chapter ->

IVF Clustering

How centroid-owned cells cut the scan set, why nprobe controls recall, and where boundary misses appear.

Open chapter ->

HNSW Graphs

How proximity graphs route search through neighbor links instead of scanning partitions or the full corpus.

Open chapter ->

Product Quantization

How vector compression reduces memory and distance cost while adding quantization error to the search path.

Open chapter ->

Technical FAQs

When should I force an exact Flat scan over HNSW?

Use a Flat scan when the dataset is small, recall must be perfect, or you need ground truth for evaluating an approximate index.

When does ANN become necessary?

ANN becomes necessary when the corpus is large enough that exact distance checks saturate CPU or memory bandwidth before meeting latency targets.

Can ANN return different results after tuning?

Yes. ANN recall depends on index type, graph depth, nprobe, candidate limits, compression, and reranking policy.

How should teams compare ENN and ANN quality?

Build judged query sets, run exact kNN as ground truth, then measure Recall@K, MRR, NDCG, latency, and memory cost for each ANN setting.

Is ANN acceptable for compliance or safety-critical search?

Only with guardrails. Use exact filters, metadata constraints, reranking, audits, and fallback paths when missing a neighbor has real cost.