Algorithm hub

Algorithmic Space Partitioning

The core database dilemma is Exact Nearest Neighbor versus Approximate Nearest Neighbor. One protects recall. The other protects latency.

The Core Dilemma: ENN vs ANN

Exact Nearest Neighbor, often called Flat Search, compares the query vector with every vector in the database. It gives 100% perfect recall under the chosen distance metric. If the true nearest vector exists in the corpus, exact search finds it.

That certainty has a cost. Flat search is an O(N) scan over the corpus, and each comparison reads the vector dimensions. At scale, this saturates CPU and memory bandwidth. A system with millions or billions of vectors cannot treat every query as a full audit and still hit interactive latency.

Approximate Nearest Neighbor changes the contract. ANN intentionally sacrifices perfect recall to reach much lower lookup times. It prunes, routes, clusters, compresses, or walks a graph. The database tradeoff is direct: ENN gives certainty and pays with latency. ANN gives speed and pays with measured recall risk.

ANN Pros

  • Massive latency reduction compared with Flat Search.
  • Can fit larger corpora in standard RAM when paired with compression.
  • Scales search patterns toward billions of vectors.

ANN Cons

  • Recall is not deterministic across all queries.
  • Large indexes can have high build and rebuild times.
  • Requires parameter tuning, such as nprobe.

Technical FAQs

When should I force an exact Flat scan over HNSW?

Use a Flat scan when the dataset is small, recall must be perfect, or you need ground truth for evaluating an approximate index.

When does ANN become necessary?

ANN becomes necessary when the corpus is large enough that exact distance checks saturate CPU or memory bandwidth before meeting latency targets.

Can ANN return different results after tuning?

Yes. ANN recall depends on index type, graph depth, nprobe, candidate limits, compression, and reranking policy.

How should teams compare ENN and ANN quality?

Build judged query sets, run exact kNN as ground truth, then measure Recall@K, MRR, NDCG, latency, and memory cost for each ANN setting.

Is ANN acceptable for compliance or safety-critical search?

Only with guardrails. Use exact filters, metadata constraints, reranking, audits, and fallback paths when missing a neighbor has real cost.