Algorithm hub
Algorithmic Space Partitioning
The core database dilemma is Exact Nearest Neighbor versus Approximate Nearest Neighbor. One protects recall. The other protects latency.
The Core Dilemma: ENN vs ANN
Exact Nearest Neighbor, often called Flat Search, compares the query vector with every vector in the database. It gives 100% perfect recall under the chosen distance metric. If the true nearest vector exists in the corpus, exact search finds it.
That certainty has a cost. Flat search is an O(N) scan over the corpus,
and each comparison reads the vector dimensions. At scale, this saturates CPU and
memory bandwidth. A system with millions or billions of vectors cannot treat every
query as a full audit and still hit interactive latency.
Approximate Nearest Neighbor changes the contract. ANN intentionally sacrifices perfect recall to reach much lower lookup times. It prunes, routes, clusters, compresses, or walks a graph. The database tradeoff is direct: ENN gives certainty and pays with latency. ANN gives speed and pays with measured recall risk.
ANN Pros
- Massive latency reduction compared with Flat Search.
- Can fit larger corpora in standard RAM when paired with compression.
- Scales search patterns toward billions of vectors.
ANN Cons
- Recall is not deterministic across all queries.
- Large indexes can have high build and rebuild times.
- Requires parameter tuning, such as
nprobe.
Navigation grid
Choose the Algorithm Chapter
1D BST Collapse
Why scalar tree ordering breaks when vector similarity depends on distance across hundreds or thousands of dimensions.
Open chapter ->IVF Clustering
How centroid-owned cells cut the scan set, why nprobe controls recall, and where boundary misses appear.
Open chapter ->HNSW Graphs
How proximity graphs route search through neighbor links instead of scanning partitions or the full corpus.
Open chapter ->Product Quantization
How vector compression reduces memory and distance cost while adding quantization error to the search path.
Open chapter ->Technical FAQs
When should I force an exact Flat scan over HNSW?
Use a Flat scan when the dataset is small, recall must be perfect, or you need ground truth for evaluating an approximate index.
When does ANN become necessary?
ANN becomes necessary when the corpus is large enough that exact distance checks saturate CPU or memory bandwidth before meeting latency targets.
Can ANN return different results after tuning?
Yes. ANN recall depends on index type, graph depth, nprobe, candidate limits, compression, and reranking policy.
How should teams compare ENN and ANN quality?
Build judged query sets, run exact kNN as ground truth, then measure Recall@K, MRR, NDCG, latency, and memory cost for each ANN setting.
Is ANN acceptable for compliance or safety-critical search?
Only with guardrails. Use exact filters, metadata constraints, reranking, audits, and fallback paths when missing a neighbor has real cost.