Validated against 3,034 emails from the Enron investigation (allen-p custodian). Ground truth filtered by custodian for accurate precision/recall measurement.
Micro-tier benchmark on a single custodian. Recall metrics reflect single-query performance - in practice, reviewers run multiple queries per topic, and cumulative recall across a review session is significantly higher. Results will vary with corpus size and composition.
Hybrid search finds over half of all relevant documents in a single query.
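The distinction between single-query and session-level recall can be made concrete: cumulative recall is the recall of the *union* of documents returned across all queries in a session. A minimal sketch (the doc IDs and relevant set below are hypothetical):

```python
def recall(retrieved, relevant):
    """Fraction of relevant doc IDs found by one query's result set."""
    return len(set(retrieved) & set(relevant)) / len(relevant)

def cumulative_recall(query_results, relevant):
    """Recall of the union of results across multiple queries."""
    found = set()
    for retrieved in query_results:
        found |= set(retrieved)
    return len(found & set(relevant)) / len(relevant)

relevant = {1, 2, 3, 4, 5, 6, 7, 8}
q1 = [1, 2, 3, 4, 9]       # first query finds half the relevant docs
q2 = [3, 4, 5, 6, 10]      # second query overlaps but adds two more
print(recall(q1, relevant))                    # 0.5
print(cumulative_recall([q1, q2], relevant))   # 0.75
```

This is why single-query recall understates what a reviewer actually achieves over a session.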
Keyword, semantic, hybrid, RAG fusion, and full pipeline. Each optimized for different use cases.
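The document doesn't specify how the hybrid mode merges keyword and semantic results; a common technique for this is reciprocal rank fusion (RRF), sketched here as an illustration rather than the tool's actual implementation:

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal rank fusion: score each doc by the sum of
    1 / (k + rank) over every ranked list it appears in (rank is 1-based),
    then sort by that score. k=60 is the value from the original RRF paper."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword  = ["d3", "d1", "d7"]   # e.g. a BM25-style keyword ranking
semantic = ["d1", "d5", "d3"]   # e.g. an embedding-similarity ranking
print(rrf_fuse([keyword, semantic]))  # d1 and d3 rise to the top
```

RRF needs only ranks, not comparable scores, which is what makes it attractive for fusing lexical and vector results.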
From raw email upload to fully searchable, triaged corpus with embeddings and AI relevance scores.
Embedding + LLM triage costs. Semantic and hybrid search modes are free after initial embedding.
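Embedding cost scales with token volume, so it can be estimated up front. The rate below is an assumption (OpenAI has priced text-embedding-3-small at $0.02 per 1M tokens; verify against current pricing), and the 500-tokens-per-email figure is purely illustrative:

```python
# Illustrative cost math, not the tool's billing logic.
# Assumed rate for text-embedding-3-small; check current OpenAI pricing.
EMBED_RATE_PER_1M_TOKENS = 0.02

def embedding_cost(num_docs, avg_tokens_per_doc,
                   rate=EMBED_RATE_PER_1M_TOKENS):
    """One-time cost to embed a corpus, in dollars."""
    return num_docs * avg_tokens_per_doc / 1_000_000 * rate

# 3,034 emails at a hypothetical 500 tokens each:
print(f"${embedding_cost(3034, 500):.4f}")  # $0.0303
```

After this one-time spend, semantic and hybrid queries only read the stored vectors, which is why they incur no further API cost.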
4 parallel workers on a Hetzner AX41 bare-metal server (64GB RAM)
| Stage | Duration | Documents | Details |
|---|---|---|---|
| Ingest | 2.9s | 3,034 | 1,054 docs/sec throughput |
| Process | 10m 4s | 3,034 | Text extraction, metadata, MIME detection |
| Embed | 7m 54s | 3,034 | OpenAI text-embedding-3-small (1536 dims) |
| Triage | ~2m | 3,034 | AI relevance scoring with citations |
| Total | ~18 min | 3,034 | End-to-end with 4 parallel workers |
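Once the Embed stage has run, semantic search reduces to a nearest-neighbor lookup over the stored vectors. A minimal sketch of that lookup, with toy 3-dimensional vectors standing in for the real 1536-dimensional text-embedding-3-small output:

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def semantic_search(query_vec, index, top_k=5):
    """Rank stored (doc_id, vector) pairs by similarity to the query."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in index]
    return sorted(scored, key=lambda t: t[1], reverse=True)[:top_k]

# Hypothetical index entries; real vectors would have 1536 components.
index = [("email-1", [1.0, 0.0, 0.0]),
         ("email-2", [0.7, 0.7, 0.0])]
print(semantic_search([1.0, 0.1, 0.0], index, top_k=1))
```

A brute-force scan like this is fine at micro-tier scale; larger corpora typically sit behind an approximate nearest-neighbor index instead.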
Micro-tier benchmark - ground truth: custodian-filtered Enron dataset
```
cli bench run --tier micro
```

The demo matter contains the exact same Enron dataset. Search, triage, and export it yourself.