The hybrid RAG engine that answers right, fast, at any scale.
Vector + full-text + knowledge-graph retrieval on a 100% Rust stack. Every number below is measured on the live platform — replayable benchmark reports included.
Proven on real-world corpora — not just public QA.
Every category below was ingested and evaluated end-to-end on the live platform.
MMLongBench & multi-hop QA — fact-checked independently, zero hallucination.
Fund prospectuses, fees, ISIN codes, vintages — exact figures, exhaustive lists.
Civil/insurance codes (FR + DZ), 20,000-line manuals — article-anchored, amendment-aware.
After-sales & support documents — procedures, warranties, ticket knowledge bases.
Full product catalogues (100 MB+), batched OCR + image vision search — specs and product images.
First-class Arabic (RTL, broken-plural recall) alongside French and English — same precision.
p50 167 ms under 50 concurrent queries · ×3 faster than a Python stack · 66 unit tests green · TTFB 9 ms
Why UltraRAG
Built for precision. Engineered for load.
Vectors (Qdrant) + full-text (PostgreSQL) + dedicated entity channel + knowledge graph (Neo4j). Deterministic RRF fusion, BM25 rerank, entity boost.
→ RRF fusion + entity boost + MMR diversity
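As a rough illustration of the fusion step, here is a minimal sketch of Reciprocal Rank Fusion over four ranked channels. The function name `rrf_fuse`, the constant `k=60` (a commonly used default) and the document ids are assumptions for illustration, not UltraRAG's actual code or settings.

```python
from collections import defaultdict

def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: score(d) = sum over channels of 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranked_ids in rankings.values():
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by score (descending), then by id, so ties always break the same way.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical per-channel rankings, best hit first.
channels = {
    "vector": ["d3", "d1", "d7"],
    "fulltext": ["d1", "d9", "d3"],
    "entity": ["d1", "d4"],
    "graph": ["d7", "d1"],
}
fused = rrf_fuse(channels)
top_ids = [doc_id for doc_id, _ in fused]
```

Because only ranks are used, the channels do not need comparable score scales, and the explicit tie-break keeps the output order stable from run to run.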
Domain-tuned · Multilingual
Tuned for 5 domains — and tunable for yours
“Tuned” is not a setting: each profile is a set of specialized retrieval algorithms that change how the engine ranks, anchors and traces evidence for that kind of corpus. Pick a profile and the pipeline re-optimizes itself. Need a new vertical (medical, insurance, after-sales…)? We tune a dedicated profile for it.
Legal
Article anchoring, amendment & temporal tracing (created → modified → abrogated), in-force-version bias.
Enterprise docs
Entity & knowledge-graph emphasis — companies, people, funds, relationships always consulted.
Timeline
Chronology-aware retrieval, "as it stands now" bias, version history across document updates.
Catalogue
Product corpora — exact-match boost + reranking; batched OCR for 100 MB+ catalogues; product images via vision search.
Generic
Balanced hybrid defaults — the query planner adapts the strategy per question.
Languages — full retrieval + answer quality
Arabic is first-class: correct stemming & normalization (hamza, tashkeel), a dedicated morphology channel for broken plurals, right-to-left rendering, and the same node & temporal tracing as French. Postgres FTS covers 8+ more analyzers (English, Spanish, German, Italian, Portuguese, Russian…).
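To make the normalization step concrete, here is a minimal sketch of one common approach: strip tashkeel and fold hamza-carrying alef forms to bare alef before indexing. The function name and the exact character set are assumptions; the engine's real rules may differ.

```python
import re

# Tashkeel (harakat) U+064B..U+0652, superscript alef U+0670, tatweel U+0640.
_DIACRITICS = re.compile("[\u064B-\u0652\u0670\u0640]")

# Alef with hamza above/below or madda, folded to bare alef (U+0627).
_ALEF_VARIANTS = str.maketrans({
    "\u0623": "\u0627",
    "\u0625": "\u0627",
    "\u0622": "\u0627",
})

def normalize_arabic(text):
    """Remove diacritics and unify alef forms so spelling variants match."""
    return _DIACRITICS.sub("", text).translate(_ALEF_VARIANTS)

vocalized = "\u0643\u0650\u062A\u064E\u0627\u0628"  # "kitab" written with vowel marks
plain = normalize_arabic(vocalized)                 # same word without the marks
```

Applying the same function to both the indexed text and the query is what lets a vocalized query match an unvocalized document.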
Per-workspace control
Define each corpus — no code, no re-deploy
Every workspace is an isolated corpus with its own parameters, config and metadata. Set them from the console or one API call. All optional — leave blank for sensible defaults. Domain knowledge lives in the config, never hardcoded in the engine.
- ✓ Theme profile — the retrieval algorithm preset
- ✓ FTS language — the corpus analyzer (incl. Arabic)
- ✓ Predefined entities — domain terms you know matter
- ✓ Summary hint — for count / statistics questions
- ✓ Isolated, JWT-scoped per organization & client
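A workspace definition along these lines could be modelled as a small config object. The sketch below is illustrative only: the field names, the default values and the JSON shape are assumptions, not the documented API.

```python
import json
from dataclasses import asdict, dataclass, field

@dataclass
class WorkspaceConfig:
    # Every field is optional; the defaults here are placeholders.
    theme_profile: str = "generic"
    fts_language: str = "english"
    predefined_entities: list = field(default_factory=list)
    summary_hint: str = ""

# A hypothetical French legal corpus: only the fields that differ are set.
legal_fr = WorkspaceConfig(
    theme_profile="legal",
    fts_language="french",
    predefined_entities=["Code civil"],
)

# Body that a single configuration call might carry (shape assumed).
payload = json.dumps(asdict(legal_fr), ensure_ascii=False)
```

Keeping domain terms in a payload like this, rather than in engine code, is what allows a corpus to be redefined without a re-deploy.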
What it does
An answer engine, not a search box
Hybrid 4-channel retrieval
Vector + full-text BM25 + typed entity channel + knowledge graph, fused by deterministic RRF — recall a single method misses.
Explainable answers
Every answer is sourced, with clickable inline [N] citations that open the exact PDF page or highlighted section. A live graph trace shows which entities answered.
Temporal & amendment tracing
Tracks a fact across versions — created, modified, abrogated — and answers "as it stood in 1985" vs "in force today".
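One way to picture "as it stood" versus "in force today" is a lookup over dated versions of the same article. This is a minimal sketch under assumed names (`ArticleVersion`, `version_as_of`) and invented dates, not the engine's data model.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class ArticleVersion:
    text: str
    in_force_from: date
    abrogated_on: Optional[date] = None  # None means still in force

def version_as_of(versions, when):
    """Return the version in force on `when`, or None if there was none."""
    live = [
        v for v in versions
        if v.in_force_from <= when
        and (v.abrogated_on is None or when < v.abrogated_on)
    ]
    return max(live, key=lambda v: v.in_force_from, default=None)

# Hypothetical history: created in 1975, replaced by an amendment in 1990.
history = [
    ArticleVersion("original wording", date(1975, 1, 1), date(1990, 6, 1)),
    ArticleVersion("amended wording", date(1990, 6, 1)),
]
in_1985 = version_as_of(history, date(1985, 1, 1))
today = version_as_of(history, date.today())
```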
Verified abstention
The engine decomposes its answer into claims, scores each against the sources, and says "insufficient evidence" rather than invent.
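The control flow can be sketched as follows. Token overlap stands in for the real claim scorer, which is not described here; the threshold and function names are likewise assumptions.

```python
def support_score(claim, sources):
    """Fraction of the claim's tokens found in the best-matching source.

    A deliberately crude stand-in for a real entailment or grounding scorer.
    """
    claim_tokens = set(claim.lower().split())
    if not claim_tokens:
        return 0.0
    return max(
        (len(claim_tokens & set(s.lower().split())) / len(claim_tokens)
         for s in sources),
        default=0.0,
    )

def answer_or_abstain(claims, sources, threshold=0.8):
    """Answer only if every claim clears the threshold; otherwise abstain."""
    if claims and all(support_score(c, sources) >= threshold for c in claims):
        return " ".join(claims)
    return "insufficient evidence"

# Invented example passage and claims.
sources = ["the management fee is 1.5% per year"]
supported = answer_or_abstain(["the management fee is 1.5%"], sources)
unsupported = answer_or_abstain(["the entry fee is 3%"], sources)
```

The point of the design is that a single weak claim is enough to withhold the answer.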
Vision search — real images
Ask for “the shoe cabinet with a continuous front” and get the product photo itself, linked from the page that describes it.
Interactive reading (/read)
Document on the left, chat on the right: the passages used light up live while the answer streams; click a citation to jump to the section.
Big-document ingestion
100 MB+ / 1,500-page catalogues via batched OCR; PDF, DOCX, PPTX, XLSX, images. Idempotent re-ingest by content hash.
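Idempotent re-ingestion by content hash can be illustrated with a few lines. The sketch below assumes SHA-256 and an in-memory registry named `IngestRegistry`; the actual hash function and storage are not specified in this page.

```python
import hashlib

class IngestRegistry:
    """Remembers content digests so identical bytes are processed only once."""

    def __init__(self):
        self._seen = {}  # digest -> first filename seen with that content

    def ingest(self, filename, content):
        digest = hashlib.sha256(content).hexdigest()
        if digest in self._seen:
            return digest, False  # already ingested: skip OCR and embedding
        self._seen[digest] = filename
        return digest, True

registry = IngestRegistry()
first = registry.ingest("catalogue.pdf", b"%PDF-1.7 example bytes")
again = registry.ingest("catalogue-copy.pdf", b"%PDF-1.7 example bytes")
changed = registry.ingest("catalogue.pdf", b"%PDF-1.7 revised bytes")
```

Keying on content rather than filename means a renamed copy is skipped while a revised file under the same name is picked up.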
Cost transparency
Per-query cost in dollars (embedding + LLM in/out), priced automatically from the configured model — plus a simulator to estimate a corpus before ingesting.
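The arithmetic behind a per-query cost is simple to sketch. The prices below are placeholders chosen for the example, not real price-list values, and the function name is hypothetical.

```python
from decimal import Decimal

# Placeholder prices in dollars per million tokens (not real provider prices).
PRICES = {
    "embedding": Decimal("0.10"),
    "llm_in": Decimal("2.00"),
    "llm_out": Decimal("6.00"),
}

def query_cost(embed_tokens, llm_in_tokens, llm_out_tokens, prices=PRICES):
    """Cost = sum of tokens x price per token, for each billed stage."""
    total = (
        embed_tokens * prices["embedding"]
        + llm_in_tokens * prices["llm_in"]
        + llm_out_tokens * prices["llm_out"]
    )
    return total / Decimal(1_000_000)

# 40 query tokens embedded, 3,000 prompt tokens, 400 answer tokens.
cost = query_cost(40, 3000, 400)
```

`Decimal` is used so that small per-query amounts add up exactly when aggregated over many queries.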
Multilingual — incl. Arabic
French, English and Arabic fully tuned (RTL, morphology), plus 8+ full-text analyzers. Same quality across languages.
Domain profiles
Legal, enterprise, timeline and catalogue presets re-optimize the algorithm per corpus — and we tune new verticals on request.
Resilient ingestion
Crash-safe job queue, orphan recovery, heartbeats, bounded retries with backoff, circuit breaker, and true cancellation of a running job.
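Bounded retries with backoff might look like the sketch below. The retry count, base delay and cap are assumed values, and jitter is left out to keep the example deterministic.

```python
import time

def backoff_delays(max_retries, base=0.5, cap=30.0):
    """Exponential schedule: base, 2*base, 4*base, ... capped at `cap` seconds."""
    return [min(cap, base * 2 ** attempt) for attempt in range(max_retries)]

def run_with_retries(job, max_retries=4, base=0.5, cap=30.0, sleep=time.sleep):
    """Run `job`; on failure wait and retry, at most `max_retries` times."""
    delays = backoff_delays(max_retries, base, cap)
    for attempt in range(max_retries + 1):
        try:
            return job()
        except Exception:
            if attempt == max_retries:
                raise  # retries exhausted: surface the error to the queue
            sleep(delays[attempt])

# A hypothetical job that fails twice before succeeding.
calls = {"n": 0}

def flaky_job():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "done"

slept = []  # record the waits instead of actually sleeping
result = run_with_retries(flaky_job, sleep=slept.append)
```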
Any model, every stage
LLM, embeddings, reranker, OCR and vision are each swappable — OpenAI, Mistral, Gemini, or fully local via Ollama/vLLM. One env file.
Under the hood · World-class
The S+ engine — the techniques the best RAG systems are built on
Every advanced retrieval method that defines the state of the art is built in — each one optional, each one measured. Turn on what a corpus needs; the benchmarked path stays untouched until you do.
Late-interaction reranking
Token-level MaxSim (ColBERT) — the precision technique behind the leading commercial engines, as an optional local sidecar.
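For readers unfamiliar with MaxSim, here is a minimal pure-Python sketch of the scoring rule: each query token takes its best match among the document's tokens, and those maxima are summed. The toy 2-d vectors are invented; a real sidecar would use learned token embeddings and batched tensor operations.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def maxsim(query_vecs, doc_vecs):
    """Late-interaction score: sum over query tokens of the max token similarity."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)

# Toy token embeddings (hypothetical): two query tokens, two candidate documents.
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0]]  # covers both query tokens
doc_b = [[1.0, 0.0], [1.0, 0.0]]  # covers only the first

score_a = maxsim(query, doc_a)
score_b = maxsim(query, doc_b)
reranked = sorted(
    [("doc_a", score_a), ("doc_b", score_b)],
    key=lambda pair: -pair[1],
)
```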
Graph-native retrieval
Entity-anchored multi-hop (local) and community-scale synthesis (global) over the knowledge graph — relational answers vectors miss.
Hierarchical summaries
RAPTOR-style section summaries indexed beside the leaves: synthesis questions read the overview, detail questions read the source.
Self-correcting retrieval
Self-RAG-style recovery: when evidence is weak the engine reformulates and retries once before it answers — instead of guessing.
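The retry-once behaviour can be sketched as a small wrapper around a retriever. The callables, the score threshold and the sample index below are all hypothetical stand-ins.

```python
def retrieve_with_one_retry(question, retrieve, reformulate, min_score=0.5):
    """Retrieve; if the best hit is weak, reformulate and retry exactly once."""
    hits = retrieve(question)
    if hits and max(score for _, score in hits) >= min_score:
        return question, hits
    retry_question = reformulate(question)
    # Whatever comes back is final; a later stage may still abstain.
    return retry_question, retrieve(retry_question)

# Invented index: the terse query finds nothing, the expanded one does.
index = {
    "art 41 scope": [],
    "article 41 scope of application": [("doc-12", 0.91)],
}

def fake_retrieve(q):
    return index.get(q, [])

def fake_reformulate(q):
    return "article 41 scope of application"

used_question, hits = retrieve_with_one_retry(
    "art 41 scope", fake_retrieve, fake_reformulate
)
```

Capping the loop at a single retry bounds the extra latency a weak first pass can add.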
Agentic decomposition
Multi-part and comparative questions are split into sub-questions, each retrieved on its own, then merged into one cited answer.
Structured extraction
A JSON schema in, exact field values out — each with the source passage it came from, and an honest “not found” when it isn’t there.
Built for high-stakes domains
Where precision is non-negotiable
Legal & regulatory
Codes, contracts, jurisprudence. Article anchoring, amendment history, "in-force" reasoning — answers a lawyer can cite.
Finance
Funds, fees, ISINs, counterparty and portfolio Q&A. Exhaustive lists, deterministic counts, zero hallucination.
Enterprise knowledge
Policies, HR, procurement, board minutes across thousands of documents — with entities, relations and cross-references.
Catalogues & after-sales
Full product catalogues (100 MB+) and support documents — specs, warranties, procedures, and the product images themselves.
Public sector
Sovereign, on-premise, multilingual. Citizen and agent assistance behind your firewall, with full auditability.
Ingestion Pipeline
Universal ingestion
PDF, DOCX, PPTX, XLSX, CSV, HTML, JSON, JSONL (one document per line) and Markdown. Section-aware chunking, concurrent embedding batches (×4 throughput), LLM entity extraction with rule-based fallback.
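The "one document per line" convention for JSONL is easy to show. This sketch uses only the standard library; the field names in the sample are invented.

```python
import json

def parse_jsonl(text):
    """Parse JSONL: every non-blank line is one standalone JSON document."""
    docs = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines between records
        try:
            docs.append(json.loads(line))
        except json.JSONDecodeError as err:
            raise ValueError(f"line {line_no}: {err.msg}") from err
    return docs

sample = (
    '{"id": "t-1", "title": "Warranty claim", "body": "Procedure ..."}\n'
    "\n"
    '{"id": "t-2", "title": "Return policy", "body": "Conditions ..."}\n'
)
documents = parse_jsonl(sample)
```

Reporting the failing line number matters for large exports, where one malformed record should be easy to locate.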
Multi-tenant · Conversational
Conversational & multi-tenant
Persistent conversations, automatic follow-up condensation, per-conversation memory, isolated workspaces per organization and client.
- ✓ Conversations persisted per user
- ✓ Automatic condensation of follow-up questions
- ✓ Isolated workspaces per organization & client
- ✓ JWT auth · token-by-token SSE streaming
- ✓ API OpenAI-compatible
Article 41 provides that…
265 ms · 3 sources · score 0.94
Providers — swap with one env var
LLM_PROVIDER=mistral → zero code changes
No vendor lock-in
LLM, embeddings, NER and reranker are each independently swappable: OpenAI, Mistral, DeepSeek, Qwen, Kimi, Cohere, Jina, Voyage, HuggingFace — or fully local with Ollama/TEI. One env file, zero code changes.
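Swapping a provider through one environment variable usually comes down to a registry lookup. In the sketch below only the variable name `LLM_PROVIDER` comes from this page; the client classes, the registry and the default value are hypothetical.

```python
import os

class OpenAIClient:
    name = "openai"

class MistralClient:
    name = "mistral"

class OllamaClient:
    name = "ollama"

# Hypothetical registry: adding a provider means adding one entry here.
REGISTRY = {
    "openai": OpenAIClient,
    "mistral": MistralClient,
    "ollama": OllamaClient,
}

def select_llm(env=None):
    """Pick the client class named by LLM_PROVIDER (default assumed: openai)."""
    env = os.environ if env is None else env
    key = env.get("LLM_PROVIDER", "openai").strip().lower()
    try:
        return REGISTRY[key]()
    except KeyError:
        raise ValueError(f"unknown LLM_PROVIDER: {key!r}") from None

client = select_llm({"LLM_PROVIDER": "mistral"})
```

Calling code depends only on the returned client, so changing the variable changes the backend without touching application code.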
White-label · Sovereign · No lock-in
Your brand. Your infrastructure. Your data stays home.
UltraRAG ships as a platform you operate, not a service you depend on. Deploy it wherever your data lives, put your own brand on it, and swap any model with one env var. Nothing leaves your perimeter.
White-label & OEM
- ✓ Your name, logo, colors and domain
- ✓ Embeddable console + API under your product
- ✓ Resell to your own customers as multi-tenant
- ✓ Custom domain profiles tuned for your vertical
No vendor lock-in
- ✓ Any LLM / embeddings / OCR — OpenAI, Mistral, or fully local
- ✓ Bring your own keys; zero data retention
- ✓ Standard stores: PostgreSQL · Qdrant · Neo4j · Redis
- ✓ One .env — no code changes
Security & sovereignty
Architecture
6 Rust microservices, 4 specialized stores
PostgreSQL (full-text) · Qdrant (1024-d vectors) · Neo4j (entity & relation graph) · Redis (queues + cache) — observed by Prometheus + Grafana + MLflow
Rust microservices — the hot path
AI sidecars & apps
Data stores
Observability
Proven reliability
Zero hallucination. Every answer is sourced.
Replayable benchmarks
- ✓ Finance suite (funds, fees, ISIN): 12/12
- ✓ 20,000-line document suite: 10/10
- ✓ Fact-checking independent of the LLM judge
- ✓ MMLongBench head-to-head: beats EdgeQuake on its own 123-question set (+11 pts)
Deterministic answers
- ✓ Same question ⇒ same answer (fixed seed)
- ✓ Exhaustive lists: 15/15 items, 8 runs out of 8
- ✓ “Not in corpus” instead of a hallucination
Under real load
- ✓ 50 concurrent: p50 167 ms
- ✓ 500 concurrent: p95 < 2 s
- ✓ 0.3% CPU / 54 MB RAM on the gateway
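For readers replaying the load reports, p50 and p95 can be computed from raw latencies as sketched below. The nearest-rank method is an assumption (the page does not say which method the reports use), and the sample values are synthetic, not measurements.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

# Synthetic latencies in milliseconds, for illustration only.
latencies_ms = [10, 20, 30, 40, 50]
p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
```

The median describes the typical query, while p95 exposes the slow tail that users notice under load.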
Engagement
Premium, by design
A platform license — not metered tokens. Start with a measured pilot, scale to production, own it end-to-end.
Pilot
- ✓ Your documents, your benchmarks
- ✓ Replayable accuracy report
- ✓ Hosted or your cloud
- ✓ Go / no-go in two weeks
Production
- ✓ SLA + support
- ✓ Multi-tenant workspaces
- ✓ Monitoring (Grafana/MLflow)
- ✓ Model & profile tuning
Enterprise · White-label
- ✓ Your brand & domain
- ✓ On-premise or air-gapped
- ✓ Custom vertical profiles
- ✓ Source escrow on request
Deployment
One command. Full platform.
$ ./start_rust_stack.sh
✓ postgres · redis · qdrant · neo4j healthy
✓ 6 Rust microservices healthy
✓ reranker · tei-embed healthy
✓ frontend · admin · vitrine ready
✓ grafana · prometheus · mlflow monitoring
→ App http://localhost:5058
→ API http://localhost:5150
→ Dashboards http://localhost:5056 (Grafana)
→ MLflow http://localhost:5062
→ Metrics http://localhost:9090 (Prometheus)
~25 MB Docker images per service. Runs on one server or a cluster. On-premise, private cloud or SaaS: your data stays home.
Investors · Buyers · Testers
Precision is the product. Come measure it yourself.
Live demo on your own documents, replayable benchmarks, open technical due diligence. License, white-label or acquisition — let’s talk.