For developers building AI-powered tools in industries like logistics, finance, and healthcare, one challenge consistently stands in the way: getting large language models (LLMs) to reliably extract data from real-world documents. PDF invoices, 200-page reports, handwritten forms, and scanned IDs often trip up otherwise powerful AI systems.
Retab, a new startup founded by engineers who faced this problem firsthand, has just launched to solve it. Alongside the public debut of its platform, the company also announced $3.5 million in pre-seed funding led by VentureFriends, Kima Ventures, and K5 Global, with participation from Eric Schmidt (via StemAI), Olivier Pomel (CEO, Datadog), and Florian Douetteau (CEO, Dataiku).
Rather than being another LLM provider, Retab sits one layer above—offering a developer-first platform for document AI that lets users define exactly what data they want to extract, then handles the entire process: labeling, evaluating, prompt engineering, model benchmarking, and routing.
“People keep building AI demos that look magical but fall apart in production,” said Louis de Benoist, co-founder and CEO of Retab. “We built Retab because we were tired of wiring up brittle pipelines just to extract a few fields from a document. Now, developers can focus on the schema they want—we handle the rest.”
What Retab Actually Does
At its core, Retab turns unstructured documents (PDFs, scans, forms) into clean, structured JSON or tabular outputs that developers can drop into production systems. Built as an SDK and platform, it abstracts away much of the complexity of building AI-powered data extraction workflows.
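To make "structured JSON" concrete, here is a minimal sketch of a target schema and a check that a model's JSON output conforms to it. It uses only the Python standard library and is not Retab's SDK; the `InvoiceRecord` type, its field names, and `parse_extraction` are hypothetical names chosen for illustration.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class InvoiceRecord:
    """Hypothetical target schema: the fields we want from each invoice."""
    invoice_number: str
    total_amount: float
    currency: str


def parse_extraction(raw_json: str) -> InvoiceRecord:
    """Validate a model's JSON output against the schema and build a typed record."""
    data = json.loads(raw_json)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    expected = {f.name for f in fields(InvoiceRecord)}
    missing = expected - data.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    # Coerce each value to the declared type so downstream code sees clean data.
    return InvoiceRecord(
        invoice_number=str(data["invoice_number"]),
        total_amount=float(data["total_amount"]),
        currency=str(data["currency"]),
    )
```

Rejecting incomplete output at this boundary is one simple way to keep malformed extractions out of a production system; a real pipeline would likely retry or escalate instead of only raising an error.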
Companies using Retab simply describe the schema of the data they want. Retab then auto-generates labeled datasets, selects the optimal LLM(s), refines prompts, and handles error detection and retry logic. It ensures production-grade accuracy through three core innovations:
- Self-Optimizing Schemas: Retab uses an internal AI agent to iteratively test and refine extraction instructions using real examples, eliminating the need for manual tuning.
- Intelligent Model Routing: The platform is model-agnostic and automatically benchmarks LLMs from providers such as OpenAI, Anthropic, and Google, routing each document to the best model based on cost, speed, and accuracy requirements. This has enabled some users to cut processing costs by as much as 100-fold.
- k-LLM Consensus & Guided Reasoning: Instead of relying on a single model's output, Retab enforces step-by-step reasoning (chain-of-thought) and runs multiple models in parallel to reach a consensus. If uncertainty remains, the result is flagged or recomputed, so developers know which answers they can trust.
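The consensus idea in the last bullet can be sketched as a per-field majority vote over the outputs of k models. This is an illustrative approximation, not Retab's actual algorithm: the function name `field_consensus`, the `min_agreement` threshold, and the assumption that each model returns a flat dictionary of scalar values are all hypothetical.

```python
from collections import Counter
from typing import Any


def field_consensus(
    outputs: list[dict[str, Any]], min_agreement: float = 0.5
) -> tuple[dict[str, Any], list[str]]:
    """Majority-vote each field across k model outputs.

    Returns (consensus, flagged): consensus maps each field to its winning
    value; flagged lists fields whose top value did not win more than
    min_agreement of the votes and so need review or a re-run.
    """
    if not outputs:
        raise ValueError("need at least one model output")
    consensus: dict[str, Any] = {}
    flagged: list[str] = []
    all_fields = sorted({name for out in outputs for name in out})
    for name in all_fields:
        # A model that omitted the field casts a vote for None.
        votes = Counter(out.get(name) for out in outputs)
        value, count = votes.most_common(1)[0]
        if count / len(outputs) > min_agreement:
            consensus[name] = value
        else:
            flagged.append(name)
    return consensus, flagged
```

Voting field by field, instead of on whole documents, lets agreement on most fields survive a disagreement on one, so only the uncertain field has to be flagged or recomputed.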
This orchestration layer gives developers the power to turn error-prone document flows—like contract parsing, identity verification, or invoice analysis—into scalable, self-correcting systems.
From Logistics to Infrastructure
The founders originally built Retab’s foundation while automating internal processes for document-heavy operations in the logistics industry. But as they refined the tooling, they realized its value far exceeded any single use case. Today, Retab is already being used by dozens of companies across:
- Logistics: Parsing bills of lading, customs manifests, and delivery records
- Finance: Extracting risk factors and financial metrics from long-form reports
- Healthcare: Automating intake forms, claims, and medical records
One trucking company used Retab to identify the smallest, fastest model configuration that met its 99% accuracy requirement, reducing compute cost and latency without falling below that target. A financial firm cut days off its quarterly analysis by using Retab to extract structured risk indicators from investor documents.
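The trucking example amounts to a constrained selection: among the benchmarked models, keep those that meet the accuracy floor, then take the cheapest and fastest. The sketch below shows that selection rule under stated assumptions. The `BenchmarkResult` fields and the `pick_model` function are hypothetical, and the article does not describe how Retab actually measures or ranks models.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BenchmarkResult:
    """Hypothetical benchmark record for one model configuration."""
    model: str           # label for the model configuration
    accuracy: float      # fraction correct on a labeled evaluation set
    cost_per_doc: float  # average cost per document
    latency_s: float     # average seconds per document


def pick_model(
    results: list[BenchmarkResult], min_accuracy: float = 0.99
) -> Optional[BenchmarkResult]:
    """Return the cheapest (then fastest) model meeting the accuracy floor."""
    eligible = [r for r in results if r.accuracy >= min_accuracy]
    if not eligible:
        return None  # no configuration satisfies the requirement
    return min(eligible, key=lambda r: (r.cost_per_doc, r.latency_s))
```

Treating accuracy as a hard constraint and cost as the objective mirrors the requirement described above: a larger model that is slightly more accurate is not chosen if a smaller one already clears the 99% bar.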
“The AI economy depends on turning messy, human-readable documents into structured, verifiable data,” said Florian Douetteau, CEO of Dataiku. “Retab is the platform that makes that leap possible at scale.”
Looking Ahead
Retab is now expanding beyond documents: upcoming releases will allow users to extract data from webpages and dynamic content, opening the door to use cases like competitive analysis, compliance scraping, and onboarding automation. Integrations with tools like Zapier, n8n, and Dify are also on the way, letting Retab slot into existing workflows without custom code.
Long-term, Retab aims to become the middleware layer between the world’s unstructured data and the AI agents that rely on it—whether that’s for enterprise search, RPA, or AI copilots.
Despite having just ten employees, Retab is already being recognized as a foundational building block for developers of AI-native products: not just another vendor, but a toolset for operationalizing the messy reality of real-world data.
The post Retab Raises $3.5M and Launches AI-Powered Platform to Turn Messy Documents into Structured Data appeared first on Unite.AI.