Open Source · Self-Hosted · Structure-Aware · Production-Ready

The "Set & Forget"
Data Pipeline for Enterprise RAG

Convert unstructured data into structured knowledge. Zero data egress. 100% Data Sovereignty.

Get Started on GitHub


🔄

Process PDF, DOCX, PPTX, HTML, TXT, JSON and more through a single, robust AI worker pipeline.

📂

Auto-sync Google Drive folders with incremental updates. Works with OAuth for secure access.
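
As a rough illustration of what an incremental update involves, the sketch below compares a current folder listing against the state recorded on the previous run and returns only the files that need work. It is a minimal sketch, not the project's actual sync code: `RemoteFile`, `plan_incremental_sync`, and the `last_seen` mapping are hypothetical names, and the listing is assumed to have been fetched already.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RemoteFile:
    """Hypothetical record for one file in a synced folder."""
    file_id: str
    modified_time: str  # timestamp string as reported by the remote listing


def plan_incremental_sync(remote_files, last_seen):
    """Return (to_process, to_delete) for one sync run.

    last_seen maps file_id -> modified_time recorded on the previous run.
    A file is reprocessed when it is new or its modified_time has changed;
    a file is deleted from the index when it no longer appears in the listing.
    """
    current = {f.file_id: f.modified_time for f in remote_files}
    to_process = sorted(
        fid for fid, mtime in current.items() if last_seen.get(fid) != mtime
    )
    to_delete = sorted(fid for fid in last_seen if fid not in current)
    return to_process, to_delete


remote = [
    RemoteFile("a", "2024-01-01T00:00:00Z"),  # unchanged since last run
    RemoteFile("b", "2024-02-01T00:00:00Z"),  # modified since last run
    RemoteFile("c", "2024-03-01T00:00:00Z"),  # newly added
]
previous = {
    "a": "2024-01-01T00:00:00Z",
    "b": "2024-01-15T00:00:00Z",
    "d": "2023-12-01T00:00:00Z",  # removed from the folder
}
to_process, to_delete = plan_incremental_sync(remote, previous)
```

Here only `b` and `c` would be sent through the pipeline, and `d` would be dropped from the index; unchanged files are skipped entirely.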

🔍

Hybrid retrieval that combines dense vectors with sparse SPLADE vectors in Qdrant to improve accuracy.
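
To illustrate how dense and sparse results can be combined, the sketch below merges two ranked lists with reciprocal rank fusion (RRF). This page does not say which fusion method the pipeline uses, so RRF is an assumption chosen for illustration; the function name and the constant `k=60` are likewise illustrative, and the ranked lists stand in for results that would come from the dense and sparse searches.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores 1 / (k + rank) per list it appears in (rank starts
    at 1), and scores are summed across lists. Ties break by document ID.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


dense_hits = ["d1", "d2", "d3"]   # hypothetical dense-vector ranking
sparse_hits = ["d3", "d1", "d4"]  # hypothetical sparse (SPLADE) ranking
fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
```

Documents that rank well in both lists (`d1`, `d3`) rise above those found by only one retriever, which is the main reason to pair dense and sparse signals.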

🛡️

100% self-hosted via Docker. You own your infrastructure. No external API calls for data processing.

⚡

Built with TypeScript and Python. Includes structured logging, Prometheus metrics, and rate limiting.

✨

Automated analysis flags poor-quality content. An auto-fix pipeline merges short chunks and splits long ones.
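
A merge-and-split pass of this kind might look like the sketch below. It is an assumption-laden illustration rather than the project's implementation: `fix_chunks`, the character-based `min_len` and `max_len` thresholds, and splitting on whitespace are all hypothetical choices, and a real pipeline would more likely count tokens and respect document structure.

```python
def fix_chunks(chunks, min_len=200, max_len=1000):
    """Merge chunks shorter than min_len into a neighbor, then split long ones.

    Lengths are measured in characters (an assumption for this sketch).
    This is a single pass: a piece left over after splitting may still be
    shorter than min_len.
    """
    # Pass 1: fold a short chunk into the previous one, or fold the next
    # chunk into a previous one that is still too short.
    merged = []
    for chunk in chunks:
        if merged and (len(merged[-1]) < min_len or len(chunk) < min_len):
            merged[-1] = merged[-1] + " " + chunk
        else:
            merged.append(chunk)

    # Pass 2: cut long chunks at the last space that keeps a piece within
    # max_len, falling back to a hard cut when there is no usable space.
    fixed = []
    for chunk in merged:
        while len(chunk) > max_len:
            cut = chunk.rfind(" ", 0, max_len + 1)
            if cut <= 0:
                cut = max_len
            piece = chunk[:cut].strip()
            if piece:
                fixed.append(piece)
            chunk = chunk[cut:].strip()
        if chunk:
            fixed.append(chunk)
    return fixed


# Tiny thresholds so the behavior is easy to follow by eye.
result = fix_chunks(["ab", "cd", "hello world foo bar baz"], min_len=5, max_len=12)
```

With these thresholds the two short chunks are merged into `"ab cd"`, and the long chunk is split at a word boundary into `"hello world"` and `"foo bar baz"`.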
