AI Document-Extraction & Responsibility-File Manager
Romanian legal/compliance document tooling
Overview
A Next.js + Firebase application that assembles compliance “responsibility files” from real-world business documents. It ingests uploaded PDFs (invoices, dispatch/transport notes, reception records, and official permits, or avize), uses an LLM to extract structured fields, and links the results into auditable dossiers.
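To make the document model concrete, here is a minimal TypeScript sketch. The kind names echo the document types above, but the interfaces, field names, and the idea that a dossier is assembled by following links outward from a seed document are assumptions for illustration, not the project's actual schema.

```typescript
// Hypothetical data model: kind names mirror the document types described,
// but field names are illustrative assumptions, not the project's schema.
type DocumentKind = "invoice" | "dispatchNote" | "receptie" | "aviz";

interface ExtractedDocument {
  id: string;
  kind: DocumentKind;
  fields: Record<string, string>; // structured fields returned by the LLM
  linkedIds: string[];            // ids of related documents
}

interface ResponsibilityFile {
  id: string;
  title: string;
  documentIds: string[];
}

// Assumed assembly strategy: start from a seed document and collect everything
// reachable through its links (breadth-first), so the dossier holds the chain.
function assembleResponsibilityFile(
  id: string,
  title: string,
  seedId: string,
  docs: Map<string, ExtractedDocument>,
): ResponsibilityFile {
  const seen = new Set<string>();
  const queue: string[] = [seedId];
  while (queue.length > 0) {
    const current = queue.shift()!;
    if (seen.has(current)) continue;
    const doc = docs.get(current);
    if (!doc) continue; // dangling link or unknown seed: skip rather than fail
    seen.add(current);
    for (const next of doc.linkedIds) {
      if (!seen.has(next)) queue.push(next);
    }
  }
  return { id, title, documentIds: Array.from(seen) };
}

// Usage: an invoice linked to a dispatch note, which is linked to a reception
// record; the unlinked aviz stays out of this dossier.
const docs = new Map<string, ExtractedDocument>([
  ["inv-1", { id: "inv-1", kind: "invoice", fields: { number: "F-001" }, linkedIds: ["dn-1"] }],
  ["dn-1", { id: "dn-1", kind: "dispatchNote", fields: {}, linkedIds: ["inv-1", "rec-1"] }],
  ["rec-1", { id: "rec-1", kind: "receptie", fields: {}, linkedIds: ["dn-1"] }],
  ["av-1", { id: "av-1", kind: "aviz", fields: {}, linkedIds: [] }],
]);
const file = assembleResponsibilityFile("rf-1", "Lot 42", "inv-1", docs);
```

Following links transitively means a user only has to pick one starting document; whether the real build works this way or has users select each document explicitly is not stated.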
The Challenge
Assembling a traceability/responsibility dossier means re-keying data from stacks of scanned invoices, transport documents, reception reports, and official permits into a consistent structure. The build automates capture and extraction and maintains the relationships between linked documents, replacing slow, error-prone manual entry.
What We Built
A Next.js (App Router, TypeScript) front end with one feature view per document type (InvoicesView, DispatchNotesView, ReceptieView, AvizView, ResponsibilityFilesView), plus edit/link modals (EditInvoiceModal, LinkDocumentModal, CreateResponsibilityFileModal) and a FileUpload/ProcessingModal ingestion flow. Authentication runs through an AuthContext backed by Firebase Auth; data persists in Firestore (lib/firestore.ts) with security rules and indexes. A Firebase Cloud Function (functions/) handles the AI path: pdf-to-png-converter rasterizes each uploaded PDF, the OpenAI SDK extracts structured fields from the page images, and the results are written back to Firestore. The repo also ships generated type and analysis documentation describing the bidirectional document-linking model.
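The bidirectional linking model can be illustrated with a small in-memory sketch: each link is recorded on both documents, so a dispatch note can find its invoice and vice versa. The function names (linkDocuments, unlinkDocuments, isSymmetric) and the linkedIds field are hypothetical. In Firestore, both updates would plausibly go into one batched write (for example with arrayUnion/arrayRemove) so the two sides cannot drift apart.

```typescript
// Hypothetical sketch: a link is stored on both documents so either side
// can find its counterpart. Names are illustrative, not the project's API.
interface LinkableDoc {
  id: string;
  linkedIds: string[];
}

function addUnique(list: string[], value: string): string[] {
  return list.includes(value) ? list : [...list, value];
}

// Link a <-> b; returns new objects instead of mutating the inputs.
// Linking twice is a no-op thanks to addUnique.
function linkDocuments(a: LinkableDoc, b: LinkableDoc): [LinkableDoc, LinkableDoc] {
  if (a.id === b.id) throw new Error("A document cannot be linked to itself");
  return [
    { ...a, linkedIds: addUnique(a.linkedIds, b.id) },
    { ...b, linkedIds: addUnique(b.linkedIds, a.id) },
  ];
}

// Unlinking removes the reference from both sides, keeping the graph symmetric.
function unlinkDocuments(a: LinkableDoc, b: LinkableDoc): [LinkableDoc, LinkableDoc] {
  return [
    { ...a, linkedIds: a.linkedIds.filter((id) => id !== b.id) },
    { ...b, linkedIds: b.linkedIds.filter((id) => id !== a.id) },
  ];
}

// Invariant check: if x links to y, then y must link back to x.
function isSymmetric(docs: LinkableDoc[]): boolean {
  const byId = new Map(docs.map((d) => [d.id, d] as const));
  return docs.every((d) =>
    d.linkedIds.every((other) => byId.get(other)?.linkedIds.includes(d.id) ?? false),
  );
}

// Usage: link an invoice to a dispatch note.
const invoice: LinkableDoc = { id: "inv-1", linkedIds: [] };
const note: LinkableDoc = { id: "dn-1", linkedIds: [] };
const [linkedInvoice, linkedNote] = linkDocuments(invoice, note);
```

Returning new objects rather than mutating keeps the sketch compatible with React state updates in views such as LinkDocumentModal; an invariant check like isSymmetric is one way to audit stored links for half-written pairs.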
Technologies & Approach
Next.js/React/Tailwind for the UI; Firebase as the full serverless backend (Auth, Firestore, Cloud Functions, Storage) with rules and indexes; OpenAI for field extraction, fed by PDF-to-PNG preprocessing so scanned pages can be read as images; jszip and file-saver for export; and crypto-js for hashing. Document logic is organized so that each compliance artifact maps to its own view, type, and Firestore collection.
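Because LLM output is not guaranteed to match a schema, extracted fields are worth validating before they reach Firestore. The sketch below assumes the model is asked to return a JSON object for an invoice; the field names, the YYYY-MM-DD date format, and the comma-decimal amount handling are illustrative assumptions, not the project's actual prompt or schema.

```typescript
// Hypothetical shape of an invoice extraction; field names are assumptions.
interface InvoiceFields {
  supplierName: string;
  invoiceNumber: string;
  issueDate: string; // assumed ISO 8601 date, YYYY-MM-DD
  totalAmount: number;
}

type ParseResult =
  | { ok: true; value: InvoiceFields }
  | { ok: false; errors: string[] };

// Accepts numbers and strings like "1234.56", "1.234,56" or "1 234,56".
// Assumes Romanian-style formatting (comma = decimal) whenever a comma appears.
function parseAmount(raw: unknown): number | null {
  if (typeof raw === "number") return Number.isFinite(raw) ? raw : null;
  if (typeof raw !== "string") return null;
  let s = raw.replace(/\s/g, "");
  if (s.includes(",")) s = s.replace(/\./g, "").replace(",", ".");
  const n = Number(s);
  return s !== "" && Number.isFinite(n) ? n : null;
}

// Parse the raw model response and collect every problem instead of
// stopping at the first, so a reviewer sees all missing fields at once.
function parseInvoiceExtraction(modelOutput: string): ParseResult {
  let data: unknown;
  try {
    data = JSON.parse(modelOutput);
  } catch {
    return { ok: false, errors: ["response is not valid JSON"] };
  }
  if (typeof data !== "object" || data === null || Array.isArray(data)) {
    return { ok: false, errors: ["response is not a JSON object"] };
  }
  const obj = data as Record<string, unknown>;
  const errors: string[] = [];
  const requireString = (key: string): string => {
    const v = obj[key];
    if (typeof v !== "string" || v.trim() === "") {
      errors.push(`missing or empty field: ${key}`);
      return "";
    }
    return v.trim();
  };
  const supplierName = requireString("supplierName");
  const invoiceNumber = requireString("invoiceNumber");
  const issueDate = requireString("issueDate");
  if (issueDate !== "" && !/^\d{4}-\d{2}-\d{2}$/.test(issueDate)) {
    errors.push(`issueDate is not YYYY-MM-DD: ${issueDate}`);
  }
  const totalAmount = parseAmount(obj["totalAmount"]);
  if (totalAmount === null) errors.push("totalAmount is not a number");
  if (errors.length > 0 || totalAmount === null) return { ok: false, errors };
  return { ok: true, value: { supplierName, invoiceNumber, issueDate, totalAmount } };
}

// Usage: one well-formed response and one with several problems.
const good = parseInvoiceExtraction(
  '{"supplierName":"SC Exemplu SRL","invoiceNumber":"F-001","issueDate":"2024-03-15","totalAmount":"1.234,56"}',
);
const bad = parseInvoiceExtraction('{"supplierName":"","totalAmount":"n/a"}');
```

A failed parse is a natural point to route the document to manual correction (for example through an edit modal) rather than storing partial data; whether the build does this is not stated.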
Outcome / Impact
A working build that proves an end-to-end document-automation loop (upload, rasterize, AI-extract, store, and cross-link) on a serverless Firebase stack, and that validated the approach later carried into the studio’s broader dossier-generation products.
Capabilities Demonstrated
- AI extraction of structured fields from scanned PDFs
- PDF-to-image preprocessing for LLM document reading
- Bidirectional document linking and dossier assembly
- Serverless Firebase backend (Auth, Firestore, Functions, Storage)