In-Browser ID/Document OCR with PaddleOCR ONNX Models
A leading Romanian retail bank
Why It Exists
ID and document OCR in banking ideally runs on-device: keeping sensitive imagery in the browser avoids shipping personal documents to a server and cuts latency. This project explores a fully client-side OCR pipeline as an alternative or complement to server-side document reading in the bank’s onboarding flows.
What We Built
A browser OCR engine built on PaddleOCR’s PP-OCRv3 multilingual models. Shell scripts (download_models.sh, convert.sh) fetch the Paddle inference models (Latin recognition, multilingual detection and the mobile angle classifier) and convert each to ONNX via paddle2onnx. The client/ directory holds a TypeScript app (okapi-ocr.ts, build scripts, bun.lockb) that runs the three-stage detect → classify → recognize pipeline entirely in the browser using onnxruntime-web, with OpenCV.js (@techstark/opencv-js) and js-clipper for image pre/post-processing and pdf.js for PDF input. It ships with a browser-test harness and Netlify build/deploy configuration.
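The three-stage flow can be sketched as a small orchestrator that takes the stages as injected functions. This is a minimal illustration, not the code in okapi-ocr.ts: the names Box, Crop, OcrLine, Stages and runOcr, the reading-order sort and the minScore threshold are all assumptions made for the example.

```typescript
// Hypothetical types for illustration; none of these names come from the project.
interface Box { x: number; y: number; width: number; height: number; }
interface Crop { box: Box; rotated180: boolean; }
interface OcrLine { box: Box; text: string; score: number; }

// Each stage would wrap one ONNX model in practice; here they are plain async functions.
interface Stages<TImage> {
  detect(image: TImage): Promise<Box[]>;
  classify(image: TImage, box: Box): Promise<boolean>; // true if the crop is upside down
  recognize(image: TImage, crop: Crop): Promise<{ text: string; score: number }>;
}

async function runOcr<TImage>(
  image: TImage,
  stages: Stages<TImage>,
  minScore = 0.5, // assumed confidence cut-off
): Promise<OcrLine[]> {
  const boxes = await stages.detect(image);
  // Approximate reading order: top to bottom, then left to right.
  const ordered = [...boxes].sort((a, b) => a.y - b.y || a.x - b.x);
  const lines: OcrLine[] = [];
  for (const box of ordered) {
    const rotated180 = await stages.classify(image, box);
    const { text, score } = await stages.recognize(image, { box, rotated180 });
    // Drop empty or low-confidence results.
    if (text.length > 0 && score >= minScore) {
      lines.push({ box, text, score });
    }
  }
  return lines;
}
```

Injecting the stages keeps the control flow testable with stubs and independent of how each model is loaded.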
Technologies & Approach
PaddleOCR models are exported to ONNX so that they run through ONNX Runtime Web (WASM/WebGL) with no server backend; OpenCV.js handles box detection and perspective handling; pdf.js renders PDF pages so they can be OCR'd; Bun provides fast TypeScript builds. Packaging the detection, classification and recognition models together reproduces a full OCR stack on the client.
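Before a page image can be passed to an ONNX model, canvas pixels (interleaved RGBA bytes) have to be turned into a planar float tensor. The sketch below shows one way to do that; the function name rgbaToChw is hypothetical, and the default mean and standard deviation are the commonly used ImageNet values, assumed here rather than taken from the project. The channel order a given model expects (RGB or BGR) should be checked against its preprocessing configuration.

```typescript
// Convert interleaved RGBA bytes (as in ImageData.data) into a normalized
// planar CHW Float32Array. Alpha is discarded.
function rgbaToChw(
  rgba: Uint8ClampedArray,
  width: number,
  height: number,
  mean: readonly number[] = [0.485, 0.456, 0.406], // assumed defaults
  std: readonly number[] = [0.229, 0.224, 0.225],  // assumed defaults
): Float32Array {
  const plane = width * height;
  if (rgba.length !== plane * 4) {
    throw new RangeError("rgba length does not match width * height * 4");
  }
  const out = new Float32Array(3 * plane);
  for (let i = 0; i < plane; i++) {
    for (let c = 0; c < 3; c++) {
      // Scale to [0, 1], then normalize per channel; write into channel plane c.
      out[c * plane + i] = (rgba[i * 4 + c] / 255 - mean[c]) / std[c];
    }
  }
  return out;
}
```

With onnxruntime-web, the result can then be wrapped as `new ort.Tensor("float32", data, [1, 3, height, width])` and passed to `InferenceSession.run`.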
Outcome / Impact
Proved that a PaddleOCR-grade pipeline can run client-side in the browser for document/ID text extraction, validating a privacy-preserving, low-latency OCR option for onboarding without sending images to a server.
Capabilities Demonstrated
- On-device (in-browser) OCR with no server round-trip
- Converting PaddleOCR/Paddle models to ONNX (paddle2onnx)
- Running ML inference in the browser via ONNX Runtime Web
- Computer-vision pre/post-processing with OpenCV.js
- Document and ID text extraction, including from PDFs
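For the text-extraction step, recognition models of this family typically emit per-timestep class probabilities that are decoded with greedy CTC: take the best class at each step, collapse repeats, and drop blanks. The sketch below illustrates that technique under stated assumptions; the function name ctcGreedyDecode is hypothetical, and placing the blank at index 0 with the character list aligned to class indices is assumed, not confirmed by the project.

```typescript
// Greedy CTC decoding over a flattened [timeSteps, numClasses] probability matrix.
// charset[k] is the character for class k; the entry at blankIndex is unused.
function ctcGreedyDecode(
  probs: Float32Array,
  timeSteps: number,
  numClasses: number,
  charset: readonly string[],
  blankIndex = 0, // assumed blank position
): { text: string; score: number } {
  if (probs.length !== timeSteps * numClasses) {
    throw new RangeError("probs length does not match timeSteps * numClasses");
  }
  let text = "";
  let scoreSum = 0;
  let kept = 0;
  let previous = -1;
  for (let t = 0; t < timeSteps; t++) {
    // Argmax over classes at this timestep.
    let best = 0;
    let bestProb = -Infinity;
    for (let c = 0; c < numClasses; c++) {
      const p = probs[t * numClasses + c];
      if (p > bestProb) {
        bestProb = p;
        best = c;
      }
    }
    // Keep a character only if it is not blank and not a repeat of the previous step.
    if (best !== blankIndex && best !== previous) {
      text += charset[best] ?? "";
      scoreSum += bestProb;
      kept++;
    }
    previous = best;
  }
  // Mean probability of the kept characters, or 0 when nothing was decoded.
  return { text, score: kept > 0 ? scoreSum / kept : 0 };
}
```

A blank between two identical classes is what allows doubled letters to survive the repeat-collapsing step.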