Passport MRZ Scanner & Parser (Tesseract.js)
Overview
A browser-based build that reads the Machine-Readable Zone (MRZ) of a passport from an uploaded photo using Tesseract.js, then parses the two MRZ lines into structured identity fields. Evaluated and adapted from the public MRZ-Scanner-JS project.
Why It Exists
Onboarding and KYC flows often need to read passport/ID data from a photo without server round-trips. This build validates that MRZ capture and parsing can run entirely client-side in the browser.
What We Built
A single-page app (index.html) that loads an image, draws it onto a Canvas with a brightness(140%) filter to aid recognition, runs Tesseract.js OCR on the result, and passes the recognised text to a hand-written mrz-parser.js. The parser implements the ICAO 9303 TD3 (passport) layout of two 44-character lines, slicing out the document type, issuing country, surname and given names, document number, nationality, date of birth, sex, expiry date, personal number, and the associated check digits.
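The flow described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names recognizeMrzText and extractMrzLines are hypothetical, Tesseract.js is assumed to be loaded as the global `Tesseract`, and applying the brightness through the 2D context's `filter` property is an assumption (the original may apply it differently, and browser support for that property varies).

```javascript
// Hypothetical sketch: brighten an uploaded image on a Canvas, then OCR it.
// Browser-only; assumes Tesseract.js is available as the global `Tesseract`.
async function recognizeMrzText(file) {
  const bitmap = await createImageBitmap(file);
  const canvas = document.createElement("canvas");
  canvas.width = bitmap.width;
  canvas.height = bitmap.height;

  const ctx = canvas.getContext("2d");
  ctx.filter = "brightness(140%)"; // assumed way of applying the brightening
  ctx.drawImage(bitmap, 0, 0);

  // Tesseract.recognize resolves to a result whose data.text holds the OCR text.
  const { data } = await Tesseract.recognize(canvas, "eng");
  return data.text;
}

// Hypothetical helper: pick the two 44-character TD3 lines out of raw OCR text.
// Returns [line1, line2], or null if two plausible lines are not found.
function extractMrzLines(ocrText) {
  const candidates = ocrText
    .split(/\r?\n/)
    .map((line) => line.replace(/\s+/g, "").toUpperCase())
    .filter((line) => /^[A-Z0-9<]{44}$/.test(line));
  return candidates.length >= 2 ? candidates.slice(-2) : null;
}
```

In a page, `recognizeMrzText` would typically be called with the File from an `<input type="file">` change event, and the output of `extractMrzLines` handed to the parser. The strict 44-character filter is a deliberate simplification: real OCR output often needs more forgiving cleanup.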
Technologies & Approach
Pure client-side JavaScript with Tesseract.js for OCR and the Canvas API for preprocessing; no backend. The MRZ parser encodes the fixed-position field layout and check-digit structure of machine-readable travel documents.
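To illustrate what a fixed-position field layout means in practice, here is a minimal sketch of TD3 slicing. The function name parseTd3 and the field names are hypothetical and are not taken from mrz-parser.js; the character offsets follow the ICAO 9303 TD3 layout, and dates are left as raw YYMMDD strings.

```javascript
// Hypothetical sketch of fixed-position TD3 slicing (two lines of 44 characters).
function parseTd3(line1, line2) {
  if (line1.length !== 44 || line2.length !== 44) {
    throw new Error("TD3 MRZ lines must each be 44 characters long");
  }
  // '<' is the MRZ filler character; turn it into spaces and trim.
  const clean = (s) => s.replace(/</g, " ").trim();

  // Names fill the rest of line 1; "<<" separates surname from given names.
  const names = line1.slice(5);
  const sep = names.indexOf("<<");
  const surname = sep === -1 ? names : names.slice(0, sep);
  const givenNames = sep === -1 ? "" : names.slice(sep + 2);

  return {
    documentType: clean(line1.slice(0, 2)),
    issuingCountry: clean(line1.slice(2, 5)),
    surname: clean(surname),
    givenNames: clean(givenNames),
    documentNumber: clean(line2.slice(0, 9)),
    documentNumberCheck: line2[9],
    nationality: clean(line2.slice(10, 13)),
    dateOfBirth: line2.slice(13, 19), // YYMMDD
    dateOfBirthCheck: line2[19],
    sex: line2[20],
    expiryDate: line2.slice(21, 27), // YYMMDD
    expiryDateCheck: line2[27],
    personalNumber: clean(line2.slice(28, 42)),
    personalNumberCheck: line2[42],
    compositeCheck: line2[43],
  };
}

// Specimen MRZ from ICAO Doc 9303 (fictional holder, issuing state "UTO").
const specimenLine1 = "P<UTOERIKSSON<<ANNA<MARIA".padEnd(44, "<");
const specimenLine2 = "L898902C36UTO7408122F1204159ZE184226B<<<<<10";
console.log(parseTd3(specimenLine1, specimenLine2));
```

The sketch only slices; it does not verify the check digits or interpret the two-digit years, both of which a production parser would need to handle.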
Outcome / Impact
Demonstrated a fully in-browser passport-reading flow (image preprocessing, OCR, and standards-based MRZ parsing), showing that privacy-friendly, server-free identity capture is feasible. Positioned as evaluation/R&D work adapted from a public project.
Capabilities Demonstrated
- In-browser OCR with Tesseract.js
- ICAO 9303 MRZ parsing and check-digit handling
- Canvas-based image preprocessing for recognition accuracy
- Privacy-friendly, server-free document capture
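
As an illustration of the check-digit handling listed above, the ICAO 9303 scheme can be sketched as follows: each character is mapped to a number (digits as themselves, A to Z as 10 to 35, the filler '<' as 0), multiplied by the repeating weights 7, 3, 1, and the sum is taken modulo 10. The function names are hypothetical and not taken from mrz-parser.js.

```javascript
// Map a single MRZ character to its numeric value per ICAO 9303.
function mrzCharValue(ch) {
  if (ch === "<") return 0;
  if (ch >= "0" && ch <= "9") return ch.charCodeAt(0) - 48;
  if (ch >= "A" && ch <= "Z") return ch.charCodeAt(0) - 55; // A = 10 ... Z = 35
  throw new Error(`Invalid MRZ character: ${ch}`);
}

// Weighted sum with repeating weights 7, 3, 1, then modulo 10.
function computeCheckDigit(field) {
  const weights = [7, 3, 1];
  let sum = 0;
  for (let i = 0; i < field.length; i++) {
    sum += mrzCharValue(field[i]) * weights[i % 3];
  }
  return sum % 10;
}

// Compare a field against the check character printed next to it.
// Simplification: ICAO 9303 also permits '<' as the check character of an
// unused optional field; that case is not handled here.
function hasValidCheckDigit(field, checkChar) {
  return /^[0-9]$/.test(checkChar) && computeCheckDigit(field) === Number(checkChar);
}

// Fields from the ICAO Doc 9303 specimen passport.
console.log(computeCheckDigit("L898902C3")); // 6 (document number)
console.log(computeCheckDigit("740812")); // 2 (date of birth)
console.log(hasValidCheckDigit("120415", "9")); // true (expiry date)
```

Because OCR tends to confuse characters such as O and 0, a failed check digit is a useful signal to ask the user for a clearer photo rather than accepting the parsed fields as they are.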