MITM HTTP Proxy with Data Extraction
Overview
A Node.js man-in-the-middle HTTP proxy that intercepts traffic, rewrites selected responses on the fly, and scans request and response data for patterns of interest (email addresses and common personal-data field names), logging findings to a graph database. It was built as a network-security research project.
Why It Exists
Built to explore how an intercepting proxy can transparently rewrite responses and mine passing traffic for structured data, as a hands-on study of MITM techniques and inline content manipulation.
What We Built
A proxy built on the http-mitm-proxy library that does three things: it hooks onRequest to conditionally redirect a request for a target script to a different upstream host and path; it applies a regex to detect email addresses and watches for a keyword list of sensitive field names (username, email, phone, IMEI, address, operator, etc.); and it connects to an OrientDB instance to persist captured records. A periodic timer flushes the in-memory history.
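The conditional redirect described above can be sketched as a handler in the (ctx, callback) shape that http-mitm-proxy passes to onRequest. This is a minimal sketch, not the project's actual code: the target and replacement hosts and paths are hypothetical placeholders, since the write-up does not name the real ones, and the function name rewriteTargetScript is invented for illustration.

```javascript
// Hypothetical values: the source does not name the real script or hosts.
const TARGET = { host: "cdn.example.com", path: "/app.js" };
const REPLACEMENT = { host: "lab.example.net", path: "/patched.js" };

// Handler in the shape http-mitm-proxy uses for onRequest: (ctx, callback).
// It assumes ctx.clientToProxyRequest.url already holds a path (not an
// absolute URL) by the time the hook runs.
function rewriteTargetScript(ctx, callback) {
  const req = ctx.clientToProxyRequest;
  const host = (req.headers.host || "").split(":")[0]; // drop any :port
  const path = (req.url || "").split("?")[0];          // drop query string

  if (host === TARGET.host && path === TARGET.path) {
    // Point the upstream request at the replacement host and path.
    const opts = ctx.proxyToServerRequestOptions;
    opts.host = REPLACEMENT.host;
    opts.path = REPLACEMENT.path;
    opts.headers.host = REPLACEMENT.host;
  }
  return callback(); // always let the request continue
}

// Registration (assumption: the import shape depends on the library version):
//   const Proxy = require("http-mitm-proxy");
//   const proxy = Proxy();
//   proxy.onRequest(rewriteTargetScript);
//   proxy.listen({ port: 8081 });
```

Because the handler only reads and mutates the ctx object, it can be exercised with a plain mock object and without starting a proxy. Requests that do not match the target pass through unchanged.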
Technologies & Approach
http-mitm-proxy for TLS-capable interception in Node; regular expressions and keyword matching for lightweight extraction; OrientDB (graph/document) as the capture store. The codebase is compact and clearly exploratory.
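As an illustration of the regex and keyword matching mentioned above, the following sketch scans a text payload for email addresses and sensitive field names. The email pattern is a common simplified one and is an assumption, not the project's actual regex; the keyword list covers only the names given in this write-up, and extractFindings is a hypothetical helper name.

```javascript
// Simplified email pattern (assumption: the project's real regex is not shown).
const EMAIL_RE = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

// Field names listed in the write-up, lowercased for case-insensitive matching.
const SENSITIVE_KEYS = ["username", "email", "phone", "imei", "address", "operator"];

// Returns the unique email addresses found in the text and the
// sensitive keywords that appear anywhere in it.
function extractFindings(text) {
  const emails = [...new Set(text.match(EMAIL_RE) || [])];
  const lower = text.toLowerCase();
  const keywords = SENSITIVE_KEYS.filter((key) => lower.includes(key));
  return { emails, keywords };
}

// Example: a form-encoded request body.
const findings = extractFindings("username=bob&email=bob@example.com&IMEI=123");
console.log(findings);
```

Substring keyword matching is deliberately crude: it is cheap enough to run on every request and response, at the cost of false positives (for example, "address" also matches "ip_address").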
Outcome / Impact
A working proof of concept of intercepting, rewriting, and extracting data from live HTTP traffic, demonstrating practical understanding of proxy internals and data-capture techniques. This is archived R&D, part of the same family of networking builds.
Capabilities Demonstrated
- TLS-capable MITM proxying in Node.js
- Inline response rewriting and upstream redirection
- Regex/keyword-based extraction from intercepted traffic
- Persisting capture data to a graph/document database