Metadata-Repair Utility (Metascraper / Browserless)
An influencer-marketing media-intelligence platform
Overview
A small Node.js utility for repairing and backfilling article metadata across the platform’s corpus, using Metascraper and Browserless to re-extract titles, authors, dates, images and descriptions.
Why It Exists
Ingested articles sometimes arrive with missing or incorrect metadata. This tool re-fetches the affected pages and re-extracts clean metadata to repair those records in the core store.
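As a rough illustration of the repair step, the sketch below flags which fields of a stored record need backfilling and builds a patch that touches only those fields. The names FIELDS, isMissing, findMissingFields and buildPatch are hypothetical, and the record shape (plain string fields, ISO-style date strings) is an assumption rather than the platform's actual schema. It only treats blank or unparseable values as needing repair; detecting metadata that is present but wrong would need additional checks.

```javascript
// Fields the repair pass looks at (taken from the list in the overview).
const FIELDS = ['title', 'author', 'date', 'image', 'description'];

// A value counts as missing if it is null/undefined, a blank string,
// or (for the date field) a string that Date.parse cannot read.
function isMissing(field, value) {
  if (value === null || value === undefined) return true;
  if (typeof value === 'string' && value.trim() === '') return true;
  if (field === 'date' && Number.isNaN(Date.parse(value))) return true;
  return false;
}

// List the fields of a stored article record that need backfilling.
function findMissingFields(article) {
  return FIELDS.filter((field) => isMissing(field, article[field]));
}

// Build a patch holding only fields that were missing in the stored
// record and are usable in the freshly extracted metadata.
function buildPatch(article, extracted) {
  const patch = {};
  for (const field of findMissingFields(article)) {
    if (!isMissing(field, extracted[field])) {
      patch[field] = extracted[field];
    }
  }
  return patch;
}
```

Restricting the patch to missing fields means values that were already present are never overwritten by a re-extraction.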
What We Built
A focused index.js script that combines metascraper (with its title, author, date, image, description and url rules), html-get with browserless for headless-browser fetching of rendered pages, and url-metadata as a fallback extractor.
Technologies & Approach
Metascraper handles rule-based extraction, Browserless renders JavaScript-heavy pages headlessly, and a secondary extractor adds resilience. The result is a pragmatic, single-purpose data-fix tool.
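The primary-plus-fallback flow described above could look roughly like the following. This is a minimal sketch, not the script's actual code: extractWithFallback, REQUIRED and isEmpty are hypothetical names, and the two extractors are injected as plain async functions so the control flow is visible without fixing any library options. In practice, primary would presumably run metascraper over HTML fetched through html-get and browserless, and fallback would wrap url-metadata, whose result keys may need mapping onto these field names.

```javascript
// Fields that should be present after a successful repair.
const REQUIRED = ['title', 'author', 'date', 'image', 'description'];

const isEmpty = (v) => v === null || v === undefined || String(v).trim() === '';

// primary and fallback are injected async functions: (url) => Promise<object>.
async function extractWithFallback(url, { primary, fallback }) {
  let result = {};
  try {
    result = (await primary(url)) || {};
  } catch (err) {
    // A failed render or parse should not abort the repair; try the fallback.
    result = {};
  }

  // Keep the extracted (possibly canonical) URL if there is one.
  const merged = { ...result, url: result.url || url };

  const gaps = REQUIRED.filter((field) => isEmpty(result[field]));
  if (gaps.length === 0) return merged; // primary was enough, skip the fallback

  let backup = {};
  try {
    backup = (await fallback(url)) || {};
  } catch (err) {
    backup = {};
  }

  // Fill only the gaps; values from the primary extractor always win.
  for (const field of gaps) {
    if (!isEmpty(backup[field])) merged[field] = backup[field];
  }
  return merged;
}
```

The fallback is called only when the primary extractor throws or leaves fields empty, so pages that extract cleanly cost a single fetch.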
Outcome / Impact
Improved metadata completeness and accuracy across the platform’s content without touching the main ingestion code path.
Capabilities Demonstrated
- Robust web metadata extraction with Metascraper + headless browsers
- Targeted data-quality remediation scripts