Tooling · 2025

Anti-Bot Screenshot & Content Scraper

Overview

A Python scraper that captures full-page screenshots and text from sites that actively defend against automation. It uses stealth browser tooling and a virtual-display setup to render and extract content from protected pages, including social platforms and news sites.

Why It Exists

Many high-value sources sit behind bot detection, Cloudflare-style challenges and CAPTCHAs that defeat naive scrapers. This tool exists to reach and capture those pages reliably, both as visual evidence (screenshots) and as extracted text.

What We Built

The tool has two complementary entry points. run.py uses nodriver, an undetected Chrome automation library, to navigate to targets and save screenshots. s.py uses SeleniumBase in undetected mode (uc=True), switches into CDP mode with activate_cdp_mode(), and calls uc_gui_click_captcha() to get past challenges. It runs against an Xvfb virtual display, with PyAutoGUI (backed by Xlib) performing the GUI-level CAPTCHA clicks, so the whole flow works on a machine with no physical screen. A Dockerfile builds a full Ubuntu image with Google Chrome, fonts, Xvfb and SeleniumBase for reproducible headless runs. Captured artefacts (timestamped screenshots, downloaded files and logs) are written out per target.
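The exact per-target artefact layout is not described here. The sketch below assumes one directory per target, named after the URL's hostname, with UTC timestamps embedded in file names; the function name artefact_path and the layout itself are hypothetical, not the project's actual code.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional
from urllib.parse import urlparse


def artefact_path(root: Path, url: str, kind: str, suffix: str,
                  now: Optional[datetime] = None) -> Path:
    """Return a timestamped output path inside a per-target directory.

    Hypothetical layout: <root>/<hostname>/<kind>_<YYYYmmddTHHMMSSZ><suffix>
    """
    # urlparse lower-cases the hostname; fall back if the URL has none
    host = urlparse(url).hostname or "unknown-host"
    # Use UTC so file names sort correctly across machines and time zones
    stamp = (now or datetime.now(timezone.utc)).strftime("%Y%m%dT%H%M%SZ")
    target_dir = root / host
    # Create the per-target directory on first use
    target_dir.mkdir(parents=True, exist_ok=True)
    return target_dir / f"{kind}_{stamp}{suffix}"
```

Keeping the timestamp in the file name rather than relying on file modification times means artefacts stay ordered even after being copied out of a container.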

Technologies & Approach

SeleniumBase UC/CDP mode and nodriver provide stealth, detection-resistant browsing; PyAutoGUI, Xvfb and Xlib perform real GUI interactions (CAPTCHA clicks) without a physical display; Docker packages Chrome and all native dependencies for consistent execution. Running two independent engines hedges against any single anti-bot technique: a countermeasure that blocks one engine does not necessarily block the other.
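How the two engines are combined is not spelled out. One plausible arrangement is an ordered fallback, sketched below under that assumption: each engine is a hypothetical callable that takes a URL and returns captured bytes, and the next engine is tried only when the previous one raises. The names capture_with_fallback and AllEnginesFailed are illustrative, not part of nodriver or SeleniumBase.

```python
from typing import Callable, Dict, Sequence, Tuple

# An engine is a (name, capture function) pair; the capture function is a
# hypothetical wrapper around one browser library that returns raw bytes.
Engine = Tuple[str, Callable[[str], bytes]]


class AllEnginesFailed(Exception):
    """Raised when every engine fails for a URL; keeps each engine's error."""

    def __init__(self, url: str, errors: Dict[str, str]) -> None:
        super().__init__(f"all engines failed for {url}: {errors}")
        self.errors = errors


def capture_with_fallback(url: str, engines: Sequence[Engine]) -> Tuple[str, bytes]:
    """Try each engine in order and return (engine name, captured bytes)."""
    errors: Dict[str, str] = {}
    for name, capture in engines:
        try:
            return name, capture(url)
        except Exception as exc:  # any engine failure moves on to the next one
            errors[name] = repr(exc)
    raise AllEnginesFailed(url, errors)
```

Recording which engine succeeded alongside the artefact makes it easier to see, over time, which targets defeat which engine.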

Outcome / Impact

The result is a working capability for extracting content and screenshots from bot-protected and CAPTCHA-gated pages, and a building block for data-collection pipelines where standard scrapers fail.

Capabilities Demonstrated

Stealth / anti-detection web scraping at scale
CAPTCHA and challenge handling via CDP + GUI automation
Headless rendering with virtual displays (Xvfb)
Dockerised, reproducible browser-automation environments
