Event-Driven Crawl & Processing Pipeline (Trigger.dev)
An influencer-marketing media-intelligence platform
Overview
This is the platform's event-driven processing pipeline, built on self-hosted Trigger.dev v3 alongside Firebase Functions. It orchestrates crawling, feed cleaning, and downstream jobs as durable background tasks.
The Challenge
Ingestion and enrichment work is bursty, long-running and failure-prone. It needs a job system with retries, durability and observability rather than ad-hoc cron scripts, and one that can be self-hosted on the team’s own Kubernetes infrastructure.
What We Built
A functions/ codebase organised into jobs/ and trigger/ task definitions, configured through trigger.config.ts and deployed via trigger.dev deploy --self-hosted. Artefacts in the repo (cache_crawl_response.txt, debug_cleaned_feed.txt) show the crawl-and-clean flow that feeds the core engine. Firebase configuration ties the jobs into the wider serverless layer.
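To make the clean step of the crawl-and-clean flow concrete, here is a minimal sketch of what a feed-cleaning function might look like. The FeedItem shape, the cleanFeed name, and the cleaning rules (trim, drop incomplete entries, de-duplicate by URL) are all assumptions for illustration; the repository's actual cleaning logic is not documented here.

```typescript
// Hypothetical shape of one crawled feed entry; the real schema is an assumption.
interface FeedItem {
  url: string;
  title: string;
}

// Clean a raw feed: trim whitespace, drop entries missing a URL or title,
// and keep only the first occurrence of each (trimmed) URL.
function cleanFeed(raw: FeedItem[]): FeedItem[] {
  const seen = new Set<string>();
  const cleaned: FeedItem[] = [];
  for (const item of raw) {
    const url = item.url.trim();
    const title = item.title.trim();
    if (url === "" || title === "") continue; // incomplete entry
    if (seen.has(url)) continue; // duplicate URL
    seen.add(url);
    cleaned.push({ url, title });
  }
  return cleaned;
}
```

Keeping a step like this as a pure function means it can be called from a background task and unit-tested without running the job system.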
Technologies & Approach
Trigger.dev v3 provides durable, retryable, observable background tasks; self-hosting keeps it on the platform's own infrastructure (see the infra repo). Job definitions are written in TypeScript for type safety.
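As an illustration of what "retryable" means in practice, the sketch below shows a generic retry loop with capped exponential backoff. It is not Trigger.dev's API or configuration schema: RetryOptions, backoffDelayMs, and withRetry are hypothetical names, and an in-process loop like this does not provide the durability that the job system adds by persisting task state.

```typescript
// Hypothetical retry settings; illustrative names, not Trigger.dev's schema.
interface RetryOptions {
  maxAttempts: number;
  baseDelayMs: number;
  factor: number;
  maxDelayMs: number;
}

// Delay to wait after failed attempt number `attempt` (1-based):
// grows geometrically and is capped at maxDelayMs.
function backoffDelayMs(attempt: number, opts: RetryOptions): number {
  const delay = opts.baseDelayMs * Math.pow(opts.factor, attempt - 1);
  return Math.min(delay, opts.maxDelayMs);
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Run `fn` until it succeeds or maxAttempts is reached, sleeping between tries.
// The last error is rethrown if every attempt fails.
async function withRetry<T>(
  fn: (attempt: number) => Promise<T>,
  opts: RetryOptions
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= opts.maxAttempts; attempt++) {
    try {
      return await fn(attempt);
    } catch (err) {
      lastError = err;
      if (attempt < opts.maxAttempts) {
        await sleep(backoffDelayMs(attempt, opts));
      }
    }
  }
  throw lastError;
}
```

A bursty crawl step could be wrapped as withRetry(() => fetchPage(url), opts), where fetchPage is a stand-in for whatever the crawl job actually calls.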
Outcome / Impact
The pipeline replaced fragile scripted ingestion with a managed, retry-capable, self-hosted job system, improving the reliability and visibility of the crawl/clean/enrich workflow.
Capabilities Demonstrated
- Designing durable, event-driven background pipelines with Trigger.dev
- Self-hosting workflow orchestration on Kubernetes
- Web crawling and feed-cleaning automation at scale