Product · 2025

Event-Driven Crawl & Processing Pipeline (Trigger.dev)

An influencer-marketing media-intelligence platform

Overview

The platform’s event-driven processing pipeline, built on self-hosted Trigger.dev v3 alongside Firebase Functions. It orchestrates crawling, feed cleaning and downstream jobs as durable background tasks.

The Challenge

Ingestion and enrichment work is bursty, long-running and failure-prone. It needs a job system with retries, durability and observability rather than ad-hoc cron scripts, and one that can be self-hosted on the team’s own Kubernetes infrastructure.

What We Built

The pipeline lives in a functions/ codebase, organised into jobs/ and trigger/ task definitions alongside a trigger.config.ts, and is deployed via trigger.dev deploy --self-hosted. Artefacts in the repo (cache_crawl_response.txt, debug_cleaned_feed.txt) show the crawl-and-clean flow that feeds the core engine. Firebase configuration ties the jobs into the wider serverless layer.
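
To make the clean step of the crawl-and-clean flow concrete, here is a minimal sketch in plain TypeScript. The item shape (RawFeedItem), the function names and the cleaning rules (whitespace normalisation, dropping empty entries, de-duplicating by URL) are all assumptions for illustration; the project's real schema and rules are not shown here.

```typescript
// Hypothetical shape of one crawled feed entry (assumed, not the real schema).
interface RawFeedItem {
  url: string;
  title?: string | null;
  body?: string | null;
}

// Hypothetical shape of an entry after cleaning.
interface CleanFeedItem {
  url: string;
  title: string;
  body: string;
}

// Collapse runs of whitespace into single spaces and trim both ends.
function normaliseText(value: string | null | undefined): string {
  return (value ?? "").replace(/\s+/g, " ").trim();
}

// Drop entries with no usable URL or body, and keep the first entry per URL.
function cleanFeed(items: RawFeedItem[]): CleanFeedItem[] {
  const seen = new Set<string>();
  const cleaned: CleanFeedItem[] = [];
  for (const item of items) {
    const url = item.url.trim();
    const body = normaliseText(item.body);
    if (url === "" || body === "" || seen.has(url)) continue;
    seen.add(url);
    cleaned.push({ url, title: normaliseText(item.title), body });
  }
  return cleaned;
}
```

Keeping the clean step a pure function like this would let it be unit-tested on its own and then called from whichever background task wraps it.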

Technologies & Approach

Trigger.dev v3 provides durable, retryable, observable background tasks; self-hosting keeps them on the platform’s own infrastructure (see the infra repo). Job definitions are written in TypeScript for type safety.
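
As a rough illustration of what "retryable" means in practice, the sketch below retries a task with capped exponential backoff. It is plain TypeScript and deliberately does not use Trigger.dev's API; RetryOptions, backoffDelayMs and runWithRetries are hypothetical names. It also runs in a single process, so it shows only the retry policy, not the durability that an orchestrator adds by tracking runs outside the worker.

```typescript
// Hypothetical retry options; the names are illustrative, not a library's API.
interface RetryOptions {
  maxAttempts: number; // total attempts, including the first
  minDelayMs: number;  // delay after the first failure
  maxDelayMs: number;  // upper bound on any single delay
  factor: number;      // multiplier applied after each failure
}

// Delay to wait after failed attempt number `attempt` (1-based), capped at maxDelayMs.
function backoffDelayMs(attempt: number, opts: RetryOptions): number {
  const delay = opts.minDelayMs * Math.pow(opts.factor, attempt - 1);
  return Math.min(delay, opts.maxDelayMs);
}

// Run `task` until it succeeds or maxAttempts is reached, sleeping between attempts.
// `sleep` is injectable so the policy can be exercised without real waiting.
async function runWithRetries<T>(
  task: (attempt: number) => Promise<T>,
  opts: RetryOptions,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= opts.maxAttempts; attempt++) {
    try {
      return await task(attempt);
    } catch (error) {
      lastError = error;
      // Only wait if another attempt will follow.
      if (attempt < opts.maxAttempts) {
        await sleep(backoffDelayMs(attempt, opts));
      }
    }
  }
  throw lastError;
}
```

With minDelayMs of 1000, factor of 2 and maxDelayMs of 10000, the waits after successive failures would be 1000, 2000, 4000 and 8000 ms, then 10000 ms for every later failure.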

Outcome / Impact

Replaced fragile scripted ingestion with a managed, retry-capable, self-hosted pipeline, improving reliability and visibility of the crawl/clean/enrich workflow.

Capabilities Demonstrated

Designing durable, event-driven background pipelines with Trigger.dev
Self-hosting workflow orchestration on Kubernetes
Web crawling and feed-cleaning automation at scale
