← All work
Infrastructure · 2025

Versioned PostgreSQL Schema & Migration Management

A media-monitoring / data-orchestration platform

Overview

The single source of truth for the platform’s PostgreSQL schema: a Flyway-managed migration repository paired with maintained per-table documentation. It defines the core data model shared by the ingestion, matching, and orchestration services.

Why It Exists

A data platform with many services touching the same database needs disciplined, versioned schema evolution and clear, current documentation. Centralizing migrations and table docs prevents drift, makes changes reviewable, and lets every service depend on one canonical model.

What We Built

A Flyway project (configured via flyway.conf and Docker Compose against a dedicated PostgreSQL database) holding versioned SQL migrations and callbacks under sql/, alongside a docs/ tree documenting each table. The documented model captures the platform’s core entities: hits (news articles/content), hit_contents (article bodies stored separately), hit_keyword_positions (term positions within content), queries (search queries with deterministic UUIDs), the hit_queries junction, and the subscription model (subscriptions, subscription_headers, subscription_header_queries). Conventions for timestamps and identifiers are documented so contributors apply changes consistently.

Technologies & Approach

Flyway for forward-only, versioned migrations run via a Docker Compose service against PostgreSQL. Markdown documentation lives next to the migrations so schema and docs evolve together. The deterministic-UUID approach for queries enables stable cross-system references, important for deduplication and the matching pipeline.

Outcome / Impact

Gives the whole platform a reliable, reviewable schema-change workflow and human-readable documentation of the data model, reducing the risk of drift across the ingestion, orchestration, and matching services.

Capabilities Demonstrated

  • Versioned, forward-only schema management with Flyway
  • Treating database schema as code with reproducible Docker-based runs
  • Maintaining living, per-table schema documentation
  • Modeling content, keyword-position, query, and subscription domains in PostgreSQL
More work See all →