← All work
Infrastructure · 2026

Self-Hosted LLM Gateway on Hetzner with Cloudflare Tunnel (IaC)

Overview

A Terraform-driven deployment that stands up a self-hosted LiteLLM gateway, a single OpenAI-compatible endpoint in front of many model providers, on Hetzner Cloud, fronted by a Cloudflare Tunnel and wired to Langfuse for observability and cost tracking.

Why It Exists

Routing all LLM traffic through one self-operated gateway gives centralized key management, provider failover, and per-call cost/usage telemetry without exposing the origin server directly to the internet. This repo is the infrastructure-as-code that makes that gateway reproducible.

What We Built

A complete IaC stack: main.tf provisions the Hetzner server (hcloud provider) with TLS and random providers, fetches Cloudflare’s published IP ranges over the http provider to lock down firewalling, and bootstraps the host via cloud-init.yaml. A tunnel/ directory and ed25519 keys configure the Cloudflare Tunnel so the proxy is reachable over a managed hostname rather than a public port. LiteLLM config enables a DB-backed model store with Langfuse success/failure callbacks, and a custom image_cost_callback.py adds image-generation cost accounting. A Postgres dump captures the model/config state.

Technologies & Approach

Terraform (HCL) with the hcloud, tls, random, and http providers; cloud-init for host provisioning; Cloudflare Tunnel for zero-exposed-port ingress; LiteLLM as the gateway with Langfuse callbacks; Python for the custom cost callback. State and a DB dump are versioned for reproducibility.

Outcome / Impact

Delivered a reproducible, secured, observable self-hosted LLM gateway, centralizing provider access, cost tracking, and key management behind one endpoint. A working internal infrastructure deliverable.

Capabilities Demonstrated

  • Infrastructure-as-code provisioning on Hetzner Cloud with Terraform
  • Zero-exposed-port ingress via Cloudflare Tunnel
  • Operating a self-hosted multi-provider LLM gateway (LiteLLM)
  • LLM cost/usage observability with Langfuse and custom callbacks
More work See all →