OpenAI Fine-Tuning Pipeline for Workflow Data
Overview
A small Node.js + Python pipeline that turns dumped n8n workflow data into a fine-tuned OpenAI model. It covers the full path from raw dumps to a ready-to-train JSONL dataset and a launched fine-tuning job.
Why It Exists
To explore whether a model fine-tuned on real workflow examples could assist with generating or reasoning about n8n automations. The repo packages the data wrangling and training-job orchestration needed to validate that idea quickly.
What We Built
A staged set of scripts: dump.js and process_dumps.py ingest and normalize raw exports from a dump/ directory, enrich.js augments records, prepareFinetune.js assembles them into the finetune.jsonl training file, and finetune.js uploads the file and creates an OpenAI fine-tuning job targeting gpt-4o-2024-08-06. Configuration is handled via dotenv and an .env.example.
Technologies & Approach
Node.js (ESM) with the official openai SDK for upload and job creation, plus a Python preprocessing step for the heavier dump parsing. Training data is shaped into OpenAI’s JSONL chat format. The two-language split keeps data wrangling in Python while job orchestration stays in JS.
Outcome / Impact
A working end-to-end fine-tuning proof: from raw workflow dumps to a submitted training job. Validates the studio’s ability to stand up custom-model pipelines and prepare domain-specific training data.
Capabilities Demonstrated
- End-to-end LLM fine-tuning pipelines (prep, upload, job creation)
- Training-data extraction, enrichment, and JSONL formatting
- OpenAI API automation across Node.js and Python