Product · 2025

GraphRAG Knowledge-Graph Indexing over Media Corpus

An influencer-marketing media-intelligence platform

Overview

An evaluation of Microsoft’s GraphRAG for building a knowledge graph from the platform’s media corpus: an LLM extracts entities and relationships from the articles, enabling graph-based retrieval that goes beyond flat vector search.

Why It Exists

Influencer-marketing intelligence benefits from understanding entities (people, brands, topics) and how they connect across articles. Standard RAG retrieves isolated passages by similarity to the query; GraphRAG builds a structured graph that supports community summaries and multi-hop reasoning across documents. This repo evaluated that approach.
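To make the contrast concrete, here is a toy illustration in plain Python. It is not GraphRAG's query engine: entities simply link to the articles that mention them, and a breadth-first walk over the entity graph gathers context that no single passage contains. The graph, the article IDs and the function names are invented for this example.

```python
from collections import deque

# Hypothetical toy entity graph: entity -> neighbouring entities.
GRAPH: dict[str, set[str]] = {
    "Creator A": {"Brand X"},
    "Brand X": {"Creator A", "Sustainable Fashion"},
    "Sustainable Fashion": {"Brand X", "Creator B"},
    "Creator B": {"Sustainable Fashion"},
}

# Hypothetical mapping: entity -> IDs of the articles that mention it.
MENTIONS: dict[str, set[str]] = {
    "Creator A": {"art-1"},
    "Brand X": {"art-1", "art-2"},
    "Sustainable Fashion": {"art-2", "art-3"},
    "Creator B": {"art-3"},
}


def entities_within(graph: dict[str, set[str]], start: str, max_hops: int) -> dict[str, int]:
    """Breadth-first search returning {entity: hop distance} up to max_hops."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for nbr in graph.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def multi_hop_context(graph, mentions, start: str, max_hops: int) -> list[str]:
    """Collect the articles mentioning any entity reachable within max_hops."""
    articles: set[str] = set()
    for entity in entities_within(graph, start, max_hops):
        articles |= mentions.get(entity, set())
    return sorted(articles)


if __name__ == "__main__":
    print(multi_hop_context(GRAPH, MENTIONS, "Creator A", 1))  # ['art-1', 'art-2']
    print(multi_hop_context(GRAPH, MENTIONS, "Creator A", 3))  # ['art-1', 'art-2', 'art-3']
```

With one hop from `Creator A` only the articles about Creator A and Brand X come back; allowing more hops reaches `art-3`, where Creator B connects to the same topic. A single similarity lookup on "Creator A" has no particular reason to return that article.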

What We Built

A GraphRAG 0.2.1 pipeline configured with input/, output/, cache/ and prompts/ directories and a lancedb/ vector store. The dependency set (graphrag, LanceDB, dask/fastparquet, Azure Search/Identity/Blob) reflects the standard GraphRAG indexing stack adapted to the platform’s content. The work was integration and evaluation rather than a ground-up build.
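The cache/ directory exists so that re-running the index does not pay for the same LLM calls twice. A minimal, stdlib-only sketch of that idea follows. It is not GraphRAG's cache implementation: the key scheme (SHA-256 over the prompt plus sorted model parameters), the one-JSON-file-per-response layout and the `cached_completion` / `call_llm` names are assumptions for illustration.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


def cache_key(prompt: str, params: dict) -> str:
    """Derive a stable key from the prompt and model parameters (assumed scheme)."""
    payload = json.dumps({"prompt": prompt, "params": params}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def cached_completion(
    cache_dir: Path,
    prompt: str,
    params: dict,
    call_llm: Callable[[str, dict], str],
) -> str:
    """Return a cached response if one exists; otherwise call the LLM and store it."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / f"{cache_key(prompt, params)}.json"
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))["response"]
    response = call_llm(prompt, params)  # hypothetical LLM client callable
    path.write_text(json.dumps({"response": response}), encoding="utf-8")
    return response


if __name__ == "__main__":
    import tempfile

    calls: list[str] = []

    def fake_llm(prompt: str, params: dict) -> str:
        """Stand-in for a real LLM client; records each call it receives."""
        calls.append(prompt)
        return f"response to: {prompt}"

    with tempfile.TemporaryDirectory() as tmp:
        cache = Path(tmp) / "cache"
        for _ in range(2):
            cached_completion(cache, "Extract entities from chunk 1", {"temperature": 0}, fake_llm)
        print(len(calls))  # 1: the second request is served from the cache
```

Including the model parameters in the key means that changing, say, the temperature produces a cache miss instead of silently reusing an answer generated under different settings.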

Technologies & Approach

Microsoft GraphRAG drives entity/relationship extraction and graph construction; LanceDB stores embeddings; Parquet + Dask handle the intermediate data; Azure provides the LLM/search backends. Prompts customised for the media domain.
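The extraction and graph-construction step can be sketched in plain Python. The record format below is modelled on the delimiters used by GraphRAG's default entity-extraction prompt (`<|>` between fields, `##` between records, `<|COMPLETE|>` at the end); since the project customised its prompts, treat the exact format as an assumption. Upper-casing names, summing relationship strengths and the function names are likewise illustrative choices, not GraphRAG's code. In GraphRAG itself a further LLM step summarises the descriptions collected for each entity; that step is omitted here.

```python
from collections import defaultdict

TUPLE_DELIM = "<|>"            # assumed field delimiter
RECORD_DELIM = "##"            # assumed record delimiter
COMPLETE_TAG = "<|COMPLETE|>"  # assumed end-of-output marker


def parse_extraction(raw: str):
    """Split one LLM extraction response into entity and relationship tuples."""
    entities, relations = [], []
    body = raw.replace(COMPLETE_TAG, "")
    for record in body.split(RECORD_DELIM):
        record = record.strip()
        # Drop one pair of wrapping parentheses, leaving any inside the text intact.
        if record.startswith("(") and record.endswith(")"):
            record = record[1:-1]
        if not record:
            continue
        fields = [f.strip().strip('"') for f in record.split(TUPLE_DELIM)]
        kind = fields[0].lower()
        if kind == "entity" and len(fields) >= 4:
            # (name, type, description); names upper-cased so chunks merge.
            entities.append((fields[1].upper(), fields[2].upper(), fields[3]))
        elif kind == "relationship" and len(fields) >= 5:
            try:
                strength = float(fields[4])
            except ValueError:
                continue  # skip records whose strength is not numeric
            relations.append((fields[1].upper(), fields[2].upper(), fields[3], strength))
    return entities, relations


def build_graph(responses):
    """Merge per-chunk extractions: collect entity descriptions, sum edge strengths."""
    descriptions = defaultdict(list)  # entity name -> descriptions from all chunks
    edges = defaultdict(float)        # undirected (a, b) pair -> summed strength
    for raw in responses:
        entities, relations = parse_extraction(raw)
        for name, _type, desc in entities:
            descriptions[name].append(desc)
        for src, tgt, _desc, strength in relations:
            edges[tuple(sorted((src, tgt)))] += strength
    return dict(descriptions), dict(edges)


if __name__ == "__main__":
    chunk_1 = (
        '("entity"<|>BRAND X<|>ORGANIZATION<|>Apparel brand running creator campaigns)##'
        '("entity"<|>CREATOR A<|>PERSON<|>Lifestyle influencer)##'
        '("relationship"<|>CREATOR A<|>BRAND X<|>Creator A promoted Brand X<|>8)'
        "<|COMPLETE|>"
    )
    chunk_2 = (
        '("entity"<|>Brand X<|>ORGANIZATION<|>Sponsor of a sustainability series)##'
        '("relationship"<|>BRAND X<|>CREATOR A<|>Renewed partnership<|>6)'
        "<|COMPLETE|>"
    )
    descs, edges = build_graph([chunk_1, chunk_2])
    print(descs["BRAND X"])  # two descriptions, one from each chunk
    print(edges)             # {('BRAND X', 'CREATOR A'): 14.0}
```

Normalising names before merging is what lets "Brand X" in one chunk and "BRAND X" in another become a single node; without it the graph fragments into near-duplicate entities.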

Outcome / Impact

Demonstrated how knowledge-graph RAG could enrich the platform’s search and analytics with entity relationships and topic communities, validating the technique against the real corpus.
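GraphRAG's topic communities come from clustering the entity graph (it uses hierarchical Leiden clustering) and then having an LLM write a report for each community. Leiden needs a dedicated library, so the dependency-free sketch below swaps in a different, simpler algorithm, weighted label propagation, purely to illustrate how densely connected entities fall into the same community. It produces a single flat level rather than GraphRAG's hierarchy, and the entity names, weights and function names are invented toy data.

```python
from collections import defaultdict


def to_adjacency(edges: dict[tuple[str, str], float]) -> dict[str, dict[str, float]]:
    """Turn {(a, b): weight} into a symmetric adjacency map."""
    adj: dict[str, dict[str, float]] = defaultdict(dict)
    for (a, b), w in edges.items():
        adj[a][b] = adj[a].get(b, 0.0) + w
        adj[b][a] = adj[b].get(a, 0.0) + w
    return dict(adj)


def label_propagation(adj: dict[str, dict[str, float]], max_iter: int = 20) -> list[list[str]]:
    """Weighted label propagation (a stand-in for Leiden, not GraphRAG's method).

    Each node repeatedly adopts the label with the highest total edge weight
    among its neighbours. Nodes are visited in sorted order and ties go to the
    smallest label, so the result is deterministic.
    """
    labels = {node: node for node in adj}
    for _ in range(max_iter):
        changed = False
        for node in sorted(adj):
            scores: dict[str, float] = defaultdict(float)
            for nbr, weight in adj[node].items():
                scores[labels[nbr]] += weight
            if not scores:
                continue
            best = max(scores.values())
            new_label = min(lbl for lbl, s in scores.items() if s == best)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break  # converged: no node switched label this pass
    groups: dict[str, set[str]] = defaultdict(set)
    for node, lbl in labels.items():
        groups[lbl].add(node)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    toy_edges = {
        ("BRAND X", "CREATOR A"): 5.0, ("BRAND X", "STREETWEAR"): 5.0,
        ("CREATOR A", "STREETWEAR"): 5.0, ("BRAND Y", "CREATOR B"): 5.0,
        ("BRAND Y", "SKINCARE"): 5.0, ("CREATOR B", "SKINCARE"): 5.0,
        ("STREETWEAR", "SKINCARE"): 1.0,  # weak cross-topic link
    }
    print(label_propagation(to_adjacency(toy_edges)))
    # [['BRAND X', 'CREATOR A', 'STREETWEAR'], ['BRAND Y', 'CREATOR B', 'SKINCARE']]
```

The weak link between STREETWEAR and SKINCARE is outvoted by the strong within-group links, so the two groups stay separate. Classic label propagation randomises visiting order and tie-breaking, so its results can vary between runs; the sorted order here trades that for reproducibility.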

Capabilities Demonstrated

  • Integrating and evaluating Microsoft GraphRAG
  • Building knowledge graphs from unstructured media text
  • Operating LanceDB and Parquet-based AI data pipelines