Retrieval-Augmented Search over Media Content (LangChain + Meilisearch)
An influencer-marketing media-intelligence platform
Overview
A retrieval-augmented generation (RAG) prototype that adds natural-language Q&A and semantic search over the platform’s article and post corpus, using LangChain with Meilisearch as the vector store and OpenAI embeddings.
Why It Exists
The platform holds a large corpus of titled news articles and social posts. Keyword search alone misses semantic intent: it matches the words in a query, not what the query means. This R&D work explored RAG-style retrieval: embedding posts, then answering queries with responses grounded in the retrieved content.
What We Built
Two Python scripts (index.py, search.py) that wire LangChain’s Meilisearch vector store to OpenAI text-embedding-3-small embeddings (1536-dimensional), with a MultiQueryRetriever for retrieval and ChatOpenAI for answer synthesis. Each document is templated from a post’s title and description and indexed in a hosted Meilisearch instance.
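The templating step can be sketched without any external dependencies. This is a minimal illustration, not the code from index.py: the Post fields, the template wording, and the render_document helper are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Post:
    """Hypothetical shape of one article or social post record."""
    id: str
    title: str
    description: str


# Assumed template; the wording used in the original build is not known.
TEMPLATE = "Title: {title}\nDescription: {description}"


def render_document(post: Post) -> dict:
    """Build the text that would be embedded, keeping the post id alongside it."""
    text = TEMPLATE.format(
        title=post.title.strip(),
        description=post.description.strip(),
    )
    return {"id": post.id, "text": text}
```

Keeping the title and description in one labelled string means a single embedding captures both fields, and the id lets a search hit be traced back to its source post.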
Technologies & Approach
LangChain for retrieval orchestration, Meilisearch for fast hybrid/vector search, OpenAI for embeddings and generation. The build is a pair of lightweight scripts rather than a deployed service.
Outcome / Impact
Validated semantic retrieval and RAG over the existing Meilisearch-indexed corpus, informing how AI search could be layered onto the core engine. (Note: the build contained hard-coded credentials, which were flagged for rotation before any production use.)
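One common way to remove hard-coded credentials from scripts like these is to read them from the environment and fail early when any are missing. The sketch below is an assumption about how that could look, not the remediation actually applied; the variable names and the load_settings helper are illustrative.

```python
import os

# Assumed variable names; adjust to whatever the deployment actually uses.
REQUIRED = ("MEILI_HTTP_ADDR", "MEILI_MASTER_KEY", "OPENAI_API_KEY")


def load_settings(env=None) -> dict:
    """Return the required settings, raising if any is missing or empty."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError(
            "Missing required environment variables: " + ", ".join(missing)
        )
    return {name: env[name] for name in REQUIRED}
```

Calling load_settings() at the top of each script keeps secrets out of source control, and passing a plain dict as env makes the check easy to exercise in tests.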
Capabilities Demonstrated
- Building RAG pipelines with LangChain
- Using Meilisearch as a vector store with OpenAI embeddings
- Multi-query retrieval for improved semantic recall
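The idea behind multi-query retrieval can be shown in a few lines: run several phrasings of the same question against the retriever and keep the unique union of the hits. In LangChain, MultiQueryRetriever generates the phrasings with an LLM; in this dependency-free sketch the phrasings are supplied by the caller, and the multi_query_retrieve helper, the retrieve callable, and the "id" key are hypothetical stand-ins.

```python
from typing import Callable, Hashable, Iterable, List


def multi_query_retrieve(
    queries: Iterable[str],
    retrieve: Callable[[str], List[dict]],
    key: Callable[[dict], Hashable] = lambda doc: doc["id"],
) -> List[dict]:
    """Run every query variant and return the de-duplicated union of results.

    Documents keep the order in which they were first seen, so hits from
    earlier variants come first.
    """
    seen = set()
    merged = []
    for query in queries:
        for doc in retrieve(query):
            doc_key = key(doc)
            if doc_key not in seen:
                seen.add(doc_key)
                merged.append(doc)
    return merged
```

Recall improves because a document missed by one phrasing can still be returned by another, while de-duplication keeps the context passed to the answer model free of repeats.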