Healthcare Document RAG Chatbot (LangChain + Pinecone)
A US health-insurance provider
Overview
An early-build RAG chatbot for a US health-insurance provider that answers questions over health and medical insurance documents. Built on the platform’s gpt4-langchain-pdf-chatbot foundation, it grounds GPT-4 answers in passages retrieved from insurance PDFs indexed in Pinecone.
Why It Exists
Health-insurance members and staff struggle to find answers buried in dense plan and contract documents. This build tested whether a document-grounded chatbot could answer plan questions accurately enough to be worth pursuing for the healthcare vertical.
What We Built
A Next.js chatbot over the docs/medical-insurance source set, using LangChain for the RAG pipeline. An ingestion script parses PDFs (pdf-parse), chunks and embeds them, and writes the vectors to a Pinecone index (@pinecone-database/pinecone). The chat route streams GPT-4 answers grounded in the retrieved passages; the client consumes the stream with @microsoft/fetch-event-source and renders the Markdown output with react-markdown and remark-gfm. The UI is a lightweight Next.js + Tailwind + Radix surface. The commit history is short (a few days in May 2023), consistent with a focused vertical evaluation rather than a long-lived build.
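The chunking step of the ingestion script can be illustrated with a minimal sketch. This is not the project's code: in the build, LangChain handles splitting, and the `Chunk` shape, the `chunkText` name, and the default sizes of 1000 and 200 characters are all assumptions chosen for illustration. It shows fixed-size character windows with overlap, so a sentence that straddles a boundary still appears whole in at least one chunk.

```typescript
// Hypothetical shape for an ingested chunk; field names are illustrative only.
interface Chunk {
  id: string;     // source name plus running chunk index
  text: string;   // the chunk's characters
  source: string; // originating document, e.g. a PDF file name
}

// Split text into fixed-size windows that overlap by `overlap` characters.
// chunkSize and overlap defaults are assumed values, not the project's settings.
function chunkText(
  text: string,
  source: string,
  chunkSize = 1000,
  overlap = 200
): Chunk[] {
  if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
    throw new Error("require chunkSize > 0 and 0 <= overlap < chunkSize");
  }
  const chunks: Chunk[] = [];
  const step = chunkSize - overlap; // how far each window advances
  for (let start = 0; start < text.length; start += step) {
    const piece = text.slice(start, start + chunkSize);
    chunks.push({ id: `${source}#${chunks.length}`, text: piece, source });
    // Stop once a window has reached the end of the text,
    // so no trailing chunk duplicates only the overlap.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}
```

Each chunk would then be embedded and upserted into the vector index together with its `source`, so an answer can point back to the document it came from.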
Technologies & Approach
LangChain + Pinecone + GPT-4 over Next.js: the same RAG-over-PDF pattern the platform applied across enterprise verticals, here pointed at healthcare insurance content, with answers streamed for responsiveness.
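The retrieve-then-ground step of this pattern can be sketched in memory. In the build, Pinecone performs the similarity search and LangChain assembles the prompt; the code below is a stand-in for illustration, not the project's implementation. The `EmbeddedPassage` type, the function names, the use of cosine similarity, and the prompt wording are all assumptions.

```typescript
// Hypothetical passage record: text, its source document, and its embedding.
interface EmbeddedPassage {
  text: string;
  source: string;
  vector: number[];
}

// Cosine similarity between two equal-length vectors (0 if either is all zeros).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vector length mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Return the k passages most similar to the query embedding, best first.
function topK(
  query: number[],
  passages: EmbeddedPassage[],
  k: number
): EmbeddedPassage[] {
  return passages
    .map((p) => ({ p, score: cosineSimilarity(query, p.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((s) => s.p);
}

// Build a prompt that restricts the model to the retrieved context.
// The instruction wording is an assumed example, not the project's prompt.
function buildGroundedPrompt(
  question: string,
  passages: EmbeddedPassage[]
): string {
  const context = passages
    .map((p, i) => `[${i + 1}] (${p.source}) ${p.text}`)
    .join("\n");
  return [
    "Answer using only the context below. If the answer is not in the context, say you do not know.",
    "",
    "Context:",
    context,
    "",
    `Question: ${question}`,
  ].join("\n");
}
```

Numbering the passages and tagging each with its source gives the model something concrete to cite, which matters for insurance material where a member may want to check the plan document itself.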
Outcome / Impact
Validated document-grounded Q&A over insurance material for the healthcare vertical and exercised the reusable PDF-to-Pinecone RAG pipeline. Scoped as a short evaluation, it fed into the platform’s broader, productized chatbot stack.
Capabilities Demonstrated
- RAG over regulated healthcare/insurance documents
- LangChain + Pinecone ingestion and retrieval
- Streaming GPT-4 chat with Markdown rendering
- Rapid delivery of a vertical-specific build