Face & Gesture-Driven Conversation-Starter Display
A conversational / social-engagement platform
Overview
An interactive event installation that recognizes attendees on camera, watches for a hand-raise gesture, and uses an LLM to generate a tailored conversation starter for the people in front of it, then displays and shares it. It blends computer vision, gesture tracking, and agentic content generation into a live social-engagement experience for in-person events.
The Challenge
Networking events are full of people who want to talk but lack an opener. The goal was a context-aware “spot” that could identify who is standing in front of it, sense a deliberate trigger gesture, and instantly produce a relevant, person-specific conversation prompt to break the ice, all while running reliably on event hardware.
What We Built
A Python application orchestrated from src/main.py that wires together several modules: faces.py (face recognition over a set of known-attendee encodings), hands.py (MediaPipe hand-gesture detection with a configurable min-hold time and multi-hand support), agent.py (LLM generation of conversation topics and short articles, with multiple prompt “styles” and strict JSON output), display.py (a FastAPI + WebSocket live display server), ingest.py, and db.py (SQLAlchemy persistence). Configuration is centralized in config.yaml, which covers the choice of CNN vs HOG recognition model, match tolerance, gesture confidence thresholds, the Claude model ID, and per-event context. The system deploys to Modal for GPU-backed serverless inference (modal_app.py, with model/photo/encoding volumes), and a companion Cloudflare Worker (worker/, D1 + Wrangler) handles sharing. A bundled facial-recognition-build and a test suite round it out.
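The min-hold time mentioned for hands.py can be illustrated with a small debounce helper. This is a minimal sketch, not the project's actual code: the class name HoldTrigger, the update() method, and the 1.0 second default are all hypothetical, and the per-frame boolean is assumed to come from whatever the MediaPipe detection step reports.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HoldTrigger:
    """Fires once when a gesture has been held continuously for min_hold_s.

    Hypothetical helper for illustration; the real hands.py may differ.
    """

    min_hold_s: float = 1.0  # assumed default; would normally come from config.yaml
    _started_at: Optional[float] = None
    _fired: bool = False

    def update(self, gesture_present: bool, now: float) -> bool:
        """Feed one frame's result; return True only on the frame the hold completes."""
        if not gesture_present:
            # Gesture dropped: reset so the next raise starts a fresh hold.
            self._started_at = None
            self._fired = False
            return False
        if self._started_at is None:
            # First frame of a new hold.
            self._started_at = now
        if not self._fired and now - self._started_at >= self.min_hold_s:
            # Hold duration reached: fire exactly once per continuous hold.
            self._fired = True
            return True
        return False
```

In a capture loop, update() would be called once per frame with a monotonic timestamp (for example time.monotonic()), and a True result would kick off content generation. Requiring a continuous hold is what makes the hand-raise a deliberate trigger rather than something a passing wave can set off.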
Technologies & Approach
Python with face_recognition and OpenCV for identity, MediaPipe for gesture sensing, and the Claude Agent SDK (Claude Sonnet 4.5) for content generation. FastAPI + WebSockets drive the real-time display; Modal provides GPU serverless deployment; SQLAlchemy persists data; and a Cloudflare Workers/D1 sharing layer extends it to the web. Configuration-first design keeps recognition and gesture behavior tunable per venue.
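To make the match-tolerance setting concrete: the face_recognition library compares face encodings by Euclidean distance and treats a pair as a match when the distance is at or below the tolerance (its documented default is 0.6, with lower values being stricter). The sketch below reproduces that idea with the standard library only. The function name best_match and the dictionary layout for known encodings are assumptions for illustration, not the project's faces.py.

```python
import math
from typing import Dict, Optional, Sequence


def best_match(
    probe: Sequence[float],
    known: Dict[str, Sequence[float]],
    tolerance: float = 0.6,
) -> Optional[str]:
    """Return the name of the closest known encoding within tolerance, else None.

    Hypothetical helper: real encodings would be 128-dimensional vectors
    produced by the face_recognition library.
    """
    best_name: Optional[str] = None
    best_dist = float("inf")
    for name, encoding in known.items():
        # Euclidean distance between the probe and this attendee's encoding.
        dist = math.dist(probe, encoding)
        if dist < best_dist:
            best_name, best_dist = name, dist
    # Accept the nearest candidate only if it is close enough.
    return best_name if best_dist <= tolerance else None
```

Tightening the tolerance for a venue trades missed recognitions for fewer mistaken identities, which is the kind of per-venue tuning a configuration-first design allows.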
Outcome / Impact
The working build proved that an end-to-end real-time pipeline (camera in, recognition and gesture detection, LLM content out, live display and share) is feasible on serverless GPU infrastructure for live events, combining CV, gesture UX, and agentic generation in one installation.
Capabilities Demonstrated
- Real-time facial recognition with configurable CNN/HOG models and tolerances
- Hand-gesture interaction via MediaPipe as a deliberate trigger UX
- Agentic LLM content generation with structured JSON output and prompt-style variety
- FastAPI + WebSocket live display for an interactive installation
- GPU-serverless deployment on Modal with versioned model/data volumes
- Cloudflare Workers/D1 sharing layer for the engagement loop
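The structured JSON output listed above implies a validation step between the model reply and the display. A minimal sketch of such a check follows. The schema (a "topic" and a "starter" field) and the function name parse_starter are assumptions made for illustration; the actual fields used by agent.py are not documented here.

```python
import json
from typing import Dict

# Hypothetical schema; the project's real output fields may differ.
REQUIRED_KEYS = {"topic", "starter"}


def parse_starter(raw: str) -> Dict[str, str]:
    """Parse a model reply as strict JSON and reject anything off-schema."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        # Covers replies that wrap the JSON in extra prose.
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")
    if set(data) != REQUIRED_KEYS:
        raise ValueError(
            f"expected keys {sorted(REQUIRED_KEYS)}, got {sorted(data)}"
        )
    for key, value in data.items():
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"{key!r} must be a non-empty string")
    return data
```

Failing loudly on a malformed reply lets the caller retry generation or fall back to a generic prompt instead of pushing broken content to the live display.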