
This module teaches how to ground LLM responses in external documents using embeddings and retrieval, reducing hallucination and producing evidence-backed answers. It covers chunking, vector similarity, reranking, and contamination risks.
Embeddings explained: vector space intuition and similarity metrics.
Chunking strategies and metadata tagging for source attribution.
Building a simple retrieval pipeline (with a local or hosted vector DB, treated conceptually).
Prompt composition patterns that include retrieved context while controlling token budgets.
Reranking and filtering to remove low-relevance passages.
Attribution & citation techniques to surface source excerpts.
Contamination risks and how to avoid leaking test data into context.
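The pipeline steps above (chunking with source metadata, embedding, similarity search, relevance filtering, and budget-aware prompt composition) can be sketched end to end. This is a minimal, illustrative sketch: it uses a toy bag-of-words "embedding" so it runs with the standard library alone, and every function name and threshold here is an assumption, not a required API. In the module you would swap in a real embedding model and vector DB.

```python
import math
import re
from collections import Counter

def chunk(text, max_words=20, source="unknown"):
    """Split text into word-bounded chunks, tagging each with its source for attribution."""
    words = text.split()
    return [{"source": source, "text": " ".join(words[i:i + max_words])}
            for i in range(0, len(words), max_words)]

def embed(text):
    """Toy embedding: lowercase word counts as a sparse vector (stand-in for a dense model)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, chunks, k=2, min_score=0.05):
    """Rank chunks by similarity to the query, keep top-k, drop low-relevance passages."""
    q = embed(query)
    scored = [(cosine(q, embed(c["text"])), c) for c in chunks]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [c for score, c in scored[:k] if score >= min_score]

def compose_prompt(query, retrieved, word_budget=120):
    """Inline retrieved context with source tags, trimming to a simple word budget."""
    parts, used = [], 0
    for c in retrieved:
        if used >= word_budget:
            break
        take = c["text"].split()[: word_budget - used]
        used += len(take)
        parts.append(f"[{c['source']}] {' '.join(take)}")
    context = "\n".join(parts)
    return f"Answer using ONLY the context below, citing sources.\n{context}\nQuestion: {query}"

# Illustrative document set (invented handbook text, not from any real handbook).
docs = chunk(
    "Students must submit absence forms to the front office within two school days. "
    "Late forms require a parent signature and a written reason for the absence.",
    max_words=20, source="handbook")
hits = retrieve("How do I submit an absence form?", docs)
print(compose_prompt("How do I submit an absence form?", hits))
```

The source tag carried through each chunk is what later enables citation: the prompt instructs the model to answer only from tagged context, so excerpts can be surfaced back to the user.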
Activities
Build a demo Q&A system over a curated document set (e.g., course handbook + policies). Deliver a short video demo walking through the retrieval steps and explaining why the answers improved.
📦 Deliverable
Code/recipe plus sample queries, retrieved contexts, and a correctness comparison against baseline LLM-only responses.
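One way to frame the correctness comparison in the deliverable is a simple token-overlap F1 against gold answers, a common extractive-QA metric. This is a hedged sketch with invented example answers; in the real deliverable the baseline and RAG answers would come from the two systems, and a human grader or LLM judge could replace the automatic score.

```python
def token_f1(pred, gold):
    """F1 over shared lowercase tokens between a predicted and a gold answer."""
    p, g = pred.lower().split(), gold.lower().split()
    overlap = sum(min(p.count(t), g.count(t)) for t in set(p) & set(g))
    if not overlap:
        return 0.0
    precision, recall = overlap / len(p), overlap / len(g)
    return 2 * precision * recall / (precision + recall)

# Invented answers for illustration only.
gold = "Submit the absence form to the front office within two school days."
baseline = "You can usually email your teacher about absences."  # LLM-only guess
rag = "Submit absence forms to the front office within two school days."

for name, ans in [("baseline", baseline), ("RAG", rag)]:
    print(f"{name}: F1 = {token_f1(ans, gold):.2f}")
```

Reporting the same metric for both systems over the full query set makes the "why answers improved" claim in the video demo concrete rather than anecdotal.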
Embedding libraries (conceptual), sample notebooks for vector search, reading on RAG best practices.
Modules 1–3 recommended.
Students learn to make the AI cite real materials, increasing trust and usefulness in parent- and staff-facing tools.