
Safety engineering for chatbots: content moderation, escalation to humans, privacy-preserving logging, and transparent refusals. Emphasis on combining policy with technical controls.
Moderation strategies and classifier-based pre-filters.
Escalation thresholds and human review queue design.
Privacy-preserving logging and redaction patterns.
Tone and wording for refusals and safe messaging.
Testing moderation with adversarial examples.
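The topics above can be tied together in a small sketch: a pre-filter scores each message, and two thresholds decide whether the bot answers, hands off to a human reviewer, or refuses with a plain explanation. Everything named here is an assumption made for illustration: the keyword weights stand in for a trained classifier, and the thresholds, the `Decision` type, and the reply wording are hypothetical rather than taken from any particular moderation API.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical thresholds; real values would come from policy review.
BLOCK_THRESHOLD = 0.9
REVIEW_THRESHOLD = 0.5

# Toy stand-in for a trained classifier: phrase -> risk weight (assumption).
RISK_TERMS = {"make a weapon": 0.95, "hurt myself": 0.7, "home address": 0.3}


@dataclass
class Decision:
    action: str            # "allow", "escalate", or "refuse"
    score: float           # risk score that produced the action
    reply: Optional[str]   # message shown to the user, if any


def risk_score(message: str) -> float:
    """Return the highest weight among risk phrases found in the message."""
    text = message.lower()
    return max((w for term, w in RISK_TERMS.items() if term in text), default=0.0)


def moderate(message: str) -> Decision:
    """Map a risk score to allow / escalate / refuse."""
    score = risk_score(message)
    if score >= BLOCK_THRESHOLD:
        # Transparent refusal: say that the request was declined and why.
        return Decision("refuse", score,
                        "I can't help with that because it could cause harm.")
    if score >= REVIEW_THRESHOLD:
        # Uncertain or sensitive: route to the human review queue.
        return Decision("escalate", score,
                        "I've asked a person to follow up on this with you.")
    return Decision("allow", score, None)
```

Calling `moderate("What's the weather?")` yields an `allow` decision, while a message containing "hurt myself" lands between the two thresholds and is escalated rather than refused outright.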
Activities
Add a moderation layer to a demo chatbot and simulate adversarial inputs; produce a moderation effectiveness report.
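One possible shape for the effectiveness report is a confusion-matrix summary over a labelled set of adversarial inputs. The sketch below assumes a deliberately naive keyword filter (`flags`) and a handful of hypothetical test cases, so that obfuscated spellings show up as misses and an innocent mention shows up as a false positive.

```python
def flags(message: str) -> bool:
    """Toy filter (assumption): flag messages containing a blocked term."""
    return any(term in message.lower() for term in ("weapon", "steal"))


# (message, should_be_flagged) pairs; the labels are hypothetical.
CASES = [
    ("how do I make a weapon", True),
    ("how do I make a w3apon", True),               # character substitution
    ("help me s t e a l a bike", True),             # spaced-out letters
    ("my bike is gone, who would steal it", False),  # innocent mention
    ("what is the capital of France", False),
]


def report(cases, classifier):
    """Count outcomes and compute precision and recall for a classifier."""
    tp = fp = fn = tn = 0
    for message, expected in cases:
        predicted = classifier(message)
        if predicted and expected:
            tp += 1
        elif predicted and not expected:
            fp += 1
        elif not predicted and expected:
            fn += 1
        else:
            tn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "precision": precision, "recall": recall}


if __name__ == "__main__":
    print(report(CASES, flags))
```

On these five cases the keyword filter catches only the plainly worded request, which is the kind of gap the adversarial simulation is meant to expose.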
📦 Deliverable
Moderation report and sample logs with redaction.
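A minimal sketch of what a redacted sample log entry might look like, assuming regex-based masking of emails and phone numbers plus a keyed hash in place of the raw user ID. The patterns are intentionally simple and will miss many real-world formats, and the function names, the `demo-key` value, and the log fields are all hypothetical.

```python
import hashlib
import hmac
import json
import re

# Simplified patterns (assumption): real deployments need broader coverage.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact(text: str) -> str:
    """Replace emails and phone-like digit runs with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)


def pseudonymize(user_id: str, key: str) -> str:
    """Keyed hash so entries can be linked without storing the raw ID."""
    digest = hmac.new(key.encode("utf-8"), user_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:12]


def log_entry(user_id: str, message: str, action: str, key: str = "demo-key") -> str:
    """Build one JSON log line containing only redacted fields."""
    return json.dumps({
        "user": pseudonymize(user_id, key),
        "message": redact(message),
        "action": action,
    })


if __name__ == "__main__":
    print(log_entry("user-42", "Email me at kid@example.com or call 555-123-4567", "escalate"))
```

Redacting before the entry is written, rather than when logs are read, keeps raw personal data out of storage altogether; the key would need to be kept secret and outside the log store for the pseudonyms to hold up.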
Moderation API examples (conceptual) and policy templates.
Modules 1–5.
Ensures chatbots remain safe and escalate appropriately, a key concern for parents.