/work
I help companies ship safer, more reliable agentic systems through targeted sprints and advisory.
Ways to work with me
LLM Safety & Reliability Sprint
2–3 weeks
We evaluate your current agentic system, create a custom evaluation harness, and define quality gates to prevent regression.
- ✓ Custom eval dataset & metric definition
- ✓ Automated regression suite
- ✓ Risk report & mitigation plan
Agentic Risk Review
1 week
A deep-dive threat modeling session to identify failure modes in your agent's architecture and reasoning loops.
- ✓ Threat model document
- ✓ Failure mode taxonomy
- ✓ Immediate fix recommendations
Build-with-you Advisory
Monthly
Ongoing architectural review and guidance on MLOps, evaluation strategies, and safety-critical implementation details.
- ✓ Weekly architecture reviews
- ✓ Code-level guidance
- ✓ Hiring support for safety roles
Typical deliverables
Engagement snapshots
FinTech — autonomous financial analysis agent
Built custom eval harness + constrained-decoding guardrails
Details on request
Healthcare — RAG-based clinical guideline agent
Safety gate design + multi-turn evaluation suite
Details on request
Developer tooling — code-generation agent pipeline
Regression testing framework + failure taxonomy
Details on request