/work

I help companies ship safer, more reliable agentic systems through targeted sprints and ongoing advisory engagements.

Ways to work with me

LLM Safety & Reliability Sprint

2–3 weeks

We evaluate your current agentic system, build a custom evaluation harness, and define quality gates to prevent regressions.

  • Custom eval dataset & metric definition
  • Automated regression suite
  • Risk report & mitigation plan

Agentic Risk Review

1 week

A deep-dive threat modeling session to identify failure modes in your agent's architecture and reasoning loops.

  • Threat model document
  • Failure mode taxonomy
  • Immediate fix recommendations

Build-with-you Advisory

Monthly

Ongoing architectural review and guidance on MLOps, evaluation strategies, and safety-critical implementation details.

  • Weekly architecture reviews
  • Code-level guidance
  • Hiring support for safety roles

Typical deliverables

  • Eval harness + regression suite
  • Risk model + failure taxonomy
  • Quality gates + guardrail recommendations

Engagement snapshots

Context

FinTech — autonomous financial analysis agent

Work

Built custom eval harness + constrained-decoding guardrails

Result

Details on request

Context

Healthcare — RAG-based clinical guideline agent

Work

Safety gate design + multi-turn evaluation suite

Result

Details on request

Context

Developer tooling — code-generation agent pipeline

Work

Regression testing framework + failure taxonomy

Result

Details on request