/work

I help companies ship safer, more reliable agentic systems through targeted sprints and advisory.

Ways to work with me

2–3 weeks

We evaluate your current agentic system, build a custom evaluation harness, and define quality gates to prevent regressions.

1 week

A deep-dive threat-modeling session to identify failure modes in your agent's architecture and reasoning loops.

Monthly

Ongoing architectural review and guidance on MLOps, evaluation strategies, and safety-critical implementation details.

Eval harness + regression suite

Risk model + failure taxonomy

Quality gates + guardrail recommendations

Context

FinTech — autonomous financial analysis agent

Work

Built custom eval harness + constrained-decoding guardrails

Result

Details on request

Context

Healthcare — RAG-based clinical guideline agent

Work

Safety gate design + multi-turn evaluation suite

Result

Details on request

Context

Developer tooling — code-generation agent pipeline

Work

Regression testing framework + failure taxonomy

Result

Details on request