/research

I build diagnostics and guardrails for reasoning in LLMs and agents.

Focus: math reasoning limits, reasoning reliability, and lightweight verification.

Research pipeline

Core themes

Reasoning Validation

Can we automatically verify the logical steps in an agent's Chain of Thought? A sketch of one narrow check follows.

Read (soon) · Artifacts (soon) · Talk
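A minimal sketch of such a check, assuming the chain of thought is serialized as plain-text steps with explicit arithmetic; the step format and regex are illustrative assumptions, not a fixed protocol:

```python
import re

# Illustrative step format: "step 1: 12 * 4 = 48"
STEP_RE = re.compile(r"(\d+)\s*([+\-*/])\s*(\d+)\s*=\s*(-?\d+)")

def verify_arithmetic_step(step: str) -> bool:
    """Re-derive the arithmetic claim made in one chain-of-thought step."""
    m = STEP_RE.search(step)
    if m is None:
        return True  # no checkable claim in this step; pass it through
    a, op, b = int(m.group(1)), m.group(2), int(m.group(3))
    claimed = int(m.group(4))
    actual = {"+": a + b, "-": a - b, "*": a * b,
              "/": a // b if b else None}[op]  # integer division, for brevity
    return actual == claimed

cot = ["step 1: 12 * 4 = 48", "step 2: 48 + 5 = 54"]  # second step is wrong
print([verify_arithmetic_step(s) for s in cot])  # [True, False]
```

Arithmetic is only the easiest case; the interesting work is extending the same pass/fail interface to entailment and tool-use steps.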

Lightweight Formal Methods

Applying lightweight formal methods, such as property-based testing, to probabilistic models to check invariants that should hold on every input; a sketch follows.

Read (soon) · Artifacts (soon) · Talk
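One lightweight route is property-based testing: state an invariant and let a fuzzer hunt for counterexamples. A minimal sketch using the `hypothesis` library, where a softmax head stands in for the probabilistic model (the invariant and input bounds are illustrative):

```python
import numpy as np
from hypothesis import given, strategies as st

def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax; stands in for any probabilistic model head."""
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

# Invariant: the output is always a valid probability distribution.
@given(st.lists(st.floats(min_value=-50, max_value=50), min_size=1, max_size=32))
def invariant_softmax_is_distribution(raw):
    p = softmax(np.array(raw))
    assert np.all(p >= 0)
    assert abs(p.sum() - 1.0) < 1e-6

invariant_softmax_is_distribution()  # hypothesis drives many random cases
```

This is not a proof, but a falsification-first habit; the same structure scales to invariants over sampler outputs or tool-call schemas.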

Token-level Interpretability

Analyzing activation patterns to predict hallucinations or reasoning failures before they surface in the output; a probe-style sketch follows.

Read (soon) · Artifacts (soon) · Talk
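A toy version of the idea: fit a linear probe on per-token hidden states to score hallucination risk. The arrays below are synthetic stand-ins for real activations and labels, and the layer choice and labeling scheme are assumptions:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
acts = rng.normal(size=(2000, 768))     # stand-in: one hidden state per token
labels = rng.integers(0, 2, size=2000)  # stand-in: 1 = token inside a span
                                        # later judged hallucinated

probe = LogisticRegression(max_iter=1000).fit(acts, labels)
risk = probe.predict_proba(acts[:5])[:, 1]  # per-token hallucination risk
print(risk.round(3))
```

On random data the probe learns nothing, which is the point of the control; the question is how far above chance a real model's activations take it.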

Collaboration

I am looking for co-authors, pilot partners, and labs interested in rigorous evaluation of agentic reasoning.

Looking for

Co-authors, research labs, and pilot partners with deployed agent stacks

I bring

Benchmarks, telemetry tooling, and validation-layer prototypes

Ideal collaboration

Run the eval suite on your agent stack; co-author a paper on the findings (example telemetry event below)
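For concreteness, one event emitted per validated step during such a run might look like this; the schema is a hypothetical sketch, not a committed format:

```python
import json, time, uuid

# Hypothetical telemetry event for one validated step in an agent run.
event = {
    "run_id": str(uuid.uuid4()),
    "ts": time.time(),
    "agent": "partner-agent-stack",  # placeholder identifier
    "step_index": 2,
    "step_text": "48 + 5 = 54",
    "check": "arithmetic",           # which validator fired
    "verdict": "fail",               # pass | fail | skip
}
print(json.dumps(event, indent=2))
```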

Publications & artifacts

Preprints / Drafts

[draft] LLMs-as-Search: token-level framing (PDF on request)
[draft] Finite-Space Constraints (FSC): diagnostics + stress tests (prototype repo coming)

Artifacts (tools / repos)

Njira: AI validation layer, agentic safety infra (early access; brief on request)
Agentic Eval Harness + telemetry logger (repo coming soon)