Agent evaluation

Analyze an agent failure without blaming the model

Breaks a failed agent run down into gaps in task design, context, tools, instructions, interface, model reliability, and evaluation coverage.

failure analysis
agent QA
model behavior

Prompt

Review this failed agent run as a systems problem, not just a model problem.

Separate the failure into these possible causes:
- User request ambiguity.
- Missing or conflicting instructions.
- Context retrieval gaps.
- Tool schema or permission issues.
- Interface feedback problems.
- Model reasoning or reliability issues.
- Missing evaluation coverage.

For each cause that applies, give evidence from the transcript, the likely impact, and a concrete fix. End with the smallest regression test that would catch this class of failure next time.
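
Usage note (not part of the prompt): to compare analyses across runs, it can help to capture each finding as a structured record and check it for completeness. The following is a minimal sketch under stated assumptions. The names `Cause`, `Finding`, `uncovered_causes`, and `incomplete_findings` are hypothetical, and the category labels simply mirror the list in the prompt.

```python
from dataclasses import dataclass
from enum import Enum


class Cause(Enum):
    """Failure categories, mirroring the list in the prompt (hypothetical names)."""
    REQUEST_AMBIGUITY = "user request ambiguity"
    INSTRUCTIONS = "missing or conflicting instructions"
    CONTEXT_RETRIEVAL = "context retrieval gaps"
    TOOLS = "tool schema or permission issues"
    INTERFACE = "interface feedback problems"
    MODEL = "model reasoning or reliability issues"
    EVAL_COVERAGE = "missing evaluation coverage"


@dataclass(frozen=True)
class Finding:
    """One row of the analysis: a cause plus the three fields the prompt asks for."""
    cause: Cause
    evidence: str  # quote or step reference taken from the transcript
    impact: str    # likely effect on the run
    fix: str       # concrete change to make


def uncovered_causes(findings):
    """Return the categories the analysis never addressed, in enum order."""
    seen = {f.cause for f in findings}
    return [c for c in Cause if c not in seen]


def incomplete_findings(findings):
    """Return findings that are missing evidence, impact, or a fix."""
    return [
        f for f in findings
        if not (f.evidence.strip() and f.impact.strip() and f.fix.strip())
    ]


if __name__ == "__main__":
    # Made-up example data, for illustration only.
    findings = [
        Finding(Cause.TOOLS, "step 4: tool call rejected", "task aborted", "grant read scope"),
        Finding(Cause.MODEL, "", "wrong file edited", "add a confirmation step"),
    ]
    print([c.name for c in uncovered_causes(findings)])
    print([f.cause.name for f in incomplete_findings(findings)])
```

A check like `incomplete_findings` flags any cause asserted without transcript evidence, which is the main way an analysis slides back into blaming the model by default.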