Agent evaluation
Analyze an agent failure without blaming the model
Breaks down a failed agent run into task design, context, tools, instructions, interface, and evaluation gaps.
- failure analysis
- agent QA
- model behavior
Prompt
Review this failed agent run as a systems problem, not just a model problem. Separate the failure into:
- User request ambiguity.
- Missing or conflicting instructions.
- Context retrieval gaps.
- Tool schema or permission issues.
- Interface feedback problems.
- Model reasoning or reliability issues.
- Missing evaluation coverage.
For each cause, give evidence from the transcript, the likely impact, and a concrete fix. End with the smallest regression test that would catch this class of failure next time.
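A review produced by this prompt is easier to check if each cause is recorded as a structured finding. The following is a minimal sketch of one way to do that, not part of the prompt itself. The names `Cause`, `Finding`, `uncovered_causes`, and `incomplete_findings` are hypothetical and chosen only for illustration; the seven enum values mirror the categories listed in the prompt.

```python
from dataclasses import dataclass
from enum import Enum


class Cause(Enum):
    # Hypothetical labels mirroring the seven categories in the prompt.
    REQUEST_AMBIGUITY = "user request ambiguity"
    INSTRUCTIONS = "missing or conflicting instructions"
    CONTEXT_RETRIEVAL = "context retrieval gaps"
    TOOLS = "tool schema or permission issues"
    INTERFACE = "interface feedback problems"
    MODEL = "model reasoning or reliability issues"
    EVAL_COVERAGE = "missing evaluation coverage"


@dataclass
class Finding:
    cause: Cause
    evidence: str  # quote or pointer into the transcript
    impact: str    # likely effect on the run
    fix: str       # concrete change to make


def uncovered_causes(findings):
    """Return the causes the review never addressed, in declaration order."""
    seen = {f.cause for f in findings}
    return [c for c in Cause if c not in seen]


def incomplete_findings(findings):
    """Return findings that lack evidence, impact, or a fix."""
    return [
        f for f in findings
        if not (f.evidence.strip() and f.impact.strip() and f.fix.strip())
    ]
```

Calling `uncovered_causes` on a review shows which categories were skipped, and `incomplete_findings` flags entries that name a cause without evidence, impact, or a fix. Both checks are assumptions about what a useful review looks like, not a required format.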