Skill · Engineering · Diagnostic
Eval Audit
Audit an inherited or unfamiliar LLM eval pipeline and produce a prioritized list of problems with concrete fixes.
Profile: Dev
Depth: High
Language: en-US
Objective
In one sentence.
Use this skill when you inherit an LLM eval system, when you are unsure whether existing evals are trustworthy, or as a starting point when no eval infrastructure exists. The audit walks through six diagnostic areas (error analysis, evaluator design, judge validation, human review process, labeled data, and pipeline hygiene) and produces a findings report ordered by impact.
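One way to picture the findings report is as a list of records, each tagged with one of the six diagnostic areas and sorted by impact. The sketch below is only an illustration: the `Finding` fields, the 1 to 5 impact scale, the helper names, and the sample findings are all assumptions made for the example, not part of the skill's actual output format.

```python
from dataclasses import dataclass
from enum import Enum


class Area(Enum):
    """The six diagnostic areas the audit walks through."""
    ERROR_ANALYSIS = "error analysis"
    EVALUATOR_DESIGN = "evaluator design"
    JUDGE_VALIDATION = "judge validation"
    HUMAN_REVIEW = "human review process"
    LABELED_DATA = "labeled data"
    PIPELINE_HYGIENE = "pipeline hygiene"


@dataclass(frozen=True)
class Finding:
    area: Area
    problem: str
    fix: str
    impact: int  # hypothetical scale: 1 (low) to 5 (high)


def build_report(findings):
    """Return findings ordered by impact, highest first (ties keep input order)."""
    return sorted(findings, key=lambda f: f.impact, reverse=True)


def uncovered_areas(findings):
    """List the diagnostic areas that have no finding recorded yet."""
    seen = {f.area for f in findings}
    return [a for a in Area if a not in seen]


# Made-up sample findings, for illustration only.
example = [
    Finding(Area.PIPELINE_HYGIENE, "eval set overlaps with few-shot prompts",
            "deduplicate and re-split the data", impact=3),
    Finding(Area.JUDGE_VALIDATION, "LLM judge never compared against human labels",
            "measure judge agreement on a labeled sample", impact=5),
]

report = build_report(example)
for f in report:
    print(f"[{f.impact}] {f.area.value}: {f.problem} -> {f.fix}")
```

Tracking the areas without findings (`uncovered_areas`) is a design choice for the sketch: it makes it visible when an audit has skipped an area rather than found nothing wrong in it.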
Constellation
Where it lives.
Workflows that use it
Bundles that include it