Skill · Engineering · Diagnostic

Eval Audit

Audit an inherited or unfamiliar LLM eval pipeline and produce a prioritized list of problems with concrete fixes.

Profile: Dev
Depth: High
Language: en-US
Objective

In one sentence.

Use this skill when you inherit an LLM eval system, when you are unsure whether existing evals are trustworthy, or as a starting point when no eval infrastructure exists. The audit walks six diagnostic areas (error analysis, evaluator design, judge validation, human review process, labeled data, pipeline hygiene) and produces a findings report ordered by impact.
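One way to picture the output of the audit is as a list of findings, each tagged with one of the six areas and sorted by impact. The sketch below is only an illustration: the `Finding` class, the 1-to-5 `impact` scale, and the `prioritize` and `render` helpers are hypothetical names introduced here, not part of the skill itself.

```python
from dataclasses import dataclass

# The six diagnostic areas named in the skill description.
AREAS = (
    "error analysis",
    "evaluator design",
    "judge validation",
    "human review process",
    "labeled data",
    "pipeline hygiene",
)


@dataclass(frozen=True)
class Finding:
    """One audit finding (hypothetical shape, assumed for illustration)."""
    area: str
    problem: str
    fix: str
    impact: int  # assumed scale: 1 (low) to 5 (high)

    def __post_init__(self):
        # Reject findings that do not belong to one of the six areas.
        if self.area not in AREAS:
            raise ValueError(f"unknown area: {self.area!r}")


def prioritize(findings):
    """Return findings ordered by impact, highest first (ties keep input order)."""
    return sorted(findings, key=lambda f: f.impact, reverse=True)


def render(findings):
    """Format the prioritized findings as a numbered plain-text report."""
    lines = []
    for rank, f in enumerate(prioritize(findings), start=1):
        lines.append(f"{rank}. [{f.area}] {f.problem} -> fix: {f.fix}")
    return "\n".join(lines)
```

With made-up findings such as `Finding("judge validation", "judge never compared to human labels", "measure agreement on a labeled sample", 5)`, `render` would list the highest-impact item first. A numeric impact score is just one possible ordering key; a real audit might rank by severity and effort together.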

Constellation

Where it lives.