Eval Matrix Agent - LLM Evaluation Platform

Overview

Total Runs

${ runs.length }

Test Sets

${ testSets.length }

Avg Score (All Time)

${ calculateGlobalAvg() }

${ set.description }

ID: ${ set.id } | Created: ${ formatDate(set.created_at) }

ID	Test Set	Model	Status	Score	Date	Action
#${ run.id }	${ run.test_set_name }	${ run.model_name }	${ run.status }	${ run.avg_score != null ? run.avg_score.toFixed(1) : '-' }/10	${ formatDate(run.created_at) }

Model ${ activeRun.run.model_name }

Status ${ activeRun.run.status }

Avg Score ${ activeRun.run.avg_score != null ? activeRun.run.avg_score.toFixed(1) : '-' }

Score: ${ res.judge_score }/10 | Latency: ${ res.latency_ms } ms

Prompt

${ res.prompt }

Expected Criteria

${ res.criteria }

Model Output

${ res.model_output }

Judge Reasoning

${ res.judge_reasoning }

P: ${ c.prompt }

E: ${ c.expected_output }

C: ${ c.criteria }