Overview
Total Runs
${ runs.length }
Test Sets
${ testSets.length }
Avg Score (All Time)
${ calculateGlobalAvg() }
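The template calls `calculateGlobalAvg()` without showing its definition. One plausible shape, offered only as a sketch: it assumes each run carries a numeric `avg_score` (or `null` while unscored), and it takes `runs` as a parameter for testability, whereas the real function takes no arguments and presumably reads component state.

```javascript
// Sketch only: the real calculateGlobalAvg() is not shown in the source.
// Assumption: unfinished or failed runs have a null/undefined avg_score.
function calculateGlobalAvg(runs) {
  // Keep only runs that actually have a numeric score.
  const scores = runs
    .map((run) => run.avg_score)
    .filter((score) => typeof score === "number" && !Number.isNaN(score));

  // With nothing to average, show a placeholder instead of "NaN".
  if (scores.length === 0) return "-";

  const total = scores.reduce((sum, score) => sum + score, 0);
  return (total / scores.length).toFixed(1);
}
```

Skipping unscored runs keeps a pending run from dragging the all-time average toward zero.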
Recent Evaluation Trends
Test Set Management
${ set.name }
${ set.description }
ID: ${ set.id } | Created: ${ formatDate(set.created_at) }
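`formatDate` is used for every timestamp here but is not defined in this section. A minimal sketch, assuming `created_at` is an ISO 8601 string (or anything else `new Date()` can parse) and that a fixed UTC `YYYY-MM-DD HH:mm` format is acceptable; the actual helper may well format in local time instead.

```javascript
// Hypothetical helper: the source only shows calls to formatDate(), not its body.
function formatDate(value) {
  const date = new Date(value);

  // Invalid input produces an "Invalid Date" object whose time is NaN.
  if (Number.isNaN(date.getTime())) return "-";

  // toISOString() yields "YYYY-MM-DDTHH:mm:ss.sssZ" in UTC;
  // keep the date and the hour:minute part, separated by a space.
  return date.toISOString().slice(0, 16).replace("T", " ");
}
```

Formatting in UTC keeps the output identical for every viewer; switching to `toLocaleString()` would be the usual choice if local time is preferred.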
Evaluation History
| ID | Test Set | Model | Status | Score | Date | Action |
|---|---|---|---|---|---|---|
| #${ run.id } | ${ run.test_set_name } | ${ run.model_name } | ${ run.status } | ${ run.avg_score != null ? run.avg_score.toFixed(1) : '-' }/10 | ${ formatDate(run.created_at) } | |
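The Score column has to cope with runs that have no score yet, and a plain truthiness check on `avg_score` would also hide a legitimate score of 0. A small helper can centralise that logic. This is a sketch: `formatScore` is a hypothetical name not present in the source, and unlike the inline expression it drops the `/10` suffix when there is no score.

```javascript
// Hypothetical helper for rendering a 0-10 score cell.
function formatScore(score) {
  // Treat null, undefined and NaN as "no score yet".
  if (typeof score !== "number" || Number.isNaN(score)) return "-";

  // A score of 0 is valid and must still be displayed.
  return `${score.toFixed(1)}/10`;
}
```

In the template this would be used as `${ formatScore(run.avg_score) }`, and the same call would work for the per-run summary.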
Evaluation Details #${ activeRun.run.id }
Summary
Model
${ activeRun.run.model_name }
Status
${ activeRun.run.status }
Avg Score
${ activeRun.run.avg_score != null ? activeRun.run.avg_score.toFixed(1) : '-' }
Case Details
Score: ${ res.judge_score }/10
Latency: ${ res.latency_ms }ms
Prompt
${ res.prompt }
Expected Criteria
${ res.criteria }
Model Output
${ res.model_output }
Judge Reasoning
${ res.judge_reasoning }
Create Test Set
Manage Cases: ${ activeSet.name }
P: ${ c.prompt }
E: ${ c.expected_output }
C: ${ c.criteria }