Research Platform
GridArena
Evaluate, compare, and audit LLM agents on power-system tasks. Run experiments with full provenance tracking, structured action parsing, and automated evaluation.
Experiment Runs
Launch and monitor agent evaluation runs
Presets
Reusable experiment configurations
Batches
Group runs for comparative analysis
Compare
Compare runs side by side