Research Platform

GridArena

Evaluate, compare, and audit LLM agents on power-system tasks. Run experiments with full provenance tracking, structured action parsing, and automated evaluation.

Experiment Runs

Launch and monitor agent evaluation runs

Presets

Reusable experiment configurations

Batches

Group runs for comparative analysis

Compare

Side-by-side run comparison