GridArena Documentation
GridArena is a research platform for evaluating and auditing LLM agents on power-system operational tasks. It combines deterministic simulation, structured evaluation, and full provenance logging so experiments are reproducible end-to-end.
What you can do
- Run a single LLM agent on a benchmark case (case5, case14, or case30) and inspect every step.
- Compose presets and execute large batch experiments with background queueing.
- Evaluate agent recommendations against a deterministic DC power-flow solver (or an external PyPSA service).
- Compare runs side-by-side, export results to CSV / LaTeX / SVG, and validate the system itself.
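This page does not describe the solver's internals. For orientation only, the sketch below shows the textbook DC power-flow approximation (lossless lines, 1 p.u. voltage magnitudes, small angle differences). It builds the nodal susceptance matrix from line reactances, fixes the slack-bus angle at zero, solves B'θ = P for the remaining bus angles, and computes each line flow as (θ_i - θ_j) / x_ij. The function names `dc_power_flow` and `solve_linear`, the 3-bus network, and its per-unit values are hypothetical and are not GridArena's API.

```python
from typing import Dict, List, Tuple

Line = Tuple[int, int, float]  # hypothetical: (from_bus, to_bus, reactance in p.u.)


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented copy so the caller's matrix is not mutated.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular susceptance matrix (islanded network?)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def dc_power_flow(
    n_bus: int, lines: List[Line], injections: List[float], slack: int = 0
) -> Tuple[List[float], Dict[Tuple[int, int], float]]:
    """Return bus angles (rad) and line flows (p.u.) under the DC approximation."""
    # Assemble the nodal susceptance matrix B from 1/x of each line.
    b = [[0.0] * n_bus for _ in range(n_bus)]
    for i, j, x in lines:
        y = 1.0 / x
        b[i][i] += y
        b[j][j] += y
        b[i][j] -= y
        b[j][i] -= y
    # Drop the slack row/column: its angle is fixed at 0 and it absorbs the mismatch.
    keep = [k for k in range(n_bus) if k != slack]
    b_red = [[b[r][c] for c in keep] for r in keep]
    p_red = [injections[k] for k in keep]
    theta = [0.0] * n_bus
    for k, t in zip(keep, solve_linear(b_red, p_red)):
        theta[k] = t
    # Flow on each line is the angle difference divided by its reactance.
    flows = {(i, j): (theta[i] - theta[j]) / x for i, j, x in lines}
    return theta, flows


if __name__ == "__main__":
    # Hypothetical triangle network; buses 1 and 2 carry 1.0 and 0.5 p.u. of load.
    lines = [(0, 1, 0.1), (0, 2, 0.1), (1, 2, 0.1)]
    theta, flows = dc_power_flow(3, lines, injections=[0.0, -1.0, -0.5])
    print({k: round(v, 3) for k, v in flows.items()})
    # {(0, 1): 0.833, (0, 2): 0.667, (1, 2): -0.167}
```

In this example the slack bus supplies the full 1.5 p.u. of load. A production solver would also need to handle multiple islands, transformer taps, and phase shifters, which this sketch ignores.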
Where to start
- LLM Tools Landscape (2025–2026): a survey of 24 LLM tools for power-system operations, a comparison-at-a-glance table, and how GridArena fits in.
- Installation: how the platform is wired together and what to configure.
- Usage: sign in, create runs, and manage presets and batches.
- Experiment Workflow: the full lifecycle from prompt to evaluation.
- Architecture: system diagram and component responsibilities.
- Reproducibility: three reproducible experiments with expected metrics.
- Troubleshooting: common errors and their fixes.
- (Admin) Simulation Health: run simulation-engine self-tests on case5, case14, and case30 and review the failure history.
New: Counterfactual Analysis (Layer E)
GridArena now replays each agent decision against alternative actions to compute that decision's optimality gap and regret.
- Workflow → Counterfactual analysis: formulas and methodology.
- Architecture → Counterfactual Engine: where it sits in the pipeline.
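The exact formulas GridArena uses are documented under Workflow → Counterfactual analysis and are not reproduced here. The sketch below assumes one common set of definitions for illustration. Per-decision regret is the chosen action's cost minus the lowest cost among the replayed alternatives; the optimality gap is that difference relative to the best cost; cumulative regret is the sum over an episode. `DecisionReplay`, the action names, and the cost values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DecisionReplay:
    """Hypothetical record of one decision and its counterfactual replays."""
    chosen: str  # action the agent actually took
    costs: Dict[str, float]  # action -> simulated cost (lower is better), incl. chosen


def step_regret(d: DecisionReplay) -> float:
    """Absolute regret: chosen cost minus the best cost among evaluated actions."""
    best = min(d.costs.values())
    return d.costs[d.chosen] - best


def optimality_gap(d: DecisionReplay) -> float:
    """Relative gap to the best evaluated action, as a fraction of |best|."""
    best = min(d.costs.values())
    if best == 0:
        raise ValueError("relative gap is undefined when the best cost is 0")
    return (d.costs[d.chosen] - best) / abs(best)


def cumulative_regret(episode: List[DecisionReplay]) -> float:
    """Sum of per-decision regret over an episode."""
    return sum(step_regret(d) for d in episode)


if __name__ == "__main__":
    episode = [
        DecisionReplay("shed_load_B", {"shed_load_B": 120.0, "redispatch_G2": 100.0, "no_action": 150.0}),
        DecisionReplay("redispatch_G2", {"redispatch_G2": 80.0, "no_action": 95.0}),
    ]
    for d in episode:
        print(d.chosen, step_regret(d), optimality_gap(d))
    print("cumulative regret:", cumulative_regret(episode))
```

Because the best action is taken only over the alternatives that were actually replayed, this regret is a lower bound on regret against the true optimum. Adding more counterfactual actions can keep it the same or increase it, never decrease it.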
Citation
If you use GridArena in your research, please cite it. See the About page for APA and BibTeX entries.