GridArena Documentation

GridArena is a research platform for evaluating and auditing LLM agents on power-system operational tasks. It combines deterministic simulation, structured evaluation, and full provenance logging so experiments are reproducible end-to-end.

What you can do

  • Run a single LLM agent on a benchmark case (case5 / case14 / case30) and inspect every step.
  • Compose presets and execute large batch experiments with background queueing.
  • Evaluate recommendations against a deterministic DC power flow solver (or an external PyPSA service).
  • Compare runs side-by-side, export results to CSV / LaTeX / SVG, and validate the system itself.
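
For readers new to the DC power flow approximation mentioned in the list above, the following is a minimal, illustrative sketch of what such a solver computes. It is not GridArena's solver or its API: the function names (`solve_linear`, `dc_power_flow`), the 3-bus network, and all numbers are assumptions made up for this example. The sketch builds the susceptance matrix from line reactances, removes the slack bus, solves for the bus angles, and derives line flows from the angle differences.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: the network may be disconnected")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def dc_power_flow(n_bus, lines, injections, slack=0):
    """Hypothetical helper: DC power flow on a small network.

    lines:      list of (from_bus, to_bus, reactance_pu)
    injections: net active power injected at each bus, in p.u.
                (generation minus load; the slack entry is ignored)
    Returns (bus angles in radians, line flows in p.u. keyed by (from, to)).
    """
    # Susceptance matrix B: each line adds b = 1/x to the diagonal and -b off-diagonal.
    B = [[0.0] * n_bus for _ in range(n_bus)]
    for i, j, x in lines:
        b = 1.0 / x
        B[i][i] += b
        B[j][j] += b
        B[i][j] -= b
        B[j][i] -= b

    # Drop the slack bus row and column; its angle is fixed at zero.
    keep = [k for k in range(n_bus) if k != slack]
    B_red = [[B[i][j] for j in keep] for i in keep]
    p_red = [injections[i] for i in keep]

    theta = [0.0] * n_bus
    for k, angle in zip(keep, solve_linear(B_red, p_red)):
        theta[k] = angle

    # Flow on each line is the angle difference divided by the reactance.
    flows = {(i, j): (theta[i] - theta[j]) / x for i, j, x in lines}
    return theta, flows


# Made-up 3-bus example: bus 0 is the slack, bus 1 generates 0.5 p.u.,
# bus 2 consumes 1.0 p.u.; all three lines have reactance 0.1 p.u.
example_lines = [(0, 1, 0.1), (0, 2, 0.1), (1, 2, 0.1)]
theta, flows = dc_power_flow(3, example_lines, [0.0, 0.5, -1.0])
for line, flow in flows.items():
    print(line, round(flow, 6))
```

In this made-up network the load at bus 2 is served by roughly 0.5 p.u. on each of the two lines that reach it, and the line between buses 0 and 1 carries essentially no flow. Because the calculation is a single linear solve with no randomness, repeating it on the same inputs gives the same result, which is the property a deterministic evaluation relies on.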

Where to start

New: Counterfactual Analysis (Layer E)

GridArena now replays each agent decision against alternative actions to compute the optimality gap and decision regret.
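
As a rough illustration of the idea, the sketch below scores one decision against a set of replayed alternatives. It does not reflect GridArena's actual data model or metric definitions: the `Outcome` class, the `counterfactual_metrics` function, the action labels, and the cost values are all hypothetical, and it assumes each action is scored by a single cost where lower is better. Under that assumption, regret is taken as the chosen action's cost minus the best available cost, and the optimality gap as that regret relative to the best cost.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    """Hypothetical record of one replayed action and its simulated cost."""
    action: str
    cost: float  # assumed metric: lower is better


def counterfactual_metrics(chosen, alternatives):
    """Compare the agent's chosen action with replayed alternatives.

    Assumed definitions (not taken from GridArena):
      regret         = cost(chosen) - cost(best candidate)
      optimality_gap = regret / |cost(best candidate)|
    """
    candidates = [chosen, *alternatives]
    best = min(candidates, key=lambda o: o.cost)
    regret = chosen.cost - best.cost
    if best.cost != 0:
        gap = regret / abs(best.cost)
    else:
        # Relative gap is undefined when the best cost is zero.
        gap = 0.0 if regret == 0 else float("inf")
    return {"best_action": best.action, "regret": regret, "optimality_gap": gap}


# Made-up example: the agent's action costs 110, one alternative costs 100.
chosen = Outcome("redispatch_gen_2", 110.0)
alternatives = [Outcome("shed_load_bus_4", 130.0), Outcome("switch_line_3", 100.0)]
metrics = counterfactual_metrics(chosen, alternatives)
print(metrics)
```

With these made-up numbers the regret is 10 cost units and the gap is about 10 percent. Including the chosen action among the candidates keeps regret non-negative, so a regret of zero means no replayed alternative did better.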

Citation

If you use GridArena in your research, please cite it. See the About page for APA and BibTeX entries.