Usage

1. Sign in

GridArena uses email + password authentication. Visit /login, create an account, and you'll land on the runs index. All data is scoped by row-level security (RLS), so each researcher sees only their own runs, batches, and presets.
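
RLS is enforced inside the database, not in client code, so the sketch below only models its effect in TypeScript. It assumes a hypothetical `ownerId` field on every row; GridArena's actual schema and policy definitions are not described in this guide.

```typescript
// Hypothetical row shape: only the fields needed to illustrate ownership.
interface OwnedRow {
  id: string;
  ownerId: string; // assumed column holding the owning researcher's user id
}

// Models the effect of an RLS policy like "ownerId must equal the current
// user": queries only ever return the caller's own rows.
function visibleTo<T extends OwnedRow>(rows: readonly T[], userId: string): T[] {
  return rows.filter((row) => row.ownerId === userId);
}

const allRuns: OwnedRow[] = [
  { id: "run-1", ownerId: "alice" },
  { id: "run-2", ownerId: "bob" },
  { id: "run-3", ownerId: "alice" },
];

console.log(visibleTo(allRuns, "alice").map((r) => r.id)); // [ 'run-1', 'run-3' ]
```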

2. Try the demo dataset

The fastest way to explore GridArena is to load the bundled demo dataset. It seeds three presets (case5 / case14 / case30) and one completed example run per preset, all prefixed with [Demo] so you can identify and delete them later.

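
Because every demo item carries the same name prefix, finding them later is a simple prefix match. A minimal sketch, assuming items expose hypothetical `id` and `name` fields and that the prefix is exactly `[Demo]` (the item names below are made up):

```typescript
// Assumption: demo items are recognizable purely by this literal name prefix.
const DEMO_PREFIX = "[Demo]";

interface NamedItem {
  id: string;
  name: string;
}

// Returns the ids of items that came from the demo dataset.
function demoItemIds(items: readonly NamedItem[]): string[] {
  return items
    .filter((item) => item.name.startsWith(DEMO_PREFIX))
    .map((item) => item.id);
}

// Hypothetical presets: two from the demo dataset, one created by the user.
const presets: NamedItem[] = [
  { id: "p1", name: "[Demo] case5" },
  { id: "p2", name: "My case14 sweep" },
  { id: "p3", name: "[Demo] case30" },
];

console.log(demoItemIds(presets)); // [ 'p1', 'p3' ]
```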

3. Create a single run

  1. Open /new-run.
  2. Pick a benchmark case (case5 / case14 / case30) and a model.
  3. Write a task prompt or pick one from your presets.
  4. Submit — the run appears in /runs and updates as it progresses.

4. Manage presets

Presets capture a complete experiment configuration — prompt, model, evaluation mode, sampling parameters, seed. Visit /presets to create, edit, and reuse them across runs and batches.
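
One way to picture a preset is as a single typed record. The interface below is a hypothetical sketch of the fields listed above (the field names, the particular sampling parameters, and their types are assumptions), plus a helper showing how one preset might be reused to derive variants, such as a seed sweep, without editing the original.

```typescript
// Hypothetical preset shape; field names and types are illustrative only.
interface SamplingParams {
  temperature: number;
  topP: number;
  maxTokens: number;
}

interface Preset {
  name: string;
  prompt: string;
  model: string;
  evaluationMode: string;
  sampling: SamplingParams;
  seed: number;
}

// Derive a variant without mutating the base preset. The spread copies the
// top-level fields; `sampling` is shared unless an override replaces it.
function withOverrides(base: Preset, overrides: Partial<Preset>): Preset {
  return { ...base, ...overrides };
}

// Placeholder values, not real GridArena models or modes.
const base: Preset = {
  name: "baseline",
  prompt: "Example task prompt",
  model: "example-model",
  evaluationMode: "example-mode",
  sampling: { temperature: 0.2, topP: 0.95, maxTokens: 1024 },
  seed: 1,
};

// Reuse one configuration across several seeds, e.g. as inputs to a batch.
const seedSweep = [1, 2, 3].map((seed) =>
  withOverrides(base, { name: `${base.name} (seed ${seed})`, seed }),
);

console.log(seedSweep.map((p) => p.name));
```

Keeping the seed inside the preset is what makes a run reproducible from its configuration alone.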

5. Run batches

  1. Open /batches/new and pick the presets to include.
  2. Submit. The batch's runs are added to the background job queue, whose state you can check on /system-status.
  3. Watch the batch detail page; rows transition queued → running → completed.
  4. Once the batch is complete, generate a batch report and export it to CSV or LaTeX.
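
The row lifecycle described above is a linear state machine. This sketch models only the three states the guide names (`queued`, `running`, `completed`); any other states the real queue may use, for example on failure, are not covered. The `advance` and `batchComplete` helpers are hypothetical.

```typescript
// Only the three states named in the guide; other states are not modelled.
type RunStatus = "queued" | "running" | "completed";

// Each state has at most one successor, so the lifecycle is strictly linear.
const NEXT: Record<RunStatus, RunStatus | null> = {
  queued: "running",
  running: "completed",
  completed: null,
};

function advance(status: RunStatus): RunStatus {
  const next = NEXT[status];
  if (next === null) {
    throw new Error(`cannot advance a run that is already ${status}`);
  }
  return next;
}

// A batch is ready for its report once every row has reached "completed".
function batchComplete(statuses: readonly RunStatus[]): boolean {
  return statuses.every((s) => s === "completed");
}

let row: RunStatus = "queued";
row = advance(row); // "running"
row = advance(row); // "completed"
console.log(row, batchComplete([row, "running"])); // completed false
```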

6. Compare and export

Use /compare to view multiple runs side by side. From any run or batch detail page, you can export results, prompts, and provenance as CSV, LaTeX, or SVG.
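
GridArena's exact export columns are not documented here, so the sketch below is a generic CSV serializer rather than the app's own exporter. It follows RFC 4180 quoting (fields containing commas, double quotes, or line breaks are wrapped in double quotes, with embedded quotes doubled), which matters when exporting free-text prompts. The column names are hypothetical.

```typescript
type Cell = string | number;

// Quote a field per RFC 4180: wrap it in double quotes if it contains a
// comma, double quote, CR, or LF, and double any embedded double quotes.
function csvField(value: Cell): string {
  const s = String(value);
  return /[",\r\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
}

// Serialize a header plus data rows, using CRLF line breaks as RFC 4180 does.
function toCsv(header: readonly string[], rows: readonly (readonly Cell[])[]): string {
  const all: ReadonlyArray<ReadonlyArray<Cell>> = [header, ...rows];
  return all.map((row) => row.map(csvField).join(",")).join("\r\n");
}

// Hypothetical columns; a real export would include whatever fields the app tracks.
const csv = toCsv(
  ["run_id", "case", "prompt"],
  [["run-1", "case14", 'Say "hi", then stop']],
);
console.log(csv);
// run_id,case,prompt
// run-1,case14,"Say ""hi"", then stop"
```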

7. Manage and clean up

Every dashboard row (Runs, Presets, Batches, Ground Truth, Validation) has inline Edit and Delete actions so you can keep your workspace tidy without leaving the list view.

  • Soft-delete with undo: deleting a row shows a 5-second toast with an Undo button. The row is hidden immediately and permanently removed only once the toast expires, so an accidental delete can be undone at any point within those 5 seconds.
  • Bulk multi-select: on /runs, /presets, and /batches, tick the row checkboxes (or the header checkbox to select all) and a sticky action bar appears at the bottom for one-click bulk delete.
  • Editing: use Edit to rename, retitle, or adjust notes in place — useful when reorganizing experiments before generating a report.
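
The soft-delete behaviour can be sketched as a store that keeps a purge deadline for each hidden row. This is a minimal model, not GridArena's implementation: the `SoftDeleteStore` class and its methods are hypothetical, and the clock is passed in as a millisecond timestamp so the logic can be tested without real timers (a UI would call `purgeExpired` from a timer).

```typescript
const UNDO_WINDOW_MS = 5000; // matches the 5-second toast described above

class SoftDeleteStore<T extends { id: string }> {
  private readonly rows = new Map<string, T>();
  private readonly pending = new Map<string, number>(); // id -> purge deadline (ms)

  constructor(initial: readonly T[]) {
    initial.forEach((row) => this.rows.set(row.id, row));
  }

  // Hide the row immediately and schedule its permanent removal.
  softDelete(id: string, now: number): void {
    if (this.rows.has(id)) this.pending.set(id, now + UNDO_WINDOW_MS);
  }

  // Undo succeeds only while the undo window is still open.
  undo(id: string, now: number): boolean {
    const deadline = this.pending.get(id);
    if (deadline === undefined || now >= deadline) return false;
    this.pending.delete(id);
    return true;
  }

  // Permanently remove every row whose undo window has elapsed.
  purgeExpired(now: number): void {
    this.pending.forEach((deadline, id) => {
      if (now >= deadline) {
        this.pending.delete(id);
        this.rows.delete(id);
      }
    });
  }

  // What the list view renders: stored rows minus those awaiting purge.
  visible(): T[] {
    return Array.from(this.rows.values()).filter((row) => !this.pending.has(row.id));
  }
}

const store = new SoftDeleteStore([{ id: "run-1" }, { id: "run-2" }]);
store.softDelete("run-1", 0);      // hidden at t = 0 s
console.log(store.visible().map((r) => r.id)); // [ 'run-2' ]
store.undo("run-1", 3000);         // Undo clicked at t = 3 s: restored
store.softDelete("run-2", 10000);  // deleted at t = 10 s, never undone
store.purgeExpired(15000);         // toast expired at t = 15 s
console.log(store.visible().map((r) => r.id)); // [ 'run-1' ]
```

Separating "hidden" from "removed" is what makes undo cheap: nothing is destroyed until the deadline passes.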