A superhuman generals.io bot, trained from scratch with self-play reinforcement learning.
“Its ability to flow army in complex situations is phenomenal.”
Average Joe is a bot for generals.io — a real-time, fog-of-war strategy game — that taught itself to play at a superhuman level, from zero, through millions of games against itself.
- 🏆 Superhuman, from scratch — trained purely by self-play; it never sees a human game.
- 🔥 Blazing-fast simulator — runs on generals-bots, a fully-vectorized JAX environment.
- 🔁 Fully reproducible — one config and one command reproduce the released agent end to end.
- 🛠️ Powered by JAX + Equinox — a small, pure-functional, JIT-compiled training loop.
In its first 1,000 ranked games on the generals.io 1v1 ladder, Average Joe won 81.5% and finished as the #1-rated player — ahead of the strongest human and well clear of the prior AI state of the art.
Average Joe competes on the generals.io 1v1 ladder — watch its live games and replays:
The board — plus a short history of each player's army and land — is encoded as tokens and run through a small transformer with two heads: one picks the move, the other estimates who is winning.
- Policy–value transformer: pre-norm self-attention over board + temporal tokens; emits per-cell move logits and a distributional (HL-Gauss) value. · networks/transformer.py
- Self-play PPO: one network plays both sides; GAE, top-k advantage filtering, EMA weights for evaluation. · train/ppo.py
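
Two of the techniques named in the bullets above, the HL-Gauss value target and GAE, can be sketched in a few lines. This is a minimal illustration in plain Python, not the code in networks/transformer.py or train/ppo.py: the function names, the value range, the bin count, sigma, gamma, and lam are all assumptions chosen for the example, and the real training loop presumably works on batched JAX arrays.

```python
import math


def hl_gauss_target(value, v_min, v_max, num_bins, sigma):
    """Soft categorical target: Gaussian mass around `value` falling in each bin."""
    value = min(max(value, v_min), v_max)  # keep the target inside the support
    width = (v_max - v_min) / num_bins
    edges = [v_min + i * width for i in range(num_bins + 1)]
    # Gaussian CDF (mean=value, std=sigma) evaluated at every bin edge.
    cdf = [0.5 * (1.0 + math.erf((e - value) / (sigma * math.sqrt(2.0)))) for e in edges]
    mass = [cdf[i + 1] - cdf[i] for i in range(num_bins)]
    total = cdf[-1] - cdf[0]  # renormalize the mass cut off at the support ends
    return [m / total for m in mass]


def expected_value(probs, v_min, v_max):
    """Scalar value estimate: probability-weighted mean of the bin centers."""
    width = (v_max - v_min) / len(probs)
    centers = [v_min + (i + 0.5) * width for i in range(len(probs))]
    return sum(p * c for p, c in zip(probs, centers))


def gae(rewards, values, dones, last_value, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one trajectory (backward recursion)."""
    advantages = [0.0] * len(rewards)
    next_value, running = last_value, 0.0
    for t in reversed(range(len(rewards))):
        not_done = 0.0 if dones[t] else 1.0  # stop bootstrapping at episode ends
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        running = delta + gamma * lam * not_done * running
        advantages[t] = running
        next_value = values[t]
    return advantages
```

In a setup like this, the value head would be trained with cross-entropy against `hl_gauss_target(...)`, and `expected_value(...)` would supply the scalar value estimates that `gae(...)` consumes.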
Requires Python ≥ 3.11 and a JAX build for your accelerator (CPU/GPU/TPU).
    pip install -e .

Average Joe runs on the generals-bots environment (the generals.core.* package: the vectorized
game, observations, and reward functions), a separate package that is not published on PyPI.
Install it from source and make it importable before running.
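
A quick way to confirm the environment is importable before launching a run is to ask Python whether it can locate the package. The sketch below assumes the top-level package is named `generals` (inferred from the generals.core.* path mentioned above); the helper name `is_importable` is made up for this example.

```python
import importlib.util


def is_importable(package: str) -> bool:
    """Return True if a top-level package can be located on the import path."""
    return importlib.util.find_spec(package) is not None


if __name__ == "__main__":
    # "generals" is an assumed top-level name, inferred from generals.core.*
    if is_importable("generals"):
        print("generals-bots environment found")
    else:
        print("generals-bots is not importable; install it from source first")
```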
    python main.py --config configs/custom/L_7d_gae90.yaml

L_7d_gae90 is the config behind the released agent. Checkpoints (a regular and an EMA copy)
are written to checkpoints/<run_name>/, alongside the exact config that produced them. Any
Config field can be overridden on the CLI, e.g. --num_envs 256. configs/ also holds
map-size presets (S / M / L / default).
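
To illustrate how per-field CLI overrides of this kind can work, here is a small sketch that generates one flag per dataclass field with argparse. It is not the project's actual Config or parser: apart from `num_envs`, every field name and default below is hypothetical, and loading the YAML file is left out to keep the example standard-library only.

```python
import argparse
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class Config:
    # Hypothetical fields; only `num_envs` is named in this README.
    num_envs: int = 128
    learning_rate: float = 3e-4
    run_name: str = "demo"


def apply_overrides(base: Config, argv: list[str]) -> Config:
    """Return a copy of `base` with any --field value flags from argv applied."""
    parser = argparse.ArgumentParser()
    for f in fields(Config):
        # One optional flag per field, parsed with the type of its current value.
        parser.add_argument(f"--{f.name}", type=type(getattr(base, f.name)), default=None)
    args = parser.parse_args(argv)
    overrides = {k: v for k, v in vars(args).items() if v is not None}
    return replace(base, **overrides)


if __name__ == "__main__":
    cfg = apply_overrides(Config(), ["--num_envs", "256"])
    print(cfg)
```

Deriving the flags from the dataclass keeps the CLI and the config in sync: adding a field automatically adds its override flag.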
    python evals/eval.py            # vs a random opponent (pygame)
    python evals/eval_selfplay.py   # the agent vs itself

Training logs to Weights & Biases when a token is present at
.secrets/wandb_token.txt; otherwise it runs console-only.
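
The token check described above can be pictured roughly as follows. This is a sketch of the behavior, not the project's code: the helper name `read_wandb_token` is invented here, and treating an empty file the same as a missing one is an assumption.

```python
from pathlib import Path


def read_wandb_token(path=".secrets/wandb_token.txt"):
    """Return the stripped token, or None if the file is missing or empty."""
    token_file = Path(path)
    if not token_file.is_file():
        return None
    token = token_file.read_text().strip()
    return token or None  # assumption: an empty file counts as "no token"


if __name__ == "__main__":
    token = read_wandb_token()
    print("logging to Weights & Biases" if token else "console-only run")
```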




