polish(majdoddin): use ENV_REGISTRY for loading AlphaZero simulation environments #424

Majdoddin · 2025-10-08T09:48:08Z

Background (DI-engine pattern)

DI-engine already loads classes for collector and evaluator environments via ENV_REGISTRY (using get_vec_env_setting() in
DI-engine's base_env.py). However, the simulation environment class used by MCTS in AlphaZero policies was still loaded through
hardcoded imports.

Changes

This PR completes the registry-based approach by refactoring simulation environment class loading to use ENV_REGISTRY, matching the
pattern used by DI-engine for collector/evaluator environments.

- Modified train_alphazero.py to pass full_cfg and create_cfg to the policy config
- Replaced the hardcoded _get_simulation_env() in all 3 AlphaZero policies:
  - lzero/policy/alphazero.py
  - lzero/policy/gumbel_alphazero.py
  - lzero/policy/sampled_alphazero.py
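
For readers unfamiliar with the pattern, here is a minimal, self-contained sketch of registry-based class loading. It is not the code from this PR: `Registry` is a small stand-in for DI-engine's ENV_REGISTRY, `load_simulation_env` and `TicTacToeEnv` are hypothetical names, and the `type` / `import_names` keys are assumed to mirror the env section of create_cfg.

```python
import importlib
from typing import Callable, Dict


class Registry:
    """Tiny name -> class registry (a stand-in, not DI-engine's implementation)."""

    def __init__(self) -> None:
        self._classes: Dict[str, type] = {}

    def register(self, name: str) -> Callable[[type], type]:
        # Used as a class decorator: @ENV_REGISTRY.register("some_name")
        def decorator(cls: type) -> type:
            self._classes[name] = cls
            return cls
        return decorator

    def get(self, name: str) -> type:
        if name not in self._classes:
            raise KeyError(f"no environment registered under {name!r}")
        return self._classes[name]


ENV_REGISTRY = Registry()


@ENV_REGISTRY.register("tictactoe")
class TicTacToeEnv:
    """Hypothetical environment class; it only stores its config."""

    def __init__(self, cfg: dict) -> None:
        self.cfg = cfg


def load_simulation_env(env_create_cfg: dict, env_cfg: dict):
    """Hypothetical helper: resolve the env class by name instead of a hardcoded import."""
    # Importing the listed modules runs their @register decorators.
    for module_name in env_create_cfg.get("import_names", []):
        importlib.import_module(module_name)
    env_cls = ENV_REGISTRY.get(env_create_cfg["type"])
    return env_cls(env_cfg)


# Illustrative configs; the keys inside env_cfg are made up for this example.
simulation_env = load_simulation_env(
    {"type": "tictactoe", "import_names": []},
    {"board_size": 3},
)
print(type(simulation_env).__name__)  # prints "TicTacToeEnv"
```

In this sketch the loader never names a concrete environment, so registering another class under a new name requires no change to `load_simulation_env`.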

Benefits

- Consistency: all environment types (collector, evaluator, simulation) now load their classes via the same registry
- Extensibility: new environments can be added without modifying core LightZero code
- Less duplication: ~130 lines of hardcoded imports removed

Testing

Tested and verified with:

✓ Gomoku Gumbel AlphaZero
✓ TicTacToe AlphaZero
✓ TicTacToe Sampled AlphaZero

All tests passed successfully.

puyuan1996 · 2025-10-18T12:51:31Z

Thanks for this excellent contribution, @Majdoddin! This is a great refactoring that significantly improves consistency by aligning the simulation environment loading with the DI-engine pattern.

We have just one quick question before merging: could you confirm that the changes are fully compatible with both the 'play_with_bot' and 'self_play' modes? We noticed your tests on Gomoku and TicTacToe and just wanted to verify if both scenarios were covered.

Once that's confirmed, this looks good to go.

Thanks again for the great work!

Majdoddin · 2025-10-23T10:55:40Z

Thanks! I can confirm compatibility with both modes.
I've tested the registry-based changes with both play_with_bot and self_play modes across multiple games.

All tests passed (5/5).

Sampled AlphaZero (ptree - Python MCTS):

TicTacToe Sampled AlphaZero - Bot Mode ✓
TicTacToe Sampled AlphaZero - Self-Play Mode ✓
Gomoku Sampled AlphaZero - Bot Mode ✓

AlphaZero (ctree - C++ MCTS):

TicTacToe AlphaZero - Bot Mode ✓
TicTacToe AlphaZero - Self-Play Mode ✓

Each test ran for 90 seconds to verify that environment initialization, data collection, and training all start successfully.
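
A time-boxed check of this kind can be sketched as follows. This is an assumption about how such a run might be scripted, not the harness actually used for these tests: `smoke_run` is a hypothetical helper, and the command in the demo is a stand-in for a real training entry point.

```python
import subprocess
import sys
from typing import List


def smoke_run(cmd: List[str], seconds: float) -> bool:
    """Return True if `cmd` survives for `seconds`, or exits with code 0 before then."""
    try:
        completed = subprocess.run(
            cmd,
            timeout=seconds,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        # Still running at the deadline; subprocess.run has already killed the child.
        return True
    # The process ended early: only a clean exit counts as a pass.
    return completed.returncode == 0


if __name__ == "__main__":
    # Stand-in command; replace with the real training command and seconds=90.
    stand_in = [sys.executable, "-c", "import time; time.sleep(5)"]
    print(smoke_run(stand_in, seconds=0.5))
```

A crash during environment initialization or the first collection step makes the process exit with a non-zero code before the deadline, which this sketch reports as a failure.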

Testing notes:

- Tested only configurations that pass on upstream main (other AlphaZero ptree configs have pre-existing bugs on main)
- Built and tested ctree_alphazero to verify ctree compatibility
- Rebased the PR on the latest main for compatibility with recent changes

See attached test output files for full details.
test_results_main_baseline.txt
test_results_pr424.txt

puyuan1996 · 2025-10-24T03:51:21Z

Thanks for this excellent contribution!

puyuan1996 added the polish and config labels Oct 18, 2025

puyuan1996 changed the title ~~Use ENV_REGISTRY for loading AlphaZero simulation environments~~ polish(majdoddin): use ENV_REGISTRY for loading AlphaZero simulation environments Oct 19, 2025

Majdoddin added 2 commits October 23, 2025 10:24

Implement registry-based simulation env for gumbel_alphazero

95b4eaa

- Modified train_alphazero entry to pass full_cfg and create_cfg to policy
- Replaced hardcoded _get_simulation_env() with registry-based approach
- Enables extensibility for new environments without modifying core code

Apply registry-based simulation env to all AlphaZero policies

e869965

- Updated alphazero.py, gumbel_alphazero.py, sampled_alphazero.py
- All three policies now use ENV_REGISTRY instead of hardcoded imports
- Simplifies adding new environments without modifying core code

Majdoddin force-pushed the fix/alphazero-registry-based-simulation-env branch from f37e2df to e869965 Compare October 23, 2025 09:07

puyuan1996 merged commit 6bb8f5f into opendilab:main Oct 24, 2025
