@Majdoddin
Contributor

Background (DI-engine pattern)

DI-engine already loads classes for collector and evaluator environments via ENV_REGISTRY (using get_vec_env_setting() in
DI-engine's base_env.py). However, the simulation environment class used by MCTS in AlphaZero policies was still loaded through
hardcoded imports.

Changes

This PR completes the registry-based approach by refactoring simulation environment class loading to use ENV_REGISTRY, matching the
pattern used by DI-engine for collector/evaluator environments.

  • Modified train_alphazero.py to pass full_cfg and create_cfg to the policy config
  • Replaced the hardcoded imports in _get_simulation_env() with an ENV_REGISTRY lookup in all three AlphaZero policies:
    • lzero/policy/alphazero.py
    • lzero/policy/gumbel_alphazero.py
    • lzero/policy/sampled_alphazero.py
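
The idea behind the change can be sketched with a toy registry. This is a minimal, self-contained illustration, not the code in this PR: `ToyRegistry`, `TOY_ENV_REGISTRY`, `ToyTicTacToeEnv`, `get_simulation_env_cls`, and the config layout are all hypothetical stand-ins for ENV_REGISTRY and the real configs.

```python
import importlib
from typing import Callable, Dict


class ToyRegistry:
    """Minimal name -> class registry; a stand-in for ENV_REGISTRY, not DI-engine's class."""

    def __init__(self) -> None:
        self._classes: Dict[str, type] = {}

    def register(self, name: str) -> Callable[[type], type]:
        """Return a class decorator that stores the decorated class under `name`."""
        def decorator(cls: type) -> type:
            self._classes[name] = cls
            return cls
        return decorator

    def get(self, name: str) -> type:
        if name not in self._classes:
            raise KeyError(f"environment type {name!r} is not registered")
        return self._classes[name]


TOY_ENV_REGISTRY = ToyRegistry()


@TOY_ENV_REGISTRY.register("toy_tictactoe")
class ToyTicTacToeEnv:
    """Hypothetical environment; registering it needs no change to policy code."""

    def __init__(self, cfg: dict) -> None:
        self.cfg = cfg


def get_simulation_env_cls(create_cfg: dict) -> type:
    """Resolve the simulation env class from config instead of a chain of hardcoded imports."""
    env_cfg = create_cfg["env"]
    # Importing a module runs its @register decorators, which fills the registry.
    for module_name in env_cfg.get("import_names", []):
        importlib.import_module(module_name)
    return TOY_ENV_REGISTRY.get(env_cfg["type"])


# Assumed config layout (illustrative only): the env type is read from create_cfg,
# and the env settings are read from full_cfg.
create_cfg = {"env": {"type": "toy_tictactoe", "import_names": []}}
full_cfg = {"env": {"mode": "self_play"}}

env_cls = get_simulation_env_cls(create_cfg)
simulation_env = env_cls(full_cfg["env"])
```

With this shape, supporting a new game means registering its class and naming it in the config; the policy's lookup function stays untouched.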

Benefits

  • Consistency: All environment types (collector, evaluator, simulation) now load their classes via the same registry
  • Extensibility: New environments can be added without modifying core LightZero code
  • Less duplication: ~130 lines of hardcoded imports removed

Testing

Tested and verified with:

  • ✓ Gomoku Gumbel AlphaZero
  • ✓ TicTacToe AlphaZero
  • ✓ TicTacToe Sampled AlphaZero

All tests passed successfully.

@puyuan1996 added the polish (Polish algorithms, tests or configs) and config (New or improved configuration) labels on Oct 18, 2025
@puyuan1996
Collaborator

puyuan1996 commented Oct 18, 2025

Thanks for this excellent contribution, @Majdoddin! This is a great refactoring that significantly improves consistency by aligning the simulation environment loading with the DI-engine pattern.

We have just one quick question before merging: could you confirm that the changes are fully compatible with both the 'play_with_bot' and 'self_play' modes? We noticed your tests on Gomoku and TicTacToe and just wanted to verify if both scenarios were covered.

Once that's confirmed, this looks good to go.

Thanks again for the great work!

@puyuan1996 changed the title from "Use ENV_REGISTRY for loading AlphaZero simulation environments" to "polish(majdoddin): use ENV_REGISTRY for loading AlphaZero simulation environments" on Oct 19, 2025
- Modified train_alphazero entry to pass full_cfg and create_cfg to policy
- Replaced hardcoded _get_simulation_env() with registry-based approach
- Enables extensibility for new environments without modifying core code
- Updated alphazero.py, gumbel_alphazero.py, sampled_alphazero.py
- All three policies now use ENV_REGISTRY instead of hardcoded imports
- Simplifies adding new environments without modifying core code
@Majdoddin force-pushed the fix/alphazero-registry-based-simulation-env branch from f37e2df to e869965 on October 23, 2025 09:07
@Majdoddin
Contributor Author

Majdoddin commented Oct 23, 2025

Thanks! I can confirm compatibility with both modes.
I've tested the registry-based changes with both play_with_bot and self_play modes across multiple games.

All tests passed (5/5):
Sampled AlphaZero (ptree - Python MCTS):

  • TicTacToe Sampled AlphaZero - Bot Mode ✓
  • TicTacToe Sampled AlphaZero - Self-Play Mode ✓
  • Gomoku Sampled AlphaZero - Bot Mode ✓

AlphaZero (ctree - C++ MCTS):

  • TicTacToe AlphaZero - Bot Mode ✓
  • TicTacToe AlphaZero - Self-Play Mode ✓

Each test ran for 90 seconds to verify that environment initialization, data collection, and training all start successfully.
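
A time-boxed check of this kind could be scripted roughly as follows. This is a sketch under assumptions, not the script behind the results above: `smoke_test` is a hypothetical helper, and the demo commands are stand-ins for a real training entry point.

```python
import subprocess
import sys
from typing import List


def smoke_test(cmd: List[str], seconds: float) -> bool:
    """Pass if `cmd` is still running after `seconds`, or exits cleanly before then."""
    try:
        completed = subprocess.run(cmd, timeout=seconds, capture_output=True)
    except subprocess.TimeoutExpired:
        # The process survived the whole window without crashing; run() has killed it.
        return True
    # The process ended early: only a zero exit code counts as a pass.
    return completed.returncode == 0


# Stand-in commands: one that keeps running past the window, one that fails at once.
still_running = smoke_test([sys.executable, "-c", "import time; time.sleep(30)"], 0.5)
crashed = smoke_test([sys.executable, "-c", "raise SystemExit(1)"], 10)
```

A run that times out only shows that startup did not crash within the window; it says nothing about training quality.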

Testing notes:

  • Tested only configurations that pass on upstream main (other AlphaZero ptree configs have pre-existing bugs on main)
  • Built ctree_alphazero and tested with it to verify ctree compatibility
  • Rebased PR on latest main for compatibility with recent changes

See attached test output files for full details.
test_results_main_baseline.txt
test_results_pr424.txt

@puyuan1996
Collaborator

Thanks for this excellent contribution!

@puyuan1996 merged commit 6bb8f5f into opendilab:main on Oct 24, 2025