Closed
64 commits
67a0e9a
feature(pu): add unizero multitask balance pipeline for atari and dmc
Apr 29, 2025
f083096
fix(pu): fix some adaptation bug
Apr 29, 2025
37eb118
feature(pu): add vit encoder for unizero
Apr 29, 2025
f32d63e
polish(pu): polish moe layer in transformer
May 1, 2025
c0aa747
feature(pu): add eval norm mean/medium for atari
May 5, 2025
8b3cff6
fix(pu): fix atari norm mean/median, fix collect in balance pipeline
May 7, 2025
f2c158b
polish(pu): polish config
May 7, 2025
20b42f7
fix(pu): fix dmc multitask to be compatiable with timestep (which is …
May 7, 2025
39ee55e
polish(pu): polish config
May 13, 2025
e85c449
fix(pu): fix task_id bug in balance pipeline, and polish benchmark_na…
May 14, 2025
c16d564
fix(pu): fix benchmark_name option
May 14, 2025
474b81c
polish(pu): fix norm score computation, adapt config to aliyun
May 21, 2025
50e367e
polish(pu): polish unizero_mt balance pipeline use CurriculumControll…
May 23, 2025
9171c3e
tmp
May 30, 2025
bc5003a
Merge branch 'dev-multitask-balance-clean' of https://github.com/open…
May 30, 2025
158e4a0
tmp
Jun 1, 2025
067a1ae
feature(xjy): Enhance text-based games like Jericho with text decodin…
puyuan1996 Jun 3, 2025
d66b986
tmp
Jun 4, 2025
0d5ede0
test(pu): add vit moe test
Jun 5, 2025
ca6ddb6
polish(pu): add adapter_scales to tb
Jun 11, 2025
7dd6c04
feature(pu): add atari uz balance config
Jun 12, 2025
c8e7cb8
polish(pu): add stable_adaptor_scale
Jun 19, 2025
0313335
tmp
Jun 23, 2025
36fd720
fix(fir): fix timestep and non-text-based games compatibility for muz…
Firerozes Jun 23, 2025
9e4cb99
fix(pu): fix dtype bug in sez buffer
Jun 23, 2025
ef170fd
sync code
Jun 25, 2025
a5c1343
fix(pu): fix timestep and reward-type compatibility (#380)
puyuan1996 Jun 29, 2025
8aaac01
fix(fir): fix compatibility of stochastic muzero in collector/evaluat…
Firerozes Jul 1, 2025
527d355
polish(fir): polish ensure_softmax function (#389)
puyuan1996 Jul 21, 2025
2a66cfd
feature(fir): enable independent configuration for reward/value cate…
Firerozes Jul 23, 2025
3148c7e
fix(fir): fix timestep compatibility in muzero_evaluator.py (#386)
Firerozes Jul 23, 2025
005cea1
fix(fir): fix probabilities visualization (#393)
Firerozes Jul 27, 2025
c2eb518
polish(fir): polish softmax (#394)
Firerozes Jul 27, 2025
bbec353
polish(pu): use freeze_non_lora_parameters in transformer, not use Le…
zjowowen Jul 30, 2025
20648d5
feature(pu): add vit-encoder lora in balance pipeline
zjowowen Jul 30, 2025
db6032a
polish(pu): fix reanalyze index bug, fix global_solved bug, add apply…
Aug 5, 2025
f63b544
polish(pu): add collect/eval_num_simulations option
Aug 5, 2025
5c412bb
feature(xjy): add encoder_decoder_type option for jericho's world mod…
xiongjyu Aug 27, 2025
90e44a6
fix(pu): fix pad dtype bug (#412)
puyuan1996 Sep 6, 2025
5069425
fix(pu): fix pos_in_game_segment bug in buffer (#414)
puyuan1996 Sep 10, 2025
da2da95
fix(pu): fix muzero_evaluator compatibility when n_evaluator_episode>…
puyuan1996 Sep 10, 2025
da2a62f
adaptively set the config of batchsize and accumulation_steps in Jeri…
xiongjyu Sep 18, 2025
bbbe505
polish(pu): polish comments and style in entry of scalezero
puyuan1996 Sep 28, 2025
bf9f965
polish(pu): polish comments and style of ctree/tree_search/buffer/com…
puyuan1996 Sep 28, 2025
fb04c7a
polish(pu): polish comments and style of files in lzero.model
puyuan1996 Sep 28, 2025
06148e7
polish(pu): polish comments and style of files in lzero.model.unizero…
puyuan1996 Sep 28, 2025
471ae6a
polish(pu): polish comments and style of unizero_world_models
puyuan1996 Sep 28, 2025
07933a5
polish(pu): polish comments and style of files in policy/
puyuan1996 Sep 28, 2025
df3b644
polish(pu): polish comments and style of files in worker
puyuan1996 Sep 28, 2025
4f89dcc
polish(pu): polish comments and style of files in configs
puyuan1996 Sep 28, 2025
e7a8796
Merge remote-tracking branch 'origin/main' into dev-multitask-balance…
puyuan1996 Sep 28, 2025
ab746d1
fix(pu): fix some merge typo
tAnGjIa520 Sep 28, 2025
0476aca
fix(pu): fix ln norm_type, fix kv_cache rewrite bug, add value_priori…
tAnGjIa520 Sep 28, 2025
2c0a965
fix(pu): fix unizero_mt
tAnGjIa520 Sep 28, 2025
84e6094
polish(pu): add LN in head, polish init_weight, polish adamw
tAnGjIa520 Sep 29, 2025
05da638
fix(pu): fix configure_optimizer_unizero in unizero_mt
tAnGjIa520 Oct 2, 2025
06ad080
feature(pu): add encoder-clip, label smooth, analyze_latent_represent…
tAnGjIa520 Oct 9, 2025
9f69f5a
feature(pu): add encoder-clip, label smooth option in unizero_multit…
tAnGjIa520 Oct 9, 2025
af99278
fix(pu): fix tb log when gpu_num<task_num, fix total_loss += bug, polish
tAnGjIa520 Oct 9, 2025
bf91ca2
polish(pu):polish config
tAnGjIa520 Oct 9, 2025
b18f892
fix(pu): fix encoder-clip bug and num_channel/res bug
tAnGjIa520 Oct 11, 2025
bf3cd12
polish(pu): polish scale_factor in DPS
tAnGjIa520 Oct 12, 2025
b1efa60
tmp
tAnGjIa520 Oct 18, 2025
c2f9817
feature(pu): add some analysis metrics in tensorboard for unizero and…
tAnGjIa520 Oct 23, 2025
2 changes: 1 addition & 1 deletion README.md
@@ -28,7 +28,7 @@
[![GitHub license](https://img.shields.io/github/license/opendilab/LightZero)](https://github.com/opendilab/LightZero/blob/master/LICENSE)
[![discord badge](https://dcbadge.vercel.app/api/server/dkZS2JF56X?style=flat)](https://discord.gg/dkZS2JF56X)

-Updated on 2025.04.09 LightZero-v0.2.0
+Updated on 2025.06.03 LightZero-v0.2.0

English | [简体中文(Simplified Chinese)](https://github.com/opendilab/LightZero/blob/main/README.zh.md) | [Documentation](https://opendilab.github.io/LightZero) | [LightZero Paper](https://arxiv.org/abs/2310.08348) | [🔥UniZero Paper](https://arxiv.org/abs/2406.10667) | [🔥ReZero Paper](https://arxiv.org/abs/2404.16364)

2 changes: 1 addition & 1 deletion README.zh.md
@@ -27,7 +27,7 @@
[![Contributors](https://img.shields.io/github/contributors/opendilab/LightZero)](https://github.com/opendilab/LightZero/graphs/contributors)
[![GitHub license](https://img.shields.io/github/license/opendilab/LightZero)](https://github.com/opendilab/LightZero/blob/master/LICENSE)

-最近更新于 2025.04.09 LightZero-v0.2.0
+最近更新于 2025.06.03 LightZero-v0.2.0

[English](https://github.com/opendilab/LightZero/blob/main/README.md) | 简体中文 | [文档](https://opendilab.github.io/LightZero) | [LightZero 论文](https://arxiv.org/abs/2310.08348) | [🔥UniZero 论文](https://arxiv.org/abs/2406.10667) | [🔥ReZero 论文](https://arxiv.org/abs/2404.16364)

7 changes: 4 additions & 3 deletions docs/source/tutorials/algos/customize_algos.md
@@ -119,16 +119,17 @@ Here is an example of unit testing in LightZero. In this example, we test the `i
```Python
import pytest
import torch
-from lzero.policy.scaling_transform import inverse_scalar_transform, InverseScalarTransform
+from lzero.policy.scaling_transform import DiscreteSupport, inverse_scalar_transform, InverseScalarTransform

@pytest.mark.unittest
def test_scaling_transform():
    import time
    logit = torch.randn(16, 601)
+    discrete_support = DiscreteSupport(-300., 301., 1.)
    start = time.time()
-    output_1 = inverse_scalar_transform(logit, 300)
+    output_1 = inverse_scalar_transform(logit, discrete_support)
    print('t1', time.time() - start)
-    handle = InverseScalarTransform(300)
+    handle = InverseScalarTransform(discrete_support)
    start = time.time()
    output_2 = handle(logit)
    print('t2', time.time() - start)
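The updated test passes a `DiscreteSupport(-300., 301., 1.)` object instead of the bare integer `300`. Assuming `(start, stop, step)` follows the half-open `numpy.arange` convention, that support has 601 points from -300 to 300, matching the 601 logits. The plain-Python sketch below illustrates the categorical decoding such a support is typically used for in MuZero-style heads: softmax over the logits, expectation over the support points, then the inverse of the rescaling h(x) = sign(x)(sqrt(|x| + 1) - 1) + εx. It is a hypothetical re-implementation for illustration only. The helper names (`make_support`, `decode_scalar`, etc.) and the default ε = 0.001 are assumptions, not LightZero's `inverse_scalar_transform` API.

```python
import math
from typing import List, Sequence


def make_support(start: float, stop: float, step: float) -> List[float]:
    """Half-open grid [start, stop) with spacing `step` (assumed numpy.arange convention)."""
    n = math.ceil((stop - start) / step)
    return [start + i * step for i in range(n)]


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def h(x: float, eps: float = 0.001) -> float:
    """Value rescaling h(x) = sign(x) * (sqrt(|x| + 1) - 1) + eps * x."""
    return math.copysign(math.sqrt(abs(x) + 1) - 1, x) + eps * x


def h_inverse(y: float, eps: float = 0.001) -> float:
    """Closed-form inverse of h, from solving a quadratic in sqrt(|x| + 1)."""
    root = (math.sqrt(1 + 4 * eps * (abs(y) + 1 + eps)) - 1) / (2 * eps)
    return math.copysign(root ** 2 - 1, y)


def decode_scalar(logits: Sequence[float], support: Sequence[float], eps: float = 0.001) -> float:
    """Hypothetical decoder: softmax -> expected support value (in h-space) -> h_inverse."""
    if len(logits) != len(support):
        raise ValueError(f"{len(logits)} logits but {len(support)} support points")
    probs = softmax(logits)
    expected = sum(p * z for p, z in zip(probs, support))
    return h_inverse(expected, eps)


if __name__ == "__main__":
    support = make_support(-300.0, 301.0, 1.0)
    print(len(support), support[0], support[-1])  # 601 -300.0 300.0
    logits = [0.0] * len(support)
    logits[support.index(5.0)] = 50.0  # put almost all probability mass on the point 5.0
    print(decode_scalar(logits, support))  # the scalar x whose h(x) is about 5.0
```

The length check in `decode_scalar` reflects why the support and the logit width must agree: a 601-way head only decodes correctly against a 601-point grid.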
7 changes: 4 additions & 3 deletions docs/source/tutorials/algos/customize_algos_zh.md
@@ -120,16 +120,17 @@ if timestep.done:
```Python
import pytest
import torch
-from lzero.policy.scaling_transform import inverse_scalar_transform, InverseScalarTransform
+from lzero.policy.scaling_transform import DiscreteSupport, inverse_scalar_transform, InverseScalarTransform

@pytest.mark.unittest
def test_scaling_transform():
    import time
    logit = torch.randn(16, 601)
+    discrete_support = DiscreteSupport(-300., 301., 1.)
    start = time.time()
-    output_1 = inverse_scalar_transform(logit, 300)
+    output_1 = inverse_scalar_transform(logit, discrete_support)
    print('t1', time.time() - start)
-    handle = InverseScalarTransform(300)
+    handle = InverseScalarTransform(discrete_support)
    start = time.time()
    output_2 = handle(logit)
    print('t2', time.time() - start)
3 changes: 2 additions & 1 deletion docs/source/tutorials/config/config.md
@@ -44,7 +44,8 @@ The `main_config` dictionary contains the main parameter settings for running th
- `downsample`: Whether to downsample the input.
- `norm_type`: The type of normalization used.
- `num_channels`: The number of channels in the convolutional layers (number of features extracted).
-- `support_scale`: The range of the value support set (`-support_scale` to `support_scale`).
+- `reward_support_range`: The range of the reward support set (`(start, stop, step)`).
+- `value_support_range`: The range of the value support set (`(start, stop, step)`).
- `bias`: Whether to use bias terms in the layers.
- `discrete_action_encoding_type`: How discrete actions are encoded.
- `self_supervised_learning_loss`: Whether to use a self-supervised learning loss (as in EfficientZero).
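The config files in this PR replace `support_scale=10` (together with `reward_support_size=21` and `value_support_size=21`) by `(-10., 11., 1.)`. A minimal sketch of that mapping follows, assuming the `(start, stop, step)` tuple is read with `numpy.arange`'s half-open convention, so `stop` itself is excluded. `support_scale_to_range` and `support_size` are hypothetical helpers for illustration, not part of LightZero.

```python
import math
from typing import Tuple

# Assumed shape of the new option: (start, stop, step) with stop exclusive.
SupportRange = Tuple[float, float, float]


def support_scale_to_range(support_scale: int) -> SupportRange:
    """Hypothetical helper: old symmetric support_scale -> new (start, stop, step).

    The old option described the integer grid -support_scale..support_scale,
    so `stop` is placed one step past the last point.
    """
    return (-float(support_scale), float(support_scale) + 1.0, 1.0)


def support_size(support_range: SupportRange) -> int:
    """Number of support points (i.e. the head's output width) for a range."""
    start, stop, step = support_range
    if step <= 0 or stop <= start:
        raise ValueError(f"invalid support range: {support_range}")
    return math.ceil((stop - start) / step)


if __name__ == "__main__":
    rng = support_scale_to_range(10)
    print(rng, support_size(rng))  # (-10.0, 11.0, 1.0) 21
```

The old option could only express grids that are symmetric around zero with unit spacing; the tuple form can also express other bounds and step sizes.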
3 changes: 2 additions & 1 deletion docs/source/tutorials/config/config_zh.md
@@ -43,7 +43,8 @@
- `downsample`: 是否进行降采样。
- `norm_type`: 归一化使用的方法。
- `num_channels`: 卷积层提取的特征个数。
-- `support_scale`: 价值支持集的范围 (-support_scale, support_scale)。
+- `reward_support_range`: 价值支持集的范围 (`(start, stop, step)`)。<!-- TODO : ADAPT THIS DESCRIPTION, I DON'T SPEAK CHINESE -->
+- `value_support_range`: 价值支持集的范围 (`(start, stop, step)`)。<!-- TODO : ADAPT THIS DESCRIPTION, I DON'T SPEAK CHINESE -->
- `bias`: 是否使用偏置。
- `discrete_action_encoding_type`: 离散化动作空间使用的编码类型。
- `self_supervised_learning_loss`: 是否使用自监督学习损失(参照EfficientZero的实现)。
5 changes: 2 additions & 3 deletions lzero/agent/config/gumbel_muzero/gomoku_play_with_bot.py
@@ -44,9 +44,8 @@
image_channel=3,
num_res_blocks=1,
num_channels=32,
-support_scale=10,
-reward_support_size=21,
-value_support_size=21,
+reward_support_range=(-10., 11., 1.),
+value_support_range=(-10., 11., 1.),
),
cuda=True,
env_type='board_games',
5 changes: 2 additions & 3 deletions lzero/agent/config/gumbel_muzero/tictactoe_play_with_bot.py
@@ -38,9 +38,8 @@
reward_head_hidden_channels=[8],
value_head_hidden_channels=[8],
policy_head_hidden_channels=[8],
-support_scale=10,
-reward_support_size=21,
-value_support_size=21,
+reward_support_range=(-10., 11., 1.),
+value_support_range=(-10., 11., 1.),
),
cuda=True,
env_type='board_games',
5 changes: 2 additions & 3 deletions lzero/agent/config/muzero/gomoku_play_with_bot.py
@@ -44,9 +44,8 @@
image_channel=3,
num_res_blocks=1,
num_channels=32,
-support_scale=10,
-reward_support_size=21,
-value_support_size=21,
+reward_support_range=(-10., 11., 1.),
+value_support_range=(-10., 11., 1.),
),
cuda=True,
env_type='board_games',
5 changes: 2 additions & 3 deletions lzero/agent/config/muzero/tictactoe_play_with_bot.py
@@ -38,9 +38,8 @@
reward_head_hidden_channels=[8],
value_head_hidden_channels=[8],
policy_head_hidden_channels=[8],
-support_scale=10,
-reward_support_size=21,
-value_support_size=21,
+reward_support_range=(-10., 11., 1.),
+value_support_range=(-10., 11., 1.),
norm_type='BN',
),
cuda=True,
4 changes: 3 additions & 1 deletion lzero/entry/__init__.py
@@ -1,5 +1,6 @@
from .eval_alphazero import eval_alphazero
from .eval_muzero import eval_muzero
+
from .eval_muzero_with_gym_env import eval_muzero_with_gym_env
from .train_alphazero import train_alphazero
from .train_muzero import train_muzero
@@ -12,4 +13,5 @@
from .train_muzero_multitask_segment_ddp import train_muzero_multitask_segment_ddp
from .train_unizero_multitask_segment_ddp import train_unizero_multitask_segment_ddp
from .train_unizero_multitask_segment_eval import train_unizero_multitask_segment_eval
-from .utils import *
+from .train_unizero_multitask_balance_segment_ddp import train_unizero_multitask_balance_segment_ddp
+from .utils import *
80 changes: 0 additions & 80 deletions lzero/entry/compute_task_weight.py

This file was deleted.

3 changes: 2 additions & 1 deletion lzero/entry/eval_muzero.py
@@ -1,6 +1,7 @@
import os
from functools import partial
from typing import Optional, Tuple
+import logging

import numpy as np
import torch
@@ -51,7 +52,7 @@ def eval_muzero(
    # Create main components: env, policy
    env_fn, collector_env_cfg, evaluator_env_cfg = get_vec_env_setting(cfg.env)
    evaluator_env = create_env_manager(cfg.env.manager, [partial(env_fn, cfg=c) for c in evaluator_env_cfg])
-
+    # print(f"cfg.seed:{cfg.seed}")
    evaluator_env.seed(cfg.seed, dynamic_seed=False)
    set_pkg_seed(cfg.seed, use_cuda=cfg.policy.cuda)
