feature(tj): add monitoring for the gradient conflict metric of MoE in ScaleZero #418
base: dev-multitask-balance-clean

No description provided.
Conversation
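For context on what this PR monitors: in multi-task training, "gradient conflict" is commonly quantified as the pairwise cosine similarity between per-task gradients on shared parameters (here, presumably the shared MoE expert weights). Below is a minimal sketch of such a metric; the function name, the chosen statistics, and the aggregation are illustrative assumptions, not code taken from this PR's diff.

```python
import torch
import torch.nn.functional as F
from itertools import combinations
from typing import Dict

def grad_conflict_stats(task_grads: Dict[str, torch.Tensor]) -> Dict[str, float]:
    """Pairwise cosine similarity between flattened per-task gradients.

    A negative similarity between two tasks' gradients on a shared parameter
    group (e.g. an MoE expert) signals a gradient conflict.
    """
    sims = torch.stack([
        F.cosine_similarity(task_grads[a].flatten(), task_grads[b].flatten(), dim=0)
        for a, b in combinations(task_grads, 2)
    ])
    return {
        "avg_cos_sim": sims.mean().item(),
        "min_cos_sim": sims.min().item(),
        # Fraction of task pairs whose gradients point in opposing directions.
        "conflict_ratio": (sims < 0).float().mean().item(),
    }
```

In practice, scalars like these would be logged to TensorBoard or wandb at each training step so conflict trends can be tracked over the course of multi-task training.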
modified: zoo/atari/config/atari_unizero_multitask_segment_ddp_config.py
```python
import os
from functools import partial
from typing import Tuple, Optional, List
import concurrent.futures
```
**Reviewer:** Please merge `dev-multitask-balance-clean` to resolve the conflicts.
**Author:** Resolved. The MoE-conflict work should probably get its own branch, right?
**Reviewer:** Yes, this PR should later live on a separate branch, since it diverges considerably from the main logic.
```
@@ -0,0 +1,1501 @@
import torch
import torch.nn as nn
```
**Reviewer:** Is this file identical to https://github.com/opendilab/LightZero/pull/401/files? If so, rename the file and move it to `lzero/model/unizero_world_models/toy_multitask_moe_grad_analysis.py`.
```python
betas=(0.9, 0.95),
)

# self.a=1
```
**Reviewer:** Delete this.
**Author:** Resolved; this and similar leftovers have been removed.
```python
import os
from functools import partial
from typing import Tuple, Optional, List
import concurrent.futures
```
**Reviewer:** This PR should later be moved to a separate branch, since it diverges considerably from the main logic.
```python
self.forward_handler.remove()
self.backward_handler.remove()

# # modified by tangjia
```
**Reviewer:** Is this code actually used anywhere? If not, it should be deleted.
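For context on the `forward_handler` / `backward_handler` pair in the excerpt above: handles with these names typically come from `register_forward_hook` / `register_full_backward_hook` calls used to capture activations and gradients for analysis, and must be removed once the analysis is done. A hedged sketch of what such a probe might look like follows; the class and attribute names beyond the two handlers are hypothetical, not taken from the PR.

```python
import torch.nn as nn

class ExpertGradProbe:
    """Illustrative hook-based probe that captures a module's output and its
    output gradient, e.g. for comparing per-task gradients on an MoE expert."""

    def __init__(self, module: nn.Module):
        self.activations = None
        self.grad_out = None
        self.forward_handler = module.register_forward_hook(self._on_forward)
        self.backward_handler = module.register_full_backward_hook(self._on_backward)

    def _on_forward(self, module, inputs, output):
        self.activations = output.detach()

    def _on_backward(self, module, grad_input, grad_output):
        self.grad_out = grad_output[0].detach()

    def close(self):
        # Hooks must be removed once analysis is done; otherwise they keep
        # firing (and retaining tensors) on every subsequent pass.
        self.forward_handler.remove()
        self.backward_handler.remove()
```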
```python
torchrun --nproc_per_node=8 ./zoo/atari/config/atari_unizero_multitask_segment_8games_ddp_config.py
"""

# /fs-computility/niuyazhe/tangjia/code/LightZero-dev-multitask-balance-clean/zoo/atari/config/atari_unizero_multitask_segment_ddp_config.py
```
**Reviewer:** Delete this.
**Author:** Deleted, along with similar lines.
```python
only_use_moco_stats=False,
# use_moco=False,  # ==============TODO==============
use_moco=True,  # ==============TODO: moco==============
use_moco=False,  # ==============TODO==============
```
**Reviewer:** For the actual ScaleZero MoE grad analysis, which config is used here: this one or the one above? Keep only the one that is used. Also, please write a README in both English and Chinese giving an overall introduction to the changes in this PR and a summary of the MoE grad-analysis results, and put it in the same directory as the config.
**Author:** This one: `use_moco=False`.
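Applying the reviewer's request would leave a single, unambiguous flag. A sketch of the cleaned-up config fragment is below; the wrapping `policy_config` name and the comment wording are illustrative, and reading `use_moco` as a MoCo-style gradient-balancing toggle is my interpretation, not stated in the thread.

```python
# Relevant fragment of the multitask policy config after the cleanup:
policy_config = dict(
    only_use_moco_stats=False,
    # Gradient balancing stays off: this PR only *monitors* MoE gradient
    # conflicts, it does not correct them.
    use_moco=False,
)
```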