Description
System Info
- `transformers` version: 4.39.0.dev0
- Platform: Linux-3.10.0-1160.71.1.el7.x86_64-x86_64-with-glibc2.17
- Python version: 3.10.13
- Huggingface_hub version: 0.21.4
- Safetensors version: 0.4.2
- Accelerate version: 0.27.2
- Accelerate config:
  - compute_environment: LOCAL_MACHINE
  - distributed_type: DEEPSPEED
  - mixed_precision: bf16
  - use_cpu: False
  - debug: False
  - num_processes: 8
  - machine_rank: 0
  - num_machines: 1
  - rdzv_backend: static
  - same_network: True
  - main_training_function: main
  - deepspeed_config: {'gradient_accumulation_steps': 16, 'zero3_init_flag': False, 'zero_stage': 0}
  - downcast_bf16: no
  - tpu_use_cluster: False
  - tpu_use_sudo: False
  - tpu_env: []
- PyTorch version (GPU?): 2.2.1 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?:
- Using distributed or parallel set-up in script?:
Who can help?
Hi @ArthurZucker and @younesbelkada. I'm trying to split a dataset automatically across multiple GPUs for inference (somewhat like data parallelism). Strange things happen when I use a T5 model from Hugging Face, while other models (e.g. BART) work correctly, so I suspect a problem related to the T5 implementation. Could you help check it out? :)
Although it has been mentioned online that the error below may be related to OOM, I am certain that it is not. With the code in the Reproduction section, only rank 0 produces normal output; every other rank reports the following error.
Traceback (most recent call last):
File "/data/ruanjh/miniconda3/envs/mamba/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/data/ruanjh/miniconda3/envs/mamba/lib/python3.10/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/data/ruanjh/NiuInference/NiuInference.py", line 97, in get_pred
output = model.generate(
File "/data/ruanjh/miniconda3/envs/mamba/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/data/ruanjh/miniconda3/envs/mamba/lib/python3.10/site-packages/transformers/generation/utils.py", line 1388, in generate
model_kwargs = self._prepare_encoder_decoder_kwargs_for_generation(
File "/data/ruanjh/miniconda3/envs/mamba/lib/python3.10/site-packages/transformers/generation/utils.py", line 503, in _prepare_encoder_decoder_kwargs_for_generation
model_kwargs["encoder_outputs"]: ModelOutput = encoder(**encoder_kwargs)
File "/data/ruanjh/miniconda3/envs/mamba/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/data/ruanjh/miniconda3/envs/mamba/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/data/ruanjh/miniconda3/envs/mamba/lib/python3.10/site-packages/transformers/models/t5/modeling_t5.py", line 1115, in forward
layer_outputs = layer_module(
File "/data/ruanjh/miniconda3/envs/mamba/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/data/ruanjh/miniconda3/envs/mamba/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/data/ruanjh/miniconda3/envs/mamba/lib/python3.10/site-packages/transformers/models/t5/modeling_t5.py", line 695, in forward
self_attention_outputs = self.layer[0](
File "/data/ruanjh/miniconda3/envs/mamba/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/data/ruanjh/miniconda3/envs/mamba/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/data/ruanjh/miniconda3/envs/mamba/lib/python3.10/site-packages/transformers/models/t5/modeling_t5.py", line 602, in forward
attention_output = self.SelfAttention(
File "/data/ruanjh/miniconda3/envs/mamba/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/data/ruanjh/miniconda3/envs/mamba/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/data/ruanjh/miniconda3/envs/mamba/lib/python3.10/site-packages/transformers/models/t5/modeling_t5.py", line 521, in forward
query_states = shape(self.q(hidden_states)) # (batch_size, n_heads, seq_length, dim_per_head)
File "/data/ruanjh/miniconda3/envs/mamba/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/data/ruanjh/miniconda3/envs/mamba/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/data/ruanjh/miniconda3/envs/mamba/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 116, in forward
return F.linear(input, self.weight, self.bias)
RuntimeError: CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling `cublasCreate(handle)`
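The claim above that this is not an OOM could be checked per rank with a sketch like the following. It is not taken from the report: `free_gpu_memory_gib` is a hypothetical helper name, and the snippet assumes a PyTorch build that provides `torch.cuda.mem_get_info`. It returns `None` rather than failing when PyTorch or CUDA is unavailable.

```python
# Hypothetical diagnostic helper; not part of the original script.
try:
    import torch
except ImportError:  # let the snippet load on machines without PyTorch
    torch = None


def free_gpu_memory_gib(rank):
    """Return (free, total) memory in GiB for cuda:<rank>, or None if unavailable."""
    if torch is None or not torch.cuda.is_available():
        return None
    if rank >= torch.cuda.device_count():
        return None
    # mem_get_info reports free and total device memory in bytes.
    free_bytes, total_bytes = torch.cuda.mem_get_info(rank)
    gib = 1024 ** 3
    return free_bytes / gib, total_bytes / gib


if __name__ == "__main__":
    has_cuda = torch is not None and torch.cuda.is_available()
    n_devices = torch.cuda.device_count() if has_cuda else 0
    for rank in range(n_devices):
        print(rank, free_gpu_memory_gib(rank))
```

Calling such a helper at the top of `get_pred`, before the model is loaded, would show how much memory each device had free when a failing rank started.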
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
The issue should be easy to reproduce with the following code. All you need to do is replace `model_dir` in the `__main__` block with a specific model, such as google/t5-v1_1-large, and make sure more than one GPU is visible (i.e. CUDA_VISIBLE_DEVICES exposes more than one device).
import torch
from torch import bfloat16
import torch.distributed as dist
import torch.multiprocessing as mp
from torch.utils.data import Dataset,DataLoader
import functools
from transformers import AutoTokenizer,DefaultDataCollator,GenerationConfig,PreTrainedModel,AutoModelForSeq2SeqLM,AutoModelForCausalLM,AutoConfig,DataCollatorWithPadding
import logging
from transformers.models.auto.modeling_auto import MODEL_FOR_CAUSAL_LM_MAPPING_NAMES
from tqdm import tqdm
# from accelerate import find_executable_batch_size
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)
class DefaultDataset(Dataset):
    def __init__(self,data,tokenizer):
        self.data=tokenizer(data,return_tensors='pt',padding=True)
    def __getitem__(self,idx):
        return {'input_ids':self.data['input_ids'][idx]}
    def __len__(self):
        return self.data['input_ids'].size(0)
class NiuInference:
    def __init__(self,model_dir,data,dtype=bfloat16,dataset=None,data_collator=None,output_path='niuinference.out',auto_batch_size=True,batch_size=1,generation_config=None):
        self.model_dir=model_dir
        self.dtype=dtype
        self.data=data
        self.dataset=dataset
        self.data_collator=data_collator
        self.output_path=output_path
        self.batch_size=batch_size
        self.auto_batch_size=auto_batch_size
        self.generation_config=generation_config
    def _load_model_and_tokenizer(self,device):
        print(self.dtype)
        config=AutoConfig.from_pretrained(self.model_dir)
        if config.model_type in MODEL_FOR_CAUSAL_LM_MAPPING_NAMES:
            model=AutoModelForCausalLM.from_pretrained(self.model_dir,torch_dtype=self.dtype)
        else:
            model=AutoModelForSeq2SeqLM.from_pretrained(self.model_dir,torch_dtype=self.dtype)
        model.to(device)
        tokenizer=AutoTokenizer.from_pretrained(self.model_dir)
        return model,tokenizer
    # @find_executable_batch_size(starting_batch_size=1)
    # def auto_get_pred(batch_size):
    def get_pred(self,rank,out_path,data,dict):
        batch_size=2
        try:
            device = torch.device(f'cuda:{rank}')
            model, tokenizer = self._load_model_and_tokenizer(device)
            if self.dataset is not None:
                dataset=self.dataset(data=data,tokenizer=tokenizer)
            else:
                dataset=DefaultDataset(data=data,tokenizer=tokenizer)
            if self.data_collator is not None:
                collator=self.data_collator(tokenizer,model=model,padding=True)
            else:
                collator= DataCollatorWithPadding(tokenizer)
            dataloader=DataLoader(dataset,batch_size,collate_fn=collator,pin_memory=True,num_workers=0)
            result=[]
            for input in tqdm(dataloader):
                input.to(device)
                print(input)
                output = model.generate(
                    input_ids=input['input_ids'],
                    attention_mask=input['attention_mask'],
                    num_beams=5,
                    do_sample=False,
                    temperature=1.0,
                    max_new_tokens=512,
                )
                pred = tokenizer.batch_decode(output,skip_special_tokens=True)
                print(pred)
                result+=pred
            dict[f'{rank}']=result
        except Exception as e:
            print('error',device)
            raise
    def split_list(self,lst, n):
        avg = len(lst) / float(n)
        return [lst[int(avg * i):int(avg * (i + 1))] for i in range(n)]
    def run(self,):
        world_size = min(torch.cuda.device_count(),len(self.data)) # corner case, data<available GPU num
        data_subsets = self.split_list(self.data,world_size)
        print(data_subsets)
        processes = []
        manager = mp.Manager()
        record_dict = manager.dict()
        for rank in range(world_size):
            p = mp.Process(target=self.get_pred, args=(rank,self.output_path,data_subsets[rank],record_dict))
            p.start()
            processes.append(p)
        for p in processes:
            p.join()
        with open(self.output_path, "w", encoding="utf-8") as f:
            for rank in range(world_size):
                for r in record_dict[f'{rank}']:
                    f.write(r.replace('\n','\\n')+'\n')
if __name__=='__main__':
    mp.set_start_method('spawn')
    i=NiuInference(model_dir=**replace here to t5 or bart**,data=['hello,how is your day','my wish is that you happy','from scratch',])
    i.run()
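For reference, the `split_list` helper in the script above hands each rank a contiguous slice of the input. The standalone copy below uses the same arithmetic outside the class (the names `three_ranks` and `two_ranks` are only for illustration) and shows which of the three sample prompts each rank would receive.

```python
def split_list(lst, n):
    # Same arithmetic as NiuInference.split_list, copied outside the class.
    avg = len(lst) / float(n)
    return [lst[int(avg * i):int(avg * (i + 1))] for i in range(n)]


prompts = ['hello,how is your day', 'my wish is that you happy', 'from scratch']

# Three or more visible GPUs: world_size = min(device_count, 3) = 3,
# so each rank gets exactly one prompt.
three_ranks = split_list(prompts, 3)

# Two visible GPUs: rank 0 gets the first prompt, rank 1 gets the other two.
two_ranks = split_list(prompts, 2)

for rank, subset in enumerate(two_ranks):
    print(rank, subset)
```

With the 8-GPU setup listed under System Info, `world_size` is capped at 3 by the length of the data, so only `cuda:0` through `cuda:2` are used by this reproduction.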
Expected behavior
The T5 model should run inference correctly under multiprocessing, with every rank producing output, as BART does.