Unable to save/load optimizer state after accelerator.prepare() #3670

### System Info

```Shell
accelerate version==1.7.0, OS==Linux, python version==3.10.16
"deepspeed_config": {
    "zero_stage": 2,
    "offload_optimizer": {
      "device": "cpu",
      "pin_memory": true
    }
  }
```

### Information

- [ ] The official example scripts
- [x] My own modified scripts

### Tasks

- [ ] One of the scripts in the examples/ folder of Accelerate or an officially supported `no_trainer` script in the `examples` folder of the `transformers` repo (such as `run_no_trainer_glue.py`)
- [x] My own task or dataset (give details below)

### Reproduction


After preparing the optimizer with `accelerator.prepare()`, I cannot save and reload its state. I tried three approaches:

1. When trying to save the prepared optimizer directly:
```python
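import torch
from accelerate import Accelerator

accelerator = Accelerator()  # assumed: launched with the DeepSpeed config above
# `model` is a torch.nn.Module defined elsewhere in my script

optimizer = torch.optim.Adam(model.parameters())
optimizer = accelerator.prepare(model, optimizer)[1]

# Saving
torch.save({
    'optimizer_state_dict': optimizer.state_dict(),
}, 'checkpoint.pt')

# Loading
checkpoint = torch.load('checkpoint.pt')
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])  # Fails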
```
This fails with:
```
KeyError: 'param_groups'
```

2. When inspecting the saved optimizer state:
```python
ckpt = torch.load('checkpoint.pt', map_location='cpu')
print(ckpt['optimizer_state_dict'].keys())
```
This shows DeepSpeed-format keys:
```
dict_keys(['loss_scaler', 'dynamic_loss_scale', 'overflow', 'clip_grad', 'base_optimizer_state', 'single_partition_of_fp32_groups', 'zero_stage', 'group_paddings', 'partition_count', 'ds_version', 'param_slice_mappings'])
```
So I tried loading from `base_optimizer_state`:
```python
optimizer.load_state_dict(checkpoint['optimizer_state_dict']['base_optimizer_state'])
```
This fails with:
```
ValueError: loaded state dict contains a parameter group that doesn't match the size of optimizer's group
```

3. When trying to unwrap the optimizer first:
```python
unwrapped_optimizer = accelerator.unwrap_model(optimizer)  # Fails
```
This fails with:
```
AttributeError: 'DeepSpeedOptimizerWrapper' object has no attribute '_modules'
```
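
To make the observation in step 2 concrete: a plain `torch.optim` state dict has the top-level keys `state` and `param_groups`, and the saved dict has neither. The sketch below checks a dict against that layout. The helper name `is_plain_torch_optimizer_state` is hypothetical (it is not part of Accelerate or DeepSpeed), and the dict values are placeholders; only the key names come from the output in step 2.

```python
# Top-level keys that a plain torch.optim state dict contains.
TORCH_OPTIMIZER_KEYS = {"state", "param_groups"}


def is_plain_torch_optimizer_state(state_dict):
    """Hypothetical helper: True if the dict has the top-level torch.optim layout."""
    return TORCH_OPTIMIZER_KEYS.issubset(state_dict.keys())


# Key names copied from the dict_keys output in step 2; values are placeholders.
deepspeed_style = dict.fromkeys([
    "loss_scaler", "dynamic_loss_scale", "overflow", "clip_grad",
    "base_optimizer_state", "single_partition_of_fp32_groups", "zero_stage",
    "group_paddings", "partition_count", "ds_version", "param_slice_mappings",
])
plain_style = {"state": {}, "param_groups": []}

print(is_plain_torch_optimizer_state(deepspeed_style))
print(is_plain_torch_optimizer_state(plain_style))
```

This only compares key names. It does not say how the DeepSpeed-format dict should be loaded back.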

### Expected behavior

I should be able to save and load optimizer state when using Accelerate, as in regular PyTorch training.
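
For comparison, this is the plain PyTorch round trip I have in mind, as a minimal sketch without Accelerate or DeepSpeed. The `torch.nn.Linear` stand-in model and the in-memory buffer are chosen for illustration and are not taken from my script.

```python
import io

import torch

# Tiny stand-in model; used only to give the optimizer some parameters.
model = torch.nn.Linear(4, 2)
optimizer = torch.optim.Adam(model.parameters())

# Take one step so Adam holds per-parameter state worth saving.
model(torch.randn(8, 4)).sum().backward()
optimizer.step()

# Save to an in-memory buffer instead of 'checkpoint.pt' to keep the sketch self-contained.
buffer = io.BytesIO()
torch.save({"optimizer_state_dict": optimizer.state_dict()}, buffer)

# Load into a fresh optimizer built over the same parameters.
buffer.seek(0)
checkpoint = torch.load(buffer)
restored = torch.optim.Adam(model.parameters())
restored.load_state_dict(checkpoint["optimizer_state_dict"])

saved_keys = sorted(checkpoint["optimizer_state_dict"].keys())
print(saved_keys)
```

Here the saved dict has the `state` and `param_groups` keys, and `load_state_dict` accepts it without any conversion.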
