[P1] Multi-GPU model sharding with intervening evaluation and training #54

@frankaging

Description


The library has not been tested with multi-GPU use cases: we currently assume the intervening model fits on a single GPU. That is not viable for interventions on 70B models, for instance. We want to be able to load the model across multiple GPUs using sharding.

Static interventions need to be attached to the right component on the right device when the model is sharded. Trainable interventions likewise need to be mapped onto the device where the corresponding model component lives.
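One way to co-locate an intervention with its target component can be sketched as below. This is an illustrative sketch, not the library's API: `device_of`, `colocate_intervention`, and the toy model are all hypothetical names introduced here.

```python
import torch
import torch.nn as nn

def device_of(module: nn.Module) -> torch.device:
    # A sharded model has no single device, so query it per component.
    return next(module.parameters()).device

def colocate_intervention(model: nn.Module, component: str,
                          intervention: nn.Module) -> nn.Module:
    # Move the intervention's parameters to wherever the target
    # component was placed by the device map.
    return intervention.to(device_of(model.get_submodule(component)))

# Demo with CPU-only placement so it runs on any machine; on a real
# multi-GPU box the components would sit on different cuda devices.
model = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 4))
intervention = colocate_intervention(model, "1", nn.Linear(4, 4))
```

The same lookup would drive optimizer construction for trainable interventions, since their parameters must live where their gradients are produced.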

This could be a large task, but the first step is clear: try out static interventions (e.g., vanilla interventions) on models sharded across multiple GPUs at inference time.
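That first step might look roughly like the following sketch. It is not the library's implementation: `TinyShardedModel` and `VanillaIntervention` are stand-ins, and the "shards" are CPU-only so the sketch runs anywhere; the one detail it demonstrates is moving the stored activation to the hooked component's device before substituting it.

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.linear = nn.Linear(dim, dim)

    def forward(self, x):
        return torch.relu(self.linear(x))

class TinyShardedModel(nn.Module):
    """Stand-in for a model whose blocks may sit on different devices."""
    def __init__(self, dim=8, n_blocks=2, device_map=None):
        super().__init__()
        self.blocks = nn.ModuleList(Block(dim) for _ in range(n_blocks))
        for name, device in (device_map or {}).items():
            self.get_submodule(name).to(device)

    def forward(self, x):
        for block in self.blocks:
            x = x.to(next(block.parameters()).device)  # hop between shards
            x = block(x)
        return x

class VanillaIntervention:
    """Statically overwrite a component's output with a source activation."""
    def __init__(self, source):
        self.source = source

    def __call__(self, module, inputs, output):
        # The key multi-GPU detail: move the stored activation onto
        # whichever device the sharded component actually lives on.
        return self.source.to(output.device)

# CPU "shards" here; on a real machine the map would name cuda devices.
model = TinyShardedModel(device_map={"blocks.0": "cpu", "blocks.1": "cpu"})
handle = model.blocks[0].register_forward_hook(
    VanillaIntervention(torch.ones(1, 8)))
out = model(torch.zeros(1, 8))
handle.remove()
```

A forward hook that returns a non-`None` value replaces the component's output in PyTorch, which is why a static intervention can be attached without touching the model's code.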
