This repository was archived by the owner on Mar 15, 2024. It is now read-only.

batch_size flag #220

@tsengalb99


Is the batch_size flag the batch size per GPU or the total batch size across all GPUs? In the example training command, you use 4 GPUs and a batch size of 256. Does this mean the effective batch size is 1024, or is it 256 with 64 samples per GPU? I am unable to reproduce the DeiT-Ti results (~62.5% at 250 epochs; I highly doubt it will reach 72% by 300 epochs) using either 8 GPUs with batch_size=128 or 4 GPUs with batch_size=256. I was under the impression that both configurations would give identical results, equivalent to a batch size of 1024, but it seems like something is broken here.
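
For concreteness, this is the convention I assumed: the common PyTorch DDP arrangement where the flag is per process (one process per GPU). The sketch below is hypothetical, not this repo's actual code, and `effective_batch_size` is a made-up helper for illustration:

```python
# Minimal sketch of the batch-size arithmetic under the usual DDP
# convention, where --batch_size is per process (one process per GPU).
# effective_batch_size is a hypothetical helper, not code from this repo.
import torch.distributed as dist

def effective_batch_size(per_gpu_batch_size: int) -> int:
    # Each process builds its own DataLoader with
    # batch_size=per_gpu_batch_size, so one optimizer step averages
    # gradients over per_gpu_batch_size * world_size samples.
    world_size = dist.get_world_size() if dist.is_initialized() else 1
    return per_gpu_batch_size * world_size

# Under this convention:
#   4 GPUs, --batch_size 256 -> effective batch size 1024
#   8 GPUs, --batch_size 128 -> effective batch size 1024
# If instead --batch_size were the global batch size, 4 GPUs with 256
# would mean 64 samples per GPU and an effective batch size of 256.
```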
