
Add Vision Transformer (ViT) Demo to computer_vision Module #13372

@dhruvidave348

Feature description

This PR adds a fully functional demo of a Vision Transformer (ViT) for image classification using Hugging Face Transformers.

Features included:

  • Loads a sample image from the web.
  • Uses ViTImageProcessor for preprocessing.
  • Performs inference with ViTForImageClassification.
  • Prints the predicted label for the image.
  • Handles network timeouts and follows the correct import order.
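A minimal sketch of the steps listed above might look like the following. The checkpoint name, image URL, timeout value, use of `requests` for the download, and all function names are assumptions for illustration; the issue does not specify them.

```python
from io import BytesIO

# Assumed values, not taken from the issue.
MODEL_NAME = "google/vit-base-patch16-224"
IMAGE_URL = "http://images.cocodataset.org/val2017/000000039769.jpg"
TIMEOUT_SECONDS = 10


def argmax_label(scores, id2label):
    """Return the label whose score is the highest."""
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return id2label[best_index]


def load_image(url=IMAGE_URL, timeout=TIMEOUT_SECONDS):
    """Download an image and return it as an RGB PIL image."""
    # Third-party imports are deferred so the pure helper above
    # can be used without them installed.
    import requests
    from PIL import Image

    # The timeout makes a stalled connection raise instead of hanging.
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()
    return Image.open(BytesIO(response.content)).convert("RGB")


def classify(image, model_name=MODEL_NAME):
    """Return the top-1 label predicted by a ViT classifier."""
    import torch
    from transformers import ViTForImageClassification, ViTImageProcessor

    processor = ViTImageProcessor.from_pretrained(model_name)
    model = ViTForImageClassification.from_pretrained(model_name)
    model.eval()

    # Resize and normalize the image into a batch of one tensor.
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits

    return argmax_label(logits[0].tolist(), model.config.id2label)
```

With the dependencies installed and network access available, `print("Predicted label:", classify(load_image()))` would run the whole pipeline. The printed label depends on the image and checkpoint chosen.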

Example output:
Predicted label: tabby, tabby cat

This demo can be used as a reference for anyone learning ViT or image classification with Hugging Face Transformers.

Additional Notes:

  • Requires torch, transformers, and PIL (installed via the Pillow package).
  • Can be extended to classify local images easily.
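One way the local-image extension could look is sketched below. It is not part of the proposed demo: the checkpoint name, the set of file suffixes, and the helper names `is_image_path`, `find_images`, and `classify_files` are all hypothetical.

```python
from pathlib import Path

# Assumed values, not taken from the issue.
MODEL_NAME = "google/vit-base-patch16-224"
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def is_image_path(path):
    """Return True if the file suffix looks like a supported image type."""
    return Path(path).suffix.lower() in IMAGE_SUFFIXES


def find_images(directory):
    """Return image files directly inside `directory`, sorted by path."""
    return sorted(
        p for p in Path(directory).iterdir() if p.is_file() and is_image_path(p)
    )


def classify_files(paths, model_name=MODEL_NAME):
    """Map each image path to its top-1 label using one batched forward pass."""
    paths = list(paths)
    if not paths:
        return {}

    # Deferred so the path helpers work without the heavy dependencies.
    import torch
    from PIL import Image
    from transformers import ViTForImageClassification, ViTImageProcessor

    processor = ViTImageProcessor.from_pretrained(model_name)
    model = ViTForImageClassification.from_pretrained(model_name)
    model.eval()

    # Opening from disk replaces the download step; the rest is unchanged.
    images = [Image.open(p).convert("RGB") for p in paths]
    inputs = processor(images=images, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits

    predicted = logits.argmax(dim=-1).tolist()
    return {str(p): model.config.id2label[i] for p, i in zip(paths, predicted)}
```

For example, `classify_files(find_images("photos"))` would label every supported image in a hypothetical `photos` directory. Batching all images into one call keeps the sketch short but may need chunking for large folders.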

Hacktoberfest 2025: ✅

Labels: enhancement (This PR modified some existing files)
