InternRobotics/InternVLA-A1

InternVLA-A1: Unifying Understanding, Generation, and Action for Robotic Manipulation


InternVLA-A1 is an end-to-end vision–language–action (VLA) framework that unifies understanding, generation, and action for robotic manipulation. It uses predictive imagination of how a task will evolve to guide execution, which improves manipulation in highly dynamic environments.

🔥 Highlights

  • Novel Model Architecture: A Mixture-of-Transformers architecture for unified understanding, generation, and action.
  • Hybrid Synthetic-Real Data Corpus: InternData-A1, a hybrid synthetic-real manipulation dataset that integrates 5 heterogeneous robots, 15 skills, and 200+ scenes, with an emphasis on multi-robot collaboration in dynamic scenarios.
  • Impressive Real-World Performance: InternVLA-A1 demonstrates strong effectiveness and generalization in highly dynamic scenarios, including grasping moving objects from conveyor belts and multi-robot collaboration.

🏆 Unified Understanding-Generation-Action Family

🤖 Real-World Robot Demonstrations

Package grabbing and flipping on a conveyor belt

default.mp4

The model handles packages of varying shapes on conveyor belts, tracking and predicting their trajectories in real time to achieve stable high-speed grasping, while adaptively flipping packages and identifying shipping information from delivery notes.

Multi-robot collaboration on long-horizon tasks in dynamic environments

multi-robot-long-horizon.mp4

The model swiftly identifies, locates, and grasps fast-moving ingredients according to task demands, showcasing its adaptability in complex environments.

🚀 Quick Start

Prerequisites

  • Python ≥ 3.10
  • torch ≥ 2.6.0
  • CUDA ≥ 12.4

Installation

# Clone repository
git clone https://github.com/InternRobotics/InternVLA-A1.git
cd InternVLA-A1

# Create environment
conda create -n internvla_a1 python=3.10
conda activate internvla_a1

# Install dependencies
pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 torchcodec==0.2.1 --index-url https://download.pytorch.org/whl/cu124

# Install other requirements
pip install -r requirements.txt

pip install numpy==1.26.4
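
After installation, it can be useful to confirm that the environment meets the prerequisites listed above. The following is a minimal sketch, not part of the repository: the helper names `parse_version`, `meets_minimum`, and `check_environment` are hypothetical, and the script assumes only that `torch` exposes `torch.__version__` and `torch.version.cuda`.

```python
import re
import sys


def parse_version(text):
    """Turn a string like '2.6.0+cu124' into (2, 6, 0)."""
    parts = []
    # Drop any local build tag after '+', then read the leading digits of each piece.
    for piece in str(text).split("+")[0].split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(version, minimum):
    """Return True if `version` is at least `minimum` (both dotted strings)."""
    have, want = parse_version(version), parse_version(minimum)
    # Pad the shorter tuple with zeros so '3.10' compares like '3.10.0'.
    width = max(len(have), len(want))
    have += (0,) * (width - len(have))
    want += (0,) * (width - len(want))
    return have >= want


def check_environment():
    """Collect (name, found, ok) rows for the README prerequisites."""
    rows = []
    py = ".".join(str(n) for n in sys.version_info[:3])
    rows.append(("python", py, meets_minimum(py, "3.10")))
    try:
        import torch
    except ImportError:
        rows.append(("torch", "not installed", False))
        rows.append(("cuda", "unknown", False))
        return rows
    rows.append(("torch", str(torch.__version__), meets_minimum(torch.__version__, "2.6.0")))
    cuda = torch.version.cuda  # None on CPU-only builds
    rows.append(("cuda", str(cuda), cuda is not None and meets_minimum(cuda, "12.4")))
    return rows


if __name__ == "__main__":
    for name, found, ok in check_environment():
        print(f"{name:<8}{found:<16}{'OK' if ok else 'CHECK'}")
```

Any row marked `CHECK` points to a prerequisite that is missing or older than the minimum stated in this README. Note that `torch.version.cuda` reports the CUDA version PyTorch was built against, not the driver installed on the machine.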

📄 License

This project is licensed under the MIT License.

🙏 Acknowledgments
