InternVLA-A1 is an end-to-end vision–language–action (VLA) framework that unifies understanding, generation, and action for robotic manipulation. It leverages predictive imagination of task evolution to guide execution, enabling more robust manipulation in highly dynamic environments.
- Novel Model Architecture: A Mixture-of-Transformers architecture for unified understanding, generation, and action (see the illustrative sketch after the resource list below).
- Hybrid Synthetic-Real Data Corpus: InternData-A1, a hybrid synthetic-real manipulation dataset integrating 5 heterogeneous robots, 15 skills, and 200+ scenes, with an emphasis on multi-robot collaboration in dynamic scenarios.
- Impressive Real-World Performance: InternVLA-A1 demonstrates strong effectiveness and generalization in highly dynamic scenarios, including dynamic grasping from conveyor belts and multi-robot collaboration.
- F1-VLA (F1 is the predecessor of InternVLA-A1): Paper | Code | Model
- InternVLA-A1: Code | Paper/Model (Scheduled for late September release)
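The exact architecture is described in the paper; the snippet below is only a minimal, hedged sketch of the Mixture-of-Transformers idea, assuming per-modality (understanding / generation / action) projections and feed-forward networks with self-attention shared over the concatenated token sequence. The class and modality names (`MoTBlock`, `und`/`gen`/`act`) are hypothetical and are not the repository's API.

```python
# Illustrative Mixture-of-Transformers block (sketch only, not the released code).
# Assumption: each modality keeps its own QKV/output/FFN parameters ("experts"),
# while attention is computed jointly over all modalities' tokens.
import torch
import torch.nn as nn


class MoTBlock(nn.Module):
    def __init__(self, dim: int, num_heads: int, modalities=("und", "gen", "act")):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        # Per-modality parameters.
        self.qkv = nn.ModuleDict({m: nn.Linear(dim, 3 * dim) for m in modalities})
        self.proj = nn.ModuleDict({m: nn.Linear(dim, dim) for m in modalities})
        self.ffn = nn.ModuleDict(
            {m: nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
             for m in modalities}
        )
        self.norm1 = nn.ModuleDict({m: nn.LayerNorm(dim) for m in modalities})
        self.norm2 = nn.ModuleDict({m: nn.LayerNorm(dim) for m in modalities})

    def forward(self, tokens: dict[str, torch.Tensor]) -> dict[str, torch.Tensor]:
        # tokens: {modality: (B, T_m, dim)}; attention is shared across modalities.
        qkv, lengths = [], {}
        for m, x in tokens.items():
            lengths[m] = x.shape[1]
            qkv.append(self.qkv[m](self.norm1[m](x)))
        q, k, v = torch.cat(qkv, dim=1).chunk(3, dim=-1)

        def split_heads(t):
            b, t_len, _ = t.shape
            return t.view(b, t_len, self.num_heads, self.head_dim).transpose(1, 2)

        # Joint attention over the concatenated (understanding, generation, action) tokens.
        attn = torch.nn.functional.scaled_dot_product_attention(
            split_heads(q), split_heads(k), split_heads(v)
        )
        attn = attn.transpose(1, 2).reshape(q.shape[0], q.shape[1], -1)

        out, offset = {}, 0
        for m, x in tokens.items():
            a = attn[:, offset: offset + lengths[m]]
            offset += lengths[m]
            h = x + self.proj[m](a)                      # modality-specific output projection
            out[m] = h + self.ffn[m](self.norm2[m](h))   # modality-specific FFN
        return out


# Example (shapes are arbitrary): block = MoTBlock(1024, 16)
# out = block({"und": u_tokens, "gen": g_tokens, "act": a_tokens})
```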
default.mp4
The model handles dynamically shaped packages on conveyor belts, tracking and predicting their trajectories in real time to achieve high-speed, stable grasping, while adaptively flipping packages and reading express delivery information from the waybills.
multi-robot-long-horizon.mp4
The model swiftly identifies, localizes, and grasps fast-moving ingredients according to task demands, showcasing its adaptability in complex environments.
- Python ≥ 3.10
- torch ≥ 2.6.0
- CUDA ≥ 12.4
# Clone repository
git clone https://github.com/InternRobotics/InternVLA-A1.git
# Create environment
conda create -n internvla_a1 python=3.10
conda activate internvla_a1
# Install dependencies
pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 torchcodec==0.2.1 --index-url https://download.pytorch.org/whl/cu124
# Install other requirements
pip install -r requirements.txt
pip install numpy==1.26.4
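After installation, a quick check like the one below can confirm that the pinned versions were resolved and that CUDA is visible; this is only an illustrative snippet, not a script shipped with the repository.

```python
# Environment sanity check (illustrative).
import numpy
import torch
import torchvision

print("torch:", torch.__version__)               # expect 2.6.0 (+cu124)
print("torchvision:", torchvision.__version__)   # expect 0.21.0
print("numpy:", numpy.__version__)               # expect 1.26.4
print("CUDA build:", torch.version.cuda)         # expect 12.4
print("CUDA available:", torch.cuda.is_available())
```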
This project is licensed under the MIT License.