OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling

Yang Zhou¹, Yifan Wang¹, Jianjun Zhou¹,², Wenzheng Chang¹, Haoyu Guo¹, Zizun Li¹, Kaijing Ma¹, Xinyue Li¹, Yating Wang¹, Haoyi Zhu¹, Mingyu Liu¹,², Dingning Liu¹, Jiange Yang¹, Zhoujie Fu¹, Junyi Chen¹, Chunhua Shen¹,², Jiangmiao Pang¹, Kaipeng Zhang¹, Tong He¹†
¹Shanghai AI Lab  ²ZJU
†Corresponding Author

Data Visualization

Acquisition & Annotation

Model Pipeline

Dataset Structure

Dataset | Domain | # Seq. | FPS | Resolution | # Frames | Depth / Camera / Text / Optical flow / Fg. masks
OmniWorld-Game | Simulator | 96K | 24 | 1280×720 | 18,515K | 🙂 🙂 🙂 🙂 🙂
AgiBot | Robot | 20K | 30 | 640×480 | 39,247K | 🙂 🙂
DROID | Robot | 35K | 60 | 1280×720 | 26,643K | 🙂 🙂 🙂 🙂
RH20T | Robot | 109K | 10 | 640×360 | 53,453K | 🙂 🙂 🙂
RH20T-Human | Human | 73K | 10 | 640×360 | 8,875K | 🙂
HOI4D | Human | 2K | 15 | 1920×1080 | 891K | 🙂 🙂 🙂 🙂
Epic-Kitchens | Human | 15K | 30 | 1280×720 | 3,635K | 🙂 🙂
Ego-Exo4D | Human | 4K | 30 | 1024×1024 | 9,190K | 🙂 🙂
HoloAssist | Human | 1K | 30 | 896×504 | 13,037K | 🙂 🙂 🙂
Assembly101 | Human | 4K | 60 | 1920×1080 | 110,831K | 🙂 🙂 🙂
EgoDex | Human | 242K | 30 | 1280×720 | 76,631K | 🙂
CityWalk | Internet | 7K | 30 | 1280×720 | 13,096K | 🙂

🙂 indicates a modality newly annotated by us; other symbols denote ground-truth data that already exists in the original dataset or mark missing modalities.
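
To give a rough sense of scale, the frame counts and frame rates in the table can be converted into approximate hours of footage. The following is a minimal sketch, not part of the OmniWorld tooling: the names ROWS, approx_hours, and hours are illustrative, and it assumes that every frame in a subset is recorded at the listed FPS and that the frame counts (rounded to thousands in the table) are close enough for an estimate.

```python
# Rows copied from the table above: (dataset, FPS, frame count in thousands).
ROWS = [
    ("OmniWorld-Game", 24, 18_515),
    ("AgiBot", 30, 39_247),
    ("DROID", 60, 26_643),
    ("RH20T", 10, 53_453),
    ("RH20T-Human", 10, 8_875),
    ("HOI4D", 15, 891),
    ("Epic-Kitchens", 30, 3_635),
    ("Ego-Exo4D", 30, 9_190),
    ("HoloAssist", 30, 13_037),
    ("Assembly101", 60, 110_831),
    ("EgoDex", 30, 76_631),
    ("CityWalk", 30, 13_096),
]


def approx_hours(fps, frames_k):
    """Approximate duration in hours, assuming a constant frame rate (an assumption)."""
    return frames_k * 1_000 / fps / 3600


# Map each dataset name to its estimated duration in hours.
hours = {name: approx_hours(fps, frames_k) for name, fps, frames_k in ROWS}

# Print the subsets from longest to shortest estimated duration.
for name, h in sorted(hours.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<15} {h:8.1f} h")
```

These figures are estimates only; they inherit the rounding of the table and say nothing about how the footage is split across sequences.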

Synthetic Dataset Comparison

Dataset | Scene Type | Motion | Resolution | # Frames | Depth / Camera / Text / Optical flow / Fg. masks
MPI Sintel | Mixed | Dynamic | 1024×436 | 1K |
FlyingThings++ | Outdoor | Dynamic | 960×540 | 28K |
TartanAir | Mixed | Dynamic | 640×480 | 1,000K |
BlendedMVS | Mixed | Static | 768×576 | 17K |
HyperSim | Indoor | Static | 1024×768 | 77K |
Dynamic Replica | Indoor | Dynamic | 1280×720 | 169K |
Spring | Mixed | Dynamic | 1920×1080 | 23K |
EDEN | Outdoor | Static | 640×480 | 300K |
PointOdyssey | Mixed | Dynamic | 960×540 | 216K |
SeKai-Game | Outdoor | Dynamic | 1920×1080 | 4,320K |
OmniWorld-Game (Ours) | Mixed | Dynamic | 1280×720 | 18,515K |

Comparison of OmniWorld-Game with existing synthetic datasets. OmniWorld-Game surpasses existing public synthetic datasets in modal diversity and data scale. In the modality columns, one symbol denotes available modalities and another denotes missing ones.
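
The data-scale claim can be checked directly against the frame counts listed in the comparison table. The sketch below is illustrative only: FRAMES_K, OURS, and the derived variable names are assumptions introduced here, and the frame counts are taken as given in the table (in thousands).

```python
# Frame counts in thousands, copied from the comparison table above.
FRAMES_K = {
    "MPI Sintel": 1,
    "FlyingThings++": 28,
    "TartanAir": 1_000,
    "BlendedMVS": 17,
    "HyperSim": 77,
    "Dynamic Replica": 169,
    "Spring": 23,
    "EDEN": 300,
    "PointOdyssey": 216,
    "SeKai-Game": 4_320,
    "OmniWorld-Game": 18_515,
}

OURS = "OmniWorld-Game"

# Every listed dataset except OmniWorld-Game.
others = {name: k for name, k in FRAMES_K.items() if name != OURS}

# The largest of the other datasets, and how OmniWorld-Game compares to it.
largest_other = max(others, key=others.get)
ratio_to_largest = FRAMES_K[OURS] / others[largest_other]

# How OmniWorld-Game compares to all other listed datasets combined.
ratio_to_sum = FRAMES_K[OURS] / sum(others.values())

print(f"Largest other dataset: {largest_other} ({others[largest_other]:,}K frames)")
print(f"{OURS} vs. {largest_other}: {ratio_to_largest:.2f}x")
print(f"{OURS} vs. all others combined: {ratio_to_sum:.2f}x")
```

This compares frame counts only; it does not account for resolution, scene diversity, or which modalities each dataset provides.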

Dataset Statistics

OmniWorld Compositional Distribution
OmniWorld-Game Internal Composition
Statistic Captions
Caption Tokens Distribution