

Action Recognition Datasets: "NTU RGB+D" Dataset and "NTU RGB+D 120" Dataset (also includes AUTH UAV Gesture Dataset: NTU 4-Class)

This page introduces two datasets: "NTU RGB+D" and "NTU RGB+D 120". "NTU RGB+D" contains 60 action classes and 56,880 video samples. "NTU RGB+D 120" extends "NTU RGB+D" by adding another 60 classes and another 57,600 video samples, i.e., "NTU RGB+D 120" has 120 classes and 114,480 samples in total.

Both datasets contain RGB videos, depth map sequences, 3D skeletal data, and infrared (IR) videos for each sample. Each dataset is captured by three Kinect V2 cameras concurrently.
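
The class and sample counts quoted on this page can be cross-checked with a few lines of Python. This is only an illustrative sketch: the names `NTU60`, `NTU120_ADDED`, and `combine` are hypothetical and are not part of any official dataset tooling; the numbers are the ones stated in the text.

```python
# Counts as stated on this page (not read from the dataset files).
NTU60 = {"classes": 60, "samples": 56_880}          # "NTU RGB+D"
NTU120_ADDED = {"classes": 60, "samples": 57_600}   # added by "NTU RGB+D 120"


def combine(base, added):
    """Sum the per-key counts of two datasets (hypothetical helper)."""
    return {key: base[key] + added[key] for key in base}


NTU120 = combine(NTU60, NTU120_ADDED)
print(NTU120)
```

The combined totals agree with the 120 classes and 114,480 samples given for "NTU RGB+D 120".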
