Dataset Structure

For ease of use, TensorBay defines a uniform dataset format. This topic explains the related concepts. The TensorBay dataset format looks like:

dataset
├── notes
├── catalog
│   ├── subcatalog
│   ├── subcatalog
│   └── ...
├── segment
│   ├── data
│   ├── data
│   └── ...
├── segment
│   ├── data
│   ├── data
│   └── ...
└── ...

dataset

Dataset is the topmost concept in the TensorBay dataset format. Each dataset includes notes, a catalog, and a number of segments.

The corresponding class of dataset is Dataset.

notes

Notes contains the basic information of a dataset, including:

  • the time continuity of the data inside the dataset

  • the fields of bin point cloud files inside the dataset

The corresponding class of notes is Notes.
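
As a rough illustration of the two pieces of information listed above, the sketch below models notes as a plain Python record. It is not the TensorBay Notes class: the name `NotesSketch`, the field names `is_continuous` and `bin_point_cloud_fields`, and the sample point cloud field names are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NotesSketch:
    """Hypothetical stand-in for dataset notes (not the TensorBay Notes class)."""

    # Whether the data inside the dataset is continuous in time.
    is_continuous: bool = False
    # Field names of bin point cloud files, or None if there are no such files.
    bin_point_cloud_fields: Optional[List[str]] = None


# A time-continuous point cloud dataset (the field names are illustrative only).
lidar_notes = NotesSketch(
    is_continuous=True,
    bin_point_cloud_fields=["X", "Y", "Z", "Intensity"],
)

# An image dataset with unordered samples simply keeps the defaults.
image_notes = NotesSketch()
```

Keeping both fields optional with defaults reflects that many datasets are neither time-continuous nor made of point clouds.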

catalog

Catalog stores label meta information. It collects all the labels corresponding to a dataset. There can be one or several subcatalogs (see Label Format) under one catalog. Each subcatalog stores the label meta information of only one label type, including whether the corresponding annotation has tracking information.

Here are some catalog examples of datasets with different label types and a dataset with tracking annotations (Table 6).

Table 6 Catalogs

Catalogs                     Description
---------------------------  ------------------------------------------------------------------
elpv Catalog                 The catalog of elpv Dataset, a dataset with Classification label.
BSTLD Catalog                The catalog of BSTLD Dataset, a dataset with Box2D label.
Neolix OD Catalog            The catalog of Neolix OD Dataset, a dataset with Box3D label.
Leeds Sports Pose Catalog    The catalog of Leeds Sports Pose Dataset, a dataset with Keypoints2D label.
NightOwls Catalog            The catalog of NightOwls Dataset, a dataset with tracking Box2D label.

Note that catalog is not needed if there is no label information in a dataset.
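
To make the relationship between a catalog and its subcatalogs more concrete, the sketch below represents a catalog as a plain Python dictionary with one entry per label type. The key names (`CLASSIFICATION`, `BOX2D`, `isTracking`, `categories`), the category values, and both helper functions are assumptions made for illustration; they may not match the catalog of any real dataset.

```python
from typing import Any, Dict, List

# Hypothetical catalog: one subcatalog per label type.
catalog: Dict[str, Dict[str, Any]] = {
    "CLASSIFICATION": {
        "categories": [{"name": "functional"}, {"name": "defective"}],
    },
    "BOX2D": {
        # Marks that annotations of this label type carry tracking information.
        "isTracking": True,
        "categories": [{"name": "pedestrian"}, {"name": "cyclist"}],
    },
}


def tracking_label_types(cat: Dict[str, Dict[str, Any]]) -> List[str]:
    """Return the label types whose subcatalog is flagged as tracking."""
    return [
        label_type
        for label_type, subcatalog in cat.items()
        if subcatalog.get("isTracking", False)
    ]


def category_names(cat: Dict[str, Dict[str, Any]], label_type: str) -> List[str]:
    """Return the category names stored in one subcatalog."""
    return [category["name"] for category in cat[label_type]["categories"]]


tracked = tracking_label_types(catalog)
classes = category_names(catalog, "CLASSIFICATION")
```

In this sketch a subcatalog without a tracking flag is treated as non-tracking, which is a design choice of the example rather than documented behavior.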

segment

There may be several parts in a dataset. In TensorBay format, each part of the dataset is stored in one segment. For example, all training samples of a dataset can be organized in a segment named “train”.

The corresponding class of segment is Segment.

data

Data is the structural level directly below segment. Each data object contains one dataset sample and its related labels, as well as any other information, such as a timestamp.

The corresponding class of data is Data.
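
The dataset, segment and data levels can be pictured as nested containers. The sketch below uses plain dataclasses rather than the TensorBay Dataset, Segment and Data classes; every class name, field name, method and file path in it is hypothetical and serves only to show how the levels nest.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class DataSketch:
    """One dataset sample with its labels and optional extra information."""

    path: str
    labels: Dict[str, Any] = field(default_factory=dict)
    timestamp: Optional[float] = None


@dataclass
class SegmentSketch:
    """One part of a dataset, for example the training split."""

    name: str
    data: List[DataSketch] = field(default_factory=list)


@dataclass
class DatasetSketch:
    """Topmost level: holds the segments, keyed by segment name."""

    name: str
    segments: Dict[str, SegmentSketch] = field(default_factory=dict)

    def create_segment(self, name: str) -> SegmentSketch:
        # Register a new, empty segment and hand it back for filling.
        segment = SegmentSketch(name)
        self.segments[name] = segment
        return segment


dataset = DatasetSketch("ExampleDataset")
train = dataset.create_segment("train")
train.data.append(
    DataSketch(
        "images/0001.png",
        labels={"CLASSIFICATION": {"category": "cat"}},
        timestamp=0.0,
    )
)
train.data.append(
    DataSketch("images/0002.png", labels={"CLASSIFICATION": {"category": "dog"}})
)

# Number of samples stored in each segment.
sample_counts = {name: len(seg.data) for name, seg in dataset.segments.items()}
```

Notes and catalog are left out of this sketch to keep the focus on how samples are grouped into segments.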