Dataset Management

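This topic describes dataset management, including:

  • Organize Dataset

  • Upload Dataset

  • Read Dataset

  • Update Dataset

  • Move and Copy

  • Merge Datasets

  • Get Label Statistics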

Organize Dataset

TensorBay SDK provides methods to organize a local dataset into the uniform TensorBay dataset structure. The typical steps to organize a local dataset are:

  • First, write a catalog to store all the label schema information of the dataset.

  • Second, write a dataloader to load the whole local dataset into a Dataset instance.


A catalog is needed only if there is label information inside the dataset.
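The two steps above can be sketched in plain Python. This is only an illustration under stated assumptions: the directory layout (one subdirectory per segment), the BOX2D catalog contents, and the helper names write_catalog and load_local_dataset are hypothetical, and plain dictionaries stand in for the SDK's Dataset and segment objects.

```python
import json
import tempfile
from pathlib import Path


def write_catalog(path: Path) -> dict:
    """Step 1: store the label schema as JSON (hypothetical BOX2D categories)."""
    catalog = {"BOX2D": {"categories": [{"name": "Red"}, {"name": "Green"}]}}
    path.write_text(json.dumps(catalog, indent=4))
    return catalog


def load_local_dataset(root: Path, catalog: dict) -> dict:
    """Step 2: walk the local files and group them into named segments."""
    dataset = {"name": root.name, "catalog": catalog, "segments": {}}
    for segment_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        # Each data item keeps its local path; labels would be attached here.
        dataset["segments"][segment_dir.name] = [
            {"local_path": str(image)} for image in sorted(segment_dir.glob("*.png"))
        ]
    return dataset


# Build a tiny fake dataset on disk so the sketch runs end to end.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "BSTLD"
    for segment, names in {"train": ["a.png", "b.png"], "test": ["c.png"]}.items():
        (root / segment).mkdir(parents=True)
        for name in names:
            (root / segment / name).touch()
    catalog = write_catalog(root / "catalog.json")
    dataset = load_local_dataset(root, catalog)

print({name: len(items) for name, items in dataset["segments"].items()})
```

In a real dataloader, the dictionaries would be replaced by the SDK's own dataset, segment and data objects, and the catalog would be loaded from the JSON file rather than passed around in memory.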

Take the Organization of BSTLD as an example.

Upload Dataset

For an organized local dataset (i.e. the initialized Dataset instance), users can:

  • Upload it to TensorBay.

  • Read it directly.

This section focuses on uploading. Uploading local datasets to TensorBay has several benefits:

  • REUSE: uploaded datasets can be reused without preprocessing again.

  • SHARING: uploaded datasets can be shared with your team or the community.

  • VISUALIZATION: uploaded datasets can be visualized without coding.

  • VERSION CONTROL: different versions of one dataset can be uploaded and controlled conveniently.


When uploading a dataset or data, if the remote path of a data item is the same as that of an existing item in the same segment, the existing item is replaced.
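The replacement rule behaves like assignment into a mapping keyed by remote path. The sketch below is a toy model, not the SDK's upload call: the upload function and the dictionary used as a segment are hypothetical and only illustrate the overwrite semantics.

```python
def upload(segment: dict, remote_path: str, content: bytes) -> None:
    """Store content under remote_path; an existing entry with that path is replaced."""
    segment[remote_path] = content


train = {}  # toy segment: remote path -> file content
upload(train, "0001.png", b"first version")
upload(train, "0002.png", b"another file")
upload(train, "0001.png", b"second version")  # same remote path, so the old data is replaced

print(sorted(train))
```

The segment still holds two items after the third call, and "0001.png" now refers to the newer content.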

Take the Upload Dataset of BSTLD as an example.

Read Dataset

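Two types of datasets can be read from TensorBay:

  • Datasets uploaded by yourself, as described in Upload Dataset.

  • Datasets uploaded by the community.
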

Before reading a dataset uploaded by the community, fork it first.


Visit the my datasets (or team datasets) panel of the TensorBay platform to see all datasets that can be read.

Take the Read Dataset of BSTLD as an example.

Update Dataset

Since TensorBay supports version control, users can update the dataset meta, notes, data and labels in a new commit of a dataset. Different versions of data and labels can therefore coexist in one dataset, which makes the dataset much easier to maintain.
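How versions coexist can be pictured with a toy model in which every commit is a frozen snapshot of an editable draft. The ToyDataset class and its commit and checkout methods below are hypothetical names chosen for this sketch; they are not the SDK's client API.

```python
import copy


class ToyDataset:
    """Hypothetical stand-in for a versioned dataset: commits are frozen snapshots."""

    def __init__(self) -> None:
        self.commits = []  # list of (title, snapshot) tuples
        self.draft = {}    # editable working copy: remote path -> label

    def commit(self, title: str) -> int:
        # Freeze the current draft; later edits do not touch this snapshot.
        self.commits.append((title, copy.deepcopy(self.draft)))
        return len(self.commits) - 1

    def checkout(self, revision: int) -> dict:
        # Return the snapshot recorded for the given revision index.
        return self.commits[revision][1]


dataset = ToyDataset()
dataset.draft["0001.png"] = {"category": "Red"}
v0 = dataset.commit("initial labels")

dataset.draft["0001.png"] = {"category": "Green"}  # update an existing label
dataset.draft["0002.png"] = {"category": "Red"}    # add new data
v1 = dataset.commit("fix label, add data")

print(dataset.checkout(v0), dataset.checkout(v1))
```

Both revisions remain readable after the update, which is the property that keeps older experiments reproducible.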

Please see Update dataset example for more details.

Move and Copy

TensorBay supports four methods to copy or move data in datasets:

  • copy segments

  • copy data

  • move segments

  • move data

Copying is supported within a dataset or between datasets.

Moving is supported only within one dataset.


The target dataset of a copy or move operation must be in draft status.
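The three rules (copy within or between datasets, move within one dataset only, target in draft status) can be captured in a small toy model. ToyDataset, copy_segment and move_segment here are hypothetical helpers written for this sketch and should not be read as the SDK's signatures.

```python
import copy


class ToyDataset:
    """Hypothetical stand-in for a dataset; not the SDK's client class."""

    def __init__(self, name: str, is_draft: bool = True) -> None:
        self.name = name
        self.is_draft = is_draft
        self.segments = {}  # segment name -> {remote path: data}


def copy_segment(source: ToyDataset, source_name: str,
                 target: ToyDataset, target_name: str) -> None:
    """Copy a segment within one dataset (source is target) or between two."""
    if not target.is_draft:
        raise RuntimeError("the target dataset must be in draft status")
    target.segments[target_name] = copy.deepcopy(source.segments[source_name])


def move_segment(dataset: ToyDataset, source_name: str, target_name: str) -> None:
    """Move (rename) a segment; only one dataset is involved by construction."""
    if not dataset.is_draft:
        raise RuntimeError("the target dataset must be in draft status")
    dataset.segments[target_name] = dataset.segments.pop(source_name)


first = ToyDataset("first")
first.segments["train"] = {"0001.png": "label-a"}
second = ToyDataset("second")
committed = ToyDataset("committed", is_draft=False)

copy_segment(first, "train", second, "train")      # copy between datasets
copy_segment(first, "train", first, "train_copy")  # copy within one dataset
move_segment(first, "train_copy", "val")           # move within one dataset

try:
    copy_segment(first, "train", committed, "train")  # target is not a draft
except RuntimeError as error:
    rejected = str(error)
```

Because move_segment accepts a single dataset, a cross-dataset move cannot even be expressed, which mirrors the restriction stated above.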

Please see Move and copy example for more details.

Merge Datasets

Since TensorBay supports copying between different datasets, users can use this operation to merge datasets.
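Merging by copying amounts to copying every segment of each source dataset into one target. The sketch below is a toy model with plain dictionaries; the merge helper is hypothetical, and prefixing segment names with the source dataset name is just one assumed way to avoid name collisions.

```python
import copy


def merge(target: dict, *sources: tuple) -> dict:
    """Copy every segment of each (dataset_name, segments) source into target."""
    for dataset_name, segments in sources:
        for segment_name, data in segments.items():
            # Prefix with the dataset name so equally named segments do not collide.
            target[f"{dataset_name}_{segment_name}"] = copy.deepcopy(data)
    return target


dataset_a = ("a", {"train": ["a1.png", "a2.png"], "test": ["a3.png"]})
dataset_b = ("b", {"train": ["b1.png"]})

merged = merge({}, dataset_a, dataset_b)
print(sorted(merged))
```

The sources are left untouched because each segment is deep-copied rather than shared.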

Please see Merge Datasets example for more details.

Get Label Statistics

TensorBay supports getting the label statistics of a dataset.
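To make the idea of label statistics concrete, the sketch below counts categories over a handful of toy labels. The label layout, the label_statistics helper and the shape of the returned dictionary are assumptions for illustration; the statistics returned by TensorBay may be structured differently.

```python
from collections import Counter


def label_statistics(data_labels: list) -> dict:
    """Count how often each category occurs across all box labels."""
    counts = Counter()
    for data in data_labels:
        for box in data.get("box2d", []):
            counts[box["category"]] += 1
    return {"BOX2D": {"quantity": sum(counts.values()), "categories": dict(counts)}}


# Toy labels for three data items; the last item has no boxes.
labels = [
    {"box2d": [{"category": "Red"}, {"category": "Green"}]},
    {"box2d": [{"category": "Red"}]},
    {"box2d": []},
]

stats = label_statistics(labels)
print(stats)
```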

Please see Get Label Statistics example for more details.