Dataset Management#

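This topic describes dataset management, including:

  • Organize Dataset

  • Upload Dataset

  • Read Dataset

  • Update Dataset

  • Move and Copy

  • Merge Datasets

  • Get Label Statistics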

Organize Dataset#

The TensorBay SDK provides methods to organize local datasets into the uniform TensorBay dataset structure. The typical steps to organize a local dataset are:

  • First, write a catalog to store all the label schema information of the dataset.

  • Second, write a dataloader to load the whole local dataset into a Dataset instance.


A catalog is needed only if there is label information inside the dataset.
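The two steps above can be sketched with plain Python. This is a toy model, not the TensorBay SDK API: `CATALOG`, `load_local_dataset`, the category names, and the directory layout (one subdirectory per segment) are all hypothetical stand-ins chosen for illustration.

```python
import os
import tempfile

# Hypothetical stand-in for a catalog: the label schema of the dataset.
CATALOG = {"BOX2D": {"categories": ["cat", "dog"]}}


def load_local_dataset(root):
    """Hypothetical dataloader: treat each subdirectory of `root` as a segment.

    Returns {segment name: [file paths]}, sorted for reproducibility.
    """
    dataset = {}
    for segment_name in sorted(os.listdir(root)):
        segment_dir = os.path.join(root, segment_name)
        if not os.path.isdir(segment_dir):
            continue  # skip stray files at the top level
        dataset[segment_name] = [
            os.path.join(segment_dir, file_name)
            for file_name in sorted(os.listdir(segment_dir))
        ]
    return dataset


# Build a tiny throwaway directory tree so the sketch runs anywhere.
files = [("train", "0001.png"), ("train", "0002.png"), ("test", "0003.png")]
with tempfile.TemporaryDirectory() as root:
    for segment_name, file_name in files:
        os.makedirs(os.path.join(root, segment_name), exist_ok=True)
        open(os.path.join(root, segment_name, file_name), "wb").close()
    dataset = load_local_dataset(root)
    segment_sizes = {name: len(paths) for name, paths in dataset.items()}

print(segment_sizes)
```

In the real SDK, the dataloader would return a Dataset instance rather than a plain dictionary.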

For a complete example, see the Organization of BSTLD.

Upload Dataset#

For an organized local dataset (i.e. the initialized Dataset instance), users can:

  • Upload it to TensorBay.

  • Read it directly.

This section mainly discusses uploading. Uploading local datasets to TensorBay has several benefits:

  • REUSE: uploaded datasets can be reused without preprocessing again.

  • SHARING: uploaded datasets can be shared with your team or the community.

  • VISUALIZATION: uploaded datasets can be visualized without coding.

  • VERSION CONTROL: different versions of one dataset can be uploaded and controlled conveniently.


When uploading a dataset or individual data, if the remote path of a piece of data is the same as that of existing data in the same segment, the existing data will be replaced.
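The replacement rule can be pictured as a mapping keyed by remote path. The sketch below is a toy model under that assumption, not the SDK's upload call: the `upload` helper and the plain dictionary standing in for a segment are hypothetical.

```python
def upload(segment, remote_path, content):
    """Store `content` under `remote_path`; return True if older data was replaced."""
    replaced = remote_path in segment
    segment[remote_path] = content
    return replaced


train = {}  # toy segment: {remote path: content}
first = upload(train, "0001.png", b"old bytes")     # new remote path
second = upload(train, "0001.png", b"new bytes")    # same remote path, same segment
third = upload(train, "0002.png", b"other bytes")   # different remote path

print(first, second, third, sorted(train))
```

Because the second call reuses the remote path `0001.png` within the same segment, only the newer content survives.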

For a complete example, see the Upload Dataset of BSTLD.

Read Dataset#

Two types of datasets can be read from TensorBay:

  • Datasets uploaded by yourself or your team.

  • Datasets uploaded by the community.


Before reading a dataset uploaded by the community, fork it first.


Visit the my datasets (or team datasets) panel of the TensorBay platform to check all datasets that can be read.

For a complete example, see the Read Dataset of BSTLD.

Update Dataset#

Since TensorBay supports version control, users can update the meta, notes, data and labels of a dataset in a new commit. Different versions of data and labels can therefore coexist in one dataset, which greatly simplifies dataset maintenance.

Please see Update dataset example for more details.

Move and Copy#

TensorBay supports four methods to copy or move data in datasets:

  • copy segments

  • copy data

  • move segments

  • move data

Copying is supported within a dataset or between datasets.

Moving is only supported within one dataset.


The target dataset of a copy or move operation must be in draft status.
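The rules above (copying works within or across datasets, moving works only within one dataset, and the target must be a draft) can be modeled as follows. `ToyDataset` and its method names are hypothetical and chosen for illustration only; they make no claim about the SDK's actual signatures.

```python
class ToyDataset:
    """Hypothetical in-memory dataset: {segment name: {remote path: content}}."""

    def __init__(self, segments, is_draft=False):
        self.segments = {name: dict(data) for name, data in segments.items()}
        self.is_draft = is_draft

    def _require_draft(self):
        if not self.is_draft:
            raise RuntimeError("target dataset must be in draft status")

    def copy_segment(self, source_name, target_name, source=None):
        """Copy a segment from `source` (default: this dataset) into this one."""
        self._require_draft()
        source = self if source is None else source
        self.segments[target_name] = dict(source.segments[source_name])

    def move_segment(self, source_name, target_name):
        """Rename a segment; there is no `source` argument because moving
        is limited to a single dataset."""
        self._require_draft()
        self.segments[target_name] = self.segments.pop(source_name)


a = ToyDataset({"train": {"0001.png": b"x"}}, is_draft=True)
b = ToyDataset({"val": {"0002.png": b"y"}}, is_draft=True)
a.copy_segment("val", "val", source=b)  # copy between datasets
a.move_segment("train", "training")     # move within one dataset

committed = ToyDataset({"train": {}})   # not a draft
try:
    committed.move_segment("train", "other")
    rejected = False
except RuntimeError:
    rejected = True

print(sorted(a.segments), sorted(b.segments), rejected)
```

Copying leaves the source segment in place, while moving removes it from its old name.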

Please see Move and copy example for more details.

Merge Datasets#

Since TensorBay supports copying between different datasets, users can use it to merge datasets.
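Merging by copying can be sketched as copying every segment of each source dataset into one target. This is a toy model with hypothetical names; in particular, prefixing segment names with the dataset name is an assumption of the sketch to avoid name clashes, not documented TensorBay behavior.

```python
def merge(named_datasets):
    """Copy every segment of every source dataset into one new target.

    `named_datasets` maps a dataset name to {segment name: {remote path: content}}.
    Segment names are prefixed with the dataset name (an assumption of this
    sketch) so that equally named segments do not overwrite each other.
    """
    merged = {}
    for dataset_name, segments in named_datasets.items():
        for segment_name, data in segments.items():
            # dict(data) copies the mapping, leaving the source untouched.
            merged[f"{dataset_name}_{segment_name}"] = dict(data)
    return merged


sources = {
    "A": {"train": {"0001.png": b"a"}},
    "B": {"train": {"0001.png": b"b"}, "test": {"0002.png": b"c"}},
}
merged = merge(sources)

print(sorted(merged))
```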

Please see Merge Datasets example for more details.

Get Label Statistics#

TensorBay supports getting the label statistics of a dataset.
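The source does not specify what the statistics contain. As a minimal sketch, assuming they are counts of labels per category grouped by label type, the computation could look like this. The `label_statistics` helper, the label dictionaries, and the category names are hypothetical.

```python
from collections import Counter

# Hypothetical flat list of labels gathered from a dataset.
labels = [
    {"type": "BOX2D", "category": "cat"},
    {"type": "BOX2D", "category": "dog"},
    {"type": "BOX2D", "category": "cat"},
    {"type": "CLASSIFICATION", "category": "cat"},
]


def label_statistics(labels):
    """Count labels per category, grouped by label type."""
    counters = {}
    for label in labels:
        counters.setdefault(label["type"], Counter())[label["category"]] += 1
    # Convert the Counters to plain dicts for easy printing and comparison.
    return {label_type: dict(counter) for label_type, counter in counters.items()}


stats = label_statistics(labels)
print(stats)
```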

Please see Get Label Statistics example for more details.