Dataset Management

This topic describes dataset management, including:

  • Organize Dataset

  • Upload Dataset

  • Read Dataset

  • Update Dataset

Organize Dataset

TensorBay SDK supports methods to organize local datasets into the uniform TensorBay dataset structure. The typical steps to organize a local dataset are:

  • First, write a catalog to store all the label schema information of the dataset.

  • Second, write a dataloader to load the whole local dataset into a Dataset instance.


A catalog is needed only if there is label information inside the dataset.
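To make the two steps concrete, here is a minimal, SDK-free sketch of what a catalog and a dataloader do. It does not use the TensorBay SDK: the `Data` and `Dataset` classes, the `CATALOG` dict, the illustrative category names, the `<root>/<segment>/<file>` directory layout and the `load_dataset` function are all hypothetical stand-ins chosen for illustration. The catalog declares the label schema once, and the dataloader walks the local files, wraps each one as a data item, and attaches labels that conform to that schema.

```python
import os
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Step 1 (hypothetical): a catalog declaring the label schema for the whole dataset.
# The category names are illustrative only.
CATALOG = {"BOX2D": {"categories": ["red", "yellow", "green"]}}


@dataclass
class Data:
    """Hypothetical stand-in for one data item (a file plus its labels)."""
    path: str
    label: Dict[str, list] = field(default_factory=dict)


@dataclass
class Dataset:
    """Hypothetical stand-in for an organized dataset: named segments of Data."""
    name: str
    catalog: Optional[dict] = None
    segments: Dict[str, List[Data]] = field(default_factory=dict)


def load_dataset(root: str, annotations: Dict[str, list]) -> Dataset:
    """Step 2 (hypothetical dataloader): walk <root>/<segment>/<file> into a Dataset."""
    dataset = Dataset(name=os.path.basename(os.path.normpath(root)))
    # A catalog is only attached when the dataset actually carries labels.
    if annotations:
        dataset.catalog = CATALOG
    for segment_name in sorted(os.listdir(root)):
        segment_dir = os.path.join(root, segment_name)
        if not os.path.isdir(segment_dir):
            continue  # skip stray files at the top level
        segment: List[Data] = []
        for filename in sorted(os.listdir(segment_dir)):
            data = Data(path=os.path.join(segment_dir, filename))
            boxes = annotations.get(filename)
            if boxes is not None:
                # Reject labels whose category is not declared in the catalog.
                allowed = dataset.catalog["BOX2D"]["categories"]
                for box in boxes:
                    if box["category"] not in allowed:
                        raise ValueError(f"unknown category {box['category']!r}")
                data.label["box2d"] = boxes
            segment.append(data)
        dataset.segments[segment_name] = segment
    return dataset
```

Keeping the schema in one catalog, separate from the per-file labels, lets the loader validate every label against a single declaration, and it explains why a dataset without labels needs no catalog at all.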

See the Organization of BSTLD for an example.

Upload Dataset

For an organized local dataset (i.e. the initialized Dataset instance), users can:

  • Upload it to TensorBay.

  • Read it directly.

This section mainly discusses uploading. Uploading local datasets to TensorBay has several benefits:

  • REUSE: uploaded datasets can be reused without preprocessing again.

  • SHARING: uploaded datasets can be shared with your team or the community.

  • VISUALIZATION: uploaded datasets can be visualized without coding.

  • VERSION CONTROL: different versions of one dataset can be uploaded and controlled conveniently.

See the Uploading of BSTLD for an example.

Read Dataset

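Two types of datasets can be read from TensorBay:

  • Datasets uploaded by yourself or your team, as described in Upload Dataset.

  • Datasets uploaded by the community.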


Before reading a dataset uploaded by the community, fork it first.


Visit the My Datasets (or Team Datasets) panel of the TensorBay platform to check all the datasets that can be read.

See the Reading of BSTLD for an example.

Update Dataset

Since TensorBay supports version control, users can update the data and labels of a dataset and save the changes as a new commit. Different versions of the data and labels can therefore coexist in one dataset, which greatly simplifies dataset maintenance.

See the Update Dataset example for more details.
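The idea that committed versions coexist can be illustrated with a small in-memory model. This is a minimal sketch under stated assumptions, not the TensorBay API: the `VersionedDataset` and `Commit` classes and the `add_or_update`, `commit` and `checkout` methods are hypothetical names. Changes are staged in a draft, each commit stores a snapshot of file-to-label mappings together with a pointer to its parent, and any earlier commit can still be read after later updates.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Commit:
    """Hypothetical commit record: an id, a message, its parent and a snapshot."""
    commit_id: int
    message: str
    parent: Optional[int]
    snapshot: Dict[str, str]  # file name -> label as of this commit


class VersionedDataset:
    """Hypothetical in-memory model of commit-based dataset versioning."""

    def __init__(self) -> None:
        self.commits: List[Commit] = []
        self._draft: Dict[str, str] = {}  # working copy, starts from the latest commit

    def add_or_update(self, name: str, label: str) -> None:
        # Changes are staged in the draft until the next commit.
        self._draft[name] = label

    def commit(self, message: str) -> int:
        parent = self.commits[-1].commit_id if self.commits else None
        # Copy the draft so later edits cannot alter this commit's snapshot.
        new = Commit(len(self.commits), message, parent, dict(self._draft))
        self.commits.append(new)
        return new.commit_id

    def checkout(self, commit_id: int) -> Dict[str, str]:
        # Every earlier version stays readable after later commits.
        return dict(self.commits[commit_id].snapshot)


ds = VersionedDataset()
ds.add_or_update("0001.png", "red")
v1 = ds.commit("initial labels")
ds.add_or_update("0001.png", "green")  # fix a label
ds.add_or_update("0002.png", "yellow")  # add new data
v2 = ds.commit("fix label, add image")
```

Storing full snapshots keeps the sketch short; a real version-control backend would more likely store only the differences between commits.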