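This topic describes dataset management, including organizing, uploading, reading, and updating datasets, moving and copying data, merging datasets, and getting label statistics.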
TensorBay SDK supports methods to organize local datasets into the uniform TensorBay dataset structure. The typical steps to organize a local dataset are:
First, write a catalog to store all the label schema information inside a dataset.
Second, write a dataloader to load the whole local dataset into a Dataset instance.
A catalog is needed only if there is label information inside the dataset.
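The two steps above can be sketched in plain Python. This is a simplified model and not the TensorBay SDK API: the `CATALOG` dict, the `Data` and `Segment` classes, the `load_local_dataset` function, and the file names and categories are all hypothetical, chosen only to show how a catalog and a dataloader relate.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Step 1: a catalog describing the label schema (hypothetical layout).
CATALOG = {"BOX2D": {"categories": ["Red", "Green", "Yellow"]}}


@dataclass
class Data:
    """One file plus its labels."""
    path: str
    labels: List[dict] = field(default_factory=list)


@dataclass
class Segment:
    """A named subset of a dataset, such as "train" or "test"."""
    name: str
    data: List[Data] = field(default_factory=list)


def load_local_dataset(annotations: Dict[str, Dict[str, List[dict]]]) -> Dict[str, Segment]:
    """Step 2: a dataloader that turns raw annotations into segments."""
    allowed = set(CATALOG["BOX2D"]["categories"])
    dataset: Dict[str, Segment] = {}
    for segment_name, files in annotations.items():
        segment = Segment(segment_name)
        for path, boxes in files.items():
            # Reject labels that the catalog does not describe.
            for box in boxes:
                if box["category"] not in allowed:
                    raise ValueError(f"unknown category: {box['category']}")
            segment.data.append(Data(path, list(boxes)))
        dataset[segment_name] = segment
    return dataset


# Hypothetical raw annotations: segment name -> file -> list of boxes.
raw = {
    "train": {"a.png": [{"category": "Red", "box": (0, 0, 10, 20)}]},
    "test": {"b.png": []},
}
dataset = load_local_dataset(raw)
```

In this model the catalog only matters when labels are present: the unlabeled file in the "test" segment loads without consulting it.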
Take the Organization of BSTLD as an example.
For an organized local dataset (i.e. the initialized Dataset instance), users can:
Upload it to TensorBay.
Read it directly.
This section mainly discusses the uploading operation. Uploading local datasets to TensorBay has several benefits:
REUSE: uploaded datasets can be reused without preprocessing again.
SHARING: uploaded datasets can be shared with your team or the community.
VISUALIZATION: uploaded datasets can be visualized without coding.
VERSION CONTROL: different versions of one dataset can be uploaded and controlled conveniently.
When uploading a dataset or data, if the remote path of a file is the same as that of existing data in the same segment, the old data will be replaced.
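The replacement rule can be illustrated with a small in-memory model. This is only a sketch of the behavior described above, not the TensorBay SDK: the `remote` dict and the `upload` function are hypothetical stand-ins for remote storage and the upload call.

```python
from typing import Dict, Tuple

# Hypothetical stand-in for remote storage,
# keyed by (segment name, remote path).
remote: Dict[Tuple[str, str], bytes] = {}


def upload(segment: str, remote_path: str, content: bytes) -> bool:
    """Store content and report whether older data was replaced."""
    key = (segment, remote_path)
    replaced = key in remote
    remote[key] = content
    return replaced


first = upload("train", "0001.png", b"old")
second = upload("train", "0001.png", b"new")  # same segment, same remote path
other = upload("test", "0001.png", b"kept")   # same path, different segment
```

Only the second call overwrites anything, because the key combines the segment name with the remote path.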
Take the Upload Dataset of BSTLD as an example.
Two types of datasets can be read from TensorBay:
Datasets uploaded by yourself as mentioned in Upload Dataset.
Datasets shared by the community on the Open Datasets platform.
Before reading a dataset uploaded by the community, fork it first.
Visit the my datasets (or team datasets) panel of the TensorBay platform to check all datasets that can be read.
Take the Read Dataset of BSTLD as an example.
Since TensorBay supports version control, users can update dataset meta, notes, data and labels in a new commit of a dataset. Different versions of data and labels can therefore coexist in one dataset, which greatly simplifies dataset maintenance.
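As a rough illustration of how versions can coexist, the toy model below treats each commit as a frozen snapshot of a draft. It is an assumption made for explanation only and does not reflect how TensorBay stores commits. The `VersionedDataset` class and its attributes are hypothetical names.

```python
from copy import deepcopy
from typing import Dict, List


class VersionedDataset:
    """Toy model: each commit is a snapshot of remote path -> label."""

    def __init__(self) -> None:
        self.commits: List[Dict[str, str]] = []
        self.draft: Dict[str, str] = {}

    def commit(self) -> int:
        """Freeze the current draft and return the new commit index."""
        self.commits.append(deepcopy(self.draft))
        return len(self.commits) - 1


ds = VersionedDataset()
ds.draft["0001.png"] = "Red"
v0 = ds.commit()
ds.draft["0001.png"] = "Green"  # update the label in the draft
v1 = ds.commit()
```

Both `ds.commits[v0]` and `ds.commits[v1]` remain readable afterwards, each with its own label for the same file.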
Please see Update dataset example for more details.
Move and Copy
TensorBay supports four methods to copy or move data in datasets:
Copying is supported within a dataset or between datasets.
Moving is only supported within one dataset.
The target dataset of copying and moving must be in draft status.
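The three rules above can be expressed as a small model. This is a sketch under stated assumptions rather than the TensorBay SDK API: `ToyDataset`, `copy_segment`, and `move_segment` are hypothetical names, and draft status is reduced to a boolean flag.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ToyDataset:
    name: str
    is_draft: bool = False
    # segment name -> {remote path -> content}
    segments: Dict[str, Dict[str, bytes]] = field(default_factory=dict)


def copy_segment(source: ToyDataset, target: ToyDataset, name: str, new_name: str) -> None:
    """Copy a segment within one dataset or into another dataset."""
    if not target.is_draft:
        raise RuntimeError("target dataset must be in draft status")
    target.segments[new_name] = dict(source.segments[name])


def move_segment(dataset: ToyDataset, name: str, new_name: str) -> None:
    """Move a segment; the signature allows only a single dataset."""
    if not dataset.is_draft:
        raise RuntimeError("target dataset must be in draft status")
    dataset.segments[new_name] = dataset.segments.pop(name)


a = ToyDataset("A", is_draft=True, segments={"train": {"0001.png": b"x"}})
b = ToyDataset("B", is_draft=True)
copy_segment(a, b, "train", "train_from_A")  # copy between datasets
move_segment(a, "train", "training")         # move within one dataset
```

Because `copy_segment` accepts a separate source and target, repeated copies into one draft dataset are enough to combine several datasets in this model.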
Please see Move and copy example for more details.
Since TensorBay supports copying between different datasets, users can use it to merge datasets.
Please see Merge Datasets example for more details.
Get Label Statistics
TensorBay supports getting the label statistics of a dataset.
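To make the idea concrete, the sketch below counts labels per category across segments using only the Python standard library. It does not call the TensorBay SDK. The `labels` mapping and the `label_statistics` function are hypothetical, and the statistics TensorBay returns may be structured differently.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical labels: segment name -> one category name per label.
labels: Dict[str, List[str]] = {
    "train": ["Red", "Green", "Red"],
    "test": ["Green"],
}


def label_statistics(labels_by_segment: Dict[str, List[str]]) -> Dict[str, int]:
    """Count how many labels fall into each category across all segments."""
    counts: Counter = Counter()
    for categories in labels_by_segment.values():
        counts.update(categories)
    return dict(counts)


stats = label_statistics(labels)
```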
Please see Get Label Statistics example for more details.