Update Dataset

This topic describes how to update datasets, including the dataset meta, notes, labels and data.

The following scenario demonstrates how to update data and labels:

  1. Upload a dataset.

  2. Update the dataset’s labels.

  3. Add some data to the dataset.

Please see Upload Dataset for more information about the first step.
The last two steps will be introduced in detail.

Update Dataset Meta

TensorBay SDK supports a method to update dataset meta info.

gas.update_dataset("DATASET_NAME", alias="alias", is_public=True)

Update Dataset Notes

TensorBay SDK supports a method to update dataset notes. A dataset can be changed into a continuous dataset by setting is_continuous to True.

dataset_client = gas.get_dataset("DATASET_NAME")
dataset_client.create_draft("draft-1")
dataset_client.update_notes(is_continuous=True)
dataset_client.commit("update notes")
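
The create-draft, change, commit sequence above can be wrapped in a small helper. The following is a minimal sketch and not part of TensorBay SDK: the draft context manager and the RecordingClient stand-in are hypothetical names, and the stand-in only records calls so that the example runs without a TensorBay account. It assumes only that the client offers the create_draft and commit calls shown above.

```python
from contextlib import contextmanager


@contextmanager
def draft(dataset_client, draft_title, commit_title):
    """Open a draft, yield the client, and commit only if the body succeeds."""
    dataset_client.create_draft(draft_title)
    yield dataset_client
    # Reached only when the with-body raised no exception.
    dataset_client.commit(commit_title)


class RecordingClient:
    """Hypothetical stand-in for a dataset client; it only records calls."""

    def __init__(self):
        self.calls = []

    def create_draft(self, title):
        self.calls.append(("create_draft", title))

    def update_notes(self, **notes):
        self.calls.append(("update_notes", notes))

    def commit(self, title):
        self.calls.append(("commit", title))


client = RecordingClient()
with draft(client, "draft-1", "update notes") as dc:
    dc.update_notes(is_continuous=True)
```

If the body of the with statement raises, the commit call is skipped, so a failed update does not produce a commit. What happens to the open draft in that case is left to the caller.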

Update Label

TensorBay SDK supports methods to upload new labels that overwrite the previous labels.

Get a previously uploaded dataset and create a draft:

dataset_client = gas.get_dataset("DATASET_NAME")
dataset_client.create_draft("draft-2")

Update the catalog if needed:

dataset_client.upload_catalog(dataset.catalog)

Overwrite the previous labels on the dataset with the new labels:

for segment in dataset:
    segment_client = dataset_client.get_segment(segment.name)
    for data in segment:
        segment_client.upload_label(data)

Commit the dataset:

dataset_client.commit("update labels")

Now the dataset is committed with a version that includes the new labels.
Users can switch between different commits to use different versions of the labels.

Important

Uploading labels will overwrite all types of labels in the data.
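
To illustrate what overwriting all label types means, here is a toy model in plain Python. It does not call TensorBay SDK: the upload_label function and the stored dictionary are hypothetical stand-ins, and the label type names are examples only. The sketch assumes the statement above means that the whole label record of a data item is replaced, rather than merged per label type.

```python
def upload_label(stored_labels, remote_path, new_label):
    """Replace the whole label record of one data item (no per-type merge)."""
    stored_labels[remote_path] = dict(new_label)


# Labels already stored for "a.png": two label types.
stored = {"a.png": {"classification": "cat", "box2d": [(0, 0, 10, 10)]}}

# The new label carries only one label type.
upload_label(stored, "a.png", {"box2d": [(1, 1, 8, 8)]})
```

Under this assumption, the previous classification label is gone after the upload, so a new label should carry every label type that is meant to be kept.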

Update Data

Add new data to the dataset.

gas.upload_dataset(dataset, jobs=8, skip_uploaded_files=True)

Set skip_uploaded_files=True to skip data that has already been uploaded.

Overwrite uploaded data in the dataset.

gas.upload_dataset(dataset, jobs=8)

The default value of skip_uploaded_files is False. Keep the default to overwrite uploaded data.

Note

The segment name and data name are used to identify data: if two data items have the same segment name and the same data name, they are regarded as the same data.

Important

Uploading a dataset will only add or overwrite data. Data uploaded before will not be deleted.
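
The upload rules above can be summarized with a toy model in plain Python. It does not call TensorBay SDK: the upload function and the dictionaries are hypothetical stand-ins, in which each data item is keyed by its segment name and data name, mirroring how data is identified.

```python
def upload(remote, local, skip_uploaded_files=False):
    """Model an upload: remote and local map (segment, name) -> content."""
    for key, content in local.items():
        if skip_uploaded_files and key in remote:
            continue  # already uploaded: leave the remote copy untouched
        remote[key] = content  # add new data or overwrite existing data
    return remote


remote = {("train", "a.png"): "v1", ("train", "b.png"): "v1"}
local = {("train", "a.png"): "v2", ("train", "c.png"): "v2"}

# Each call works on a copy so that both modes start from the same state.
skipped = upload(dict(remote), local, skip_uploaded_files=True)
overwritten = upload(dict(remote), local)
```

In both modes the item b.png, which exists remotely but not locally, is kept. Only the handling of a.png differs: it stays at the old content when uploaded files are skipped and takes the new content otherwise.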

Delete a segment by its segment name.

dataset_client.create_draft("draft-3")
dataset_client.delete_segment("SegmentName")

Delete data by its remote path.

segment_client = dataset_client.get_segment("SegmentName")
segment_client.delete_data("a.png")
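
To delete several data items, the call above can be repeated in a loop. The following is a minimal sketch: delete_data_by_paths and RecordingSegmentClient are hypothetical names that are not part of TensorBay SDK, and the stand-in client only keeps remote paths in a set so that the example runs offline. It assumes only that the segment client offers the delete_data call shown above, taking one remote path per call.

```python
def delete_data_by_paths(segment_client, remote_paths):
    """Call delete_data once per remote path and return the paths processed."""
    deleted = []
    for remote_path in remote_paths:
        segment_client.delete_data(remote_path)
        deleted.append(remote_path)
    return deleted


class RecordingSegmentClient:
    """Hypothetical stand-in that stores remote paths in a set."""

    def __init__(self, remote_paths):
        self.remote_paths = set(remote_paths)

    def delete_data(self, remote_path):
        self.remote_paths.discard(remote_path)


segment_client = RecordingSegmentClient(["a.png", "b.png", "c.png"])
deleted = delete_data_by_paths(segment_client, ["a.png", "c.png"])
```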

For a fusion dataset, TensorBay SDK supports deleting a frame by its id.

segment_client.delete_frame("00000000003W09TEMC1HXYMC74")