Update Dataset#

This topic describes how to update datasets, including updating the dataset meta, notes, labels and data.

The following scenario is used to demonstrate how to update data and labels:

  1. Upload a dataset.

  2. Update the dataset’s labels.

  3. Add some data to the dataset.

Please see Upload Dataset for more information about the first step.
The last two steps are described in detail below.
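
All of the snippets below assume an authorized GAS client. A minimal sketch (the AccessKey is a placeholder):

from tensorbay import GAS

# Authorize a client with the AccessKey of your TensorBay account.
gas = GAS("<YOUR_ACCESSKEY>")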

Update Dataset Meta#

TensorBay SDK supports a method to update dataset meta info.

gas.update_dataset("<DATASET_NAME>", alias="<DATASET_ALIAS>", is_public=True)

Update Dataset Notes#

TensorBay SDK supports a method to update dataset notes. The dataset can be marked as a continuous dataset by setting is_continuous to True.

dataset_client = gas.get_dataset("<DATASET_NAME>")
dataset_client.create_draft("draft-1")
dataset_client.update_notes(is_continuous=True)
dataset_client.commit("update notes")

Update Label#

TensorBay SDK supports methods to update labels, overwriting the previous labels.

Get a previously uploaded dataset and create a draft:

dataset_client.create_draft("draft-2")

Update the catalog if needed (dataset here refers to the dataset object constructed in the next snippet):

dataset_client.upload_catalog(dataset.catalog)

Overwrite previous labels with new label:

from tensorbay.dataset import Dataset
from tensorbay.label import Classification

dataset = Dataset("<DATASET_NAME>", gas)
for segment in dataset:
    update_data = []
    for data in segment:
        data.label.classification = Classification("NEW_CATEGORY")  # set new label
        update_data.append(data)
    segment_client = dataset_client.get_segment(segment.name)
    segment_client.upload_label(update_data)

Commit the dataset:

dataset_client.commit("update labels")
The dataset is now committed with a new version that includes the updated labels.
Users can switch between commits to use different versions of the labels.
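
For example, to work with an earlier version of the labels, check out a specific revision on the dataset client. A minimal sketch, assuming list_commits() and checkout() on the dataset client (the revision string is a placeholder):

commits = dataset_client.list_commits()  # list existing commits
dataset_client.checkout(revision="<COMMIT_ID_OR_TAG>")  # switch to an earlier version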

Important

Uploading labels overwrites all types of labels on the data.

Update Data#

Add new data to the dataset.

gas.upload_dataset(dataset, jobs=8, skip_uploaded_files=True)

Set skip_uploaded_files=True to skip data that has already been uploaded.

Overwrite previously uploaded data in the dataset.

gas.upload_dataset(dataset, jobs=8)

The default value of skip_uploaded_files is False, which overwrites previously uploaded data.

Note

The segment name and data name are used to identify data. If uploaded data has the same segment name and data name as data that already exists, it is treated as the same data and will be skipped or overwritten according to skip_uploaded_files.

Important

Uploading data only adds or overwrites data; previously uploaded data will not be deleted.
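
As with labels, data uploaded by upload_dataset becomes a permanent version only after the returned draft is committed. A minimal sketch, assuming upload_dataset returns the dataset client:

dataset_client = gas.upload_dataset(dataset, jobs=8)
dataset_client.commit("update data")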

Delete a segment by its name.

dataset_client.create_draft("draft-3")
dataset_client.delete_segment("<SEGMENT_NAME>")

Delete data by its remote path.

segment_client = dataset_client.get_segment("<SEGMENT_NAME>")
segment_client.delete_data("a.png")

For a fusion dataset, TensorBay SDK supports deleting a frame by its id.

segment_client.delete_frame("00000000003W09TEMC1HXYMC74")
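
Note that the fusion segment client comes from a fusion dataset client rather than a normal one. A minimal sketch, assuming the fusion dataset name and draft name below are placeholders:

fusion_dataset_client = gas.get_dataset("<FUSION_DATASET_NAME>", is_fusion=True)
fusion_dataset_client.create_draft("draft-4")
fusion_segment_client = fusion_dataset_client.get_segment("<SEGMENT_NAME>")
fusion_segment_client.delete_frame("00000000003W09TEMC1HXYMC74")
fusion_dataset_client.commit("delete frame")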