Update Dataset#
This topic describes how to update datasets, including the dataset meta, dataset notes, labels, and data.
The following scenario demonstrates how to update data and labels:
Upload a dataset.
Update the dataset’s labels.
Add some data to the dataset.
Update Dataset Meta#
TensorBay SDK supports a method to update dataset meta info.
gas.update_dataset("<DATASET_NAME>", alias="<DATASET_ALIAS>", is_public=True)
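When only some meta fields change, it can be convenient to send just those fields. The following is a minimal sketch, not part of the SDK: `update_meta` and `FakeGAS` are hypothetical names, and the sketch assumes that `gas.update_dataset` accepts `alias` and `is_public` as keyword arguments (as in the call above) and leaves omitted fields unchanged. The stand-in client only records calls, so the example runs without a TensorBay account.

```python
from typing import Any, Dict, List, Optional, Tuple


def update_meta(gas: Any, name: str, alias: Optional[str] = None,
                is_public: Optional[bool] = None) -> Dict[str, Any]:
    """Forward only the meta fields that were provided (hypothetical helper)."""
    changes: Dict[str, Any] = {}
    if alias is not None:
        changes["alias"] = alias
    if is_public is not None:
        changes["is_public"] = is_public
    if changes:  # skip the request entirely when nothing was given
        gas.update_dataset(name, **changes)
    return changes


class FakeGAS:
    """Stand-in for the real client; it only records the calls it receives."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, Dict[str, Any]]] = []

    def update_dataset(self, name: str, **kwargs: Any) -> None:
        self.calls.append((name, kwargs))


fake_gas = FakeGAS()
sent = update_meta(fake_gas, "<DATASET_NAME>", is_public=True)
print(sent)
```

With a real client, `fake_gas` would be replaced by the `gas` object used elsewhere on this page.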
Update Dataset Notes#
TensorBay SDK supports a method to update dataset notes. A dataset can be changed into a continuous dataset by setting is_continuous to True.
dataset_client = gas.get_dataset("<DATASET_NAME>")
dataset_client.create_draft("draft-1")
dataset_client.update_notes(is_continuous=True)
dataset_client.commit("update notes")
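The steps above follow a pattern that recurs on this page: create a draft, apply a change, commit. The sketch below wraps that pattern in a hypothetical helper, `apply_in_draft`, which is not part of the SDK. It relies only on the `create_draft`, `update_notes`, and `commit` calls shown above, and uses a recording stand-in (`RecordingClient`, also hypothetical) instead of a real dataset client so that it runs offline.

```python
from typing import Any, Callable, List, Tuple


def apply_in_draft(dataset_client: Any, title: str, message: str,
                   change: Callable[[Any], None]) -> None:
    """Open a draft, apply one change, then commit it (hypothetical helper)."""
    dataset_client.create_draft(title)
    change(dataset_client)
    dataset_client.commit(message)


class RecordingClient:
    """Stand-in for a dataset client; records calls instead of contacting TensorBay."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, Any]] = []

    def create_draft(self, title: str) -> None:
        self.calls.append(("create_draft", title))

    def update_notes(self, **notes: Any) -> None:
        self.calls.append(("update_notes", notes))

    def commit(self, message: str) -> None:
        self.calls.append(("commit", message))


client = RecordingClient()
apply_in_draft(client, "draft-1", "update notes",
               lambda c: c.update_notes(is_continuous=True))
for call in client.calls:
    print(call)  # shows the order in which the client was called
```

If `change` raises, this sketch does not commit, so the draft would be left open; how to handle that case is left to the caller.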
Update Label#
TensorBay SDK supports methods to update labels, which overwrite the previous labels.
Get a previously uploaded dataset and create a draft:
dataset_client = gas.get_dataset("<DATASET_NAME>")  # get the previously uploaded dataset
dataset_client.create_draft("draft-2")
Update the catalog if needed:
dataset_client.upload_catalog(dataset.catalog)
Overwrite previous labels with new labels:
from tensorbay.dataset import Dataset
from tensorbay.label import Classification
dataset = Dataset("<DATASET_NAME>", gas)  # load the uploaded dataset from TensorBay
for segment in dataset:
    update_data = []
    for data in segment:
        data.label.classification = Classification("NEW_CATEGORY")  # set new label
        update_data.append(data)
    # upload the relabeled data of this segment in one call
    segment_client = dataset_client.get_segment(segment.name)
    segment_client.upload_label(update_data)
Commit the dataset:
dataset_client.commit("update labels")
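The relabeling loop above can be factored into a small function when the new label depends on the data. This is a minimal sketch under stated assumptions: `relabel` is a hypothetical helper, and the data objects are stand-ins built with `SimpleNamespace` that only mimic the `data.label.classification` attribute used above. With the SDK, the items would come from iterating a segment, and `make_label` would return a `Classification`.

```python
from types import SimpleNamespace
from typing import Any, Callable, Iterable, List


def relabel(segment: Iterable[Any], make_label: Callable[[Any], Any]) -> List[Any]:
    """Assign a new classification to every data item and return the items."""
    updated = []
    for data in segment:
        data.label.classification = make_label(data)
        updated.append(data)
    return updated


# Stand-in data objects; with the SDK these would come from iterating a segment.
segment = [
    SimpleNamespace(path="a.png", label=SimpleNamespace(classification="OLD")),
    SimpleNamespace(path="b.png", label=SimpleNamespace(classification="OLD")),
]

# With the SDK, make_label would return Classification("NEW_CATEGORY").
updated = relabel(segment, lambda data: "NEW_CATEGORY")
print([data.label.classification for data in updated])
```

The returned list plays the role of `update_data` in the loop above, that is, the argument passed to `segment_client.upload_label`.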
Important
Uploading labels overwrites all types of labels on the data.
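One way to read this rule is sketched below with plain dictionaries. It is a toy model, not SDK code: `toy_upload_label` is a hypothetical function, and the model assumes that the uploaded label replaces the stored label as a whole. Under that assumption, uploading a label that carries only one label type drops the others, while starting from the existing label and changing one field keeps them.

```python
from typing import Dict

# Toy model: each data item maps label type -> label value.
remote_labels: Dict[str, Dict[str, str]] = {
    "a.png": {"classification": "cat", "box2d": "one box"},
}


def toy_upload_label(remote: Dict[str, Dict[str, str]], path: str,
                     label: Dict[str, str]) -> None:
    """Replace the whole stored label of one data item (assumed overwrite rule)."""
    remote[path] = dict(label)


# Uploading a label that holds only a classification drops the other label type.
lossy = {"a.png": dict(remote_labels["a.png"])}
toy_upload_label(lossy, "a.png", {"classification": "dog"})

# Starting from the existing label and changing one field keeps the rest.
kept = {"a.png": dict(remote_labels["a.png"])}
merged = dict(kept["a.png"], classification="dog")
toy_upload_label(kept, "a.png", merged)

print(lossy["a.png"])
print(kept["a.png"])
```

The second variant corresponds to modifying the labels of data read back from the uploaded dataset before uploading them again.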
Update Data#
Add new data to the dataset.
gas.upload_dataset(dataset, jobs=8, skip_uploaded_files=True)
Set skip_uploaded_files=True to skip data that has already been uploaded.
Overwrite uploaded data in the dataset.
gas.upload_dataset(dataset, jobs=8)
The default value of skip_uploaded_files is False, which overwrites data that has already been uploaded.
Note
The segment name and data name together identify data. If the data being uploaded has the same segment name and data name as data that was already uploaded, the upload applies to that existing data.
Important
Uploading data only adds or overwrites data; data uploaded previously is not deleted.
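The upload rules above can be summarized with a toy model. This is a sketch with plain dictionaries, not SDK code: `toy_upload` is a hypothetical function, and the model assumes that data is keyed by the pair (segment name, data name) as described in the note. It shows that an upload adds new items, skips or overwrites matching items depending on `skip_uploaded_files`, and never removes items that are absent from the local dataset.

```python
from typing import Dict, Tuple

Key = Tuple[str, str]  # (segment name, data name)


def toy_upload(remote: Dict[Key, bytes], local: Dict[Key, bytes],
               skip_uploaded_files: bool = False) -> Dict[str, int]:
    """Merge local data into remote and count what happened to each item."""
    counts = {"added": 0, "overwritten": 0, "skipped": 0}
    for key, content in local.items():
        if key not in remote:
            remote[key] = content
            counts["added"] += 1
        elif skip_uploaded_files:
            counts["skipped"] += 1  # existing data is left untouched
        else:
            remote[key] = content
            counts["overwritten"] += 1
    return counts


remote = {("train", "a.png"): b"old-a", ("train", "b.png"): b"old-b"}
local = {("train", "a.png"): b"new-a", ("train", "c.png"): b"new-c"}

skipping = dict(remote)
skip_counts = toy_upload(skipping, local, skip_uploaded_files=True)

overwriting = dict(remote)
overwrite_counts = toy_upload(overwriting, local)

print(skip_counts)
print(overwrite_counts)
```

In both runs `("train", "b.png")` stays in the remote copy even though it is missing locally, which is why removing data requires the delete operations.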
Delete a segment by its name.
dataset_client.create_draft("draft-3")
dataset_client.delete_segment("<SEGMENT_NAME>")
Delete data by its remote path.
segment_client = dataset_client.get_segment("<SEGMENT_NAME>")
segment_client.delete_data("a.png")
For a fusion dataset, TensorBay SDK supports deleting a frame by its id.
segment_client.delete_frame("00000000003W09TEMC1HXYMC74")
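The delete calls above can be combined into one draft. The sketch below is a hypothetical helper, `delete_data_in_draft`, that deletes several remote paths from one segment and then commits. It assumes that deletions made in a draft are committed in the same way as the other changes on this page, which the snippets above do not show explicitly. The two `Fake...` classes are stand-ins that record calls, so the example runs without a TensorBay account.

```python
from typing import Any, Iterable, List, Tuple


def delete_data_in_draft(dataset_client: Any, segment_name: str,
                         remote_paths: Iterable[str], title: str,
                         message: str) -> None:
    """Delete data from one segment inside a single draft (hypothetical helper)."""
    dataset_client.create_draft(title)
    segment_client = dataset_client.get_segment(segment_name)
    for remote_path in remote_paths:
        segment_client.delete_data(remote_path)  # one call per remote path
    dataset_client.commit(message)  # assumption: deletions are committed like other changes


class FakeSegmentClient:
    """Stand-in segment client that appends to a shared call log."""

    def __init__(self, log: List[Tuple[str, str]]) -> None:
        self.log = log

    def delete_data(self, remote_path: str) -> None:
        self.log.append(("delete_data", remote_path))


class FakeDatasetClient:
    """Stand-in dataset client that records the calls it receives."""

    def __init__(self) -> None:
        self.log: List[Tuple[str, str]] = []

    def create_draft(self, title: str) -> None:
        self.log.append(("create_draft", title))

    def get_segment(self, name: str) -> FakeSegmentClient:
        self.log.append(("get_segment", name))
        return FakeSegmentClient(self.log)

    def commit(self, message: str) -> None:
        self.log.append(("commit", message))


fake_client = FakeDatasetClient()
delete_data_in_draft(fake_client, "<SEGMENT_NAME>", ["a.png", "b.png"],
                     "draft-3", "delete data")
for entry in fake_client.log:
    print(entry)
```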