Move And Copy
This topic describes how to copy and move segments and data within a TensorBay dataset.
Take the Oxford-IIIT Pet dataset as an example. Its structure looks like:
datasets/
    test/
        Abyssinian_002.jpg
        ...
    trainval/
        Abyssinian_001.jpg
        ...
Note
Before operating on this dataset, fork it first.
Get the dataset client.
from tensorbay import GAS
ACCESS_KEY = "Accesskey-*****"
gas = GAS(ACCESS_KEY)
dataset_client = gas.get_dataset("OxfordIIITPet")
dataset_client.list_segment_names()
# test, trainval
There are currently two segments: test and trainval.
Copy Segment
Copy segment test to test_1.
dataset_client.create_draft("draft-1")
segment_client = dataset_client.copy_segment("test", "test_1")
segment_client.name
# test_1
dataset_client.list_segment_names()
# test, test_1, trainval
dataset_client.commit("copy test segment to test_1 segment")
Move Segment
Move segment test to test_2.
dataset_client.create_draft("draft-2")
segment_client = dataset_client.move_segment("test", "test_2")
segment_client.name
# test_2
dataset_client.list_segment_names()
# test_1, trainval, test_2
dataset_client.commit("move test segment to test_2 segment")
Copy Data
Copy all data with the prefix Abyssinian in both the test_1 and trainval segments to the abyssinian segment.
dataset_client.create_draft("draft-3")
target_segment_client = dataset_client.create_segment("abyssinian")
# Copy every file whose remote path starts with "Abyssinian" into the target segment.
for name in ["test_1", "trainval"]:
    segment_client = dataset_client.get_segment(name)
    for file_name in segment_client.list_data_paths():
        if file_name.startswith("Abyssinian"):
            target_segment_client.copy_data(file_name, file_name, source_client=segment_client)
dataset_client.list_segment_names()
# test_1, test_2, trainval, abyssinian
dataset_client.commit("add abyssinian segment")
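The prefix filter in the loop above can be checked offline before any draft is created. The following is a minimal sketch in plain Python with no TensorBay calls; the helper `select_by_prefix` and the sample file names are hypothetical and only illustrate the selection step.

```python
from typing import Dict, Iterable, List


def select_by_prefix(
    paths_by_segment: Dict[str, Iterable[str]], prefix: str
) -> Dict[str, List[str]]:
    """Return, for each segment, the remote paths that start with the prefix."""
    return {
        segment: [path for path in paths if path.startswith(prefix)]
        for segment, paths in paths_by_segment.items()
    }


# Hypothetical file names standing in for the results of list_data_paths().
sample_paths = {
    "test_1": ["Abyssinian_002.jpg", "Bengal_010.jpg"],
    "trainval": ["Abyssinian_001.jpg", "beagle_003.jpg"],
}
selected = select_by_prefix(sample_paths, "Abyssinian")
print(selected)
```

Because `str.startswith` is case-sensitive and matches exactly, a mistyped prefix selects nothing and raises no error, so it is worth printing the selection before copying.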
Move Data
Split the trainval segment into train and val: first extract 500 data items from trainval to the val segment, then move trainval to train.
import random
dataset_client.create_draft("draft-4")
val_segment_client = dataset_client.create_segment("val")
trainval_segment_client = dataset_client.get_segment("trainval")
# list_data_paths returns a lazy list; getting and deleting data at the same time is not supported, so convert it to a list first.
data_paths = list(trainval_segment_client.list_data_paths())
# Pick 500 distinct random indices.
val_random_numbers = random.sample(range(0, len(data_paths)), 500)
# Get the data path list by random index list.
val_random_paths = [data_paths[index] for index in val_random_numbers]
# Move all data in the random path list from the trainval segment to the val segment.
val_segment_client.move_data(val_random_paths, source_client=trainval_segment_client)
dataset_client.move_segment("trainval", "train")
dataset_client.list_segment_names()
# train, val, test_1, test_2, abyssinian
dataset_client.commit("split train and val segment")
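The index sampling above gives a different split on every run. If a reproducible split is wanted, one option is to seed a local random generator. The following is a minimal sketch in plain Python; the helper `split_paths`, the seed value, and the generated file names are hypothetical, and no TensorBay calls are made.

```python
import random
from typing import List, Sequence, Tuple


def split_paths(
    paths: Sequence[str], val_size: int, seed: int = 0
) -> Tuple[List[str], List[str]]:
    """Split paths into (train, val), with val_size randomly chosen val items."""
    if val_size > len(paths):
        raise ValueError("val_size exceeds the number of available paths")
    # A local generator keeps the global random state untouched.
    rng = random.Random(seed)
    val_indices = set(rng.sample(range(len(paths)), val_size))
    train = [path for index, path in enumerate(paths) if index not in val_indices]
    val = [path for index, path in enumerate(paths) if index in val_indices]
    return train, val


# Hypothetical file names; the real list would come from list_data_paths().
paths = [f"image_{index:04d}.jpg" for index in range(2000)]
train_paths, val_paths = split_paths(paths, val_size=500, seed=42)
print(len(train_paths), len(val_paths))
```

The resulting `val_paths` list could then be passed to `move_data` in place of the randomly indexed list, and the explicit size check reports a clear error when the segment holds fewer items than requested.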
Note
The data storage space will only be calculated once when a segment is copied.
Note
TensorBay SDK supports three strategies to resolve the conflict when the target segment/data already exists, which can be set as a keyword argument in the above-mentioned functions.
abort (default): abort the process by raising InternalServerError.
skip: skip moving or copying segment/data.
override: override the whole target segment/data with the source segment/data.
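The behavior of the three strategies can be pictured with a small in-memory model. This is only an illustrative sketch, not the SDK's implementation: the function `copy_with_strategy`, the parameter name `strategy`, and the use of `FileExistsError` in place of InternalServerError are assumptions made for the example, so check the SDK reference for the exact keyword.

```python
from typing import Dict


def copy_with_strategy(
    target: Dict[str, str], name: str, content: str, strategy: str = "abort"
) -> bool:
    """Write content to target[name]; return True if the target was written."""
    if strategy not in ("abort", "skip", "override"):
        raise ValueError(f"unknown strategy: {strategy}")
    if name in target:
        if strategy == "abort":
            # Stand-in for the InternalServerError described in the note above.
            raise FileExistsError(f"{name} already exists")
        if strategy == "skip":
            # Leave the existing target untouched.
            return False
    # No conflict, or strategy is "override": replace the whole target.
    target[name] = content
    return True


# Hypothetical segment contents, keyed by segment name.
segments = {"test_1": "old"}
skipped = copy_with_strategy(segments, "test_1", "new", strategy="skip")
overridden = copy_with_strategy(segments, "test_1", "new", strategy="override")
print(skipped, overridden, segments)
```

In this model, skip leaves the existing target as it was, override replaces it entirely, and the default abort stops at the first conflict.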