Cloud Storage¶
All data on TensorBay are hosted on the cloud.
TensorBay supports two cloud storage modes:
DEFAULT CLOUD STORAGE: data are stored on TensorBay's cloud
AUTHORIZED CLOUD STORAGE: data are stored on other providers' clouds
Default Cloud Storage¶
In default cloud storage mode, data are stored on TensorBay's cloud.
Create a dataset with default storage:
gas.create_dataset("DatasetName")
Authorized Cloud Storage¶
You can also upload data to your own public cloud storage space.
Currently, TensorBay supports the following cloud providers:
Aliyun OSS
Amazon S3
Azure Blob
Config¶
See the cloud storage instructions for details on how to configure cloud storage on TensorBay.
The TensorBay SDK provides a method to list all of a user's previous authorized storage configurations.
from tensorbay import GAS
gas = GAS("Accesskey-*****")
gas.list_auth_storage_configs()
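The structure of the returned configurations is not shown here; assuming each one behaves like a mapping with a "name" field (an assumption for illustration), picking out a configuration by name can be sketched as:

```python
def find_config(configs, name):
    """Return the first storage config whose "name" matches, else None."""
    for config in configs:
        if config.get("name") == name:
            return config
    return None

# Stand-in data for illustration; real configs would come from
# gas.list_auth_storage_configs().
sample_configs = [
    {"name": "oss-config", "provider": "Aliyun OSS"},
    {"name": "s3-config", "provider": "Amazon S3"},
]
print(find_config(sample_configs, "s3-config"))
```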
Create Authorized Storage Dataset¶
Create a dataset with authorized cloud storage:
dataset_client = gas.create_auth_dataset("dataset_name", "config_name", "path/to/dataset")
Import Cloud Files into Authorized Storage Dataset¶
Take the following cloud directory as an example:
data/
├── images/
│   ├── 00001.png
│   ├── 00002.png
│   └── ...
├── labels/
│   ├── 00001.json
│   ├── 00002.json
│   └── ...
└── ...
Get a cloud client.
from tensorbay import GAS
gas = GAS("Accesskey-*****")
cloud_client = gas.get_cloud_client("config_name")
Import the AuthData from the cloud platform and load the label files into an authorized storage dataset.
import json
from tensorbay.dataset import Dataset
from tensorbay.label import Classification
# Use AuthData to organize a dataset by the "Dataset" class before importing.
dataset = Dataset("DatasetName")
# TensorBay uses "segment" to separate different parts in a dataset.
segment = dataset.create_segment()
images = cloud_client.list_auth_data("data/images")
labels = cloud_client.list_auth_data("data/labels")
for auth_data, label in zip(images, labels):
    with label.open() as fp:
        auth_data.label.classification = Classification.loads(json.load(fp))
    segment.append(auth_data)
dataset_client = gas.upload_dataset(dataset, jobs=8)
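The loop above pairs each image with its label by position, which assumes both listings sort identically. A more defensive approach matches files by their shared stem (e.g. "00001"); a minimal plain-Python sketch with hypothetical file names:

```python
import os

def pair_by_stem(image_names, label_names):
    """Match image files to label files that share the same stem."""
    labels = {os.path.splitext(name)[0]: name for name in label_names}
    pairs = []
    for image in sorted(image_names):
        stem = os.path.splitext(image)[0]
        if stem in labels:
            pairs.append((image, labels[stem]))
    return pairs

print(pair_by_stem(["00002.png", "00001.png"], ["00001.json", "00002.json"]))
# → [('00001.png', '00001.json'), ('00002.png', '00002.json')]
```

Images without a matching label file are simply skipped, so a missing label cannot silently shift every later pair.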
Important
Files will be copied from the raw directory to the authorized cloud storage dataset path, so the storage space used on the cloud platform will be doubled.
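For reference, each classification label file in the layout above is read with json.load and passed to Classification.loads. Assuming the file holds a JSON object with a "category" field and optional "attributes" (the exact schema depends on your labels), parsing one can be sketched with the standard library alone:

```python
import io
import json

# Hypothetical label file content, stood in for by an in-memory stream.
label_file = io.StringIO('{"category": "cat", "attributes": {"occluded": false}}')

contents = json.load(label_file)
print(contents["category"])    # → cat
print(contents["attributes"])  # → {'occluded': False}
```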