Cloud Storage
DEFAULT CLOUD STORAGE: data are stored on the TensorBay cloud
AUTHORIZED CLOUD STORAGE: data are stored on another provider's cloud
Default Cloud Storage
Create a dataset on default cloud storage:

gas.create_dataset("DatasetName")
Authorized Cloud Storage
Aliyun OSS
Amazon S3
Azure Blob
Config
See the cloud storage instruction for details on how to configure cloud storage on TensorBay.
TensorBay SDK supports the following methods to configure cloud storage.
For example:
gas.create_oss_storage_config(
    "oss_config",
    "tests",
    endpoint="<YOUR_ENDPOINT>",  # like oss-cn-qingdao.aliyuncs.com
    accesskey_id="<YOUR_ACCESSKEYID>",
    accesskey_secret="<YOUR_ACCESSKEYSECRET>",
    bucket_name="<YOUR_BUCKETNAME>",
)
TensorBay SDK also supports a method to list all of a user's previous configurations.
gas.list_auth_storage_configs()
Create Authorized Storage Dataset
Create a dataset with authorized cloud storage:
dataset_client = gas.create_dataset("dataset_name", config_name="config_name")
Import Cloud Files into Authorized Storage Dataset
Take the following original cloud directory as an example:
data/
├── images/
│   ├── 00001.png
│   ├── 00002.png
│   └── ...
├── labels/
│   ├── 00001.json
│   ├── 00002.json
│   └── ...
└── ...
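The layout above pairs each image with a same-stem label file (00001.png with 00001.json), and the import code in this section relies on that one-to-one ordering when it zips the two listings. A stdlib-only sketch of that pairing assumption, using hypothetical local keys rather than real cloud paths:

```python
from pathlib import PurePosixPath

# Hypothetical keys mirroring the directory tree above.
image_keys = ["data/images/00001.png", "data/images/00002.png"]
label_keys = ["data/labels/00001.json", "data/labels/00002.json"]

def pair_by_stem(images, labels):
    """Pair each image with the label file whose filename stem matches."""
    label_index = {PurePosixPath(key).stem: key for key in labels}
    return [(img, label_index[PurePosixPath(img).stem]) for img in images]

pairs = pair_by_stem(image_keys, label_keys)
# Each image is matched with its same-stem label file.
```

If the two directories can drift out of sync, an explicit check like this is safer than relying on `zip`, which silently truncates to the shorter listing.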
Get a cloud client.
from tensorbay import GAS
gas = GAS("Accesskey-*****")
cloud_client = gas.get_cloud_client("config_name")
Import the AuthData from the cloud platform and load the label files into an authorized storage dataset.
import json

from tensorbay.dataset import Dataset
from tensorbay.label import Classification

# Use AuthData to organize a dataset by the "Dataset" class before importing.
dataset = Dataset("DatasetName")

# TensorBay uses "segment" to separate different parts in a dataset.
segment = dataset.create_segment()

images = cloud_client.list_auth_data("data/images/")
labels = cloud_client.list_auth_data("data/labels/")

for auth_data, label in zip(images, labels):
    with label.open() as fp:
        auth_data.label.classification = Classification.loads(json.load(fp))
    segment.append(auth_data)

dataset_client = gas.upload_dataset(dataset, jobs=8)
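`Classification.loads()` takes a dict, so each label file is assumed to store a JSON object in TensorBay's classification format, with a `"category"` field and optional `"attributes"`. A stdlib-only sketch of what the loop reads from one such file (the concrete category and attribute values here are illustrative, not from the source):

```python
import json

# Assumed contents of one label file, e.g. data/labels/00001.json.
# The "category"/"attributes" keys follow TensorBay's classification
# format; the values are made up for illustration.
label_text = '{"category": "cat", "attributes": {"occluded": false}}'

contents = json.loads(label_text)
# json.load(fp) in the import loop yields the same kind of dict,
# which is then passed to Classification.loads().
```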
Important
Files will be copied from the original directory to the authorized cloud storage dataset path, so the storage space used on the cloud platform will be doubled.
Note
Setting the authorized cloud storage dataset path to the same path as the original directory can speed up the import. For example, set the config path of the above dataset to data/images.
Authorized Local Storage
If you want to use TensorBay service and have the data stored locally at the same time, TensorBay supports authorized local storage config.
Before creating the local storage config via create_local_storage_config(), you need to start a local storage service. Please contact us on TensorBay for more information.
gas.create_local_storage_config(
    name="local_storage_config",
    file_path="<path to store the datasets>",
    endpoint="<external IP address of the local storage service>",
)
Then create an authorized local storage dataset with the config.
dataset_client = gas.create_dataset("dataset_name", config_name="local_storage_config")
Other operations, such as uploading and reading data, are the same as for datasets created with default storage, except that the uploaded data is stored under the local storage.