Cloud Storage

All data on TensorBay are hosted in the cloud.
TensorBay supports two cloud storage modes:
  • DEFAULT CLOUD STORAGE: data are stored on TensorBay cloud

  • AUTHORIZED CLOUD STORAGE: data are stored on another provider’s cloud

Default Cloud Storage

In default cloud storage mode, data are stored on TensorBay cloud.
Create a dataset with default storage:
gas.create_dataset("DatasetName")
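
For a complete, runnable version, create the GAS client first (the accesskey below is a placeholder):

from tensorbay import GAS

# Authorize a client with your TensorBay accesskey.
gas = GAS("Accesskey-*****")

# Create a dataset stored on TensorBay cloud (default storage).
gas.create_dataset("DatasetName")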

Authorized Cloud Storage

You can also upload data to your own public cloud storage space.
TensorBay currently supports the following cloud providers:
  • Aliyun OSS

  • Amazon S3

  • Azure Blob

Config

See the cloud storage instructions for details on how to configure cloud storage on TensorBay.

TensorBay SDK provides methods to configure cloud storage for each of the providers above.

For example, to configure Aliyun OSS:

gas.create_oss_storage_config(
    "oss_config",  # name of the storage config
    "tests",  # path under the bucket where dataset files will be stored
    endpoint="<YOUR_ENDPOINT>",  # like oss-cn-qingdao.aliyuncs.com
    accesskey_id="<YOUR_ACCESSKEYID>",
    accesskey_secret="<YOUR_ACCESSKEYSECRET>",
    bucket_name="<YOUR_BUCKETNAME>",
)
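
The S3 and Azure counterparts follow the same pattern. The sketch below assumes keyword arguments analogous to the OSS example (verify the exact signatures against the SDK reference):

gas.create_s3_storage_config(
    "s3_config",
    "tests",
    endpoint="<YOUR_ENDPOINT>",
    accesskey_id="<YOUR_ACCESSKEYID>",
    accesskey_secret="<YOUR_ACCESSKEYSECRET>",
    bucket_name="<YOUR_BUCKETNAME>",
)

gas.create_azure_storage_config(
    "azure_config",
    "tests",
    account_type="China",  # assumption: "China" or "Global"
    account_name="<YOUR_ACCOUNTNAME>",
    account_key="<YOUR_ACCOUNTKEY>",
    container_name="<YOUR_CONTAINERNAME>",
)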

TensorBay SDK also provides a method to list all of a user’s existing storage configurations:

gas.list_auth_storage_configs()
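
For example, a quick way to inspect the result (simply printing each returned config; the structure of a config entry is not specified here):

for config in gas.list_auth_storage_configs():
    print(config)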

Create Authorized Storage Dataset

Create a dataset with authorized cloud storage:

dataset_client = gas.create_dataset("dataset_name", config_name="config_name")

Import Cloud Files into Authorized Storage Dataset

Take the following original cloud directory as an example:

data/
├── images/
│   ├── 00001.png
│   ├── 00002.png
│   └── ...
├── labels/
│   ├── 00001.json
│   ├── 00002.json
│   └── ...
└── ...

Get a cloud client:

from tensorbay import GAS

gas = GAS("Accesskey-*****")
cloud_client = gas.get_cloud_client("config_name")

Import the AuthData from the cloud platform and load the label files into the authorized storage dataset:

import json

from tensorbay.dataset import Dataset
from tensorbay.label import Classification

# Use AuthData to organize a dataset by the "Dataset" class before importing.
dataset = Dataset("DatasetName")

# TensorBay uses "segment" to separate different parts in a dataset.
segment = dataset.create_segment()

images = cloud_client.list_auth_data("data/images/")
labels = cloud_client.list_auth_data("data/labels/")

# Pairing by zip assumes both listings return files in matching
# (name-sorted) order, so "00001.png" lines up with "00001.json".
for auth_data, label in zip(images, labels):
    with label.open() as fp:
        auth_data.label.classification = Classification.loads(json.load(fp))
    segment.append(auth_data)

dataset_client = gas.upload_dataset(dataset, jobs=8)  # jobs: max concurrent upload workers

Important

Files will be copied from the original directory to the authorized cloud storage dataset path, so the storage space used on the cloud platform will be doubled.

Note

Setting the authorized cloud storage dataset path to be the same as the original directory can speed up the import. For example, set the config path of the above dataset to data/images.
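
As an illustration, such a config might look like the following (a hypothetical variant of the earlier OSS config, with the path pointing at the existing directory):

gas.create_oss_storage_config(
    "oss_config_in_place",  # hypothetical config name
    "data/images",  # path matches the existing cloud directory
    endpoint="<YOUR_ENDPOINT>",
    accesskey_id="<YOUR_ACCESSKEYID>",
    accesskey_secret="<YOUR_ACCESSKEYSECRET>",
    bucket_name="<YOUR_BUCKETNAME>",
)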

Authorized Local Storage

If you want to use the TensorBay service while keeping the data stored locally, TensorBay supports an authorized local storage config.

Before creating the local storage config via create_local_storage_config(), you need to start a local storage service. Please contact us on TensorBay for more information.

gas.create_local_storage_config(
    name="local_storage_config",
    file_path="<path to store the datasets>",
    endpoint="<external IP address of the local storage service>",
)

Then create an authorized local storage dataset with the config:

dataset_client = gas.create_dataset("dataset_name", config_name="local_storage_config")

Other operations, such as uploading and reading data, are the same as for datasets created with default storage, except that the uploaded data is stored under the local storage.
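
Reading, for instance, follows the usual pattern. A minimal sketch, assuming the dataset contains at least one segment:

from tensorbay import GAS
from tensorbay.dataset import Dataset

gas = GAS("Accesskey-*****")

# Read the dataset back; file contents are served from the local storage service.
dataset = Dataset("dataset_name", gas)
segment = dataset[0]

for data in segment:
    with data.open() as fp:
        content = fp.read()  # raw bytes of the stored file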