Cloud Storage¶
All data on TensorBay are hosted on the cloud.
TensorBay supports two cloud storage modes:
DEFAULT CLOUD STORAGE: data are stored on TensorBay's cloud
AUTHORIZED CLOUD STORAGE: data are stored on other providers' clouds
Default Cloud Storage¶
In default cloud storage mode, data are stored on TensorBay's cloud.
Create a dataset with default storage:
gas.create_dataset("DatasetName")
Authorized Cloud Storage¶
You can also upload data to your own public cloud storage space.
Currently, TensorBay supports the following cloud providers:
Aliyun OSS
Amazon S3
Azure Blob
Config¶
See the cloud storage instructions for details on how to configure cloud storage on TensorBay.
The TensorBay SDK provides a method to list all of a user's previous authorized storage configurations.
from tensorbay import GAS
gas = GAS("Accesskey-*****")
gas.list_auth_storage_configs()
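The structure of the returned configurations is not shown here; assuming each one behaves like a mapping with a "name" field (an assumption for illustration), picking out a configuration by name can be sketched as:

```python
def find_config(configs, name):
    """Return the first storage config whose "name" matches, else None."""
    for config in configs:
        if config.get("name") == name:
            return config
    return None

# Stand-in data for illustration; real configs would come from
# gas.list_auth_storage_configs().
sample_configs = [
    {"name": "oss-config", "provider": "Aliyun OSS"},
    {"name": "s3-config", "provider": "Amazon S3"},
]
print(find_config(sample_configs, "s3-config"))
```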
Create Authorized Storage Dataset¶
Create a dataset with authorized cloud storage:
dataset_client = gas.create_auth_dataset("dataset_name", "config_name", "path/to/dataset")
Import Cloud Files into Authorized Storage Dataset¶
Take the following cloud directory as an example:
data/
├── images/
│   ├── 00001.png
│   ├── 00002.png
│   └── ...
├── labels/
│   ├── 00001.json
│   ├── 00002.json
│   └── ...
└── ...
Get a cloud client.
from tensorbay import GAS
gas = GAS("Accesskey-*****")
cloud_client = gas.get_cloud_client("config_name")
Import the AuthData from the cloud platform and load the label files into an authorized storage dataset.
import json
from tensorbay.dataset import Dataset
from tensorbay.label import Classification
# Use AuthData to organize a dataset by the "Dataset" class before importing.
dataset = Dataset("DatasetName")
# TensorBay uses "segment" to separate different parts in a dataset.
segment = dataset.create_segment()
images = cloud_client.list_auth_data("data/images")
labels = cloud_client.list_auth_data("data/labels")
for auth_data, label in zip(images, labels):
    with label.open() as fp:
        auth_data.label.classification = Classification.loads(json.load(fp))
    segment.append(auth_data)
dataset_client = gas.upload_dataset(dataset, jobs=8)
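The loop above pairs each image with its label by position, which assumes both listings sort identically. A more defensive approach matches files by their shared stem (e.g. "00001"); a minimal plain-Python sketch with hypothetical file names:

```python
import os

def pair_by_stem(image_names, label_names):
    """Match image files to label files that share the same stem."""
    labels = {os.path.splitext(name)[0]: name for name in label_names}
    pairs = []
    for image in sorted(image_names):
        stem = os.path.splitext(image)[0]
        if stem in labels:
            pairs.append((image, labels[stem]))
    return pairs

print(pair_by_stem(["00002.png", "00001.png"], ["00001.json", "00002.json"]))
# → [('00001.png', '00001.json'), ('00002.png', '00002.json')]
```

Images without a matching label file are simply skipped, so a missing label cannot silently shift every later pair.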
Important
Files will be copied from the raw directory to the authorized cloud storage dataset path, so the storage space used on the cloud platform will be doubled.
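For reference, each classification label file in the layout above is read with json.load and passed to Classification.loads. Assuming the file holds a JSON object with a "category" field and optional "attributes" (the exact schema depends on your labels), parsing one can be sketched with the standard library alone:

```python
import io
import json

# Hypothetical label file content, stood in for by an in-memory stream.
label_file = io.StringIO('{"category": "cat", "attributes": {"occluded": false}}')

contents = json.load(label_file)
print(contents["category"])    # → cat
print(contents["attributes"])  # → {'occluded': False}
```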