Cloud Storage

All data on TensorBay are hosted in the cloud.
TensorBay supports two cloud storage modes:
  • DEFAULT CLOUD STORAGE: data are stored on the TensorBay cloud

  • AUTHORIZED CLOUD STORAGE: data are stored on other providers’ clouds

Default Cloud Storage

In default cloud storage mode, data are stored on the TensorBay cloud.
Create a dataset with default storage:
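
The call for this step can be sketched as follows, assuming it is the same gas.create_dataset call used for authorized storage later in this section, only without the config_name argument. The helper name create_default_storage_dataset is hypothetical and exists only for illustration; gas is expected to be a client built with GAS("Accesskey-*****").

```python
def create_default_storage_dataset(gas, name):
    """Create a dataset that uses TensorBay's default cloud storage.

    `gas` is expected to be a tensorbay GAS client.
    Assumption: leaving out `config_name` selects the default storage mode.
    """
    return gas.create_dataset(name)
```

With a real client this would be called as create_default_storage_dataset(gas, "dataset_name").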

Authorized Cloud Storage

You can also upload data to your own public cloud storage space.
TensorBay currently supports the following cloud providers:
  • Aliyun OSS

  • Amazon S3

  • Azure Blob


See the cloud storage instructions for details on how to configure cloud storage on TensorBay.

The TensorBay SDK supports the following methods to configure cloud storage.

For example:

    endpoint="<YOUR_ENDPOINT>",  # like

The TensorBay SDK also provides a method to list all of a user’s previous configurations.
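
A minimal sketch of listing configurations, assuming the SDK method is GAS.list_auth_storage_configs and that each returned configuration exposes a name attribute. Both are assumptions to check against the SDK reference, and the helper list_config_names is hypothetical.

```python
def list_config_names(gas):
    """Return the names of all storage configurations visible to this client.

    Assumptions: `gas.list_auth_storage_configs()` yields configuration
    objects, and each object has a `name` attribute.
    """
    return [config.name for config in gas.list_auth_storage_configs()]
```

Under these assumptions, a name returned here would be the value passed as config_name when creating an authorized storage dataset.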


Create Authorized Storage Dataset

Create a dataset with authorized cloud storage:

dataset_client = gas.create_dataset("dataset_name", config_name="config_name")

Import Cloud Files into Authorized Storage Dataset

Take the following original cloud directory as an example:

├── images/
│   ├── 00001.png
│   ├── 00002.png
│   └── ...
├── labels/
│   ├── 00001.json
│   ├── 00002.json
│   └── ...
└── ...

Get a cloud client.

from tensorbay import GAS

gas = GAS("Accesskey-*****")
cloud_client = gas.get_cloud_client("config_name")

Import the AuthData from the cloud platform and load the label files into an authorized storage dataset.

import json

from tensorbay.dataset import Dataset
from tensorbay.label import Classification

# Use AuthData to organize a dataset by the "Dataset" class before importing.
dataset = Dataset("DatasetName")

# TensorBay uses "segment" to separate different parts in a dataset.
segment = dataset.create_segment()

images = cloud_client.list_auth_data("data/images/")
labels = cloud_client.list_auth_data("data/labels/")

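for auth_data, label in zip(images, labels):
    # Each label is itself an AuthData; open it to read the JSON annotation.
    with label.open() as fp:
        auth_data.label.classification = Classification.loads(json.load(fp))
    # Add the labeled data to the segment so that it is uploaded with the dataset.
    segment.append(auth_data)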

dataset_client = gas.upload_dataset(dataset, jobs=8)
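
The loop above pairs images and labels with zip, which relies on both listings coming back in the same order. If that is not guaranteed, one option is to pair files by their shared file stem instead. The sketch below works on plain path strings; pair_by_stem is a hypothetical helper, not part of the TensorBay SDK, and applying it to AuthData objects would require reading each object's cloud path first.

```python
from pathlib import PurePosixPath


def pair_by_stem(image_paths, label_paths):
    """Pair image and label paths that share a file stem.

    For example, "data/images/00001.png" is paired with
    "data/labels/00001.json". Images without a matching label are skipped.
    """
    # Index the label paths by stem for constant-time lookup.
    labels_by_stem = {PurePosixPath(path).stem: path for path in label_paths}

    pairs = []
    for image_path in image_paths:
        stem = PurePosixPath(image_path).stem
        if stem in labels_by_stem:
            pairs.append((image_path, labels_by_stem[stem]))
    return pairs
```

The result keeps the order of the image listing, so it can replace the zip call without changing the rest of the loop.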


Files are copied from the original directory to the authorized cloud storage dataset path, so the storage space used on the cloud platform is doubled.


Setting the authorized cloud storage dataset path to the same location as the original directory can speed up the import. For example, set the config path of the dataset above to data/images.