Storage Config#

TensorBay supports two storage config modes:

  • GRAVITI Storage Config: storage config provided by graviti.

  • Authorized Storage Config: storage config provided by the user.

GRAVITI Storage Config#

In graviti storage mode, the data is stored in graviti storage space on TensorBay.
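
A dataset created without specifying a storage config uses graviti storage by default. A minimal sketch (the AccessKey and dataset name are placeholders):

from tensorbay import GAS

gas = GAS("<YOUR_ACCESSKEY>")

# Without a "config_name" argument, the dataset is stored in graviti storage.
dataset_client = gas.create_dataset("<DATASET_NAME>")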

Authorized Storage Config#

When using an authorized storage config, datasets are stored in the user’s own storage space and only indexed by TensorBay. See the authorized storage instruction for details about how to configure authorized storage on TensorBay.

TensorBay supports both authorized cloud storage and authorized local storage.

Authorized Cloud Storage#

TensorBay SDK supports the following methods to configure authorized cloud storage.

For example, to create an Aliyun OSS storage config:

gas.create_oss_storage_config(
    "<OSS_CONFIG_NAME>",
    "<path/to/dataset>",
    endpoint="<YOUR_ENDPOINT>",  # like oss-cn-qingdao.aliyuncs.com
    accesskey_id="<YOUR_ACCESSKEYID>",
    accesskey_secret="<YOUR_ACCESSKEYSECRET>",
    bucket_name="<YOUR_BUCKETNAME>",
)

TensorBay SDK also provides a method to list all of a user’s previous storage configs.

gas.list_auth_storage_configs()
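
The returned configs can be inspected in a loop; a minimal sketch, assuming each config is returned as a dict whose "name" field matches the name given at creation time:

# Print the name of every authorized storage config under this account.
for config in gas.list_auth_storage_configs():
    print(config["name"])  # assumes each config is a dict with a "name" field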

Create Authorized Storage Dataset#

Create a dataset with authorized cloud storage:

dataset_client = gas.create_dataset("<DATASET_NAME>", config_name="<CONFIG_NAME>")
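
The returned dataset_client works like any other dataset client. For example, the usual draft-and-commit workflow applies; a sketch with placeholder draft title and commit message:

dataset_client.create_draft("<DRAFT_TITLE>")
# ... upload data while the draft is open ...
dataset_client.commit("<COMMIT_MESSAGE>")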

Import Cloud Files into Authorized Storage Dataset#

Take the following original cloud storage directory as an example:

data/
├── images/
│   ├── 00001.png
│   ├── 00002.png
│   └── ...
├── labels/
│   ├── 00001.json
│   ├── 00002.json
│   └── ...
└── ...

Get a cloud client.

from tensorbay import GAS

# Please visit `https://gas.graviti.com/tensorbay/developer` to get the AccessKey.
gas = GAS("<YOUR_ACCESSKEY>")
cloud_client = gas.get_cloud_client("<CONFIG_NAME>")

Import the AuthData from the original cloud storage and load the label files into an authorized storage dataset.

import json

from tensorbay.dataset import Dataset
from tensorbay.label import Classification

# Use AuthData to organize a dataset by the "Dataset" class before importing.
dataset = Dataset("<DATASET_NAME>")

# TensorBay uses "segment" to separate different parts in a dataset.
segment = dataset.create_segment()

images = cloud_client.list_auth_data("data/images/")
labels = cloud_client.list_auth_data("data/labels/")

# Pair each image with its label file; note that zip() assumes both
# listings return corresponding files in the same order.
for auth_data, label in zip(images, labels):
    with label.open() as fp:
        auth_data.label.classification = Classification.loads(json.load(fp))
    segment.append(auth_data)

dataset_client = gas.upload_dataset(dataset, jobs=8)
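
If the two listings cannot be assumed to align one-to-one, matching labels to images by filename is safer. A sketch under the assumption that AuthData exposes its cloud path via a path attribute (check the SDK reference for the exact attribute name):

import os

# Hypothetical pairing by filename stem ("00001" for "00001.json"),
# instead of relying on the listing order of the two directories.
label_by_stem = {
    os.path.splitext(os.path.basename(label.path))[0]: label
    for label in cloud_client.list_auth_data("data/labels/")
}

for auth_data in cloud_client.list_auth_data("data/images/"):
    stem = os.path.splitext(os.path.basename(auth_data.path))[0]
    with label_by_stem[stem].open() as fp:
        auth_data.label.classification = Classification.loads(json.load(fp))
    segment.append(auth_data)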

Important

Files will be copied from the original directory to the authorized storage dataset path, so the storage space used will be doubled.

Note

Setting the authorized storage dataset path to be the same as the original cloud storage directory can speed up the import. For example, set the config path of the above dataset to data/images.

Authorized Local Storage#

If you want to use the TensorBay service while keeping the data stored locally, TensorBay supports an authorized local storage config.

Before creating the local storage config via create_local_storage_config(), you need to start a local storage service. Please contact us on TensorBay for more information.

gas.create_local_storage_config(
    name="<LOCAL_STORAGE_CONFIG>",
    file_path="<path/to/dataset>",
    endpoint="<external IP address of the local storage service>",
)

Then create an authorized local storage dataset with the config.

dataset_client = gas.create_dataset("<DATASET_NAME>", config_name="<LOCAL_STORAGE_CONFIG>")

Other operations, such as uploading data and reading data, are the same as for datasets created with default storage, except that the uploaded data is stored under the local storage.
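
For instance, uploading data is identical to the default workflow; a minimal sketch with placeholder names and paths:

from tensorbay.dataset import Data, Dataset

dataset = Dataset("<DATASET_NAME>")
segment = dataset.create_segment()
segment.append(Data("<path/to/local/file.png>"))

# The files end up under the local storage service configured above.
dataset_client = gas.upload_dataset(dataset)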