Cache#

This topic describes how to use the cache while opening remote data on TensorBay.

When using online data, it may be necessary to iterate over the entire dataset multiple times, such as when training a model.

This causes redundant requests and responses between the local machine and TensorBay, which costs extra time.

Therefore, the TensorBay SDK provides a cache to speed up data access and reduce repeated requests.

Get Remote Dataset#

To use the cache, first get the remote dataset on TensorBay.

from tensorbay import GAS
from tensorbay.dataset import Dataset

# Please visit `https://gas.graviti.com/tensorbay/developer` to get the AccessKey.
gas = GAS("<YOUR_ACCESSKEY>")
dataset = Dataset("<DATASET_NAME>", gas)

Enable Cache#

Then call enable_cache() to enable caching for this dataset. By default, the cache is stored in the system temporary directory, which varies by operating system.

dataset.enable_cache()
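As a quick sketch, the system temporary directory that hosts the default cache can be inspected with the Python standard library (the exact subdirectory the SDK creates inside it is not shown here):

```python
import tempfile

# The system temporary directory, where the default cache lives.
# The SDK's specific cache subdirectory name may vary.
print(tempfile.gettempdir())
```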

A custom cache path can also be passed to the function:

dataset.enable_cache("<path/to/cache/folder>")

Note

Please make sure there is enough free storage space to cache the dataset.
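The available space on the drive that will hold the cache can be checked with the standard library. A minimal sketch, where the path "." is an illustrative assumption and should be replaced with the intended cache location:

```python
import shutil

# Check free space on the drive holding the intended cache folder.
# "." is an illustrative path; substitute your cache location.
usage = shutil.disk_usage(".")
free_gib = usage.free / 1024 ** 3
print(f"Free space: {free_gib:.1f} GiB")
```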

Use cache_enabled to check whether the cache is in use.

print(dataset.cache_enabled)
# True

Note

Cache is not available for datasets in draft status. For such datasets, dataset.cache_enabled remains False even if dataset.enable_cache() has already been called.

Use Data#

After enabling the cache, use the data as needed. Note that caching takes effect when the data.open() method is called, and only data and mask labels are cached.

segment = dataset[0]
MAX_EPOCH = 100
for epoch in range(MAX_EPOCH):
    for data in segment:
        fp = data.open()
        # use the opened file-like object fp here

Delete Cache Data#

After use, the cached data can be deleted manually from the cache path as needed.

Note that if the default cache path is used, the cache will be removed automatically when the computer restarts.
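If a custom cache path was used, the folder can be removed with the standard library. A minimal sketch, where a temporary directory stands in for the real cache folder:

```python
import os
import shutil
import tempfile

# A temporary folder stands in for "<path/to/cache/folder>".
cache_dir = tempfile.mkdtemp()

# Simulate a cached file inside the cache folder.
with open(os.path.join(cache_dir, "cached_data.bin"), "wb") as f:
    f.write(b"\x00" * 16)

# Delete the whole cache folder and its contents.
shutil.rmtree(cache_dir)
print(os.path.exists(cache_dir))  # False
```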