Cache
This topic describes how to use the cache when opening remote data on TensorBay.
When working with online data, it is sometimes necessary to read the entire dataset multiple times, for example when training a model.
Without a cache, this causes redundant requests and responses between the local computer and TensorBay, and costs extra time.
Therefore, the TensorBay SDK provides caching to speed up data access and reduce repeated requests.
Get Remote Dataset
To use the cache, first get the remote dataset on TensorBay.
from tensorbay import GAS
from tensorbay.dataset import Dataset
ACCESS_KEY = "Accesskey-*****"
gas = GAS(ACCESS_KEY)
dataset = Dataset("<DatasetName>", gas)
Enable Cache
Then use enable_cache() to start using cache for this dataset.
By default, the cache path is set in the system temporary directory, whose location differs between operating systems.
dataset.enable_cache()
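To see which temporary directory applies on the current machine, the standard library can be queried directly. This is a minimal sketch that assumes the default cache lives somewhere under the directory reported by Python's tempfile module; the exact subfolder the SDK uses is not stated here.

```python
import os
import tempfile

# System temporary directory; its location depends on the platform
# and on environment variables such as TMPDIR, TEMP or TMP.
# Assumption: the default cache is created somewhere under this directory.
temp_dir = tempfile.gettempdir()

print(temp_dir)
print(os.path.isdir(temp_dir))
```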
A custom cache path can also be passed to the function, as shown below.
dataset.enable_cache("path/to/cache/folder")
Note
Please make sure there is enough free storage space to cache the dataset.
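One way to check the free space before enabling the cache is sketched below. The helper has_enough_space and the REQUIRED_BYTES value are hypothetical names introduced for illustration, not part of the TensorBay SDK; the required size has to come from what is known about the dataset.

```python
import shutil
import tempfile


def has_enough_space(cache_path, required_bytes):
    """Return True if the filesystem holding cache_path has at least required_bytes free."""
    usage = shutil.disk_usage(cache_path)
    return usage.free >= required_bytes


# Hypothetical dataset size; replace it with the real size of the dataset.
REQUIRED_BYTES = 10 * 1024**3  # 10 GiB

# The system temporary directory stands in for the cache path here.
cache_path = tempfile.gettempdir()
if has_enough_space(cache_path, REQUIRED_BYTES):
    print("Enough free space for the cache.")
else:
    print("Not enough free space; choose another cache path.")
```

The path passed to the helper must already exist, so for a custom cache folder that has not been created yet, check its parent directory instead.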
Use cache_enabled to check whether the cache is in use.
print(dataset.cache_enabled)
# True
Note
Cache is not available for datasets in draft status.
The dataset.cache_enabled will remain False for datasets in draft status, even if the cache has already been set by dataset.enable_cache().
Use Data
After enabling the cache, use the data as desired.
Note that the cache takes effect when the data.open() method is called, and only data and mask labels are cached.
segment = dataset[0]
MAX_EPOCH = 100
for epoch in range(MAX_EPOCH):
    for data in segment:
        data.open()
        # code using opened data here
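The effect of caching across epochs can be illustrated with a generic read-through file cache. This is only a conceptual sketch, not TensorBay's implementation: ReadThroughCache, fake_fetch and the file names are hypothetical, and the remote request is simulated by a local function.

```python
import hashlib
import os
import tempfile


class ReadThroughCache:
    """Illustrative read-through file cache (hypothetical, not the SDK's code)."""

    def __init__(self, cache_dir, fetch):
        self._cache_dir = cache_dir
        self._fetch = fetch  # callable: remote_path -> bytes
        os.makedirs(cache_dir, exist_ok=True)

    def _local_path(self, remote_path):
        # Hash the remote path to get a safe, unique local file name.
        digest = hashlib.sha256(remote_path.encode("utf-8")).hexdigest()
        return os.path.join(self._cache_dir, digest)

    def read(self, remote_path):
        local_path = self._local_path(remote_path)
        if os.path.exists(local_path):
            # Cache hit: serve the content from local disk.
            with open(local_path, "rb") as fp:
                return fp.read()
        # Cache miss: fetch once, store locally, then return the content.
        content = self._fetch(remote_path)
        with open(local_path, "wb") as fp:
            fp.write(content)
        return content


fetch_calls = []


def fake_fetch(remote_path):
    """Stand-in for a remote request; records every call."""
    fetch_calls.append(remote_path)
    return b"content of " + remote_path.encode("utf-8")


with tempfile.TemporaryDirectory() as cache_dir:
    cache = ReadThroughCache(cache_dir, fake_fetch)
    for epoch in range(3):
        for name in ("0001.jpg", "0002.jpg"):
            cache.read(name)

print(len(fetch_calls))  # 2: each file is fetched only during the first epoch
```

In this sketch only the first epoch pays the cost of the simulated remote requests; later epochs read from local disk.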
Delete Cache Data
After use, the cache data stored at the cache path can be deleted as needed.
Note that if the default cache path is used, the cache will be removed automatically when the computer restarts.
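When a custom cache path was used, the folder can be removed with the standard library. The following is a minimal sketch; clear_cache is a hypothetical helper rather than an SDK function, and the demonstration runs on a throwaway directory instead of a real cache folder.

```python
import os
import shutil
import tempfile


def clear_cache(cache_path):
    """Delete cache_path and everything in it; return True if something was removed."""
    if not os.path.isdir(cache_path):
        return False
    shutil.rmtree(cache_path)
    return True


# Throwaway directory standing in for a custom cache path.
demo_cache = tempfile.mkdtemp()
with open(os.path.join(demo_cache, "cached_file"), "wb") as fp:
    fp.write(b"cached bytes")

removed = clear_cache(demo_cache)
print(removed, os.path.exists(demo_cache))
```

Because shutil.rmtree deletes the whole directory tree, point it only at a folder dedicated to the cache.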