BSTLD
This topic describes how to manage the BSTLD dataset, which is a dataset with Box2D labels (Fig. 1).

Fig. 1 The preview of a cropped image with labels from “BSTLD”.
Create Dataset
gas.create_dataset("BSTLD")
Organize Dataset
Normally, dataloader.py and catalog.json are required to organize the “BSTLD” dataset into the Dataset instance.
In this example, they are stored in the same directory, like:
BSTLD/
    catalog.json
    dataloader.py
Step 1: Write the Catalog
A catalog contains all label information of one dataset, which is typically stored in a json file like catalog.json.
{
    "BOX2D": {
        "categories": [
            { "name": "Red" },
            { "name": "RedLeft" },
            { "name": "RedRight" },
            { "name": "RedStraight" },
            { "name": "RedStraightLeft" },
            { "name": "Green" },
            { "name": "GreenLeft" },
            { "name": "GreenRight" },
            { "name": "GreenStraight" },
            { "name": "GreenStraightLeft" },
            { "name": "GreenStraightRight" },
            { "name": "Yellow" },
            { "name": "off" }
        ],
        "attributes": [
            {
                "name": "occluded",
                "type": "boolean"
            }
        ]
    }
}
The only annotation type for “BSTLD” is Box2D, and there are 13 category types and one attribute type.
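As a quick sanity check, the counts stated above can be confirmed by parsing the catalog with Python's standard json module. This is a minimal sketch and not part of the TensorBay SDK: the catalog text is inlined so that the snippet runs on its own, and summarize_catalog is a hypothetical helper name.

```python
import json

# Inlined copy of the catalog, so the snippet is self-contained.
CATALOG_TEXT = """
{
    "BOX2D": {
        "categories": [
            {"name": "Red"}, {"name": "RedLeft"}, {"name": "RedRight"},
            {"name": "RedStraight"}, {"name": "RedStraightLeft"},
            {"name": "Green"}, {"name": "GreenLeft"}, {"name": "GreenRight"},
            {"name": "GreenStraight"}, {"name": "GreenStraightLeft"},
            {"name": "GreenStraightRight"}, {"name": "Yellow"}, {"name": "off"}
        ],
        "attributes": [{"name": "occluded", "type": "boolean"}]
    }
}
"""


def summarize_catalog(catalog_text):
    """Return {label_type: (category_names, attribute_names)} for a catalog.

    Hypothetical helper for illustration; it is not a TensorBay function.
    """
    catalog = json.loads(catalog_text)
    summary = {}
    for label_type, content in catalog.items():
        categories = [item["name"] for item in content.get("categories", [])]
        attributes = [item["name"] for item in content.get("attributes", [])]
        summary[label_type] = (categories, attributes)
    return summary


summary = summarize_catalog(CATALOG_TEXT)
categories, attributes = summary["BOX2D"]
print(list(summary), len(categories), attributes)
```

In a real project the text would more likely be read from catalog.json on disk instead of being inlined.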
Note
Passing the path of catalog.json to load_catalog() loads the catalog into the dataset.
Important
See catalog table for more catalogs with different label types.
Step 2: Write the Dataloader
A dataloader is needed to organize the dataset into a Dataset instance.
#!/usr/bin/env python3
#
# Copyright 2021 Graviti. Licensed under MIT License.
#
# pylint: disable=invalid-name

"""Dataloader of BSTLD dataset."""

import os

from tensorbay.dataset import Data, Dataset
from tensorbay.exception import ModuleImportError
from tensorbay.label import LabeledBox2D

DATASET_NAME = "BSTLD"

_LABEL_FILENAME_DICT = {
    "test": "test.yaml",
    "train": "train.yaml",
    "additional": "additional_train.yaml",
}


def BSTLD(path: str) -> Dataset:
    """`BSTLD <https://hci.iwr.uni-heidelberg.de/content\
    /bosch-small-traffic-lights-dataset>`_ dataset.

    The file structure should be like::

        <path>
            rgb/
                additional/
                    2015-10-05-10-52-01_bag/
                        <image_name>.jpg
                        ...
                    ...
                test/
                    <image_name>.jpg
                    ...
                train/
                    2015-05-29-15-29-39_arastradero_traffic_light_loop_bag/
                        <image_name>.jpg
                        ...
                    ...
            test.yaml
            train.yaml
            additional_train.yaml

    Arguments:
        path: The root directory of the dataset.

    Raises:
        ModuleImportError: When the module "yaml" can not be found.

    Returns:
        Loaded :class:`~tensorbay.dataset.dataset.Dataset` instance.

    """
    try:
        import yaml  # pylint: disable=import-outside-toplevel
    except ModuleNotFoundError as error:
        raise ModuleImportError(module_name=error.name, package_name="pyyaml") from error

    root_path = os.path.abspath(os.path.expanduser(path))

    dataset = Dataset(DATASET_NAME)
    dataset.load_catalog(os.path.join(os.path.dirname(__file__), "catalog.json"))

    for mode, label_file_name in _LABEL_FILENAME_DICT.items():
        segment = dataset.create_segment(mode)
        label_file_path = os.path.join(root_path, label_file_name)

        with open(label_file_path, encoding="utf-8") as fp:
            labels = yaml.load(fp, yaml.FullLoader)

        for label in labels:
            if mode == "test":
                # the path in test label file looks like:
                # /absolute/path/to/<image_name>.png
                file_path = os.path.join(root_path, "rgb", "test", label["path"].rsplit("/", 1)[-1])
            else:
                # the path in label file looks like:
                # ./rgb/additional/2015-10-05-10-52-01_bag/<image_name>.png
                file_path = os.path.join(root_path, *label["path"][2:].split("/"))
            data = Data(file_path)
            data.label.box2d = [
                LabeledBox2D(
                    box["x_min"],
                    box["y_min"],
                    box["x_max"],
                    box["y_max"],
                    category=box["label"],
                    attributes={"occluded": box["occluded"]},
                )
                for box in label["boxes"]
            ]
            segment.append(data)

    return dataset
See Box2D annotation for more details.
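The least obvious part of the dataloader is how it turns the path recorded in a label file into a path under the local dataset root. The sketch below isolates that step so it can be tried without the dataset or the SDK. It is an illustration only: resolve_image_path is a hypothetical helper, the root directory and image file names are made up, and posixpath is used in place of os.path so that the result is the same on every platform.

```python
import posixpath


def resolve_image_path(root_path, mode, label_path):
    """Map a path recorded in a BSTLD label file to a path under root_path.

    Hypothetical helper mirroring the branch inside the dataloader.
    """
    if mode == "test":
        # Test labels record an absolute path from another machine,
        # so only the file name is kept and placed under rgb/test.
        file_name = label_path.rsplit("/", 1)[-1]
        return posixpath.join(root_path, "rgb", "test", file_name)
    # The other label files record a path relative to the dataset root,
    # prefixed with "./", which is stripped before joining.
    return posixpath.join(root_path, *label_path[2:].split("/"))


# Made-up root directory and file names, for illustration only.
test_path = resolve_image_path("/data/BSTLD", "test", "/absolute/path/to/0001.png")
train_path = resolve_image_path(
    "/data/BSTLD", "train", "./rgb/train/2015-05-29-15-29-39_arastradero_traffic_light_loop_bag/0002.png"
)
print(test_path)
print(train_path)
```

Keeping only the file name for the test segment is what lets the dataloader ignore the machine-specific absolute prefix stored in the test label file.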
The TensorBay SDK already includes a number of dataloaders provided by the community. Thus, instead of writing a dataloader, it is also possible to import an available one.
from tensorbay.opendataset import BSTLD
dataset = BSTLD("<path/to/dataset>")
Note
Available dataloaders load their catalogs automatically, so users do not have to write them again.
Important
See dataloader table for dataloaders with different label types.
Visualize Dataset
Optionally, the organized dataset can be visualized by Pharos, which is a TensorBay SDK plug-in. This step can help users to check whether the dataset is correctly organized. Please see Visualization for more details.
Upload Dataset
The organized “BSTLD” dataset can be uploaded to TensorBay for sharing, reuse, etc.
dataset_client = gas.upload_dataset(dataset, jobs=8, skip_uploaded_files=True)
dataset_client.commit("initial commit")
Note
Set skip_uploaded_files=True to skip uploaded data. A piece of data will be skipped if its name and segment name are the same as those of data that already exists remotely.
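The skipping rule can be pictured as a lookup keyed on the pair of segment name and data name. The snippet below is only a conceptual model of the rule as stated in this note, not the SDK's actual implementation; select_pending and the sample names are hypothetical.

```python
def select_pending(local_items, remote_items):
    """Return the local (segment_name, data_name) pairs not yet on the remote.

    Conceptual model only: an item is skipped when both its segment name
    and its data name match an item that was already uploaded.
    """
    uploaded = set(remote_items)
    return [item for item in local_items if item not in uploaded]


# Made-up names for illustration.
local = [("train", "a.png"), ("train", "b.png"), ("test", "a.png")]
remote = [("train", "a.png")]

pending = select_pending(local, remote)
print(pending)
```

Under this model, a file with the same name in a different segment is still uploaded, because both names have to match for the data to be skipped.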
Similar to Git, the commit step after uploading records changes to the dataset as a version. If needed, make further modifications and commit again. Please see Version Control for more details.
Read Dataset
Now the “BSTLD” dataset can be read from TensorBay.
dataset = Dataset("BSTLD", gas)
In dataset “BSTLD”, there are three segments: train, test and additional.
Get the segment names by listing them all.
dataset.keys()
Get a segment by index or by passing the required segment name.
first_segment = dataset[0]
train_segment = dataset["train"]
In the train segment, there is a sequence of data, which can be obtained by index.
data = train_segment[3]
In each data, there is a sequence of Box2D annotations, which can be obtained by index.
label_box2d = data.label.box2d[0]
category = label_box2d.category
attributes = label_box2d.attributes
There is only one label type in the “BSTLD” dataset, which is box2d.
The information stored in category is one of the names in the “categories” list of catalog.json. The information stored in attributes is one or several of the attributes in the “attributes” list of catalog.json.
See Box2D label format for more details.
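The relationship between a label and the catalog described above can be checked in a few lines of plain Python. This is a sketch under stated assumptions: Box2DLabel is a hypothetical stand-in for a label object that exposes category and attributes, check_label is a hypothetical helper, and the catalog is abbreviated to two categories.

```python
from dataclasses import dataclass, field


@dataclass
class Box2DLabel:
    """Hypothetical stand-in for a Box2D label read from the dataset."""

    category: str
    attributes: dict = field(default_factory=dict)


def check_label(label, box2d_catalog):
    """Return a list of mismatches between a label and the Box2D catalog."""
    category_names = {item["name"] for item in box2d_catalog["categories"]}
    attribute_names = {item["name"] for item in box2d_catalog["attributes"]}
    problems = []
    if label.category not in category_names:
        problems.append(f"unknown category: {label.category}")
    for name in label.attributes:
        if name not in attribute_names:
            problems.append(f"unknown attribute: {name}")
    return problems


# Abbreviated catalog: only two of the categories are listed here.
box2d_catalog = {
    "categories": [{"name": "Red"}, {"name": "Green"}],
    "attributes": [{"name": "occluded", "type": "boolean"}],
}

good = check_label(Box2DLabel("Red", {"occluded": False}), box2d_catalog)
bad = check_label(Box2DLabel("Blue", {"blurred": True}), box2d_catalog)
print(good)
print(bad)
```

An empty list means the label only uses names that the catalog declares.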
Delete Dataset
gas.delete_dataset("BSTLD")