Label Format

TensorBay supports multiple types of labels.

Each Data instance can have multiple types of label.

Each label type is supported by a specific label class and has a corresponding subcatalog class.

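Table 7 supported label types

supported label types    label classes             subcatalog classes
Classification           Classification            ClassificationSubcatalog
Box2D                    LabeledBox2D              Box2DSubcatalog
Box3D                    LabeledBox3D              Box3DSubcatalog
Keypoints2D              LabeledKeypoints2D        Keypoints2DSubcatalog
Polygon                  LabeledPolygon            PolygonSubcatalog
MultiPolygon             LabeledMultiPolygon       MultiPolygonSubcatalog
RLE                      LabeledRLE                RLESubcatalog
Polyline2D               LabeledPolyline2D         Polyline2DSubcatalog
MultiPolyline2D          LabeledMultiPolyline2D    MultiPolyline2DSubcatalog
Sentence                 LabeledSentence           SentenceSubcatalog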

Common Label Properties

Different types of labels contain different aspects of annotation information about the data. Some are more general, and some are unique to a specific label type.

Three common properties of a label will be introduced first, and the unique ones will be explained under the corresponding type of label.

Take a 2D box label as an example:

>>> from tensorbay.label import LabeledBox2D
>>> label = LabeledBox2D(
... 10, 20, 30, 40,
... category="category",
... attributes={"attribute_name": "attribute_value"},
... instance="instance_ID"
... )
>>> label
LabeledBox2D(10, 20, 30, 40)(
  (category): 'category',
  (attributes): {...},
  (instance): 'instance_ID'
)

category

Category is a string indicating the class of the labeled object.

>>> label.category
'category'


attributes

Attributes are additional information about the data, and there is no limit on the number of attributes.

The attribute names and values are stored in key-value pairs.

>>> label.attributes
{'attribute_name': 'attribute_value'}


instance

Instance is the unique ID of the object inside the label, which is mostly used for tracking tasks.

>>> label.instance
'instance_ID'

Common Subcatalog Properties

Before creating a label or adding a label to data, it’s necessary to define the annotation rules of the specific label type inside the dataset. This is done with a subcatalog.

Different label types have different subcatalog classes.

Take Box2DSubcatalog as an example to describe some common features of subcatalogs.

>>> from tensorbay.label import Box2DSubcatalog
>>> box2d_subcatalog = Box2DSubcatalog(is_tracking=True)
>>> box2d_subcatalog
Box2DSubcatalog(
   (is_tracking): True
)

tracking information

If the labels of this type in the dataset carry instance IDs, then the subcatalog should set a flag to show that it supports tracking information.

Pass True to the is_tracking parameter while creating the subcatalog, or set the is_tracking attribute after initialization.

>>> box2d_subcatalog.is_tracking = True

category information

If the labels of this type in the dataset have a category, then the subcatalog should contain all the possible categories.

The category of every label that appears in the dataset should be among the categories of the subcatalog.

Category information can be added to the subcatalog.

>>> box2d_subcatalog.add_category(name="cat", description="The Flerken")
>>> box2d_subcatalog.categories
NameList [

CategoryInfo is used to describe a category. See details in CategoryInfo.
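
The rule that every label category must be listed in the subcatalog can be checked before uploading. The following is a minimal sketch in plain Python, not part of the TensorBay API; the function name find_unknown_categories and the sample data are hypothetical.

```python
from typing import Iterable, List


def find_unknown_categories(
    label_categories: Iterable[str], subcatalog_categories: Iterable[str]
) -> List[str]:
    """Return the label categories that the subcatalog does not list."""
    allowed = set(subcatalog_categories)
    unknown: List[str] = []
    for category in label_categories:
        # Report each missing category once, in first-seen order.
        if category not in allowed and category not in unknown:
            unknown.append(category)
    return unknown


# Hypothetical data: categories used by labels vs. categories in the subcatalog.
print(find_unknown_categories(["cat", "dog", "cat"], ["cat"]))  # ['dog']
```

An empty result means every category used by the labels is covered by the subcatalog.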

attributes information

If the labels of this type in the dataset have attributes, then the subcatalog should contain the rules for all of the attributes.

Each attribute of a label that appears in the dataset should follow the rules set in the attributes of the subcatalog.

Attribute information can be added to the subcatalog.

>>> box2d_subcatalog.add_attribute(
... name="attribute_name",
... type_="number",
... maximum=100,
... minimum=0,
... description="attribute description"
... )
>>> box2d_subcatalog.attributes
NameList [

AttributeInfo is used to describe the rules of an attribute, following the JSON Schema approach.

See details in AttributeInfo.
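
To make the idea of an attribute rule concrete, the sketch below checks a value against a JSON Schema style "number" rule with inclusive minimum and maximum bounds, like the one passed to add_attribute() above. It is plain Python written for illustration only; check_number_attribute and the rule dictionary are hypothetical and do not reflect how TensorBay validates attributes internally.

```python
from typing import Any, Dict


def check_number_attribute(value: Any, rule: Dict[str, Any]) -> bool:
    """Check a value against a JSON Schema style number rule."""
    # JSON Schema numbers are ints or floats; booleans are a separate type.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return False
    # "minimum" and "maximum" are inclusive bounds in JSON Schema.
    if "minimum" in rule and value < rule["minimum"]:
        return False
    if "maximum" in rule and value > rule["maximum"]:
        return False
    return True


# Hypothetical rule mirroring the add_attribute() call above.
rule = {"type": "number", "minimum": 0, "maximum": 100}
print(check_number_attribute(50, rule))   # True
print(check_number_attribute(101, rule))  # False
```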

Other unique subcatalog features will be explained in the corresponding label type section.


Classification

Classification is to classify data into different categories.

It is the annotation for the entire file, so each data can be assigned only one classification label.

Classification labels apply to different types of data, such as images and texts.

The structure of one classification label is like:

{
    "category": <str>
    "attributes": {
        <key>: <value>
    }
}

To create a Classification label:

>>> from tensorbay.label import Classification
>>> classification_label = Classification(
... category="data_category",
... attributes={"attribute_name": "attribute_value"}
... )
>>> classification_label
Classification(
  (category): 'data_category',
  (attributes): {...}
)


The category of the entire data file. See category for details.


The attributes of the entire data file. See attributes for details.


There must be either a category or attributes in one classification label.


Before adding the classification label to data, ClassificationSubcatalog should be defined.

ClassificationSubcatalog has categories and attributes information, see category information and attributes information for details.

To add a Classification label to one data:

>>> from tensorbay.dataset import Data
>>> data = Data("local_path")
>>> data.label.classification = classification_label


One data can only have one classification label.


Box2D

Box2D is a type of label with a 2D bounding box on an image. It’s usually used for object detection tasks.

Each data can be assigned multiple Box2D labels.

The structure of one Box2D label is like:

{
    "box2d": {
        "xmin": <float>
        "ymin": <float>
        "xmax": <float>
        "ymax": <float>
    },
    "category": <str>
    "attributes": {
        <key>: <value>
    },
    "instance": <str>
}

To create a LabeledBox2D label:

>>> from tensorbay.label import LabeledBox2D
>>> box2d_label = LabeledBox2D(
... 10, 20, 30, 40,
... category="category",
... attributes={"attribute_name": "attribute_value"},
... instance="instance_ID"
... )
>>> box2d_label
LabeledBox2D(10, 20, 30, 40)(
  (category): 'category',
  (attributes): {...},
  (instance): 'instance_ID'
)


LabeledBox2D extends Box2D.

To construct a LabeledBox2D instance with only the geometry information, use the coordinates of the top-left and bottom-right vertices of the 2D bounding box, or the coordinates of the top-left vertex together with the width and the height of the bounding box.

>>> LabeledBox2D(10, 20, 30, 40)
LabeledBox2D(10, 20, 30, 40)()
>>> LabeledBox2D.from_xywh(x=10, y=20, width=20, height=20)
LabeledBox2D(10, 20, 30, 40)()
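
The two constructions above describe the same box. As a rough illustration of the arithmetic involved, the plain-Python sketch below converts an (x, y, width, height) box to corner form and computes its area. The helper names xywh_to_xyxy and box_area are hypothetical and are not TensorBay functions.

```python
from typing import Tuple

Corners = Tuple[float, float, float, float]


def xywh_to_xyxy(x: float, y: float, width: float, height: float) -> Corners:
    """Convert a top-left corner plus size into (xmin, ymin, xmax, ymax)."""
    return (x, y, x + width, y + height)


def box_area(xmin: float, ymin: float, xmax: float, ymax: float) -> float:
    """Area of an axis-aligned box; degenerate boxes have zero area."""
    return max(0, xmax - xmin) * max(0, ymax - ymin)


corners = xywh_to_xyxy(10, 20, 20, 20)
print(corners)             # (10, 20, 30, 40)
print(box_area(*corners))  # 400
```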

It contains the basic geometry information of the 2D bounding box.

>>> box2d_label.xmin
10
>>> box2d_label.ymin
20
>>> box2d_label.xmax
30
>>> box2d_label.ymax
40
>>> box2d_label.br
Vector2D(30, 40)
>>> box2d_label.tl
Vector2D(10, 20)
>>> box2d_label.area()
400


The category of the object inside the 2D bounding box. See category for details.


Attributes are the additional information about this object, which are stored in key-value pairs. See attributes for details.


Instance is the unique ID for the object inside of the 2D bounding box, which is mostly used for tracking tasks. See instance for details.


Before adding the Box2D labels to data, Box2DSubcatalog should be defined.

Box2DSubcatalog has categories, attributes and tracking information, see category information, attributes information and tracking information for details.

To add a LabeledBox2D label to one data:

>>> from tensorbay.dataset import Data
>>> data = Data("local_path")
>>> data.label.box2d = []
>>> data.label.box2d.append(box2d_label)


One data may contain multiple Box2D labels, so the Data.label.box2d must be a list.


Box3D

Box3D is a type of label with a 3D bounding box on a point cloud, which is often used for 3D object detection.

Currently, Box3D labels apply to point cloud data only.

Each point cloud can be assigned multiple Box3D labels.

The structure of one Box3D label is like:

{
    "box3d": {
        "translation": {
            "x": <float>
            "y": <float>
            "z": <float>
        },
        "rotation": {
            "w": <float>
            "x": <float>
            "y": <float>
            "z": <float>
        },
        "size": {
            "x": <float>
            "y": <float>
            "z": <float>
        }
    },
    "category": <str>
    "attributes": {
        <key>: <value>
    },
    "instance": <str>
}

To create a LabeledBox3D label:

>>> from tensorbay.label import LabeledBox3D
>>> box3d_label = LabeledBox3D(
... size=[10, 20, 30],
... translation=[0, 0, 0],
... rotation=[1, 0, 0, 0],
... category="category",
... attributes={"attribute_name": "attribute_value"},
... instance="instance_ID"
... )
>>> box3d_label
LabeledBox3D(
  (size): Vector3D(10, 20, 30),
  (translation): Vector3D(0, 0, 0),
  (rotation): quaternion(1.0, 0.0, 0.0, 0.0),
  (category): 'category',
  (attributes): {...},
  (instance): 'instance_ID'
)


LabeledBox3D extends Box3D.

To construct a LabeledBox3D instance with only the geometry information, use the transform matrix and the size of the 3D bounding box, or use translation and rotation to represent the transform of the 3D bounding box.

>>> LabeledBox3D(
... size=[10, 20, 30],
... transform_matrix=[[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]],
... )
LabeledBox3D(
  (size): Vector3D(10, 20, 30),
  (translation): Vector3D(0, 0, 0),
  (rotation): quaternion(1.0, -0.0, -0.0, -0.0)
)
>>> LabeledBox3D(
... size=[10, 20, 30],
... translation=[0, 0, 0],
... rotation=[1, 0, 0, 0],
... )
LabeledBox3D(
  (size): Vector3D(10, 20, 30),
  (translation): Vector3D(0, 0, 0),
  (rotation): quaternion(1.0, 0.0, 0.0, 0.0)
)
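
Both constructions carry the same information: a transform matrix can be split into a translation and a rotation quaternion. The sketch below shows one standard way to do that in plain Python, assuming the matrix holds a pure rotation in its left 3x3 block and the translation in its last column, and that the quaternion order is (w, x, y, z) as in the structure above. The function name split_transform_matrix is hypothetical; this is not TensorBay's implementation.

```python
import math
from typing import Sequence, Tuple

Vector = Tuple[float, float, float]
Quaternion = Tuple[float, float, float, float]


def split_transform_matrix(matrix: Sequence[Sequence[float]]) -> Tuple[Vector, Quaternion]:
    """Split a 3x4 or 4x4 transform matrix into translation and (w, x, y, z)."""
    r = [list(row[:3]) for row in matrix[:3]]
    translation = (matrix[0][3], matrix[1][3], matrix[2][3])
    trace = r[0][0] + r[1][1] + r[2][2]
    # Pick the branch with the largest divisor to stay numerically stable.
    if trace > 0:
        s = 2.0 * math.sqrt(trace + 1.0)
        w = 0.25 * s
        x = (r[2][1] - r[1][2]) / s
        y = (r[0][2] - r[2][0]) / s
        z = (r[1][0] - r[0][1]) / s
    elif r[0][0] > r[1][1] and r[0][0] > r[2][2]:
        s = 2.0 * math.sqrt(1.0 + r[0][0] - r[1][1] - r[2][2])
        w = (r[2][1] - r[1][2]) / s
        x = 0.25 * s
        y = (r[0][1] + r[1][0]) / s
        z = (r[0][2] + r[2][0]) / s
    elif r[1][1] > r[2][2]:
        s = 2.0 * math.sqrt(1.0 + r[1][1] - r[0][0] - r[2][2])
        w = (r[0][2] - r[2][0]) / s
        x = (r[0][1] + r[1][0]) / s
        y = 0.25 * s
        z = (r[1][2] + r[2][1]) / s
    else:
        s = 2.0 * math.sqrt(1.0 + r[2][2] - r[0][0] - r[1][1])
        w = (r[1][0] - r[0][1]) / s
        x = (r[0][2] + r[2][0]) / s
        y = (r[1][2] + r[2][1]) / s
        z = 0.25 * s
    return translation, (w, x, y, z)


# The identity transform used in the example above.
identity = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
print(split_transform_matrix(identity))
```

For the identity matrix this yields a zero translation and the identity quaternion, which agrees with the two doctest results above up to the sign of zero.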

It contains the basic geometry information of the 3D bounding box.

>>> box3d_label.transform
Transform3D(
  (translation): Vector3D(0, 0, 0),
  (rotation): quaternion(1.0, 0.0, 0.0, 0.0)
)
>>> box3d_label.translation
Vector3D(0, 0, 0)
>>> box3d_label.rotation
quaternion(1.0, 0.0, 0.0, 0.0)
>>> box3d_label.size
Vector3D(10, 20, 30)
>>> box3d_label.volume()
6000


The category of the object inside the 3D bounding box. See category for details.


Attributes are the additional information about this object, which are stored in key-value pairs. See attributes for details.


Instance is the unique id for the object inside of the 3D bounding box, which is mostly used for tracking tasks. See instance for details.


Before adding the Box3D labels to data, Box3DSubcatalog should be defined.

Box3DSubcatalog has categories, attributes and tracking information, see category information, attributes information and tracking information for details.

To add a LabeledBox3D label to one data:

>>> from tensorbay.dataset import Data
>>> data = Data("local_path")
>>> data.label.box3d = []
>>> data.label.box3d.append(box3d_label)


One data may contain multiple Box3D labels, so the Data.label.box3d must be a list.


Keypoints2D

Keypoints2D is a type of label with a set of 2D keypoints. It is often used for animal and human pose estimation.

Keypoints2D labels mostly apply to images.

Each data can be assigned multiple Keypoints2D labels.

The structure of one Keypoints2D label is like:

{
    "keypoints2d": [
        { "x": <float>
          "y": <float>
          "v": <int>
        }
    ],
    "category": <str>
    "attributes": {
        <key>: <value>
    },
    "instance": <str>
}

To create a LabeledKeypoints2D label:

>>> from tensorbay.label import LabeledKeypoints2D
>>> keypoints2d_label = LabeledKeypoints2D(
... [[10, 20], [15, 25], [20, 30]],
... category="category",
... attributes={"attribute_name": "attribute_value"},
... instance="instance_ID"
... )
>>> keypoints2d_label
LabeledKeypoints2D [
  Keypoint2D(10, 20),
  Keypoint2D(15, 25),
  Keypoint2D(20, 30)
](
  (category): 'category',
  (attributes): {...},
  (instance): 'instance_ID'
)


LabeledKeypoints2D extends Keypoints2D.

To construct a LabeledKeypoints2D instance with only the geometry information, the coordinates of the set of 2D keypoints are required. The visible status of each 2D keypoint is optional.

>>> LabeledKeypoints2D([[10, 20], [15, 25], [20, 30]])
LabeledKeypoints2D [
  Keypoint2D(10, 20),
  Keypoint2D(15, 25),
  Keypoint2D(20, 30)
]()
>>> LabeledKeypoints2D([[10, 20, 0], [15, 25, 1], [20, 30, 1]])
LabeledKeypoints2D [
  Keypoint2D(10, 20, 0),
  Keypoint2D(15, 25, 1),
  Keypoint2D(20, 30, 1)
]()

It contains the basic geometry information of the 2D keypoints, which can be obtained by index.

>>> keypoints2d_label[0]
Keypoint2D(10, 20)


The category of the object inside the 2D keypoints. See category for details.


Attributes are the additional information about this object, which are stored in key-value pairs. See attributes for details.


Instance is the unique ID for the object inside of the 2D keypoints, which is mostly used for tracking tasks. See instance for details.


Before adding 2D keypoints labels to the dataset, Keypoints2DSubcatalog should be defined.

Besides the attributes, category and tracking information, Keypoints2DSubcatalog also has keypoints, which describe a set of keypoints corresponding to certain categories.

>>> from tensorbay.label import Keypoints2DSubcatalog
>>> keypoints2d_subcatalog = Keypoints2DSubcatalog()
>>> keypoints2d_subcatalog.add_keypoints(
... 3,
... names=["head", "body", "feet"],
... skeleton=[[0, 1], [1, 2]],
... visible="BINARY",
... parent_categories=["cat"],
... description="keypoints of cats"
... )
>>> keypoints2d_subcatalog.keypoints
[KeypointsInfo(
   (number): 3,
   (names): [...],
   (skeleton): [...],
   (visible): 'BINARY',
   (parent_categories): [...]
)]

KeypointsInfo is used to describe a set of 2D keypoints.

The first parameter of add_keypoints() is the number of keypoints in the set, which is required.

The names is a list of strings representing the name of each 2D keypoint, and its length must be equal to the number.

The skeleton is a two-dimensional list indicating the connections between the keypoints.

The visible is the visible status that limits the v of Keypoint2D. It can only be “BINARY” or “TERNARY”.

See details in Keypoint2D.

The parent_categories is a list of categories indicating to which category the keypoints rule applies.

Mostly, parent_categories is not given, which means the keypoints rule applies to all the categories of the entire dataset.
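
The keypoints rule can be read as a set of constraints on every Keypoints2D label. The plain-Python sketch below checks a label against such a rule. It assumes that "BINARY" allows v values 0 and 1 and that "TERNARY" allows 0, 1 and 2; that mapping, the function name check_keypoints and the error messages are assumptions made for illustration, not TensorBay behavior.

```python
from typing import List, Optional, Sequence

# Assumed mapping from the visible status to the allowed v values.
ALLOWED_V = {"BINARY": {0, 1}, "TERNARY": {0, 1, 2}}


def check_keypoints(
    keypoints: Sequence[Sequence[float]],
    number: int,
    visible: Optional[str] = None,
    skeleton: Sequence[Sequence[int]] = (),
) -> List[str]:
    """Return a list of problems found; an empty list means the label fits the rule."""
    problems: List[str] = []
    if len(keypoints) != number:
        problems.append(f"expected {number} keypoints, got {len(keypoints)}")
    if visible is not None:
        allowed = ALLOWED_V[visible]
        for index, point in enumerate(keypoints):
            # Only points given as (x, y, v) carry a visible status.
            if len(point) == 3 and point[2] not in allowed:
                problems.append(f"keypoint {index} has v={point[2]}, not allowed for {visible}")
    for start, end in skeleton:
        if not (0 <= start < number and 0 <= end < number):
            problems.append(f"skeleton edge ({start}, {end}) is out of range")
    return problems


# Same rule as the add_keypoints() call above.
points = [[10, 20, 0], [15, 25, 1], [20, 30, 1]]
print(check_keypoints(points, 3, "BINARY", [[0, 1], [1, 2]]))  # []
```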

To add a LabeledKeypoints2D label to one data:

>>> from tensorbay.dataset import Data
>>> data = Data("local_path")
>>> data.label.keypoints2d = []
>>> data.label.keypoints2d.append(keypoints2d_label)


One data may contain multiple Keypoints2D labels, so the Data.label.keypoints2d must be a list.


Polygon

Polygon is a type of label with a polygonal region on an image which contains some semantic information. It’s often used for CV tasks such as semantic segmentation.

Each data can be assigned multiple Polygon labels.

The structure of one Polygon label is like:

{
    "polygon": [
        {
            "x": <float>
            "y": <float>
        }
    ],
    "category": <str>
    "attributes": {
        <key>: <value>
    },
    "instance": <str>
}

To create a LabeledPolygon label:

>>> from tensorbay.label import LabeledPolygon
>>> polygon_label = LabeledPolygon(
... [(1, 2), (2, 3), (1, 3)],
... category="category",
... attributes={"attribute_name": "attribute_value"},
... instance="instance_ID"
... )
>>> polygon_label
LabeledPolygon [
  Vector2D(1, 2),
  Vector2D(2, 3),
  Vector2D(1, 3)
](
  (category): 'category',
  (attributes): {...},
  (instance): 'instance_ID'
)


LabeledPolygon extends Polygon.

To construct a LabeledPolygon instance with only the geometry information, use the coordinates of the vertices of the polygonal region.

>>> LabeledPolygon([(1, 2), (2, 3), (1, 3)])
LabeledPolygon [
  Vector2D(1, 2),
  Vector2D(2, 3),
  Vector2D(1, 3)
]()

It contains the basic geometry information of the polygonal region.

>>> polygon_label.area()
0.5
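
The area of a simple polygon can be computed from its vertices with the shoelace formula. The sketch below is a plain-Python illustration of that formula, with the hypothetical helper name polygon_area; it is not necessarily how area() is implemented in TensorBay.

```python
from typing import Sequence, Tuple


def polygon_area(vertices: Sequence[Tuple[float, float]]) -> float:
    """Area of a simple polygon given its vertices in order (shoelace formula)."""
    twice_area = 0.0
    count = len(vertices)
    for i in range(count):
        x1, y1 = vertices[i]
        # Wrap around so the last vertex connects back to the first.
        x2, y2 = vertices[(i + 1) % count]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


# The triangle from the example above.
print(polygon_area([(1, 2), (2, 3), (1, 3)]))  # 0.5
```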


The category of the object inside the polygonal region. See category for details.


Attributes are the additional information about this object, which are stored in key-value pairs. See attributes for details.


Instance is the unique id for the object inside of the polygonal region, which is mostly used for tracking tasks. See instance for details.


Before adding the Polygon labels to data, PolygonSubcatalog should be defined.

PolygonSubcatalog has categories, attributes and tracking information, see category information, attributes information and tracking information for details.

To add a LabeledPolygon label to one data:

>>> from tensorbay.dataset import Data
>>> data = Data("local_path")
>>> data.label.polygon = []
>>> data.label.polygon.append(polygon_label)


One data may contain multiple Polygon labels, so the Data.label.polygon must be a list.


MultiPolygon

MultiPolygon is a type of label with several polygonal regions which contain the same semantic information on an image. It’s often used for CV tasks such as semantic segmentation.

Each data can be assigned multiple MultiPolygon labels.

The structure of one MultiPolygon label is like:

{
    "multiPolygon": [
        [
            {
                "x": <float>
                "y": <float>
            }
        ]
    ],
    "category": <str>
    "attributes": {
        <key>: <value>
    },
    "instance": <str>
}

To create a LabeledMultiPolygon label:

>>> from tensorbay.label import LabeledMultiPolygon
>>> multipolygon_label = LabeledMultiPolygon(
... [[(1.0, 2.0), (2.0, 3.0), (1.0, 3.0)], [(1.0, 4.0), (2.0, 3.0), (1.0, 8.0)]],
... category="category",
... attributes={"attribute_name": "attribute_value"},
... instance="instance_ID"
... )
>>> multipolygon_label
LabeledMultiPolygon [
  Polygon [...],
  Polygon [...]
](
  (category): 'category',
  (attributes): {...},
  (instance): 'instance_ID'
)


LabeledMultiPolygon extends MultiPolygon.

To construct a LabeledMultiPolygon instance with only the geometry information, use the coordinates of the vertices of the polygonal regions.

>>> LabeledMultiPolygon([[[1.0, 4.0], [2.0, 3.7], [7.0, 4.0]],
... [[5.0, 7.0], [6.0, 7.0], [9.0, 8.0]]])
LabeledMultiPolygon [
  Polygon [...],
  Polygon [...]
]()


The category of the object inside polygonal regions. See category for details.


Attributes are the additional information about this object, which are stored in key-value pairs. See attributes for details.


Instance is the unique id for the object inside of polygonal regions, which is mostly used for tracking tasks. See instance for details.


Before adding the MultiPolygon labels to data, MultiPolygonSubcatalog should be defined.

MultiPolygonSubcatalog has categories, attributes and tracking information, see category information, attributes information and tracking information for details.

To add a LabeledMultiPolygon label to one data:

>>> from tensorbay.dataset import Data
>>> data = Data("local_path")
>>> data.label.multi_polygon = []
>>> data.label.multi_polygon.append(multipolygon_label)


One data may contain multiple MultiPolygon labels, so the Data.label.multi_polygon must be a list.


RLE

RLE, Run-Length Encoding, is a type of label with a list of numbers to indicate whether the pixels are in the target region. It’s often used for CV tasks such as semantic segmentation.

Each data can be assigned multiple RLE labels.
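
To show how a list of run lengths can describe a region, the sketch below expands run lengths into a flat binary mask. It assumes the common convention in which the runs alternate between background and foreground, starting with background; the source does not state which convention TensorBay uses, so treat that, and the helper name decode_rle, as assumptions.

```python
from typing import List, Sequence


def decode_rle(counts: Sequence[int]) -> List[int]:
    """Expand run lengths into a flat 0/1 mask, starting with a run of 0s."""
    mask: List[int] = []
    value = 0
    for run in counts:
        mask.extend([value] * run)
        # Each run flips between background (0) and foreground (1).
        value = 1 - value
    return mask


mask = decode_rle([8, 4, 1, 3, 12, 7, 16, 2, 9, 2])
print(len(mask))  # 64 pixels in total
print(sum(mask))  # 18 foreground pixels under the assumed convention
```

Turning the flat mask into a 2D image additionally requires the image size and the pixel ordering, which are outside this sketch.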

The structure of one RLE label is like:

    "rle": [
    "category": <str>
    "attributes": {
        <key>: <value>
    "instance": <str>

To create a LabeledRLE label:

>>> from tensorbay.label import LabeledRLE
>>> rle_label = LabeledRLE(
... [8, 4, 1, 3, 12, 7, 16, 2, 9, 2],
... category="category",
... attributes={"attribute_name": "attribute_value"},
... instance="instance_ID"
... )
>>> rle_label
LabeledRLE [
  (category): 'category',
  (attributes): {...},
  (instance): 'instance_ID'


LabeledRLE extends RLE.

To construct a LabeledRLE instance with only the geometry information, use the RLE format mask.

>>> LabeledRLE([8, 4, 1, 3, 12, 7, 16, 2, 9, 2])
LabeledRLE [


The category of the object inside the region represented by rle format mask. See category for details.


Attributes are the additional information about this object, which are stored in key-value pairs. See attributes for details.


Instance is the unique id for the object inside the region represented by rle format mask, which is mostly used for tracking tasks. See instance for details.


Before adding the RLE labels to data, RLESubcatalog should be defined.

RLESubcatalog has categories, attributes and tracking information, see category information, attributes information and tracking information for details.

To add a LabeledRLE label to one data:

>>> from tensorbay.dataset import Data
>>> data = Data("local_path")
>>> data.label.rle = []
>>> data.label.rle.append(rle_label)


One data may contain multiple RLE labels, so the Data.label.rle must be a list.


Polyline2D

Polyline2D is a type of label with a 2D polyline on an image. It’s often used for CV tasks such as lane detection.

Each data can be assigned multiple Polyline2D labels.

The structure of one Polyline2D label is like:

{
    "polyline2d": [
        {
            "x": <float>
            "y": <float>
        }
    ],
    "category": <str>
    "attributes": {
        <key>: <value>
    },
    "instance": <str>
}

To create a LabeledPolyline2D label:

>>> from tensorbay.label import LabeledPolyline2D
>>> polyline2d_label = LabeledPolyline2D(
... [(1, 2), (2, 3)],
... category="category",
... attributes={"attribute_name": "attribute_value"},
... instance="instance_ID"
... )
>>> polyline2d_label
LabeledPolyline2D [
  Vector2D(1, 2),
  Vector2D(2, 3)
](
  (category): 'category',
  (attributes): {...},
  (instance): 'instance_ID'
)


LabeledPolyline2D extends Polyline2D.

To construct a LabeledPolyline2D instance with only the geometry information, use the coordinates of the vertices of the polyline.

>>> LabeledPolyline2D([[1, 2], [2, 3]])
LabeledPolyline2D [
  Vector2D(1, 2),
  Vector2D(2, 3)
]()

It contains a series of methods to operate on polylines.

>>> polyline_1 = LabeledPolyline2D([[1, 1], [1, 2], [2, 2]])
>>> polyline_2 = LabeledPolyline2D([[4, 5], [2, 1], [3, 3]])
>>> LabeledPolyline2D.uniform_frechet_distance(polyline_1, polyline_2)
>>> LabeledPolyline2D.similarity(polyline_1, polyline_2)
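
For intuition about what a Fréchet-style distance measures, the sketch below implements the classic discrete Fréchet distance over the vertices of two polylines. This is a different, simpler technique than the library's uniform_frechet_distance, whose algorithm is not described here, so the values it returns may differ from the library's. The function name discrete_frechet_distance is hypothetical.

```python
import math
from typing import List, Sequence


def discrete_frechet_distance(
    line_1: Sequence[Sequence[float]], line_2: Sequence[Sequence[float]]
) -> float:
    """Discrete Frechet distance between two polylines, using vertices only."""
    rows, cols = len(line_1), len(line_2)
    table: List[List[float]] = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            x1, y1 = line_1[i]
            x2, y2 = line_2[j]
            gap = math.hypot(x1 - x2, y1 - y2)
            if i == 0 and j == 0:
                table[i][j] = gap
            elif i == 0:
                table[i][j] = max(table[0][j - 1], gap)
            elif j == 0:
                table[i][j] = max(table[i - 1][0], gap)
            else:
                # Either walker, or both, may have advanced in the previous step.
                best_previous = min(table[i - 1][j], table[i - 1][j - 1], table[i][j - 1])
                table[i][j] = max(best_previous, gap)
    return table[-1][-1]


# Two parallel segments one unit apart.
print(discrete_frechet_distance([(0, 0), (1, 0)], [(0, 1), (1, 1)]))  # 1.0
```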


The category of the 2D polyline. See category for details.


Attributes are the additional information about this object, which are stored in key-value pairs. See attributes for details.


Instance is the unique ID for the 2D polyline, which is mostly used for tracking tasks. See instance for details.


Before adding the Polyline2D labels to data, Polyline2DSubcatalog should be defined.

Polyline2DSubcatalog has categories, attributes and tracking information, see category information, attributes information and tracking information for details.

To add a LabeledPolyline2D label to one data:

>>> from tensorbay.dataset import Data
>>> data = Data("local_path")
>>> data.label.polyline2d = []
>>> data.label.polyline2d.append(polyline2d_label)


One data may contain multiple Polyline2D labels, so the Data.label.polyline2d must be a list.


MultiPolyline2D

MultiPolyline2D is a type of label with several 2D polylines which belong to the same category on an image. It’s often used for CV tasks such as lane detection.

Each data can be assigned multiple MultiPolyline2D labels.

The structure of one MultiPolyline2D label is like:

{
    "multiPolyline2d": [
        [
            {
                "x": <float>
                "y": <float>
            }
        ]
    ],
    "category": <str>
    "attributes": {
        <key>: <value>
    },
    "instance": <str>
}

To create a LabeledMultiPolyline2D label:

>>> from tensorbay.label import LabeledMultiPolyline2D
>>> multipolyline2d_label = LabeledMultiPolyline2D(
... [[[1, 2], [2, 3]], [[3, 4], [6, 8]]],
... category="category",
... attributes={"attribute_name": "attribute_value"},
... instance="instance_ID"
... )
>>> multipolyline2d_label
LabeledMultiPolyline2D [
  Polyline2D [...],
  Polyline2D [...]
](
  (category): 'category',
  (attributes): {...},
  (instance): 'instance_ID'
)


LabeledMultiPolyline2D extends MultiPolyline2D.

To construct a LabeledMultiPolyline2D instance with only the geometry information, use the coordinates of the vertices of the polylines.

>>> LabeledMultiPolyline2D([[[1, 2], [2, 3]], [[3, 4], [6, 8]]])
LabeledMultiPolyline2D [
  Polyline2D [...],
  Polyline2D [...]
]()


The category of the multiple 2D polylines. See category for details.


Attributes are the additional information about this object, which are stored in key-value pairs. See attributes for details.


Instance is the unique ID for the multiple 2D polylines, which is mostly used for tracking tasks. See instance for details.


Before adding the MultiPolyline2D labels to data, MultiPolyline2DSubcatalog should be defined.

MultiPolyline2DSubcatalog has categories, attributes and tracking information, see category information, attributes information and tracking information for details.

To add a LabeledMultiPolyline2D label to one data:

>>> from tensorbay.dataset import Data
>>> data = Data("local_path")
>>> data.label.multi_polyline2d = []
>>> data.label.multi_polyline2d.append(multipolyline2d_label)


One data may contain multiple MultiPolyline2D labels, so the Data.label.multi_polyline2d must be a list.


Sentence

Sentence label is the transcribed sentence of a piece of audio, which is often used for automatic speech recognition.

Each audio can be assigned multiple sentence labels.

The structure of one sentence label is like:

{
    "sentence": [
        {
            "text":  <str>
            "begin": <float>
            "end":   <float>
        }
    ],
    "spell": [
        {
            "text":  <str>
            "begin": <float>
            "end":   <float>
        }
    ],
    "phone": [
        {
            "text":  <str>
            "begin": <float>
            "end":   <float>
        }
    ],
    "attributes": {
        <key>: <value>
    }
}

To create a LabeledSentence label:

>>> from tensorbay.label import LabeledSentence
>>> from tensorbay.label import Word
>>> sentence_label = LabeledSentence(
... sentence=[Word("text", 1.1, 1.6)],
... spell=[Word("spell", 1.1, 1.6)],
... phone=[Word("phone", 1.1, 1.6)],
... attributes={"attribute_name": "attribute_value"}
... )
>>> sentence_label
LabeledSentence(
  (sentence): [
    Word(
      (text): 'text',
      (begin): 1.1,
      (end): 1.6
    )
  ],
  (spell): [
    Word(
      (text): 'spell',
      (begin): 1.1,
      (end): 1.6
    )
  ],
  (phone): [
    Word(
      (text): 'phone',
      (begin): 1.1,
      (end): 1.6
    )
  ],
  (attributes): {
    'attribute_name': 'attribute_value'
  }
)


The sentence of a LabeledSentence is a list of Word, representing the transcribed sentence of the audio.


The spell of a LabeledSentence is a list of Word, representing the spell within the sentence.

It applies only to the Chinese language.

The phone of a LabeledSentence is a list of Word, representing the phone of the sentence label.


Word is the basic component of a phonetic transcription sentence, containing the content of the word, the start and the end time in the audio.

>>> from tensorbay.label import Word
>>> Word("text", 1.1, 1.6)
Word(
  (text): 'text',
  (begin): 1.1,
  (end): 1.6
)

The sentence, spell, and phone of a sentence label are all composed of Word.


The attributes of the transcribed sentence. See attributes information for details.


Before adding sentence labels to the dataset, SentenceSubcatalog should be defined.

Besides the attributes information, SentenceSubcatalog also has is_sample, sample_rate and lexicon to describe the transcribed sentences of the audio.

>>> from tensorbay.label import SentenceSubcatalog
>>> sentence_subcatalog = SentenceSubcatalog(
... is_sample=True,
... sample_rate=5,
... lexicon=[["word", "spell", "phone"]]
... )
>>> sentence_subcatalog
SentenceSubcatalog(
  (is_sample): True,
  (sample_rate): 5,
  (lexicon): [...]
)
>>> sentence_subcatalog.lexicon
[['word', 'spell', 'phone']]

The is_sample is a boolean value indicating whether the time format is sample-based.

The sample_rate is the number of samples of audio carried per second. If is_sample is True, then sample_rate must be provided.

The lexicon is a list consisting of all the text and phone entries.
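
The relation between is_sample and sample_rate can be illustrated with a small conversion. The plain-Python sketch below assumes that, when is_sample is True, the begin and end values of a Word are sample indexes that become seconds after dividing by sample_rate. The helper name to_seconds is hypothetical and not part of TensorBay.

```python
from typing import Optional


def to_seconds(value: float, is_sample: bool, sample_rate: Optional[float] = None) -> float:
    """Convert a begin or end value to seconds."""
    if not is_sample:
        # The value is already expressed in seconds.
        return float(value)
    if not sample_rate:
        raise ValueError("sample_rate must be provided when is_sample is True")
    return value / sample_rate


# Hypothetical values: sample index 16000 at 16000 samples per second.
print(to_seconds(16000, True, 16000))  # 1.0
print(to_seconds(1.1, False))          # 1.1
```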

Besides giving the parameters while initializing SentenceSubcatalog, it’s also feasible to set them after initialization.

>>> from tensorbay.label import SentenceSubcatalog
>>> sentence_subcatalog = SentenceSubcatalog()
>>> sentence_subcatalog.is_sample = True
>>> sentence_subcatalog.sample_rate = 5
>>> sentence_subcatalog.append_lexicon(["text", "spell", "phone"])
>>> sentence_subcatalog
SentenceSubcatalog(
  (is_sample): True,
  (sample_rate): 5,
  (lexicon): [...]
)

To add a LabeledSentence label to one data:

>>> from tensorbay.dataset import Data
>>> data = Data("local_path")
>>> data.label.sentence = []
>>> data.label.sentence.append(sentence_label)


One data may contain multiple Sentence labels, so the Data.label.sentence must be a list.