Sentence

A sentence label is the transcribed sentence of a piece of audio, and is often used for automatic speech recognition.

Each audio file can be assigned multiple sentence labels.

The structure of one sentence label is as follows:

{
    "sentence": [
        {
            "text":  <str>,
            "begin": <float>,
            "end":   <float>
        },
        ...
        ...
    ],
    "spell": [
        {
            "text":  <str>,
            "begin": <float>,
            "end":   <float>
        },
        ...
        ...
    ],
    "phone": [
        {
            "text":  <str>,
            "begin": <float>,
            "end":   <float>
        },
        ...
        ...
    ],
    "attributes": {
        <key>: <value>,
        ...
        ...
    }
}
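As a rough illustration of the structure above, the sketch below builds one sentence label as a plain Python dictionary and serializes it with the standard json module. It does not use the TensorBay SDK; the helper word_dict, the words, the timestamps, the ARPAbet-style phonemes and the "speaker" attribute are all hypothetical values chosen for the example.

```python
import json


def word_dict(text, begin, end):
    """Hypothetical helper: one entry of the "sentence", "spell" or "phone" list."""
    return {"text": text, "begin": begin, "end": end}


# One sentence label following the structure above (illustrative values only).
sentence_label = {
    "sentence": [word_dict("hello", 0.0, 0.5), word_dict("world", 0.6, 1.1)],
    # spell is only used for Chinese, so it is left empty for this English example.
    "spell": [],
    # First two phonemes of "hello" only, ARPAbet-style; purely illustrative.
    "phone": [word_dict("HH", 0.0, 0.1), word_dict("AH", 0.1, 0.2)],
    # Hypothetical attribute key and value.
    "attributes": {"speaker": "A"},
}

print(json.dumps(sentence_label, indent=4))
```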

To create a LabeledSentence label:

>>> from tensorbay.label import LabeledSentence
>>> from tensorbay.label import Word
>>> sentence_label = LabeledSentence(
... sentence=[Word("text", 1.1, 1.6)],
... spell=[Word("spell", 1.1, 1.6)],
... phone=[Word("phone", 1.1, 1.6)],
... attributes={"attribute_name": "attribute_value"}
... )
>>> sentence_label
LabeledSentence(
  (sentence): [
    Word(
      (text): 'text',
      (begin): 1.1,
      (end): 1.6
    )
  ],
  (spell): [
    Word(
      (text): 'spell',
      (begin): 1.1,
      (end): 1.6
    )
  ],
  (phone): [
    Word(
      (text): 'phone',
      (begin): 1.1,
      (end): 1.6
    )
  ],
  (attributes): {
    'attribute_name': 'attribute_value'
  }
)

Sentence.sentence

The sentence of a LabeledSentence is a list of Word objects, representing the transcribed sentence of the audio.

Sentence.spell

The spell of a LabeledSentence is a list of Word objects, representing the spelling of the words in the sentence.

It is only used for Chinese.

Sentence.phone

The phone of a LabeledSentence is a list of Word objects, representing the phonemes of the sentence.

Word

Word is the basic component of a transcribed sentence. It contains the text of the word and its start and end times in the audio.

>>> from tensorbay.label import Word
>>> Word("text", 1.1, 1.6)
Word(
  (text): 'text',
  (begin): 1.1,
  (end): 1.6
)

The sentence, spell, and phone of a sentence label are all composed of Word objects.
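The following SDK-free sketch shows what a Word carries and how a sentence is simply a list of such words. WordSketch is a hypothetical stand-in, not the TensorBay Word class, and the check that end is not earlier than begin is an assumption added for illustration; this page does not say whether Word enforces it.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class WordSketch:
    """Hypothetical stand-in for Word: the word text plus its start and end time."""

    text: str
    begin: float
    end: float

    def __post_init__(self) -> None:
        # Assumed constraint for this sketch only: a word cannot end before it begins.
        if self.end < self.begin:
            raise ValueError(f"end ({self.end}) is earlier than begin ({self.begin})")

    @property
    def duration(self) -> float:
        # Length of the word in the audio, in the same unit as begin and end.
        return self.end - self.begin


# sentence, spell and phone are all plain lists of words.
sentence: List[WordSketch] = [
    WordSketch("hello", 0.0, 0.5),
    WordSketch("world", 0.6, 1.1),
]

for word in sentence:
    # round() hides floating-point noise such as 1.1 - 0.6 != 0.5 exactly.
    print(word.text, round(word.duration, 3))
```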

Sentence.attributes

The attributes of the transcribed sentence. See attributes for details.

SentenceSubcatalog

Before adding sentence labels to the dataset, a SentenceSubcatalog should be defined.

Besides attribute information, SentenceSubcatalog also has is_sample, sample_rate and lexicon to describe the transcribed sentences of the audio.

A catalog with only a Sentence subcatalog is typically stored in a JSON file as follows:

{
    "SENTENCE": {                                     <object>*
        "isSample":                                  <boolean>! -- Whether the unit of sampling points in Sentence label is the
                                                                   number of samples. The default value is false and the units
                                                                   are seconds.
        "sampleRate":                                 <number>  -- Audio sampling frequency whose unit is Hz. It is required
                                                                   when "isSample" is true.
        "description":                                <string>! -- Subcatalog description, (default: "").
        "attributes": [                                <array>  -- Attribute list, which contains all attribute information.
            {
                "name":                               <string>* -- Attribute name.
                "enum": [...],                         <array>  -- All possible options for the attribute.
                "type":                      <string or array>  -- Type of the attribute including "boolean", "integer",
                                                                   "number", "string", "array" and "null". It is not
                                                                   required when "enum" is provided.
                "minimum":                            <number>  -- Minimum value of the attribute when type is "number".
                "maximum":                            <number>  -- Maximum value of the attribute when type is "number".
                "items": {                            <object>  -- Used only if the attribute type is "array".
                    "enum": [...],                     <array>  -- All possible options for elements in the attribute array.
                    "type":                  <string or array>  -- Type of elements in the attribute array.
                    "minimum":                        <number>  -- Minimum value of elements in the attribute array when type is
                                                                   "number".
                    "maximum":                        <number>  -- Maximum value of elements in the attribute array when type is
                                                                   "number".
                },
                "description":                        <string>! -- Attribute description, (default: "").
            },
            ...
            ...
        ],
        "lexicon": [                                   <array>  -- A list of all words and their corresponding phonemes.
            [
                text,                                 <string>  -- Word.
                phone,                                <string>  -- Corresponding phonemes.
                phone,                                <string>  -- Corresponding phonemes (a word can correspond to more than
                                                                   one phoneme).
                ...
            ],
            ...
        ]
    }
}

Note

* indicates that the field is required. ! indicates that the field has a default value.
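The isSample and sampleRate fields decide how the begin and end values of a Word should be read. A minimal sketch of that rule, assuming that when isSample is true the values count samples from the start of the audio, so dividing by sampleRate (samples per second, in Hz) gives seconds. to_seconds is a hypothetical helper, not part of the TensorBay SDK.

```python
from typing import Optional


def to_seconds(value: float, is_sample: bool = False, sample_rate: Optional[float] = None) -> float:
    """Convert a begin/end value to seconds (hypothetical helper, not a TensorBay API)."""
    if not is_sample:
        # Default case (isSample is false): the value is already in seconds.
        return float(value)
    if not sample_rate:
        # sampleRate is required when isSample is true.
        raise ValueError("sample_rate is required when is_sample is True")
    # Number of samples divided by samples per second gives seconds.
    return value / sample_rate


print(to_seconds(1.5))                                        # already in seconds
print(to_seconds(24000, is_sample=True, sample_rate=16000))   # 24000 samples at 16 kHz
```

With a 16 kHz recording, a word that begins at sample 24000 therefore begins 1.5 seconds into the audio.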

Besides passing the parameters when initializing SentenceSubcatalog, it is also possible to set them after initialization.

>>> from tensorbay.label import SentenceSubcatalog
>>> sentence_subcatalog = SentenceSubcatalog()
>>> sentence_subcatalog.is_sample = True
>>> sentence_subcatalog.sample_rate = 5
>>> sentence_subcatalog.append_lexicon(["text", "spell", "phone"])
>>> sentence_subcatalog
SentenceSubcatalog(
  (is_sample): True,
  (sample_rate): 5,
  (lexicon): [...]
)
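The lexicon lists each word followed by its phonemes. As a sketch of how such a list might be consumed, the code below turns lexicon entries into a word-to-phonemes lookup and expands a sentence into a phoneme sequence. It assumes every element after the first in an entry is a phoneme, as the catalog layout above describes; the helper build_pronunciations and the ARPAbet-style sample entries are hypothetical and not part of the TensorBay SDK.

```python
from typing import Dict, List

# Hypothetical lexicon entries in the layout described by the catalog:
# [text, phone, phone, ...]. The phonemes are illustrative only.
lexicon: List[List[str]] = [
    ["hello", "HH", "AH", "L", "OW"],
    ["world", "W", "ER", "L", "D"],
]


def build_pronunciations(entries: List[List[str]]) -> Dict[str, List[List[str]]]:
    """Map each word to every phoneme sequence listed for it (hypothetical helper)."""
    pronunciations: Dict[str, List[List[str]]] = {}
    for text, *phones in entries:
        # In case a word appears in several entries, keep all of its pronunciations.
        pronunciations.setdefault(text, []).append(phones)
    return pronunciations


pronunciations = build_pronunciations(lexicon)

# Expand a sentence into phonemes using the first pronunciation of each word.
sentence_phones = [p for word in ["hello", "world"] for p in pronunciations[word][0]]
print(sentence_phones)
```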

To add a LabeledSentence label to a piece of data:

>>> from tensorbay.dataset import Data
>>> data = Data("local_path")
>>> data.label.sentence = []
>>> data.label.sentence.append(sentence_label)

Note

One piece of data may contain multiple Sentence labels, so Data.label.sentence must be a list.