Sentence#
A sentence label is the transcribed sentence of a piece of audio, which is often used for automatic speech recognition.
Each audio file can be assigned multiple sentence labels.
The structure of one sentence label is as follows:
{
    "sentence": [
        {
            "text": <str>
            "begin": <float>
            "end": <float>
        }
        ...
        ...
    ],
    "spell": [
        {
            "text": <str>
            "begin": <float>
            "end": <float>
        }
        ...
        ...
    ],
    "phone": [
        {
            "text": <str>
            "begin": <float>
            "end": <float>
        }
        ...
        ...
    ],
    "attributes": {
        <key>: <value>
        ...
        ...
    }
}
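As a rough illustration of the structure above, the following sketch builds one sentence label as a plain Python dictionary and serializes it with the standard `json` module. It does not use the TensorBay API; the helper `make_segment` and the placeholder values are hypothetical and only mirror the serialized layout.

```python
import json


def make_segment(text, begin, end):
    """Build one {"text", "begin", "end"} entry (hypothetical helper)."""
    if end < begin:
        raise ValueError("end must not be earlier than begin")
    return {"text": text, "begin": float(begin), "end": float(end)}


# Placeholder values only; real labels carry the transcribed content.
sentence_label_dict = {
    "sentence": [make_segment("text", 1.1, 1.6)],
    "spell": [make_segment("spell", 1.1, 1.6)],
    "phone": [make_segment("phone", 1.1, 1.6)],
    "attributes": {"<LABEL_ATTRIBUTE_NAME>": "<LABEL_ATTRIBUTE_VALUE>"},
}

# Round-trip through JSON to confirm the structure is serializable.
serialized = json.dumps(sentence_label_dict, indent=4)
print(serialized)
```

The check in `make_segment` is an assumption about what a sensible label looks like (a segment should not end before it begins); the source does not state that this is enforced.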
To create a LabeledSentence label:
>>> from tensorbay.label import LabeledSentence
>>> from tensorbay.label import Word
>>> sentence_label = LabeledSentence(
... sentence=[Word("text", 1.1, 1.6)],
... spell=[Word("spell", 1.1, 1.6)],
... phone=[Word("phone", 1.1, 1.6)],
... attributes={"<LABEL_ATTRIBUTE_NAME>": "<LABEL_ATTRIBUTE_VALUE>"}
... )
>>> sentence_label
LabeledSentence(
  (sentence): [
    Word(
      (text): 'text',
      (begin): 1.1,
      (end): 1.6
    )
  ],
  (spell): [
    Word(
      (text): 'spell',
      (begin): 1.1,
      (end): 1.6
    )
  ],
  (phone): [
    Word(
      (text): 'phone',
      (begin): 1.1,
      (end): 1.6
    )
  ],
  (attributes): {
    '<LABEL_ATTRIBUTE_NAME>': '<LABEL_ATTRIBUTE_VALUE>'
  }
)
Sentence.sentence#
The sentence of a LabeledSentence is a list of Word, representing the transcribed sentence of the audio.
Sentence.spell#
The spell of a LabeledSentence is a list of Word, representing the spelling within the sentence.
It applies only to the Chinese language.
Sentence.phone#
The phone of a LabeledSentence is a list of Word, representing the phone of the sentence label.
Word#
Word is the basic component of a phonetic transcription sentence, containing the content of the word and its start and end times in the audio.
>>> from tensorbay.label import Word
>>> Word("text", 1.1, 1.6)
Word(
  (text): 'text',
  (begin): 1.1,
  (end): 1.6
)
The sentence, spell, and phone of a sentence label are all composed of Word.
Sentence.attributes#
The attributes of the transcribed sentence. See attributes for details.
SentenceSubcatalog#
Before adding sentence labels to the dataset, SentenceSubcatalog should be defined.
Besides attributes information, SentenceSubcatalog also has is_sample, sample_rate and lexicon to describe the transcribed sentences of the audio.
A catalog with only a Sentence subcatalog is typically stored in a JSON file as follows:
{
    "SENTENCE": { <object>*
        "isSample": <boolean>! -- Whether the unit of sampling points in Sentence label is the
                                  number of samples. The default value is false and the units
                                  are seconds.
        "sampleRate": <number> -- Audio sampling frequency whose unit is Hz. It is required
                                  when "isSample" is true.
        "description": <string>! -- Subcatalog description, (default: "").
        "attributes": [ <array> -- Attribute list, which contains all attribute information.
            {
                "name": <string>* -- Attribute name.
                "enum": [...], <array> -- All possible options for the attribute.
                "type": <string or array> -- Type of the attribute including "boolean", "integer",
                                             "number", "string", "array" and "null". It is not
                                             required when "enum" is provided.
                "minimum": <number> -- Minimum value of the attribute when type is "number".
                "maximum": <number> -- Maximum value of the attribute when type is "number".
                "items": { <object> -- Used only if the attribute type is "array".
                    "enum": [...], <array> -- All possible options for elements in the attribute array.
                    "type": <string or array> -- Type of elements in the attribute array.
                    "minimum": <number> -- Minimum value of elements in the attribute array when type is
                                           "number".
                    "maximum": <number> -- Maximum value of elements in the attribute array when type is
                                           "number".
                },
                "description": <string>! -- Attribute description, (default: "").
            },
            ...
            ...
        ],
        "lexicon": [ <array> -- A list of entries, each consisting of a text and its phones.
            [
                text, <string> -- Word.
                phone, <string> -- Corresponding phonemes.
                phone, <string> -- Corresponding phonemes (A word can correspond to more than
                                   one phoneme).
                ...
            ],
            ...
        ]
    }
}
Note
* indicates that the field is required.
! indicates that the field has a default value.
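The relationship between "isSample" and "sampleRate" in the catalog above can be sketched in plain Python. This is a minimal sketch, not TensorBay code: the helper `to_seconds` and the 16000 Hz rate are assumptions chosen for illustration. When "isSample" is true, begin and end values are counted in samples and can be converted to seconds by dividing by the sampling frequency.

```python
def to_seconds(value, is_sample=False, sample_rate=None):
    """Convert a begin/end value to seconds (hypothetical helper)."""
    if not is_sample:
        # By default the catalog says the units are already seconds.
        return float(value)
    if not sample_rate:
        raise ValueError('"sampleRate" is required when "isSample" is true')
    return value / sample_rate


# Illustrative catalog fragment; the 16000 Hz rate is an assumption.
catalog = {"SENTENCE": {"isSample": True, "sampleRate": 16000}}
subcatalog = catalog["SENTENCE"]

is_sample = subcatalog.get("isSample", False)  # "isSample" defaults to false
sample_rate = subcatalog.get("sampleRate")

begin_seconds = to_seconds(17600, is_sample, sample_rate)
end_seconds = to_seconds(25600, is_sample, sample_rate)
print(begin_seconds, end_seconds)
```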
Besides giving the parameters while initializing SentenceSubcatalog, it’s also feasible to set them after initialization.
>>> from tensorbay.label import SentenceSubcatalog
>>> sentence_subcatalog = SentenceSubcatalog()
>>> sentence_subcatalog.is_sample = True
>>> sentence_subcatalog.sample_rate = 5
>>> sentence_subcatalog.append_lexicon(["text", "spell", "phone"])
>>> sentence_subcatalog
SentenceSubcatalog(
  (is_sample): True,
  (sample_rate): 5,
  (lexicon): [...]
)
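Each lexicon entry is a list whose first element is the text and whose remaining elements are the corresponding phonemes. As a minimal sketch of how such a list might be consumed, the following plain-Python snippet turns a lexicon into a lookup table. The helper `build_lexicon_index` and the placeholder entries are hypothetical and are not part of TensorBay.

```python
from collections import defaultdict


def build_lexicon_index(lexicon):
    """Map each text to its list of phonemes (hypothetical helper)."""
    index = defaultdict(list)
    for entry in lexicon:
        if not entry:
            continue  # skip empty entries instead of failing
        text, *phones = entry
        index[text].extend(phones)
    return dict(index)


# Placeholder entries in the [text, phone, phone, ...] layout.
lexicon = [
    ["word_a", "phone_1", "phone_2"],
    ["word_b", "phone_3"],
]

lexicon_index = build_lexicon_index(lexicon)
print(lexicon_index["word_a"])
```

Merging repeated texts into one list is a design choice of this sketch; whether a real lexicon should keep alternative pronunciations separate is not stated in the source.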
To add a LabeledSentence label to one data:
>>> from tensorbay.dataset import Data
>>> data = Data("<DATA_LOCAL_PATH>")
>>> data.label.sentence = []
>>> data.label.sentence.append(sentence_label)
Note
One data may contain multiple Sentence labels, so the Data.label.sentence must be a list.