tensorbay.label.label_sentence
Word, LabeledSentence, SentenceSubcatalog.
SentenceSubcatalog
defines the subcatalog for the audio transcribed sentence type of labels.
Word
is a word within a phonetic transcription sentence,
containing the content of the word, the start and end time in the audio.
LabeledSentence
is the transcribed sentence type of label,
which is often used for tasks such as automatic speech recognition.
- class tensorbay.label.label_sentence.SentenceSubcatalog(is_sample=False, sample_rate=None, lexicon=None)[source]
Bases:
tensorbay.label.basic.SubcatalogBase
,tensorbay.label.supports.AttributesMixin
This class defines the subcatalog for the audio transcribed sentence type of labels.
- Parameters
is_sample (bool) – A boolean value indicating whether the time format is sample-based.
sample_rate (int) – The number of audio samples carried per second.
lexicon (List[List[str]]) – A list containing all the text and phone entries.
- Return type
None
- description
The description of the entire sentence subcatalog.
- Type
str
- is_sample
A boolean value indicating whether the time format is sample-based.
- Type
bool
- sample_rate
The number of samples of audio carried per second.
- Type
int
- lexicon
A list containing all the text and phone entries.
- Type
List[List[str]]
- attributes
All the possible attributes in the corresponding dataset stored in a
NameList
with the attribute names as keys and the AttributeInfo
as values.
- Raises
TypeError – When sample_rate is None and is_sample is True.
Examples
Initialization Method 1: Init from the SentenceSubcatalog.__init__() method.
>>> SentenceSubcatalog(True, 16000, [["mean", "m", "iy", "n"]])
SentenceSubcatalog(
  (is_sample): True,
  (sample_rate): 16000,
  (lexicon): [...]
)
Initialization Method 2: Init from the SentenceSubcatalog.loads() method.
>>> contents = {
...     "isSample": True,
...     "sampleRate": 16000,
...     "lexicon": [["mean", "m", "iy", "n"]],
...     "attributes": [{"name": "gender", "enum": ["male", "female"]}],
... }
>>> SentenceSubcatalog.loads(contents)
SentenceSubcatalog(
  (is_sample): True,
  (sample_rate): 16000,
  (attributes): NameList [...],
  (lexicon): [...]
)
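The is_sample and sample_rate fields together determine how word timestamps should be read. The following is a minimal sketch of that relationship, not a tensorbay API: to_seconds is a hypothetical helper, and it assumes that when is_sample is True the begin and end values are sample indices that convert to seconds by dividing by sample_rate. It also mirrors the documented TypeError for a missing sample rate.

```python
from typing import Optional


def to_seconds(value: float, is_sample: bool, sample_rate: Optional[int]) -> float:
    """Convert a timestamp to seconds (hypothetical helper, not a tensorbay API)."""
    if not is_sample:
        # Assumption: non-sample timestamps are already expressed in seconds.
        return float(value)
    if sample_rate is None:
        # Mirrors the documented rule: sample-based time needs a sample rate.
        raise TypeError("sample_rate is required when is_sample is True")
    # Assumption: the value is a sample index, so divide by samples per second.
    return value / sample_rate


print(to_seconds(24000, is_sample=True, sample_rate=16000))  # 1.5
```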
- dumps()[source]
Dumps the information of this SentenceSubcatalog into a dict.
- Returns
A dict containing all information of this SentenceSubcatalog.
- Return type
Dict[str, Any]
Examples
>>> sentence_subcatalog = SentenceSubcatalog(True, 16000, [["mean", "m", "iy", "n"]])
>>> sentence_subcatalog.dumps()
{'isSample': True, 'sampleRate': 16000, 'lexicon': [['mean', 'm', 'iy', 'n']]}
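Because the dumped result is a plain dict, it can presumably be stored as JSON. The sketch below uses only the standard json module on a hand-copied version of the dict from the example above; it assumes the dict holds only JSON-serializable values, as it does in that example, and it does not call tensorbay itself.

```python
import json

# The dict shown in the dumps() example above, copied by hand.
dumped = {
    "isSample": True,
    "sampleRate": 16000,
    "lexicon": [["mean", "m", "iy", "n"]],
}

# Serialize to a JSON string, then parse it back into Python objects.
text = json.dumps(dumped, indent=2)
restored = json.loads(text)

print(restored == dumped)  # True
```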
- append_lexicon(lexemes)[source]
Add lexemes to lexicon.
- Parameters
lexemes (List[str]) – A list consisting of text and phone.
- Return type
None
Examples
>>> sentence_subcatalog = SentenceSubcatalog(True, 16000, [["mean", "m", "iy", "n"]])
>>> sentence_subcatalog.append_lexicon(["example"])
>>> sentence_subcatalog.lexicon
[['mean', 'm', 'iy', 'n'], ['example']]
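The lexicon is a list of lists of strings, which is awkward to search by word. As a rough sketch, the hypothetical helper below (not part of tensorbay) turns such a list into a lookup table. It assumes that the first item of each entry is the text and the remaining items are its phones, which matches the ["mean", "m", "iy", "n"] example but is not stated explicitly in this reference.

```python
from typing import Dict, List


def build_pronunciations(lexicon: List[List[str]]) -> Dict[str, List[str]]:
    """Map each text to its phones (hypothetical helper, not a tensorbay API)."""
    pronunciations: Dict[str, List[str]] = {}
    for entry in lexicon:
        if not entry:
            continue  # Skip empty entries rather than failing.
        # Assumption: first item is the text, the rest are phones.
        text, *phones = entry
        pronunciations[text] = phones
    return pronunciations


lexicon = [["mean", "m", "iy", "n"], ["example"]]
print(build_pronunciations(lexicon))
# {'mean': ['m', 'iy', 'n'], 'example': []}
```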
- class tensorbay.label.label_sentence.Word(text, begin=None, end=None)[source]
Bases:
tensorbay.utility.repr.ReprMixin
,tensorbay.utility.attr.AttrsMixin
This class defines the concept of word.
Word
is a word within a phonetic transcription sentence, containing the content of the word and its start and end times in the audio.
- Parameters
text (str) – The content of the word.
begin (float) – The begin time of the word in the audio.
end (float) – The end time of the word in the audio.
- text
The content of the word.
- Type
str
- begin
The begin time of the word in the audio.
- Type
float
- end
The end time of the word in the audio.
- Type
float
Examples
>>> Word(text="example", begin=1, end=2)
Word(
  (text): 'example',
  (begin): 1,
  (end): 2
)
- classmethod loads(contents)[source]
Loads a Word from a dict containing the information of the word.
- Parameters
contents (Dict[str, Union[str, float]]) – A dict containing the information of the word.
- Returns
The loaded Word object.
- Return type
tensorbay.label.label_sentence._T
Examples
>>> contents = {"text": "Hello, World", "begin": 1, "end": 2}
>>> Word.loads(contents)
Word(
  (text): 'Hello, World',
  (begin): 1,
  (end): 2
)
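When word dicts come from an external source, it can help to sanity-check the timing fields before loading them. The sketch below is a hypothetical helper, not a tensorbay API. It assumes the "begin" and "end" keys shown in the example above, treats both as optional because the Word constructor defaults them to None, and assumes an end earlier than its begin is an error.

```python
from typing import Dict, Optional, Union


def word_duration(contents: Dict[str, Union[str, float]]) -> Optional[float]:
    """Return end - begin, or None if either is missing (hypothetical helper)."""
    begin = contents.get("begin")
    end = contents.get("end")
    if begin is None or end is None:
        return None  # Timing is optional, so there is nothing to measure.
    if end < begin:
        # Assumption: a word cannot end before it begins.
        raise ValueError("end must not be earlier than begin")
    return end - begin


print(word_duration({"text": "Hello, World", "begin": 1, "end": 2}))  # 1
```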
- class tensorbay.label.label_sentence.LabeledSentence(sentence=None, spell=None, phone=None, *, attributes=None)[source]
Bases:
tensorbay.label.basic._LabelBase
This class defines the concept of phonetic transcription label.
LabeledSentence
is the transcribed sentence type of label, which is often used for tasks such as automatic speech recognition.
- Parameters
sentence (List[tensorbay.label.label_sentence.Word]) – The transcribed sentence, as a list of words.
spell (List[tensorbay.label.label_sentence.Word]) – The spelling within the sentence, as a list of words; only exists for the Chinese language.
phone (List[tensorbay.label.label_sentence.Word]) – The phones of the sentence, as a list of words.
attributes (Dict[str, Union[str, int, float, bool, List[Union[str, int, float, bool]]]]) – The attributes of the label.
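- sentence
The transcribed sentence.
- Type
List[tensorbay.label.label_sentence.Word]
- spell
The spelling within the sentence, which only exists for the Chinese language.
- Type
List[tensorbay.label.label_sentence.Word]
- phone
The phone of the sentence label.
- Type
List[tensorbay.label.label_sentence.Word]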
- attributes
The attributes of the label.
- Type
Dict[str, Union[str, int, float, bool, List[Union[str, int, float, bool]]]]
Examples
>>> sentence = [Word(text="qi1shi2", begin=1, end=2)]
>>> spell = [Word(text="qi1", begin=1, end=2)]
>>> phone = [Word(text="q", begin=1, end=2)]
>>> LabeledSentence(
...     sentence,
...     spell,
...     phone,
...     attributes={"key": "value"},
... )
LabeledSentence(
  (sentence): [
    Word(
      (text): 'qi1shi2',
      (begin): 1,
      (end): 2
    )
  ],
  (spell): [
    Word(
      (text): 'qi1',
      (begin): 1,
      (end): 2
    )
  ],
  (phone): [
    Word(
      (text): 'q',
      (begin): 1,
      (end): 2
    )
  ],
  (attributes): {
    'key': 'value'
  }
)
- classmethod loads(contents)[source]
Loads a LabeledSentence from a dict containing the information of the label.
- Parameters
contents (Dict[str, Any]) – A dict containing the information of the sentence label.
- Returns
The loaded LabeledSentence object.
- Return type
tensorbay.label.label_sentence._T
Examples
>>> contents = {
...     "sentence": [{"text": "qi1shi2", "begin": 1, "end": 2}],
...     "spell": [{"text": "qi1", "begin": 1, "end": 2}],
...     "phone": [{"text": "q", "begin": 1, "end": 2}],
...     "attributes": {"key": "value"},
... }
>>> LabeledSentence.loads(contents)
LabeledSentence(
  (sentence): [
    Word(
      (text): 'qi1shi2',
      (begin): 1,
      (end): 2
    )
  ],
  (spell): [
    Word(
      (text): 'qi1',
      (begin): 1,
      (end): 2
    )
  ],
  (phone): [
    Word(
      (text): 'q',
      (begin): 1,
      (end): 2
    )
  ],
  (attributes): {
    'key': 'value'
  }
)
- dumps()[source]
Dumps the current label into a dict.
- Returns
A dict containing all the information of the sentence label.
- Return type
Dict[str, Any]
Examples
>>> sentence = [Word(text="qi1shi2", begin=1, end=2)]
>>> spell = [Word(text="qi1", begin=1, end=2)]
>>> phone = [Word(text="q", begin=1, end=2)]
>>> labeledsentence = LabeledSentence(
...     sentence,
...     spell,
...     phone,
...     attributes={"key": "value"},
... )
>>> labeledsentence.dumps()
{
    'attributes': {'key': 'value'},
    'sentence': [{'text': 'qi1shi2', 'begin': 1, 'end': 2}],
    'spell': [{'text': 'qi1', 'begin': 1, 'end': 2}],
    'phone': [{'text': 'q', 'begin': 1, 'end': 2}]
}
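For speech recognition work, the dumped dict often needs to be reduced to a plain transcript string. The sketch below shows one way to do that with a hypothetical helper that is not part of tensorbay. It assumes the dict layout shown in the dumps() example above, where each of "sentence", "spell", and "phone" is a list of dicts with a "text" key, and it joins the texts with a single space as an arbitrary choice.

```python
from typing import Any, Dict


def transcript(label: Dict[str, Any], field: str = "sentence") -> str:
    """Join the word texts of one field (hypothetical helper, not a tensorbay API)."""
    # A missing field is treated as an empty word list.
    words = label.get(field, [])
    return " ".join(word["text"] for word in words)


# The dict shown in the dumps() example above, copied by hand.
dumped = {
    "attributes": {"key": "value"},
    "sentence": [{"text": "qi1shi2", "begin": 1, "end": 2}],
    "spell": [{"text": "qi1", "begin": 1, "end": 2}],
    "phone": [{"text": "q", "begin": 1, "end": 2}],
}

print(transcript(dumped))           # qi1shi2
print(transcript(dumped, "phone"))  # q
```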