tensorbay.label.label_sentence
Word, LabeledSentence, SentenceSubcatalog.
SentenceSubcatalog defines the subcatalog for the transcribed sentence type of audio labels.
Word is a word within a phonetic transcription sentence,
containing the content of the word, the start and end time in the audio.
LabeledSentence is the transcribed sentence type of label,
which is often used for tasks such as automatic speech recognition.
- class tensorbay.label.label_sentence.SentenceSubcatalog(is_sample=False, sample_rate=None, lexicon=None)
Bases: tensorbay.label.basic.SubcatalogBase, tensorbay.label.supports.AttributesMixin
This class defines the subcatalog for the transcribed sentence type of audio labels.
- Parameters
is_sample (bool) – A boolean value indicating whether the time format is sample-related.
sample_rate (int) – The number of audio samples carried per second.
lexicon (List[List[str]]) – A list of all lexemes, each consisting of text and phones.
- Return type
None
- description
The description of the entire sentence subcatalog.
- Type
str
- is_sample
A boolean value indicating whether the time format is sample-related.
- Type
bool
- sample_rate
The number of audio samples carried per second.
- Type
int
- lexicon
A list of all lexemes, each consisting of text and phones.
- Type
List[List[str]]
- attributes
All the possible attributes in the corresponding dataset, stored in a
NameList with the attribute names as keys and the AttributeInfo as values.
- Raises
TypeError – When sample_rate is None and is_sample is True.
Examples
Initialization Method 1: Init from SentenceSubcatalog.__init__().
>>> SentenceSubcatalog(True, 16000, [["mean", "m", "iy", "n"]])
SentenceSubcatalog(
  (is_sample): True,
  (sample_rate): 16000,
  (lexicon): [...]
)
Initialization Method 2: Init from the SentenceSubcatalog.loads() method.
>>> contents = {
...     "isSample": True,
...     "sampleRate": 16000,
...     "lexicon": [["mean", "m", "iy", "n"]],
...     "attributes": [{"name": "gender", "enum": ["male", "female"]}],
... }
>>> SentenceSubcatalog.loads(contents)
SentenceSubcatalog(
  (is_sample): True,
  (sample_rate): 16000,
  (attributes): NameList [...],
  (lexicon): [...]
)
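The documented constraint that sample_rate must be set when is_sample is True can be sketched without TensorBay itself. The helper below is hypothetical and not part of tensorbay; it assumes that, when is_sample is True, word times are sample offsets that convert to seconds by dividing by sample_rate.

```python
from typing import Optional


def to_seconds(value: float, is_sample: bool, sample_rate: Optional[int]) -> float:
    """Convert a word boundary to seconds (hypothetical helper, not a tensorbay API).

    Assumption: when ``is_sample`` is True, ``value`` is a sample offset.
    """
    if not is_sample:
        # Assumed to be in seconds already.
        return float(value)
    if sample_rate is None:
        # Mirrors the documented constraint: sample-based times need a sample rate.
        raise TypeError("sample_rate is required when is_sample is True")
    return value / sample_rate


print(to_seconds(8000, is_sample=True, sample_rate=16000))  # 0.5
print(to_seconds(1.25, is_sample=False, sample_rate=None))  # 1.25
```

Raising TypeError here only echoes the exception type named in the Raises entry; the real class may perform its check differently.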
- dumps()
Dumps the information of this SentenceSubcatalog into a dict.
- Returns
A dict containing all information of this SentenceSubcatalog.
- Return type
Dict[str, Any]
Examples
>>> sentence_subcatalog = SentenceSubcatalog(True, 16000, [["mean", "m", "iy", "n"]])
>>> sentence_subcatalog.dumps()
{'isSample': True, 'sampleRate': 16000, 'lexicon': [['mean', 'm', 'iy', 'n']]}
- append_lexicon(lexemes)
Add lexemes to lexicon.
- Parameters
lexemes (List[str]) – A list consisting of text and phones.
- Return type
None
Examples
>>> sentence_subcatalog = SentenceSubcatalog(True, 16000, [["mean", "m", "iy", "n"]])
>>> sentence_subcatalog.append_lexicon(["example"])
>>> sentence_subcatalog.lexicon
[['mean', 'm', 'iy', 'n'], ['example']]
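To illustrate how a lexicon of this shape might be consumed, here is a minimal sketch that turns it into a lookup table. The function name is hypothetical, and the layout (first item is the text, remaining items are its phones) is an assumption inferred from the `["mean", "m", "iy", "n"]` example rather than something the reference states.

```python
from typing import Dict, List


def lexicon_to_dict(lexicon: List[List[str]]) -> Dict[str, List[str]]:
    """Map each text to its phones (hypothetical helper, not a tensorbay API).

    Assumption: the first item of every lexeme is the text, the rest are phones.
    """
    # Empty lexemes are skipped because they carry no text to use as a key.
    return {lexeme[0]: lexeme[1:] for lexeme in lexicon if lexeme}


lexicon = [["mean", "m", "iy", "n"], ["example"]]
pronunciations = lexicon_to_dict(lexicon)
print(pronunciations["mean"])     # ['m', 'iy', 'n']
print(pronunciations["example"])  # []
```

If the same text appears in several lexemes, this sketch keeps only the last one; a real pronunciation dictionary would probably need a list of variants per text.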
- class tensorbay.label.label_sentence.Word(text, begin=None, end=None)
Bases: tensorbay.utility.repr.ReprMixin, tensorbay.utility.attr.AttrsMixin
This class defines the concept of word.
Word is a word within a phonetic transcription sentence, containing the content of the word, the start and end time in the audio.
- Parameters
text (str) – The content of the word.
begin (float) – The begin time of the word in the audio.
end (float) – The end time of the word in the audio.
- text
The content of the word.
- Type
str
- begin
The begin time of the word in the audio.
- Type
float
- end
The end time of the word in the audio.
- Type
float
Examples
>>> Word(text="example", begin=1, end=2)
Word(
  (text): 'example',
  (begin): 1,
  (end): 2
)
- classmethod loads(contents)
Loads a Word from a dict containing the information of the word.
- Parameters
contents (Dict[str, Union[str, float]]) – A dict containing the information of the word.
- Returns
The loaded Word object.
- Return type
tensorbay.label.label_sentence._T
Examples
>>> contents = {"text": "Hello, World", "begin": 1, "end": 2}
>>> Word.loads(contents)
Word(
  (text): 'Hello, World',
  (begin): 1,
  (end): 2
)
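As a rough illustration of where such dicts could come from, the sketch below converts (text, begin, end) tuples, for example from a forced aligner, into dicts with the same keys as the Word.loads() input shown above. The function name and the ordering check are assumptions made for this example; nothing here states that tensorbay enforces begin <= end.

```python
from typing import Dict, List, Tuple, Union

WordContents = Dict[str, Union[str, float]]


def alignment_to_contents(
    alignment: List[Tuple[str, float, float]],
) -> List[WordContents]:
    """Build Word.loads()-style dicts from (text, begin, end) tuples.

    Hypothetical helper, not a tensorbay API.
    """
    contents: List[WordContents] = []
    for text, begin, end in alignment:
        if end < begin:
            # Assumed sanity check; not documented library behavior.
            raise ValueError(f"end before begin for {text!r}")
        contents.append({"text": text, "begin": begin, "end": end})
    return contents


word_contents = alignment_to_contents([("Hello", 0.0, 0.4), ("World", 0.5, 1.0)])
print(word_contents[0])  # {'text': 'Hello', 'begin': 0.0, 'end': 0.4}
```

With tensorbay installed, each resulting dict could then be passed to Word.loads().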
- class tensorbay.label.label_sentence.LabeledSentence(sentence=None, spell=None, phone=None, *, attributes=None)
Bases: tensorbay.label.basic._LabelBase
This class defines the concept of phonetic transcription label.
LabeledSentence is the transcribed sentence type of label, which is often used for tasks such as automatic speech recognition.
- Parameters
sentence (List[tensorbay.label.label_sentence.Word]) – A list of the words in the sentence.
spell (List[tensorbay.label.label_sentence.Word]) – A list of the spellings, which exists only for Chinese.
phone (List[tensorbay.label.label_sentence.Word]) – A list of the phones.
attributes (Dict[str, Union[str, int, float, bool, List[Union[str, int, float, bool]]]]) – The attributes of the label.
- sentence
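The transcribed sentence.
- Type
List[tensorbay.label.label_sentence.Word]
- spell
The spelling within the sentence, which exists only for Chinese.
- Type
List[tensorbay.label.label_sentence.Word]
- phone
The phones of the sentence label.
- Type
List[tensorbay.label.label_sentence.Word]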
- attributes
The attributes of the label.
- Type
Dict[str, Union[str, int, float, bool, List[Union[str, int, float, bool]]]]
Examples
>>> sentence = [Word(text="qi1shi2", begin=1, end=2)]
>>> spell = [Word(text="qi1", begin=1, end=2)]
>>> phone = [Word(text="q", begin=1, end=2)]
>>> LabeledSentence(
...     sentence,
...     spell,
...     phone,
...     attributes={"key": "value"},
... )
LabeledSentence(
  (sentence): [
    Word(
      (text): 'qi1shi2',
      (begin): 1,
      (end): 2
    )
  ],
  (spell): [
    Word(
      (text): 'qi1',
      (begin): 1,
      (end): 2
    )
  ],
  (phone): [
    Word(
      (text): 'q',
      (begin): 1,
      (end): 2
    )
  ],
  (attributes): {
    'key': 'value'
  }
)
- classmethod loads(contents)
Loads a LabeledSentence from a dict containing the information of the label.
- Parameters
contents (Dict[str, Any]) – A dict containing the information of the sentence label.
- Returns
The loaded LabeledSentence object.
- Return type
tensorbay.label.label_sentence._T
Examples
>>> contents = {
...     "sentence": [{"text": "qi1shi2", "begin": 1, "end": 2}],
...     "spell": [{"text": "qi1", "begin": 1, "end": 2}],
...     "phone": [{"text": "q", "begin": 1, "end": 2}],
...     "attributes": {"key": "value"},
... }
>>> LabeledSentence.loads(contents)
LabeledSentence(
  (sentence): [
    Word(
      (text): 'qi1shi2',
      (begin): 1,
      (end): 2
    )
  ],
  (spell): [
    Word(
      (text): 'qi1',
      (begin): 1,
      (end): 2
    )
  ],
  (phone): [
    Word(
      (text): 'q',
      (begin): 1,
      (end): 2
    )
  ],
  (attributes): {
    'key': 'value'
  }
)
- dumps()
Dumps the current label into a dict.
- Returns
A dict containing all the information of the sentence label.
- Return type
Dict[str, Any]
Examples
>>> sentence = [Word(text="qi1shi2", begin=1, end=2)]
>>> spell = [Word(text="qi1", begin=1, end=2)]
>>> phone = [Word(text="q", begin=1, end=2)]
>>> labeledsentence = LabeledSentence(
...     sentence,
...     spell,
...     phone,
...     attributes={"key": "value"},
... )
>>> labeledsentence.dumps()
{
    'attributes': {'key': 'value'},
    'sentence': [{'text': 'qi1shi2', 'begin': 1, 'end': 2}],
    'spell': [{'text': 'qi1', 'begin': 1, 'end': 2}],
    'phone': [{'text': 'q', 'begin': 1, 'end': 2}]
}
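Because dumps() returns a plain dict of strings, numbers, lists and dicts, its output should survive a JSON round trip. The sketch below uses only the standard library and a dict copied from the example above, so it runs without tensorbay; the variable names are illustrative.

```python
import json

# Dict copied from the dumps() example above.
dumped = {
    "attributes": {"key": "value"},
    "sentence": [{"text": "qi1shi2", "begin": 1, "end": 2}],
    "spell": [{"text": "qi1", "begin": 1, "end": 2}],
    "phone": [{"text": "q", "begin": 1, "end": 2}],
}

# Serialize to a JSON string, then parse it back into Python objects.
serialized = json.dumps(dumped, sort_keys=True)
restored = json.loads(serialized)

print(restored == dumped)  # True
```

With tensorbay installed, `restored` has the same shape as the contents argument shown for LabeledSentence.loads(), so it could presumably be passed there to rebuild the label.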