tensorbay.label.label_sentence
The implementation of the TensorBay sentence label.
- class tensorbay.label.label_sentence.SentenceSubcatalog(is_sample=False, sample_rate=None, lexicon=None)
Bases: tensorbay.label.basic.SubcatalogBase, tensorbay.label.supports.AttributesMixin
This class defines the subcatalog for labels of the audio transcribed sentence type.
- Parameters
is_sample (bool) – A boolean value indicating whether the time format is sample-related.
sample_rate (int) – The number of audio samples carried per second.
lexicon (List[List[str]]) – A list containing all the text and phone entries.
- Return type
None
- description
The description of the entire sentence subcatalog.
- Type
str
- is_sample
A boolean value indicating whether the time format is sample-related.
- Type
bool
- sample_rate
The number of audio samples carried per second.
- Type
int
- lexicon
A list containing all the text and phone entries.
- Type
List[List[str]]
- attributes
All the possible attributes in the corresponding dataset, stored in a NameList with the attribute names as keys and the AttributeInfo as values.
- Raises
TypeError – When sample_rate is None and is_sample is True.
Examples
Initialization Method 1: Init from SentenceSubcatalog.__init__().
>>> SentenceSubcatalog(True, 16000, [["mean", "m", "iy", "n"]])
SentenceSubcatalog(
  (is_sample): True,
  (sample_rate): 16000,
  (lexicon): [...]
)
Initialization Method 2: Init from SentenceSubcatalog.loads() method.
>>> contents = {
...     "isSample": True,
...     "sampleRate": 16000,
...     "lexicon": [["mean", "m", "iy", "n"]],
...     "attributes": [{"name": "gender", "enum": ["male", "female"]}],
... }
>>> SentenceSubcatalog.loads(contents)
SentenceSubcatalog(
  (is_sample): True,
  (sample_rate): 16000,
  (attributes): NameList [...],
  (lexicon): [...]
)
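The relationship between is_sample and sample_rate can be sketched in plain Python. This is a minimal illustration and not TensorBay code: the helper name to_seconds is hypothetical, and it assumes that when is_sample is True the word times are sample offsets that become seconds when divided by sample_rate. It also mirrors the documented rule that a missing sample_rate is a TypeError when is_sample is True.

```python
from typing import Optional


def to_seconds(value: float, is_sample: bool, sample_rate: Optional[int]) -> float:
    """Hypothetical helper: convert a word time to seconds.

    Assumes sample-related times are sample offsets (not confirmed by the source).
    """
    if not is_sample:
        # The value is already expressed in seconds.
        return float(value)
    if sample_rate is None:
        # Mirrors the documented TypeError condition of SentenceSubcatalog.
        raise TypeError("sample_rate is required when is_sample is True")
    return value / sample_rate


print(to_seconds(8000, True, 16000))
print(to_seconds(1.5, False, None))
```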
- dumps()
Dumps the information of this SentenceSubcatalog into a dict.
- Returns
A dict containing all information of this SentenceSubcatalog.
- Return type
Dict[str, Any]
Examples
>>> sentence_subcatalog = SentenceSubcatalog(True, 16000, [["mean", "m", "iy", "n"]])
>>> sentence_subcatalog.dumps()
{'isSample': True, 'sampleRate': 16000, 'lexicon': [['mean', 'm', 'iy', 'n']]}
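Because the dumped dict in the example holds only booleans, integers, strings and lists, it can be stored as JSON. The sketch below uses a literal copy of that dict rather than calling TensorBay, and assumes a real dumps() result contains only such plain values.

```python
import json

# Literal copy of the dict shown in the dumps() example.
dumped = {
    "isSample": True,
    "sampleRate": 16000,
    "lexicon": [["mean", "m", "iy", "n"]],
}

# Serialize to a JSON string and read it back.
encoded = json.dumps(dumped)
restored = json.loads(encoded)

print(restored == dumped)
```

A dict restored this way has the same shape as the contents argument accepted by SentenceSubcatalog.loads().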
- append_lexicon(lexemes)
Add lexemes to lexicon.
- Parameters
lexemes (List[str]) – A list consisting of text and phones.
- Return type
None
Examples
>>> sentence_subcatalog = SentenceSubcatalog(True, 16000, [["mean", "m", "iy", "n"]])
>>> sentence_subcatalog.append_lexicon(["example"])
>>> sentence_subcatalog.lexicon
[['mean', 'm', 'iy', 'n'], ['example']]
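One way to work with the lexicon structure is to turn it into a lookup table. The sketch below operates on a plain list and does not use TensorBay. It assumes, based only on the ["mean", "m", "iy", "n"] example, that the first item of each entry is the text and the remaining items are its phones.

```python
from typing import Dict, List

# Plain-list stand-in for SentenceSubcatalog.lexicon after append_lexicon(["example"]).
lexicon: List[List[str]] = [["mean", "m", "iy", "n"]]
lexicon.append(["example"])

# Assumption: entry[0] is the text, entry[1:] are the phones.
pronunciations: Dict[str, List[str]] = {entry[0]: entry[1:] for entry in lexicon}

print(pronunciations["mean"])
print(pronunciations["example"])
```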
- class tensorbay.label.label_sentence.Word(text, begin=None, end=None)
Bases: tensorbay.utility.repr.ReprMixin, tensorbay.utility.attr.AttrsMixin
This class defines the concept of word.
Word is a word within a phonetic transcription sentence, containing the content of the word and its begin and end times in the audio.
- Parameters
text (str) – The content of the word.
begin (float) – The begin time of the word in the audio.
end (float) – The end time of the word in the audio.
- text
The content of the word.
- Type
str
- begin
The begin time of the word in the audio.
- Type
float
- end
The end time of the word in the audio.
- Type
float
Examples
>>> Word(text="example", begin=1, end=2)
Word(
  (text): 'example',
  (begin): 1,
  (end): 2
)
- classmethod loads(contents)
Loads a Word from a dict containing the information of the word.
- Parameters
contents (Dict[str, Union[str, float]]) – A dict containing the information of the word.
- Returns
The loaded Word object.
- Return type
tensorbay.label.label_sentence._T
Examples
>>> contents = {"text": "Hello, World", "begin": 1, "end": 2}
>>> Word.loads(contents)
Word(
  (text): 'Hello, World',
  (begin): 1,
  (end): 2
)
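The dict shape accepted by loads() can be illustrated with a small stand-in class. WordSketch below is hypothetical and is not part of TensorBay. It only mirrors the documented fields, treats begin and end as optional to match the None defaults in the signature, and omits unset times when dumping, which is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class WordSketch:
    """Hypothetical stand-in for Word; not TensorBay code."""

    text: str
    begin: Optional[float] = None
    end: Optional[float] = None

    @classmethod
    def loads(cls, contents: Dict[str, Any]) -> "WordSketch":
        # "text" is required; the timing keys may be absent.
        return cls(contents["text"], contents.get("begin"), contents.get("end"))

    def dumps(self) -> Dict[str, Any]:
        # Assumption: timing keys that were never set are left out.
        result: Dict[str, Any] = {"text": self.text}
        if self.begin is not None:
            result["begin"] = self.begin
        if self.end is not None:
            result["end"] = self.end
        return result


word = WordSketch.loads({"text": "Hello, World", "begin": 1, "end": 2})
print(word.dumps())
```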
- class tensorbay.label.label_sentence.LabeledSentence(sentence=None, spell=None, phone=None, *, attributes=None)
Bases: tensorbay.label.basic._LabelBase
This class defines the concept of phonetic transcription label.
LabeledSentence is the transcribed sentence type of label, which is often used for tasks such as automatic speech recognition.
- Parameters
sentence (List[tensorbay.label.label_sentence.Word]) – A list of the words in the transcribed sentence.
spell (List[tensorbay.label.label_sentence.Word]) – A list of the spellings, which exists only for the Chinese language.
phone (List[tensorbay.label.label_sentence.Word]) – A list of the phones.
attributes (Dict[str, Union[str, int, float, bool, List[Union[str, int, float, bool]]]]) – The attributes of the label.
- sentence
The transcribed sentence.
- Type
List[tensorbay.label.label_sentence.Word]
- spell
The spell within the sentence, which exists only for the Chinese language.
- Type
List[tensorbay.label.label_sentence.Word]
- phone
The phone of the sentence label.
- Type
List[tensorbay.label.label_sentence.Word]
- attributes
The attributes of the label.
- Type
Dict[str, Union[str, int, float, bool, List[Union[str, int, float, bool]]]]
Examples
>>> sentence = [Word(text="qi1shi2", begin=1, end=2)]
>>> spell = [Word(text="qi1", begin=1, end=2)]
>>> phone = [Word(text="q", begin=1, end=2)]
>>> LabeledSentence(
...     sentence,
...     spell,
...     phone,
...     attributes={"key": "value"},
... )
LabeledSentence(
  (sentence): [
    Word(
      (text): 'qi1shi2',
      (begin): 1,
      (end): 2
    )
  ],
  (spell): [
    Word(
      (text): 'qi1',
      (begin): 1,
      (end): 2
    )
  ],
  (phone): [
    Word(
      (text): 'q',
      (begin): 1,
      (end): 2
    )
  ],
  (attributes): {
    'key': 'value'
  }
)
- classmethod loads(contents)
Loads a LabeledSentence from a dict containing the information of the label.
- Parameters
contents (Dict[str, Any]) – A dict containing the information of the sentence label.
- Returns
The loaded LabeledSentence object.
- Return type
tensorbay.label.label_sentence._T
Examples
>>> contents = {
...     "sentence": [{"text": "qi1shi2", "begin": 1, "end": 2}],
...     "spell": [{"text": "qi1", "begin": 1, "end": 2}],
...     "phone": [{"text": "q", "begin": 1, "end": 2}],
...     "attributes": {"key": "value"},
... }
>>> LabeledSentence.loads(contents)
LabeledSentence(
  (sentence): [
    Word(
      (text): 'qi1shi2',
      (begin): 1,
      (end): 2
    )
  ],
  (spell): [
    Word(
      (text): 'qi1',
      (begin): 1,
      (end): 2
    )
  ],
  (phone): [
    Word(
      (text): 'q',
      (begin): 1,
      (end): 2
    )
  ],
  (attributes): {
    'key': 'value'
  }
)
- dumps()
Dumps the current label into a dict.
- Returns
A dict containing all the information of the sentence label.
- Return type
Dict[str, Any]
Examples
>>> sentence = [Word(text="qi1shi2", begin=1, end=2)]
>>> spell = [Word(text="qi1", begin=1, end=2)]
>>> phone = [Word(text="q", begin=1, end=2)]
>>> labeledsentence = LabeledSentence(
...     sentence,
...     spell,
...     phone,
...     attributes={"key": "value"},
... )
>>> labeledsentence.dumps()
{
    'attributes': {'key': 'value'},
    'sentence': [{'text': 'qi1shi2', 'begin': 1, 'end': 2}],
    'spell': [{'text': 'qi1', 'begin': 1, 'end': 2}],
    'phone': [{'text': 'q', 'begin': 1, 'end': 2}]
}
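A dumped label is a plain dict, so it can be inspected without TensorBay. The sketch below works on a literal copy of the dict from the example. The helper check_timing is hypothetical and applies a simple sanity rule (begin must not exceed end) that the source does not state the library enforces.

```python
from typing import Any, Dict, List

# Literal copy of the dict shown in the dumps() example.
contents: Dict[str, Any] = {
    "attributes": {"key": "value"},
    "sentence": [{"text": "qi1shi2", "begin": 1, "end": 2}],
    "spell": [{"text": "qi1", "begin": 1, "end": 2}],
    "phone": [{"text": "q", "begin": 1, "end": 2}],
}


def check_timing(words: List[Dict[str, Any]]) -> bool:
    """Hypothetical check: every word with both times has begin <= end."""
    return all(
        word["begin"] <= word["end"]
        for word in words
        if "begin" in word and "end" in word
    )


# Walk the three word lists and report their texts and timing validity.
for key in ("sentence", "spell", "phone"):
    words = contents.get(key, [])
    print(key, [word["text"] for word in words], check_timing(words))
```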