Date of Original Version

8-2014

Type

Article

Journal Title

Proceedings of INTERSPEECH

First Page

2862

Last Page

2866

Rights Management

Copyright © 2014 ISCA

Abstract or Description

This paper presents initial studies on building a vocabulary self-learning speech recognition system that can automatically learn unknown words and expand its recognition vocabulary. Our recognizer can detect and recover out-of-vocabulary (OOV) words in speech, then incorporate them into its lexicon and language model (LM). As a result, these previously unknown words can be correctly recognized when the recognizer encounters them in the future. Specifically, we apply the word-fragment hybrid system framework to detect the presence of OOV words. We propose an improved phoneme-to-grapheme (P2G) model to correctly recover the written form of more OOV words. Furthermore, we estimate LM scores for OOV words using their syntactic and semantic properties. The experimental results show that more than 40% of OOV words are successfully learned from the development data, and about 60% of the learned OOV words are recognized in the testing data.

Published In

Proceedings of INTERSPEECH, 2862-2866.