Date of Original Version
IEEE Spoken Language Technology Workshop (SLT)
© 2012 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works
Abstract or Description
We experiment with active learning for speech recognition in the context of accent adaptation. We adapt a source recognizer on the target accent by selecting a relatively small, matched subset of utterances from a large, untranscribed and multi-accented corpus for human transcription. Traditionally, active learning in speech recognition has relied on uncertainty based sampling to choose the most informative data for manual labeling. Such an approach doesn't include explicit relevance criterion during data selection, which is crucial for choosing utterances to match the target accent, from datasets with wide-ranging speakers of different accents. We formulate a cross-entropy based relevance measure to complement uncertainty based sampling for active learning to aid accent adaptation. We evaluate the algorithm on two different setups for Arabic and English accents and show that our approach performs favorably to conventional data selection. We analyze the results to show the effectiveness of our approach in finding the most relevant subset of utterances for improving the speech recognizer on the target accent.
IEEE Spoken Language Technology Workshop (SLT), 360-365.