Date of Original Version

9-2013

Type

Conference Proceeding

Journal Title

Proceedings of INTERSPEECH

First Page

1902

Last Page

1905

Rights Management

Copyright 2012 ISCA

Abstract or Description

State-of-the-art Automatic Speech Recognition (ASR) models struggle to handle accented speech, particularly if the target accent is under-represented in the training data. The acoustic variations presented by an unfamiliar accent render the ASR polyphone decision tree (PDT) and its associated Gaussian mixture models (GMMs) a poor fit for the test data. In this paper, we improve on previous work on adapting the polyphone decision tree, using an approach based on semi-continuous models to address the problem of data sparsity. We extend the existing PDT to introduce additional states with shared parameters, corresponding to the new contextual variations identified in the adaptation data, while still robustly estimating the state-based parameters on a small adaptation set. We conduct ASR experiments on Arabic and English accents and show that our technique performs better than Maximum A Posteriori (MAP) adaptation and a previous implementation of polyphone decision tree specialization (PDTS). Compared to MAP adaptation, we obtain a 7% relative improvement for Dialectal Arabic and a 13.8% relative improvement for Accented English.

Published In

Proceedings of INTERSPEECH, 1902-1905.