Date of Original Version
Proceedings of INTERSPEECH
Copyright 2010 ISCA
Abstract or Description
This paper analyzes the capability of multilayer perceptron (MLP) frontends to perform speaker normalization. We find the phonetic context decision tree to be a useful tool for assessing the speaker normalization power of different frontends. To this end, we introduce a gender question into the training of the phonetic context decision tree and, after context clustering, count the gender-specific models. We compare these counts for the following frontends: (1) Bottle-Neck (BN) features with and without vocal tract length normalization (VTLN), (2) standard MFCC, and (3) stacking of multiple MFCC frames with linear discriminant analysis (LDA). The BN frontend reduces the number of gender questions even more than VTLN does, from which we conclude that a Bottle-Neck frontend is the more effective of the two for gender normalization. Combining VTLN with BN features reduces the number of gender-specific models further.
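The counting step described in the abstract can be illustrated with a minimal sketch. It assumes a binary decision tree whose leaves are the clustered models, and treats a leaf as gender specific when at least one gender question lies on its path from the root. The `Node` class, the `count_models` function, the convention that gender questions start with `gender`, and the toy tree are all hypothetical; none of them comes from the paper.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Node:
    """Hypothetical tree node; question=None marks a leaf (one clustered model)."""
    question: Optional[str] = None
    yes: Optional["Node"] = None
    no: Optional["Node"] = None


def count_models(node: Node, under_gender: bool = False) -> Tuple[int, int]:
    """Return (total leaves, leaves reached through at least one gender question)."""
    if node.question is None:
        return 1, int(under_gender)
    # Assumption: gender questions are identified by a "gender" name prefix.
    below = under_gender or node.question.startswith("gender")
    total_yes, gender_yes = count_models(node.yes, below)
    total_no, gender_no = count_models(node.no, below)
    return total_yes + total_no, gender_yes + gender_no


# Toy tree for illustration only: one gender split under a phonetic question.
toy_tree = Node(
    "left=vowel",
    yes=Node("gender=female", yes=Node(), no=Node()),
    no=Node("right=nasal", yes=Node(), no=Node()),
)

total, gender_specific = count_models(toy_tree)
print(f"{gender_specific} of {total} models are gender specific")
```

Under this reading, a frontend that normalizes gender well should leave fewer leaves below gender questions, so the ratio of the two counts gives a simple way to compare frontends.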
Proceedings of INTERSPEECH, 306-309.