Abstract or Description
Combining multiple classifiers is of particular interest in multimedia systems, which usually involve data of very different types or modalities that must be mined or analyzed. Our wearable ‘experience collection’ system unobtrusively records the wearer’s conversations, recognizes the face of the dialog partner, and remembers his or her voice. When the system sees the same person’s face or hears the same voice again, it can use a summary of the last conversation with that person to remind the wearer. To correctly identify a person from a mixture of video and audio streams, the classification judgments of the individual modality classifiers must be combined effectively to yield a more accurate decision. We propose a meta-classification strategy that combines multimodal classifiers using a Support Vector Machine. Preliminary empirical results show that combining different face recognition and speaker identification technologies by meta-classification is dramatically more effective than weighted interpolation. Meta-classification is general enough to be applied, without much modification, to any application that needs to combine multiple classifiers.
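The two combination schemes named above can be contrasted in a minimal sketch. It assumes that each modality classifier emits a match score in [0, 1] for a candidate identity, and it substitutes a small linear hinge-loss classifier, trained by plain subgradient steps, for a full Support Vector Machine solver. The function names, the toy scores, and the equal interpolation weight are illustrative assumptions, not details taken from the work described here, and the sketch makes no attempt to reproduce its results.

```python
def weighted_interpolation(face_score, voice_score, w_face=0.5):
    """Baseline: fixed linear mix of two modality scores (weight is assumed)."""
    return w_face * face_score + (1.0 - w_face) * voice_score


def meta_decision(weights, bias, scores):
    """Signed decision value of the linear meta-classifier."""
    return sum(w * s for w, s in zip(weights, scores)) + bias


def meta_predict(weights, bias, scores):
    """+1 = candidate identity accepted, -1 = rejected."""
    return 1 if meta_decision(weights, bias, scores) >= 0 else -1


def train_hinge_meta_classifier(score_vectors, labels, max_epochs=1000):
    """Stand-in for an SVM: unregularised hinge-loss subgradient steps.

    Each input vector holds the outputs of the individual modality
    classifiers; training stops once every example has margin >= 1.
    """
    weights = [0.0] * len(score_vectors[0])
    bias = 0.0
    for _ in range(max_epochs):
        updated = False
        for x, y in zip(score_vectors, labels):
            if y * meta_decision(weights, bias, x) < 1.0:
                # Hinge loss is active: step towards the correct side.
                weights = [w + y * xi for w, xi in zip(weights, x)]
                bias += y
                updated = True
        if not updated:
            break
    return weights, bias


# Hypothetical (face_score, voice_score) pairs for candidate identities.
SCORES = [(0.9, 0.8), (0.8, 0.7), (0.7, 0.9),
          (0.2, 0.3), (0.3, 0.1), (0.1, 0.4)]
# +1 if the candidate is the true identity, -1 otherwise.
LABELS = [1, 1, 1, -1, -1, -1]

weights, bias = train_hinge_meta_classifier(SCORES, LABELS)

for scores, label in zip(SCORES, LABELS):
    mixed = weighted_interpolation(*scores)
    print(scores, label, round(mixed, 3), meta_predict(weights, bias, scores))
```

The baseline fixes the mixing weight in advance, whereas the meta-classifier treats the modality outputs as a feature vector and learns both the weights and the decision threshold from labeled examples. A kernel SVM, as the abstract's wording suggests, could additionally learn a non-linear combination, which this linear stand-in does not attempt.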