Abstract or Description
Classifying the people who appear in broadcast news video as anchors, reporters, or news subjects is an important problem in high-level video analysis. Because different types of people look visually similar, this work explores multi-modal features derived from a variety of evidence, such as speech identity, transcript clues, temporal video structure, and named entities, and uses a statistical learning approach to combine all the features for person type classification. Experiments on ABC World News Tonight video demonstrate the effectiveness of the approach and compare the contributions of the different categories of features.