Junyun Tay

Date of Award

Spring 5-2016

Embargo Period


Degree Type


Degree Name

Doctor of Philosophy (PhD)


Mechanical Engineering


First Advisor

Manuela Veloso

Second Advisor

I-Ming Chen


Gestures and other body movements of humanoid robots can be used to convey meanings extracted from an input signal, such as speech or music. For example, a humanoid robot waves its arm to say goodbye or nods its head in time with the beats of the music. This thesis investigates how to autonomously animate a real humanoid robot given an input signal. It addresses five core challenges: Representation of motions, Mappings between meanings and motions, Selection of relevant motions, Synchronization of motion sequences to the input signal, and Stability of the motion sequences (R-M-S3). We define parameterized motions that allow a wide variety of whole-body motions to be generated from a small core motion library and allow the motions to be synchronized to different input signals. To assign meanings to motions, we represent meanings using labels and map motions to labels autonomously using motion features. We also examine different metrics for determining motion similarity so that a new motion is mapped to the existing labels of the most similar motion. We explain how we select relevant motions using labels, synchronize the motion sequence to the input signal, and consider the audience’s preferences. We contribute an algorithm that determines the stability of a motion sequence. We also define the term relative stability, where the stability of one motion sequence is compared to that of other motion sequences. We contribute an algorithm to determine the most stable motion sequence so that the humanoid robot animates continuously without interruptions. We demonstrate our work with two input signals, music and speech: a humanoid robot autonomously dances to any piece of music using the beats and emotions of the music, and autonomously gestures according to its speech.
We describe how we use our solutions to R-M-S3 and present a complete algorithm that captures the meanings of the input signal and weighs the selection of the best sequence using two criteria: audience feedback and stability. Our approach and algorithms are general and apply to autonomously animating humanoid robots; as an example, we use a NAO humanoid robot, both real and in simulation.
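To make the two-criteria selection concrete, the following is a minimal sketch of how candidate motion sequences might be ranked. It is an illustration under stated assumptions, not the algorithm from the thesis: the names `Candidate`, `select_sequence`, and `audience_weight` are hypothetical, both scores are assumed to be normalized to [0, 1], and the linear weighted sum is an assumed combination rule.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Candidate:
    """A candidate motion sequence with two hypothetical scores in [0, 1]."""
    name: str
    audience_score: float   # assumed: higher means the audience prefers it more
    stability_score: float  # assumed: higher means the robot is less likely to fall


def select_sequence(candidates: Sequence[Candidate],
                    audience_weight: float = 0.5) -> Candidate:
    """Return the candidate with the highest weighted sum of the two scores.

    The linear weighting is an assumption made for illustration; the
    abstract only states that the two criteria are weighed.
    """
    if not candidates:
        raise ValueError("at least one candidate sequence is required")
    if not 0.0 <= audience_weight <= 1.0:
        raise ValueError("audience_weight must lie in [0, 1]")

    def combined(c: Candidate) -> float:
        return (audience_weight * c.audience_score
                + (1.0 - audience_weight) * c.stability_score)

    return max(candidates, key=combined)


# Hypothetical candidates and scores, invented purely for this example.
candidates = [
    Candidate("expressive", audience_score=0.9, stability_score=0.2),
    Candidate("balanced", audience_score=0.6, stability_score=0.8),
    Candidate("cautious", audience_score=0.3, stability_score=0.95),
]
best = select_sequence(candidates, audience_weight=0.5)
print(best.name)
```

Moving `audience_weight` toward 1 favors sequences the audience prefers, while moving it toward 0 favors sequences that keep the robot stable.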