Morency, L.P., de Kok, I., & Gratch, J.
10th International Conference on Multimodal Interfaces (ICMI 2008)
(Crete, Greece, October 20, 2008)
During face-to-face conversation, people use visual feedback such as head nods to communicate relevant information and to synchronize rhythm between participants. In this paper, we describe how contextual information from other participants can be used to predict visual feedback and improve recognition of head gestures in human-human interactions. The main challenges addressed in this paper are optimal feature representation using an encoding dictionary and automatic selection of the optimal feature-encoding pairs. We evaluate our approach on a dataset involving 78 human participants. Using a discriminative approach to multi-modal integration, our context-based recognizer significantly improves head gesture recognition performance over a vision-only recognizer.