IROS 2015 Paper Abstract


Paper ThDT5.5

Ishi, Carlos Toshinori (ATR), Even, Jani (ATR), Hagita, Norihiro (ATR)

Speech Activity Detection and Face Orientation Estimation Using Multiple Microphone Arrays and Human Position Information

Scheduled for presentation during the Regular session "Robot Audition 1" (ThDT5), Thursday, October 1, 2015, 15:00−15:15, Saal A3

2015 IEEE/RSJ International Conference on Intelligent Robots and Systems, Sept 28 - Oct 03, 2015, Congress Center Hamburg, Hamburg, Germany


Keywords: Robot Audition, Localization, Voice, Speech Synthesis and Recognition


We developed a system that detects the speech activity intervals of multiple speakers by combining multiple microphone arrays with human tracking technologies. We also proposed a method for estimating the face orientation of the detected speakers. The system was evaluated in two steps: first on individual utterances at different positions and orientations, then on simultaneous dialogues by multiple speakers. The results show that the system detects speech activity intervals with more than 90% accuracy and estimates face orientations with standard deviations within 30 degrees, except in cases where all arrays are located in the direction opposite to the speaker's face orientation.



Technical Content © IEEE Robotics & Automation Society
