Hideki Kawahara
Fri 29 Jan 2016, 11:00 - 12:30
Informatics Forum (IF-4.31/4.33)

If you have a question about this talk, please contact: Diana Dalla Costa (ddallac)


This talk presents the underlying concepts, technologies, and applications of STRAIGHT, a framework for speech analysis, modification, and resynthesis that was originally designed to facilitate speech perception research. The talk also introduces two recent advances that may open up new strategies in speech communication research. One is "Temporally variable multi-aspect morphing of arbitrarily many voices." The other is "SparkNG: Speech Production and Auditory perception Research Kernel the Next Generation." Speech plays an essential role in human communication by providing rich side-information channels that modify and expand linguistic content. While the recent resurgence of machine learning technologies has made speech-based communication with smart machines practical and popular, these rich side-information channels, which make speech unique, are not well explored. It is crucially important for smart machines to share with humans a common basis for these channels, grounded in a deep understanding of human speech communication. "Making speech tangible" by introducing tools that enable quantitative and precise, as well as intuitive and direct, manipulation of speech parameters will, I hope, lead to a better understanding of human speech communication.


Emeritus Professor, Wakayama University (Japan)
Currently visiting Google’s speech synthesis group in London.

Research Interests

• Auditory signal processing

• Auditory basis of speech and sound perception

• Speech analysis, modification and synthesis

• Design reuse in singing synthesis

• Computational auditory scene analysis

Creator of STRAIGHT, which is the most widely used vocoder in statistical parametric speech synthesis.