
Conference_programme: 9: General linear and non-linear acoustics



Lecture: Evaluating speech content using remote sound sensing techniques

Author(s): Zervas Panagiotis, Bakarezos Efthimios, Movsesian Ara, Papadogiannis Nektarios

Summary:
There is great interest in developing portable sensing devices for smart environments such as smart homes, cities and public places. In applied acoustics in particular, research has addressed the remote capture of object vibrations caused by audible disturbances. Such a solution is attractive in situations where microphone placement is not possible.

This paper reports our work on developing a laser-based audio recording device. Its principle of operation is based on the deflection of a laser beam by a monitoring surface, such as a glass window, that vibrates under incident speech, so that sound events can be recorded remotely. The changes in the optical path of the laser beam are proportional to the vibrations of the monitoring surface, which, in turn, are directly related to the sound incident on it. To detect the changes in the optical path, a commercially available photodiode was used, with a response time of the order of nanoseconds, which is more than adequate for capturing the frequency content of the speech samples used.

To evaluate the proposed sensing device, we conducted experiments on the speaker emotion recognition problem. Evaluation was approached as a two-phase task. Initially, the emotional speech data was reproduced by a loudspeaker and captured with the laser-based device. The recordings were performed in the physical space of an auditorium.

In the first step, we used an emotional database of acted speech samples that has previously been used for this task. The database contained speech samples from one speaker in five emotional states: happiness, anger, sadness and fear, plus a neutral state. All speech samples were played through a loudspeaker, and the vibrations they produced on a remote flat monitoring surface were captured.

As a final step, we extracted features from the recorded data and formed two datasets for training and testing support vector machines (SVM). The first dataset contained all the emotional classes of the original database, while the second had two classes, obtained by merging the original ones along the valence dimension, a global measure of the pleasure associated with an emotional state, ranging from negative to positive. To form the binary dataset, anger, fear and sadness were labelled as the negative emotional state, and neutral and happiness as the non-negative emotional state.

Handcrafted features were obtained with the openSMILE open-source toolkit. Support vector machines trained with the Sequential Minimal Optimisation (SMO) algorithm, as implemented in the WEKA data-mining toolkit, were used for classification. Prior to learning, a correlation-based feature subset selection method was applied to reduce the complexity of the initial feature vector.

Results showed that such solutions, with further refinement, could be utilized in real-life scenarios; they will be presented in the full paper.
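
The merging of the five emotional classes into two valence classes described in the summary can be illustrated with a minimal Python sketch. The label strings and the function name `to_valence` are assumptions made for illustration only; the summary does not state how the database labels are spelled, and this is not the authors' code.

```python
# Hypothetical label spellings; the database's actual label strings are not given.
NEGATIVE = {"anger", "fear", "sadness"}
NON_NEGATIVE = {"neutral", "happiness"}


def to_valence(emotion: str) -> str:
    """Map one of the five emotion labels to its binary valence class."""
    label = emotion.strip().lower()
    if label in NEGATIVE:
        return "negative"
    if label in NON_NEGATIVE:
        return "non-negative"
    raise ValueError(f"unknown emotion label: {emotion!r}")


# Labels of the five-class dataset and their binary counterparts.
five_class = ["happiness", "anger", "sadness", "fear", "neutral"]
binary = [to_valence(e) for e in five_class]
```

Raising an error on an unknown label, rather than defaulting to one class, keeps a misspelled label from silently shifting the class balance of the binary dataset.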


Corresponding author

Name: Prof Panagiotis Zervas


Country: Greece