
Conference programme: 21.1 - Source localization and acoustic array processing



Lecture: STFT Bin Selection for Localization Algorithms based on the Sparsity of Speech Signal Spectra

Author(s): Brendel Andreas, Huang Chengyu, Kellermann Walter

Summary:
Many algorithms for localizing or tracking speech sources, or for estimating their directions of arrival (DOAs), rely on the so-called W-disjoint orthogonality, i.e., only one speaker is assumed to be active in any given time-frequency bin. Based on this assumption, bin-wise DOA estimates can be computed from the pairwise phase differences in each time-frequency bin and clustered afterwards. Averaging the estimates of each cluster, i.e., computing the cluster centroids, increases the robustness of the localization estimate. However, the clustering can be computationally demanding due to the large number of DOA estimates, many of which may be unreliable due to noise and reverberation. Therefore, a suitable selection of reliable STFT bins will increase the accuracy of the estimate and reduce the computational complexity at the same time. In this contribution, we investigate different methods for selecting STFT bins for speech source localization algorithms that are based on the W-disjoint orthogonality. The selection criteria include the bin-wise speech signal power, the coherent-to-diffuse power ratio, and the speech presence probability. The effectiveness of the selection processes is studied for different localization algorithms.
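
As a rough illustration of the idea summarized above, the following minimal sketch computes bin-wise DOA estimates from the phase difference of a single microphone pair and keeps only the highest-power STFT bins. It is not the authors' implementation. Everything in it is an assumption made for illustration: the far-field two-microphone model, the function names binwise_doa and select_bins_by_power, the quantile-based power threshold, the synthetic single-source data, and the use of a median in place of clustering (with one source, the cluster centroid reduces to a single robust average).

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed value


def binwise_doa(X1, X2, freqs, mic_distance):
    """Bin-wise DOA in radians from the phase difference of one mic pair.

    X1, X2: complex STFTs, shape (n_freqs, n_frames); freqs in Hz (> 0).
    Assumed far-field model: X2 = X1 * exp(-j*2*pi*f*d*sin(theta)/c).
    """
    phase_diff = np.angle(X1 * np.conj(X2))  # equals 2*pi*f*d*sin(theta)/c
    sin_theta = SPEED_OF_SOUND * phase_diff / (
        2.0 * np.pi * freqs[:, None] * mic_distance
    )
    return np.arcsin(np.clip(sin_theta, -1.0, 1.0))


def select_bins_by_power(X1, X2, keep_fraction):
    """Boolean mask keeping roughly the keep_fraction highest-power bins."""
    power = 0.5 * (np.abs(X1) ** 2 + np.abs(X2) ** 2)
    return power >= np.quantile(power, 1.0 - keep_fraction)


# Synthetic data (hypothetical): one source at 30 degrees whose spectrum
# is sparse, i.e. active in only about 10 % of the time-frequency bins.
rng = np.random.default_rng(0)
n_freqs, n_frames = 64, 50
freqs = np.linspace(500.0, 3000.0, n_freqs)  # below c/(2d), no aliasing
mic_distance = 0.05                          # metres
true_doa = np.deg2rad(30.0)


def complex_noise(scale):
    """Circular complex Gaussian noise with standard deviation `scale`."""
    shape = (n_freqs, n_frames)
    real, imag = rng.standard_normal(shape), rng.standard_normal(shape)
    return scale * (real + 1j * imag) / np.sqrt(2.0)


active = rng.random((n_freqs, n_frames)) < 0.1
S = active * complex_noise(1.0)
delay_phase = (
    2.0 * np.pi * freqs[:, None] * mic_distance * np.sin(true_doa)
    / SPEED_OF_SOUND
)
X1 = S + complex_noise(0.05)
X2 = S * np.exp(-1j * delay_phase) + complex_noise(0.05)

doa = binwise_doa(X1, X2, freqs, mic_distance)
mask = select_bins_by_power(X1, X2, keep_fraction=0.05)

# With a single source, the median over the selected bins stands in for
# the cluster centroid.
doa_selected_deg = np.rad2deg(np.median(doa[mask]))
err_all = np.mean(np.abs(doa - true_doa))
err_selected = np.mean(np.abs(doa[mask] - true_doa))
print(f"DOA from selected bins: {doa_selected_deg:.1f} deg")
print(f"mean abs. error, all bins vs. selected: {err_all:.3f} vs. {err_selected:.3f} rad")
```

In this toy setting, most bins contain only noise, so their phase differences carry no directional information; discarding them before averaging both shrinks the set of estimates to process and removes most outliers. The coherent-to-diffuse power ratio or the speech presence probability could replace the power criterion in select_bins_by_power, but estimating those quantities is beyond this sketch.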


Corresponding author

Name: Mr Andreas Brendel


Country: Germany