- BRIR
- Sound source: TIMIT
- Train set
  - 8 mic positions
  - 21 target source positions (-50° to 50°)
  - 10 interfering source positions ($\pm 5, \pm 10, \pm 20, \pm 30, \pm 40$)
  - 3 SNRs (0, 10, 20 dB)
  - sentence = 5040 (total training sentences: 8 × 21 × 10 × 3)
  - sentence_mic_pos = 630 (sentences per mic position: 5040 / 8)
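Both counts follow from the size of the condition grid: 8 × 21 × 10 × 3 = 5040, and 5040 / 8 = 630. The minimal sketch below enumerates that grid. It assumes one sentence per condition and that the 21 target positions are spaced uniformly in 5° steps. Neither assumption is stated above. The variable names are illustrative only.

```python
from itertools import product

# Assumption: 21 target azimuths spaced uniformly in 5-degree steps from -50 to 50.
target_azimuths = list(range(-50, 51, 5))
# The 10 interfering source positions listed above (+/-5, 10, 20, 30, 40).
interferer_positions = [s * d for d in (5, 10, 20, 30, 40) for s in (1, -1)]
snrs_db = [0, 10, 20]
mic_positions = list(range(8))  # 8 mic positions, indexed 0..7

# Assumption: one training sentence per (mic, target, interferer, SNR) condition.
conditions = list(product(mic_positions, target_azimuths, interferer_positions, snrs_db))

total_sentences = len(conditions)
sentences_per_mic_pos = total_sentences // len(mic_positions)

print(total_sentences, sentences_per_mic_pos)  # 5040 630
```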
To obtain reliable models, the GMMs are trained only on features from frames that satisfy four criteria:
- VAD (before frequency analysis): select frames whose energy exceeds a threshold (maximum frame energy - 40 dB)
- Target is dominant: the left- and right-ear signals are added before the energy is computed
- L and R channels are correlated: select frames whose maximum binaural cross-correlation exceeds a threshold (0.3)
- ITD estimates lie within (-1, 1) ms
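The four selection criteria can be sketched in pure Python as follows. This is a minimal illustration, not the original implementation, and all function names are hypothetical. It makes these assumptions:

- Frames are lists of samples, and energies are in dB.
- The VAD energy is computed on the L+R sum.
- "Target is dominant" means the target energy exceeds the interferer energy. This check needs the separate target and interferer signals, which simulated training data provides.
- The cross-correlation is searched over a hypothetical ±2 ms lag range, so the ±1 ms ITD check can actually reject frames.

In the original setup, criteria 2 to 4 are presumably applied per frequency band. Here they act on whatever frame is passed in.

```python
import math

def energy_db(x):
    """Frame energy in dB; the tiny floor avoids log10(0) on silent frames."""
    return 10.0 * math.log10(sum(v * v for v in x) + 1e-12)

def vad_mask(frames_l, frames_r, range_db=40.0):
    """Criterion 1 (broadband, before any filterbank): keep frames whose
    energy lies within range_db of the loudest frame.
    Assumption: energy is computed on the L+R sum, as in criterion 2."""
    e = [energy_db([l + r for l, r in zip(fl, fr)])
         for fl, fr in zip(frames_l, frames_r)]
    e_max = max(e)
    return [ei > e_max - range_db for ei in e]

def target_dominant(tgt_l, tgt_r, itf_l, itf_r):
    """Criterion 2: left and right signals are added before computing energy.
    Assumption: 'dominant' means target energy > interferer energy in the frame."""
    e_target = energy_db([l + r for l, r in zip(tgt_l, tgt_r)])
    e_interf = energy_db([l + r for l, r in zip(itf_l, itf_r)])
    return e_target > e_interf

def xcorr_peak(x_l, x_r, fs, max_lag_s):
    """Normalized cross-correlation over lags in [-max_lag_s, max_lag_s].
    Returns (peak value, lag in seconds); a positive lag means the right
    channel is delayed relative to the left."""
    n = len(x_l)
    norm = math.sqrt(sum(v * v for v in x_l) * sum(v * v for v in x_r)) + 1e-12
    max_lag = int(round(max_lag_s * fs))
    best_val, best_lag = -math.inf, 0
    for lag in range(-max_lag, max_lag + 1):
        s = sum(x_l[i] * x_r[i + lag]
                for i in range(max(0, -lag), min(n, n - lag)))
        if s / norm > best_val:
            best_val, best_lag = s / norm, lag
    return best_val, best_lag / fs

def keep_frame(mix_l, mix_r, fs, corr_thresh=0.3, itd_limit_s=1e-3, search_s=2e-3):
    """Criteria 3 and 4: correlation peak above corr_thresh and ITD inside
    (-itd_limit_s, itd_limit_s). search_s (hypothetical) is wider than the
    ITD limit so that criterion 4 can actually reject frames."""
    peak, itd = xcorr_peak(mix_l, mix_r, fs, search_s)
    return peak > corr_thresh and abs(itd) < itd_limit_s
```

Locating the ITD at the cross-correlation peak ties criteria 3 and 4 together. A single search returns both the peak height and its lag.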
Cues
Example: sound source in front (azimuth = 0°), training result for 32 frequency bands. A comparable result from the reference paper is highlighted.
Illustration of azimuth estimation

No. of sources | -50° | 0° | 50° |
---|---|---|---|
1 | | | |
2 | | | |
3 | | | |
4 | | | |
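Given trained GMMs, azimuth estimation for one or more sources can be sketched as below. The decision rule is an assumption, not necessarily the one used in the reference paper. It takes per-band log-likelihoods under each azimuth's GMM as a precomputed, hypothetical array `loglik[t][b][a]` (frame, band, azimuth index). It sums them over frames and bands, then reports the `n_sources` highest local maxima of the resulting curve. Both function names are hypothetical.

```python
def azimuth_scores(loglik):
    """Sum log-likelihoods over frames and bands.
    loglik[t][b][a]: log-likelihood of frame t, band b under the GMM for azimuth index a."""
    n_az = len(loglik[0][0])
    scores = [0.0] * n_az
    for frame in loglik:
        for band in frame:
            for a, ll in enumerate(band):
                scores[a] += ll
    return scores

def pick_sources(scores, azimuths, n_sources):
    """Return the azimuths of the n_sources highest local maxima of the score curve."""
    last = len(scores) - 1
    peaks = [
        a for a in range(len(scores))
        if (a == 0 or scores[a] > scores[a - 1])
        and (a == last or scores[a] >= scores[a + 1])
    ]
    # Rank peaks by score, keep the strongest ones, report them left to right.
    peaks.sort(key=lambda a: scores[a], reverse=True)
    return sorted(azimuths[a] for a in peaks[:n_sources])
```

Restricting the choice to local maxima, rather than the top raw scores, keeps a single broad peak from being counted as two neighboring sources.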