
laughter-detection's People

Contributors

jrgillick



laughter-detection's Issues

False positives due to music (BGM)

Hi Jon Gillick,

I tried your code on some podcast audio files and it works excellently. There are some false positives, though: background music (BGM) is sometimes identified as laughter.

https://content.production.cdn.art19.com/validation=1563883983,1e1020f2-6a03-565e-9c12-e427174fb512,IvBM_SoTXagDGbWmNuqWJkkjFQc/episodes/c3911ffa-4f63-4ceb-810a-c556380d4e24/31b642ca70a6683fab29cd9a11ba6fd448144f26c9491b13d685f2cb2af2bedd20d4e60a58b1a669366c25cd63ffdeb51a2ef07d68a4c02b828598fe765b54a9/MoS-Dara-V11-BP-Mix.mp3

Is there any way to detect music segments and remove them? Do you have a model file for music detection?

I am also interested in the Switchboard files and, if possible, the logic to build my own model using laughter samples I have.

Thank you very much in advance.

SSV
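Until there is a dedicated music detector, one mitigation is to post-filter the detector's output. `segment_laughs` already takes `threshold` and `min_length` parameters; the sketch below (assuming detections come as `(start, end, probability)` tuples, and with `filter_detections` as a hypothetical helper name) shows the same idea applied after the fact: raising the probability bar and dropping very short hits trades some recall for fewer music-triggered false positives.

```python
def filter_detections(detections, threshold=0.7, min_length=0.5):
    """Keep only detections that clear both a probability and a duration bar.

    detections: list of (start_sec, end_sec, probability) tuples.
    Raising `threshold` above the default tends to suppress
    music-triggered false positives at the cost of some recall.
    """
    return [(s, e, p) for (s, e, p) in detections
            if p >= threshold and (e - s) >= min_length]
```

Tuning `threshold` upward on a few known-problematic files is usually the quickest way to find a workable operating point.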

About the audioset-laughter annotations

Hi, thanks a lot for the contribution and the repository.

I have a few questions about the AudioSet annotations (calling that 999-element set the "audioset-laughter" set from here on):

  1. There are some odd annotations with start = end = 0 (examples are on lines 7, 29, 80, 88, 95, 102, and more). Is that a special annotation (e.g., does it mean the whole file contains laughter)? I don't understand what a zero-length laughter segment means.

  2. Does "window_start" correspond to the start time of the recorded audio snippet within the YouTube video?

  3. "audio_length" and "window_length" appear to be equal everywhere. I'm guessing that's the length (in seconds) of the recorded audio snippet described above — is that correct?

  4. I think this script downloads mp3 audio for the YouTube videos listed in a CSV. Some CSV files can be downloaded using this script, but none of them seem to correspond exactly to the clips in the audioset-laughter annotations (950 of the IDs in "unbalanced_train_segments.csv" and 38 of the IDs in "eval_segments.csv" match the audioset-laughter IDs, but even that is not the full list). Is there a CSV that can be fed to the download script to fetch just the audioset-laughter audio files?

About the training data.

I have a question about the training data. In section 3.4 of the paper you mention that the AudioSet training data does not mark the start and end times of laughter, while the AudioSet test data does mark them. How did you compute accuracy at test time?

About training

Can you provide a detailed description of the training process? I don't know where to start with the preprocessing script right now. Could you provide step-by-step instructions (step 1, step 2, step 3, ...)?

Errors running the laughter detector

I ran into a couple of issues trying to run the laughter detector:

  1. The latest version of librosa (0.6.1) doesn't work with the latest version of joblib (0.12.0). I had to roll back to joblib 0.11.0 for librosa to work.

  2. laugh_segmenter.py depends on the python_speech_features library.

And two problems I haven't been able to solve:

  1. I get this message when I run the script (though it is only a warning):

UserWarning: Error in loading the saved optimizer state. As a result, your model is starting with a freshly initialized optimizer.
warnings.warn('Error in loading the saved optimizer '

  2. More importantly, I get this error when saving out the result:

Traceback (most recent call last):
File "segment_laughter.py", line 50, in
laughs = laugh_segmenter.segment_laughs(input_path,model_path,output_path,threshold,min_length)
File "/Users/maneesh/development/testCode/jnotebooks/laughter-detection/laugh_segmenter.py", line 112, in segment_laughs
librosa.output.write_wav(wav_path, (laughs * maxv).astype(np.int16), full_res_sr)
File "/Users/maneesh/anaconda2/envs/tensorflow/lib/python2.7/site-packages/librosa/output.py", line 223, in write_wav
util.valid_audio(y, mono=False)
File "/Users/maneesh/anaconda2/envs/tensorflow/lib/python2.7/site-packages/librosa/util/utils.py", line 159, in valid_audio
raise ParameterError('data must be floating-point')
librosa.util.exceptions.ParameterError: data must be floating-point
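The error comes from `librosa.output.write_wav` validating that the array is floating point, while the code hands it an `int16` cast. Two possible workarounds (a sketch, not the repo's official fix): pass float data to librosa directly, or keep the int16 cast and write with `scipy.io.wavfile.write`, which accepts integer PCM. The helper below is a pure-Python illustration of that float-to-int16 conversion:

```python
def float_to_int16(samples, peak=32767):
    """Clip float samples to [-1.0, 1.0] and scale to 16-bit PCM integers.

    The result can be written with scipy.io.wavfile.write, which accepts
    integer PCM. librosa.output.write_wav instead insists on float input,
    which is what raises 'data must be floating-point' here.
    """
    return [int(round(max(-1.0, min(1.0, s)) * peak)) for s in samples]
```

In the failing line, either drop the `.astype(np.int16)` and write `laughs.astype(np.float32)` with librosa, or do the int16 conversion and switch to scipy's writer.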

Tensorflow Dependency Issues

Hey there,

I'm curious which version of TensorFlow you're running alongside your Keras dependency, because when I follow your requirements I get the following error when trying to run segment_laughter.py:

Traceback (most recent call last):
File "segment_laughter.py", line 3, in
import laugh_segmenter
File "/home/mark/work/work/voice/laughterdetection/laugh_segmenter.py", line 2, in
config = tf.ConfigProto()
AttributeError: module 'tensorflow' has no attribute 'ConfigProto'

Output a time-aligned annotation file

At the moment, segment_laughter.py cuts out the detected laughter sequences and prints a list of their time locations in the original file. It would be useful if, in addition, it could produce a time-aligned annotation file in some commonly used format.

In phonetics, which is my field, one widely used annotation tool is Praat, which saves its annotations in TextGrid files. I could provide a pull request adding such functionality.
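For a sense of what such a feature would involve, here is a minimal sketch (the `laughs_to_textgrid` name and tuple layout are assumptions) that turns a list of `(start, end)` laughter spans into a Praat TextGrid string with a single interval tier, filling the gaps with empty-text intervals so the tier covers the whole file, as Praat expects:

```python
def laughs_to_textgrid(laughs, total_duration, tier_name="laughter"):
    """Build a Praat TextGrid string from (start_sec, end_sec) laugh spans.

    Gaps between spans become empty-text intervals so the interval tier
    is contiguous from 0 to total_duration.
    """
    intervals, cursor = [], 0.0
    for start, end in sorted(laughs):
        if start > cursor:
            intervals.append((cursor, start, ""))   # silence/non-laughter gap
        intervals.append((start, end, "laugh"))
        cursor = end
    if cursor < total_duration:
        intervals.append((cursor, total_duration, ""))

    lines = [
        'File type = "ooTextFile"',
        'Object class = "TextGrid"',
        "",
        "xmin = 0",
        f"xmax = {total_duration}",
        "tiers? <exists>",
        "size = 1",
        "item []:",
        "    item [1]:",
        '        class = "IntervalTier"',
        f'        name = "{tier_name}"',
        "        xmin = 0",
        f"        xmax = {total_duration}",
        f"        intervals: size = {len(intervals)}",
    ]
    for i, (xmin, xmax, text) in enumerate(intervals, 1):
        lines += [
            f"        intervals [{i}]:",
            f"            xmin = {xmin}",
            f"            xmax = {xmax}",
            f'            text = "{text}"',
        ]
    return "\n".join(lines) + "\n"
```

The resulting string can be written to a `.TextGrid` file and opened in Praat alongside the original audio.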

Sampling rate mismatch

Hello @jrgillick ,

Thanks for this work. I have a question about using this code for laughter detection on datasets sampled at 16 kHz. Which parts of the code need to change in that case?

Also, do you expect any drop in performance with this change?

Thanks,
Soumya
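One answer that sidesteps editing hard-coded rates throughout the code: resample the input to whatever rate the model was trained at before feature extraction, e.g. with `librosa.load(path, sr=target_sr)`, which resamples with proper anti-alias filtering. As a toy illustration of the 16 kHz to 8 kHz case only (not a substitute for real resampling):

```python
def downsample_by_2(samples):
    """Toy 2:1 downsampler: average adjacent sample pairs.

    Halves the sample rate (e.g. 16 kHz -> 8 kHz). Real code should
    resample with anti-aliasing instead, e.g. librosa.load(path, sr=8000).
    """
    return [(samples[i] + samples[i + 1]) / 2.0
            for i in range(0, len(samples) - 1, 2)]
```

Whether performance drops depends on whether the model's features were computed at the matching rate; feeding 16 kHz audio into a pipeline expecting another rate shifts all the spectral features, so resampling first is the safer bet.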
