
laughter-detection's People

Contributors

jrgillick



laughter-detection's Issues

False positives due to music (BGM)

Hi Jon Gillick,

I tried your code on some podcast audio files and it works excellently. There are some false positives, though: background music (BGM) is sometimes identified as laughter.

https://content.production.cdn.art19.com/validation=1563883983,1e1020f2-6a03-565e-9c12-e427174fb512,IvBM_SoTXagDGbWmNuqWJkkjFQc/episodes/c3911ffa-4f63-4ceb-810a-c556380d4e24/31b642ca70a6683fab29cd9a11ba6fd448144f26c9491b13d685f2cb2af2bedd20d4e60a58b1a669366c25cd63ffdeb51a2ef07d68a4c02b828598fe765b54a9/MoS-Dara-V11-BP-Mix.mp3

Is there any way to detect music segments and remove them? Do you have a model file for music detection?

I am also interested in the Switchboard files and, if possible, the logic to build my own model using laughter samples I have.

Thank you very much in advance.

SSV
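Until there is a dedicated music detector, one mitigation is to post-filter the detector's output. `segment_laughs` already takes `threshold` and `min_length` parameters; the sketch below (assuming detections come as `(start, end, probability)` tuples, and with `filter_detections` as a hypothetical helper name) shows the same idea applied after the fact: raising the probability bar and dropping very short hits trades some recall for fewer music-triggered false positives.

```python
def filter_detections(detections, threshold=0.7, min_length=0.5):
    """Keep only detections that clear both a probability and a duration bar.

    detections: list of (start_sec, end_sec, probability) tuples.
    Raising `threshold` above the default tends to suppress
    music-triggered false positives at the cost of some recall.
    """
    return [(s, e, p) for (s, e, p) in detections
            if p >= threshold and (e - s) >= min_length]
```

Tuning `threshold` upward on a few known-problematic files is usually the quickest way to find a workable operating point.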

About the audioset-laughter annotations

Hi, thanks a lot for the contribution and the repository.

I have a few questions about the AudioSet annotations (calling that 999-element set the "audioset-laughter" set from here on):

  1. There are some odd annotations with start = end = 0 (examples are on lines 7, 29, 80, 88, 95, 102, and more). Is that a special annotation (e.g., does it mean the whole file contains laughter)? I don't understand what a zero-length laughter segment means.

  2. Does "window_start" correspond to the start time of the recorded audio snippet within the YouTube video?

  3. "audio_length" and "window_length" appear to be equal everywhere. I'm guessing that's the length (in seconds) of the recorded audio snippet described above — is that correct?

  4. I think this script downloads mp3 audio for the YouTube videos listed in a CSV. Some CSV files can be downloaded using this script, but none of them seem to correspond exactly to the clips in the audioset-laughter annotations (950 of the IDs in "unbalanced_train_segments.csv" and 38 of the IDs in "eval_segments.csv" match the audioset-laughter IDs, but even that is not the full list). Is there a CSV that can be fed to the download script to fetch just the audioset-laughter audio files?

About the training data.

I have a question about the training data. In section 3.4 of the paper you mention that the AudioSet training data does not mark the start and end times of laughter, while the AudioSet test data does mark them. How did you compute accuracy at test time?

About training

Can you provide a detailed description of the training process? I don't know where to start with the preprocessing script right now. Could you provide step-by-step instructions (step 1, step 2, step 3, ...)?

Errors running the laughter detector

I ran into a couple of issues trying to run the laughter detector:

  1. The latest version of librosa (0.6.1) doesn't work with the latest version of joblib (0.12.0). I had to roll back to joblib 0.11.0 for librosa to work.

  2. laugh_segmenter.py depends on the python_speech_features library.

And two problems I haven't been able to solve:

  1. I get this message when I run the script (though it is only a warning):

UserWarning: Error in loading the saved optimizer state. As a result, your model is starting with a freshly initialized optimizer.
warnings.warn('Error in loading the saved optimizer '

  2. More importantly, I get this error when saving out the result:

Traceback (most recent call last):
File "segment_laughter.py", line 50, in
laughs = laugh_segmenter.segment_laughs(input_path,model_path,output_path,threshold,min_length)
File "/Users/maneesh/development/testCode/jnotebooks/laughter-detection/laugh_segmenter.py", line 112, in segment_laughs
librosa.output.write_wav(wav_path, (laughs * maxv).astype(np.int16), full_res_sr)
File "/Users/maneesh/anaconda2/envs/tensorflow/lib/python2.7/site-packages/librosa/output.py", line 223, in write_wav
util.valid_audio(y, mono=False)
File "/Users/maneesh/anaconda2/envs/tensorflow/lib/python2.7/site-packages/librosa/util/utils.py", line 159, in valid_audio
raise ParameterError('data must be floating-point')
librosa.util.exceptions.ParameterError: data must be floating-point
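The error comes from `librosa.output.write_wav` validating that the array is floating point, while the code hands it an `int16` cast. Two possible workarounds (a sketch, not the repo's official fix): pass float data to librosa directly, or keep the int16 cast and write with `scipy.io.wavfile.write`, which accepts integer PCM. The helper below is a pure-Python illustration of that float-to-int16 conversion:

```python
def float_to_int16(samples, peak=32767):
    """Clip float samples to [-1.0, 1.0] and scale to 16-bit PCM integers.

    The result can be written with scipy.io.wavfile.write, which accepts
    integer PCM. librosa.output.write_wav instead insists on float input,
    which is what raises 'data must be floating-point' here.
    """
    return [int(round(max(-1.0, min(1.0, s)) * peak)) for s in samples]
```

In the failing line, either drop the `.astype(np.int16)` and write `laughs.astype(np.float32)` with librosa, or do the int16 conversion and switch to scipy's writer.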

Tensorflow Dependency Issues

Hey there,

I'm curious which version of TensorFlow you're running alongside your Keras dependency, because when I follow your requirements I get the following error when trying to run segment_laughter.py:

Traceback (most recent call last):
File "segment_laughter.py", line 3, in
import laugh_segmenter
File "/home/mark/work/work/voice/laughterdetection/laugh_segmenter.py", line 2, in
config = tf.ConfigProto()
AttributeError: module 'tensorflow' has no attribute 'ConfigProto'

Output a time-aligned annotation file

At the moment, segment_laughter.py cuts out the detected laughter sequences and prints a list of their time locations in the original file. It would be useful if, in addition, it could produce a time-aligned annotation file in some commonly used format.

In phonetics, which is my field, one widely used annotation tool is Praat, which saves its annotations in TextGrid files. I could provide a pull request adding such functionality.
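For a sense of what such a feature would involve, here is a minimal sketch (the `laughs_to_textgrid` name and tuple layout are assumptions) that turns a list of `(start, end)` laughter spans into a Praat TextGrid string with a single interval tier, filling the gaps with empty-text intervals so the tier covers the whole file, as Praat expects:

```python
def laughs_to_textgrid(laughs, total_duration, tier_name="laughter"):
    """Build a Praat TextGrid string from (start_sec, end_sec) laugh spans.

    Gaps between spans become empty-text intervals so the interval tier
    is contiguous from 0 to total_duration.
    """
    intervals, cursor = [], 0.0
    for start, end in sorted(laughs):
        if start > cursor:
            intervals.append((cursor, start, ""))   # silence/non-laughter gap
        intervals.append((start, end, "laugh"))
        cursor = end
    if cursor < total_duration:
        intervals.append((cursor, total_duration, ""))

    lines = [
        'File type = "ooTextFile"',
        'Object class = "TextGrid"',
        "",
        "xmin = 0",
        f"xmax = {total_duration}",
        "tiers? <exists>",
        "size = 1",
        "item []:",
        "    item [1]:",
        '        class = "IntervalTier"',
        f'        name = "{tier_name}"',
        "        xmin = 0",
        f"        xmax = {total_duration}",
        f"        intervals: size = {len(intervals)}",
    ]
    for i, (xmin, xmax, text) in enumerate(intervals, 1):
        lines += [
            f"        intervals [{i}]:",
            f"            xmin = {xmin}",
            f"            xmax = {xmax}",
            f'            text = "{text}"',
        ]
    return "\n".join(lines) + "\n"
```

The resulting string can be written to a `.TextGrid` file and opened in Praat alongside the original audio.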

Sampling rate mismatch

Hello @jrgillick ,

Thanks for this work. I have a question about using this code for laughter detection on datasets sampled at 16 kHz. Which parts of the code need to change in that case?

Also, do you expect any drop in performance with this change?

Thanks,
Soumya
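One answer that sidesteps editing hard-coded rates throughout the code: resample the input to whatever rate the model was trained at before feature extraction, e.g. with `librosa.load(path, sr=target_sr)`, which resamples with proper anti-alias filtering. As a toy illustration of the 16 kHz to 8 kHz case only (not a substitute for real resampling):

```python
def downsample_by_2(samples):
    """Toy 2:1 downsampler: average adjacent sample pairs.

    Halves the sample rate (e.g. 16 kHz -> 8 kHz). Real code should
    resample with anti-aliasing instead, e.g. librosa.load(path, sr=8000).
    """
    return [(samples[i] + samples[i + 1]) / 2.0
            for i in range(0, len(samples) - 1, 2)]
```

Whether performance drops depends on whether the model's features were computed at the matching rate; feeding 16 kHz audio into a pipeline expecting another rate shifts all the spectral features, so resampling first is the safer bet.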
