paloukari / orcadetector

A VGGish-based DNN trained on the Watkins Marine Mammal Sound Database, with transfer learning from Audioset, to detect multiple marine mammal species.

License: MIT License

Python 0.17% Jupyter Notebook 99.83% Shell 0.01%

audioset dnn docker-container gpu tx2 vggish whoi

orcadetector's People

Contributors

Stargazers

Watchers

Forkers

olivakar mwinton kamalsky z30g0d tubbz-alt sanindu thivinab nuwansribandara marinetech canbaba0517

orcadetector's Issues

6/25 project intro for class

title
what does audio look like
mel spectrograms
dataset of audio samples
VGGish model
system architecture
real world: hydrophone / simulation: proxy w/ triggered samples

Investigate if there is any value to keep the trailing segment of the audio files

Remove vggish smoke test, add orca smoke test

Add argparser support to specify a model name

We can pass in a CLI arg to specify which model to run (e.g. vggish or cnn or logreg), then we conditionally build the desired model before running model.fit_generator().
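A minimal sketch of that idea, using only the standard library. The `build_*` functions and the `MODEL_BUILDERS` registry are hypothetical stand-ins for whatever actually constructs each Keras model; here they return placeholder strings so the snippet runs without Keras installed.

```python
import argparse

# Hypothetical builders: placeholders for the code that constructs each model.
def build_vggish():
    return "vggish-model"

def build_cnn():
    return "cnn-model"

def build_logreg():
    return "logreg-model"

# Maps the CLI value to the function that builds the matching model.
MODEL_BUILDERS = {
    "vggish": build_vggish,
    "cnn": build_cnn,
    "logreg": build_logreg,
}

def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Train an OrcaDetector model.")
    parser.add_argument(
        "--model-name",
        choices=sorted(MODEL_BUILDERS),
        default="vggish",
        help="Which model architecture to build before training.",
    )
    return parser.parse_args(argv)

def build_model(model_name):
    # Only the selected model is built; the others are never constructed.
    return MODEL_BUILDERS[model_name]()
```

With this in place, `build_model(parse_args().model_name)` would return the object on which `fit_generator()` is then called, and an unknown name is rejected by argparse before any model code runs.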

Capture noise data

Record noise data in random segments over the course of a day or two.

@paloukari @mwinton @ram-iyer
Hello,
It seems that the data are no longer available:
C:\Users\quentin.hamard>aws s3 cp s3://w251-orca-detector-data/data.tar.gz ./
fatal error: An error occurred (404) when calling the HeadObject operation: Key "data.tar.gz" does not exist
C:\Users\quentin.hamard>aws s3 cp s3://w251-orca-detector-data/vggish_weights.tar.gz ./
fatal error: An error occurred (404) when calling the HeadObject operation: Key "vggish_weights.tar.gz" does not exist
C:\Users\quentin.hamard>aws s3 cp s3://w251-orca-detector-data/orca_weights_616776.hdf5 ~/OrcaDetector/results/orca_weights_latest.hdf5
fatal error: An error occurred (404) when calling the HeadObject operation: Key "orca_weights_616776.hdf5" does not exist

I would like to re-use the CNN you trained to use it as a feature extractor.

live feed proxy

We set up a proxy for the live stream; the TX2 connects to it to read audio input.

Later, we plug in audio sample overlays.

Remove links to download the tar archive containing the audio files

Before we wrap up this project and make the repo public, we should remove links to the audio file archives since we don't have the right to redistribute. (Instructions are in the setup.md doc for our own sake for now.)

Add a brief /vggish/README.md

Add a brief README.md file in the ./orca_detector/vggish/ directory crediting the original Google project that the code in that directory came from.

Add support for running test

We shouldn't be regularly running against our test set, but eventually will need to add support for running test with a trained model.

Need to improve performance of the Keras generators

Right now, too much computation is done in real time as the generator loads each individual sample (and it all appears to happen on the CPU). We probably need to pregenerate numpy arrays and save them to hdf5 files, which can then be loaded without additional processing.

Create central CLI entry point for the various scripts

Right now, we have several standalone scripts. We can create a central CLI entry point that triggers each as appropriate.
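One possible shape for such an entry point is an argparse parser with one subcommand per script. The subcommand names (`train`, `predict`), their options, and the handler functions below are illustrative assumptions, not the project's actual scripts.

```python
import argparse

# Hypothetical handlers; each would wrap one of the existing standalone scripts.
def run_train(args):
    return f"train model={args.model_name}"

def run_predict(args):
    return f"predict weights={args.weights}"

def build_parser():
    parser = argparse.ArgumentParser(prog="orca")
    subparsers = parser.add_subparsers(dest="command", required=True)

    train = subparsers.add_parser("train", help="Train a model.")
    train.add_argument("--model-name", default="vggish")
    train.set_defaults(handler=run_train)

    predict = subparsers.add_parser("predict", help="Run inference.")
    predict.add_argument("--weights", required=True)
    predict.set_defaults(handler=run_predict)
    return parser

def main(argv=None):
    args = build_parser().parse_args(argv)
    # Each subparser stored its own handler, so dispatch is a single call.
    return args.handler(args)
```

Calling `main(["train", "--model-name", "cnn"])` dispatches to `run_train`; adding another script later means adding one more subparser and handler.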

Set up train/val/test split of our downloaded data

Unfortunately, I don't think we can ask Keras to just do its own train/val split when we use model.fit_generator(). We will have to implement our own validation generator (as the initial codebase indicates), but that involves us doing an initial split of our dataset.

I'd suggest creating a directory structure that has /data/train, /data/val, and /data/test directories at the top level, with subdirectories for the various species classes.

Probably a 70/20/10 stratified split? We can use sklearn.model_selection.train_test_split().

But before we do this, we need to have decided which of the species will get the "Other" label.
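To make the 70/20/10 stratified split concrete, here is a dependency-free sketch that shuffles each class separately and slices it by the three fractions. It is a stand-in for illustration, not a use of `sklearn.model_selection.train_test_split()`; the function name, the `(path, label)` input format, and the rounding rule are assumptions.

```python
import random
from collections import defaultdict

def stratified_split(samples, fractions=(0.7, 0.2, 0.1), seed=0):
    """Split (path, label) pairs into train/val/test, class by class."""
    by_label = defaultdict(list)
    for path, label in samples:
        by_label[label].append(path)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    splits = {"train": [], "val": [], "test": []}
    for label in sorted(by_label):
        paths = sorted(by_label[label])
        rng.shuffle(paths)
        n_train = round(len(paths) * fractions[0])
        n_val = round(len(paths) * fractions[1])
        splits["train"] += [(p, label) for p in paths[:n_train]]
        splits["val"] += [(p, label) for p in paths[n_train:n_train + n_val]]
        # Whatever remains goes to test, so no sample is dropped.
        splits["test"] += [(p, label) for p in paths[n_train + n_val:]]
    return splits
```

Each resulting list could then be copied into the matching `/data/train`, `/data/val`, or `/data/test` species subdirectory. Very small classes may end up with an empty val or test share, which is one more reason to settle the "Other" grouping first.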

Docker container for the TX2

We may need a separate Dockerfile for the TX2 (or we may get lucky and be able to share our orca_dev image).

Use https://github.com/asottile/future-fstrings to tidy up our string formatting

Low priority, but nice to have:

https://github.com/asottile/future-fstrings

Decide which species to explicitly classify

We need to decide which species to explicitly classify -- the N species with the most samples, where I'm thinking N is ~3-4. Then we should classify everything else as "Other" (a.k.a. random sea noises of animals we don't care about).
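The relabeling itself is small. The sketch below keeps the N most frequent labels and maps the rest to "Other"; the function name and the example species are made up for illustration.

```python
from collections import Counter

def relabel_to_top_n(labels, n=4, other_label="Other"):
    """Keep the n most frequent labels; map every other label to other_label."""
    counts = Counter(labels)
    keep = {label for label, _ in counts.most_common(n)}
    return [label if label in keep else other_label for label in labels]
```

For example, with five "orca" samples, three "humpback" samples, and one "seal" sample, calling `relabel_to_top_n(labels, n=2)` leaves the first two species untouched and turns the seal sample into "Other".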

Add --verbose CLI flag to trigger debugging output

Our code can conveniently output debugging output with something simple like:

    if verbose:
        print(...)
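A sketch of how the flag could be wired up with argparse. The `debug` helper is a hypothetical convenience wrapper, not an existing function in the project.

```python
import argparse

def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    # store_true makes the flag a boolean that defaults to False.
    parser.add_argument("--verbose", action="store_true",
                        help="Print debugging output.")
    return parser.parse_args(argv)

def debug(verbose, message):
    """Print message only when verbose output was requested."""
    if verbose:
        print(message)
```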

track down and give credit to original author of the web scraper

Update to support shorter (2 sec?) audio clips

Right now, everything's working with 5 sec clips, but I ran into matrix dimensionality mismatch errors in the model when trying to drop to 2 sec. But this is work that's probably worth doing, as it would give us more training examples.
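One plausible source of the mismatch is that the number of spectrogram frames depends on clip length, so a layer sized for 5 sec input no longer matches. The arithmetic below assumes a 16000 Hz sample rate (the SAMPLE_RATE value quoted elsewhere in these issues) and the 25 ms window / 10 ms hop used by the original VGGish code; whether this project uses the same framing is an assumption.

```python
SAMPLE_RATE = 16000      # Hz, as quoted for mel_params.py
WINDOW_SECONDS = 0.025   # assumption: VGGish default STFT window
HOP_SECONDS = 0.010      # assumption: VGGish default STFT hop

def num_frames(clip_seconds):
    """Number of full STFT frames that fit in a clip of the given length."""
    n_samples = int(round(clip_seconds * SAMPLE_RATE))
    window = int(round(WINDOW_SECONDS * SAMPLE_RATE))
    hop = int(round(HOP_SECONDS * SAMPLE_RATE))
    if n_samples < window:
        return 0
    return 1 + (n_samples - window) // hop

print(num_frames(5))  # 498
print(num_frames(2))  # 198
```

Under these assumptions, the time axis shrinks from 498 to 198 frames, so any flatten or dense layer whose size was derived from the 5 sec input would need to be recomputed.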

Dockerize the project

Manually annotate audio samples to find the important segments

Rather than assuming that the entire audio file contains a "positive" example of a class, I think we may need to manually annotate the appropriate portions.

From HW8: https://github.com/CrowdCurio/audio-annotator

Record loss and accuracy after each training epoch; generate plots

By plugging in to the Keras callback framework, we can record train/val loss and accuracy after each training epoch, and then generate plots after each training run.

NOTE: we can take this code directly from Ram's and my 266 project and plug it in here. It doesn't need new development.
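For reference, the recording half can be sketched without any Keras import. The class below is hypothetical and is not the code from the 266 project; its `on_epoch_end(epoch, logs)` method mirrors the hook of the same name on `keras.callbacks.Callback`, so in a real run it would subclass that class. The metric names in `logs` depend on how the model was compiled.

```python
import csv

class EpochHistory:
    """Collects per-epoch metrics; mirrors the Keras on_epoch_end hook."""

    def __init__(self):
        self.rows = []

    def on_epoch_end(self, epoch, logs=None):
        # Keras passes a dict of metric name -> value for the finished epoch.
        row = {"epoch": epoch}
        row.update(logs or {})
        self.rows.append(row)

    def save_csv(self, path):
        # Union of all keys, so epochs with extra metrics still fit the header.
        fieldnames = sorted({key for row in self.rows for key in row})
        with open(path, "w", newline="") as handle:
            writer = csv.DictWriter(handle, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(self.rows)
```

The saved CSV (or `rows` directly) can then feed whatever plotting code is used after each training run.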

Investigate if/how to reduce the used classes

We need to investigate the best way to reduce the total number of classes in the data set.

Create the vggish Keras model

Fill in the stubbed out code for creating the vggish model, based on:

https://github.com/DTaoo/VGGish/blob/master/vggish.py

add Batch Normalization before feeding audio into model

Additional EDA plots

Apply the resampling logic from our code to see what the audio waveforms look like after resampling. (The current value of the SAMPLE_RATE constant in mel_params.py is 16000.)
Generate a mel spectrogram of the resampled audio and plot that. That way we will be able to visualize the image in the same format that the model will train on. I think that will help us pick some species to classify which show some visual distinction.

Set up MLFlow logging server for recording results from experimental runs

I can set this up (I've done it several times). It makes it easy to keep track of training runs -- we can push parameters, metrics, and artifacts (e.g. trained weights, loss plots, etc...) to the logging server. It makes it much easier to keep track of experimental runs and retrieve data or assets associated with them if/when we need it later.

notebook for exploring dataset

crawl file system
count files per label
total, avg sample length for each label
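The three steps above can be sketched with the standard library alone. This assumes the samples are PCM WAV files stored in one directory per label; the function name and that layout are assumptions, and other audio formats would need a different reader than the `wave` module.

```python
import wave
from collections import defaultdict
from pathlib import Path

def summarize_dataset(root):
    """Return {label: (file_count, total_seconds, avg_seconds)} for WAV files."""
    durations = defaultdict(list)
    for wav_path in sorted(Path(root).rglob("*.wav")):
        label = wav_path.parent.name  # assumption: one directory per label
        with wave.open(str(wav_path), "rb") as wav_file:
            seconds = wav_file.getnframes() / wav_file.getframerate()
        durations[label].append(seconds)
    return {
        label: (len(secs), sum(secs), sum(secs) / len(secs))
        for label, secs in durations.items()
    }
```

In a notebook, the returned dictionary can be printed directly or loaded into a table to compare how many samples, and how much audio, each label has.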
