ibm / max-audio-classifier Goto Github PK

View Code? Open in Web Editor NEW

152.0 29.0 56.0 39.16 MB

Identify sounds in short audio clips

Home Page: https://developer.ibm.com/exchanges/models/all/max-audio-classifier/

License: Apache License 2.0

Python 97.57% Dockerfile 2.43%

docker-image machine-learning machine-learning-models audio-classification keras-tensorflow

max-audio-classifier's Introduction

IBM Developer Model Asset Exchange: Audio Classifier

This repository contains code to instantiate and deploy an audio classification model. This model recognizes a signed 16-bit PCM wav file as an input, generates embeddings, applies PCA transformation/quantization, uses the embeddings as an input to a multi-attention classifier and outputs top 5 class predictions and probabilities as output. The model currently supports 527 classes which are part of the Audioset Ontology. The classes and the label_ids can be found in class_labels_indices.csv. The model was trained on AudioSet as described in the paper 'Multi-level Attention Model for Weakly Supervised Audio Classification' by Yu et al.

The model has been tested across multiple audio classes, however it tends to perform best for Music / Speech categories. This is largely due to the bias towards these classes in the training dataset (90% of audio belong to either of these categories). Though the model is trained on data from Audioset which was extracted from YouTube videos, the model can be applied to a wide range of audio files outside the domain of music/speech. The test assets provided along with this model provide a broad range.

The model files are hosted on IBM Cloud Object Storage. The code in this repository deploys the model as a web service in a Docker container. This repository was developed as part of the IBM Developer Model Asset Exchange and the public API is powered by IBM Cloud.

Model Metadata

Domain	Application	Industry	Framework	Training Data	Input Data Format
Audio	Classification	Multi	Keras/TensorFlow	Google AudioSet	signed 16-bit PCM WAV audio file

References

Jort F. Gemmeke, Daniel P. W. Ellis, Dylan Freedman, Aren Jansen, Wade Lawrence, R. Channing Moore, Manoj Plakal, Marvin Ritter,"Audio set: An ontology and human-labeled dataset for audio events", IEEE ICASSP, 2017.
Qiuqiang Kong, Yong Xu, Wenwu Wang, Mark D. Plumbley,"Audio Set classification with attention model: A probabilistic perspective." arXiv preprint arXiv:1711.00927 (2017).
Changsong Yu, Karim Said Barsim, Qiuqiang Kong, Bin Yang ,"Multi-level Attention Model for Weakly Supervised Audio Classification." arXiv preprint arXiv:1803.02353 (2018).
S. Hershey, S. Chaudhuri, D. P. W. Ellis, J. F. Gemmeke, A. Jansen, R. C. Moore, M. Plakal, D. Platt, R. A. Saurous, B. Seybold et al., "CNN architectures for large-scale audio classification," arXiv preprint arXiv:1609.09430, 2016.

Licenses

Component	License	Link
This repository	Apache 2.0	LICENSE
Model Files	Apache 2.0	AudioSet
Model Code	MIT	AudioSet Classification
Test Samples	Various	Samples README

Pre-requisites:

docker: The Docker command-line interface. Follow the installation instructions for your system.
The minimum recommended resources for this model is 8 GB Memory and 4 CPUs.
If you are on x86-64/AMD64, your CPU must support AVX at the minimum.

Deployment options

Deploy from Quay
Deploy on Red Hat OpenShift
Deploy on Kubernetes
Run Locally

Deploy from Quay

To run the docker image, which automatically starts the model serving API, run:

$ docker run -it -p 5000:5000 quay.io/codait/max-audio-classifier

This will pull a pre-built image from the Quay.io container registry (or use an existing image if already cached locally) and run it. If you'd rather checkout and build the model locally you can follow the run locally steps below.

Deploy on Red Hat OpenShift

You can deploy the model-serving microservice on Red Hat OpenShift by following the instructions for the OpenShift web console or the OpenShift Container Platform CLI in this tutorial, specifying quay.io/codait/max-audio-classifier as the image name.

Deploy on Kubernetes

You can also deploy the model on Kubernetes using the latest docker image on Quay.

On your Kubernetes cluster, run the following commands:

$ kubectl apply -f https://raw.githubusercontent.com/IBM/MAX-Audio-Classifier/master/max-audio-classifier.yaml

The model will be available internally at port 5000, but can also be accessed externally through the NodePort.

A more elaborate tutorial on how to deploy this MAX model to production on IBM Cloud can be found here

1. Build the Model

Clone this repository locally. In a terminal, run the following command:

$ git clone https://github.com/IBM/MAX-Audio-Classifier.git

Change directory into the repository base folder:

$ cd MAX-Audio-Classifier

To build the Docker image locally, run:

$ docker build -t max-audio-classifier .

All required model assets will be downloaded during the build process. Note that currently this Docker image is CPU only (we will add support for GPU images later).

2. Deploy the Model

To run the Docker image, which automatically starts the model serving API, run:

$ docker run -it -p 5000:5000 max-audio-classifier

3. Use the Model

The API server automatically generates an interactive Swagger documentation page. Go to http://localhost:5000 to load it. From there you can explore the API and also create test requests.

Note : The input is a 10 second signed 16-bit PCM wav audio file. Files longer than 10 seconds will be clipped so that only the first 10 seconds will be used by the model. Conversely, files shorter than 10 seconds will be repeated to create a clip 10 seconds in length.

Use the model/predict endpoint to load a signed 16-bit PCM wav audio file (you can use the fireworks.wav file located in the samples folder) and get predictions from the API.

You can also test it on the command line, for example (with the thunder.wav file):

$ curl -F "audio=@samples/thunder.wav;type=audio/wav" -XPOST http://localhost:5000/model/predict

You should see a JSON response like that below:

{
    "status": "ok",
    "predictions": [
        {
            "label_id": "/m/06mb1",
            "label": "Rain",
            "probability": 0.7376469373703003
        },
        {
            "label_id": "/m/0ngt1",
            "label": "Thunder",
            "probability": 0.60517817735672
        },
        {
            "label_id": "/t/dd00038",
            "label": "Rain on surface",
            "probability": 0.5905200839042664
        },
        {
            "label_id": "/m/0jb2l",
            "label": "Thunderstorm",
            "probability": 0.5793699026107788
        },
        {
            "label_id": "/m/07yv9",
            "label": "Vehicle",
            "probability": 0.34878015518188477
        }
    ]
}

4. Development

To run the Flask API app in debug mode, edit config.py to set DEBUG = True under the application settings. You will then need to rebuild the Docker image (see step 1).

5. Cleanup

To stop the Docker container, type CTRL + C in your terminal.

Resources and Contributions

If you are interested in contributing to the Model Asset Exchange project or have any queries, please follow the instructions here.

max-audio-classifier's People

Contributors

Stargazers

Watchers

max-audio-classifier's Issues

pip install from requirement.txt

Example PR:
IBM/MAX-Review-Text-Generator#7

Extend the scoring endpoint to accept an optional filter parameter

As a developer, I might want to use the model to detect the presence of a particular sound (or group of sounds). If I wanted to do that today, I'd have to invoke the scoring endpoint and filter the returned results on the client side. To simplify the process, we should extend the endpoint to optionally perform filtering on the server side if a certain parameter (containing an array of labels) is specified as input. The output format is not impacted by this ER.

Error building container on raspberry pi

Hello,

I'm having a similar issue like on #60 while building the container on raspberry os (Bullseye).

Step 11/13 : RUN sha512sum -c sha512sums.txt ---> Running in 786bbb077460 assets/classifier_model.h5: FAILED open or read assets/vggish_model.ckpt: FAILED open or read assets/vggish_pca_params.npz: FAILED open or read sha512sum: assets/classifier_model.h5: No such file or directory sha512sum: assets/vggish_model.ckpt: No such file or directory sha512sum: assets/vggish_pca_params.npz: No such file or directory sha512sum: WARNING: 3 listed files could not be read The command '/bin/sh -c sha512sum -c sha512sums.txt' returned a non-zero code: 1

I checked for CRLF but since I'm on linux i don't think, thats the issue. Also i commented out RUN sha512sum -c sha512sums.txt. This will create the container but the /assets only has "SRGAN" in it.

Can you guys help?

Add support for other popular audio format(s)

For example, it would be great if we could support mp3. There are some Python utilities that convert between audio formats, such as https://github.com/jiaaro/pydub/.

Add request param descriptions and update Swagger models help text

Example in PR:
IBM/MAX-Object-Detector#20

Prediction fails on Red Hat OpenShift

The following error is raised, when a prediction is performed:


[2019-08-05 21:56:03,078] ERROR in app: Exception on /model/predict [POST]
--
  | Traceback (most recent call last):
  | File "/opt/conda/lib/python3.6/site-packages/flask/app.py", line 1832, in full_dispatch_request
  | rv = self.dispatch_request()
  | File "/opt/conda/lib/python3.6/site-packages/flask/app.py", line 1818, in dispatch_request
  | return self.view_functions[rule.endpoint](**req.view_args)
  | File "/opt/conda/lib/python3.6/site-packages/flask_restplus/api.py", line 319, in wrapper
  | resp = resource(*args, **kwargs)
  | File "/opt/conda/lib/python3.6/site-packages/flask/views.py", line 88, in view
  | return self.dispatch_request(*args, **kwargs)
  | File "/opt/conda/lib/python3.6/site-packages/flask_restplus/resource.py", line 44, in dispatch_request
  | resp = meth(*args, **kwargs)
  | File "/opt/conda/lib/python3.6/site-packages/flask_restplus/marshalling.py", line 136, in wrapper
  | resp = f(*args, **kwargs)
  | File "/workspace/api/predict.py", line 63, in post
  | file = open("audio.wav", "wb")
  | PermissionError: [Errno 13] Permission denied: 'audio.wav'

The issue is that the code assumes that it has write access in the current working directory https://github.com/IBM/MAX-Audio-Classifier/blob/master/api/predict.py#L59-L89. The temporary audio file should be created in a temporary location that any user should have write access to.

Swagger specification is outdated

Swagger spec reports version 0.1 + "An API for serving models"

Launch error

Hello, I'm having problems when lauching the program.
This is the errors i'm getting :

2023-10-04 16:51:06 Traceback (most recent call last):
2023-10-04 16:51:06 File "app.py", line 17, in
2023-10-04 16:51:06 from api import ModelMetadataAPI, ModelPredictAPI
2023-10-04 16:51:06 File "/home/max/api/init.py", line 16, in
2023-10-04 16:51:06 from .metadata import ModelMetadataAPI # noqa
2023-10-04 16:51:06 File "/home/max/api/metadata.py", line 16, in
2023-10-04 16:51:06 from core.model import ModelWrapper
2023-10-04 16:51:06 File "/home/max/core/model.py", line 19, in
2023-10-04 16:51:06 import tensorflow as tf
2023-10-04 16:51:06 File "/opt/conda/lib/python3.7/site-packages/tensorflow/init.py", line 102, in
2023-10-04 16:51:06 from tensorflow_core import *
2023-10-04 16:51:06 File "/opt/conda/lib/python3.7/site-packages/tensorflow_core/init.py", line 28, in
2023-10-04 16:51:06 from tensorflow.python import pywrap_tensorflow # pylint: disable=unused-import
2023-10-04 16:51:06 File "", line 1019, in _handle_fromlist
2023-10-04 16:51:06 File "/opt/conda/lib/python3.7/site-packages/tensorflow/init.py", line 50, in getattr
2023-10-04 16:51:06 module = self._load()
2023-10-04 16:51:06 File "/opt/conda/lib/python3.7/site-packages/tensorflow/init.py", line 44, in _load
2023-10-04 16:51:06 module = _importlib.import_module(self.name)
2023-10-04 16:51:06 File "/opt/conda/lib/python3.7/importlib/init.py", line 127, in import_module
2023-10-04 16:51:06 return _bootstrap._gcd_import(name[level:], package, level)
2023-10-04 16:51:06 File "/opt/conda/lib/python3.7/site-packages/tensorflow_core/python/init.py", line 52, in
2023-10-04 16:51:06 from tensorflow.core.framework.graph_pb2 import *
2023-10-04 16:51:06 File "/opt/conda/lib/python3.7/site-packages/tensorflow_core/core/framework/graph_pb2.py", line 16, in
2023-10-04 16:51:06 from tensorflow.core.framework import node_def_pb2 as tensorflow_dot_core_dot_framework_dot_node__def__pb2
2023-10-04 16:51:06 File "/opt/conda/lib/python3.7/site-packages/tensorflow_core/core/framework/node_def_pb2.py", line 16, in
2023-10-04 16:51:06 from tensorflow.core.framework import attr_value_pb2 as tensorflow_dot_core_dot_framework_dot_attr__value__pb2
2023-10-04 16:51:06 File "/opt/conda/lib/python3.7/site-packages/tensorflow_core/core/framework/attr_value_pb2.py", line 16, in
2023-10-04 16:51:06 from tensorflow.core.framework import tensor_pb2 as tensorflow_dot_core_dot_framework_dot_tensor__pb2
2023-10-04 16:51:06 File "/opt/conda/lib/python3.7/site-packages/tensorflow_core/core/framework/tensor_pb2.py", line 16, in
2023-10-04 16:51:06 from tensorflow.core.framework import resource_handle_pb2 as tensorflow_dot_core_dot_framework_dot_resource__handle__pb2
2023-10-04 16:51:06 File "/opt/conda/lib/python3.7/site-packages/tensorflow_core/core/framework/resource_handle_pb2.py", line 16, in
2023-10-04 16:51:06 from tensorflow.core.framework import tensor_shape_pb2 as tensorflow_dot_core_dot_framework_dot_tensor__shape__pb2
2023-10-04 16:51:06 File "/opt/conda/lib/python3.7/site-packages/tensorflow_core/core/framework/tensor_shape_pb2.py", line 42, in
2023-10-04 16:51:06 serialized_options=None, file=DESCRIPTOR),
2023-10-04 16:51:06 File "/opt/conda/lib/python3.7/site-packages/google/protobuf/descriptor.py", line 561, in new
2023-10-04 16:51:06 _message.Message._CheckCalledFromGeneratedFile()
2023-10-04 16:51:06 TypeError: Descriptors cannot not be created directly.
2023-10-04 16:51:06 If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
2023-10-04 16:51:06 If you cannot immediately regenerate your protos, some other possible workarounds are:
2023-10-04 16:51:06 1. Downgrade the protobuf package to 3.20.x or lower.
2023-10-04 16:51:06 2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).
2023-10-04 16:51:06
2023-10-04 16:51:06 More information: https://developers.google.com/protocol-buffers/docs/news/2022-05-06#python-updates

I tried installing an older version of protobuf (3.20) and forcing it in the requirements.txt, i put the lastest version of tensorflow
Any idea what i'm doing wrong ?
Thanks

Increase travis sleep time

Increase the sleep time in .travis.yml. It's fine on Travis CI but the internal CI tools need more time.

Document ARM in README

Document ARM support in the README (eg IBM/MAX-Object-Detector@5d25fa5)

Lack of AVX in Docker image leading to Tensorflow crash?

With the default version of numpy (1.13.1) and tensorflow (1.8.0) given in the Dockerfile, I get a message saying Illegal instruction (core dumped). Downgrading Tensorflow to 1.5.0 fixes the import issue, so I believe the problem is a lack of AVX support in the Docker image (see tensorflow/tensorflow#17411). However, the code doesn't seem to be backward compatible, as with tensorflow 1.5.0 (and numpy 1.15.1 since 1.13.1 wasn't compatible) I then get:

tensorflow.python.framework.errors_impl.InvalidArgumentError: You must feed a value for placeholder tensor 'batch_normalization_1/keras_learning_phase' with dtype bool

Here is my original stack trace via GDB (from the unmodified Dockerfile):

#0  0x00007fffce85dfd0 in std::pair<std::__detail::_Node_iterator<std::pair<tensorflow::Stri\
ngPiece const, std::function<bool (tensorflow::Variant*)> >, false, true>, bool> std::_Hasht\
able<tensorflow::StringPiece, std::pair<tensorflow::StringPiece const, std::function<bool (t\
ensorflow::Variant*)> >, std::allocator<std::pair<tensorflow::StringPiece const, std::functi\
on<bool (tensorflow::Variant*)> > >, std::__detail::_Select1st, std::equal_to<tensorflow::St\
ringPiece>, tensorflow::hash<tensorflow::StringPiece>, std::__detail::_Mod_range_hashing, st\
d::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hash\
table_traits<true, false, true> >::_M_emplace<std::pair<tensorflow::StringPiece, std::functi\
on<bool (tensorflow::Variant*)> > >(std::integral_constant<bool, true>, std::pair<tensorflow\
::StringPiece, std::function<bool (tensorflow::Variant*)> >&&) ()
   from /opt/conda/lib/python3.6/site-packages/tensorflow/python/../libtensorflow_framework.\
so
#1  0x00007fffce8618e5 in tensorflow::UnaryVariantOpRegistry::RegisterDecodeFn(std::string c\
onst&, std::function<bool (tensorflow::Variant*)> const&) ()
   from /opt/conda/lib/python3.6/site-packages/tensorflow/python/../libtensorflow_framework.\
so
#2  0x00007fffce83d95c in tensorflow::variant_op_registry_fn_registration::UnaryVariantDecod\
eRegistration<tensorflow::Tensor>::UnaryVariantDecodeRegistration(std::string const&) ()
   from /opt/conda/lib/python3.6/site-packages/tensorflow/python/../libtensorflow_framework.\
so
#3  0x00007fffce7a91f5 in _GLOBAL__sub_I_tensor.cc ()
   from /opt/conda/lib/python3.6/site-packages/tensorflow/python/../libtensorflow_framework.\
so
#4  0x00007ffff7de885a in call_init (l=<optimized out>, argc=argc@entry=2,
    argv=argv@entry=0x7fffffffec18, env=env@entry=0x555555e59d40) at dl-init.c:72
#5  0x00007ffff7de896b in call_init (env=0x555555e59d40, argv=0x7fffffffec18, argc=2,
    l=<optimized out>) at dl-init.c:30
#6  _dl_init (main_map=main_map@entry=0x5555566dfde0, argc=2, argv=0x7fffffffec18,
    env=0x555555e59d40) at dl-init.c:120
#7  0x00007ffff7decf18 in dl_open_worker (a=a@entry=0x7fffffff6380) at dl-open.c:575
#8  0x00007ffff7de8704 in _dl_catch_error (objname=objname@entry=0x7fffffff6370,
    errstring=errstring@entry=0x7fffffff6378, mallocedp=mallocedp@entry=0x7fffffff636f,
    operate=operate@entry=0x7ffff7decb30 <dl_open_worker>, args=args@entry=0x7fffffff6380)
    at dl-error.c:187

How did everyone else fix this (or this somehow specific to my set-up)?

Note: I've only worked with PyTorch, not Tensorflow/Keras

Long running instance has old title

Current title for long-running instance: Model Asset Exchange Microservice
To be changed: MAX Audio Classifier

Application not running in heroku

Application error

An error occurred in the application and your page could not be served. If you are the application owner, check your logs for details. You can do this from the Heroku CLI with the command
heroku logs --tail

2018-10-31T12:13:08.838546+00:00 app[web.1]: * Serving Flask app "app" (lazy loading)
2018-10-31T12:13:08.838730+00:00 app[web.1]: * Environment: production
2018-10-31T12:13:08.841062+00:00 app[web.1]: WARNING: Do not use the development server in a production environment.
2018-10-31T12:13:08.841387+00:00 app[web.1]: Use a production WSGI server instead.
2018-10-31T12:13:08.841452+00:00 app[web.1]: * Debug mode: off
2018-10-31T12:13:11.523163+00:00 app[web.1]: Using TensorFlow backend.
2018-10-31T12:13:11.523187+00:00 app[web.1]: * Running on http://0.0.0.0:5000/ (Press CTRL+C to quit)
2018-10-31T12:13:22.226579+00:00 heroku[web.1]: Error R10 (Boot timeout) -> Web process failed to bind to $PORT within 60 seconds of launch
2018-10-31T12:13:22.226579+00:00 heroku[web.1]: Stopping process with SIGKILL
2018-10-31T12:13:22.511802+00:00 heroku[web.1]: Process exited with status 137
2018-10-31T12:13:22.530997+00:00 heroku[web.1]: State changed from starting to crashed
2018-10-31T12:13:22.533485+00:00 heroku[web.1]: State changed from crashed to starting
2018-10-31T12:14:05.139706+00:00 heroku[router]: at=error code=H20 desc="App boot timeout" method=GET path="/" host=shocking-citadel-93453.herokuapp.com request_id=7fead736-998e-4eac-ba2e-1e7802a25ec3 fwd="103.99.148.171" dyno= connect= service= status=503 bytes= protocol=https
2018-10-31T12:14:07.746292+00:00 heroku[web.1]: Starting process with command /bin/sh -c python\ app.py
2018-10-31T12:14:10.722000+00:00 app[web.1]: [WARN tini (4)] Tini is not running as PID 1 and isn't registered as a child subreaper.
2018-10-31T12:14:10.722030+00:00 app[web.1]: Zombie processes will not be re-parented to Tini, so zombie reaping won't work.
2018-10-31T12:14:10.722036+00:00 app[web.1]: To fix the problem, use the -s option or set the environment variable TINI_SUBREAPER to register Tini as a child subreaper, or run Tini as PID 1.
2018-10-31T12:14:48.668818+00:00 app[web.1]: * Serving Flask app "app" (lazy loading)
2018-10-31T12:14:48.668878+00:00 app[web.1]: * Environment: production
2018-10-31T12:14:48.674407+00:00 app[web.1]: WARNING: Do not use the development server in a production environment.
2018-10-31T12:14:48.674877+00:00 app[web.1]: Use a production WSGI server instead.
2018-10-31T12:14:48.674964+00:00 app[web.1]: * Debug mode: off
2018-10-31T12:14:50.868248+00:00 app[web.1]: Using TensorFlow backend.
2018-10-31T12:14:50.868263+00:00 app[web.1]: * Running on http://0.0.0.0:5000/ (Press CTRL+C to quit)
2018-10-31T12:15:05.937244+00:00 heroku[router]: at=error code=H20 desc="App boot timeout" method=GET path="/" host=shocking-citadel-93453.herokuapp.com request_id=9e325bbd-846b-451e-a336-a7c39633ccd3 fwd="103.99.148.171" dyno= connect= service= status=503 bytes= protocol=https
2018-10-31T12:15:08.174990+00:00 heroku[web.1]: Error R10 (Boot timeout) -> Web process failed to bind to $PORT within 60 seconds of launch
2018-10-31T12:15:08.174990+00:00 heroku[web.1]: Stopping process with SIGKILL
2018-10-31T12:15:08.518134+00:00 heroku[web.1]: State changed from starting to crashed
2018-10-31T12:15:08.500982+00:00 heroku[web.1]: Process exited with status 137
2018-10-31T12:15:08.731539+00:00 heroku[router]: at=error code=H10 desc="App crashed" method=GET path="/" host=shocking-citadel-93453.herokuapp.com request_id=c0b6e63b-8e18-459c-848a-c192d83422c2 fwd="103.99.148.171" dyno= connect= service= status=503 bytes= protocol=https
2018-10-31T12:16:54.389356+00:00 heroku[router]: at=error code=H10 desc="App crashed" method=GET path="/" host=shocking-citadel-93453.herokuapp.com request_id=849239a0-19ca-49f6-9ca4-41355911d118 fwd="103.99.148.171" dyno= connect= service= status=503 bytes= protocol=https
2018-10-31T12:16:54.872502+00:00 heroku[router]: at=error code=H10 desc="App crashed" method=GET path="/favicon.ico" host=shocking-citadel-93453.herokuapp.com request_id=76e7d4cc-4464-49d4-9f67-6197c11e6341 fwd="103.99.148.171" dyno= connect= service= status=503 bytes= protocol=https
2018-10-31T12:18:48.486029+00:00 heroku[router]: at=error code=H10 desc="App crashed" method=GET path="/" host=shocking-citadel-93453.herokuapp.com request_id=d670a063-accc-4fa4-a2f3-e4bb9e8bdbe2 fwd="103.99.148.171" dyno= connect= service= status=503 bytes= protocol=https

Error in building Docker Image in Windows

Hello, I have tried MAX audio Classifier in Ubuntu and it works effortlessly.

I want to know can we use it in Windows also when I try to build the image I got the following error
sha512sum: 'assets/classifier_model.h5'$'\r': No such file or directory sha512sum: 'assets/vggish_model.ckpt'$'\r': No such file or directory

(Here is the Image of error)

Do not use absolute paths

Package/compress model artifacts on COS

The [checkpoint] assets are currently stored in individual files on COS, e.g.
http://max-assets.s3-api.us-geo.objectstorage.softlayer.net/audioset/vggish_model.ckpt
http://max-assets.s3-api.us-geo.objectstorage.softlayer.net/audioset/vggish_pca_params.npz. The individual files should be packaged and compressed in a tar.gz archive, following the approach used by other models. Refer to https://github.com/IBM/MAX-Object-Detector/blob/master/Dockerfile for an example.

Benefits:

Only a single file needs to be downloaded (less work)
Artifacts are compressed, potentially reducing the time it takes to download them

Error running the docker on Windows