In the release notes you mention flexible data formats including audio. I see lots of

Support for audio use cases about turicreate HOT 24 CLOSED

apple commented on September 25, 2024 4

Support for audio use cases

from turicreate.

Comments (24)

TobyRoseman commented on September 25, 2024 3

It's slated for the 5.4 release, which we plan to go out in March.

MatthewWaller commented on September 25, 2024 2

Would love to know about this as well, especially as it relates to speech recognition.

MatthewWaller commented on September 25, 2024 1

I would disagree with @coolioxlr

Apple’s native SDK does not allow fully on-device speech recognition, and Apple even asks that you not dictate health and other sensitive data through it.

And there is the opportunity to optimize for specific words in specialized fields. Any work you can do in general speech recognition on device with CoreML would be helpful.

TobyRoseman commented on September 25, 2024 1

This conversation is certainly not dead. In fact I just put up two pull requests for a sound classifier.

@jamois - could you tell me more about your use case?

davidcittadini commented on September 25, 2024 1

@TobyRoseman Have you had any more thoughts about using ML to apply "effects" to audio? ML could be very useful for non-linear audio processing, which existing coding approaches are not very good at. For example, ML could learn a distortion profile from an audio stream and then apply that same distortion profile to any clean audio stream. The trick is then to apply the model to live, real-time audio streams.

jamois commented on September 25, 2024 1

Excellent! Thanks for the update and have a nice weekend.

coolioxlr commented on September 25, 2024 1

Thanks @TobyRoseman. This is exactly what I have been waiting for. Looking forward to WWDC too.

coolioxlr commented on September 25, 2024

TobyRoseman commented on September 25, 2024

@jrjames83 @MatthewWaller @coolioxlr - Could you please share more details about what types of audio use cases you would like us to support?

MatthewWaller commented on September 25, 2024

Thanks for looking into this @TobyRoseman.

Speech recognition, as mentioned before, would be great: a toolkit that takes something like frames of MFCC features and outputs probabilities of letters and punctuation at each frame. Something like the DeepSpeech architecture that Mozilla is working on, or the Listen, Attend and Spell architectures that Google has recently published on.

Outside of that, it would be great to have a deep learning speaker diarization toolkit that can identify different speakers in an audio file.
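
To make the "probabilities of letters and punctuation at each frame" idea concrete, here is a minimal sketch of greedy CTC-style decoding, the kind of post-processing a DeepSpeech-like model needs (DeepSpeech is trained with a CTC loss; Listen, Attend and Spell uses an attention decoder instead). The alphabet, the blank symbol `_`, and the probability values are made up for illustration and do not come from any real model.

```python
from itertools import groupby

BLANK = "_"  # hypothetical symbol standing in for the CTC blank


def greedy_ctc_decode(frame_probs, alphabet, blank=BLANK):
    """Pick the most likely symbol per frame, merge repeats, drop blanks."""
    best = [
        alphabet[max(range(len(probs)), key=probs.__getitem__)]
        for probs in frame_probs
    ]
    # Consecutive identical symbols collapse into one.
    collapsed = [symbol for symbol, _ in groupby(best)]
    return "".join(symbol for symbol in collapsed if symbol != blank)


# Illustrative per-frame distributions over a toy alphabet.
alphabet = ["_", "h", "i", "!"]
frame_probs = [
    [0.10, 0.80, 0.05, 0.05],  # "h"
    [0.10, 0.70, 0.10, 0.10],  # "h" again, merged with the previous frame
    [0.90, 0.03, 0.03, 0.04],  # blank
    [0.10, 0.10, 0.70, 0.10],  # "i"
    [0.20, 0.10, 0.10, 0.60],  # "!"
]
text = greedy_ctc_decode(frame_probs, alphabet)
print(text)
```

A real system would usually replace the greedy step with beam search and a language model; the sketch only shows how frame-level outputs map to a string.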

coolioxlr commented on September 25, 2024

@tbartelmess It would be great to provide a simple example like the following, just detecting a few commands: https://www.tensorflow.org/tutorials/sequences/audio_recognition
or
https://github.com/aqibsaeed/Urban-Sound-Classification
I know we can more or less achieve this using the activity classification sample in Turi Create, but it is not optimized for audio classification. An iOS sample showing how to use the model would be helpful as well, since we might have to convert the audio to a spectrogram.

I don't think building another deep learning speech recognition model is helpful here, since iOS already provides speech recognition in the native SDK.
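
For readers unfamiliar with the audio-to-spectrogram step mentioned above, the following is a dependency-free sketch of a magnitude spectrogram: slice the signal into overlapping frames, apply a Hann window, and take the magnitude of a discrete Fourier transform per frame. The frame size, hop size, and test tone are arbitrary assumptions chosen for illustration; a real pipeline would use an FFT library rather than this naive DFT.

```python
import cmath
import math


def magnitude_spectrogram(samples, frame_size=64, hop=32):
    """Return one list of magnitudes (bins 0..frame_size // 2) per frame."""
    # Hann window to reduce spectral leakage at the frame edges.
    window = [
        0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_size - 1))
        for n in range(frame_size)
    ]
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_size)]
        bins = []
        for k in range(frame_size // 2 + 1):
            # Naive DFT for bin k; fine for a demo, far too slow for real audio.
            acc = sum(
                frame[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                for n in range(frame_size)
            )
            bins.append(abs(acc))
        frames.append(bins)
    return frames


# Illustrative input: a 1 kHz sine tone sampled at 8 kHz.
sample_rate = 8000
tone_hz = 1000
samples = [math.sin(2 * math.pi * tone_hz * t / sample_rate) for t in range(256)]

spec = magnitude_spectrogram(samples)
peak_bin = max(range(len(spec[0])), key=spec[0].__getitem__)
peak_hz = peak_bin * sample_rate / 64  # bin width = sample_rate / frame_size
print(len(spec), len(spec[0]), peak_hz)
```

Each row of `spec` is one time slice and each column one frequency bin, which is the 2-D layout an image-style classifier would consume.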

narner commented on September 25, 2024

Hey there; just wanted to see if there was any update on this - thanks!

TobyRoseman commented on September 25, 2024

@davidcittadini - that is a cool use case. Thanks for sharing. Unfortunately this is not possible with Turi Create.

jamois commented on September 25, 2024

Hoping this conversation is not dead. I too am interested in a Turi example using audio, not necessarily for speech recognition. Thx.

jamois commented on September 25, 2024

> This conversation is certainly not dead. In fact I just put up two pull requests for a sound classifier.
>
> @jamois - could you tell me more about your use case?

Sure. I just want to be able to train a model using audio files (e.g. .wav). So, for instance, if I have 5 sounds I want my system to recognize, I would train using 5 classes, where each class would be represented by numerous (e.g. 100) sound files. I know all of this is possible via TensorFlow, but I would prefer (at the moment) to use Turi Create if possible. Thanks for the help!

TobyRoseman commented on September 25, 2024

@jamois - Your use case sounds like exactly what we are planning to support with our new Sound Classifier.

jamois commented on September 25, 2024

> @jamois - Your use case sounds like exactly what we are planning to support with our new Sound Classifier.

Thanks for the update. When are you planning to roll this out?

TobyRoseman commented on September 25, 2024

@davidcittadini - I have not thought more about this, but it sounds very interesting. I'd like to learn more. Are there any resources (ex: papers, blog posts, other products) you recommend?

rplom commented on September 25, 2024

> It's slated for the 5.4 release, which we plan to go out in March.

I was about to implement my own custom classifier when I ran into this post. How will it be accessed in the client code? I.e., there's MLImageClassifier; will there be an MLSoundClassifier? Or will clients write their own?

TobyRoseman commented on September 25, 2024

@rplom - to be clear: the Sound Classifier will be included in the next release of Turi Create. Two new functions will be added:

turicreate.load_audio(...)
turicreate.sound_classifier.create(...)

The first version of the sound classifier will support exporting to Core ML.
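
As a rough sketch of how the two functions above might fit together (not an official example), the code below loads a folder of audio files, derives a label per clip, trains a classifier, and exports it to Core ML. The folder layout (`sounds/<class_name>/clip.wav`), the helper names `label_from_path` and `train_sound_classifier`, the 80/20 split, and the output filename are all assumptions made for illustration; check the docstrings for the exact parameters.

```python
from pathlib import PurePath


def label_from_path(path):
    """Hypothetical helper: use the parent folder name as the class label."""
    return PurePath(path).parent.name


def train_sound_classifier(audio_dir, model_path="SoundClassifier.mlmodel"):
    """Train on every audio file under audio_dir and export to Core ML."""
    # Imported lazily so the helper above works without Turi Create installed.
    import turicreate as tc

    # load_audio returns an SFrame; by default it includes a 'path' column
    # next to the 'audio' column.
    data = tc.load_audio(audio_dir)
    data["label"] = data["path"].apply(label_from_path)

    # Assumed 80/20 split for a quick sanity check of the model.
    train, test = data.random_split(0.8)
    model = tc.sound_classifier.create(train, target="label", feature="audio")
    print(model.evaluate(test))

    model.export_coreml(model_path)
    return model


print(label_from_path("sounds/dog_bark/clip_001.wav"))
```

Calling `train_sound_classifier("sounds")` would then run the whole pipeline, assuming one subfolder per class.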

TobyRoseman commented on September 25, 2024

Everything needed to use the Sound Classifier has now been merged into master. If you're willing to build from master, please give it a try.

I'm currently working on updating our User Guide with a Sound Classifier section. Until then you should be able to get started by using the docstrings of the above methods.

rplom commented on September 25, 2024

This is great!

jamois commented on September 25, 2024

Wow, great news! Thanks @TobyRoseman !

TobyRoseman commented on September 25, 2024

Turi Create 5.4 is now launched. With this version you can create a sound classifier, using turicreate.load_audio(...) and turicreate.sound_classifier.create(...).

See the Sound Classifier Section of the User Guide for details.

Since we now support an audio use case, I'm going to close this issue. Feel free to open new issues, either about the sound classifier or for new audio use cases.
