huggingface / audio-transformers-course
The Hugging Face Course on Transformers for Audio
License: Apache License 2.0
In Unit 4: Pretrained models for audio classification
We'll load an official Audio Spectrogram Transformer checkpoint fine-tuned on the Speech Commands dataset, under the namespace "MIT/ast-finetuned-speech-commands-v2":
classifier = pipeline(
    "audio-classification", model="MIT/ast-finetuned-speech-commands-v2"
)
classifier(sample["audio"])
Fix it to be classifier(sample["audio"]["array"])
I don't know how to make a pull request yet! :)
Original:
from IPython.display import Audio
classifier(sample["audio"].copy())
Audio(sample["audio"]["array"], rate=sample["audio"]["sampling_rate"])
Modified working:
from IPython.display import Audio
sample = next(iter(speech_commands))
classifier(sample["audio"].copy())
Audio(sample["audio"]["array"], rate=sample["audio"]["sampling_rate"])
I am going through the Load and explore an audio dataset section of the tutorial.
And I saw that code and code outputs are contained in the same code block, which I believe will confuse people.
I strongly believe that code and code output should go into different code blocks, or the output could even be inserted as plain text in a monospace font, outside of a code block.
Please make this change.
If you need help, I can create PRs doing this. Let me know once you decide the style.
MMS is already released in transformers, so https://huggingface.co/learn/audio-course/chapter6/pre-trained_models#massive-multilingual-speech-mms could be updated to use that instead.
I'm getting the following error while creating an instance of TrainingArguments in Unit-4:
ImportError: Using the Trainer with PyTorch requires accelerate>=0.21.0: Please run pip install transformers[torch] or pip install accelerate -U
I've tried both "pip install transformers[torch]" and "pip install accelerate -U", but the same error still appears.
There is no issue in importing TrainingArguments, the error pops up while creating an instance only.
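A quick way to narrow this down is to check which accelerate version is actually installed in the environment the notebook uses. The sketch below is illustrative only: `parse_version`, `meets_minimum` and `report` are hypothetical helpers, and the 0.21.0 threshold is taken from the error message above. One commonly reported cause (an assumption, not confirmed in this thread) is that the runtime was not restarted after the install, so the already-imported transformers still holds its earlier check. If the reported version satisfies the requirement and the error persists, restarting the runtime is worth trying.

```python
from importlib import metadata


def parse_version(text):
    """Turn '0.21.0' into (0, 21, 0); suffixes such as 'dev0' or 'rc1' are ignored."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    # Pad so that '1.0' and '1.0.0' compare as equal.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def meets_minimum(installed, required):
    """True when the installed version is at least the required one."""
    return parse_version(installed) >= parse_version(required)


def report(package="accelerate", required="0.21.0"):
    """Describe the version that pip installed for this interpreter."""
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        return f"{package} is not installed in this environment"
    status = "OK" if meets_minimum(installed, required) else "too old"
    return f"{package} {installed} ({status}, need >= {required})"


print(report())
```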
In the setting up section of this course, it says:
Google Colab for hands-on exercises. The free version is enough.
But in the section where the feature extractor is applied to the music dataset, the Colab runtime crashes, reporting that it ran out of RAM.
What could be a possible workaround?
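One thing that may be worth trying is processing the data in smaller batches, so that fewer decoded audio arrays are held in memory at once. The sketch below shows the general idea with plain Python. `iter_batches` and `process_in_batches` are hypothetical helpers, not part of the course code. With the datasets library, the analogous knobs are the `batched=True` and `batch_size` arguments of `Dataset.map`. Whether this is enough on the free Colab tier is an assumption, not something verified here.

```python
def iter_batches(items, batch_size):
    """Yield consecutive lists of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Last, possibly shorter, batch.
        yield batch


def process_in_batches(items, fn, batch_size=100):
    """Apply fn to one batch at a time so only one batch is in memory."""
    for batch in iter_batches(items, batch_size):
        yield fn(batch)


# Example: sum each batch of numbers, two at a time.
print(list(process_in_batches(range(5), sum, batch_size=2)))  # [1, 5, 4]
```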
Hi,
I previously reported #114 and it was fixed, but it seems the evaluation page is still not fixed. I get all checks on the self-evaluation page, but the certificate page doesn't recognize that I have completed all the required projects.
Hi,
I think the output should be:
DatasetDict({
    train: Dataset({
        features: ['genre', 'input_values', 'attention_mask'],
        num_rows: 899
    })
    test: Dataset({
        features: ['genre', 'input_values', 'attention_mask'],
        num_rows: 100
    })
})
instead of the output shown in audio-transformers-course/chapters/en/chapter4/fine-tuning.mdx (lines 321 to 336 in c3a8701), since return_attention_mask=True is set in the feature_extractor. Is this the case?
When running this command:
doc-builder preview audio-transformers-course ../audio-transformers-course/chapters/en --not_python_module
we got this:
Traceback (most recent call last):
File "/home/silvacarl/.local/lib/python3.8/site-packages/doc_builder/commands/build.py", line 31, in check_node_is_available
p = subprocess.run(
File "/usr/lib/python3.8/subprocess.py", line 493, in run
with Popen(*popenargs, **kwargs) as process:
File "/usr/lib/python3.8/subprocess.py", line 858, in __init__
self._execute_child(args, executable, preexec_fn, close_fds,
File "/usr/lib/python3.8/subprocess.py", line 1704, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
PermissionError: [Errno 13] Permission denied: 'node'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/silvacarl/.local/bin/doc-builder", line 8, in <module>
sys.exit(main())
File "/home/silvacarl/.local/lib/python3.8/site-packages/doc_builder/commands/doc_builder_cli.py", line 47, in main
args.func(args)
File "/home/silvacarl/.local/lib/python3.8/site-packages/doc_builder/commands/preview.py", line 158, in preview_command
check_node_is_available()
File "/home/silvacarl/.local/lib/python3.8/site-packages/doc_builder/commands/build.py", line 40, in check_node_is_available
raise EnvironmentError(
OSError: Using the --html flag requires node v14 to be installed, but it was not found in your system.
Any ideas?
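The first exception in the traceback above is a PermissionError rather than a "file not found" error, which may mean that something named node is on PATH but cannot be executed. The sketch below separates the two cases. `describe_executable` is a hypothetical helper, and the interpretation of the error is an assumption rather than a confirmed diagnosis.

```python
import os
import shutil


def describe_executable(name):
    """Report whether name is runnable, present but not executable, or missing."""
    path = shutil.which(name)
    if path is not None:
        return f"{name} found at {path}"
    # shutil.which skips files without execute permission, so look for those separately.
    for directory in os.environ.get("PATH", "").split(os.pathsep):
        candidate = os.path.join(directory, name)
        if directory and os.path.isfile(candidate):
            return f"{name} exists at {candidate} but is not executable"
    return f"{name} not found on PATH"


print(describe_executable("node"))
```

If the second message is printed, checking the file permissions of that path, or removing a stale node entry from PATH, would be the next thing to look at.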
Hi,
Your course is so awesome! I think that you should consider adding a section about streaming ASR models.
Regards,
Michael
There seems to be a problem with the self-evaluation code. I've completed the Unit 4 hands-on task, but self-evaluation doesn't count it as successful. I took a look at the code, and it seems my model card reports the accuracy as eval_accuracy instead of Accuracy. Am I doing something wrong, or has the model card pattern changed and the self-evaluation script not yet been updated?
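To illustrate the mismatch described here, the sketch below looks up a metric in a way that tolerates both spellings. It is not the self-evaluation script's actual code: `find_metric` is a hypothetical helper, and the assumption is only that the metric arrives as a key in a dictionary of results.

```python
def find_metric(results, name="accuracy"):
    """Return the value stored under name, ignoring case and an 'eval_' prefix."""
    wanted = name.lower()
    for key, value in results.items():
        normalized = key.lower()
        if normalized.startswith("eval_"):
            normalized = normalized[len("eval_"):]
        if normalized == wanted:
            return value
    # No matching key was found.
    return None


print(find_metric({"eval_accuracy": 0.87}))  # 0.87
print(find_metric({"Accuracy": 0.9}))  # 0.9
```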
In chapters/en/chapter4/fine-tuning.mdx the following snippet is presented for loading the gtzan dataset:
from datasets import load_dataset
gtzan = load_dataset("marsyas/gtzan", "all")
gtzan
This returns a DatasetDict object, not a Dataset object, which causes the next snippet to fail:
gtzan = gtzan.train_test_split(seed=42, shuffle=True, test_size=0.1)
gtzan
When I run these together as is I get:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
[<ipython-input-4-0475a19d0be5>](https://localhost:8080/#) in <cell line: 1>()
----> 1 gtzan = gtzan.train_test_split(seed=42, shuffle=True, test_size=0.1)
2 gtzan
AttributeError: 'DatasetDict' object has no attribute 'train_test_split'
I can bypass this by pointing the train_test_split function to the "train" split within the original DatasetDict object returned by the load_dataset function:
gtzan = gtzan["train"].train_test_split(seed=42, shuffle=True, test_size=0.1)
gtzan
Output:
DatasetDict({
    train: Dataset({
        features: ['file', 'audio', 'genre'],
        num_rows: 899
    })
    test: Dataset({
        features: ['file', 'audio', 'genre'],
        num_rows: 100
    })
})
Recommend updating the second code snippet to call train_test_split on the "train" split. Unless there is a way to get load_dataset to return the Dataset object itself - I'm not even sure what the "all" flag refers to there. I can make this change but was instructed on the Discord server to file an issue.
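On the open question above: load_dataset accepts a split argument, so passing split="train" should return a single Dataset rather than a DatasetDict, and "all" is the name of the dataset configuration. The split sizes in the output can also be checked by hand. The sketch below assumes the test portion is rounded up and the rest goes to training. That rounding rule is an assumption, chosen because it reproduces the 899/100 numbers for the 999 rows shown above. `split_sizes` is a hypothetical helper.

```python
import math


def split_sizes(n_rows, test_size):
    """Return (n_train, n_test) for a fractional test_size, rounding the test part up."""
    n_test = math.ceil(test_size * n_rows)
    n_train = n_rows - n_test
    return n_train, n_test


print(split_sizes(999, 0.1))  # (899, 100)
```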
In the section Speech Commands, the code that is supposed to be run is:
classifier = pipeline(
    "audio-classification", model="MIT/ast-finetuned-speech-commands-v2"
)
classifier(sample["audio"])
But it leads to the following error:
ValueError: We expect a numpy ndarray as input
ValueError Traceback (most recent call last)
[<ipython-input-8-a13009e6d325>](https://localhost:8080/#) in <cell line: 4>()
2 "audio-classification", model="MIT/ast-finetuned-speech-commands-v2"
3 )
----> 4 classifier(sample["audio"])
3 frames
[/usr/local/lib/python3.10/dist-packages/transformers/pipelines/audio_classification.py](https://localhost:8080/#) in __call__(self, inputs, **kwargs)
128 - **score** (`float`) -- The corresponding probability.
129 """
--> 130 return super().__call__(inputs, **kwargs)
131
132 def _sanitize_parameters(self, top_k=None, **kwargs):
[/usr/local/lib/python3.10/dist-packages/transformers/pipelines/base.py](https://localhost:8080/#) in __call__(self, inputs, num_workers, batch_size, *args, **kwargs)
1118 )
1119 else:
-> 1120 return self.run_single(inputs, preprocess_params, forward_params, postprocess_params)
1121
1122 def run_multi(self, inputs, preprocess_params, forward_params, postprocess_params):
[/usr/local/lib/python3.10/dist-packages/transformers/pipelines/base.py](https://localhost:8080/#) in run_single(self, inputs, preprocess_params, forward_params, postprocess_params)
1124
1125 def run_single(self, inputs, preprocess_params, forward_params, postprocess_params):
-> 1126 model_inputs = self.preprocess(inputs, **preprocess_params)
1127 model_outputs = self.forward(model_inputs, **forward_params)
1128 outputs = self.postprocess(model_outputs, **postprocess_params)
[/usr/local/lib/python3.10/dist-packages/transformers/pipelines/audio_classification.py](https://localhost:8080/#) in preprocess(self, inputs)
153
154 if not isinstance(inputs, np.ndarray):
--> 155 raise ValueError("We expect a numpy ndarray as input")
156 if len(inputs.shape) != 1:
157 raise ValueError("We expect a single channel audio input for AutomaticSpeechRecognitionPipeline")
ValueError: We expect a numpy ndarray as input
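The traceback shows the pipeline rejecting the whole audio dictionary, so the fix suggested in this thread is to pass only the waveform stored under "array". The sketch below shows that extraction step in isolation. `waveform_from_sample` is a hypothetical helper, and a plain list stands in for the NumPy array so the example runs without extra dependencies.

```python
def waveform_from_sample(audio):
    """Return the raw waveform from a datasets-style audio entry."""
    if isinstance(audio, dict):
        if "array" not in audio:
            raise KeyError("expected an 'array' key in the audio dict")
        return audio["array"]
    # Already a bare waveform, so pass it through unchanged.
    return audio


sample = {"audio": {"path": None, "array": [0.0, 0.1, -0.1], "sampling_rate": 16000}}
print(waveform_from_sample(sample["audio"]))  # [0.0, 0.1, -0.1]
```

With the course code this corresponds to calling classifier(sample["audio"]["array"]) as proposed above.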
Hi,
Your course is so awesome! I think that you should consider adding a section about training ASR with LoRA:
https://github.com/Vaibhavs10/fast-whisper-finetuning
Regards,
Michael
In the section about preprocessing, it would be useful to add type/shape information for the data produced by preprocessing.
Specifically, here it would be very useful to add a comment stating the type and shape of input_features. Is it a 3D array of floats like [time, freq, ampl]?
The course link in the README is broken (leads to https://huggingface.co/audio-course). I think it should be https://huggingface.co/learn/audio-course/
An import error is raised when TrainingArguments() is called.
[ImportError: Using the Trainer with PyTorch requires accelerate>=0.20.1: Please run pip install transformers[torch] or pip install accelerate -U]
However, this issue continues even if the latest version of Accelerate (0.21.0) is installed.
@sanchit-gandhi You can have a look at this notebook.
Hi there
Let's translate the course to Korean so that the whole community can benefit from this resource!
Below are the chapters and files that need translating - let us know here if you'd like to translate any. Once you're finished, open a pull request and tag this issue by including #issue-number in the description, where issue-number is the number of this issue.
If you'd like others to help you with the translation, you can also post in our forums.
This PR template from @gabrielwithappy might help.
Unit 0: Welcome to the course!
Unit 1: Working with audio data
introduction.mdx
audio_data.mdx
load_and_explore.mdx
preprocessing.mdx
streaming.mdx
quiz.mdx
supplemental_reading.mdx
Unit 2: A gentle introduction to audio applications
Unit 3: Transformer architectures for audio
Unit 4: Build a music genre classifier
Course Events
In Unit 5 of the audio course, the following code is used:
from dataclasses import dataclass
from typing import Any, Dict, List, Union

import torch


@dataclass
class DataCollatorSpeechSeq2SeqWithPadding:
    processor: Any

    def __call__(
        self, features: List[Dict[str, Union[List[int], torch.Tensor]]]
    ) -> Dict[str, torch.Tensor]:
        # split inputs and labels since they have to be of different lengths and need different padding methods
        # first treat the audio inputs by simply returning torch tensors
        input_features = [
            {"input_features": feature["input_features"][0]} for feature in features
        ]
        batch = self.processor.feature_extractor.pad(input_features, return_tensors="pt")
        # get the tokenized label sequences
        label_features = [{"input_ids": feature["labels"]} for feature in features]
        # pad the labels to max length
        labels_batch = self.processor.tokenizer.pad(label_features, return_tensors="pt")
        # replace padding with -100 to ignore loss correctly
        labels = labels_batch["input_ids"].masked_fill(
            labels_batch.attention_mask.ne(1), -100
        )
        # if the bos token was appended in the previous tokenization step,
        # cut it here since it is appended again later anyway
        if (labels[:, 0] == self.processor.tokenizer.bos_token_id).all().cpu().item():
            labels = labels[:, 1:]
        batch["labels"] = labels
        return batch
However, according to the following issue, bos_token_id shouldn't be used (@ArthurZucker). In my opinion, this should be replaced with self.processor.tokenizer.convert_tokens_to_ids("<|startoftranscript|>") or with model.config.decoder_start_token_id. What do you think?
Note that if this is true, then there would be a similar error in @sanchit-gandhi's fine-tuning tutorial too.
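To make the proposal concrete, the sketch below shows the stripping step with the token id passed in explicitly, so the caller decides whether it comes from bos_token_id, from the "<|startoftranscript|>" token, or from decoder_start_token_id. It uses plain Python lists instead of tensors, `strip_leading_token` is a hypothetical helper, and the ids in the example are placeholders rather than real Whisper vocabulary ids.

```python
def strip_leading_token(label_rows, start_token_id):
    """Drop the first token of every row, but only if all rows start with start_token_id."""
    if label_rows and all(row and row[0] == start_token_id for row in label_rows):
        return [row[1:] for row in label_rows]
    # At least one row does not start with the token, so leave the batch untouched.
    return label_rows


START = 1  # placeholder id for illustration only
print(strip_leading_token([[START, 5, 6], [START, 7]], START))  # [[5, 6], [7]]
print(strip_leading_token([[START, 5], [2, 7]], START))  # [[1, 5], [2, 7]]
```

If the wrong id is passed in, the condition never matches and the start token stays in the labels, which is the failure mode this issue is worried about.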
Thanks for your attention.
Regards,
Tony
Good day @sanchit-gandhi, @MKhalusova and the whole course team! Happy New Year 2024!
First of all, I would like to thank you and the entire course team once again for the work done. The course turned out to be very good and allowed many of us to immerse ourselves in the topic of working with sound; some of us even managed to find a job thanks to your course. We especially liked the practical orientation of the course and the clear, accessible presentation of the theoretical material. On behalf of all the members of the course translation team: thank you so much for your work!
Over the past two months, people who have taken the course in Russian have been leaving feedback on topics they would like to see covered in the course. Together with Sergey (@Lightmourne), we systematized this feedback, which eventually became our request in this issue. If possible, could you further address the following topics in the course?
List of topics:
Hi!
I just finished training a model for the hands-on exercise of Unit 4. Since even DistilHuBERT can take hours to train, I used the PEFT library with LoRA to fine-tune the model. When finished, I pushed my model to the HF Hub using the command provided in the course, slightly modified to get rid of an error I was getting.
The provided command was:
trainer.push_to_hub(**kwargs)
I used:
trainer.model.push_to_hub(**kwargs)
This is the repository for the model: https://huggingface.co/ThreeBlessings/distilhubert-finetuned-gtzan.
However when I try to check my progress with the Check My Progress space, it doesn't show any evaluation results for my model. The progress table stays with default values.
I thought maybe it has something to do with me pushing LoRA model, not the full version. So I merged the LoRA weights with the base model weights and pushed it to huggingface hub, resulting in this repository: https://huggingface.co/ThreeBlessings/distilhubert-finetuned-gtzan-merged.
The Check My Progress space still didn't update after 24 hours.
I never modified the kwargs provided in the course.
What am I doing wrong? Is there something I'm missing here?
This line should be changed to the line below
# original
gtzan = load_dataset("marsyas/gtzan", "all")
# change to
gtzan = load_dataset("marsyas/gtzan", "all").get('train')
Hello!
I translated chapter2 into Russian. After finishing the translation, I wanted to test the correct display of the content of this chapter. For this purpose I used doc-builder. I ran into a problem because the files asr_pipeline.mdx and audio_classification_pipeline.mdx are not displayed in the browser at http://localhost:3000/. Instead I get the following message:
Error while preprocessing /tmp/tmpalt8krht/kit/src/routes/chapter2/asr_pipeline.mdx - btoa is not defined
ReferenceError: Error while preprocessing /tmp/tmpalt8krht/kit/src/routes/chapter2/asr_pipeline.mdx - btoa is not defined
at base64 (file:///tmp/tmpalt8krht/kit/preprocess.js:435:28)
at highlighter (file:///tmp/tmpalt8krht/kit/preprocess.js:488:13)
at /tmp/tmpalt8krht/kit/node_modules/mdsvex/dist/main.cjs.js:25745:25
at Array.map (<anonymous>)
at /tmp/tmpalt8krht/kit/node_modules/mdsvex/dist/main.cjs.js:25743:11
at wrapped (/tmp/tmpalt8krht/kit/node_modules/mdsvex/dist/main.cjs.js:206:19)
at next (/tmp/tmpalt8krht/kit/node_modules/mdsvex/dist/main.cjs.js:299:28)
at done (/tmp/tmpalt8krht/kit/node_modules/mdsvex/dist/main.cjs.js:236:16)
at then (/tmp/tmpalt8krht/kit/node_modules/mdsvex/dist/main.cjs.js:243:5)
at wrapped (/tmp/tmpalt8krht/kit/node_modules/mdsvex/dist/main.cjs.js:226:9)
I would like to note that the files introduction.mdx and hands_on.mdx are displayed correctly. These files do not contain blocks with python code.
How can I solve this problem?
I attach a screenshot of one of the correctly displayed pages.
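The "btoa is not defined" message may point to the Node.js version used by doc-builder. As far as I know, btoa only became a global function in Node.js 16, so an older runtime would fail in exactly this way on pages that contain highlighted code. Treat that threshold as an assumption to verify. The sketch below parses the output of node --version and applies the check. `node_major` and `has_global_btoa` are hypothetical helpers.

```python
def node_major(version_text):
    """Extract the major version from strings such as 'v14.21.3'."""
    text = version_text.strip()
    if text.startswith("v"):
        text = text[1:]
    return int(text.split(".")[0])


def has_global_btoa(version_text, minimum_major=16):
    """True when the Node.js version is at least the assumed minimum for a global btoa."""
    return node_major(version_text) >= minimum_major


print(has_global_btoa("v14.21.3"))  # False
print(has_global_btoa("v18.17.0"))  # True
```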
Here is the main page:
https://huggingface.co/learn/audio-course/chapter0/introduction
It has internal links to units like this:
https://huggingface.co/learn/audio-course/chapter1
which end up at:
https://huggingface.co/docs/audio-course/main/en/chapter1
with a 404 error.
The left menu bar, in contrast, has working links such as:
https://huggingface.co/learn/audio-course/chapter1/introduction
It seems the PolyAI/minds14 dataset isn't available. When loading it with load_dataset("PolyAI/minds14", "en-US"), I get the following error: ConnectionError: Couldn't reach https://www.dropbox.com/s/e2us0hcs3ilr20e/MInDS-14.zip?dl=1 (error 429). After opening that link, Dropbox says:
Link temporarily disabled
This can happen when the link has been shared or downloaded too many times in a day. Check back later and we'll open access to more people.
It'd be great if this could be fixed, or at least replaced with another dataset which is available so that people can finish the course (which is great btw). 🤗
The code was run in a Colab Pro+ environment with an A100 GPU, and the following is the entire traceback:
---------------------------------------------------------------------------
ConnectionError Traceback (most recent call last)
in <cell line: 5>()
3 minds = DatasetDict()
4
----> 5 minds["train"] = load_dataset(
6 "PolyAI/minds14", "en-US",
7 )
10 frames
/usr/local/lib/python3.10/dist-packages/datasets/load.py in load_dataset(path, name, data_dir, data_files, split, cache_dir, features, download_config, download_mode, verification_mode, ignore_verifications, keep_in_memory, save_infos, revision, token, use_auth_token, task, streaming, num_proc, storage_options, **config_kwargs)
2151
2152 # Download and prepare data
-> 2153 builder_instance.download_and_prepare(
2154 download_config=download_config,
2155 download_mode=download_mode,
/usr/local/lib/python3.10/dist-packages/datasets/builder.py in download_and_prepare(self, output_dir, download_config, download_mode, verification_mode, ignore_verifications, try_from_hf_gcs, dl_manager, base_path, use_auth_token, file_format, max_shard_size, num_proc, storage_options, **download_and_prepare_kwargs)
952 if num_proc is not None:
953 prepare_split_kwargs["num_proc"] = num_proc
--> 954 self._download_and_prepare(
955 dl_manager=dl_manager,
956 verification_mode=verification_mode,
/usr/local/lib/python3.10/dist-packages/datasets/builder.py in _download_and_prepare(self, dl_manager, verification_mode, **prepare_splits_kwargs)
1715
1716 def _download_and_prepare(self, dl_manager, verification_mode, **prepare_splits_kwargs):
-> 1717 super()._download_and_prepare(
1718 dl_manager,
1719 verification_mode,
/usr/local/lib/python3.10/dist-packages/datasets/builder.py in _download_and_prepare(self, dl_manager, verification_mode, **prepare_split_kwargs)
1025 split_dict = SplitDict(dataset_name=self.dataset_name)
1026 split_generators_kwargs = self._make_split_generators_kwargs(prepare_split_kwargs)
-> 1027 split_generators = self._split_generators(dl_manager, **split_generators_kwargs)
1028
1029 # Checksums verification
~/.cache/huggingface/modules/datasets_modules/datasets/PolyAI--minds14/65c7e0f3be79e18a6ffaf879a083daf706312d421ac90d25718459cbf3c42696/minds14.py in _split_generators(self, dl_manager)
130 )
131
--> 132 archive_path = dl_manager.download_and_extract(self.config.data_url)
133 audio_path = dl_manager.extract(
134 os.path.join(archive_path, "MInDS-14", "audio.zip")
/usr/local/lib/python3.10/dist-packages/datasets/download/download_manager.py in download_and_extract(self, url_or_urls)
563 extracted_path(s): `str`, extracted paths of given URL(s).
564 """
--> 565 return self.extract(self.download(url_or_urls))
566
567 def get_recorded_sizes_checksums(self):
/usr/local/lib/python3.10/dist-packages/datasets/download/download_manager.py in download(self, url_or_urls)
426
427 start_time = datetime.now()
--> 428 downloaded_path_or_paths = map_nested(
429 download_func,
430 url_or_urls,
/usr/local/lib/python3.10/dist-packages/datasets/utils/py_utils.py in map_nested(function, data_struct, dict_only, map_list, map_tuple, map_numpy, num_proc, parallel_min_length, types, disable_tqdm, desc)
454 # Singleton
455 if not isinstance(data_struct, dict) and not isinstance(data_struct, types):
--> 456 return function(data_struct)
457
458 disable_tqdm = disable_tqdm or not logging.is_progress_bar_enabled()
/usr/local/lib/python3.10/dist-packages/datasets/download/download_manager.py in _download(self, url_or_filename, download_config)
452 # append the relative path to the base_path
453 url_or_filename = url_or_path_join(self._base_path, url_or_filename)
--> 454 return cached_path(url_or_filename, download_config=download_config)
455
456 def iter_archive(self, path_or_buf: Union[str, io.BufferedReader]):
/usr/local/lib/python3.10/dist-packages/datasets/utils/file_utils.py in cached_path(url_or_filename, download_config, **download_kwargs)
180 if is_remote_url(url_or_filename):
181 # URL, so get it from the cache (downloading if necessary)
--> 182 output_path = get_from_cache(
183 url_or_filename,
184 cache_dir=cache_dir,
/usr/local/lib/python3.10/dist-packages/datasets/utils/file_utils.py in get_from_cache(url, cache_dir, force_download, proxies, etag_timeout, resume_download, user_agent, local_files_only, use_etag, max_retries, token, use_auth_token, ignore_url_params, storage_options, download_desc)
599 raise ConnectionError(f"Couldn't reach {url} ({repr(head_error)})")
600 elif response is not None:
--> 601 raise ConnectionError(f"Couldn't reach {url} (error {response.status_code})")
602 else:
603 raise ConnectionError(f"Couldn't reach {url}")
ConnectionError: Couldn't reach https://www.dropbox.com/s/e2us0hcs3ilr20e/MInDS-14.zip?dl=1 (error 429)
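HTTP 429 means the host is rate limiting requests, so a retry after a pause sometimes succeeds, although it will not help while the Dropbox link stays disabled for the day. The sketch below is a generic retry with exponential backoff, not part of the datasets library. `retry` is a hypothetical helper, and the delays are arbitrary starting values.

```python
import time


def retry(fn, attempts=4, base_delay=1.0, sleep=time.sleep, retry_on=(ConnectionError,)):
    """Call fn until it succeeds, doubling the wait after each failure."""
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                # Out of attempts, so let the last error propagate.
                raise
            sleep(base_delay * 2 ** attempt)


# Example with a stand-in function that fails twice before succeeding.
calls = {"n": 0}


def flaky_download():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("error 429")
    return "ok"


print(retry(flaky_download, sleep=lambda seconds: None))  # ok
```

In the course code, the callable passed to retry would wrap the load_dataset call from the traceback above.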
Hi there
Let's translate the course to Spanish so that the whole community can benefit from this resource!
Below are the chapters and files that need translating - let us know here if you'd like to translate any and we'll add your name to the list. Once you're finished, open a pull request and tag this issue by including #issue-number in the description, where issue-number is the number of this issue.
If you'd like others to help you with the translation, you can also post in our forums or tag @_lewtun on Twitter to gain some visibility.
Chapters
Unit 0: Welcome to the course!
Unit 1: Working with audio data
Unit 2: A gentle introduction to audio applications
Unit 3: Transformer architectures for audio
Unit 4: Build a music genre classifier
Unit 5: Automatic Speech Recognition
Course Events
Hi there
Let's translate the course to Bengali so that the whole community can benefit from this resource!
I would like to translate Chapters 0 and 1
Hi there
Let's translate the course to Japanese so that the whole community can benefit from this resource!
Below are the chapters and files that need translating - let us know here if you'd like to translate any and we'll add your name to the list. Once you're finished, open a pull request and tag this issue by including #issue-number in the description, where issue-number is the number of this issue.
If you'd like others to help you with the translation, you can also post in our forums or tag @_lewtun on Twitter to gain some visibility.
0 - Setup
1 - Transformer models
2 - Using 🤗 Transformers
3 - Fine-tuning a pretrained model
4 - Sharing models and tokenizers
5 - The 🤗 Datasets library
6 - The 🤗 Tokenizers library
7 - Main NLP tasks
8 - How to ask for help
Events
Hey,
I am not sure what the expected behaviour is, or whether this is my mistake, an error in the course, or a problem with the dataset, but I noticed the following in Chapter 1 - Preprocessing:
When I follow the course and try to execute
# use librosa to get example's duration from the audio file
new_column = [librosa.get_duration(path=x) for x in minds["path"]]
it fails because the path (x in the code snippet) looks something like /storage/hf-datasets-cache/all/datasets/27907695716030-config-parquet-and-info-PolyAI-minds14-941a5af2/downloads/extracted/a87e442545495cdb67dfdcbc9d4f35d234c9f8e471449b2db58d7c81b62f001a/en-AU~PAY_BILL/response_4.wav, which is exactly the value provided by the unmodified dataset, as can be seen on the datasets page, but does not exist on my machine.
Am I using the load_dataset function the wrong way? Do I have to specify a path to explicitly save or cache the data somewhere? Is there a way to automatically replace the 'path' value in the dataset with the local path on my machine?
Alternatively, one could change the librosa.get_duration(path=x) call to pass the audio array and the sampling rate instead, e.g.
new_column = [librosa.get_duration(y=x["array"], sr=x["sampling_rate"]) for x in minds["audio"]]
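For a mono waveform, the duration that this alternative computes is simply the number of samples divided by the sampling rate, so it needs no file on disk. The sketch below shows that arithmetic without librosa. `duration_seconds` is a hypothetical helper, and the one-dimensional input is an assumption.

```python
def duration_seconds(array, sampling_rate):
    """Length of a mono waveform in seconds: number of samples / samples per second."""
    if sampling_rate <= 0:
        raise ValueError("sampling_rate must be positive")
    return len(array) / sampling_rate


one_second = [0.0] * 16000
print(duration_seconds(one_second, 16000))  # 1.0
```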
Hi there
Let's translate the course to Russian so that the whole community can benefit from this resource!
Below are the chapters and files that need translating - let us know here if you'd like to translate any and we'll add your name to the list. Once you're finished, open a pull request and tag this issue by including #issue-number in the description, where issue-number is the number of this issue.
If you'd like others to help you with the translation, you can also post in our forums or tag @_lewtun on Twitter to gain some visibility.
UNIT 0. WELCOME TO THE COURSE!
introduction.mdx @blademoon MINOR_FIX_DONE
get_ready.mdx @blademoon MINOR_FIX_DONE
Community.mdx @blademoon MINOR_FIX_DONE
UNIT 1. WORKING WITH AUDIO DATA
introduction.mdx @blademoon MINOR_FIX_DONE
audio_data.mdx @blademoon MINOR_FIX_DONE
load_and_explore.mdx @blademoon MINOR_FIX_DONE
preprocessing.mdx @blademoon MINOR_FIX_DONE
streaming.mdx @blademoon MINOR_FIX_DONE
quiz.mdx @blademoon MINOR_FIX_DONE
supplemental_reading.mdx @blademoon MINOR_FIX_DONE
UNIT 2. A GENTLE INTRODUCTION TO AUDIO APPLICATIONS
introduction.mdx @Lightmourne
audio_classification_pipeline.mdx @Lightmourne
asr_pipeline.mdx @Lightmourne
hands_on.mdx @Lightmourne
UNIT 3. TRANSFORMER ARCHITECTURES FOR AUDIO
introduction.mdx @blademoon
ctc.mdx @blademoon
seq2seq.mdx @blademoon
classification.mdx @blademoon
quiz.mdx @blademoon
supplemental_reading.mdx @blademoon
UNIT 4. BUILD A MUSIC GENRE CLASSIFIER
introduction.mdx @Lightmourne
classification_models.mdx @Lightmourne
fine_tuning.mdx @Lightmourne
demo.mdx
hands_on.mdx @Lightmourne
UNIT 5. AUTOMATIC SPEECH RECOGNITION
introduction.mdx @Lightmourne
asr_models.mdx @Lightmourne
choosing_dataset.mdx @Lightmourne
evaluation.mdx @Lightmourne
demo.mdx @Lightmourne
hands_on.mdx @Lightmourne
supplemental_reading.mdx @Lightmourne
UNIT 6. From text to speech
evaluation.mdx @blademoon
fine-tuning.mdx @blademoon
hands_on.mdx @blademoon
introduction.mdx @blademoon
pre-trained_models.mdx @blademoon
supplemental_reading.mdx @blademoon
tts_datasets.mdx @blademoon
UNIT 7. Putting it all together
hands-on.mdx @blademoon
introduction.mdx @blademoon
speech-to-speech.mdx @blademoon
supplemental_reading.mdx @blademoon
transcribe-meeting.mdx @blademoon
voice-assistant.mdx @blademoon
UNIT 8. Finish line
certification.mdx @blademoon MINOR_FIX_DONE
introduction.mdx @blademoon MINOR_FIX_DONE
Course Events
introduction.mdx @blademoon MINOR_FIX_DONE
Adding extra material:
Hi there
Let's translate the course to Simplified Chinese so that the whole community can benefit from this resource!
Below are the chapters and files that need translating - let us know here if you'd like to translate any and we'll add your name to the list. Once you're finished, open a pull request and tag this issue by including #issue-number in the description, where issue-number is the number of this issue.
If you'd like others to help you with the translation, you can also post in our forums or tag @_lewtun on Twitter to gain some visibility.
Unit 0: Welcome to the course!
Unit 1: Working with audio data
introduction.mdx
audio_data.mdx
load_and_explore.mdx
preprocessing.mdx
streaming.mdx
quiz.mdx
supplemental_reading.mdx
Unit 2: A gentle introduction to audio applications
Unit 3: Transformer architectures for audio
Unit 4: Build a music genre classifier
Unit 5: Automatic speech recognition
introduction.mdx
asr_models.mdx
choosing_dataset.mdx
evaluation.mdx
fine-tuning.mdx
demo.mdx
hands_on.mdx
supplemental_reading.mdx
Unit 6: From text to speech
introduction.mdx
tts_datasets.mdx
pre-trained_models.mdx
fine-tuning.mdx
evaluation.mdx
hands_on.mdx
supplemental_reading.mdx
Unit 7: Putting it all together
introduction.mdx
speech-to-speech.mdx
voice-assistant.mdx
transcribe-meeting.mdx
hands-on.mdx
supplemental_reading.mdx
Unit 8: Finish line
Course Events
I need to log in to access the "speechcolab/gigaspeech" dataset. This information is essential and should be provided in a footnote in chapters/en/chapter1/streaming.mdx.
For the Unit 7 task (speech-to-speech translation), I tried duplicating the given Space, which gave me a runtime error (could not allocate streams 404, hardware not available). I then created a new Space (/nimrita/speech-to-speech-translation-MMS1) from scratch, and that also fails to build with the same errors. Please help.
While doing "Putting it all together", I ran into a problem: when I ran the wake up code, the for loop gets skipped for some reason.
Hi there
Many course participants faced issues working through the course materials and exercises on a free tier of Google Colab. An alternative to it is Kaggle Notebooks (it provides a fixed amount of GPU hours but is consistent in experience).
However, there are some differences in working with Kaggle notebooks, such as pushing models to Hub and setting up your environment.
How can you help?
Thank you for deciding to volunteer your time and experience with the course.