The audio-transformers-course from huggingface

Fix a typo

In Unit 4: Pretrained models for audio classification
We’ll load an official Audio Spectrogram Transformer checkpoint fine-tuned on the Speech Commands dataset, under the namespace "MIT/ast-finetuned-speech-commands-v2":

classifier = pipeline(
    "audio-classification", model="MIT/ast-finetuned-speech-commands-v2"
)
classifier(sample["audio"])
Fix: change it to classifier(sample["audio"]["array"])
I don't know how to make a pull request yet! :)
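A minimal sketch of the fix, assuming `sample["audio"]` follows the Datasets `Audio` feature layout (a dict with `"array"` and `"sampling_rate"` keys). The helper name `to_pipeline_input` and the 16 kHz expected rate are illustrative assumptions. It passes only the NumPy array, which is what the pipeline's preprocess step checks for (`isinstance(inputs, np.ndarray)`) in the reported traceback, and it checks the sampling rate first because a bare array carries no rate information.

```python
import numpy as np


def to_pipeline_input(audio: dict, expected_rate: int = 16_000) -> np.ndarray:
    """Return the raw waveform from a Datasets-style audio dict.

    Hypothetical helper: the audio-classification pipeline in the reported
    traceback only accepts a 1-D NumPy array, so we pass the "array" entry.
    A bare array carries no sampling rate, so we check it matches the rate
    the model's feature extractor expects (assumed to be 16 kHz here).
    """
    if audio["sampling_rate"] != expected_rate:
        raise ValueError(
            f"expected {expected_rate} Hz audio, got {audio['sampling_rate']} Hz; "
            "resample before calling the pipeline"
        )
    waveform = np.asarray(audio["array"], dtype=np.float32)
    if waveform.ndim != 1:
        raise ValueError("expected single-channel (1-D) audio")
    return waveform


# Usage with the course's pipeline (downloads the checkpoint, so not run here):
# classifier = pipeline("audio-classification", model="MIT/ast-finetuned-speech-commands-v2")
# classifier(to_pipeline_input(sample["audio"]))
```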

In chapter 4, I had to add another next() call before the example started working.

Original:

from IPython.display import Audio

classifier(sample["audio"].copy())
Audio(sample["audio"]["array"], rate=sample["audio"]["sampling_rate"])

Modified working:

from IPython.display import Audio

sample = next(iter(speech_commands))
classifier(sample["audio"].copy())
Audio(sample["audio"]["array"], rate=sample["audio"]["sampling_rate"])

Why are code and code outputs in the same code block in the text?

I am going through the Load and explore an audio dataset section of the tutorial.

And I saw that code and code outputs are contained in the same code block, which I believe will confuse people.

I strongly believe that code and code output should go into separate code blocks, or the output could even be inserted as plain text in a monospace font, outside of a code block.

Please make this change.

If you need help, I can create PRs for this. Let me know once you've decided on the style.

Update MMS section

MMS has already been released in Transformers, so https://huggingface.co/learn/audio-course/chapter6/pre-trained_models#massive-multilingual-speech-mms could be updated to use that instead.

Error while using TrainingArguments in Unit4

I'm getting the following error while creating an instance of TrainingArguments in Unit-4:

ImportError: Using the Trainer with PyTorch requires accelerate>=0.21.0: Please run pip install transformers[torch] or pip install accelerate -U

I've tried both "pip install transformers[torch]" and "pip install accelerate -U", but the same error still pops up.

Importing TrainingArguments works fine; the error only pops up when creating an instance.
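One way to narrow this down is to check which `accelerate` version the running Python process actually sees. A minimal sketch using only the standard library; the helper names are hypothetical and the version parsing is deliberately simplistic (numeric components only). If the check still fails right after `pip install -U accelerate`, a common cause in notebooks is that the old version is still loaded in the running kernel, so restarting the runtime before re-running the cell is worth trying.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_version(text: str) -> tuple:
    """Turn '0.21.0' (or '0.21.0.dev0') into a comparable tuple of ints.

    Simplistic on purpose: each dot-separated piece keeps only its leading digits.
    """
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def meets_minimum(package: str, minimum: str):
    """Return (ok, installed_version); installed_version is None if missing."""
    try:
        installed = version(package)
    except PackageNotFoundError:
        return False, None
    have, need = parse_version(installed), parse_version(minimum)
    # Pad with zeros so that (0, 21) compares equal to (0, 21, 0).
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need, installed


ok, installed = meets_minimum("accelerate", "0.21.0")
print(f"accelerate installed: {installed}, meets >=0.21.0: {ok}")
```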

Cannot finish tasks in Colab because runtime crashes due to low RAM.

The setup section of this course says:

Google Colab for hands-on exercises. The free version is enough.

However, in the section where the feature extractor is applied to the music dataset, the Colab runtime crashes with a message saying it ran out of RAM.

What could be a possible workaround?

Certificate page doesn't correctly check all projects

Hi,
I previously reported #114 and it was fixed, but it seems the evaluation on the certificate page is not fixed yet. I get all checks on the self-evaluation page, but somehow the certificate page doesn't recognize that I have all the needed projects.

Wrong output on chapter4/fine-tuning

Hi👋,
I think the output should be:

DatasetDict({
    train: Dataset({
        features: ['genre', 'input_values', 'attention_mask'],
        num_rows: 899
    })
    test: Dataset({
        features: ['genre', 'input_values', 'attention_mask'],
        num_rows: 100
    })
})

Instead of

audio-transformers-course/chapters/en/chapter4/fine-tuning.mdx

Lines 321 to 336 in c3a8701

gtzan_encoded
```

**Output:**

```out
DatasetDict({
    train: Dataset({
        features: ['genre', 'input_values'],
        num_rows: 899
    })
    test: Dataset({
        features: ['genre', 'input_values'],
        num_rows: 100
    })
})
```

Since return_attention_mask=True is passed to the feature_extractor, shouldn't attention_mask also be listed in the features?
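A quick way to check the expected columns offline, sketched under the assumption that the DistilHuBERT checkpoint used in the chapter relies on a Wav2Vec2-style feature extractor. Constructing `Wav2Vec2FeatureExtractor` directly with `return_attention_mask=True` (instead of loading it from the checkpoint) shows that it returns an `attention_mask` alongside `input_values`, so a `map` that keeps the extractor's outputs would be expected to end up with both columns.

```python
import numpy as np
from transformers import Wav2Vec2FeatureExtractor

# Built locally so no download is needed; the chapter loads the extractor
# from a pretrained checkpoint instead.
feature_extractor = Wav2Vec2FeatureExtractor(
    feature_size=1,
    sampling_rate=16_000,
    padding_value=0.0,
    do_normalize=True,
    return_attention_mask=True,
)

# Two dummy clips of different lengths (1.0 s and 0.5 s of silence).
clips = [np.zeros(16_000, dtype=np.float32), np.zeros(8_000, dtype=np.float32)]

# Pad to the longest clip so the mask marks real samples (1) vs padding (0).
features = feature_extractor(
    clips, sampling_rate=16_000, padding=True, return_tensors="np"
)
print(sorted(features.keys()))
```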

small issue: OSError: Using the --html flag requires node v14 to be installed, but it was not found in your system.

when running this command:

doc-builder preview audio-transformers-course ../audio-transformers-course/chapters/en --not_python_module

we got this:

Traceback (most recent call last):
  File "/home/silvacarl/.local/lib/python3.8/site-packages/doc_builder/commands/build.py", line 31, in check_node_is_available
    p = subprocess.run(
  File "/usr/lib/python3.8/subprocess.py", line 493, in run
    with Popen(*popenargs, **kwargs) as process:
  File "/usr/lib/python3.8/subprocess.py", line 858, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/usr/lib/python3.8/subprocess.py", line 1704, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
PermissionError: [Errno 13] Permission denied: 'node'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/silvacarl/.local/bin/doc-builder", line 8, in <module>
    sys.exit(main())
  File "/home/silvacarl/.local/lib/python3.8/site-packages/doc_builder/commands/doc_builder_cli.py", line 47, in main
    args.func(args)
  File "/home/silvacarl/.local/lib/python3.8/site-packages/doc_builder/commands/preview.py", line 158, in preview_command
    check_node_is_available()
  File "/home/silvacarl/.local/lib/python3.8/site-packages/doc_builder/commands/build.py", line 40, in check_node_is_available
    raise EnvironmentError(
OSError: Using the --html flag requires node v14 to be installed, but it was not found in your system.

any ideas?
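The first traceback ends in `PermissionError: [Errno 13] Permission denied: 'node'`, which suggests that something named `node` was found on the `PATH` but could not be executed, rather than Node being missing entirely. A small diagnostic sketch (standard library only; the helper names are hypothetical) that reports which `node` would be run and its major version, compared against the v14 the error message asks for:

```python
import shutil
import subprocess


def node_major(version_text: str) -> int:
    """Parse output like 'v14.21.3' into the major version number 14."""
    return int(version_text.strip().lstrip("v").split(".")[0])


def diagnose_node(required_major: int = 14) -> str:
    # shutil.which only returns files that exist and are executable,
    # so a non-executable 'node' entry on PATH is reported as missing.
    path = shutil.which("node")
    if path is None:
        return "no executable 'node' found on PATH"
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    if result.returncode != 0:
        return f"{path} exists but 'node --version' failed: {result.stderr.strip()}"
    major = node_major(result.stdout)
    status = "OK" if major >= required_major else f"too old (need >= v{required_major})"
    return f"{path}: {result.stdout.strip()} -> {status}"


print(diagnose_node())
```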

Adding a new topic - Streaming ASR models

Hi,

Your course is so awesome! I think you should consider adding a section about streaming ASR models.

Regards,
Michael

Self evaluation doesn't work for my model card

Seems like there's a problem with the self-evaluation code. I've completed the Unit 4 hands-on task, but self-evaluation doesn't count it as successful. I took a look at the code, and it seems my model card reports the accuracy as eval_accuracy instead of Accuracy. Am I doing something wrong, or has the model card format changed and the self-evaluation script not been updated yet?

Gtzan Split Unit 4

In chapters/en/chapter4/fine-tuning.mdx, the following snippet is presented for loading the gtzan dataset:

from datasets import load_dataset

gtzan = load_dataset("marsyas/gtzan", "all")
gtzan

This returns a DatasetDict object, not a Dataset object, which causes the next snippet to fail:

gtzan = gtzan.train_test_split(seed=42, shuffle=True, test_size=0.1)
gtzan

When I run these together as is I get:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-4-0475a19d0be5> in <cell line: 1>()
----> 1 gtzan = gtzan.train_test_split(seed=42, shuffle=True, test_size=0.1)
      2 gtzan

AttributeError: 'DatasetDict' object has no attribute 'train_test_split'

I can bypass this by pointing the train_test_split function to the "train" split within the original DatasetDict object returned by the load_dataset function:

gtzan = gtzan["train"].train_test_split(seed=42, shuffle=True, test_size=0.1)
gtzan

Output:

DatasetDict({
    train: Dataset({
        features: ['file', 'audio', 'genre'],
        num_rows: 899
    })
    test: Dataset({
        features: ['file', 'audio', 'genre'],
        num_rows: 100
    })
})

I recommend updating the second code snippet to call train_test_split on the "train" split, unless there is a way to get load_dataset to return the Dataset object itself; I'm not even sure what the "all" argument refers to there. I can make this change, but I was told on the Discord server to file an issue.

Wrong keyword in "Unit 4: Pre-trained models and datasets for audio classification"

In the section Speech Commands, the code that is supposed to be run is:

classifier = pipeline(
    "audio-classification", model="MIT/ast-finetuned-speech-commands-v2"
)
classifier(sample["audio"])

But it leads to the following error:

ValueError: We expect a numpy ndarray as input

Full Error Output

ValueError                                Traceback (most recent call last)

<ipython-input-8-a13009e6d325> in <cell line: 4>()
      2     "audio-classification", model="MIT/ast-finetuned-speech-commands-v2"
      3 )
----> 4 classifier(sample["audio"])

3 frames

/usr/local/lib/python3.10/dist-packages/transformers/pipelines/audio_classification.py in __call__(self, inputs, **kwargs)
    128             - **score** (`float`) -- The corresponding probability.
    129         """
--> 130         return super().__call__(inputs, **kwargs)
    131 
    132     def _sanitize_parameters(self, top_k=None, **kwargs):

/usr/local/lib/python3.10/dist-packages/transformers/pipelines/base.py in __call__(self, inputs, num_workers, batch_size, *args, **kwargs)
   1118             )
   1119         else:
-> 1120             return self.run_single(inputs, preprocess_params, forward_params, postprocess_params)
   1121 
   1122     def run_multi(self, inputs, preprocess_params, forward_params, postprocess_params):

/usr/local/lib/python3.10/dist-packages/transformers/pipelines/base.py in run_single(self, inputs, preprocess_params, forward_params, postprocess_params)
   1124 
   1125     def run_single(self, inputs, preprocess_params, forward_params, postprocess_params):
-> 1126         model_inputs = self.preprocess(inputs, **preprocess_params)
   1127         model_outputs = self.forward(model_inputs, **forward_params)
   1128         outputs = self.postprocess(model_outputs, **postprocess_params)

/usr/local/lib/python3.10/dist-packages/transformers/pipelines/audio_classification.py in preprocess(self, inputs)
    153 
    154         if not isinstance(inputs, np.ndarray):
--> 155             raise ValueError("We expect a numpy ndarray as input")
    156         if len(inputs.shape) != 1:
    157             raise ValueError("We expect a single channel audio input for AutomaticSpeechRecognitionPipeline")

ValueError: We expect a numpy ndarray as input

Adding LoRA training to the course (for ASR)

Hi,

Your course is so awesome! I think you should consider adding a section about training ASR with LoRA:
https://github.com/Vaibhavs10/fast-whisper-finetuning

Regards,
Michael

Suggestion to add shape info in preprocessing

In the section about preprocessing, it would be useful to add type/shape information for the data produced by preprocessing.

Specifically,

audio-transformers-course/chapters/en/chapter1/preprocessing.mdx

Line 186 in ac81306

input_features = example["input_features"]

Here it would be very useful to add a comment stating the type/shape of input_features. Is it a 3D array of floats like [time, freq, ampl]?
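For reference, a sketch that answers the shape question empirically, assuming the default Whisper feature extractor settings (80 mel bins, 16 kHz audio, hop length 160, 30 s chunks). It builds `WhisperFeatureExtractor` locally rather than loading it from a checkpoint as the chapter does. The result is a batch of log-mel spectrograms laid out as `[batch, mel bins, frames]`; the values themselves are the log-mel amplitudes, so there is no separate amplitude axis. The frame count depends on padding: with the extractor's default `padding="max_length"`, every clip is padded or truncated to 30 s.

```python
import numpy as np
from transformers import WhisperFeatureExtractor

# Default settings: feature_size=80 mel bins, sampling_rate=16000,
# hop_length=160, chunk_length=30 seconds.
feature_extractor = WhisperFeatureExtractor()

# One second of silence; with default padding it is padded to 30 s.
audio = np.zeros(16_000, dtype=np.float32)
features = feature_extractor(audio, sampling_rate=16_000, return_tensors="np")

input_features = features["input_features"]
# 30 s * 16000 samples/s / 160 samples per hop = 3000 frames
print(input_features.shape)
```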

Broken audio-course link

The course link in the README is broken (leads to https://huggingface.co/audio-course). I think it should be https://huggingface.co/learn/audio-course/

Unusual ImportError (Chapter 4, Fine-Tuning)

An import error is raised when TrainingArguments() is called:
[ImportError: Using the Trainer with PyTorch requires accelerate>=0.20.1: Please run pip install transformers[torch] or pip install accelerate -U]
However, the issue persists even when the latest version of Accelerate (0.21.0) is installed.
@sanchit-gandhi You can have a look at this notebook.

Translation to Korean

Hi there 👋

Let's translate the course to Korean so that the whole community can benefit from this resource 🌎!

Below are the chapters and files that need translating - let us know here if you'd like to translate any. Once you're finished, open a pull request and tag this issue by including #issue-number in the description, where issue-number is the number of this issue.

🙋 If you'd like others to help you with the translation, you can also post in our forums.

This PR template from @gabrielwithappy might help.

Chapters

Unit 0: Welcome to the course!

Unit 1: Working with audio data

Unit 2: A gentle introduction to audio applications

Unit 3: Transformer architectures for audio

Unit 4: Build a music genre classifier

Course Events

introduction.mdx

Error in DataCollatorSpeechSeq2SeqWithPadding (Unit 5)

In the unit 5 of the audio course, the following code is used:

import torch
from dataclasses import dataclass
from typing import Any, Dict, List, Union


@dataclass
class DataCollatorSpeechSeq2SeqWithPadding:
    processor: Any

    def __call__(
        self, features: List[Dict[str, Union[List[int], torch.Tensor]]]
    ) -> Dict[str, torch.Tensor]:
        # split inputs and labels since they have to be of different lengths and need different padding methods
        # first treat the audio inputs by simply returning torch tensors
        input_features = [
            {"input_features": feature["input_features"][0]} for feature in features
        ]
        batch = self.processor.feature_extractor.pad(input_features, return_tensors="pt")

        # get the tokenized label sequences
        label_features = [{"input_ids": feature["labels"]} for feature in features]
        # pad the labels to max length
        labels_batch = self.processor.tokenizer.pad(label_features, return_tensors="pt")

        # replace padding with -100 to ignore loss correctly
        labels = labels_batch["input_ids"].masked_fill(
            labels_batch.attention_mask.ne(1), -100
        )

        # if bos token is appended in previous tokenization step,
        # cut bos token here as it's appended later anyway
        if (labels[:, 0] == self.processor.tokenizer.bos_token_id).all().cpu().item():
            labels = labels[:, 1:]

        batch["labels"] = labels

        return batch

However, according to the following issue, bos_token_id shouldn't be used (@ArthurZucker). In my opinion, this should be replaced with self.processor.tokenizer.convert_tokens_to_ids("<|startoftranscript|>") or with model.config.decoder_start_token_id. What do you think?

Note that if this is true, there would be a similar error in @sanchit-gandhi's fine-tuning tutorial too.
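A minimal sketch of the proposed change, written over plain Python lists so it runs without PyTorch. The first-token check compares against a start-of-transcript id supplied by the caller (the issue suggests `model.config.decoder_start_token_id` or `processor.tokenizer.convert_tokens_to_ids("<|startoftranscript|>")`) instead of `bos_token_id`. The function name and the toy token ids are hypothetical.

```python
from typing import List


def strip_decoder_start(labels: List[List[int]], start_token_id: int) -> List[List[int]]:
    """Drop the first column if every row starts with the decoder start token.

    Mirrors the collator's `(labels[:, 0] == ...).all()` check followed by
    `labels[:, 1:]`, but compares against the decoder start token id.
    """
    if labels and all(row and row[0] == start_token_id for row in labels):
        return [row[1:] for row in labels]
    return labels


# Toy batch: 7 stands in for the start-of-transcript id, -100 marks padding.
batch = [[7, 11, 12, 13], [7, 21, -100, -100]]
print(strip_decoder_start(batch, start_token_id=7))
```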

Thanks for your attention.

Regards,
Tony

Happy New Year greetings and request for new topics in the course

Good day @sanchit-gandhi, @MKhalusova and the whole course team! Happy New Year 2024!

First of all, I would like to thank you and the entire course team once again for your work. The course turned out very well and allowed many of us to immerse ourselves in working with audio; some of us even managed to find a job thanks to it. We especially liked the practical orientation of the course and the clear, accessible presentation of the theoretical material. On behalf of all the members of the course translation team: "Thank you so much for your work!"

Over the past two months, people who have taken the course in Russian have been leaving feedback on topics they would like to see covered in the course. Together with Sergey (@Lightmourne), we systematized this feedback, which eventually became our request in this issue. If possible, could you further address the following topics in the course?

List of topics:

Audio data preparation (broadly defined)
Finding partial duplicates (audio duplicated with a time shift) and full duplicates (filtering the dataset before training classification models). A common case is filtering datasets of 1-second clips.
Increasing the volume of audio data.
Determining the need for class balancing for different tasks and models in the audio domain. Examples of class balancing for audio data, methods and techniques.

Check My Progress doesn't evaluate submission

Hi!

I just finished training a model for the hands-on exercise of Unit 4. Since even DistilHuBERT can take hours to train, I used the PEFT library with LoRA to fine-tune the model. When it finished, I pushed my model to the HF Hub using the command provided in the course, but slightly modified it to get rid of an error I was getting.
The provided command was:
trainer.push_to_hub(**kwargs)
I used:
trainer.model.push_to_hub(**kwargs)
This is the repository for the model: https://huggingface.co/ThreeBlessings/distilhubert-finetuned-gtzan.
However when I try to check my progress with the Check My Progress space, it doesn't show any evaluation results for my model. The progress table stays with default values.

I thought maybe it has something to do with me pushing LoRA model, not the full version. So I merged the LoRA weights with the base model weights and pushed it to huggingface hub, resulting in this repository: https://huggingface.co/ThreeBlessings/distilhubert-finetuned-gtzan-merged.
The Check My Progress space still didn't update after 24 hours.

I never modified the kwargs that were provided in the course.
What am I doing wrong? Is there something I'm missing here?

Missing code in chapter 4

This line should be changed to the line below

# original
gtzan = load_dataset("marsyas/gtzan", "all")

# change to
gtzan = load_dataset("marsyas/gtzan", "all").get('train')

doc-builder error

Hello!

I translated chapter2 into Russian. After finishing the translation, I wanted to test the correct display of the content of this chapter. For this purpose I used doc-builder. I ran into a problem because the files asr_pipeline.mdx and audio_classification_pipeline.mdx are not displayed in the browser at http://localhost:3000/. Instead I get the following message:

Error while preprocessing /tmp/tmpalt8krht/kit/src/routes/chapter2/asr_pipeline.mdx - btoa is not defined

ReferenceError: Error while preprocessing /tmp/tmpalt8krht/kit/src/routes/chapter2/asr_pipeline.mdx - btoa is not defined
    at base64 (file:///tmp/tmpalt8krht/kit/preprocess.js:435:28)
    at highlighter (file:///tmp/tmpalt8krht/kit/preprocess.js:488:13)
    at /tmp/tmpalt8krht/kit/node_modules/mdsvex/dist/main.cjs.js:25745:25
    at Array.map (<anonymous>)
    at /tmp/tmpalt8krht/kit/node_modules/mdsvex/dist/main.cjs.js:25743:11
    at wrapped (/tmp/tmpalt8krht/kit/node_modules/mdsvex/dist/main.cjs.js:206:19)
    at next (/tmp/tmpalt8krht/kit/node_modules/mdsvex/dist/main.cjs.js:299:28)
    at done (/tmp/tmpalt8krht/kit/node_modules/mdsvex/dist/main.cjs.js:236:16)
    at then (/tmp/tmpalt8krht/kit/node_modules/mdsvex/dist/main.cjs.js:243:5)
    at wrapped (/tmp/tmpalt8krht/kit/node_modules/mdsvex/dist/main.cjs.js:226:9)

I would like to note that the files introduction.mdx and hands_on.mdx are displayed correctly. These files do not contain blocks with python code.
How to solve this problem?

I attach a screenshot of one of the correctly displayed pages.

Main page has broken links

Here is the main page:
https://huggingface.co/learn/audio-course/chapter0/introduction

It has internal links to units like this:
https://huggingface.co/learn/audio-course/chapter1

which end up at:
https://huggingface.co/docs/audio-course/main/en/chapter1

and return a 404 error.

On the left menu bar, we see working links with:
https://huggingface.co/learn/audio-course/chapter1/introduction

PolyAI/minds14 not available

It seems like the PolyAI/minds14 dataset isn't available. When loading it with load_dataset("PolyAI/minds14", "en-US"), I get the following error: ConnectionError: Couldn't reach https://www.dropbox.com/s/e2us0hcs3ilr20e/MInDS-14.zip?dl=1 (error 429). When I open that link, Dropbox says:

Link temporarily disabled
This can happen when the link has been shared or downloaded too many times in a day. Check back later and we’ll open access to more people.

It'd be great if this could be fixed, or at least replaced with another dataset which is available so that people can finish the course (which is great btw). 🤗

The code was run on Colab Pro+ environment with A100 GPU, and the following is the entire traceback:
`---------------------------------------------------------------------------
ConnectionError Traceback (most recent call last)
in <cell line: 5>()
3 minds = DatasetDict()
4
----> 5 minds["train"] = load_dataset(
6 "PolyAI/minds14", "en-US",
7 )

10 frames
/usr/local/lib/python3.10/dist-packages/datasets/load.py in load_dataset(path, name, data_dir, data_files, split, cache_dir, features, download_config, download_mode, verification_mode, ignore_verifications, keep_in_memory, save_infos, revision, token, use_auth_token, task, streaming, num_proc, storage_options, **config_kwargs)
2151
2152 # Download and prepare data
-> 2153 builder_instance.download_and_prepare(
2154 download_config=download_config,
2155 download_mode=download_mode,

/usr/local/lib/python3.10/dist-packages/datasets/builder.py in download_and_prepare(self, output_dir, download_config, download_mode, verification_mode, ignore_verifications, try_from_hf_gcs, dl_manager, base_path, use_auth_token, file_format, max_shard_size, num_proc, storage_options, **download_and_prepare_kwargs)
952 if num_proc is not None:
953 prepare_split_kwargs["num_proc"] = num_proc
--> 954 self._download_and_prepare(
955 dl_manager=dl_manager,
956 verification_mode=verification_mode,

/usr/local/lib/python3.10/dist-packages/datasets/builder.py in _download_and_prepare(self, dl_manager, verification_mode, **prepare_splits_kwargs)
1715
1716 def _download_and_prepare(self, dl_manager, verification_mode, **prepare_splits_kwargs):
-> 1717 super()._download_and_prepare(
1718 dl_manager,
1719 verification_mode,

/usr/local/lib/python3.10/dist-packages/datasets/builder.py in _download_and_prepare(self, dl_manager, verification_mode, **prepare_split_kwargs)
1025 split_dict = SplitDict(dataset_name=self.dataset_name)
1026 split_generators_kwargs = self._make_split_generators_kwargs(prepare_split_kwargs)
-> 1027 split_generators = self._split_generators(dl_manager, **split_generators_kwargs)
1028
1029 # Checksums verification

~/.cache/huggingface/modules/datasets_modules/datasets/PolyAI--minds14/65c7e0f3be79e18a6ffaf879a083daf706312d421ac90d25718459cbf3c42696/minds14.py in _split_generators(self, dl_manager)
130 )
131
--> 132 archive_path = dl_manager.download_and_extract(self.config.data_url)
133 audio_path = dl_manager.extract(
134 os.path.join(archive_path, "MInDS-14", "audio.zip")

/usr/local/lib/python3.10/dist-packages/datasets/download/download_manager.py in download_and_extract(self, url_or_urls)
563 extracted_path(s): str, extracted paths of given URL(s).
564 """
--> 565 return self.extract(self.download(url_or_urls))
566
567 def get_recorded_sizes_checksums(self):

/usr/local/lib/python3.10/dist-packages/datasets/download/download_manager.py in download(self, url_or_urls)
426
427 start_time = datetime.now()
--> 428 downloaded_path_or_paths = map_nested(
429 download_func,
430 url_or_urls,

/usr/local/lib/python3.10/dist-packages/datasets/utils/py_utils.py in map_nested(function, data_struct, dict_only, map_list, map_tuple, map_numpy, num_proc, parallel_min_length, types, disable_tqdm, desc)
454 # Singleton
455 if not isinstance(data_struct, dict) and not isinstance(data_struct, types):
--> 456 return function(data_struct)
457
458 disable_tqdm = disable_tqdm or not logging.is_progress_bar_enabled()

/usr/local/lib/python3.10/dist-packages/datasets/download/download_manager.py in _download(self, url_or_filename, download_config)
452 # append the relative path to the base_path
453 url_or_filename = url_or_path_join(self._base_path, url_or_filename)
--> 454 return cached_path(url_or_filename, download_config=download_config)
455
456 def iter_archive(self, path_or_buf: Union[str, io.BufferedReader]):

/usr/local/lib/python3.10/dist-packages/datasets/utils/file_utils.py in cached_path(url_or_filename, download_config, **download_kwargs)
180 if is_remote_url(url_or_filename):
181 # URL, so get it from the cache (downloading if necessary)
--> 182 output_path = get_from_cache(
183 url_or_filename,
184 cache_dir=cache_dir,

/usr/local/lib/python3.10/dist-packages/datasets/utils/file_utils.py in get_from_cache(url, cache_dir, force_download, proxies, etag_timeout, resume_download, user_agent, local_files_only, use_etag, max_retries, token, use_auth_token, ignore_url_params, storage_options, download_desc)
599 raise ConnectionError(f"Couldn't reach {url} ({repr(head_error)})")
600 elif response is not None:
--> 601 raise ConnectionError(f"Couldn't reach {url} (error {response.status_code})")
602 else:
603 raise ConnectionError(f"Couldn't reach {url}")

ConnectionError: Couldn't reach https://www.dropbox.com/s/e2us0hcs3ilr20e/MInDS-14.zip?dl=1 (error 429)`

Translation to Spanish

Hi there 👋

Let's translate the course to Spanish so that the whole community can benefit from this resource 🌎!

Below are the chapters and files that need translating - let us know here if you'd like to translate any and we'll add your name to the list. Once you're finished, open a pull request and tag this issue by including #issue-number in the description, where issue-number is the number of this issue.

🙋 If you'd like others to help you with the translation, you can also post in our forums or tag @_lewtun on Twitter to gain some visibility.

Chapters

Unit 0: Welcome to the course!

Unit 1: Working with audio data

Unit 2: A gentle introduction to audio applications

Unit 3: Transformer architectures for audio

Unit 4: Build a music genre classifier

Unit 5: Automatic Speech Recognition

Course Events

introduction.mdx

Translation to Bengali

Hi there 👋

Let's translate the course to Bengali so that the whole community can benefit from this resource 🌎!

Chapters

I would like to translate Chapters 0 and 1

Translation to Japanese

Hi there 👋

Let's translate the course to Japanese so that the whole community can benefit from this resource 🌎!

Below are the chapters and files that need translating - let us know here if you'd like to translate any and we'll add your name to the list. Once you're finished, open a pull request and tag this issue by including #issue-number in the description, where issue-number is the number of this issue.

🙋 If you'd like others to help you with the translation, you can also post in our forums or tag @_lewtun on Twitter to gain some visibility.

Chapters

0 - Setup

1.mdx

1 - Transformer models

2 - Using 🤗 Transformers

3 - Fine-tuning a pretrained model

4 - Sharing models and tokenizers

5 - The 🤗 Datasets library

6 - The 🤗 Tokenizers library

7 - Main NLP tasks

8 - How to ask for help

Events

1.mdx

No such file or directory in chapter1-preprocessing when trying to calculate durations

Hey

I am not sure what the expected behaviour is, or whether this is my mistake, an error in the course, or a problem with the dataset used, but I noticed the following in Chapter 1 - Preprocessing:

When I follow the course and try to execute

# use librosa to get example's duration from the audio file
new_column = [librosa.get_duration(path=x) for x in minds["path"]]

it will fail because the path, or x in the code snippet, looks something like /storage/hf-datasets-cache/all/datasets/27907695716030-config-parquet-and-info-PolyAI-minds14-941a5af2/downloads/extracted/a87e442545495cdb67dfdcbc9d4f35d234c9f8e471449b2db58d7c81b62f001a/en-AU~PAY_BILL/response_4.wav (which is the exact content as provided by the unmodified dataset as can be seen on the datasets page but does not exist on my machine).

Do I use the load_dataset function in a wrong way? Do I have to specify a path to explicitly save or cache the data somewhere? Is there a way that will automatically replace the 'path' value in the dataset with the local path on my machine?

Alternatively, one could change the function call of librosa.get_duration(path=x) and pass the audio array and the sampling_rate instead, e.g.

new_column = [librosa.get_duration(y=x["array"], sr=x["sampling_rate"]) for x in minds["audio"]]
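For mono audio, the duration is the number of samples divided by the sampling rate, which is what the suggested `librosa.get_duration(y=..., sr=...)` call computes from a decoded array. A sketch of the same computation over Datasets-style audio dicts, without librosa (the function name is hypothetical). Because it reads the decoded `"array"`, it does not depend on the `"path"` value pointing at a file on the local machine.

```python
from typing import Dict, List

import numpy as np


def durations_from_audio(audio_column: List[Dict]) -> List[float]:
    """Duration in seconds of each decoded clip: samples / sampling rate.

    Uses the last axis as the sample axis, as librosa does, so a
    (channels, samples) array is also handled.
    """
    return [
        np.shape(audio["array"])[-1] / audio["sampling_rate"] for audio in audio_column
    ]


clips = [
    {"array": [0.0] * 16_000, "sampling_rate": 8_000},  # 2.0 s
    {"array": [0.0] * 4_000, "sampling_rate": 8_000},   # 0.5 s
]
print(durations_from_audio(clips))

# With the course dataset: new_column = durations_from_audio(minds["audio"])
```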

Translation to Russian

Hi there 👋

Let's translate the course to Russian so that the whole community can benefit from this resource 🌎!

Below are the chapters and files that need translating - let us know here if you'd like to translate any and we'll add your name to the list. Once you're finished, open a pull request and tag this issue by including #issue-number in the description, where issue-number is the number of this issue.

🙋 If you'd like others to help you with the translation, you can also post in our forums or tag @_lewtun on Twitter to gain some visibility.

Chapters

UNIT 0. WELCOME TO THE COURSE!

introduction.mdx @blademoon MINOR_FIX_DONE
get_ready.mdx @blademoon MINOR_FIX_DONE
Community.mdx @blademoon MINOR_FIX_DONE

UNIT 1. WORKING WITH AUDIO DATA

introduction.mdx @blademoon MINOR_FIX_DONE
audio_data.mdx @blademoon MINOR_FIX_DONE
load_and_explore.mdx @blademoon MINOR_FIX_DONE
preprocessing.mdx @blademoon MINOR_FIX_DONE
streaming.mdx @blademoon MINOR_FIX_DONE
quiz.mdx @blademoon MINOR_FIX_DONE
supplemental_reading.mdx @blademoon MINOR_FIX_DONE

UNIT 2. A GENTLE INTRODUCTION TO AUDIO APPLICATIONS

UNIT 3. TRANSFORMER ARCHITECTURES FOR AUDIO

UNIT 4. BUILD A MUSIC GENRE CLASSIFIER

UNIT 5. AUTOMATIC SPEECH RECOGNITION

UNIT 6. From text to speech

UNIT 7. Putting it all together

UNIT 8. Finish line

certification.mdx @blademoon MINOR_FIX_DONE
introduction.mdx @blademoon MINOR_FIX_DONE

Course Events

introduction.mdx @blademoon MINOR_FIX_DONE

Adding extra material:

Translation to Simplified Chinese

Hi there 👋

Let's translate the course to Simplified Chinese so that the whole community can benefit from this resource 🌎!

Below are the chapters and files that need translating - let us know here if you'd like to translate any and we'll add your name to the list. Once you're finished, open a pull request and tag this issue by including #issue-number in the description, where issue-number is the number of this issue.

🙋 If you'd like others to help you with the translation, you can also post in our forums or tag @_lewtun on Twitter to gain some visibility.

Chapters

Unit 0: Welcome to the course!

Unit 1: Working with audio data

Unit 2: A gentle introduction to audio applications

Unit 3: Transformer architectures for audio

Unit 4: Build a music genre classifier

Unit 5: Automatic speech recognition

Unit 6: From text to speech

Unit 7: Pulling it all together

Unit 8: Finish line

Course Events

introduction.mdx

Step 4 jumped to step 6

In https://github.com/huggingface/audio-transformers-course/blob/main/chapters/en/chapter0/get_ready.mdx, there's a step 6 right after step 4.

Login needed to access "speechcolab/gigaspeech"; this information is missing

I need to log in to access the "speechcolab/gigaspeech" dataset. This information is essential and should be provided in a footnote in chapters/en/chapter1/streaming.mdx.

Spaces not building: runtime error, streams could not be allocated 404

For the Unit 7 task (speech-to-speech translation), I tried duplicating the given Space, which gave me a runtime error (could not allocate streams 404, hardware not available). Then I created a new Space (/nimrita/speech-to-speech-translation-MMS1) from scratch, and that also fails to build with the same errors. Please help.

Marvin

When doing "Putting it all together", I ran into a problem: when I ran the wake-up code, the for loop was skipped for some reason.

[Kaggle Notebooks] Create Kaggle Notebooks for course units

Hi there 👋

Many course participants faced issues working through the course materials and exercises on a free tier of Google Colab. An alternative to it is Kaggle Notebooks (it provides a fixed amount of GPU hours but is consistent in experience).
However, there are some differences in working with Kaggle notebooks, such as pushing models to Hub and setting up your environment.

How can you help?

Write a short tutorial illustrating the extra steps needed to run the course examples and exercises successfully in Kaggle Notebooks.
Once done, tag @MKhalusova and @Vaibhavs10 in the comments. We can review and suggest changes if required.

Thank you for deciding to volunteer your time and experience with the course.
