
huggingface / audio-transformers-course

266 stars · 266 watchers · 80 forks · 4.12 MB

The Hugging Face Course on Transformers for Audio

License: Apache License 2.0

Languages: Makefile 0.01% · Python 0.80% · MDX 99.19%
Topics: audio, deep-learning, hacktoberfest, transformers

audio-transformers-course's People

Contributors

agercas, bharatr21, blademoon, crcdng, dame-cell, fisheggg, gabrielwithappy, hollance, jinnsp, jjyaoao, lbourdois, lewtun, lightmourne, merveenoyan, mhrdyn7, mishig25, mkhalusova, osamja, practice-dump, proshian, ptah23, ritog, rtrompier, sanchit-gandhi, susnato, vaibhavs10, veluchs, wetdog, ylacombe


audio-transformers-course's Issues

Fix a typo

In Unit 4: Pretrained models for audio classification, the text says: "We'll load an official Audio Spectrogram Transformer checkpoint fine-tuned on the Speech Commands dataset, under the namespace "MIT/ast-finetuned-speech-commands-v2":"

classifier = pipeline(
    "audio-classification", model="MIT/ast-finetuned-speech-commands-v2"
)
classifier(sample["audio"])

Fix it to be classifier(sample["audio"]["array"]).
I don't know how to make a pull request yet! :)

In chapter 4, I had to add another next() call before the example started working

Original:

from IPython.display import Audio

classifier(sample["audio"].copy())
Audio(sample["audio"]["array"], rate=sample["audio"]["sampling_rate"])

Modified working:

from IPython.display import Audio

sample = next(iter(speech_commands))
classifier(sample["audio"].copy())
Audio(sample["audio"]["array"], rate=sample["audio"]["sampling_rate"])

Why are code and code outputs in the same code block in the text?

I am going through the Load and explore an audio dataset section of the tutorial.

And I saw that code and code outputs are contained in the same code block, which I believe will confuse people.


I strongly believe that code and code output should go into separate code blocks, or the output could even be inserted as plain text in a monospace font, outside of a code block.

Please make this change.

If you need help, I can create PRs doing this. Let me know once you decide the style.

Error while using TrainingArguments in Unit4

I'm getting the following error while creating an instance of TrainingArguments in Unit-4:

ImportError: Using the Trainer with PyTorch requires accelerate>=0.21.0: Please run pip install transformers[torch] or pip install accelerate -U

I've tried doing both "pip install transformers[torch]" as well as "pip install accelerate -U" but the same error pops up still.

There is no issue importing TrainingArguments; the error only pops up when creating an instance.
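
A common cause on Colab is that the running session still has the old accelerate version loaded, so the install alone doesn't help; restarting the runtime after installing usually resolves it. A minimal check, assuming a notebook environment:

# After the pip installs, restart the runtime/kernel so the upgraded
# packages are actually imported, then verify the versions:
import accelerate
import transformers

print(accelerate.__version__)   # should be >= 0.21.0
print(transformers.__version__)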

Wrong output on chapter4/fine-tuning

Hi 👋,
I think the output should be:

DatasetDict({
    train: Dataset({
        features: ['genre', 'input_values', 'attention_mask'],
        num_rows: 899
    })
    test: Dataset({
        features: ['genre', 'input_values', 'attention_mask'],
        num_rows: 100
    })
})

Instead of:

gtzan_encoded

Output:

DatasetDict({
    train: Dataset({
        features: ['genre', 'input_values'],
        num_rows: 899
    })
    test: Dataset({
        features: ['genre', 'input_values'],
        num_rows: 100
    })
})

Since return_attention_mask=True is set in the feature_extractor, the attention_mask column should be there too. Is this the case?
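
For reference, a minimal sketch of the preprocessing step in question (the checkpoint and function name here are assumptions, not the exact course code):

from transformers import AutoFeatureExtractor

# assumed checkpoint for this unit
feature_extractor = AutoFeatureExtractor.from_pretrained("ntu-spml/distilhubert")

def preprocess_function(examples):
    # pull the raw waveforms out of the decoded audio column
    audio_arrays = [x["array"] for x in examples["audio"]]
    return feature_extractor(
        audio_arrays,
        sampling_rate=feature_extractor.sampling_rate,
        return_attention_mask=True,  # should add 'attention_mask' to the encoded dataset
    )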

small issue: OSError: Using the --html flag requires node v14 to be installed, but it was not found in your system.

when running this command:

doc-builder preview audio-transformers-course ../audio-transformers-course/chapters/en --not_python_module

we got this:

Traceback (most recent call last):
  File "/home/silvacarl/.local/lib/python3.8/site-packages/doc_builder/commands/build.py", line 31, in check_node_is_available
    p = subprocess.run(
  File "/usr/lib/python3.8/subprocess.py", line 493, in run
    with Popen(*popenargs, **kwargs) as process:
  File "/usr/lib/python3.8/subprocess.py", line 858, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/usr/lib/python3.8/subprocess.py", line 1704, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
PermissionError: [Errno 13] Permission denied: 'node'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/silvacarl/.local/bin/doc-builder", line 8, in <module>
    sys.exit(main())
  File "/home/silvacarl/.local/lib/python3.8/site-packages/doc_builder/commands/doc_builder_cli.py", line 47, in main
    args.func(args)
  File "/home/silvacarl/.local/lib/python3.8/site-packages/doc_builder/commands/preview.py", line 158, in preview_command
    check_node_is_available()
  File "/home/silvacarl/.local/lib/python3.8/site-packages/doc_builder/commands/build.py", line 40, in check_node_is_available
    raise EnvironmentError(
OSError: Using the --html flag requires node v14 to be installed, but it was not found in your system.

any ideas?

Self evaluation doesn't work for my model card

Seems like there's a problem with the code for self-evaluation. I've completed the Unit 4 hands-on task, but self-evaluation doesn't count it as successful. I took a look at the code, and it seems my model card outputs the accuracy as eval_accuracy instead of Accuracy. Am I doing something wrong, or has the model card pattern changed while the self-evaluation script hasn't been updated yet?

Gtzan Split Unit 4

In chapters/en/chapter4/fine-tuning.mdx, the following snippet is presented for loading the gtzan dataset:

from datasets import load_dataset

gtzan = load_dataset("marsyas/gtzan", "all")
gtzan

This returns a DatasetDict object, not a Dataset object, which causes the next snippet to fail:

gtzan = gtzan.train_test_split(seed=42, shuffle=True, test_size=0.1)
gtzan

When I run these together as is I get:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-4-0475a19d0be5> in <cell line: 1>()
----> 1 gtzan = gtzan.train_test_split(seed=42, shuffle=True, test_size=0.1)
      2 gtzan

AttributeError: 'DatasetDict' object has no attribute 'train_test_split'

I can bypass this by pointing the train_test_split function to the "train" split within the original DatasetDict object returned by the load_dataset function:

gtzan = gtzan["train"].train_test_split(seed=42, shuffle=True, test_size=0.1)
gtzan

Output:

DatasetDict({
    train: Dataset({
        features: ['file', 'audio', 'genre'],
        num_rows: 899
    })
    test: Dataset({
        features: ['file', 'audio', 'genre'],
        num_rows: 100
    })
})

Recommend updating the second code snippet to call train_test_split on the "train" split, unless there is a way to get load_dataset to return the Dataset object itself - I'm not even sure what the "all" flag refers to there. I can make this change, but was instructed on the Discord server to file an issue first. The corrected sequence is shown below.
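
Putting the two snippets together, the corrected sequence would be:

from datasets import load_dataset

gtzan = load_dataset("marsyas/gtzan", "all")
# train_test_split is a Dataset method, so index into the "train" split
# of the returned DatasetDict first
gtzan = gtzan["train"].train_test_split(seed=42, shuffle=True, test_size=0.1)
gtzan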

Wrong keyword in "Unit 4: Pre-trained models and datasets for audio classification"

In the section Speech Commands, the code that is supposed to be run is:

classifier = pipeline(
    "audio-classification", model="MIT/ast-finetuned-speech-commands-v2"
)
classifier(sample["audio"])

But it leads to the following error:

ValueError: We expect a numpy ndarray as input
Full Error Output
ValueError                                Traceback (most recent call last)

<ipython-input-8-a13009e6d325> in <cell line: 4>()
      2     "audio-classification", model="MIT/ast-finetuned-speech-commands-v2"
      3 )
----> 4 classifier(sample["audio"])

3 frames

/usr/local/lib/python3.10/dist-packages/transformers/pipelines/audio_classification.py in __call__(self, inputs, **kwargs)
    128             - **score** (`float`) -- The corresponding probability.
    129         """
--> 130         return super().__call__(inputs, **kwargs)
    131 
    132     def _sanitize_parameters(self, top_k=None, **kwargs):

/usr/local/lib/python3.10/dist-packages/transformers/pipelines/base.py in __call__(self, inputs, num_workers, batch_size, *args, **kwargs)
   1118             )
   1119         else:
-> 1120             return self.run_single(inputs, preprocess_params, forward_params, postprocess_params)
   1121 
   1122     def run_multi(self, inputs, preprocess_params, forward_params, postprocess_params):

/usr/local/lib/python3.10/dist-packages/transformers/pipelines/base.py in run_single(self, inputs, preprocess_params, forward_params, postprocess_params)
   1124 
   1125     def run_single(self, inputs, preprocess_params, forward_params, postprocess_params):
-> 1126         model_inputs = self.preprocess(inputs, **preprocess_params)
   1127         model_outputs = self.forward(model_inputs, **forward_params)
   1128         outputs = self.postprocess(model_outputs, **postprocess_params)

/usr/local/lib/python3.10/dist-packages/transformers/pipelines/audio_classification.py in preprocess(self, inputs)
    153 
    154         if not isinstance(inputs, np.ndarray):
--> 155             raise ValueError("We expect a numpy ndarray as input")
    156         if len(inputs.shape) != 1:
    157             raise ValueError("We expect a single channel audio input for AutomaticSpeechRecognitionPipeline")

ValueError: We expect a numpy ndarray as input
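
As the typo report above points out, passing the raw waveform rather than the whole audio dict avoids this error. A corrected sketch (the dataset-loading lines are assumed to mirror the course setup):

from datasets import load_dataset
from transformers import pipeline

# assumed to match the course's Speech Commands setup
speech_commands = load_dataset(
    "speech_commands", "v0.02", split="validation", streaming=True
)
sample = next(iter(speech_commands))

classifier = pipeline(
    "audio-classification", model="MIT/ast-finetuned-speech-commands-v2"
)
# pass the numpy waveform itself, not the dict wrapping it
classifier(sample["audio"]["array"])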

Unusual ImportError (Chapter 4, Fine-Tuning)

An import error is raised when the TrainingArguments() is called.
ImportError: Using the Trainer with PyTorch requires accelerate>=0.20.1: Please run pip install transformers[torch] or pip install accelerate -U
However, this issue continues even if the latest version of Accelerate (0.21.0) is installed.
@sanchit-gandhi You can have a look at this notebook.

Translation to Korean

Hi there 👋

Let's translate the course to Korean so that the whole community can benefit from this resource 🌎!

Below are the chapters and files that need translating - let us know here if you'd like to translate any. Once you're finished, open a pull request and tag this issue by including #issue-number in the description, where issue-number is the number of this issue.

🙋 If you'd like others to help you with the translation, you can also post in our forums.

This PR template from @gabrielwithappy might help.

Chapters

Unit 0: Welcome to the course!

Unit 1: Working with audio data

Unit 2: A gentle introduction to audio applications

Unit 3: Transformer architectures for audio

Unit 4: Build a music genre classifier

Course Events

Error in DataCollatorSpeechSeq2SeqWithPadding (Unit 5)

In Unit 5 of the audio course, the following code is used:

from dataclasses import dataclass
from typing import Any, Dict, List, Union

import torch


@dataclass
class DataCollatorSpeechSeq2SeqWithPadding:
    processor: Any

    def __call__(
        self, features: List[Dict[str, Union[List[int], torch.Tensor]]]
    ) -> Dict[str, torch.Tensor]:
        # split inputs and labels since they have to be of different lengths and need different padding methods
        # first treat the audio inputs by simply returning torch tensors
        input_features = [
            {"input_features": feature["input_features"][0]} for feature in features
        ]
        batch = self.processor.feature_extractor.pad(input_features, return_tensors="pt")

        # get the tokenized label sequences
        label_features = [{"input_ids": feature["labels"]} for feature in features]
        # pad the labels to max length
        labels_batch = self.processor.tokenizer.pad(label_features, return_tensors="pt")

        # replace padding with -100 to ignore loss correctly
        labels = labels_batch["input_ids"].masked_fill(
            labels_batch.attention_mask.ne(1), -100
        )

        # if the bos token was appended in the previous tokenization step,
        # cut it here since it gets appended again later anyway
        if (labels[:, 0] == self.processor.tokenizer.bos_token_id).all().cpu().item():
            labels = labels[:, 1:]

        batch["labels"] = labels

        return batch

However, according to the following issue, bos_token_id shouldn't be used (@ArthurZucker). In my opinion, this should be replaced with self.processor.tokenizer.convert_tokens_to_ids("<|startoftranscript|>") or with model.config.decoder_start_token_id. What do you think?
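
To make the proposal concrete, here is a sketch of how the check inside __call__ could look with either replacement (this is the reporter's suggestion, not confirmed course code; for the second option, model would need to be in scope):

        # inside __call__, replacing the bos_token_id comparison;
        # option 1: look the start-of-transcript token up explicitly
        decoder_start_token_id = self.processor.tokenizer.convert_tokens_to_ids(
            "<|startoftranscript|>"
        )
        # option 2 (alternative): decoder_start_token_id = model.config.decoder_start_token_id

        if (labels[:, 0] == decoder_start_token_id).all().cpu().item():
            labels = labels[:, 1:]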

Note that if this is true, there would be a similar error in @sanchit-gandhi's fine-tuning tutorial too.

Thanks for your attention.

Regards,
Tony

Happy New Year greetings and request for new topics in the course

Good day @sanchit-gandhi, @MKhalusova, and the whole course team! Happy New Year 2024!

First of all, I would like to thank you and the entire course team once again for the work done. The course turned out to be very good and allowed many of us to immerse ourselves in the topic of working with sound; some of us even managed to find a job thanks to your course. We especially liked the practical orientation of the course and the clear, quite accessible presentation of the theoretical material. On behalf of all the members of the course translation team: "Thank you so much for your work!"

Over the past two months, people who have taken the course in Russian have been leaving feedback on topics they would like to see covered in the course. Together with Sergey (@Lightmourne), we systematized this feedback, which eventually became our request in this issue. If possible, could you further address the following topics in the course?

List of topics:

  1. Audio data preparation (broadly defined).
  2. Finding partial duplicates (duplication by time-shifting the audio) and full duplicate audio (filtering the dataset before training classification models) - a common case when filtering datasets of 1-second clips.
  3. Increasing the volume of audio data.
  4. Determining when class balancing is needed for different tasks and models in the audio domain, with examples of class balancing for audio data, methods, and techniques.

Check My Progress doesn't evaluate submission

Hi!

I just finished training a model for the hands-on exercise of Unit 4. Since even DistilHuBERT can take hours to train, I used the PEFT library and LoRA to fine-tune the model. When finished, I pushed my model to the hf hub using the command provided in the course, slightly modified to get rid of an error I was getting.
The provided command was:
trainer.push_to_hub(**kwargs)
I used:
trainer.model.push_to_hub(**kwargs)
This is the repository for the model: https://huggingface.co/ThreeBlessings/distilhubert-finetuned-gtzan.
However, when I try to check my progress with the Check My Progress space, it doesn't show any evaluation results for my model; the progress table keeps its default values.

I thought maybe it has something to do with me pushing LoRA model, not the full version. So I merged the LoRA weights with the base model weights and pushed it to huggingface hub, resulting in this repository: https://huggingface.co/ThreeBlessings/distilhubert-finetuned-gtzan-merged.
The Check My Progress space still didn't update after 24 hours.

I never modified the kwargs provided in the course.
What am I doing wrong? Is there something I'm missing here?
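
For reference, a sketch of the merge-then-push flow described above (the base checkpoint and model class are assumptions; the repo names are taken from the issue):

from peft import PeftModel
from transformers import AutoModelForAudioClassification

# load the base model and apply the trained LoRA adapter
base = AutoModelForAudioClassification.from_pretrained("ntu-spml/distilhubert")
peft_model = PeftModel.from_pretrained(base, "ThreeBlessings/distilhubert-finetuned-gtzan")

# fold the LoRA weights into the base weights, then push the merged model
merged = peft_model.merge_and_unload()
merged.push_to_hub("distilhubert-finetuned-gtzan-merged")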

Missing code in chapter 4

This line should be changed to the line below

# original
gtzan = load_dataset("marsyas/gtzan", "all")
# change to
gtzan = load_dataset("marsyas/gtzan", "all").get('train')

doc-builder error

Hello!

I translated chapter 2 into Russian. After finishing the translation, I wanted to check that the content of this chapter displays correctly, so I used doc-builder. I ran into a problem: the files asr_pipeline.mdx and audio_classification_pipeline.mdx are not displayed in the browser at http://localhost:3000/. Instead I get the following message:

Error while preprocessing /tmp/tmpalt8krht/kit/src/routes/chapter2/asr_pipeline.mdx - btoa is not defined
ReferenceError: Error while preprocessing /tmp/tmpalt8krht/kit/src/routes/chapter2/asr_pipeline.mdx - btoa is not defined
    at base64 (file:///tmp/tmpalt8krht/kit/preprocess.js:435:28)
    at highlighter (file:///tmp/tmpalt8krht/kit/preprocess.js:488:13)
    at /tmp/tmpalt8krht/kit/node_modules/mdsvex/dist/main.cjs.js:25745:25
    at Array.map (<anonymous>)
    at /tmp/tmpalt8krht/kit/node_modules/mdsvex/dist/main.cjs.js:25743:11
    at wrapped (/tmp/tmpalt8krht/kit/node_modules/mdsvex/dist/main.cjs.js:206:19)
    at next (/tmp/tmpalt8krht/kit/node_modules/mdsvex/dist/main.cjs.js:299:28)
    at done (/tmp/tmpalt8krht/kit/node_modules/mdsvex/dist/main.cjs.js:236:16)
    at then (/tmp/tmpalt8krht/kit/node_modules/mdsvex/dist/main.cjs.js:243:5)
    at wrapped (/tmp/tmpalt8krht/kit/node_modules/mdsvex/dist/main.cjs.js:226:9)

I would like to note that the files introduction.mdx and hands_on.mdx are displayed correctly. These files do not contain Python code blocks.
How can I solve this problem?

I attach a screenshot of one of the correctly displayed pages.
Screenshot from 2023-08-02 13-51-31

PolyAI/minds14 not available

Seems like the PolyAI/minds14 dataset isn't available. When loading it from code with load_dataset("PolyAI/minds14", "en-US"), I get the following error: ConnectionError: Couldn't reach https://www.dropbox.com/s/e2us0hcs3ilr20e/MInDS-14.zip?dl=1 (error 429). After opening the link, Dropbox says:

Link temporarily disabled
This can happen when the link has been shared or downloaded too many times in a day. Check back later and we'll open access to more people.

It'd be great if this could be fixed, or at least replaced with another dataset that is available, so that people can finish the course (which is great, btw). 🤗

The code was run in a Colab Pro+ environment with an A100 GPU, and the following is the entire traceback:
---------------------------------------------------------------------------
ConnectionError Traceback (most recent call last)
in <cell line: 5>()
3 minds = DatasetDict()
4
----> 5 minds["train"] = load_dataset(
6 "PolyAI/minds14", "en-US",
7 )

10 frames
/usr/local/lib/python3.10/dist-packages/datasets/load.py in load_dataset(path, name, data_dir, data_files, split, cache_dir, features, download_config, download_mode, verification_mode, ignore_verifications, keep_in_memory, save_infos, revision, token, use_auth_token, task, streaming, num_proc, storage_options, **config_kwargs)
2151
2152 # Download and prepare data
-> 2153 builder_instance.download_and_prepare(
2154 download_config=download_config,
2155 download_mode=download_mode,

/usr/local/lib/python3.10/dist-packages/datasets/builder.py in download_and_prepare(self, output_dir, download_config, download_mode, verification_mode, ignore_verifications, try_from_hf_gcs, dl_manager, base_path, use_auth_token, file_format, max_shard_size, num_proc, storage_options, **download_and_prepare_kwargs)
952 if num_proc is not None:
953 prepare_split_kwargs["num_proc"] = num_proc
--> 954 self._download_and_prepare(
955 dl_manager=dl_manager,
956 verification_mode=verification_mode,

/usr/local/lib/python3.10/dist-packages/datasets/builder.py in _download_and_prepare(self, dl_manager, verification_mode, **prepare_splits_kwargs)
1715
1716 def _download_and_prepare(self, dl_manager, verification_mode, **prepare_splits_kwargs):
-> 1717 super()._download_and_prepare(
1718 dl_manager,
1719 verification_mode,

/usr/local/lib/python3.10/dist-packages/datasets/builder.py in _download_and_prepare(self, dl_manager, verification_mode, **prepare_split_kwargs)
1025 split_dict = SplitDict(dataset_name=self.dataset_name)
1026 split_generators_kwargs = self._make_split_generators_kwargs(prepare_split_kwargs)
-> 1027 split_generators = self._split_generators(dl_manager, **split_generators_kwargs)
1028
1029 # Checksums verification

~/.cache/huggingface/modules/datasets_modules/datasets/PolyAI--minds14/65c7e0f3be79e18a6ffaf879a083daf706312d421ac90d25718459cbf3c42696/minds14.py in _split_generators(self, dl_manager)
130 )
131
--> 132 archive_path = dl_manager.download_and_extract(self.config.data_url)
133 audio_path = dl_manager.extract(
134 os.path.join(archive_path, "MInDS-14", "audio.zip")

/usr/local/lib/python3.10/dist-packages/datasets/download/download_manager.py in download_and_extract(self, url_or_urls)
563 extracted_path(s): str, extracted paths of given URL(s).
564 """
--> 565 return self.extract(self.download(url_or_urls))
566
567 def get_recorded_sizes_checksums(self):

/usr/local/lib/python3.10/dist-packages/datasets/download/download_manager.py in download(self, url_or_urls)
426
427 start_time = datetime.now()
--> 428 downloaded_path_or_paths = map_nested(
429 download_func,
430 url_or_urls,

/usr/local/lib/python3.10/dist-packages/datasets/utils/py_utils.py in map_nested(function, data_struct, dict_only, map_list, map_tuple, map_numpy, num_proc, parallel_min_length, types, disable_tqdm, desc)
454 # Singleton
455 if not isinstance(data_struct, dict) and not isinstance(data_struct, types):
--> 456 return function(data_struct)
457
458 disable_tqdm = disable_tqdm or not logging.is_progress_bar_enabled()

/usr/local/lib/python3.10/dist-packages/datasets/download/download_manager.py in _download(self, url_or_filename, download_config)
452 # append the relative path to the base_path
453 url_or_filename = url_or_path_join(self._base_path, url_or_filename)
--> 454 return cached_path(url_or_filename, download_config=download_config)
455
456 def iter_archive(self, path_or_buf: Union[str, io.BufferedReader]):

/usr/local/lib/python3.10/dist-packages/datasets/utils/file_utils.py in cached_path(url_or_filename, download_config, **download_kwargs)
180 if is_remote_url(url_or_filename):
181 # URL, so get it from the cache (downloading if necessary)
--> 182 output_path = get_from_cache(
183 url_or_filename,
184 cache_dir=cache_dir,

/usr/local/lib/python3.10/dist-packages/datasets/utils/file_utils.py in get_from_cache(url, cache_dir, force_download, proxies, etag_timeout, resume_download, user_agent, local_files_only, use_etag, max_retries, token, use_auth_token, ignore_url_params, storage_options, download_desc)
599 raise ConnectionError(f"Couldn't reach {url} ({repr(head_error)})")
600 elif response is not None:
--> 601 raise ConnectionError(f"Couldn't reach {url} (error {response.status_code})")
602 else:
603 raise ConnectionError(f"Couldn't reach {url}")

ConnectionError: Couldn't reach https://www.dropbox.com/s/e2us0hcs3ilr20e/MInDS-14.zip?dl=1 (error 429)

Translation to Spanish

Hi there 👋

Let's translate the course to Spanish so that the whole community can benefit from this resource 🌎!

Below are the chapters and files that need translating - let us know here if you'd like to translate any and we'll add your name to the list. Once you're finished, open a pull request and tag this issue by including #issue-number in the description, where issue-number is the number of this issue.

🙋 If you'd like others to help you with the translation, you can also post in our forums or tag @_lewtun on Twitter to gain some visibility.

Chapters

Unit 0: Welcome to the course!

Unit 1: Working with audio data

Unit 2: A gentle introduction to audio applications

Unit 3: Transformer architectures for audio

Unit 4: Build a music genre classifier

Unit 5: Automatic Speech Recognition

Course Events

Translation to BENGALI

Hi there 👋

Let's translate the course to BENGALI so that the whole community can benefit from this resource 🌎!

Chapters

I would like to translate Chapters 0 and 1

Translation to Japanese

Hi there 👋

Let's translate the course to Japanese so that the whole community can benefit from this resource 🌎!

Below are the chapters and files that need translating - let us know here if you'd like to translate any and we'll add your name to the list. Once you're finished, open a pull request and tag this issue by including #issue-number in the description, where issue-number is the number of this issue.

🙋 If you'd like others to help you with the translation, you can also post in our forums or tag @_lewtun on Twitter to gain some visibility.

Chapters

0 - Setup

1 - Transformer models

2 - Using 🤗 Transformers

3 - Fine-tuning a pretrained model

4 - Sharing models and tokenizers

5 - The 🤗 Datasets library

6 - The 🤗 Tokenizers library

7 - Main NLP tasks

8 - How to ask for help

Events

No such file or directory in chapter1-preprocessing when trying to calculate durations

Hey

I am not sure what the expected behaviour is, and whether it's my mistake, an error in the course, or an error in the dataset used, but I noticed the following in Chapter 1 - Preprocessing:

When I follow the course and try to execute

# use librosa to get example's duration from the audio file
new_column = [librosa.get_duration(path=x) for x in minds["path"]]

it will fail because the path, or x in the code snippet, looks something like /storage/hf-datasets-cache/all/datasets/27907695716030-config-parquet-and-info-PolyAI-minds14-941a5af2/downloads/extracted/a87e442545495cdb67dfdcbc9d4f35d234c9f8e471449b2db58d7c81b62f001a/en-AU~PAY_BILL/response_4.wav (which is the exact content as provided by the unmodified dataset as can be seen on the datasets page but does not exist on my machine).

Am I using the load_dataset function in the wrong way? Do I have to specify a path to explicitly save or cache the data somewhere? Is there a way to automatically replace the 'path' value in the dataset with the local path on my machine?

Alternatively, one could change the function call of librosa.get_duration(path=x) and pass the audio array and the sampling_rate instead, e.g.

new_column = [librosa.get_duration(y=x["array"], sr=x["sampling_rate"]) for x in minds["audio"]]

Translation to Russian

Hi there 👋

Let's translate the course to Russian so that the whole community can benefit from this resource 🌎!

Below are the chapters and files that need translating - let us know here if you'd like to translate any and we'll add your name to the list. Once you're finished, open a pull request and tag this issue by including #issue-number in the description, where issue-number is the number of this issue.

🙋 If you'd like others to help you with the translation, you can also post in our forums or tag @_lewtun on Twitter to gain some visibility.

Chapters

UNIT 0. Welcome to the course!

UNIT 1. Working with audio data

UNIT 2. A gentle introduction to audio applications

UNIT 3. Transformer architectures for audio

UNIT 4. Build a music genre classifier

UNIT 5. Automatic speech recognition

UNIT 6. From text to speech

UNIT 7. Putting it all together

UNIT 8. Finish line

Course Events

Adding extra material:

Translation to Simplified Chinese

Hi there 👋

Let's translate the course to Simplified Chinese so that the whole community can benefit from this resource 🌎!

Below are the chapters and files that need translating - let us know here if you'd like to translate any and we'll add your name to the list. Once you're finished, open a pull request and tag this issue by including #issue-number in the description, where issue-number is the number of this issue.

🙋 If you'd like others to help you with the translation, you can also post in our forums or tag @_lewtun on Twitter to gain some visibility.

Chapters

Unit 0: Welcome to the course!

Unit 1: Working with audio data

Unit 2: A gentle introduction to audio applications

Unit 3: Transformer architectures for audio

Unit 4: Build a music genre classifier

Unit 5: Automatic speech recognition

Unit 6: From text to speech

Unit 7: Pulling it all together

Unit 8: Finish line

Course Events

Spaces not building: runtime error, streams could not be allocated 404

For the Unit 7 task (speech-to-speech translation), I tried duplicating the given Space, which gave me a runtime error (could not allocate streams 404, hardware not available). I then created a new Space (/nimrita/speech-to-speech-translation-MMS1) from scratch, and it too fails to build with the same errors. Please help.

Marvin

When doing "Putting it all together", I ran into a problem: when I ran the wake-up code, the for loop got skipped for some reason.

[Kaggle Notebooks] Create Kaggle Notebooks for course units

Hi there 👋

Many course participants faced issues working through the course materials and exercises on the free tier of Google Colab. An alternative is Kaggle Notebooks (it provides a fixed amount of GPU hours, but the experience is consistent).
However, there are some differences in working with Kaggle notebooks, such as pushing models to the Hub and setting up your environment.

How can you help?

  • Write a short tutorial illustrating the extra steps needed to run the course examples and exercises successfully in Kaggle Notebooks.
  • Once done, tag @MKhalusova and @Vaibhavs10 in the comments. We can review and suggest changes if required.

Thank you for deciding to volunteer your time and experience with the course.
