speech_course's Introduction

YSDA Speech Processing Course

Materials for each week are in ./week* folders

Course program

Week 1: Slides | Lecture | Seminar
- Lecture: Intro to Digital Signal Processing (DSP)
- Seminar: Implement DSP pipeline
Week 2: Slides | Lecture | Seminar
- Lecture: Introduction to speech NN discriminative models. Voice Activity Detection (VAD) and Sound Event Detection (SED) tasks
- Seminar: Train VAD models
- Homework: Train SED models
Week 3: Slides | Lecture | Seminar
- Lecture: Keyword Spotting and Speech Biometrics tasks
- Seminar: Train Biometrics model and look at embeddings
- Homework: Train Biometrics model to better quality
Week 4: Slides | Lecture | Seminar
- Lecture: Speech Recognition I
- Seminar: Metrics and augmentations for speech recognition
- Homework: Implement CTC algorithm
Week 5: Slides | Lecture
- Lecture: Speech Recognition II, Pretraining
- Homework: Finetune Wav2Vec2
Week 6: Slides | Lecture
- Lecture: Text-to-Speech I, intro, preprocessor, metrics
Week 7: Slides | Lecture
- Lecture: Text-to-Speech II, Acoustic models
- Seminar: Pitch estimation, Monotonic Alignment Search for phoneme duration estimation
- Homework: Train FastPitch model
Week 8: Slides, p1 | Lecture, p1 | Slides, p2 | Lecture, p2 | Seminar
- Lecture, p1: Text-to-Speech III, Vocoding
- Lecture, p2: Vector Quantization, Codecs
- Seminar: Vector Quantization, Residual Vector Quantization
Week 9: Slides | Lecture, p1 | Lecture, p2
- Lecture: Transformers for TTS
- Homework: Write inference for a pre-trained transformer
Week 10: Slides | Lecture | Seminar
- Lecture: Noise reduction
- Seminar: Streaming STFT and ISTFT
- Homework: Noise reduction model implementation
Week 11: Slides | Lecture
- Lecture: Acoustic Echo Cancellation (AEC) and Beamforming
Week 12: Slides | Lecture | Seminar
- Lecture: ASR Inference
- Seminar: Streaming ASR
Week 13: Slides | Lecture
- Lecture: Flow-based TTS + Voice Conversion

Contributors & course staff

Current:

Alex Rak - VAD, spotter, biometry
Mikhail Andreev - ASR
Stepan Kargaltsev - ASR
Evgeniia Elistratova - TTS
Roman Kail - TTS
Vladimir Platonov - TTS
Evgenii Shabalin - TTS
Ravil Khisamov - VQE

Previous iteration:

Andrey Malinin - Course admin, lectures, seminars, homeworks
Vladimir Kirichenko - lectures, seminars, homeworks
Segey Dukanov - lectures, seminars, homeworks

speech_course's Issues

Problems in week_10_vqe_noise_reduction/homework.ipynb

Known problems:

SNR definition:

"Given a ground truth signal ... and its estimate ..., we define noise as ... . Slightly abusing notation we get: "
In the math expression the numerator and the denominator should be swapped.
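For reference, the conventional form of the definition, with signal energy in the numerator, can be sketched as follows. The symbols are assumptions, since the notebook's own notation is elided above: $s$ is the ground truth signal, $\hat{s}$ its estimate, and $n$ the noise.

```latex
n = \hat{s} - s, \qquad
\mathrm{SNR}(s, \hat{s}) = 10 \log_{10} \frac{\lVert s \rVert_2^2}{\lVert n \rVert_2^2}
                         = 10 \log_{10} \frac{\lVert s \rVert_2^2}{\lVert \hat{s} - s \rVert_2^2}
```

With this orientation, a better estimate (smaller $\lVert n \rVert$) gives a higher SNR, which matches the swap described above.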

`from vqe.data.mixing import RandomMixtureSampler`

Just remove this line. It is an artifact of testing, which I forgot to remove.

class RandomMixtureSampler, method `call`:

        # input_signal and mic_signal should be multiplied by the same factor to match each other
        mult_signal = normalize_to_rms(
            signal_target, self.normalization_rms_db
        )

This snippet is wrong. It is supposed to calculate the multiplication factor here, which is why the variable is called `mult_signal`.
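A minimal sketch of what computing the factor could look like, assuming the RMS level is given in dB relative to an amplitude of 1.0. The helper `rms_gain_factor` and the sample signals are hypothetical and are not part of the notebook; the actual behavior of `normalize_to_rms` is not shown in this issue.

```python
import numpy as np


def rms_gain_factor(signal: np.ndarray, target_rms_db: float, eps: float = 1e-12) -> float:
    """Return the scalar gain that brings `signal` to `target_rms_db`.

    Assumption: the level is expressed in dB relative to an amplitude of 1.0.
    """
    rms = np.sqrt(np.mean(np.square(signal)))
    target_rms = 10.0 ** (target_rms_db / 20.0)
    # eps guards against division by zero for an all-zero signal
    return float(target_rms / (rms + eps))


# Hypothetical data standing in for the target and microphone signals.
rng = np.random.default_rng(0)
signal_target = 0.05 * rng.standard_normal(16000)
signal_mic = signal_target + 0.01 * rng.standard_normal(16000)

# Compute the factor once from the target, then apply it to both signals
# so that they stay matched to each other.
mult_signal = rms_gain_factor(signal_target, target_rms_db=-25.0)
scaled_target = signal_target * mult_signal
scaled_mic = signal_mic * mult_signal
```

Computing a single scalar and reusing it keeps the relative level between the two signals unchanged, which is what the comment in the original snippet asks for.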