GithubHelp home page GithubHelp logo

distant_conversational_asr_and_analysis's Introduction

[ICASSP 2021 Tutorial] Distant conversational speech recognition and analysis: Recent advances, and trends towards end-to-end optimization

This repository contains a set of materials used for the ICASSP2021 Tutorial T9 "Distant conversational speech recognition and analysis: Recent advances, and trends towards end-to-end optimization" presented by Keisuke Kinoshita, Yusuke Fujita, Naoyuki Kanda, Shinji Watanabe.

Slides

PDF slides (latest)

Abstract

Recognizing unsegmented conversational speech recorded with distant microphone(s) is a challenging but an essential task to be solved to unfold a myriad of new speech applications, such as a communication agent that can understand, respond to and facilitate our conversation. This task contains a number of subtasks, which has been studied rather independently for a decade, such as multichannel/single-channel source separation, speaker diarization with source number counting, and conversational speech recognition. This tutorial first revisits, with demonstration, current state-of-the-art systems for this task, which were developed for challenges such as CHiME 5-6 challenges, and commercial products. These systems typically consist of a combination of well-established independently optimized modules. While these systems are designed carefully to consolidate these independent modules, there is still a large room for improvement. In the latter part of the tutorial, we introduce a recent new research trend that aims to establish an optimal joint neural system that solves those subtasks all together, through end-to-end optimization based on common integrated objective. By showing the potential of such jointly-optimal systems that now start outperforming previous top-performing systems in many tasks, we discuss the future directions and challenges for this task from both industry and academic perspectives.

Tutorial Presentors

  • Keisuke Kinoshita (NTT Corporation, Japan) (Email)
  • Yusuke Fujita (Line Corporation, Japan)
  • Naoyuki Kanda (Microsoft, USA)
  • Shinji Watanabe (Carnegie Mellon University, USA)

Changelog

1.0.0 / 2021-06-07

  • First public release

distant_conversational_asr_and_analysis's People

Contributors

icassp2021-tutorial9 avatar keisukekino avatar sw005320 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.