
vijaypurohit / speech-processing-project-playlist


Project Playlist, developed for the subject Speech Processing (CS 566) at IITG.

C++ 21.08% C 78.92%
speech-processing speech-recognition hmm iitg playlist

speech-processing-project-playlist's Introduction

Speech Processing CS 566

PROJECT (SPEECH BASED PLAYLIST)

Roll No: 214101058 MTech CSE'23 IITG | Vijay Purohit


Input: 
	*	input_lamda/* 		= Contains: PRESENTLY USED "Lambda Model for Each Digit"
		**	      		= Contains: Generated "Universe.csv" of Cepstral Coefficients of Training Files
		**	      		= Contains: "Codebook" file to be read.
	*	input_live_voice_data/*	= Contains: Live Recording Files, 
						Their Observation Sequences, 
						Their Test Result Using Model (default model or newly converged model)
					 (Can Clean These Files Except the Folders Present)
		**	input_live_voice_data/TRAINING	= Contains: Digit Recordings generated Using Application for Training Purpose
		**	input_live_voice_data/TESTING	= Contains: Digit Recordings generated Using Application for Testing Purpose
		*** 	In the end, replace appropriately using the menu. (Why replace the utterance files? Because the default folder for the observation sequence is input_voice.)
				input_live_voice_data/TRAINING/* --> input_voice_training_data/*
				input_live_voice_data/TESTING/* --> input_voice_testing_data/*
	*	input_voice_training_data/*	= Contains: Training Utterance Recordings for Input into model
	*	input_voice_testing_data/*	= Contains: Test Utterance Recordings for Input into model
	* 	RecordingModule/*		= Contains: Recording Module Files
	* 	SONGS/*				= Contains: SONGS for each language (to be uploaded manually by user in their respective folder)

Output:  
	*	output/*		= Contains:	Result Analysis of Converged Model for Each Digit.
		**	output/Models/*	= Contains: 	Newly Generated Model using Input Training Files provided.
	*	output_voice_recordings_analysis_files	= Contains: Recording Analysis files which 
							shows Frames used, Samples used, STE Marker, Cepstral Coefficients etc
							For Files of Input Training, Input Testing, Live Recordings
	*	output_voice_recordings_normalised_segregated	= Contains: Segregated Speech Part using Start and End Marker
							For Files of Input Training, Input Testing, Live Recordings
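
The analysis files above list the frames used and an STE marker. Assuming STE stands for short-term energy, the idea can be sketched as follows. This is only an illustration: shortTermEnergy, findSpeechMarkers, SpeechMarkers, frameSize and threshold are hypothetical names, and the project may use a different frame length, overlapping frames, or a different thresholding rule.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Short-term energy of each non-overlapping frame: the mean of the squared
// sample values. frameSize is a hypothetical parameter.
std::vector<double> shortTermEnergy(const std::vector<double>& samples,
                                    std::size_t frameSize) {
    assert(frameSize > 0);
    std::vector<double> energy;
    for (std::size_t start = 0; start + frameSize <= samples.size(); start += frameSize) {
        double sum = 0.0;
        for (std::size_t i = 0; i < frameSize; ++i) {
            sum += samples[start + i] * samples[start + i];
        }
        energy.push_back(sum / static_cast<double>(frameSize));
    }
    return energy;
}

// Hypothetical marker pair: indices of the first and last frame whose energy
// exceeds the threshold. found stays false when no frame qualifies.
struct SpeechMarkers {
    bool found;
    std::size_t startFrame;
    std::size_t endFrame;
};

SpeechMarkers findSpeechMarkers(const std::vector<double>& energy, double threshold) {
    SpeechMarkers m = {false, 0, 0};
    for (std::size_t f = 0; f < energy.size(); ++f) {
        if (energy[f] > threshold) {
            if (!m.found) {
                m.found = true;
                m.startFrame = f;
            }
            m.endFrame = f;
        }
    }
    return m;
}
```

The samples between startFrame and endFrame would then be the segregated speech part. Any trailing samples that do not fill a whole frame are ignored in this sketch.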
Debug Variables:	
	* 	segregate_speech :	True: segregate speech data with respect to the start and end markers into its output folder (output_voice_recordings_normalised_segregated).
	*	segregate_Live_speech :	True: segregate Live Recording data with respect to the start and end markers into its output folder (output_voice_recordings_normalised_segregated).
	*	showCoefficientsInFile :	True: show the coefficient values (R, A, C) of each frame in its analysis files (output_voice_recordings_analysis_files).
	*	showAlphaBetaPstarInConsole :	True: show alpha and beta probabilities in the console for each observation sequence (also saved in files in output/).
	*	showStateSeqAlphaBetaInFileForEachObsAfterConverge :	True: save each utterance's alpha and beta probabilities and state sequence in a file in output/.
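
The alpha probabilities mentioned above are presumably the forward variables of the HMM. A minimal sketch of the forward procedure for a discrete HMM is given below. The names forwardProbability, Row and Matrix are hypothetical and are not taken from the project sources, and the sketch omits the scaling or log-domain arithmetic that long observation sequences usually need to avoid underflow.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef std::vector<double> Row;
typedef std::vector<Row> Matrix;

// Forward procedure for a discrete HMM lambda = (A, B, pi).
// A[i][j]: transition probability from state i to state j.
// B[j][k]: probability of emitting symbol k in state j.
// pi[i]:   initial probability of state i.
// obs:     observation sequence of 0-based symbol indices.
// Returns P(O | lambda) and fills alpha with alpha[t][i].
double forwardProbability(const Matrix& A, const Matrix& B, const Row& pi,
                          const std::vector<int>& obs, Matrix& alpha) {
    const std::size_t N = pi.size();
    const std::size_t T = obs.size();
    assert(T > 0 && A.size() == N && B.size() == N);
    alpha.assign(T, Row(N, 0.0));

    // Initialisation: alpha_1(i) = pi_i * b_i(O_1)
    for (std::size_t i = 0; i < N; ++i) {
        alpha[0][i] = pi[i] * B[i][obs[0]];
    }
    // Induction: alpha_{t+1}(j) = [sum_i alpha_t(i) * a_ij] * b_j(O_{t+1})
    for (std::size_t t = 1; t < T; ++t) {
        for (std::size_t j = 0; j < N; ++j) {
            double sum = 0.0;
            for (std::size_t i = 0; i < N; ++i) {
                sum += alpha[t - 1][i] * A[i][j];
            }
            alpha[t][j] = sum * B[j][obs[t]];
        }
    }
    // Termination: P(O | lambda) = sum_i alpha_T(i)
    double prob = 0.0;
    for (std::size_t i = 0; i < N; ++i) {
        prob += alpha[T - 1][i];
    }
    return prob;
}
```

For digit recognition, the same observation sequence would be scored against the model of each digit, and the digit whose model gives the highest probability would be reported.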

FILES:
  • main_hmm.cpp = Main File Contains Menu for interactive session
  • hmm_testing.h = Contains HMM offline and live testing functions
  • observation_sequence.h = Contains Observation Sequence Generation Functions, calculating coefficients, markers etc. for preprocessing of speech.
  • hmm_solutions.h = Contains functions for the solutions of problems one, two and three for HMM.
  • hmm_record.h = contains functions for recording the utterances.
  • hmm_playlist.h = playlist menu and contains functions for showing playlist and playing songs.
  • WndMainPlayList.h = Microsoft Form for GUI of Playlist. Contains functions for GUI handling
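
Problem two of the HMM (finding the single best state sequence and its probability P*) is usually solved with the Viterbi algorithm. The sketch below shows that algorithm for a discrete HMM. It is not copied from hmm_solutions.h: the function name viterbi and the Row and Matrix typedefs are hypothetical, and plain probabilities are used instead of logarithms to keep it short.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef std::vector<double> Row;
typedef std::vector<Row> Matrix;

// Viterbi decoding for a discrete HMM lambda = (A, B, pi).
// Returns P*, the probability of the single best state path, and writes that
// path (0-based state indices) into bestPath.
double viterbi(const Matrix& A, const Matrix& B, const Row& pi,
               const std::vector<int>& obs, std::vector<int>& bestPath) {
    const std::size_t N = pi.size();
    const std::size_t T = obs.size();
    assert(T > 0 && A.size() == N && B.size() == N);
    Matrix delta(T, Row(N, 0.0));
    std::vector<std::vector<int> > psi(T, std::vector<int>(N, 0));

    // Initialisation: delta_1(i) = pi_i * b_i(O_1)
    for (std::size_t i = 0; i < N; ++i) {
        delta[0][i] = pi[i] * B[i][obs[0]];
    }
    // Recursion: keep the best predecessor of every state at every time step.
    for (std::size_t t = 1; t < T; ++t) {
        for (std::size_t j = 0; j < N; ++j) {
            double best = -1.0;
            int arg = 0;
            for (std::size_t i = 0; i < N; ++i) {
                const double v = delta[t - 1][i] * A[i][j];
                if (v > best) {
                    best = v;
                    arg = static_cast<int>(i);
                }
            }
            delta[t][j] = best * B[j][obs[t]];
            psi[t][j] = arg;
        }
    }
    // Termination: pick the most probable final state.
    double pStar = -1.0;
    int last = 0;
    for (std::size_t i = 0; i < N; ++i) {
        if (delta[T - 1][i] > pStar) {
            pStar = delta[T - 1][i];
            last = static_cast<int>(i);
        }
    }
    // Backtracking: follow the stored predecessors from the end to the start.
    bestPath.assign(T, 0);
    bestPath[T - 1] = last;
    for (std::size_t t = T - 1; t > 0; --t) {
        bestPath[t - 1] = psi[t][bestPath[t]];
    }
    return pStar;
}
```
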
If the error "PlaySound() is not identified" appears, do the following:
  • Right Click Project Name in Solution Explorer
  • Select Properties --> Linker --> Input
  • Select Additional Dependencies --> Edit
  • Add the name "winmm.lib"
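
As an alternative to editing the project settings, the Microsoft compiler also accepts a linker request in the source file itself. The sketch below shows this with a small wrapper around PlaySound. The wrapper name playWavFile is hypothetical and not part of the project, and the non-Windows branch is only a stub so that the file still compiles elsewhere.

```cpp
#include <cassert>
#include <string>

#ifdef _WIN32
#include <windows.h>
#include <mmsystem.h>
// MSVC-specific: asks the linker to pull in winmm.lib, which provides PlaySound().
#pragma comment(lib, "winmm.lib")
#endif

// Plays a WAV file and blocks until playback finishes.
// Returns false if playback could not be started, and always on non-Windows builds.
bool playWavFile(const std::string& path) {
    assert(!path.empty());
#ifdef _WIN32
    // SND_NODEFAULT: do not fall back to the default system sound if the file is missing.
    return PlaySoundA(path.c_str(), NULL,
                      SND_FILENAME | SND_SYNC | SND_NODEFAULT) != FALSE;
#else
    return false;  // PlaySound() is a Windows API; nothing to play here.
#endif
}
```

The pragma has the same effect as adding winmm.lib under Additional Dependencies, so only one of the two is needed.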

Instructions to execute Code.

  1. Open it in Visual Studio 2010.
  2. Compile it and Run. GUI window will be shown along with console.
    • Interact With Menu
      • Output will be shown on the Console.
      • Detailed Output *.txt will be present in their respective folder.
  3. Take care to generate the respective observation sequence (Training/Testing) before converging or testing.
  4. Please add 5 songs of the playlist to the respective folder within SONGS.

DOC FOLDER


THE END.
