miteshputhran / speech-emotion-analyzer

The neural network model is capable of detecting five different male/female emotions from audio speech. (Deep Learning, NLP, Python)

License: MIT License

Jupyter Notebook 100.00%
Topics: emotion, python3, deep-learning, neural-network, data-science, deep-neural-networks, speech, voice, audio-files, natural-language-processing

speech-emotion-analyzer's Issues

Where is the RawData file?

If you are posting code, please make sure everything works correctly; otherwise don't post it.
You are wasting the time of coders who are genuinely interested in speech analysis.
Please recheck all the files and update them as soon as possible.

Thank you

Can't open file with scipy.io.wavfile

Currently I am trying to reproduce the steps in your final_results_gender_test notebook, but I ran into an issue: I cannot load a file from RAVDESS.
Here is my error:


ValueError                                Traceback (most recent call last)
<ipython-input> in <module>
      5
      6
----> 7 sr, x = scipy.io.wavfile.read(wav_file)
      8
      9 ## Parameters: 10ms step, 30ms window

~/tmp/deepspeech-venv/lib/python3.7/site-packages/scipy/io/wavfile.py in read(filename, mmap)
    246     raise ValueError("Unexpected end of file.")
    247 elif len(chunk_id) < 4:
--> 248     raise ValueError("Incomplete wav chunk.")
    249
    250 if chunk_id == b'fmt ':

ValueError: Incomplete wav chunk.
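
Not an official fix, but one workaround that often helps with WAV files scipy rejects is to decode them with librosa, which the notebook already imports and which tolerates non-canonical chunks; wav_file below stands for whatever path was passed to scipy.io.wavfile.read:

    import librosa

    # librosa returns (samples, sample_rate), the reverse of scipy's (sr, x) order,
    # and sr=None keeps the file's original sample rate instead of resampling.
    x, sr = librosa.load(wav_file, sr=None)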

live demo

Hello!
I'm getting this error when I try to run the live demo.
Can you please help me with this? Thank you.


NameError                                 Traceback (most recent call last)
<ipython-input> in <module>
----> 1 livepreds = loaded_model.predict(twodim,
      2                                  batch_size=32,
      3                                  verbose=1)

NameError: name 'loaded_model' is not defined
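
For context, loaded_model is created earlier in the notebook by reading the saved architecture and weights; a minimal sketch of that step (file names as they appear in this repository) is:

    from keras.models import model_from_json

    # Rebuild the architecture from model.json, then attach the trained weights,
    # so that loaded_model exists before the predict() cell is run.
    with open('model.json', 'r') as json_file:
        loaded_model = model_from_json(json_file.read())
    loaded_model.load_weights('saved_models/Emotion_Voice_Detection_Model.h5')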

Number of MFCC features

The number of MFCC features is 214 for Surprised, Neutral and Disgust, but 216 for Happy, Sad and Angry, even for audio files of the same duration. Can you please explain?
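
The frame count returned by librosa.feature.mfcc depends on the exact number of samples librosa.load yields, so clips of nominally equal duration can give 214 or 216 values. A hedged workaround, not part of the original notebook, is to pad or truncate each feature vector to a fixed length before feeding the model:

    import numpy as np

    TARGET_LEN = 216  # assumed feature length the model was trained on

    def pad_or_truncate(features, target_len=TARGET_LEN):
        """Zero-pad or cut a 1-D MFCC feature vector to a fixed length."""
        if len(features) < target_len:
            return np.pad(features, (0, target_len - len(features)), mode='constant')
        return features[:target_len]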

Loading and Testing

The model was imported without problems, but the LabelEncoder did not work. The following code:

# added in cell 496
from sklearn.preprocessing import LabelEncoder
lb = LabelEncoder()
livepredictions = lb.inverse_transform(liveabc)
livepredictions

throws the error:

This LabelEncoder instance is not fitted yet. Call 'fit' with appropriate arguments....

If you could help with this, it will be of great help.

PS: I started by running the import cells (1, 2, 3).
Then I added
opt = keras.optimizers.rmsprop(lr=0.00001, decay=1e-6)
to cell 137, since opt was not defined, and then executed all the blocks in the Demo section.
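
A likely cause (my reading, not confirmed by the author) is that inverse_transform is called on a fresh LabelEncoder that was never fitted; the encoder only knows the class names after being fitted on the training labels (trainlabel in the notebook), for example:

    import numpy as np
    from sklearn.preprocessing import LabelEncoder

    # Fit the encoder on the original string labels before decoding predictions.
    lb = LabelEncoder()
    lb.fit(np.ravel(trainlabel))
    livepredictions = lb.inverse_transform(liveabc)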

About paper

Does this experiment have a corresponding paper?

Inference code?

Hi,
Thanks for the nice work. I was trying to use your model just for inference. I looked at the notebook and copied the parts needed for inference, but I get the error "This LabelEncoder instance is not fitted yet." Can you help me figure out what is missing in this code?

import os

from keras import regularizers
import keras
from keras.callbacks import ModelCheckpoint
from keras.layers import Conv1D, MaxPooling1D, AveragePooling1D, Dense, Embedding, Input, Flatten, Dropout, Activation, LSTM
from keras.models import Model, Sequential, model_from_json
from keras.preprocessing import sequence
from keras.preprocessing.sequence import pad_sequences
from keras.preprocessing.text import Tokenizer
from keras.utils import to_categorical
import librosa
import librosa.display
from matplotlib.pyplot import specgram
from sklearn.metrics import confusion_matrix
from sklearn.preprocessing import LabelEncoder

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import tensorflow as tf


opt = keras.optimizers.rmsprop(lr=0.00001, decay=1e-6)
lb = LabelEncoder()


json_file = open('model.json', 'r')
loaded_model_json = json_file.read()
json_file.close()
loaded_model = model_from_json(loaded_model_json)
# load weights into new model
loaded_model.load_weights("saved_models/Emotion_Voice_Detection_Model.h5")
print("Loaded model from disk")
 
X, sample_rate = librosa.load('h04.wav', res_type='kaiser_fast',duration=2.5,sr=22050*2,offset=0.5)
sample_rate = np.array(sample_rate)
mfccs = np.mean(librosa.feature.mfcc(y=X, sr=sample_rate, n_mfcc=13),axis=0)
featurelive = mfccs
livedf2 = featurelive
livedf2= pd.DataFrame(data=livedf2)
livedf2 = livedf2.stack().to_frame().T
twodim= np.expand_dims(livedf2, axis=2)
livepreds = loaded_model.predict(twodim, batch_size=32, verbose=1)

livepreds1=livepreds.argmax(axis=1)
liveabc = livepreds1.astype(int).flatten()
livepredictions = (lb.inverse_transform((liveabc)))
print(livepredictions)
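
What appears to be missing is that lb is never fitted in this standalone script, so inverse_transform has no classes to map back to. If you do not want to rebuild the training labels, one hedged option is to restore the class order by hand; the list below is the order suggested in the "Recommendations for Replicability" issue further down this page and is not verified against the original notebook:

    lb = LabelEncoder()
    # Assumed class order; verify it against lb.classes_ printed after training.
    lb.classes_ = np.array(['female_angry', 'female_calm', 'female_fearful',
                            'female_happy', 'female_sad', 'male_angry',
                            'male_calm', 'male_fearful', 'male_happy', 'male_sad'])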

Training from scratch doesn't reach the same loss

Hey, thanks a lot for the release. I've tried training the model from scratch using the datasets, but I can't reach the same validation loss. I noticed that the pre-trained network in the repo has two more convolutional layers compared to the code in the notebook, but adding them back doesn't help either.

Did you use any additional tricks for training?

For reference, above is what I see, below is what you have in the dataset:

[screenshot comparing the two training loss curves]

Regarding the dataset

I really liked your work, but I am very confused about how you combined the two datasets. Can you please share the whole dataset? It would save me a lot of time and make it easier to figure things out, as I am a beginner. Thanks a lot! :)

ValueError: Shapes (None, 4) and (None, 10) are incompatible

In the model training part, at the line:
cnnhistory=model.fit(x_traincnn, y_train, batch_size=16, epochs=700, validation_data=(x_testcnn, y_test))

I am getting the following error:
ValueError: Shapes (None, 4) and (None, 10) are incompatible

Would anyone kindly tell me why I am getting this and how to solve it?
Thanks in advance!
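
This usually means the width of the one-hot encoded labels does not match the size of the model's output layer: here the labels have 4 columns while the final Dense layer has 10 units (or vice versa). A quick hedged check, using the variable names from the notebook:

    # The one-hot label width and the model's output size must match.
    print('y_train columns:', y_train.shape[1])
    print('model output units:', model.output_shape[-1])

    # If they differ, either rebuild the last Dense layer with y_train.shape[1] units
    # or make sure every expected emotion/gender label survives the train/test split.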

Accuracy problem

Hi Mitesh,
I'm trying to obtain the 70% accuracy you got, but I'm only getting 35%. Could you please tell me which database you used and send me the exact code that reaches 70%?
Thank you very much.
My email is [email protected]

My val_accuracy is only around 40%

I use two of the datasets, RAVDESS and SAVEE: not 2000 files, but 1920.
I tried to train a new model, but only got "942/942 [==============================] - 1s 2ms/step - loss: 0.2656 - accuracy: 0.9013 - val_loss: 1.9338 - val_accuracy: 0.4341".
You can see that my training data has 942 items, not 1378; that seems to be the problem.

Error on JupyterLab

Hello,

I am running final_results_gender_test.ipynb on Amazon JupyterLab; the line

mylist= os.listdir('RawData/')

gives error

FileNotFoundError: [Errno 2] No such file or directory: 'RawData/'

How is this RawData folder supposed to appear on the kernel? Is there any precondition to running this notebook? Thanks.

Ivaylo
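
The notebook expects a flat RawData/ directory of .wav files next to it; it is not created automatically. A hedged sketch of how it might be assembled from a downloaded RAVDESS archive (the source path is a placeholder, and the exact file renaming the author used is discussed in other issues on this page):

    import os
    import shutil

    src_root = 'ravdess'      # placeholder: folder containing Actor_01 ... Actor_24
    dst = 'RawData'
    os.makedirs(dst, exist_ok=True)

    # Copy every .wav from the per-actor subfolders into one flat directory.
    for root, _dirs, files in os.walk(src_root):
        for name in files:
            if name.endswith('.wav'):
                shutil.copy(os.path.join(root, name), os.path.join(dst, name))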

Stratified label splits

The current label splits are not stratified. This can cause some labels to be missing from the train or test set, which leads to errors when training the model. Please replace the following code with the code further down:

newdf1 = np.random.rand(len(rnewdf)) < 0.8

train = rnewdf[newdf1]
test = rnewdf[~newdf1]

trainfeatures = train.iloc[:, :-1]
trainlabel = train.iloc[:, -1:]
testfeatures = test.iloc[:, :-1]
testlabel = test.iloc[:, -1:]


from sklearn.model_selection import StratifiedShuffleSplit
X = rnewdf.iloc[:, :-1]
y = rnewdf.iloc[:, -1:]

def dataSplitting(X, y):
    """Returns training and test set matrices/vectors for X and y"""
    sss = StratifiedShuffleSplit(n_splits=1, test_size=0.2)
    sss.get_n_splits(X, y)
    for train_index, test_index in sss.split(X, y):
        X_train, X_test = X.iloc[train_index], X.iloc[test_index]
        y_train, y_test = y.iloc[train_index], y.iloc[test_index]

    return X_train, X_test, y_train, y_test

trainfeatures, testfeatures, trainlabel, testlabel = dataSplitting(X, y)

Using this code ensures that all labels are represented proportionally during training, avoiding the error where a random selection leaves out some labels and the one-hot encoding of the categorical variable no longer produces an output layer of size 10.
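
For what it's worth, the same stratified split can be written more compactly with train_test_split; this is just an equivalent sketch, not part of the proposed change:

    from sklearn.model_selection import train_test_split

    # stratify=y keeps the label proportions identical in the train and test sets.
    trainfeatures, testfeatures, trainlabel, testlabel = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=1)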

Getting an error on the window code

ValueError                                Traceback (most recent call last)
<ipython-input> in <module>()
     25 print(np.transpose(window).shape)
     26 print(xseg.shape)
---> 27 z = np.fft.fft(window * xseg, nfft)
     28 X[i,:] = np.log(np.abs(z[:nfft//2]))
     29

ValueError: operands could not be broadcast together with shapes (1323,) (1323,2)
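
The (1323, 2) shape suggests the WAV file is stereo while the window is one-dimensional. A hedged fix is to mix the signal down to mono right after reading it, before any windowing:

    # scipy.io.wavfile.read returns shape (n_samples, 2) for stereo files;
    # averaging the channels makes the signal broadcast against the 1-D window.
    if x.ndim > 1:
        x = x.mean(axis=1)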

error

from keras.utils import np_utils
from sklearn.preprocessing import LabelEncoder

X_train = np.array(trainfeatures)
y_train = np.array(trainlabel)
X_test = np.array(testfeatures)
y_test = np.array(testlabel)

lb = LabelEncoder()

y_train = np_utils.to_categorical(lb.fit_transform(y_train))
y_test = np_utils.to_categorical(lb.fit_transform(y_test))

permission error 13

I can't run the code.
Error:
C:\Speech-Emotion-Analyzer>python train.py
Using TensorFlow backend.
Traceback (most recent call last):
  File "train.py", line 102, in <module>
    X, sample_rate = librosa.load('data/'+y)
  File "C:\Python35\lib\site-packages\librosa\core\audio.py", line 112, in load
    with audioread.audio_open(os.path.realpath(path)) as input_file:
  File "C:\Python35\lib\site-packages\audioread\__init__.py", line 80, in audio_open
    return rawread.RawAudioFile(path)
  File "C:\Python35\lib\site-packages\audioread\rawread.py", line 61, in __init__
    self._fh = open(filename, 'rb')
PermissionError: [Errno 13] Permission denied: 'C:\Speech-Emotion-Analyzer\data\Actor_01'
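
The path in the error is a directory (Actor_01), not a file, so librosa/audioread is being asked to open a folder. A hedged sketch that walks into the actor subfolders and only passes .wav files to librosa (data_root is a placeholder for the folder used in train.py):

    import os
    import librosa

    data_root = 'data'
    for root, _dirs, files in os.walk(data_root):
        for name in files:
            if name.endswith('.wav'):
                # Only real audio files reach librosa.load, never directories.
                X, sample_rate = librosa.load(os.path.join(root, name))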

RawData files

I want to understand what dataset you used in the RawData folder.

dataset error

Where can I find the dataset of audio files? I am not able to find the audio-file dataset for this project.

Dataset question

Hi, thanks for the work. I have a question about the number of samples. After I filter out all the data (RAVDESS and SAVEE), I only have 1200 samples, with 960 of them for training, and the final accuracy is around 0.5. I see that your number of training samples is 1378 (X_train.shape is 1378x216). I wonder what I did wrong.

Predicting longer audio files

Hello, I want to ask how I can predict audio files longer than 2.5 seconds; I don't want to predict only part of the audio.
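
One common (unofficial) approach is to slide a 2.5-second window over the longer recording, run the model on each chunk, and aggregate the per-chunk predictions, for example by averaging the softmax outputs. A hedged sketch, assuming the model was trained on 2.5 s clips at sr=22050*2 as in the notebook:

    import numpy as np
    import librosa

    def predict_long(path, model, window_s=2.5, hop_s=1.25, sr=44100):
        """Hypothetical helper: average the model's predictions over sliding windows."""
        y, _ = librosa.load(path, sr=sr)
        win, hop = int(window_s * sr), int(hop_s * sr)
        if len(y) < win:                          # pad short files up to one full window
            y = np.pad(y, (0, win - len(y)))
        preds = []
        for start in range(0, len(y) - win + 1, hop):
            chunk = y[start:start + win]
            mfccs = np.mean(librosa.feature.mfcc(y=chunk, sr=sr, n_mfcc=13), axis=0)
            x = np.expand_dims(np.expand_dims(mfccs, axis=0), axis=2)
            preds.append(model.predict(x, batch_size=32, verbose=0))
        return np.mean(np.vstack(preds), axis=0)  # averaged class probabilities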

Issue in scoring on different .wav sound files

Whenever I score the model on a different set of .wav files, extracting the features with the same parameters gives me a different number of features. How can I overcome this problem?

Loading and testing the model requires lb.fit_transform()?

Hello,

I am trying to use your model to test live audio recorded as 'output10.wav'. I get the following error for livepredictions = (lb.inverse_transform((liveabc))):

This LabelEncoder instance is not fitted yet. Call 'fit' with appropriate arguments before using this method.

Do I need to run lb.fit_transform(y_train) and y_test before I can run your model?
Is there any other way to test the model without first extracting all the features from the dataset?

Traceback (most recent call last):
  File "C:\Users\BhargaviiNadendla\Documents\GitHub\Speech-Emotion-Analyzer\load.py", line 73, in <module>
    livepredictions = (lb.inverse_transform((liveabc)))
  File "D:\Anaconda3\envs\Speech-Emotion-Analyzer\lib\site-packages\sklearn\preprocessing\label.py", line 272, in inverse_transform
    check_is_fitted(self, 'classes_')
  File "D:\Anaconda3\envs\Speech-Emotion-Analyzer\lib\site-packages\sklearn\utils\validation.py", line 951, in check_is_fitted
    raise NotFittedError(msg % {'name': type(estimator).__name__})
sklearn.exceptions.NotFittedError: This LabelEncoder instance is not fitted yet. Call 'fit' with appropriate arguments before using this method.


How to increase accuracy?

I am getting very low accuracy.
I have a dataset of 1700+ audio WAV files.
Please tell me how to increase the accuracy, because I am getting only 8.5% accuracy after training.
Please help me, as I am a beginner.

Need some help...

Hi, I am a beginner. Can you help me with where to add the path of my dataset in the code?
It would also be very helpful if you could explain how to run the project. I would be very thankful to you.
#request.

Using the model for prediction

Hello,

I am trying to use the already trained model directly for predicting the emotions.

I put this code in a Python file and ran it:
import librosa
import numpy as np
import pandas as pd
from keras.models import load_model
from sklearn.preprocessing import LabelEncoder

def predict():
    lb = LabelEncoder()
    Model_filename = 'saved_models/Emotion_Voice_Detection_Model.h5'
    Model = load_model(Model_filename)
    X, sample_rate = librosa.load('filename.wav', res_type='kaiser_fast', duration=2.5, sr=22050*2, offset=0.5)
    sample_rate = np.array(sample_rate)
    mfccs = np.mean(librosa.feature.mfcc(y=X, sr=sample_rate, n_mfcc=13), axis=0)
    featurelive = mfccs
    livedf2 = featurelive
    livedf2 = pd.DataFrame(data=livedf2)
    livedf2 = livedf2.stack().to_frame().T
    twodim = np.expand_dims(livedf2, axis=2)
    livepreds = Model.predict(twodim, batch_size=32, verbose=1)
    livepreds1 = livepreds.argmax(axis=1)
    liveabc = livepreds1.astype(int).flatten()
    livepredictions = lb.inverse_transform(liveabc)
    return livepredictions

But it throws an error at lb.inverse_transform, saying that lb needs to be fitted first. Is there a way to get the emotion's name back without using the dataset and training the model again?

Also, I have another question: is this model language-independent?
Thanks,
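
There is no bundled encoder in the repository, so the encoder either has to be fitted once on the training labels, or a fitted encoder has to be shipped next to the model. A hedged sketch of the second option (the pickle file name is made up here), saving it at training time with joblib and reloading it at prediction time:

    import joblib
    import numpy as np
    from sklearn.preprocessing import LabelEncoder

    # At training time (once): fit on the string labels and persist the encoder.
    lb = LabelEncoder().fit(np.ravel(trainlabel))
    joblib.dump(lb, 'saved_models/label_encoder.pkl')   # hypothetical file name

    # At prediction time: reload it instead of creating an unfitted encoder.
    lb = joblib.load('saved_models/label_encoder.pkl')
    livepredictions = lb.inverse_transform(liveabc)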

AttributeError: 'list' object has no attribute 'items'

When I tried loading the model with final_results_gender_test, I got AttributeError: 'list' object has no attribute 'items'.
Please tell me how to resolve this.

OS: macOS Big Sur
Environment: VS Code, Docker, Ubuntu 18.04

librosa==0.8.0
numpy==1.18.5
matplotlib==3.1.0
tensorflow==2.2.0
Keras==2.4.3
sklearn==0.0

Please, can anyone tell me what dataset should be used in RawData and what names should be given to the files? I am getting an error when I run this:

from keras.utils import np_utils
from sklearn.preprocessing import LabelEncoder

X_train = np.array(trainfeatures)
y_train = np.array(trainlabel)
X_test = np.array(testfeatures)
y_test = np.array(testlabel)

lb = LabelEncoder()

y_train = np_utils.to_categorical(lb.fit_transform(y_train))
y_test = np_utils.to_categorical(lb.fit_transform(y_test))

Regarding the labelling

Can you help me with how the labels are set in the code? I am unable to detect the female emotions.

Getting error on loading model weights: ValueError: axes don't match array

I am loading the weights of the pre-trained model by writing model.load_weights("saved_models/Emotion_Voice_Detection_Model.h5")

This is generating the following error.

Traceback (most recent call last):
  File "C:\anaconda3\lib\site-packages\numpy\core\fromnumeric.py", line 56, in _wrapfunc
    return getattr(obj, method)(*args, **kwds)
AttributeError: 'Dataset' object has no attribute 'transpose'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "final_results_gender_test.py", line 172, in <module>
    model.load_weights("saved_models/Emotion_Voice_Detection_Model.h5")
  File "C:\anaconda3\lib\site-packages\keras\models.py", line 738, in load_weights
    topology.load_weights_from_hdf5_group(f, layers, reshape=reshape)
  File "C:\anaconda3\lib\site-packages\keras\engine\topology.py", line 3369, in load_weights_from_hdf5_group
    reshape=reshape)
  File "C:\anaconda3\lib\site-packages\keras\engine\topology.py", line 3141, in preprocess_weights_for_loading
    weights[0] = np.transpose(weights[0], (3, 2, 0, 1))
  File "C:\anaconda3\lib\site-packages\numpy\core\fromnumeric.py", line 639, in transpose
    return _wrapfunc(a, 'transpose', axes)
  File "C:\anaconda3\lib\site-packages\numpy\core\fromnumeric.py", line 66, in _wrapfunc
    return _wrapit(obj, method, *args, **kwds)
  File "C:\anaconda3\lib\site-packages\numpy\core\fromnumeric.py", line 46, in _wrapit
    result = getattr(asarray(obj), method)(*args, **kwds)
ValueError: axes don't match array

Keras - 2.1.5
Tensorflow - 1.8.0

Error

[screenshot of the error]
Check the image. This is the error I am getting when I run it. Please reply as soon as possible.

Recommendations for Replicability

The notebook final_results_gender_test.ipynb could benefit from some slight modifications that would allow others to replicate the results exactly:

  • After the label encoder is fitted, print what it looks like (i.e. lb.classes_), so that we know the order of the labels if we want to just decode without training our own model. This is a simple suggestion that makes a big difference. (I think it's ['female_angry', 'female_calm', 'female_fearful', 'female_happy', 'female_sad', 'male_angry', 'male_calm', 'male_fearful', 'male_happy', 'male_sad'], but I'm not 100% sure).

  • Use a fixed seed for the shuffle in shuffle(newdf), something like shuffle(newdf, random_state=1); see the sketch after this list.

  • Include a script or function that assembles the two different data sources into the RawData directory from the original zip files. The main issue here is ensuring that the files copied from the different subdirectories in the SAVEE dataset get the same filename changes (e.g. " (1)", " (10)") as in the checked-in notebook. Different operating systems like macOS and Linux behave differently from Windows when copying files into a folder that already contains a file with the same name. I wouldn't mind writing this, but I actually can't replicate the data assembly. (cf. #22)
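
A minimal sketch of the first two suggestions (variable names taken from the notebook; the printed class order is whatever the encoder learns, not the guess above):

    from keras.utils import np_utils
    from sklearn.preprocessing import LabelEncoder
    from sklearn.utils import shuffle

    rnewdf = shuffle(newdf, random_state=1)   # fixed seed makes the shuffle reproducible

    lb = LabelEncoder()
    y_train = np_utils.to_categorical(lb.fit_transform(y_train))
    print(lb.classes_)   # record the label order so others can decode without retraining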

TypeError: '<' not supported between instances of 'str' and 'int'

from keras.utils import np_utils
from sklearn.preprocessing import LabelEncoder

X_train = np.array(trainfeatures)
y_train = np.array(trainlabel)
X_test = np.array(testfeatures)
y_test = np.array(testlabel)

lb = LabelEncoder()

y_train = np_utils.to_categorical(lb.fit_transform(y_train))
y_test = np_utils.to_categorical(lb.fit_transform(y_test))

I get an error at this line... please help.
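
This TypeError typically comes out of LabelEncoder.fit_transform when the label column mixes strings with non-string values (for example rows whose label ended up as a number or NaN). A hedged diagnostic using the y_train built above; if more than one type shows up, drop or relabel those rows in the source DataFrame before the split:

    import numpy as np

    # If this prints more than one type (e.g. {<class 'str'>, <class 'float'>}),
    # the sort inside LabelEncoder.fit_transform fails with exactly this TypeError.
    print(set(type(v) for v in np.ravel(y_train)))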

getting RawData missing Error

mylist= os.listdir('RawData/')

I am getting a FileNotFoundError.
Please let me know if anyone knows how to solve this error, and also guide me on where I need to place the dataset.

Contribution for an example

Hey,
Since there is no contact info on your profile, I wanted to share a project we built at a hackathon last month using your model.
Thank you very much for uploading the source code. It was a huge help!

Best,

Shahzeb

ValueError: Incomplete wav chunk

Using TensorFlow backend.
D:/Projects/Audio/Emotion/Speech-Emotion-Analyzer/final_results_gender_test.py:97: WavFileWarning: Chunk (non-data) not understood, skipping it.
  sr,x = scipy.io.wavfile.read('RawData/EP03_seq04_sc133.wav')
Traceback (most recent call last):
  File "D:/Projects/Audio/Emotion/Speech-Emotion-Analyzer/final_results_gender_test.py", line 97, in <module>
    sr,x = scipy.io.wavfile.read('RawData/EP03_seq04_sc133.wav')
  File "D:\InstallPath\Develop\Anaconda3\5.3.1\envs\SpeechEmotionAnalyzer3.5\lib\site-packages\scipy\io\wavfile.py", line 289, in read
    raise ValueError("Incomplete wav chunk.")
ValueError: Incomplete wav chunk.

Process finished with exit code 1

model accuracy

First, thanks for your great work; it let me quickly learn a lot of domain knowledge about how to process voice data. But I am still troubled by the model results. When I use your pre-trained model to test some voices, I get good performance. I want to reproduce your work using the same data and the same code, but I cannot get the same results: after about 100 epochs the training loss is still declining, but the test loss starts to rise. I have tried this many times with no difference, so I want to ask what factors may cause this phenomenon.
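
A training loss that keeps falling while the test loss rises after roughly 100 epochs is classic overfitting, so part of the gap may simply be where training stops. A hedged sketch of adding early stopping and keeping the best checkpoint (restore_best_weights needs a reasonably recent Keras; variable names are the notebook's):

    from keras.callbacks import EarlyStopping, ModelCheckpoint

    callbacks = [
        # Stop once validation loss has not improved for 50 epochs.
        EarlyStopping(monitor='val_loss', patience=50, restore_best_weights=True),
        # Keep only the weights from the best-performing epoch.
        ModelCheckpoint('best_model.h5', monitor='val_loss', save_best_only=True),
    ]

    cnnhistory = model.fit(x_traincnn, y_train, batch_size=16, epochs=700,
                           validation_data=(x_testcnn, y_test), callbacks=callbacks)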

final_results_gender_test.ipynb not running

Hi!
I am trying to run your code, but I am unable to run final_results_gender_test.ipynb because there is no RawData/ folder. I have downloaded the RAVDESS dataset, but I do not know how I should organize it.

What should the RawData folder contain?
