xuanjihe / speech-emotion-recognition

Speech emotion recognition using convolutional recurrent networks on IEMOCAP

Python 100.00%

tensorflow python iemocap-database speech-emotion-recognition

speech-emotion-recognition's Introduction

speech-emotion-recognition

TensorFlow implementation of Convolutional Recurrent Neural Networks for speech emotion recognition (SER) on the IEMOCAP database. To address the uncertainty of frame-level emotion labels, we apply three pooling strategies (max-pooling, mean-pooling, and attention-based weighted pooling) to produce utterance-level features for SER. This code has only been tested on Ubuntu 16.04 (x64), Python 2.7, CUDA 8.0, and cuDNN 6.0 with a GTX 1080 GPU. To run it on your computer, you need to install the following dependencies:

tensorflow 1.3.0
python_speech_features
wave
cPickle
numpy
sklearn
os
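The three pooling strategies described above each collapse a sequence of frame-level features into one utterance-level vector. The NumPy sketch below illustrates them; it is not the repository's TensorFlow code. The frame matrix `frames` of shape (T, D) and the attention vector `w` are assumptions for illustration (in a trained model `w` would be a learned parameter).

```python
import numpy as np


def max_pooling(frames):
    """Utterance vector = element-wise maximum over the T frames of a (T, D) matrix."""
    return frames.max(axis=0)


def mean_pooling(frames):
    """Utterance vector = element-wise average over the T frames."""
    return frames.mean(axis=0)


def attention_pooling(frames, w):
    """Utterance vector = softmax-weighted sum of frames.

    `w` is a hypothetical attention vector of shape (D,): each frame t gets the
    score frames[t] . w, and the scores are normalized with a softmax over time.
    """
    scores = frames @ w                              # (T,) one score per frame
    scores = scores - scores.max()                   # subtract max for numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()    # (T,) weights that sum to 1
    return alpha @ frames                            # (D,) weighted sum of frames


if __name__ == "__main__":
    frames = np.array([[1.0, 2.0], [3.0, 0.0]])  # T=2 frames, D=2 features
    print(max_pooling(frames))   # [3. 2.]
    print(mean_pooling(frames))  # [2. 1.]
```

With an all-zero `w`, attention pooling gives every frame the same weight and reduces to mean pooling; a large score on one frame makes it approach selecting that frame, so it interpolates between the other two strategies.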

Demo

To run a demo, fork the repository and then run the following scripts in order:

      python zscore.py

      python ExtractMel.py

      python model.py

Note

There are some small differences between the implementation and the paper. For example, when you run python model.py, you will find that the recognition rate for happy is very poor, which is caused by the imbalance of the training samples. An effective remedy is to use the happy samples twice.
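Using the happy samples twice amounts to oversampling the minority class in the training set. A minimal NumPy sketch, assuming integer class labels; the function name and the `happy_id` index used in the usage line are hypothetical, and the repository's own data-loading code may do this differently.

```python
import numpy as np


def duplicate_class(data, labels, class_id, times=2, seed=0):
    """Repeat every training sample of `class_id` so it appears `times` times in total.

    data:   array of shape (N, ...) with one feature tensor per sample
    labels: integer array of shape (N,)
    Returns shuffled copies of data and labels with that class oversampled.
    """
    idx = np.where(labels == class_id)[0]
    extra = np.repeat(idx, times - 1)                      # indices of the added copies
    all_idx = np.concatenate([np.arange(len(labels)), extra])
    rng = np.random.default_rng(seed)
    rng.shuffle(all_idx)                                   # avoid long runs of one class
    return data[all_idx], labels[all_idx]
```

Usage: `train_data, train_label = duplicate_class(train_data, train_label, class_id=happy_id)`. Apply it to the training split only, so that validation and test metrics still reflect the real class distribution.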

To download the IEMOCAP database, visit: https://sail.usc.edu/iemocap/release_form.php

Details of this code can be found in 3-D.pdf, which you can download from the top of this page.

Author

Xuanji He
E-mail: [email protected]

Citation

If you use this code, please consider citing the following paper:

Mingyi Chen, Xuanji He, Jing Yang, Han Zhang, "3-D Convolutional Recurrent Neural Networks With Attention Model for Speech Emotion Recognition", IEEE Signal Processing Letters, vol. 25, no. 10, pp. 1440-1444, 2018.

speech-emotion-recognition's People

Contributors

Stargazers

Watchers

Forkers

githubxhx ai-jie01 sunlinyu1993 dacson inceatakan zhuqiaoqiao1307 xiaoyeye1117 353xiong maggie0830 zh794390558 yongxuustc entn-at audiobucket saber5433 lbaitemple aascode ecohnoch hunterhawk xingzai0617 shijing123 syzer wuguowuge lypenghao shaheenperveen jasonaidm labrecord 1arpanet fastcode3d huimm dyf102 ayinarachfias haorotu qianfan1996 hnxiaoyf koihioh qnxgy921 qzz971126 ramonps7 akatoshi zivnice ttccnu giang12 mnadirs linhld0811 kyroad daniel-davaris pjj167 yingmuying fujita1224 xuhunan yougnseokchoi liujuihung hiaweng xliucs garlicdevs mberkehan lovestar-huang melissachen15 lh0216 huangqiang97 riviera2015 jackli95 wj1159 wardyx kelly2016 nangongmu amberisy longinhit ruddy202 shubham-saudolla arronchan pikaqiuweixiao wuyx517 girinishant shenyi666666 gohjx8808 kunzhou9646 ayush-110 shayezkarim flyingcat901 phillip1029 tom-and-jerry-gif cybertron1609 daisey666 divyajotsingh093 stbmilu s-janibekova yanglijiajenny philglobal mikedesjardine kangxi9 olivermod monikadeepti keithwang5 manikantachowdhary 13301338176 mjain12 chenyang918 adamv17 the-original-shruti

speech-emotion-recognition's Issues

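About the data format

How did you process the input data format, in particular the problem of input utterances having different lengths?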

Sometimes you have used conditional clauses like "if(emotion in ['ang','neu','sad'])" without considering the happy emotion. Why?

Placeholder dimension does not match

File "model.py", line 290, in <module>
train_op(1)
File "model.py", line 280, in train_op
loss, train_acc = sess.run([cost,accuracy],feed_dict = {X:valid_data, Y:valid_label,is_training:False, keep_prob:1})
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 929, in run
run_metadata_ptr)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1128, in _run
str(subfeed_t.get_shape())))
ValueError: Cannot feed value of shape (436, 300, 40, 3) for Tensor u'Placeholder:0', which has shape '(?, 200, 40, 3)'

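Train loss decreases, but valid loss decreases slightly and then rises sharply

Hi, I ran the experiments directly with the downloaded code:
1. The train loss decreases steadily, but the valid loss decreases for the first 30 iterations and then starts to rise. This is overfitting, right?
2. I could not reproduce the 64% UA mentioned in the paper either. Did you do any parameter tuning?
Please advise, thank you.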
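A validation loss that falls and then rises while the training loss keeps falling is the usual signature of overfitting. One common remedy, not taken from this repository, is to keep the parameters from the epoch with the best validation score and stop after a fixed number of epochs without improvement. A minimal framework-independent sketch; the class name, `patience`, and the loss values in the usage line are illustrative assumptions.

```python
class EarlyStopping:
    """Track the best validation loss and signal when training should stop."""

    def __init__(self, patience=10):
        self.patience = patience        # epochs to wait after the last improvement
        self.best_loss = float("inf")
        self.best_epoch = -1
        self.bad_epochs = 0

    def step(self, epoch, valid_loss):
        """Record one epoch's validation loss; return True if training should stop."""
        if valid_loss < self.best_loss:
            self.best_loss = valid_loss
            self.best_epoch = epoch     # in real training, save a checkpoint here
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

For example, with `patience=3` and validation losses 1.0, 0.8, 0.7, 0.75, 0.9, 1.1, training stops after the sixth epoch and the checkpoint from epoch 2 (loss 0.7) is the one to evaluate.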

tensorflow.python.framework.errors_impl.InvalidArgumentError

When I ran the code in my environment, an issue occurred, and I tried to find a solution.
The error info is:
Traceback (most recent call last):
File "D://3D-speech-emotion-recognition-master/model-cnn.py", line 372, in <module>
train_op(1)
File "D:/*/3D-speech-emotion-recognition-master/model-cnn.py", line 358, in train_op
loss, train_acc = sess.run([cost,accuracy],feed_dict = {X:valid_data, Y:valid_label,is_training:False, keep_prob:1})
File "D:\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 895, in run
run_metadata_ptr)
File "D:\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1124, in _run
feed_dict_tensor, options, run_metadata)
File "D:\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1321, in _do_run
options, run_metadata)
File "D:\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1340, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: logits and labels must be same size: logits_size=[654,6] labels_size=[436,6]
[[Node: SoftmaxCrossEntropyWithLogits = SoftmaxCrossEntropyWithLogits[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](Reshape_5, Reshape_6)]]

Caused by op 'SoftmaxCrossEntropyWithLogits', defined at:
File "D:/2_exp-emotion/3D-emotion/3D-speech-emotion-recognition-master/model.py", line 372, in <module>
train_op(1)
File "D:/2_exp-emotion/3D-emotion/3D-speech-emotion-recognition-master/model.py", line 303, in train_op
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(labels = Y, logits = Ylogits)
File "D:\Anaconda3\lib\site-packages\tensorflow\python\ops\nn_ops.py", line 1597, in softmax_cross_entropy_with_logits
precise_logits, labels, name=name)
File "D:\Anaconda3\lib\site-packages\tensorflow\python\ops\gen_nn_ops.py", line 2385, in _softmax_cross_entropy_with_logits
features=features, labels=labels, name=name)
File "D:\Anaconda3\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 767, in apply_op
op_def=op_def)
File "D:\Anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 2630, in create_op
original_op=self._default_original_op, op_def=op_def)
File "D:\Anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 1204, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): logits and labels must be same size: logits_size=[654,6] labels_size=[436,6]
[[Node: SoftmaxCrossEntropyWithLogits = SoftmaxCrossEntropyWithLogits[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](Reshape_5, Reshape_6)]]

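Question about dataset processing

Does your code not use cross-validation? Do you simply use the first four sessions as the training set and the remaining Session 5 as the validation and test sets?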
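For reference, a speaker-independent protocol often used with IEMOCAP is leave-one-session-out cross-validation, where each of the five sessions is held out in turn. The sketch below only generates the folds. It assumes utterance names follow the IEMOCAP pattern seen elsewhere on this page (e.g. `Ses01F_impro01_F000`, where characters 3-4 give the session number); the helper names are hypothetical.

```python
def session_of(utterance_id):
    """Return the session number encoded in an IEMOCAP name like 'Ses01F_impro01_F000'."""
    return int(utterance_id[3:5])


def leave_one_session_out(utterance_ids, sessions=(1, 2, 3, 4, 5)):
    """Yield (held_out_session, train_ids, held_out_ids) for each fold."""
    for held_out in sessions:
        train = [u for u in utterance_ids if session_of(u) != held_out]
        test = [u for u in utterance_ids if session_of(u) == held_out]
        yield held_out, train, test
```

Reporting the average UA/WA over the five folds is less sensitive to the particular speakers in Session 5 than a single fixed split.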

Exceeds 10% of system memory

Hi xuanjihe,
I got the system memory issue shown below. To train the model I am using 8 GB of RAM and an i7 processor.
Can you give a solution for that? I would also like to know how much memory the training process needs, and whether there are any modifications to optimize the code.
Thank you.

2018-08-31 10:13:53.134003: W T:\src\github\tensorflow\tensorflow\core\framework\allocator.cc:101] Allocation of 5357568000 exceeds 10% of system memory
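The requested allocation, 5,357,568,000 bytes, equals exactly 436 × 300 × 40 × 256 float32 values at 4 bytes each, i.e. an activation of shape (436, 300, 40, 256), the same shape as the "layer1 shape" printed in the "Incompatible shapes" issue on this page. This suggests (an assumption, since this user's model may differ) that all 436 validation samples are pushed through the network in a single sess.run call. Evaluating in mini-batches bounds peak memory. A framework-independent sketch; `predict_fn` is a hypothetical stand-in for a wrapper around sess.run on one batch.

```python
import numpy as np


def predict_in_batches(predict_fn, data, batch_size=32):
    """Run `predict_fn` on consecutive slices of `data` and stack the results.

    predict_fn: callable mapping an array of shape (B, ...) to an array of shape (B, C),
                e.g. a wrapper around sess.run(logits, feed_dict={X: batch, ...}).
    Peak activation memory then scales with batch_size instead of len(data).
    """
    outputs = []
    for start in range(0, len(data), batch_size):
        outputs.append(predict_fn(data[start:start + batch_size]))
    return np.concatenate(outputs, axis=0)
```

With batch_size=32, the same first-layer activation would need 32 × 300 × 40 × 256 × 4 = 393,216,000 bytes (about 393 MB) instead of about 5.4 GB. Loss and accuracy can be accumulated the same way, weighting each batch by its size.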

Imbalance problem

How did you deal with the imbalance caused by the hap class having too few samples?

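About feature extraction

Hello, after downloading the IEMOCAP database from the link you gave, I ran the program and got the error No such file or directory: './IEMOCAP1.pkl'. However, in the feature extraction code, the line output = './IEMOCAP.pkl' is not connected to the surrounding code, so I cannot track down the cause. So I would like to ask:
First, was the downloaded database already preprocessed beforehand to produce the /IEMOCAP.pkl file?
Second, your paper mentions using openEAR to extract log-Mel features; is this part covered in the code? Can openSMILE be used instead?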

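About one_hot encoding

Hello, I have a few questions for you:
(1) I downloaded the audio data from the link you provided; running ExtractMel produces no errors, but it also produces no output at all.
(2) During one-hot encoding, the line train_label = dense_to_one_hot(train_label,4) fails with IndexError: index 4841 is out of bounds for size 4800. What could cause this error?
I hope to hear back from you, thanks!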

How to use 'Happy' data twice

As you noted, the dataset is imbalanced,
so how do I use the 'happy' data twice?

No module named 'python_speech_features'

When I run the files, it just shows that error, even though I have installed speech_features.

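Some questions about the code

1. During data preprocessing, you first compute fbank features and then split speech sequences longer than 300 frames into multiple segments, which effectively adds new records. Have you tried splitting the speech sequence into segments first and then computing fbank?
2. For test data longer than 300 frames, you use two segments. In real applications, however, the emotion of the whole utterance should be determined. Is there a good method for handling the emotion of a whole utterance?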
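For question 1, splitting a long feature sequence into fixed-length segments can be sketched as below. This is an illustration under stated assumptions (segment length 300 frames, zero-padding of the last segment so no frames are dropped), not the logic of ExtractMel.py, which according to the question keeps two segments for long test utterances.

```python
import numpy as np


def split_into_segments(features, seg_len=300):
    """Split a (T, D) feature matrix into an array of shape (n, seg_len, D).

    The last segment is zero-padded, so every frame is kept and
    n = ceil(T / seg_len), with at least one segment for very short utterances.
    """
    num_frames, dim = features.shape
    n = max(1, -(-num_frames // seg_len))            # ceiling division
    padded = np.zeros((n * seg_len, dim), dtype=features.dtype)
    padded[:num_frames] = features
    return padded.reshape(n, seg_len, dim)
```

A 650-frame utterance yields three segments, the last containing 50 real frames followed by 250 zero frames. Whole-utterance decisions can then be made by combining the predictions of all segments of the same utterance.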

A KeyError

Hello, when I run ExtractMel.py, I get a KeyError, like this:

Could you please take a look at what the problem is? Many thanks.

Can you provide Checkpoint, please

This project is very interesting, but it doesn't provide a checkpoint right now.
Can you please provide a checkpoint so more people can benefit from it?
Thanks

Did not find the IEMOCAP dataset

Please send me the IEMOCAP_full_Realease dataset at [email protected] ASAP.

Incompatible shapes: [654] vs. [436]

While running model.py,
I get this error:
InvalidArgumentError (see above for traceback): Incompatible shapes: [654] vs. [436]

My valid_data and valid_label shapes look fine:
('valid_label==', (436, 4))
('valid_data==', (436, 300, 40, 3))

The network architecture is:

('layer1 shape', TensorShape([Dimension(None), Dimension(300), Dimension(40), Dimension(256)]))
('layer 2 shape', TensorShape([Dimension(None), Dimension(300), Dimension(10), Dimension(512)]))
(?, 300, 5, 512)
('layer2 shape', TensorShape([Dimension(None), Dimension(300), Dimension(5), Dimension(512)]))
('layer2 shape', TensorShape([Dimension(None), Dimension(200), Dimension(2560)]))
('layer2 shape', TensorShape([Dimension(None), Dimension(2560)]))
('linear1 shape', TensorShape([Dimension(None), Dimension(768)]))
('linear1 shape', TensorShape([Dimension(None), Dimension(768)]))
('linear1 shape', TensorShape([Dimension(None), Dimension(200), Dimension(768)]))
('outputs1 shape', (<tf.Tensor 'LSTM1/fw/fw/transpose:0' shape=(?, 200, 128) dtype=float32>, <tf.Tensor 'ReverseV2:0' shape=(?, 200, 128) dtype=float32>))
('outputs shape', TensorShape([Dimension(None), Dimension(200), Dimension(256)]))
('outputs shape', TensorShape([Dimension(None), Dimension(200), Dimension(256), Dimension(1)]))
('gru shape', TensorShape([Dimension(None), Dimension(256)]))
('fully1 shape', TensorShape([Dimension(None), Dimension(64)]))
('Ylogits shape', TensorShape([Dimension(None), Dimension(4)]))

I checked and printed all the shapes, and they look fine, but I still get the same error. Why am I getting more logits than labels? Many others are getting the same error as me. Could you please check, sir? Thank you.
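The numbers in this error are consistent with a hard-coded time dimension of 200 in the graph. The printed shapes switch from 300 frames ("layer2 shape" (?, 300, 5, 512)) to 200 ((?, 200, 2560)), and 436 × 300 / 200 = 654. If a reshape to (-1, 200, 2560) is applied to a CNN output with 300 time steps, the 436 × 300 frames are regrouped into 654 sequences, giving 654 logits for 436 labels. The same mismatch would explain the (436, 300, 40, 3) vs (?, 200, 40, 3) placeholder error in an earlier issue. The NumPy sketch below only reproduces the arithmetic; the exact line in model.py is not shown on this page, so locating and fixing it (making the time steps match the 300-frame features) is an assumption.

```python
import numpy as np


def reshape_leading_dim(batch, time_in, time_steps):
    """Leading dimension after reshaping (batch, time_in, 1) to (-1, time_steps, 1).

    The feature dimension is set to 1 to keep the toy array small; it does not
    change the leading dimension, which only depends on batch * time_in / time_steps.
    """
    x = np.zeros((batch, time_in, 1), dtype=np.float32)
    return x.reshape(-1, time_steps, 1).shape[0]


print(reshape_leading_dim(436, 300, 200))  # 654, matches logits_size=[654, ...]
print(reshape_leading_dim(436, 300, 300))  # 436, matches labels_size=[436, ...]
```

Because the total element count happens to be divisible by 200, the reshape succeeds silently, and the error only surfaces later when the logits are compared with the labels.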

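Hello, in my tests the results do not reach the 64% reported in the paper; I used the two-layer CNN + LSTM + attention in your model.py

Hello! I have some questions for you:

The six items returned by load_data in model.py are wrong, aren't they? It should return train_data,train_label, test_data,Test_label, valid_data,Valid_label;
Why choose 1200 training samples (for class balance)? That leaves too few samples. On IEMOCAP, 4 classes should be used, right? The labels are set to 6; would using that setting to handle 4-class samples improve the model's output accuracy?
Why do the features use FBANK + first-order delta + second-order delta rather than MFCC + first-order delta + second-order delta?
In the paper, 1 to 6 CNN layers + LSTM perform about the same, so I used the 2-layer CNN + LSTM + attention in your model, but the accuracy was only 50%;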
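For context on the feature question: the final dimension of 3 in the (N, 300, 40, 3) input shapes on this page corresponds to the FBANK + first-order + second-order delta stack mentioned above, i.e. static log-Mel features, deltas, and delta-deltas used as three channels. Below is a NumPy sketch of the standard regression delta, d_t = Σ_{n=1}^{N} n (c_{t+n} − c_{t−n}) / (2 Σ_{n=1}^{N} n²), with edge frames repeated. The window N=2 and the helper names are assumptions; python_speech_features also provides its own `delta` function.

```python
import numpy as np


def delta(feat, N=2):
    """Regression deltas of a (T, D) feature matrix over a +/-N frame window."""
    T = feat.shape[0]
    denom = 2 * sum(n * n for n in range(1, N + 1))
    padded = np.pad(feat, ((N, N), (0, 0)), mode="edge")  # repeat first/last frame
    out = np.zeros_like(feat, dtype=float)
    for n in range(1, N + 1):
        # padded[t + N + n] is c_{t+n}; padded[t + N - n] is c_{t-n}
        out += n * (padded[N + n:N + n + T] - padded[N - n:N - n + T])
    return out / denom


def stack_static_delta_deltadelta(logmel, N=2):
    """Return a (T, D, 3) array with static, delta, and delta-delta as channels."""
    d1 = delta(logmel, N)
    d2 = delta(d1, N)
    return np.stack([logmel, d1, d2], axis=-1)
```

Swapping in MFCCs would only change the static matrix passed to `stack_static_delta_deltadelta`; the delta computation is the same for either feature type.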

How to train with EmoDB (7 emotions)

I want to use EmoDB (7 emotions) as input, with all files in one folder.
Thanks!

Request help

Hello, I ran into several problems while reproducing the code. I hope I can get some help, thank you.

Some files are not used

Are the files utils.py, train.py, conn.py, and decision.py unused? What are they for? Thanks~

Could you send me a copy of IEMOCAP??

Incompatible shapes: [654] vs. [436]

I am running model.py.
At this step:
loss, train_acc = sess.run([cost,accuracy],feed_dict = {X:valid_data, Y:valid_label,is_training:False, keep_prob:1})
I get an error.
My valid_label shape is (436,4).
My valid_data shape is (436,4).

To calculate the loss we need Ylogits, but the logits shape is (654,4):
logits and labels must be same size: logits_size=[654,4] labels_size=[436,4]

Can anyone solve this problem? Thank you.

Please clean up the code and add some comments.

In ExtractMel.py, cpickle.dump() saves 10 objects, while load_data in model.py returns only 6, and the saved names do not match either. Please clean up the code carefully and add some comments, thanks. cPickle.dump(Train_data,Train_label,test_data,test_label,valid_data,valid_label,Valid_label,Test_label,pernums_test,pernums_valid)
In model.py: train_data,train_label,test_data,test_label,valid_data,valid_label = cPickle.load(f)

During model evaluation I get the error key cnn/unit-1/bn1/beta not found. Please help, thanks.

About python_speech_features

Hi,
Could you please provide the python_speech_features file?
Best wishes

How to predict an emotion from a new audio file

Hi,
After training a model, I want to predict the emotion of a new audio file with the trained model. So I want to know which features need to be extracted from that audio file and how to extract them. In short, please explain the process (prediction.py) for predicting the emotion of a new audio file.
Thank you very much. Please reply ASAP.

How to use model1.py?

When I run python model1.py,
I get the following errors:

How to use this code

Can you provide a guide?

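About utils.py and the Berlin database

Hello author, I have a few questions:
(1) With your program, my utils.py fails while everything else runs. I saw your reply to someone else that model.py can train and test at the same time; does that mean model.py already includes what utils.py does?
(2) You use one session as the test set, so what about the validation set? Or are the test set and the validation set both in the same session?
(3) Could you give some advice on modifying the code for the Berlin database??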

KeyError: 'wav\\Ses01F_impro01\\Ses01F_impro01_F000'

IndexError: index 4833 is out of bounds for size 4800

I get the following IndexError when running model.py:

np_resource = np.dtype([("resource", np.ubyte, 1)])
Traceback (most recent call last):
File "model.py", line 139, in <module>
train()
File "model.py", line 56, in train
train_label = dense_to_one_hot(train_label,FLAGS.num_classes)
File "model.py", line 49, in dense_to_one_hot
labels_one_hot.flat[index_offset + labels_dense.ravel()] = 1
IndexError: index 4833 is out of bounds for size 4800
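The line in this traceback follows the common dense_to_one_hot pattern, where each label is written at flat index row × num_classes + label. If num_classes is 4 (as in the dense_to_one_hot(train_label,4) call from an earlier issue), a 4800-entry buffer holds 1200 rows, so the largest valid flat index is 4799. An index of 4833 (or 4841 in the earlier issue) therefore means at least one label value is outside 0..num_classes-1, for example an emotion that the label mapping did not convert or a dataset with more classes than num_classes. The sketch below is an illustrative rewrite, not the repository's function; it fails with a message naming the offending labels.

```python
import numpy as np


def dense_to_one_hot(labels_dense, num_classes):
    """Convert integer labels of shape (N,) or (N, 1) to one-hot rows of shape (N, num_classes)."""
    labels = np.asarray(labels_dense).astype(np.int64).ravel()
    bad = labels[(labels < 0) | (labels >= num_classes)]
    if bad.size:
        raise ValueError(
            "labels must be in [0, %d); found %s" % (num_classes, np.unique(bad)))
    one_hot = np.zeros((labels.shape[0], num_classes), dtype=np.float32)
    one_hot[np.arange(labels.shape[0]), labels] = 1.0   # one 1 per row, at the label's column
    return one_hot
```

Printing `np.unique(train_label)` before the call is a quick way to see which unexpected label values are present.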

Dependency

Python 3.7.8
pip 20.1.1
tensorflow 2.2
win 10 64bit

Data processing

What is the purpose of generating these parameters: mean1,std1,mean2,std2,mean3,std3?
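The names suggest (an assumption, not confirmed by code shown here) per-filter means and standard deviations for each of the three input channels (static, delta, delta-delta). These would be computed on the training data and then used to z-score every feature matrix, as in the line train_data[train_num,:,:,0] = (part -mean1)/(std1+eps) from the ExtractMel.py traceback further down this page. A NumPy sketch of that idea; the (N, T, F, 3) layout, the helper names, and the EPS value are assumptions.

```python
import numpy as np

EPS = 1e-5  # assumed small constant to avoid division by zero


def channel_stats(train_feats):
    """Per-channel, per-filter mean/std from training data of shape (N, T, F, 3).

    Returns two arrays of shape (3, F): row c holds the statistics of channel c.
    """
    frames = train_feats.reshape(-1, train_feats.shape[2], train_feats.shape[3])  # (N*T, F, 3)
    return frames.mean(axis=0).T, frames.std(axis=0).T


def zscore(feats, means, stds):
    """Normalize (N, T, F, 3) features channel by channel with training statistics."""
    if means.shape[1] != feats.shape[2]:
        raise ValueError("statistics have %d filters but features have %d"
                         % (means.shape[1], feats.shape[2]))
    return (feats - means.T) / (stds.T + EPS)
```

The filter-count check targets the "(300,40) (20,)" broadcast error reported for ExtractMel.py on this page: statistics computed with 20 Mel filters cannot normalize 40-filter features, so zscore.py and ExtractMel.py must use the same number of filters (another issue here reports changing filter_num=20 to filter_num=40 in zscore.py).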

Question: for dataset

dataset
'''
test_data, test_label, valid_data, valid_label, Valid_label, Test_label, pernums_test, pernums_valid = load_data()
'''
What are Valid_label, Test_label, pernums_test, and pernums_valid used for?
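The variable names suggest the following interpretation, which is an assumption not confirmed by the code shown here: Valid_label/Test_label hold one label per utterance, valid_label/test_label hold one label per 300-frame segment, and pernums_valid/pernums_test record how many segments each utterance was split into. Under that reading, segment predictions can be turned into utterance predictions by averaging each utterance's segment probabilities, as in this sketch (the function name is hypothetical):

```python
import numpy as np


def utterance_predictions(segment_probs, pernums):
    """Average segment-level class probabilities per utterance, then take the argmax.

    segment_probs: (S, C) array, with segments ordered utterance by utterance
    pernums:       number of segments per utterance; must sum to S
    """
    if sum(pernums) != len(segment_probs):
        raise ValueError("pernums must sum to the number of segments")
    preds = []
    start = 0
    for n in pernums:
        preds.append(segment_probs[start:start + n].mean(axis=0).argmax())
        start += n
    return np.array(preds)
```

This would also explain why the logs on this page print both "segment metrics" and separate per-utterance results.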

Could you provide a set of parameters that reproduces the results in your paper?

evaluation
Class imbalance: as far as I can tell, the code takes 300 sentences for each class.
However, in my evaluation, all samples tend to be predicted as class 0. Do you know why?

----------segment metrics---------------
Best valid_UA: 0.2666
Best valid_WA: 0.09396
Valid Confusion Matrix:["ang","sad","hap","neu"]
[[ 34   0   0   0]
 [107   0   0   6]
 [ 70   0   0   2]
 [207   0   0  10]]
----------segment metrics---------------
*****************************************************************
310
Epoch: 310
Valid cost: 1.52
Valid_UA: 0.2666
Valid_WA: 0.09396
Best valid_UA: 0.2666
Best valid_WA: 0.09396
Valid Confusion Matrix:["ang","sad","hap","neu"]
[[ 18   0   0   0]
 [ 67   0   0   6]
 [ 54   0   0   2]
 [141   0   0  10]]
Test_UA: 0.2592
Test_WA: 0.0695
Test Confusion Matrix:["ang","sad","hap","neu"]
[[ 13   0   0   0]
 [ 58   0   0   2]
 [ 50   0   0   0]
 [131   0   0   5]]
*****************************************************************
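The printed UA/WA values can be reproduced from the confusion matrices, where rows are true classes and columns are predictions. UA is the unweighted mean of per-class recall (diagonal divided by row sums), and WA here is overall accuracy (trace divided by total). For the validation matrix above, [[18,0,0,0],[67,0,0,6],[54,0,0,2],[141,0,0,10]], UA = (18/18 + 0 + 0 + 10/151)/4 ≈ 0.2666 and WA = 28/298 ≈ 0.09396, matching the log. The matrices also show the collapse described in the question: nearly every sample falls in the first column (ang), so only ang has high recall. A NumPy sketch (the function name is hypothetical):

```python
import numpy as np


def ua_wa(confusion):
    """Return (unweighted accuracy, weighted accuracy) from a confusion matrix.

    confusion[i, j] = number of samples of true class i predicted as class j.
    UA = mean per-class recall; WA = overall accuracy.
    Every row must be non-empty, otherwise that class's recall is undefined.
    """
    cm = np.asarray(confusion, dtype=float)
    recall = np.diag(cm) / cm.sum(axis=1)
    return recall.mean(), np.trace(cm) / cm.sum()


valid_cm = [[18, 0, 0, 0], [67, 0, 0, 6], [54, 0, 0, 2], [141, 0, 0, 10]]
ua, wa = ua_wa(valid_cm)
print(round(ua, 4), round(wa, 5))  # 0.2666 0.09396
```

Because UA weights each class equally, a model that predicts only the smallest class can score a UA of at least 0.25 on 4 classes while its WA stays below 0.1. Comparing both numbers is a quick way to spot this kind of collapse.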

Below is the output printed during training:

----------segment metrics---------------
valid_UA: 0.4773
valid_WA: 0.3968
Valid Confusion Matrix:["ang","sad","hap","neu"]
[[28  0  6  0]
 [ 3 59 22 29]
 [28 11 18 15]
 [63 34 52 68]]
----------segment metrics---------------
After epoch:9, step: 310, loss on training batch is 0.44, accuracy is 0.900.
train_UA: 0.8941
train_WA: 0.9
Confusion Matrix:["ang","sad","hap","neu"]
[[ 6  0  1  1]
 [ 0 15  0  1]
 [ 0  0  7  0]
 [ 0  0  1  8]]
+ [[ 1 -le 2 ]]

Can you help me implement it in Keras?

Hey, I tried to implement your paper in Keras on the EmoDB database,
where the time step is 300,
with 5 Conv2D layers followed by one BLSTM and an attention layer.
The training set size is (339,300,40,30),
but I am not getting the same accuracy as yours: training accuracy is only 20%.
I don't know where I am going wrong.
Can you please look at the code and let me know what I am doing wrong?

    from tensorflow.keras.layers import (Input, Conv2D, MaxPooling2D, BatchNormalization, Dropout,
                                         TimeDistributed, Flatten, Dense, LSTM, Bidirectional)
    from tensorflow.keras.models import Model

    # AttentionLayer is the poster's own custom layer; its definition is not included in this issue.

    inputs = Input(shape=(300, 40, 3))
    CNN1 = Conv2D(128, (5, 3), strides=(1, 1), padding='same', activation='relu', name='conv1')(inputs)
    MAX_POOL = MaxPooling2D(pool_size=(1, 4), padding='valid', name='pool1')(CNN1)
    BN1 = BatchNormalization()(MAX_POOL)
    CNN2 = Conv2D(256, (5, 3), strides=(1, 1), padding='same', activation='relu', name='conv3a')(BN1)
    MAX_POOL2 = MaxPooling2D(pool_size=(1, 2), padding='valid', name='pool2')(CNN2)
    BN2 = BatchNormalization()(MAX_POOL2)
    CNN3 = Conv2D(256, (5, 3), strides=(1, 1), padding='same', activation='relu', name='conv3b')(BN2)
    DROP1 = Dropout(0.5)(CNN3)
    BN3 = BatchNormalization()(DROP1)

    CNN4 = Conv2D(256, (5, 3), strides=(1, 1), padding='same', activation='relu', name='conv3c')(BN3)
    BN4 = BatchNormalization()(CNN4)
    CNN5 = Conv2D(256, (5, 3), strides=(1, 1), padding='same', activation='relu', name='conv3d')(BN4)
    BN5 = BatchNormalization()(CNN5)
    DROP2 = Dropout(0.5)(BN5)
    # (300, 5, 256) -> (300, 1280): one flattened feature vector per time step
    TD = TimeDistributed(Flatten(), name='Flatten')(DROP2)

    DENSE1 = Dense(768, activation='linear', name='fc6')(TD)
    BLSTM = Bidirectional(LSTM(128, return_sequences=True, unit_forget_bias=True))(DENSE1)
    gru = AttentionLayer(name='attention')(BLSTM)
    DROP3 = Dropout(0.5)(gru)
    DENSE2 = Dense(64, activation='relu', name='fc7')(DROP3)
    BN6 = BatchNormalization()(DENSE2)  # renamed from a second BN3 to avoid shadowing
    DROP4 = Dropout(0.5)(BN6)
    DENSE3 = Dense(7, activation='softmax')(DROP4)
    model = Model(inputs=inputs, outputs=DENSE3)

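I trained the model with the Nadam optimizer (Adam with Nesterov momentum):

    from tensorflow.keras.optimizers import Nadam

    nadam = Nadam(learning_rate=1e-04, beta_1=0.9, beta_2=0.999, epsilon=1e-08)
    # Pass the optimizer object, not the string 'nadam'; the string creates a new Nadam with
    # default settings and silently ignores the learning rate above.
    # (schedule_decay from the standalone-Keras API is omitted here.)
    model.compile(optimizer=nadam, loss='categorical_crossentropy', metrics=['accuracy'])
    model.fit(X_train, y_train, callbacks=[tensorboard], batch_size=15, epochs=10, shuffle=True,
              validation_data=(X_val, y_val))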

About IEMOCAP

Would you mind sending me a copy of IEMOCAP? I would appreciate it, thanks!

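Problem in zscore.py

I tried to use your code to train a model with eight different emotion categories. I have 192 files per class, except neutral, which has 96 files.
I get the following error when I run the code.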

Traceback (most recent call last):
File "/home/thoshith/speech-emotion-recognition-master/zscore_emodb.py", line 176, in <module>
print(read_data_db())
File "/home/thoshith/speech-emotion-recognition-master/zscore_emodb.py", line 143, in read_data_db
traindata1[train_num * 300:(train_num + 1) * 300] = part
ValueError: could not broadcast input array from shape (300,40) into shape (0,40)

Thanks in advance
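In this traceback the target slice traindata1[train_num * 300:(train_num + 1) * 300] has shape (0, 40). That means traindata1 was preallocated with fewer rows than the number of segments actually being written, so the segment count computed before allocation does not match the data (for example, counting rules written for IEMOCAP labels or file names applied to another dataset). One way to avoid such count mismatches is to collect segments in a list and stack them at the end. The sketch below is an alternative approach, not a patch to zscore_emodb.py. It keeps only full 300-frame segments; the helper name is hypothetical.

```python
import numpy as np


def collect_segments(utterances, seg_len=300):
    """Stack all full seg_len-frame segments from a list of (T, F) feature matrices.

    Appending to a Python list avoids predicting the segment count in advance,
    which a preallocated array such as traindata1 requires.
    """
    segments = []
    for feats in utterances:
        for start in range(0, feats.shape[0] - seg_len + 1, seg_len):
            segments.append(feats[start:start + seg_len])
    if not segments:
        raise ValueError("no utterance has at least %d frames" % seg_len)
    return np.stack(segments)  # (num_segments, seg_len, F)
```

The resulting array can then be reshaped to (num_segments * seg_len, F) if statistics are computed over individual frames.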

About testing with utils.py

When I run utils.py, the program reports an error. Could you help me figure out the cause? Thanks:

##################################################################
/home/nchen/anaconda3/bin/python /media/nchen/Disk_D/TH/speech-emotion-recognition-master/utils.py
/home/nchen/anaconda3/lib/python3.6/importlib/_bootstrap.py:219: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
return f(*args, **kwds)
/home/nchen/anaconda3/lib/python3.6/importlib/_bootstrap.py:219: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
return f(*args, **kwds)
WARNING:tensorflow:From /media/nchen/Disk_D/TH/speech-emotion-recognition-master/crnn.py:239: BasicLSTMCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This class is deprecated, please use tf.nn.rnn_cell.LSTMCell, which supports all the feature this cell currently has. Please replace the existing code with tf.nn.rnn_cell.LSTMCell(name='basic_lstm_cell').
WARNING:tensorflow:From /media/nchen/Disk_D/TH/speech-emotion-recognition-master/utils.py:61: softmax_cross_entropy_with_logits (from tensorflow.python.ops.nn_ops) is deprecated and will be removed in a future version.
Instructions for updating:

Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.

See tf.nn.softmax_cross_entropy_with_logits_v2.

2019-04-23 09:55:58.768043: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX
2019-04-23 09:55:58.772963: I tensorflow/core/common_runtime/process_util.cc:69] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.
2019-04-23 09:55:58.801735: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at save_restore_v2_ops.cc:184 : Not found: Key cnn/unit-1/bn1/beta/ExponentialMovingAverage not found in checkpoint
Traceback (most recent call last):
File "/home/nchen/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1334, in _do_call
return fn(*args)
File "/home/nchen/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/home/nchen/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.NotFoundError: Key cnn/unit-1/bn1/beta/ExponentialMovingAverage not found in checkpoint
[[{{node save/RestoreV2}} = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/nchen/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1546, in restore
{self.saver_def.filename_tensor_name: save_path})
File "/home/nchen/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 929, in run
run_metadata_ptr)
File "/home/nchen/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1152, in _run
feed_dict_tensor, options, run_metadata)
File "/home/nchen/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1328, in _do_run
run_metadata)
File "/home/nchen/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1348, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.NotFoundError: Key cnn/unit-1/bn1/beta/ExponentialMovingAverage not found in checkpoint
[[node save/RestoreV2 (defined at /media/nchen/Disk_D/TH/speech-emotion-recognition-master/utils.py:64) = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]

Caused by op 'save/RestoreV2', defined at:
File "/media/nchen/Disk_D/TH/speech-emotion-recognition-master/utils.py", line 154, in <module>
evaluate()
File "/media/nchen/Disk_D/TH/speech-emotion-recognition-master/utils.py", line 64, in evaluate
saver = tf.train.Saver(variable_to_restore)
File "/home/nchen/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1102, in __init__
self.build()
File "/home/nchen/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1114, in build
self._build(self._filename, build_save=True, build_restore=True)
File "/home/nchen/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1151, in _build
build_save=build_save, build_restore=build_restore)
File "/home/nchen/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 795, in _build_internal
restore_sequentially, reshape)
File "/home/nchen/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 406, in _AddRestoreOps
restore_sequentially)
File "/home/nchen/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 862, in bulk_restore
return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
File "/home/nchen/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/gen_io_ops.py", line 1466, in restore_v2
shape_and_slices=shape_and_slices, dtypes=dtypes, name=name)
File "/home/nchen/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/home/nchen/anaconda3/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
return func(*args, **kwargs)
File "/home/nchen/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3274, in create_op
op_def=op_def)
File "/home/nchen/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1770, in __init__
self._traceback = tf_stack.extract_stack()

NotFoundError (see above for traceback): Key cnn/unit-1/bn1/beta/ExponentialMovingAverage not found in checkpoint
[[node save/RestoreV2 (defined at /media/nchen/Disk_D/TH/speech-emotion-recognition-master/utils.py:64) = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/nchen/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1556, in restore
names_to_keys = object_graph_key_mapping(save_path)
File "/home/nchen/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1830, in object_graph_key_mapping
checkpointable.OBJECT_GRAPH_PROTO_KEY)
File "/home/nchen/anaconda3/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 371, in get_tensor
status)
File "/home/nchen/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 528, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.NotFoundError: Key _CHECKPOINTABLE_OBJECT_GRAPH not found in checkpoint

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/media/nchen/Disk_D/TH/speech-emotion-recognition-master/utils.py", line 154, in <module>
evaluate()
File "/media/nchen/Disk_D/TH/speech-emotion-recognition-master/utils.py", line 72, in evaluate
saver.restore(sess,ckpt.model_checkpoint_path)
File "/home/nchen/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1562, in restore
err, "a Variable name or other graph key that is missing")
tensorflow.python.framework.errors_impl.NotFoundError: Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Key cnn/unit-1/bn1/beta/ExponentialMovingAverage not found in checkpoint
[[node save/RestoreV2 (defined at /media/nchen/Disk_D/TH/speech-emotion-recognition-master/utils.py:64) = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]

NotFoundError (see above for traceback): Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Process finished with exit code 1

Thank you very much
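The error says the restoring Saver (built from variable_to_restore at utils.py line 64) asks for .../ExponentialMovingAverage shadow variables, but the checkpoint being loaded does not contain them. This typically means the checkpoint was written by a training graph that did not keep moving averages of those variables, or by a different script or graph version (an assumption; the training code is not shown here). Comparing the names stored in the checkpoint with the names the evaluation graph requests makes such mismatches visible. `tf.train.list_variables` is a TensorFlow function that returns (name, shape) pairs for a checkpoint; the two helper names below are hypothetical.

```python
def missing_keys(requested_names, checkpoint_names):
    """Return the requested variable names that the checkpoint does not contain."""
    available = set(checkpoint_names)
    return sorted(name for name in requested_names if name not in available)


def checkpoint_names(ckpt_path):
    """List variable names stored in a TensorFlow checkpoint (requires TensorFlow)."""
    import tensorflow as tf  # imported lazily so missing_keys works without TensorFlow
    return [name for name, _shape in tf.train.list_variables(ckpt_path)]
```

If variable_to_restore is the dict returned by ExponentialMovingAverage.variables_to_restore() (an assumption based on its name and the missing keys), then `missing_keys(variable_to_restore.keys(), checkpoint_names(ckpt.model_checkpoint_path))` lists exactly which shadow variables the checkpoint lacks.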

Results statistics

Can you share result statistics such as ROC curves, accuracy, etc.?

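Model overfitting: how to choose parameters

Hello.
When I run the model, training accuracy reaches 90% at around 300 steps, but the best validation UA is only 39% and the test UA only 31%. Is this a parameter-selection problem? How should I choose the parameters?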

I changed filter_num=20 to filter_num=40 in zscore.py, changed the attention check in crnn.py from if self.attention is not None: to if self.attention:, and then ran the code in the following order:

python zscore.py
python ExtractMel.py
python train.py --attention True
python utils.py --attention True

Finally, is this code an implementation of the paper AUTOMATIC SPEECH EMOTION RECOGNITION USING RECURRENT NEURAL NETWORKS WITH LOCAL ATTENTION? How should I reproduce the paper's results?


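Dataset

Hello senior, I am a beginner in speech emotion recognition and would like to get the IEMOCAP dataset as soon as possible. Could you share it?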

Error when running ExtractMel.py

Traceback (most recent call last):
File "ExtractMel.py", line 389, in <module>
read_IEMOCAP()
File "ExtractMel.py", line 200, in read_IEMOCAP
train_data[train_num,:,:,0] = (part -mean1)/(std1+eps)
ValueError: operands could not be broadcast together with shapes (300,40) (20,)

What is the cause of this?
