- gensim
- keras
- pickle
- for each train file f in twitter_json,
- put value of 'text' key in a list X_train, do this for all lines
- Tokenize X_train
- Convert each text in X_train to sequences
- pad X_train
- dump X_train and Y_train using pickle
- repeat steps 1 and 2 for every test file
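The preprocessing steps above can be sketched roughly as follows. This is a dependency-free illustration, not the code in preprocessData.py: the helper names (`read_texts`, `build_vocab`, `texts_to_sequences`, `pad`), the whitespace tokenizer, the sample tweets, the labels and `max_len=5` are all assumptions. Keras ships `Tokenizer` and `pad_sequences` utilities that cover the same tokenize, sequence and pad steps.

```python
import json
import pickle
from collections import Counter


def read_texts(lines):
    """Collect the 'text' field from an iterable of JSON lines."""
    texts = []
    for line in lines:
        line = line.strip()
        if line:
            texts.append(json.loads(line)["text"])
    return texts


def build_vocab(texts):
    """Map each word to an integer index, most frequent first; 0 is kept for padding."""
    counts = Counter(word for text in texts for word in text.lower().split())
    return {word: i for i, (word, _) in enumerate(counts.most_common(), start=1)}


def texts_to_sequences(texts, vocab):
    """Replace each known word with its index; unknown words are skipped."""
    return [[vocab[w] for w in text.lower().split() if w in vocab] for text in texts]


def pad(sequences, max_len):
    """Left-pad with zeros; longer sequences keep only their last max_len items."""
    return [[0] * (max_len - len(s[-max_len:])) + s[-max_len:] for s in sequences]


# Two made-up tweets standing in for the lines of one file in twitter_json.
sample_lines = [
    '{"text": "breaking news breaking"}',
    '{"text": "news is fake"}',
]
X_train = read_texts(sample_lines)
vocab = build_vocab(X_train)
X_train = pad(texts_to_sequences(X_train, vocab), max_len=5)
Y_train = [1, 0]  # hypothetical labels; how Y_train is built is not described here

# pickle.dumps returns bytes; pickle.dump(obj, file) would write them to disk instead.
payload = pickle.dumps((X_train, Y_train))
```

Running the same functions over the test files, while reusing the vocabulary built from the training texts, keeps the word indices consistent between the two splits.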
- load trainX, trainY, testX, testY using loadTensorInput(). Each item is now a list of lists
- categorize trainY and testY into two classes
- build the neural net model using tflearn (LSTM RNN)
  - activation='softmax'
  - optimizer='adam'
  - learning_rate=0.001
  - loss='categorical_crossentropy'
- fit the model
  - n_epoch=20
- save the model
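A minimal sketch of the training steps above, assuming tflearn is installed and the labels are 0/1 integers. It is not the code in RumorRNN.py: the function names, `embedding_dim`, `lstm_units` and the save path are hypothetical, and only the activation, optimizer, learning rate, loss and epoch count come from the list above.

```python
def to_two_classes(labels):
    """One-hot encode 0/1 labels: 0 -> [1.0, 0.0], 1 -> [0.0, 1.0]."""
    return [[1.0, 0.0] if label == 0 else [0.0, 1.0] for label in labels]


def build_model(max_len, vocab_size, embedding_dim=128, lstm_units=128):
    """Build an LSTM classifier with the settings listed above.

    embedding_dim and lstm_units are assumed values, not taken from the project.
    """
    import tflearn  # imported lazily so to_two_classes works without tflearn

    net = tflearn.input_data([None, max_len])
    net = tflearn.embedding(net, input_dim=vocab_size, output_dim=embedding_dim)
    net = tflearn.lstm(net, lstm_units)
    net = tflearn.fully_connected(net, 2, activation="softmax")
    net = tflearn.regression(net, optimizer="adam", learning_rate=0.001,
                             loss="categorical_crossentropy")
    return tflearn.DNN(net)


def train_and_save(model, trainX, trainY, testX, testY, path="rumor_model.tfl"):
    """Fit for 20 epochs, validating on the test split, then save the model.

    trainX/testX are padded integer sequences, trainY/testY are 0/1 labels.
    The default path is a hypothetical file name.
    """
    import numpy as np  # arrays are the input form used in tflearn's own examples

    model.fit(np.array(trainX), np.array(to_two_classes(trainY)),
              validation_set=(np.array(testX), np.array(to_two_classes(testY))),
              show_metric=True, n_epoch=20)
    model.save(path)
```

With data loaded, usage would look like `model = build_model(max_len, vocab_size)` followed by `train_and_save(model, trainX, trainY, testX, testY)`, where `max_len` and `vocab_size` match the values used during preprocessing.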
- make sure that the dataset folder is inside the project and has name 'rumor'
- run preprocessData.py
- run RumorRNN.py
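The run steps above could also be scripted. The sketch below is a hypothetical helper, not part of the project: `check_dataset` and `run_pipeline` are made-up names, and it assumes both scripts sit in the project root next to the `rumor` folder.

```python
import subprocess
import sys
from pathlib import Path


def check_dataset(project_dir="."):
    """Return the path to the 'rumor' folder, or raise if it is missing."""
    dataset = Path(project_dir) / "rumor"
    if not dataset.is_dir():
        raise FileNotFoundError(f"expected dataset folder at {dataset}")
    return dataset


def run_pipeline(project_dir="."):
    """Run preprocessing, then training, stopping at the first failure."""
    check_dataset(project_dir)
    for script in ("preprocessData.py", "RumorRNN.py"):
        # check=True raises CalledProcessError if a script exits with a non-zero code
        subprocess.run([sys.executable, script], cwd=project_dir, check=True)
```

Calling `run_pipeline()` from the project root runs both scripts in order with the current Python interpreter.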
Train on 875 samples, validate on 118 samples
20 epochs
875/875 [==============================] - 15s - loss: 0.5089 - acc: 0.7646 - val_loss: 0.4852 - val_acc: 0.7542
Epoch 13/20
875/875 [==============================] - 15s - loss: 0.3863 - acc: 0.8274 - val_loss: 0.7699 - val_acc: 0.7203
Epoch 14/20
875/875 [==============================] - 15s - loss: 0.2909 - acc: 0.8720 - val_loss: 0.8753 - val_acc: 0.7373
Epoch 15/20
875/875 [==============================] - 15s - loss: 0.1825 - acc: 0.9314 - val_loss: 1.3211 - val_acc: 0.7119
Epoch 16/20
875/875 [==============================] - 15s - loss: 0.1228 - acc: 0.9543 - val_loss: 1.5710 - val_acc: 0.6695
Epoch 17/20
875/875 [==============================] - 15s - loss: 0.0728 - acc: 0.9794 - val_loss: 2.1107 - val_acc: 0.6525
Epoch 18/20
875/875 [==============================] - 15s - loss: 0.0792 - acc: 0.9749 - val_loss: 2.3427 - val_acc: 0.6695
Epoch 19/20
875/875 [==============================] - 15s - loss: 0.0710 - acc: 0.9783 - val_loss: 2.8942 - val_acc: 0.6356
Epoch 20/20
875/875 [==============================] - 15s - loss: 0.0675 - acc: 0.9794 - val_loss: 1.9913 - val_acc: 0.6695
118/118 [==============================] - 0s
Accuracy: 66.95%
Process finished with exit code 0