Name: Tajinder Singh
1. Use preprocessing1.py to convert TRAINING_DATA to TRAINING_FILE (spam_training.txt)
python3 preprocessing1.py SPAM_TRAINING_PATH spam_training.txt
2. Use preprocessing2.py to convert TEST_DATA to TEST_FILE (spam_test.txt)
python3 preprocessing2.py SPAM_TEST_PATH spam_test.txt
3. Then execute perceplearn.py to create MODEL_FILE (spam.nb) from TRAINING_FILE (spam_training.txt)
python3 perceplearn.py spam_training.txt spam.nb
4. Then execute percepclassify.py to create OUTPUT_FILE (spam.out) using TEST_FILE (spam_test.txt) and MODEL_FILE (spam.nb)
python3 percepclassify.py spam.nb < spam_test.txt > spam.out
5. The final output file is spam.out
1. Run postrain.py script to create MODEL_FILE (pos.model) from TRAINING_FILE (train.pos)
python3 postrain.py TRAININGFILE MODEL
2. Run postag.py script to tag testdata using MODEL_FILE (pos.model)
python3 postag.py pos.model < pos.blind.test > pos.test.out
3. The final output file is pos.test.out
1. Run nelearn.py script to create MODEL_FILE (ner.model) from TRAINING_FILE (ner.esp.train)
python3 nelearn.py ner.esp.train ner.model
2. Run netag.py script to tag TEST_DATA (ner.esp.blind.test) using MODEL_FILE (ner.model)
python3 netag.py ner.model < ner.esp.blind.test > ner.esp.test.out
3. The final output file is pos.test.out
Accuracy: 94.810
Precision, recall and F-score for each of the named entity types for the named entity recognizer, and the overall F-score:
Overall Accuracy: 92.02
LOC :
Precision: 0.7423
Recall: 0.6382
F-Score: 70.1
PER:
Precision: 0.8225
Recall: 0.6563
F-Score: 74.5
ORG:
Precision: 0.820
Recall: 0.62
F-Score: 71.75
MISC:
Precision: 0.66
Recall: 0.4879
F-Score: 58.4
Overall F-score: 68.68