- dt: date (possibly the date the email was reported)
- subject
- url: HTML file name
- comments: comments on this email (with HTML tags)
- content: email content (roughly corresponds to email_body, email_from, email_replyto, email_timestamp)
- scam_type: scam type (e.g. next-to-kin, 419)
- email_body: email content (text)
- email_from: sender (name, email)
- email_replyto: reply-to address (name, email)
- email_timestamp: email date (e.g. Fri, 31 Jan 2014 22:10:10 +0700)
- email_subject: subject (duplicate of the top-level subject field)
- Train / Val (2014-2018, 60)
- Test (2019, 4)
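
The year-based split above can be sketched in a few lines of Python. This is only an illustration, not the code in main.py: it assumes the split is decided by the year in a record's `date` field (a `YYYYMMDD` string), and the names `TRAIN_VAL_YEARS`, `TEST_YEARS` and `split_of` are hypothetical.

```python
# Hypothetical helper: assign a record to a split by the year of its `date` field.
# Assumption: `date` is a YYYYMMDD string, and the year alone decides the split.
TRAIN_VAL_YEARS = range(2014, 2019)  # 2014-2018 inclusive
TEST_YEARS = (2019,)


def split_of(record: dict) -> str:
    """Return "train_val", "test", or "unused" for a parsed record."""
    year = int(record["date"][:4])
    if year in TRAIN_VAL_YEARS:
        return "train_val"
    if year in TEST_YEARS:
        return "test"
    return "unused"


print(split_of({"date": "20180101"}))
print(split_of({"date": "20190315"}))
```

Records outside 2014-2019 are labeled "unused" here rather than raising an error. That is a design choice of this sketch, not something the dataset description specifies.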
{
"url": "00810674.1.htm",
"email_body": "FBI Headquarters ...",
"scam_type": "419",
"email_from": {
"name": "fbi ag***t",
"email": ""
},
"email_timestamp": "31 Dec 2017 17:48:34 +0200",
"date": "20180101",
"email_replyto": {
"name": "",
"email": "atm****@rep***tative.com"
},
"id": "20180101_1",
"subject": "Re: ..."
}
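
A minimal sketch of how a record like the one above could be loaded and its date fields parsed, using only the Python standard library. The function name `parse_record` and the added keys `email_timestamp_parsed` and `date_parsed` are hypothetical and not part of the dataset. The elided values ("...") are kept exactly as they appear in the example.

```python
import json
from datetime import datetime
from email.utils import parsedate_to_datetime

# The example record, copied verbatim (elided values stay elided).
RAW = """
{
  "url": "00810674.1.htm",
  "email_body": "FBI Headquarters ...",
  "scam_type": "419",
  "email_from": {"name": "fbi ag***t", "email": ""},
  "email_timestamp": "31 Dec 2017 17:48:34 +0200",
  "date": "20180101",
  "email_replyto": {"name": "", "email": "atm****@rep***tative.com"},
  "id": "20180101_1",
  "subject": "Re: ..."
}
"""


def parse_record(raw: str) -> dict:
    """Load one JSON record and add parsed versions of its date fields."""
    record = json.loads(raw)
    # RFC 2822 style date, with or without a leading weekday name.
    record["email_timestamp_parsed"] = parsedate_to_datetime(record["email_timestamp"])
    # `date` is assumed to be a YYYYMMDD string.
    record["date_parsed"] = datetime.strptime(record["date"], "%Y%m%d").date()
    return record


record = parse_record(RAW)
print(record["id"], record["date_parsed"], record["email_timestamp_parsed"].isoformat())
```

The email's own timestamp and the `date` field can differ, as they do in this example, so they are parsed into separate keys rather than merged.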
python main.py --mode train > train_doc2vec.log &
python main.py --mode test > test_clustering.log &
Ref. [1] 419scam.org