- Gmail authorization and retrieval of spam and non-spam mail from Gmail.
- Storing all of the messages in HDFS.
- Using Spark MLlib's Naive Bayes implementation to train a spam model and classify test mail.
Gmail authorization and retrieval of spam and non-spam mail from Gmail
- com.david.utilities.gmail.GmailAuthorize.java authorizes access to your Gmail account.
- Put your OAuth client_secret.json file in the src/main/resources folder.
- com.david.utilities.gmail.GmailUtility.java fetches messages from Gmail.
For more information about the Gmail API: https://developers.google.com/gmail/api/?hl=en
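GmailUtility's internals are not shown here, but two Gmail API facts matter when turning fetched messages into training data: a message requested in RAW format carries its full RFC 2822 content in the raw field as a base64url-encoded string, and every message has a labelIds list in which the system label SPAM marks mail in the Spam folder. The following is a minimal plain-Java sketch of converting those two fields into text plus a label. The class and method names, the UTF-8 assumption, and the 1.0 = spam / 0.0 = non-spam convention are hypothetical, not taken from this project.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;
import java.util.List;

/**
 * Hypothetical helper, not part of GmailUtility: turns two fields of a Gmail
 * API message (raw content and labelIds) into text plus a training label.
 */
class GmailMessageDecoder {

    /** Gmail's system label ID for messages in the Spam folder. */
    static final String SPAM_LABEL = "SPAM";

    /**
     * Decodes a base64url string such as Message.raw into text.
     * Assumes UTF-8; real mail may declare other charsets. Padding is optional.
     */
    static String decodeBase64Url(String data) {
        byte[] bytes = Base64.getUrlDecoder().decode(data);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    /** Returns 1.0 for spam and 0.0 for non-spam (an assumed label convention). */
    static double labelFor(List<String> labelIds) {
        return labelIds != null && labelIds.contains(SPAM_LABEL) ? 1.0 : 0.0;
    }

    public static void main(String[] args) {
        // Build a base64url string without padding, as a stand-in for Message.raw.
        String raw = Base64.getUrlEncoder().withoutPadding()
                .encodeToString("Subject: Hi".getBytes(StandardCharsets.UTF_8));
        System.out.println(decodeBase64Url(raw));                      // Subject: Hi
        System.out.println(labelFor(Arrays.asList("SPAM", "UNREAD"))); // 1.0
        System.out.println(labelFor(Arrays.asList("INBOX")));          // 0.0
    }
}
```

Java's URL decoder accepts input with or without trailing "=" padding, so the same helper works for both forms of the encoded string.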
Storing all of the messages in HDFS
- Start your single-node Hadoop setup or your cluster.
- Put core-site.xml and hdfs-site.xml in the src/main/resources folder.
- com.david.utilities.hdfs.HdfsUtility.java puts the messages into HDFS.
For more information about Hadoop and how to set up your cluster, see my blog ;) : http://daviddecoding.com/blog/tutorial/installing-hadoop/
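How HdfsUtility lays the messages out in HDFS is not described here. One thing to keep in mind: if the stored mail is later read with Spark's line-oriented SparkContext.textFile, every line becomes a separate record, so a multi-line message would be split apart. One hypothetical layout that avoids this is one message per line, written as the label, a tab, and then the text with all whitespace collapsed. The following is a plain-Java sketch of that record format. The MailRecord name and the format itself are assumptions, not this project's actual storage layout.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

/**
 * Hypothetical record format (an assumption, not HdfsUtility's format):
 * one message per line as "label<TAB>text", so a line-oriented reader
 * sees each mail as exactly one record.
 */
class MailRecord {

    /** Collapses every whitespace run (spaces, tabs, CR, LF) into one space. */
    static String flatten(String text) {
        return text.replaceAll("\\s+", " ").trim();
    }

    /** Builds one line: label, a tab, then the flattened text. */
    static String toLine(double label, String text) {
        return label + "\t" + flatten(text);
    }

    /** Parses a line produced by toLine back into (label, text). */
    static Map.Entry<Double, String> fromLine(String line) {
        int tab = line.indexOf('\t');
        if (tab < 0) {
            throw new IllegalArgumentException("missing tab separator: " + line);
        }
        double label = Double.parseDouble(line.substring(0, tab));
        return new SimpleEntry<>(label, line.substring(tab + 1));
    }

    public static void main(String[] args) {
        String line = toLine(1.0, "Subject: WIN\r\n\r\nClaim\tyour prize now");
        // Prints 1.0, a tab, then "Subject: WIN Claim your prize now".
        System.out.println(line);
        Map.Entry<Double, String> parsed = fromLine(line);
        System.out.println(parsed.getKey() + " | " + parsed.getValue());
    }
}
```

Because flatten also turns tabs inside the message into spaces, the first tab on each line is always the separator, which keeps fromLine unambiguous.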
Using Spark MLlib's Naive Bayes implementation to train a spam model and classify test mail
- com.david.utilities.spark.SparkUtilities.java uses Spark MLlib to train a Naive Bayes model.
- It then classifies test mail as spam or non-spam.
For more information about Naive Bayes in Spark MLlib: http://spark.apache.org/docs/latest/mllib-naive-bayes.html
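SparkUtilities.java relies on MLlib, whose NaiveBayes trains a multinomial model with additive (Laplace) smoothing by default. To show what that model computes, the following is a self-contained plain-Java sketch. It is not MLlib's code. It hashes tokens into a fixed number of buckets, which is similar in spirit to MLlib's HashingTF; the tokenizer, String.hashCode hashing, and the 1024-bucket size are assumptions. It then estimates smoothed log priors and per-class log term likelihoods. Finally, it labels a mail with whichever class has the larger log prior plus the sum of term counts times log likelihoods.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/**
 * Plain-Java sketch of a two-class multinomial Naive Bayes over hashed term
 * counts. NOT MLlib's implementation; it mirrors the model NaiveBayes fits.
 * Labels: 1.0 = spam, 0.0 = non-spam (assumed convention).
 */
class NaiveBayesSketch {

    private final int numFeatures;  // hashed feature-space size (assumed)
    private final double lambda;    // additive (Laplace) smoothing
    final double[] logPrior = new double[2];
    private final double[][] logLikelihood;

    NaiveBayesSketch(int numFeatures, double lambda) {
        this.numFeatures = numFeatures;
        this.lambda = lambda;
        this.logLikelihood = new double[2][numFeatures];
    }

    /** Lowercases, splits on non-alphanumerics, counts each token in a hashed bucket. */
    double[] features(String text) {
        double[] tf = new double[numFeatures];
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                tf[Math.floorMod(token.hashCode(), numFeatures)] += 1.0;
            }
        }
        return tf;
    }

    /** Estimates smoothed log priors and per-class log term likelihoods. */
    void train(List<Double> labels, List<String> texts) {
        double[] docCount = new double[2];
        double[] totalTerms = new double[2];
        double[][] termCount = new double[2][numFeatures];
        for (int i = 0; i < texts.size(); i++) {
            int c = labels.get(i) == 1.0 ? 1 : 0;
            docCount[c] += 1.0;
            double[] tf = features(texts.get(i));
            for (int j = 0; j < numFeatures; j++) {
                termCount[c][j] += tf[j];
                totalTerms[c] += tf[j];
            }
        }
        for (int c = 0; c < 2; c++) {
            logPrior[c] = Math.log((docCount[c] + lambda) / (texts.size() + 2 * lambda));
            double denom = totalTerms[c] + numFeatures * lambda;
            for (int j = 0; j < numFeatures; j++) {
                logLikelihood[c][j] = Math.log((termCount[c][j] + lambda) / denom);
            }
        }
    }

    /** Picks the class with the larger log prior + sum of tf * log likelihood. */
    double predict(String text) {
        double[] tf = features(text);
        double[] score = logPrior.clone();
        for (int c = 0; c < 2; c++) {
            for (int j = 0; j < numFeatures; j++) {
                score[c] += tf[j] * logLikelihood[c][j];
            }
        }
        return score[1] > score[0] ? 1.0 : 0.0;
    }

    public static void main(String[] args) {
        List<Double> labels = Arrays.asList(1.0, 1.0, 0.0, 0.0);
        List<String> texts = Arrays.asList(
                "Win a free prize now, click here",
                "Cheap pills, free offer, click now",
                "Meeting notes for the project review",
                "Can we move the review meeting to Friday");

        NaiveBayesSketch model = new NaiveBayesSketch(1 << 10, 1.0);
        model.train(labels, texts);
        System.out.println(model.predict("Click now for a free prize"));     // 1.0
        System.out.println(model.predict("Notes from the project meeting")); // 0.0
    }
}
```

In the Spark version, the training step corresponds to NaiveBayes.train applied to an RDD of LabeledPoint, with the smoothing value passed as lambda. Classification corresponds to calling the resulting NaiveBayesModel's predict method on a feature vector.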
References
- https://www.mapr.com/blog/comparing-kill-mockingbird-its-sequel-with-apache-spark
- http://www.programcreek.com/java-api-examples/index.php?api=org.apache.spark.mllib.classification.NaiveBayes