
freiz / terminator


Promising spam filtering library making use of combined machine learning algorithms, written in C++

License: MIT License

C++ 98.18% Makefile 1.82%
machine-learning spam-filter spam-filtering-library

terminator's People

Contributors: freiz

Forkers: moldyreed, sbond75

terminator's Issues

Classification performance.

Either I don't understand how to drive this lib, or it's not operating correctly. Hopefully it's the former.

Here is what I see when I train on a decent-sized set of spam and ham email from my personal folders:
[graph: sorted scores of the spam and inbox email]

These are the sorted scores of all the spam and inbox email that I trained the filter on. The DB file is 23MB, which sounds plausible. There doesn't seem to be a significant difference between the two groups of email. Basically, my code parses each email into a list of words, removing all the headers and HTML encoding. It then passes that as the content std::string to Terminator::Train with the spam bool set. Afterwards I went back and called Predict on the same email to make the graphs in the image.
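One way to put a number on "no significant difference" is to compare the two score lists pairwise. The following is a minimal sketch, not part of the library: PairwiseAuc is a hypothetical helper, and it assumes the Predict scores have already been collected into two vectors of double and that a higher score means "more spam-like".

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of terminator): fraction of (spam, ham)
// pairs in which the spam message received the higher score. Ties count
// as half. 1.0 means the groups are perfectly separated; 0.5 means the
// scores carry no information about the class.
double PairwiseAuc(const std::vector<double>& spam_scores,
                   const std::vector<double>& ham_scores) {
  if (spam_scores.empty() || ham_scores.empty()) {
    return 0.5;  // nothing to compare
  }
  double wins = 0.0;
  for (double s : spam_scores) {
    for (double h : ham_scores) {
      if (s > h) {
        wins += 1.0;
      } else if (s == h) {
        wins += 0.5;
      }
    }
  }
  const double pairs = static_cast<double>(spam_scores.size()) *
                       static_cast<double>(ham_scores.size());
  return wins / pairs;
}
```

A value close to 0.5 on the training set itself would back up the impression from the graph that the filter is not separating the two groups. The loop is quadratic in the number of messages, which should be fine for a personal mail folder.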

I did notice that the classifier_weights_ seem to be weird:
classifier_weights_[0]=-1.25549e+65
classifier_weights_[1]=0
classifier_weights_[2]=0
classifier_weights_[3]=0
classifier_weights_[4]=0
classifier_weights_[5]=0
classifier_weights_[6]=0
classifier_weights_[7]=0

That looks wrong... but it's what gets saved and reloaded... maybe that's part of my issue?
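A quick sanity check before saving or after reloading would flag values like these. This is only a sketch under assumptions: WeightsLookSane is a hypothetical helper, the weights are assumed to be available as a vector of double, and the magnitude limit of 1e6 is an arbitrary illustrative threshold rather than anything the library defines.

```cpp
#include <cmath>
#include <vector>

// Hypothetical helper (not part of terminator): returns false if any
// weight is NaN, infinite, or larger in magnitude than max_abs.
// max_abs is an assumed, arbitrary bound chosen for illustration.
bool WeightsLookSane(const std::vector<double>& weights,
                     double max_abs = 1e6) {
  for (double w : weights) {
    if (!std::isfinite(w) || std::fabs(w) > max_abs) {
      return false;
    }
  }
  return true;
}
```

With the values listed above, the first weight (-1.25549e+65) would fail the check. A value of that size is often a sign of an uninitialized variable or a diverging update, though this check alone cannot tell which.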

Terminator::Vectorization crash

I was training a whole bunch of email and one of them has a length of 3 bytes.

Terminator::Vectorization uses an unsigned variable for 'len', so 'len - NGRAM', which should be -1 here, wraps around to the maximum unsigned value.

My short-term fix is to change len and i to int64_t, so that negative numbers exit the loop correctly.
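An alternative that keeps the unsigned types is to avoid the subtraction altogether. The sketch below is not the library's code: ExtractNgrams and kNgram are hypothetical names, the n-gram length of 4 is only inferred from "3 - NGRAM = -1" above, and the real loop may collect features differently.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Assumed n-gram length, inferred from the 3-byte case described above.
constexpr std::size_t kNgram = 4;

// Hypothetical stand-in for the vectorization loop: collects every
// kNgram-byte window of the text. Comparing i + kNgram against len
// never subtracts from an unsigned value, so short inputs cannot wrap.
std::vector<std::string> ExtractNgrams(const std::string& text) {
  std::vector<std::string> ngrams;
  const std::size_t len = text.size();
  if (len < kNgram) {
    return ngrams;  // too short to contain a single n-gram
  }
  for (std::size_t i = 0; i + kNgram <= len; ++i) {
    ngrams.push_back(text.substr(i, kNgram));
  }
  return ngrams;
}
```

Under these assumptions a 3-byte message simply yields no n-grams instead of running the loop off the end of the buffer.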

README.md spelling and grammar errors.

Some examples: embeded, precison, navie, memroy, exsiting, perfomance, persistance

And "those need adaptive model" should be "those that need adaptive models".

"implementation are described" -> "implementation is described"

"The only dependencies" -> "The only dependency".
