arafat877 / antivirus-1
This project forked from mik854e/antivirus
Virus Detection using Bayesian Analysis.
====================
Antivirus
====================
An antivirus program written in Java that scans a file and uses Bayesian analysis to predict whether it is a virus.

USAGE
--------------------
There are 5 buttons:
1) Open Directory - Choose a directory. If "virusdb.ser" exists in the directory, the previous save state will automatically be loaded. Otherwise, a new database will be created at runtime.
2/3) Learn Benign Files/Viruses - Choose a directory containing the known benign files/viruses in order to train the program.
4) Clear Database - Clears the current working database and chosen directory. No files will be deleted.
5) Scan File - Choose a file. The program will then scan the file and calculate the ratio of virus/benign based on the PROBABILITY CALCULATION method below. Then, the program predicts whether the file is a virus or not based on the ratio.

In order to use the program, you have to train it. Start by clicking "Learn Benign Files" and "Learn Viruses." Each button prompts you to choose a directory in which the known benign files or viruses are stored. The program then scans the files and counts the n-grams for each file (my program uses 4-character sequences).

While the program is learning, there is no output. For some reason, it waits until learning has finished before printing anything to the console. Learning may take up to 5 seconds, and the program will prompt you when it is done.

On exit, the program will ask whether you want to save. To save, you must first choose a directory by clicking "Open Directory." The serialized data will be saved as "virusdb.ser" in the chosen directory.

The top-right panel shows the current directory as well as the number of files that have been used to train the program in the current session.
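
The n-gram counting step described above can be sketched roughly as follows. This is only an illustration, not the project's actual code: the class name NGramCounter and its methods are hypothetical, and decoding the file bytes as ISO-8859-1 (so that every byte becomes exactly one character) is an assumption, since the README does not say how files are read.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration; not a class from the project.
final class NGramCounter {
    // The README states that 4-character sequences are used.
    static final int N = 4;

    // Slide a window of length n over the text and count every n-gram.
    static Map<String, Integer> countNGrams(String text, int n) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + n <= text.length(); i++) {
            counts.merge(text.substring(i, i + n), 1, Integer::sum);
        }
        return counts;
    }

    // Assumption: decode bytes as ISO-8859-1 so each byte maps to one char.
    static Map<String, Integer> countNGrams(Path file) throws IOException {
        byte[] bytes = Files.readAllBytes(file);
        return countNGrams(new String(bytes, StandardCharsets.ISO_8859_1), N);
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 1) {
            Map<String, Integer> counts = countNGrams(Paths.get(args[0]));
            System.out.println(counts.size() + " distinct 4-grams");
        }
    }
}
```

During training, the counts from each file would presumably be added into the virus or benign table, depending on which button was used.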
PROBABILITY CALCULATION
-----------------------
I calculated probabilities using this method:
http://en.wikipedia.org/wiki/Naive_Bayes_classifier#Document_Classification

When a file is scanned, I compute the natural log of the ratios. The formula is as follows:

ln[p(virus|file) / p(not virus|file)] = sum(ln[p(word|virus) / p(word|not virus)])

Here each "word" is one n-gram from the scanned file. If the sum of the logs is greater than 0, then the file is a virus. If the sum is less than 0, then the file is benign. N-grams that have not been seen in the training phase are skipped.

Overall, this method is okay at categorizing files. There are quite a few false negatives, meaning that virus files are classified as benign. I believe that this is caused by the unevenness of the two training directories. Although there are more virus files, there are more n-grams in the benign directory. Therefore, the counts are generally higher in the benign hash table, skewing the results a bit towards the benign side in cases where viruses

OTHER INFO
----------------------
When the state is saved as "virusdb.ser", the VirusDB object is serialized. VirusDB contains two hash tables (one for viruses and one for benign files), a list of the files used for training, the number of files used for training, and the directory the file was saved in.
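
The log-ratio rule from the PROBABILITY CALCULATION section can be sketched as below. This is a minimal sketch under stated assumptions, not the project's implementation: the class LogRatioScorer and its methods are hypothetical, p(word|class) is assumed to be the n-gram's count divided by the total count in that class's table, an n-gram missing from either table is skipped, and each occurrence in the scanned file is assumed to contribute once to the sum.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical scorer for illustration; not a class from the project.
final class LogRatioScorer {

    // Total number of n-gram occurrences recorded in a table.
    static long total(Map<String, Integer> counts) {
        long sum = 0;
        for (int c : counts.values()) {
            sum += c;
        }
        return sum;
    }

    // Sum of ln[p(gram|virus) / p(gram|benign)] over the file's n-grams.
    static double logRatio(Map<String, Integer> fileGrams,
                           Map<String, Integer> virusCounts,
                           Map<String, Integer> benignCounts) {
        long virusTotal = total(virusCounts);
        long benignTotal = total(benignCounts);
        double sum = 0.0;
        for (Map.Entry<String, Integer> e : fileGrams.entrySet()) {
            Integer v = virusCounts.get(e.getKey());
            Integer b = benignCounts.get(e.getKey());
            // Assumption: skip n-grams absent from either training table.
            if (v == null || b == null) {
                continue;
            }
            // Assumption: probability = count / total count in that table.
            double pVirus = (double) v / virusTotal;
            double pBenign = (double) b / benignTotal;
            // Assumption: each occurrence in the file counts once.
            sum += e.getValue() * Math.log(pVirus / pBenign);
        }
        return sum;
    }

    // A sum greater than 0 is classified as a virus, as the README states.
    static boolean isVirus(double logRatio) {
        return logRatio > 0;
    }

    public static void main(String[] args) {
        Map<String, Integer> virus = new HashMap<>();
        virus.put("aaaa", 3);
        virus.put("bbbb", 1);
        Map<String, Integer> benign = new HashMap<>();
        benign.put("aaaa", 1);
        benign.put("bbbb", 3);
        Map<String, Integer> file = new HashMap<>();
        file.put("aaaa", 1);
        System.out.println(isVirus(logRatio(file, virus, benign)));
    }
}
```

The README does not say how the project turns raw counts into probabilities, so the division by each table's total here is only one possible choice.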