
albertusk95 / nips-challenge-plagiarism-detection-vsm


Global NIPS Paper Implementation Challenge - Plagiarism Detection on Electronic Text Based Assignments Using Vector Space Model (iciafs14)

License: MIT License

Python 100.00%
plagiarism-detection vector-space-model unigram bigram trigrams cosine-similarity jaccard-similarity natural-language-processing nips-challenge


Plagiarism Detection on Electronic Text Based Assignments Using Vector Space Model (iciafs14)

Global NIPS Paper Implementation Challenge

I implemented the paper by following its research methodology.

Original Paper

https://arxiv.org/pdf/1412.7782.pdf

Main Goal

Develop an effective plagiarism detection tool for text-based assignments by comparing unigram, bigram, and trigram vector space models under the cosine and Jaccard similarity measures.

Programming Tools

  • Python 2.7
  • scikit-learn
  • NLTK

Files

Several important files / directories:

  • main.py

    Main file containing the whole source code

  • docs

    A directory containing the students' answers. Each answer is stored in its own document named assignment_index, where the word assignment is fixed and index is an integer that is incremented each time a new student is added.

  • combined_docs

    All student answers are combined into a single document called the MASTER DOCUMENT. The detection process runs on this combined document.
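
The combining step can be sketched as follows. This is a minimal illustration, not the code in main.py: the function name `build_master_document`, the `assignment_*` glob pattern, and the output path are assumptions made for the example.

```python
import glob
import os


def build_master_document(docs_dir, out_path):
    """Concatenate every answer file in docs_dir into one master file."""
    # Assumed naming pattern: files whose names start with "assignment_".
    # sorted() orders names lexicographically, so assignment_10 comes
    # before assignment_2; that is fine here because order does not
    # affect the vocabulary extracted from the master document.
    paths = sorted(glob.glob(os.path.join(docs_dir, "assignment_*")))
    answers = []
    for path in paths:
        with open(path) as handle:
            answers.append(handle.read().strip())

    # Create the output directory if it does not exist yet.
    out_dir = os.path.dirname(out_path)
    if out_dir and not os.path.isdir(out_dir):
        os.makedirs(out_dir)

    with open(out_path, "w") as handle:
        handle.write("\n".join(answers))
    return paths
```

For example, `build_master_document("docs", "combined_docs/master")` would write every answer found in docs into one file; the file name master is hypothetical.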

To Run

To run the program, execute the following command:

python main.py

Methodology

  • Combine the students' answers into one single answer file (MASTER DOCUMENT)

  • Extract the unique terms (unigrams, bigrams, trigrams) from the MASTER DOCUMENT

  • Eliminate stopwords

  • Compute the Document Frequency (DF) and Inverse Document Frequency (IDF) of each term

  • Compute the TF-IDF weight vector of each document

  • Compare each pair of assignments using Cosine Similarity

  • Compare each pair of assignments using Jaccard Similarity
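
The steps above can be sketched in plain Python using only the standard library. This is an illustration under stated assumptions rather than the code in main.py: the function names, the tiny stopword list, and the smoothed IDF formula are choices made for the example (the project itself lists scikit-learn and NLTK). The sketch also drops stopwords before forming n-grams, which is one possible ordering of the two steps.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (assumption); NLTK ships a fuller one.
STOPWORDS = {"a", "an", "and", "the", "is", "of", "to", "in", "on"}


def ngrams(text, n):
    """Lowercase the text, drop stopwords, and return its word n-grams."""
    words = [w for w in re.findall(r"[a-z0-9]+", text.lower())
             if w not in STOPWORDS]
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def tfidf_vectors(docs, n):
    """Return one sparse {term: tf-idf weight} dict per document."""
    term_lists = [ngrams(doc, n) for doc in docs]
    # Document frequency: number of documents containing each term.
    df = Counter(term for terms in term_lists for term in set(terms))
    total = len(docs)
    vectors = []
    for terms in term_lists:
        tf = Counter(terms)
        # Smoothed IDF (assumption): log((1 + N) / (1 + df)) + 1, so a
        # term shared by every document still gets a non-zero weight.
        vectors.append({
            term: count * (math.log((1.0 + total) / (1.0 + df[term])) + 1.0)
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse weight vectors."""
    dot = sum(weight * v[term] for term, weight in u.items() if term in v)
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0


def jaccard(a, b):
    """Jaccard similarity between two collections of terms."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / float(len(union)) if union else 0.0
```

A usage example on three made-up answers, comparing the first answer with the other two for each n-gram size:

```python
docs = [
    "The cat sat on the mat",
    "The cat sat on the mat",
    "Dogs bark loudly at night",
]
for n in (1, 2, 3):
    vecs = tfidf_vectors(docs, n)
    # Passing the vectors to jaccard() compares their sets of terms.
    print(n, cosine(vecs[0], vecs[1]), jaccard(vecs[0], vecs[2]))
```

Identical answers score close to 1 under both measures, while answers with no terms in common score 0.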


Albertus Kelvin
Bandung Institute of Technology

Code was developed on January 20th, 2018
Code was made publicly available on January 31st, 2018
