fakeblogdetection's Introduction

LOWBOW character-based n-gram authorship attribution:
=====================================================

Paper: 
	http://aclweb.org/anthology-new/P/P11/P11-1030.pdf
Code: 
	KernelSmoothingSample [Folder]
Dataset:
	1. http://www.cs.utexas.edu/users/sindhu/acl2010
	CompressedDataset [Folder]: Each of the archives below contains a parameters.txt file listing the parameter values that were set when that dataset was produced.
		ProcessedDataset_2_0.2.zip
		ProcessedDataset_2_0.2_csv.zip
		ProcessedDataset_5_0.2.zip
		Dataset.zip
		Datasubset.zip
	2. http://www.cs.cmu.edu/afs/cs.cmu.edu/project/theo-20/www/data/
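
For orientation, the idea of local character n-gram histograms can be sketched as follows. This is a minimal illustration, not the code in KernelSmoothingSample: it assumes a Gaussian kernel over relative document position, and the function names (`char_ngrams`, `local_histogram`, `lowbow`) and the default values of `n`, `k` and `sigma` are hypothetical choices made for this example only.

```python
import math
from collections import defaultdict


def char_ngrams(text, n):
    """Return the list of overlapping character n-grams in text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def local_histogram(text, n, mu, sigma):
    """Kernel-weighted n-gram histogram centred at relative position mu in [0, 1].

    Assumption: a plain Gaussian kernel with width sigma; the weights are
    normalised so that the histogram sums to 1.
    """
    grams = char_ngrams(text, n)
    if not grams:
        return {}
    hist = defaultdict(float)
    last = max(len(grams) - 1, 1)
    for i, gram in enumerate(grams):
        t = i / last  # relative position of this n-gram in the document
        hist[gram] += math.exp(-0.5 * ((t - mu) / sigma) ** 2)
    total = sum(hist.values())
    return {gram: weight / total for gram, weight in hist.items()}


def lowbow(text, n=2, k=5, sigma=0.2):
    """Return k local histograms at evenly spaced positions in the document."""
    centres = [(j + 0.5) / k for j in range(k)]
    return [local_histogram(text, n, mu, sigma) for mu in centres]
```

Each document then becomes a sequence of `k` histograms instead of a single bag of n-grams, so n-grams that occur early in the text weigh more in the first histograms than in the last ones.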


TFIDF with SVM:
==============

Code:
	ExtractDataFromBlogs [Folder]: Code that extracts the list of valid RSS feeds from the given blog links (HTML data).
	ExtractBlogData [Folder]: Extracts data from the RSS feeds and saves it to XML files.
	store_data [Folder]: Reads the XML files containing the bad blog data and stores them in the SQL database.
	StoreDataIntoDB [Folder]: Stores the XML files for the good blog data in the SQL database.
	TFIDF_to_SVD [Folder]: Code that reads the TFIDF vectors obtained from the bad blog data using RapidMiner and performs SVD on them.
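
As a rough guide to what the TFIDF vectors contain, here is a minimal sketch of one common TFIDF weighting. It is an assumption for illustration only: RapidMiner's exact weighting and normalisation may differ, and the function name `tfidf` and the token-list input format are hypothetical.

```python
import math
from collections import Counter


def tfidf(docs):
    """Compute length-normalised TFIDF vectors.

    docs is a list of token lists; the result is one {term: weight} dict
    per document. Weight = (term frequency) * log(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: number of documents in which each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vec = {
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        }
        # Scale to unit Euclidean length unless every weight is zero.
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()} if norm else vec)
    return vectors
```

A term that occurs in every document gets weight 0 under this scheme, while a term confined to a few posts gets a high weight in those posts.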

Dataset:
	CompressedDataset [Folder]:
		ProcessedData2012_04_23.sql.7z: SQL dump of the blog data that was stored in the MySQL database.
		ProcessedDataBlogPosts20120425.sql.7z: SQL dump of the blog data that was stored in the MySQL database, with each blog post as a separate row.
		blogDataSet.tar.gz: XML files for the RSS feeds corresponding to bad blogs.
		goodBlogDataSet.7z: XML files for the RSS feeds corresponding to good blogs.
		Rapidminer_Repository: RapidMiner process and dataset used for building the model.
		good_blogs_feed_list.txt, good_blogs_list.tsv: List of RSS feed URLs for good blogs.
		SampleBlacklist.xlsx: List of RSS feed URLs for bad blogs.
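
The step from RSS XML files to one database row per blog post can be sketched roughly as below. This is not the repository's code: it assumes plain RSS 2.0 `<item>` elements, uses SQLite as a stand-in for MySQL so that the example runs without a server, and the table name `blog_posts`, its columns and both function names are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET


def parse_rss_items(xml_text):
    """Yield (title, link, description) for every <item> in an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    for item in root.iter("item"):
        yield (
            item.findtext("title", default=""),
            item.findtext("link", default=""),
            item.findtext("description", default=""),
        )


def store_posts(conn, feed_url, label, xml_text):
    """Insert one row per blog post and return the number of rows added.

    label marks the feed as "good" or "bad" (hypothetical convention).
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS blog_posts ("
        "feed_url TEXT, label TEXT, title TEXT, link TEXT, body TEXT)"
    )
    rows = [(feed_url, label, t, l, d) for t, l, d in parse_rss_items(xml_text)]
    conn.executemany("INSERT INTO blog_posts VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)
```

For example, `store_posts(sqlite3.connect(":memory:"), feed_url, "bad", xml_text)` loads one feed into an in-memory database; missing `<description>` elements are stored as empty strings.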

Contributors:
============

	b-anand
