wikitalker's Introduction

WikiTalker

A toolkit to parse and analyse Wikipedia talk pages.

Load the dataset from the WikiTalker/sample.json file into MongoDB. Make sure MongoDB is installed and running on your system before you do so.
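As a rough illustration of the loading step, the sketch below reads a JSON dump and inserts its documents into a collection. It is not taken from the WikiTalker code: the helper names `read_dump` and `load_dump` are hypothetical, and the layout of sample.json is an assumption (either one JSON array or one JSON document per line). The `collection` argument is assumed to be a pymongo-style collection, that is, any object with an `insert_many` method.

```python
import json
from pathlib import Path


def read_dump(path):
    """Read documents from a JSON dump file.

    Assumption: the file holds either a single JSON array/object
    or one JSON document per line.
    """
    text = Path(path).read_text(encoding="utf-8").strip()
    if not text:
        return []
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        # Fall back to newline-delimited JSON (one document per line).
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    return data if isinstance(data, list) else [data]


def load_dump(path, collection):
    """Insert every document from the dump into a pymongo-style collection.

    Returns the number of documents read from the file.
    """
    docs = read_dump(path)
    if docs:  # pymongo's insert_many rejects an empty list
        collection.insert_many(docs)
    return len(docs)
```

With pymongo installed and a local server running, the collection could be obtained as `pymongo.MongoClient()["mywikidump"]["sample"]` and passed to `load_dump("WikiTalker/sample.json", collection)`.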

The commands for fetching the saved data from the MongoDB server depend on the database and collection names set in the WikiTalker/analyzer.py file. In general, they take the following form.

In the MongoDB shell:

> use <name_of_wiki_database>
> db.<name_of_wiki_collection>.find({"id": <wiki_article_id>});
> db.<name_of_wiki_collection>.find({"revision_id": <comment_revision_id>});

If you run the WikiTalker/analyzer.py file without changing it, the database and collection names are the ones set in the code (mywikidump and sample), and the commands become the following.

In the MongoDB shell:

> use mywikidump
> db.sample.find({"id": 1});
> db.sample.find({"revision_id": "901589438"});
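The same two lookups can be sketched in Python. The helper names `find_article` and `find_revision` are hypothetical and are not part of WikiTalker; `collection` is assumed to be a pymongo-style collection whose `find` method accepts a filter document. In the shell example the article id is a number and the revision id is a string, and MongoDB does not convert between the two when matching, so the sketch keeps those types.

```python
def find_article(collection, article_id):
    """Return all documents stored for one article id (a number in the example)."""
    return list(collection.find({"id": article_id}))


def find_revision(collection, revision_id):
    """Return all documents matching one comment revision id.

    The shell example passes the revision id as a string,
    so the value is converted to str here to match.
    """
    return list(collection.find({"revision_id": str(revision_id)}))
```

With the default names, `find_article(collection, 1)` and `find_revision(collection, 901589438)` would issue the same filters as the shell commands above.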

Grawitas is a lightweight, fast parser for Wikipedia talk pages that takes the raw Wikipedia syntax and outputs the structured content in various formats.

Methods to be implemented

Extract the list of editors
Find top editors
Find the sentiment of each comment
Comments by day/month/year
Comments during a given duration
Create the discussion tree
Common editors in a set of talk pages
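Since these methods are not implemented yet, the following is only a sketch of how a few of them might look, not the project's design. It assumes each comment is a dict with a `user` field and a `timestamp` field in the form `2019-06-07T10:15:00Z`; both field names and all function names are hypothetical, because the actual document schema is not described here. Sentiment analysis and the discussion tree are left out.

```python
from collections import Counter
from datetime import datetime

# Assumed timestamp layout, e.g. "2019-06-07T10:15:00Z" (hypothetical).
TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def parse_time(comment):
    """Parse the (assumed) 'timestamp' field into a naive datetime."""
    return datetime.strptime(comment["timestamp"], TIME_FORMAT)


def list_editors(comments):
    """Extract the sorted list of distinct editors."""
    return sorted({c["user"] for c in comments if c.get("user")})


def top_editors(comments, n=5):
    """Return the n editors with the most comments as (editor, count) pairs."""
    return Counter(c["user"] for c in comments if c.get("user")).most_common(n)


def comments_by_period(comments, unit="month"):
    """Count comments per day, month, or year."""
    formats = {"day": "%Y-%m-%d", "month": "%Y-%m", "year": "%Y"}
    fmt = formats[unit]
    return Counter(parse_time(c).strftime(fmt) for c in comments)


def comments_between(comments, start, end):
    """Return comments with start <= timestamp < end."""
    return [c for c in comments if start <= parse_time(c) < end]


def common_editors(pages):
    """Return editors who commented on every talk page in 'pages'.

    'pages' is a list of comment lists, one list per talk page.
    """
    editor_sets = [set(list_editors(page)) for page in pages]
    return set.intersection(*editor_sets) if editor_sets else set()
```

Each function takes plain lists of dicts, so the results of a MongoDB query can be passed in directly after converting the cursor to a list.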

wikitalker's People

descentis / wikitalker

wikitalker's Issues

bug in input function of util.py
