
inst737's Introduction

This repository is for a class project in INST737.

Data source

source: https://data.stackexchange.com

Brief explanation of database is here: https://meta.stackexchange.com/questions/2677/database-schema-documentation-for-the-public-data-dump-and-sede

Example data files to work with

For project ideation, please refer to the example files (.csv). A brief description of each file is given below:

  • Posts

Contains information about each post (a thread of a question & answer(s))

  • Users

Contains information about each user in Stack Overflow

  • Comments

Contains information about each comment attached to either a question or an answer

  • Badges

Contains information about each badge awarded to a user for their contributions to the site

  • Tags

Contains information of each tag associated with a post. A post may have multiple tags.

These datasets are taken from Stack Overflow only.
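As a starting point for working with these files, here is a minimal sketch of linking answers to their questions. It assumes the posts file follows the public Stack Exchange schema (one row per post, with `Id`, `PostTypeId`, `ParentId`, `OwnerUserId`, and `Score` columns); the example .csv files in this repository may be laid out differently. The sample rows and the function name `group_answers_by_question` are made up for illustration.

```python
import csv
import io
from collections import defaultdict

# Tiny made-up sample standing in for a posts .csv export (not real data).
SAMPLE_POSTS_CSV = """Id,PostTypeId,ParentId,OwnerUserId,Score
1,1,,10,5
2,2,1,11,3
3,2,1,12,7
4,1,,11,0
"""


def group_answers_by_question(csv_text):
    """Map each question Id to the list of answer rows that point at it."""
    questions = {}
    answers = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["PostTypeId"] == "1":      # assumed: 1 marks a question
            questions[row["Id"]] = row
        elif row["PostTypeId"] == "2":    # assumed: 2 marks an answer
            answers[row["ParentId"]].append(row)
    # Questions without answers still appear, with an empty list.
    return {qid: answers.get(qid, []) for qid in questions}


threads = group_answers_by_question(SAMPLE_POSTS_CSV)
for qid, rows in threads.items():
    print(qid, [r["Id"] for r in rows])
```

To read a real file, the same loop should work with `csv.DictReader(open(path, newline=""))` in place of the in-memory sample, provided the column names match.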


Below is a list of papers that you might find helpful for brainstorming.

Reference papers

  • From the paper titled: Discovering values~~~

27 features used

Questioner features (SA), 4 features total: questioner reputation, # of questioner’s questions and answers, questioner’s percentage of accepted answers on their previous questions.
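A rough sketch of how the four questioner features might be computed, offered as an illustration and not as the paper's implementation. The input shape (a list of prior questions, each with an `accepted_answer_id` that is `None` when nothing was accepted) and all names are hypothetical, and "percentage of accepted answers" is read here as the share of prior questions that have an accepted answer.

```python
def questioner_features(reputation, prior_questions, prior_answer_count):
    """Build the SA feature group for one questioner (hypothetical layout)."""
    n_questions = len(prior_questions)
    n_accepted = sum(
        1 for q in prior_questions if q["accepted_answer_id"] is not None
    )
    # Guard against users who have not asked anything before.
    pct_accepted = 100.0 * n_accepted / n_questions if n_questions else 0.0
    return {
        "reputation": reputation,
        "n_questions": n_questions,
        "n_answers": prior_answer_count,
        "pct_accepted": pct_accepted,
    }


# Made-up example: 4 earlier questions, 3 of them with an accepted answer.
sa = questioner_features(
    reputation=1500,
    prior_questions=[
        {"accepted_answer_id": 11},
        {"accepted_answer_id": None},
        {"accepted_answer_id": 27},
        {"accepted_answer_id": 31},
    ],
    prior_answer_count=9,
)
print(sa)
```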

Activity and Q/A quality measures (SB), 8 features total: # of favorites, # of page views, # positive and negative votes on question, # of answers, maximum answerer reputation, highest answer score, reputation of answerer who wrote highest-scoring answer.
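Most of the SB features are plain counts; the last three require picking out a particular answer. The sketch below shows one way to assemble the group, assuming each answer is a dict with `score` and `reputation` (the answerer's reputation) and the question is a dict with the count fields shown. These field names are hypothetical and not taken from the example files.

```python
def activity_features(question, answers):
    """Build the SB feature group for one question (hypothetical layout)."""
    # The highest-scoring answer; None when the question has no answers.
    best = max(answers, key=lambda a: a["score"]) if answers else None
    return {
        "favorites": question["favorite_count"],
        "views": question["view_count"],
        "upvotes": question["upvotes"],
        "downvotes": question["downvotes"],
        "n_answers": len(answers),
        "max_answerer_rep": max((a["reputation"] for a in answers), default=0),
        "top_answer_score": best["score"] if best else None,
        "top_answer_author_rep": best["reputation"] if best else None,
    }


# Made-up example data.
sb = activity_features(
    {"favorite_count": 2, "view_count": 340, "upvotes": 6, "downvotes": 1},
    [
        {"score": 3, "reputation": 120},
        {"score": 8, "reputation": 45},
        {"score": 1, "reputation": 9000},
    ],
)
print(sb)
```

Note that the highest-scoring answer and the highest-reputation answerer can differ, as they do in this sample.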

Community process features (SC), 8 features total: average answerer reputation, median answerer reputation, fraction of sum of answerer reputations contributed by max answerer reputation, sum of answerer reputations, length of answer by highest-reputation answerer, # of comments on answer by highest-reputation answerer, length of highest-scoring answer, # of comments on highest-scoring answer.
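The SC group is mostly aggregate statistics over the answerers of one question. Here is a minimal sketch using only the standard library. It assumes a non-empty list of answers, each a dict with hypothetical fields `reputation`, `score`, `body`, and `n_comments`, and it measures answer length in characters, which is an assumption and not something the feature list specifies.

```python
from statistics import mean, median


def community_features(answers):
    """Build the SC feature group from a non-empty list of answers."""
    reps = [a["reputation"] for a in answers]
    total_rep = sum(reps)
    top_rep = max(answers, key=lambda a: a["reputation"])
    top_score = max(answers, key=lambda a: a["score"])
    return {
        "avg_rep": mean(reps),
        "median_rep": median(reps),
        # Share of the total answerer reputation held by the top answerer.
        "max_rep_fraction": max(reps) / total_rep if total_rep else 0.0,
        "sum_rep": total_rep,
        "len_top_rep_answer": len(top_rep["body"]),
        "comments_top_rep_answer": top_rep["n_comments"],
        "len_top_score_answer": len(top_score["body"]),
        "comments_top_score_answer": top_score["n_comments"],
    }


# Made-up example data.
sc = community_features([
    {"reputation": 100, "score": 2, "body": "short", "n_comments": 1},
    {"reputation": 300, "score": 5, "body": "a longer answer", "n_comments": 0},
    {"reputation": 600, "score": 1, "body": "mid", "n_comments": 4},
])
print(sc)
```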

Temporal process features (SD), 7 features total: average time between answers, median time between answers, minimum time between answers, time-rank of highest-scoring answer, wall-clock time elapsed between question creation and highest-scoring answer, time-rank of answer by highest-reputation answerer, wall-clock time elapsed between question creation and answer by highest-reputation answerer.
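The temporal features depend on ordering answers by arrival time. The sketch below is one possible reading: "time between answers" is taken as the gap between consecutive answers (the gap from the question to the first answer is not included), and time-rank is 1 for the earliest answer. Both readings, along with the field names `created`, `score`, and `reputation`, are assumptions made for illustration.

```python
from datetime import datetime
from statistics import mean, median


def temporal_features(question_time, answers):
    """Build the SD feature group from a non-empty list of answers."""
    ordered = sorted(answers, key=lambda a: a["created"])
    # Gaps in seconds between consecutive answers, in arrival order.
    gaps = [
        (later["created"] - earlier["created"]).total_seconds()
        for earlier, later in zip(ordered, ordered[1:])
    ]
    top_score = max(ordered, key=lambda a: a["score"])
    top_rep = max(ordered, key=lambda a: a["reputation"])

    def rank(answer):
        # 1-based position of the answer in arrival order.
        return next(i for i, a in enumerate(ordered, 1) if a is answer)

    def elapsed(answer):
        return (answer["created"] - question_time).total_seconds()

    return {
        "avg_gap_s": mean(gaps) if gaps else None,
        "median_gap_s": median(gaps) if gaps else None,
        "min_gap_s": min(gaps) if gaps else None,
        "time_rank_top_score": rank(top_score),
        "elapsed_top_score_s": elapsed(top_score),
        "time_rank_top_rep": rank(top_rep),
        "elapsed_top_rep_s": elapsed(top_rep),
    }


# Made-up example: a question asked at 12:00 with three answers.
sd = temporal_features(
    datetime(2020, 1, 1, 12, 0),
    [
        {"created": datetime(2020, 1, 1, 12, 10), "score": 1, "reputation": 50},
        {"created": datetime(2020, 1, 1, 12, 30), "score": 4, "reputation": 900},
        {"created": datetime(2020, 1, 1, 13, 30), "score": 2, "reputation": 2000},
    ],
)
print(sd)
```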

NLP analyses examples

For text summarization & classification

  • Extracting Sentence Segments for Text Summarization: A Machine Learning Approach https://dl.acm.org/citation.cfm?id=345566

  • BOOK: Learning to Classify Text Using Support Vector Machines: Methods, Theory, and Algorithms https://dl.acm.org/citation.cfm?id=1105708

  • Thumbs up?: sentiment classification using machine learning techniques https://dl.acm.org/citation.cfm?id=1118704

Other NLP analyses (not in papers, but in articles or other sources)

  1. 2015 presidential debate link

  2. Topic modeling of Stack Overflow questions link

General ML methods / Techniques

  • Which algorithm to solve my problem? link

Stack Overflow papers
