This project forked from networkdynamics/humanizr

Humanizr: Bringing the humanity back to Twitter

Humanizr is a software library and off-the-shelf tool for classifying Twitter accounts as belonging to either an individual or an organization (e.g., a corporation or social group).

Why you need Humanizr:

Humanizr is an essential tool for researchers working on Twitter who want to:

  • Model human populations through their content and social connections
  • Use the content of Twitter accounts to predict phenomena such as elections, epidemics, or rumor-spreading
  • Observe the dynamics of how information flows between accounts over time

Humanizr lets you clearly distinguish accounts that belong to individual people from those that represent organizations. Without Humanizr, all Twitter accounts appear the same, so a study that models people-specific phenomena runs a significant risk of incorporating noise from non-human accounts. For example, a study looking for flu-related keywords could be skewed by news media accounts reporting on flu symptoms elsewhere; likewise, a study trying to predict elections from interest in candidates could be thrown off by news coverage of those candidates. Humanizr addresses these problems by identifying individuals' Twitter accounts.

Alternatively, Humanizr provides an unparalleled view into organizations and how they behave. Humanizr enables researchers to study how individuals and organizations interact, from examining how content created by organizations reaches individuals to how individuals converse with organizations (e.g., local clubs or small businesses).

Requirements:

  • NumPy>=v1.6.0

Installation and Usage:

In the top level directory, run:

./install.sh [--user]

Once installed, run the classifier with:

./classify_organizations.sh [-h] [-o] tweet_dir
 -h Help. Display this message and quit
 -o Output_path/filename. If no path is specified, default is current_directory/output.tsv
 tweet_dir Directory of tweet JSON files

The tweet_dir argument must be a path to a directory whose files contain tweet JSON objects in the Twitter format. The objects can be spread across any number of files, and a file can contain any number of JSON objects (one per line), but each object must contain exactly one tweet. In other words, each tweet has its own JSON object, including tweet and user information, as returned by the Twitter REST API.
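
Before running the classifier, it can help to check that a directory matches the layout described above. The following Python snippet is a minimal sketch, not part of Humanizr: the function name `count_tweets` is hypothetical, and the check that each tweet carries a `user` object with an `id` field is an assumption based on the usual shape of tweet objects returned by the Twitter REST API.

```python
import json
import os
import sys


def count_tweets(tweet_dir):
    """Return (valid, invalid) counts of one-per-line tweet JSON objects."""
    valid = 0
    invalid = 0
    for name in sorted(os.listdir(tweet_dir)):
        path = os.path.join(tweet_dir, name)
        if not os.path.isfile(path):
            continue  # ignore subdirectories
        with open(path, encoding="utf-8") as handle:
            for line in handle:
                line = line.strip()
                if not line:
                    continue  # blank lines are not counted either way
                try:
                    tweet = json.loads(line)
                except ValueError:
                    invalid += 1  # line is not valid JSON
                    continue
                # Assumption: a usable tweet embeds a "user" object with an "id".
                user = tweet.get("user") if isinstance(tweet, dict) else None
                if isinstance(user, dict) and "id" in user:
                    valid += 1
                else:
                    invalid += 1
    return valid, invalid


if __name__ == "__main__" and len(sys.argv) > 1:
    ok, bad = count_tweets(sys.argv[1])
    print("valid tweets:", ok, "invalid lines:", bad)
```

A nonzero invalid count usually points to files that hold a JSON array or pretty-printed objects spanning several lines rather than one object per line.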

The output of the classifier is a two-column .tsv file: on each line, the first column is a user ID and the second column is the classification (org = organization, per = individual).
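
The output file can be loaded with a few lines of Python for downstream analysis. This is a minimal sketch that assumes the file is tab-separated with the user ID first and the label second, as described above. The function names `read_classifications` and `summarize` are hypothetical, and any row whose label is not `org` or `per` (such as a header row, if one exists) is skipped.

```python
import csv
from collections import Counter


def read_classifications(path):
    """Map each user ID (as a string) to its label, "org" or "per"."""
    labels = {}
    with open(path, newline="", encoding="utf-8") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            # Skip short rows and anything that is not a known label.
            if len(row) < 2 or row[1] not in ("org", "per"):
                continue
            labels[row[0]] = row[1]
    return labels


def summarize(labels):
    """Count how many accounts received each label."""
    return Counter(labels.values())
```

For example, `summarize(read_classifications("output.tsv"))` gives the number of accounts labeled as organizations and as individuals in the default output file.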

Reference

For more information on how Humanizr works, see our paper in ICWSM 2015. If you use Humanizr in academic work, we kindly ask that you cite the following reference:

@inproceedings{McCorriston2015Organizations,
    title={Organizations are Users Too: Characterizing and Detecting the Presence of Organizations on Twitter},
    author={James McCorriston and David Jurgens and Derek Ruths},
    booktitle={Proceedings of the 9th International AAAI Conference on Weblogs and Social Media (ICWSM)},
    year={2015}
}
