chipper

Twitter Text Extraction

A fast screen name, hashtag and URL extractor and tokenizer for tweets.

API

Chipper
  #users     => [Array]
  #hashtags  => [Array]
  #urls      => [Array]
  #tokens    => [Array]

  #skip_users
  #skip_hashtags
  #skip_tokens
  #skip_token_pattern

Usage

require 'chipper'

Chipper.skip_users(%w(youtube msn))
Chipper.skip_hashtags(%w(abc24 cnn))
Chipper.skip_tokens(%w(story tv why that get from your))
Chipper.skip_token_pattern '^vimeo$'

tweet = "hi @youtube, could we get #cnn videos so i can #watch it on my @apple tv http://t.co/HM7XoimM"
Chipper.users(tweet)    #=> ["@apple"]
Chipper.hashtags(tweet) #=> ["#watch"]
Chipper.urls(tweet)     #=> ["http://t.co/HM7XoimM"]

# n-gram tokenizer: returns a list of token groups, split at stop words, punctuation, URLs and hashtags.
Chipper.tokens(tweet)   #=> [["could"], ["get"], ["videos"], ["can"]]

# single method that does all of the above and returns a hash.
Chipper.entities(tweet)
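
To make the partitioning idea behind the tokenizer more concrete, here is a minimal plain-Ruby sketch. It is not Chipper's implementation and is not guaranteed to reproduce its output; `STOP_WORDS`, `partition_tokens` and the `min_length` parameter are hypothetical names chosen for illustration. It assumes that anything other than a plain alphabetic word (mentions, hashtags, URLs, words with attached punctuation), any stop word, and any word shorter than 3 characters acts as a boundary between token groups.

```ruby
# Illustrative only: a simplified stand-in, NOT Chipper's actual tokenizer.
# STOP_WORDS and partition_tokens are hypothetical names.
STOP_WORDS = %w(story tv why that get from your).freeze

def partition_tokens(text, stop_words = STOP_WORDS, min_length = 3)
  groups = [[]]
  text.split(/\s+/).each do |raw|
    word = raw.downcase
    # Boundaries: non-alphabetic words (mentions, hashtags, URLs, attached
    # punctuation), words shorter than min_length, and stop words.
    boundary = word !~ /\A[a-z]+\z/ ||
               word.length < min_length ||
               stop_words.include?(word)
    if boundary
      # Start a new group, but never leave an empty one behind.
      groups << [] unless groups.last.empty?
    else
      groups.last << word
    end
  end
  groups.reject(&:empty?)
end

partition_tokens("great new videos from @apple about fast cars")
#=> [["great", "new", "videos"], ["about", "fast", "cars"]]
```

Consecutive words that survive filtering stay in the same group, which is what allows n-grams to be built from each group afterwards.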

Gotchas

  • skips tokens shorter than 3 characters

  • only handles t.co URLs
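
The two gotchas above can be pictured with a small plain-Ruby sketch. The `TCO_URL` pattern and the `tco_urls` and `long_enough?` helpers are hypothetical stand-ins, not Chipper's actual rules; they only show the kind of input that would be ignored under these assumptions.

```ruby
# Hypothetical stand-ins for illustration, NOT Chipper's actual patterns.
TCO_URL = %r{https?://t\.co/[A-Za-z0-9]+}

# Returns only the t.co links found in the text; other URLs are ignored.
def tco_urls(text)
  text.scan(TCO_URL)
end

# True when a token meets the minimum length (3 characters by default).
def long_enough?(token, min_length = 3)
  token.length >= min_length
end

text = "see http://t.co/HM7XoimM and http://example.com/page"
tco_urls(text)                                       #=> ["http://t.co/HM7XoimM"]
%w(hi tv can videos).select { |t| long_enough?(t) }  #=> ["can", "videos"]
```

If tweets contain expanded or non-t.co links, they would need to be extracted separately.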

Updating version

  • update the version in ext/src/version.h

  • run rake gemspec

License

Creative Commons Attribution - CC BY
