biffle-prototype

Prototype for Biffle, a recommendation engine for developer news.

Components:
master-shell-script: Controller for scripts below
profile-parse: Parse LinkedIn profiles and insert them into MongoDB.
SO-tag-download: Download StackOverflow tags for users and add them to the user object
add-wordclouds: Take all of a user's tags, build a wordcloud for that user, and save it to the user's object
search-terms-mongo: Import a data file into MongoDB that contains all of the terms that Biffle 'understands' (currently 100 Big Data database names)
search-gen-for-articles: Generate PHP files based on terms that Biffle understands
parse-and-download: Download and parse news articles and websites.
make-recommendations: Make article recommendations. Currently ranks articles by ElasticSearch relevance score, using all words in the user's wordcloud (not just the 100 database names)
send-recommendations: Sends recommendations to users via email
utils/3gram-keyword-dump: Dump all words in a user's wordcloud
utils/add-tweets: Add Tweets to a user's object
utils/SO-all-user-download: Download entire StackOverflow database of users and their email hashes
utils/technorati-scraper: Download URLs for 40,000+ tech blogs from Technorati
bifflescraper/*: Scrapy implementation of Biffle scraper tool
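
The make-recommendations step can be sketched roughly as follows: join the words of a user's wordcloud into one query string and let ElasticSearch score articles against it. This is a minimal sketch, not the repository's code. The function name, the `size` default, and the choice of the `t` (title) and `abs` (summary) fields as search targets are assumptions. It only builds the request body for a standard `multi_match` query, so it runs without a cluster.

```python
def build_recommendation_query(wordcloud, size=10):
    """Build an ElasticSearch search body from a user's wordcloud.

    Hypothetical helper: the field names "t" and "abs" are taken from the
    articles schema, but searching exactly these fields is an assumption.
    """
    terms = []
    seen = set()
    for word in wordcloud:
        word = word.strip()
        key = word.lower()
        # Skip empty entries and case-insensitive duplicates.
        if word and key not in seen:
            seen.add(key)
            terms.append(word)
    return {
        "size": size,
        "query": {
            "multi_match": {
                "query": " ".join(terms),
                "fields": ["t", "abs"],
            }
        },
    }


body = build_recommendation_query(["Greenplum", "InfiniDB", "greenplum"])
print(body["query"]["multi_match"]["query"])
```

The resulting dictionary could be passed as the body of a search request; hits come back ordered by relevance score, which is the ranking the component description refers to.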


Schemas

articles

{
"_id": MongoDB ID
"q": "big data mongodb health care"
"sc": "score"
"c": "code"
"sd": "search date"
"pubd": "publish date" (guessed date)
"procd": "processed date"
"url": "article url"
"t": "article title"
"abs": "summary text"
"sr": "article source"
"k": keyword list
"f": filename of downloaded full article
"m": metadata (retweets, etc.)
}
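
The short keys keep stored documents compact but are hard to read when debugging. One possible way to expand them is sketched below. The long names are assumptions derived from the descriptions in the schema above, and `expand_article` is a hypothetical helper, not part of the repository.

```python
# Short key -> descriptive name (assumed names, based on the schema text).
ARTICLE_FIELDS = {
    "q": "query",
    "sc": "score",
    "c": "code",
    "sd": "search_date",
    "pubd": "publish_date",
    "procd": "processed_date",
    "url": "url",
    "t": "title",
    "abs": "summary",
    "sr": "source",
    "k": "keywords",
    "f": "filename",
    "m": "metadata",
}


def expand_article(doc):
    """Return a copy of an article document with descriptive key names.

    Keys that are not in ARTICLE_FIELDS (such as "_id") are kept unchanged.
    """
    return {ARTICLE_FIELDS.get(key, key): value for key, value in doc.items()}


article = {"_id": 1, "q": "big data mongodb health care", "t": "Example title"}
print(expand_article(article))
```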

webpages

{
"_id": MongoDB ID
"q": "query"
"nr": "number of total results returned from search query"
"url": "webpage url"
"t": "webpage title"
"md": "meta description content tag"
"mk": "meta keywords"
"abs": "webpage summary"
"s": "webpage score"
"v": "version??",
"k": "keywords in webpage",
"f": "file path on disk"
}

topics - Not Implemented (list of topics)

{
   "big data": [ "mongodb", "hbase", "infiniDB", ... ],
   "cloud computing": ["sss", "sdfds"]
}

industries

{ "in": ["healthcare", "transportation", …] }

operations - Not Implemented

{ "op": ["deployment", "security", ...] }

recommended_articles

{
"_id": MongoDB ID
"uid": user id
"aid": article id
"rt": recommended_datetime
"uk": user_keywords_list
"pk": presented_keywords
}

recommended_webpages

{
"_id": MongoDB ID
"uid": user id
"wid": webpage id
"rt": recommended_datetime
"uk": user_keywords_list
"pk": presented_keywords
}

user_clicks

{
   "_id": MongoDB ID,
   "uid": 123,
   "aid": article id (if article was clicked)
   "wid": webpage id (if webpage was clicked)
   "ad_url": url of ad (if ad was clicked)
   "ct": date/time of click
}
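
A click record carries one target: an article, a webpage, or an ad. The sketch below builds such a document and rejects records with zero or several targets. Enforcing exactly one target is an assumption read from the "(if ... was clicked)" notes, and `make_click_record` is a hypothetical helper.

```python
from datetime import datetime, timezone


def make_click_record(uid, aid=None, wid=None, ad_url=None, clicked_at=None):
    """Build a user_clicks document with exactly one click target."""
    targets = {"aid": aid, "wid": wid, "ad_url": ad_url}
    present = {key: value for key, value in targets.items() if value is not None}
    if len(present) != 1:
        raise ValueError("exactly one of aid, wid, ad_url must be given")
    record = {"uid": uid}
    record.update(present)
    # Default to the current UTC time when no click time is supplied.
    record["ct"] = clicked_at or datetime.now(timezone.utc)
    return record


print(make_click_record(123, aid="article-1"))
```

The `_id` field is omitted on purpose, since MongoDB assigns one on insert.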

users

{
  "_id": MongoDB ID,
  "lid": linkedin unique ID,
  "e": [email protected],
  "n": Aki Balogh,
  "ln": linkedin interests (pulled from profile summary, job summary and skills)
  "in": "computer software",
  "k": ["Greenplum", "InfiniDB"]
}

so_users

{  
   "_id": MongoDB ID,
   "sid": StackOverflow ID,
   "dn": "akibalogh",
   "eh": "2dd0d3404eed2283b5307d16cec68896",
   "l": "Cambridge, MA",
   "w": "linkedin.com/in/akibalogh"
}
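
The `eh` value is 32 hexadecimal characters, which is the length of an MD5 digest. Assuming it is a Gravatar-style hash (MD5 of the trimmed, lowercased email address), a `users` document could be linked to its `so_users` counterpart as sketched below. Both the hashing scheme and the helper names are assumptions, not confirmed by the source.

```python
import hashlib


def email_hash(email):
    """Return the MD5 hex digest of a trimmed, lowercased email address.

    Assumption: this mirrors how the "eh" field was produced.
    """
    normalized = email.strip().lower()
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


def find_so_user(user, so_users):
    """Return the first so_users document whose "eh" matches the user's email."""
    target = email_hash(user["e"])
    for so_user in so_users:
        if so_user.get("eh") == target:
            return so_user
    return None


user = {"e": "Someone@Example.com"}
candidates = [{"sid": 1, "eh": email_hash("someone@example.com")}]
print(find_so_user(user, candidates))
```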

tech_blogs

{  
   "_id": page number of blog on Technorati (e.g. '1' for http://technorati.com/blogs/directory/technology/page-1),
   "u": list of blog URLs on page
}
