Recommendation Engine

Dependencies:

    MongoDB: https://www.mongodb.org/
    PyMongo: https://github.com/mongodb/mongo-python-driver
    [see below]


Project Primary Structure:

-- data_processing
   Place to keep all the files related to data processing, from text parsing/filtering to the
   communication with the database.

   - mongo.py - clean wrapper for all operations involving the database

-- recommendation_engine
   -- classic collaborative filtering techniques
   TODO: add implementation

   -- Collaborative Topic Modeling (Wang & Blei, 2011)
   TODO: add implementation
   https://www.cs.princeton.edu/~chongw/papers/WangBlei2011.pdf

-- eval
   Place to keep scripts for the engine's evaluation
   TODO: review evaluation methods
   TODO: add test results

-- gui
   [Optional] Graphical interface for the engine
   Tools for building user graphs, data plots and the like.
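
As a rough illustration of what "classic collaborative filtering" could look like, here is a minimal user-based sketch in pure Python. It is not the project's implementation (that is still a TODO above); the toy ratings, the function names and the choice of cosine similarity are all assumptions made for the example.

```python
from math import sqrt

# Toy ratings: user -> {business_id: stars}. Made-up values, for illustration only.
RATINGS = {
    "u1": {"b1": 5, "b2": 3, "b3": 4},
    "u2": {"b1": 4, "b2": 2, "b4": 5},
    "u3": {"b2": 5, "b3": 1, "b4": 2},
}


def cosine_similarity(a, b):
    """Cosine similarity between two sparse rating dicts."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def predict(ratings, user, item):
    """Similarity-weighted average of other users' ratings for `item`.

    Returns None when no other user with a non-zero similarity rated the item.
    """
    num = den = 0.0
    for other, their in ratings.items():
        if other == user or item not in their:
            continue
        sim = cosine_similarity(ratings[user], their)
        num += sim * their[item]
        den += sim
    return num / den if den else None


if __name__ == "__main__":
    # Predicted rating of user u1 for business b4, which u1 has not rated.
    print(predict(RATINGS, "u1", "b4"))
```

With real data, the ratings dict would presumably be built from the review collection (user_id, business_id, stars), but that mapping is left out here.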



Database Setup:

    To install mongodb:
    Download archive from: https://www.mongodb.org/downloads

        On Linux:
        Extract archive somewhere and it's ready to go.
        The interesting files are in ./bin:
            mongo (the console) - which you can use to inspect the database
            mongod (the actual database server)

        I suggest creating symlinks in /usr/bin for at least these binaries.
        mongoimport, mongoexport and mongorestore are also useful.

        To start the mongo database:
        sudo mongod --dbpath [PATH_TO_WHERE_YOU_WANT_TO_STORE_THE_DATA]
        This process must keep running for as long as you use the database.

        If you want to play a bit with the data, you can do so using the mongo console.
        e.g.:
        >    use yelp
        >    show collections
        Here "yelp" is the name of the database.
        Right now it's just an empty database, but we are about to add our dataset.

        Note: There are also some GUI apps for mongo, but I can't recommend any.

    Dataset:
        Now, you need to get the dataset from: http://www.yelp.com/dataset_challenge
        You'll see it's an archive of json files, which happens to be a mongo-friendly format.

        Importing them into the database is as easy as:
        mongoimport --db yelp --collection users yelp_academic_dataset_user.json
        mongoimport --db yelp --collection business yelp_academic_dataset_business.json
        mongoimport --db yelp --collection tip yelp_academic_dataset_tip.json
        mongoimport --db yelp --collection review yelp_academic_dataset_review.json
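
If you prefer to script the import, the same four commands can be driven from Python. This is only a sketch: the helper names (build_import_command, import_all) are made up for the example, and it assumes mongoimport is on your PATH and the json files are in the current directory.

```python
import subprocess

# Collection name -> dataset file, taken from the mongoimport commands above.
DATASET_FILES = {
    "users": "yelp_academic_dataset_user.json",
    "business": "yelp_academic_dataset_business.json",
    "tip": "yelp_academic_dataset_tip.json",
    "review": "yelp_academic_dataset_review.json",
}


def build_import_command(collection, path, db="yelp"):
    """Return the mongoimport argument list for one collection."""
    return ["mongoimport", "--db", db, "--collection", collection, path]


def import_all(files=None, db="yelp", dry_run=True):
    """Print (dry_run=True) or execute the import command for every file."""
    files = DATASET_FILES if files is None else files
    commands = [build_import_command(c, p, db) for c, p in files.items()]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True raises CalledProcessError if mongoimport fails.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    import_all(dry_run=True)
```

The dry-run default only prints the commands, so you can check them before touching the database; pass dry_run=False to actually run them.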

    Note:
        In the mongo console, I suggest setting up indexes for the fields you are most likely to
        use in your queries:

        e.g.:
        > use yelp
        > db.users.createIndex({user_id:1})
        > db.review.createIndex({user_id:1, review_id:1})
        > db.business.createIndex({business_id:1})
        > db.tip.createIndex({user_id:1, business_id:1})
        > db.checkin.createIndex({business_id:1})
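
The same indexes can also be created from Python through PyMongo's create_index. The sketch below mirrors the console commands above; the names INDEXES, ensure_indexes and connect are hypothetical, and connect assumes a mongod running on the default localhost port.

```python
# Index specs mirroring the console commands above: collection -> indexed fields.
INDEXES = {
    "users": ["user_id"],
    "review": ["user_id", "review_id"],
    "business": ["business_id"],
    "tip": ["user_id", "business_id"],
    "checkin": ["business_id"],
}


def ensure_indexes(db, indexes=INDEXES):
    """Create one ascending (possibly compound) index per collection.

    `db` is expected to behave like a pymongo Database.
    Returns a dict of collection name -> index name reported by the driver.
    """
    names = {}
    for collection, fields in indexes.items():
        keys = [(field, 1) for field in fields]  # 1 means ascending, as in {field: 1}
        names[collection] = db[collection].create_index(keys)
    return names


def connect(db_name="yelp"):
    """Open the database on a local mongod (assumed default host and port)."""
    from pymongo import MongoClient  # imported here so the rest works without pymongo
    return MongoClient()[db_name]


if __name__ == "__main__":
    print(ensure_indexes(connect()))
```

Creating an index that already exists is a no-op, so the script should be safe to run more than once.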

    More on how to use mongo:
        http://docs.mongodb.org/manual/core/introduction/

    To install pymongo:
        pip install pymongo

        PyMongo is the Python interface to the mongo database.
        I have written a quick wrapper for it in data_processing/mongo.py
        You can test that it works with: python ./data_processing/mongo.py
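
To give an idea of what such a wrapper might look like, here is a small sketch. It is not the contents of data_processing/mongo.py: the class name YelpStore, its methods and open_store are invented for illustration, and the collection and field names are the ones used in the import and index commands above.

```python
class YelpStore:
    """Thin, hypothetical wrapper around a pymongo Database."""

    def __init__(self, db):
        # `db` is a pymongo Database, e.g. MongoClient()["yelp"].
        self.db = db

    def get_user(self, user_id):
        """Return the user document, or None if there is no such user."""
        return self.db["users"].find_one({"user_id": user_id})

    def reviews_by_user(self, user_id):
        """Return all review documents written by `user_id` as a list."""
        return list(self.db["review"].find({"user_id": user_id}))


def open_store(db_name="yelp"):
    """Connect to a local mongod (assumed default host and port)."""
    from pymongo import MongoClient  # imported here so the class works without pymongo
    return YelpStore(MongoClient()[db_name])


if __name__ == "__main__":
    store = open_store()
    # Collections present in the database, like "show collections" in the console.
    print(store.db.list_collection_names())
```

Passing the database handle into the constructor keeps the class easy to test with a stand-in object, without a running server.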
