iukekini / backend_go_sample

Parses reviews from an auto site then uses machine learning to sort them. Prints the top 5 results.

Backend Coding Assessment

How to Run the Solution

  1. Install Go. You can find the instructions here.

    Install GCC, which is needed to run the tests.

    Make sure to set your GOPATH directory (instructions here).

  2. Install Dep. You can find the instructions here.

    Dep is used for package management in this application.

  3. Download the code into the correct directory.

    Go is a little picky about where code lives. Dep expects the project to sit inside your Go workspace ($GOPATH, not $GOROOT, which is where Go itself is installed), at the following path:

    $GOPATH/src/github.com/Iukekini/backend-coding-assessment-Iukekini-1052

  4. Build and test the project.

    I set up a Makefile that handles all the dependency loading, building, and testing.

    make

    If you want to run the steps manually, these are the commands it runs:

     dep ensure
     go build -o podium-backend-assessment -v
     go test -v ./...
    
  5. Run the application.

    ./podium-backend-assessment

    Results Notes

    The results are laid out in a table with the following columns:

    • Probability - The probability the classifier assigned to its predicted class (1-5), i.e. how confident it is in the Rating.
    • Rating - The class (1-5) returned by the classifier.
    • User - The user who authored the review.
    • Visit Type - Service / Sales / Used.
    • Score - The score the user gave in the review.
    • Date - Date of the visit.
    • Review - The title of the review. I didn't include the body, as it was too long to display nicely.

If you want to see more reviews or pull more data (parse more pages), you can adjust that in the config.json file.
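The project loads its configuration with go-config; the sketch below swaps in the standard library's `encoding/json` instead, just to show the idea of reading and validating config.json. The `Config` struct, the `pagesToParse` / `reviewsToShow` keys, and `parseConfig` are all hypothetical; check the real config.json for its actual key names.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// Config mirrors a hypothetical config.json. The key names below are
// illustrative; check the project's real config.json for its schema.
type Config struct {
	PagesToParse  int `json:"pagesToParse"`  // how many review pages to scrape
	ReviewsToShow int `json:"reviewsToShow"` // how many top reviews to print
}

// parseConfig decodes the JSON and rejects values that would make the
// scraper or the results table do nothing.
func parseConfig(data []byte) (Config, error) {
	var cfg Config
	if err := json.Unmarshal(data, &cfg); err != nil {
		return Config{}, fmt.Errorf("parse config: %w", err)
	}
	if cfg.PagesToParse < 1 || cfg.ReviewsToShow < 1 {
		return Config{}, fmt.Errorf("pagesToParse and reviewsToShow must be >= 1, got %d and %d",
			cfg.PagesToParse, cfg.ReviewsToShow)
	}
	return cfg, nil
}

func main() {
	data, err := os.ReadFile("config.json")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	cfg, err := parseConfig(data)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("parsing %d pages, showing top %d reviews\n", cfg.PagesToParse, cfg.ReviewsToShow)
}
```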

How I determined "Overly positive" reviews

To rank the reviews by their positivity, I set up a Bayes classifier. I used a set of Amazon reviews to train it on what a positive review looks like. The classifier has 5 classes, one for each star rating of an Amazon review. After training, I ran each review I had parsed from the site through the classifier, then used the result to sort the reviews and pick the 3 highest-rated ones to show.
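The approach above can be sketched as a small multinomial naive Bayes classifier. This is a from-scratch illustration in plain Go, not goml's API (the project uses goml for the real classifier). It assumes whitespace tokenization, add-one smoothing, and sorting by predicted class and then by probability; the README only says the classifier's result was used to sort. `NaiveBayes`, `TopN`, and the tiny training set in `main` are hypothetical.

```go
package main

import (
	"fmt"
	"math"
	"sort"
	"strings"
)

// NaiveBayes is a minimal multinomial naive Bayes text classifier with
// Laplace (add-one) smoothing. Classes are star ratings 1-5.
type NaiveBayes struct {
	docs        int                    // total training documents
	classDocs   map[int]int            // documents per class
	wordCounts  map[int]map[string]int // word frequencies per class
	classTotals map[int]int            // total words per class
	vocab       map[string]bool        // every word seen in training
}

func NewNaiveBayes() *NaiveBayes {
	return &NaiveBayes{classDocs: map[int]int{}, classTotals: map[int]int{},
		wordCounts: map[int]map[string]int{}, vocab: map[string]bool{}}
}

// tokenize is deliberately simple: lowercase, then split on whitespace.
func tokenize(s string) []string { return strings.Fields(strings.ToLower(s)) }

// Train adds one labelled document, e.g. an Amazon review and its stars.
func (nb *NaiveBayes) Train(text string, class int) {
	nb.docs++
	nb.classDocs[class]++
	if nb.wordCounts[class] == nil {
		nb.wordCounts[class] = map[string]int{}
	}
	for _, w := range tokenize(text) {
		nb.wordCounts[class][w]++
		nb.classTotals[class]++
		nb.vocab[w] = true
	}
}

// Predict returns the most likely class and its posterior probability.
func (nb *NaiveBayes) Predict(text string) (int, float64) {
	v := float64(len(nb.vocab))
	logs := map[int]float64{}
	best, bestLog := 0, math.Inf(-1)
	for c, n := range nb.classDocs {
		lp := math.Log(float64(n) / float64(nb.docs)) // log prior
		for _, w := range tokenize(text) {
			lp += math.Log((float64(nb.wordCounts[c][w]) + 1) / (float64(nb.classTotals[c]) + v))
		}
		logs[c] = lp
		if lp > bestLog || (lp == bestLog && c > best) { // deterministic ties
			best, bestLog = c, lp
		}
	}
	// Normalise in log space (log-sum-exp) so long reviews don't underflow.
	sum := 0.0
	for _, lp := range logs {
		sum += math.Exp(lp - bestLog)
	}
	return best, 1 / sum
}

type Ranked struct {
	Text        string
	Rating      int
	Probability float64
}

// TopN keeps the n highest-ranked reviews: class first, then probability.
func TopN(nb *NaiveBayes, reviews []string, n int) []Ranked {
	out := make([]Ranked, 0, len(reviews))
	for _, r := range reviews {
		c, p := nb.Predict(r)
		out = append(out, Ranked{Text: r, Rating: c, Probability: p})
	}
	sort.SliceStable(out, func(i, j int) bool {
		if out[i].Rating != out[j].Rating {
			return out[i].Rating > out[j].Rating
		}
		return out[i].Probability > out[j].Probability
	})
	if n < len(out) {
		out = out[:n]
	}
	return out
}

func main() {
	nb := NewNaiveBayes()
	nb.Train("terrible awful service never again", 1)
	nb.Train("amazing wonderful best dealership ever", 5)
	reviews := []string{"awful never again", "best ever amazing", "great service"}
	for _, r := range TopN(nb, reviews, 2) {
		fmt.Printf("%d %.3f %s\n", r.Rating, r.Probability, r.Text)
	}
}
```

Each word contributes an independent likelihood factor, so the scores are summed as logarithms and only converted back to a probability at the end; multiplying raw probabilities would underflow to zero on long reviews.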

notes

The training data was not a perfect fit for this scenario. Amazon reviews tend to be love / neutral / hate reviews, so the classifier had a harder time telling a good review apart from an over-the-top one. This could be solved by creating a set of training data that better represents this problem.

Problems / Questions / Frustrations

Please feel free to open an issue.

Open Source References

goml for the classification algorithm

go-config for the Config loading and management

Testify for some add-ons to the Go test suite. It enables assert and panic checks.

Log15 for logging

goquery, like jQuery but for Go. I used it for searching and parsing the web pages.

Training Data: I used the Amazon review CSV to train the classifier, using only the first 4k rows.

Dep for package management
