
YelpDataExtractor

This utility converts Yelp data from JSON to CSV files that can be imported into the blu cloud.
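Converting JSON records to CSV means that any text field containing a comma, a double quote or a line break (a business name, for example) has to be quoted. The following is a minimal sketch of RFC 4180-style escaping. The class and method names are hypothetical, and this is not necessarily how ReadYelpData builds its rows.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: builds one CSV line using RFC 4180-style quoting. */
public class CsvRow {

    /** Quotes a field if it contains a comma, double quote, CR or LF; doubles embedded quotes. */
    public static String escape(String field) {
        if (field == null) {
            return ""; // treat a missing JSON value as an empty CSV cell
        }
        boolean needsQuotes = field.indexOf(',') >= 0 || field.indexOf('"') >= 0
                || field.indexOf('\n') >= 0 || field.indexOf('\r') >= 0;
        if (!needsQuotes) {
            return field;
        }
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    /** Joins the escaped fields with commas (no trailing newline). */
    public static String toLine(List<String> fields) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(escape(fields.get(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Illustrative row shaped like the CATEGORY table (ID, NAME); the values are made up.
        System.out.println(toLine(Arrays.asList("1", "Beer, Wine & Spirits")));
        // prints: 1,"Beer, Wine & Spirits"
    }
}
```

Quoting only when needed keeps the output compact; quoting every field unconditionally would also be valid CSV.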

Software Required

  1. JDK 7
  2. Eclipse IDE

Input Data

The Yelp Academic Dataset can be downloaded from https://www.yelp.com/academic_dataset. Unzip the data and place it in a folder, referred to below as YELP_DATA_FOLDER.
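Before pointing the tool at YELP_DATA_FOLDER, it can help to confirm that the unzipped files are where you expect them. The following is a minimal sketch using java.nio.file, which is available from JDK 7. It assumes the dataset files end in .json, and the class name is hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical check: lists the *.json files found directly inside the Yelp data folder. */
public class YelpFolderCheck {

    public static List<String> listJsonFiles(Path folder) throws IOException {
        if (!Files.isDirectory(folder)) {
            throw new IOException("Not a directory: " + folder);
        }
        List<String> names = new ArrayList<String>();
        // The glob matches file names ending in .json; subfolders are not searched.
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(folder, "*.json")) {
            for (Path p : stream) {
                names.add(p.getFileName().toString());
            }
        }
        Collections.sort(names); // directory order is unspecified, so sort for stable output
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Pass the full path of YELP_DATA_FOLDER as the first argument.
        Path folder = Paths.get(args.length > 0 ? args[0] : ".");
        for (String name : listJsonFiles(folder)) {
            System.out.println(name);
        }
    }
}
```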

Project Structure

  • src: main Java source of the project
  • lib: JAR library dependencies of the project
  • blu_data_files: CSV blu data files used to upload the data to blu cloud

Installation Instructions

  1. Import the project into Eclipse as a Java project.
  2. Add the JAR libraries in the lib folder to the project's classpath.
  3. In ReadYelpData.java, set YELP_DIR_PATH to the full path of YELP_DATA_FOLDER.
  4. In ReadYelpData.java, set YELP_OUTPUT_DIR_PATH to the full path of the output folder.
  5. Run ReadYelpData as a Java application in Eclipse. It generates CSV data files in the YELP_OUTPUT_DIR_PATH folder, which can be used to upload the data to blu cloud.
  6. When ReadYelpData finishes, it also prints the SQL DDL for the tables to the Eclipse console. Copy this DDL and use it to create the tables in blu cloud.
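Step 6 relies on the tool printing CREATE TABLE statements. The sketch below shows one way such DDL can be generated from an ordered list of column names and SQL types. It is an assumption about the approach, not the code in ReadYelpData.java, and its comma spacing differs slightly from the console output of the tool. The class name is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical DDL builder: turns an ordered column map into a CREATE TABLE statement. */
public class DdlBuilder {

    public static String createTable(String table, Map<String, String> columns) {
        StringBuilder sb = new StringBuilder("CREATE TABLE ").append(table).append(" (");
        boolean first = true;
        for (Map.Entry<String, String> col : columns.entrySet()) {
            if (!first) {
                sb.append(", ");
            }
            sb.append(col.getKey()).append(' ').append(col.getValue());
            first = false;
        }
        return sb.append(");").toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, so columns appear in the order they are declared.
        Map<String, String> category = new LinkedHashMap<String, String>();
        category.put("ID", "INT");
        category.put("NAME", "VARCHAR(50)");
        System.out.println(createTable("CATEGORY", category));
        // prints: CREATE TABLE CATEGORY (ID INT, NAME VARCHAR(50));
    }
}
```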

Run Output

Number of businesses : 11537

Number of Categories : 508

Number of Checkins : 262764

Number of Users : 43873

Number of Reviews : 229907

CREATE TABLE USER ( USER_ID VARCHAR(22) ,NAME VARCHAR(30) ,TYPE VARCHAR(10) ,FUNNY_VOTES INT,USEFUL_VOTES INT,COOL_VOTES INT,REVIEW_COUNT INT,AVERAGE_STARS DOUBLE);

User Data Written

CREATE TABLE REVIEW ( USER_ID VARCHAR(22) ,BUSINESS_ID VARCHAR(22) ,REVIEW_ID VARCHAR(22) ,TYPE VARCHAR(10) ,FUNNY_VOTES INT,USEFUL_VOTES INT,COOL_VOTES INT,STARS INT,REVIEW_DATE DATE);

Review Data Written

CREATE TABLE BUSINESS_TO_CHECKIN ( BUSINESS_ID VARCHAR(22) ,TYPE VARCHAR(10) ,DAY VARCHAR(10),HOUR VARCHAR(10),COUNT INT);

CheckIn Data Written

CREATE TABLE CATEGORY ( ID INT ,NAME VARCHAR(50));

Category Data Written

CREATE TABLE BUSINESS ( BUSINESS_ID VARCHAR(22) ,CITY VARCHAR(30) ,NAME VARCHAR(50) ,STATE VARCHAR(50) ,TYPE VARCHAR(10) ,LONGITUDE DOUBLE,LATITUDE DOUBLE,REVIEW_COUNT DOUBLE,STARS INT);

CREATE TABLE BUSINESS_TO_CATEGORY ( BUSINESS_ID VARCHAR(22) ,CATEGORY_ID INT);

Business Data Written

Processing Complete
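The DDL above splits categories into a CATEGORY table (ID, NAME) and a BUSINESS_TO_CATEGORY link table (BUSINESS_ID, CATEGORY_ID). This suggests that each business's list of category names is normalized into integer IDs. The following is a minimal sketch of that normalization. It assumes IDs are assigned in order of first appearance, starting at 1, although the real tool may number them differently. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical normalizer: maps category names to integer IDs and records business-to-category links. */
public class CategoryNormalizer {

    // Insertion-ordered, so CATEGORY rows come out in ID order.
    private final Map<String, Integer> idsByName = new LinkedHashMap<String, Integer>();
    // Each entry is one BUSINESS_TO_CATEGORY row: {BUSINESS_ID, CATEGORY_ID}.
    private final List<String[]> links = new ArrayList<String[]>();

    /** Registers one business and its category names; an unseen name gets the next ID. */
    public void add(String businessId, List<String> categoryNames) {
        for (String name : categoryNames) {
            Integer id = idsByName.get(name);
            if (id == null) {
                id = idsByName.size() + 1; // IDs start at 1 (assumption)
                idsByName.put(name, id);
            }
            links.add(new String[] {businessId, String.valueOf(id)});
        }
    }

    public Map<String, Integer> categories() {
        return idsByName;
    }

    public List<String[]> links() {
        return links;
    }

    public static void main(String[] args) {
        CategoryNormalizer n = new CategoryNormalizer();
        // Illustrative business IDs and category names, not real dataset records.
        n.add("biz-a", Arrays.asList("Restaurants", "Pizza"));
        n.add("biz-b", Arrays.asList("Pizza", "Bars"));
        for (Map.Entry<String, Integer> e : n.categories().entrySet()) {
            System.out.println("CATEGORY " + e.getValue() + "," + e.getKey());
        }
        for (String[] link : n.links()) {
            System.out.println("BUSINESS_TO_CATEGORY " + link[0] + "," + link[1]);
        }
    }
}
```

In this design, a category name shared by many businesses is stored once in CATEGORY, and each business only references it by ID.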

Contributors

shefali-dubey