Assignment-Airtribe

Problem Statement:

Recursively crawl https://stackoverflow.com/questions using a Node.js-based crawler, harvest all questions on Stack Overflow, and store them in a database of your choice.

What do you need to store?

  1. Every unique URL (Stack Overflow question) you encountered.
  2. The total reference count for every URL (how many times this URL was encountered).
  3. Total # of upvotes and total # of answers for every question.
  4. Dump the data in a CSV file when the user kills the script.
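
The storage requirements above can be sketched in memory as follows. This is only an illustration, not the repository's code: the `seen` map stands in for the database, and the names `recordQuestion` and `toCsv` and the column names are assumptions made for this example.

```javascript
// Hypothetical in-memory store keyed by question URL.
// The actual project stores rows in a database instead.
const seen = new Map();

// Record one encounter of a question URL.
// The row is created on first sight; refCount grows on every later encounter.
function recordQuestion(url, upvotes, answers) {
  const row = seen.get(url) ?? { url, refCount: 0, upvotes: 0, answers: 0 };
  row.refCount += 1;
  row.upvotes = upvotes; // keep the most recently scraped values
  row.answers = answers;
  seen.set(url, row);
  return row;
}

// Serialize rows as CSV text. Every field is quoted and embedded
// double quotes are doubled, so commas inside values stay safe.
function toCsv(rows) {
  const columns = ['url', 'refCount', 'upvotes', 'answers'];
  const quote = (value) => `"${String(value).replace(/"/g, '""')}"`;
  const lines = [columns.join(',')];
  for (const row of rows) {
    lines.push(columns.map((name) => quote(row[name])).join(','));
  }
  return lines.join('\n') + '\n';
}
```

Calling `toCsv([...seen.values()])` produces the text that would be written to the CSV file when the script is stopped.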

Things you should keep in mind:

  1. Maintain a concurrency of 5 requests at all times. Refrain from using the throttled-request package to limit concurrency.
  2. Your solution needs to be asynchronous in nature.
  3. If you are using request.js, do not use its connection pool to throttle the number of requests.
  4. You can use cheerio or a similar library for HTML parsing.
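
One way to meet the concurrency constraint without a throttling package is a small hand-written worker pool. The sketch below is an assumption about how this could be done, not the repository's implementation; `runPool`, `queue`, and `worker` are hypothetical names. The worker may push newly discovered URLs onto the same queue while the pool is running.

```javascript
// Process items from a (possibly growing) queue with at most `limit`
// worker calls in flight. Resolves with the peak number of concurrent
// calls once the queue is empty and every call has settled.
function runPool(queue, worker, limit = 5) {
  let active = 0;
  let peak = 0;
  return new Promise((resolve) => {
    const next = () => {
      // Finished: nothing queued and nothing still running.
      if (queue.length === 0 && active === 0) {
        resolve(peak);
        return;
      }
      // Start new calls until the limit is reached or the queue is empty.
      while (active < limit && queue.length > 0) {
        const item = queue.shift();
        active += 1;
        peak = Math.max(peak, active);
        Promise.resolve()
          .then(() => worker(item))
          .catch((err) => console.error('worker failed:', err))
          .then(() => {
            active -= 1;
            next(); // a slot opened up, so refill it
          });
      }
    };
    next();
  });
}
```

Because a finished call immediately triggers `next()`, the pool stays at 5 requests in flight for as long as the queue holds enough pending URLs.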

My Approach:

I solved this problem using the following steps:

  • I pushed all the questions on the first page into an array.
  • Then I iterated through the array recursively, popping each URL from the array and saving it to the database along with the question's number of upvotes, total answers, and title.
  • When the script is terminated, all the questions stored in the database are saved to a CSV file.
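
The last step, writing the CSV when the script is terminated, could be wired up roughly as shown here. This is a sketch under stated assumptions: `dumpOnExit`, the `getCsv` callback, the default file name `questions.csv`, and the injectable `exit` parameter are all hypothetical and not taken from the repository.

```javascript
const fs = require('node:fs');

// Register a one-shot SIGINT (Ctrl+C) handler that writes the CSV and exits.
// `getCsv` is a caller-supplied function returning the CSV text, for example
// built from rows read back from the database.
function dumpOnExit(getCsv, path = 'questions.csv', exit = process.exit) {
  const handler = () => {
    // Synchronous write so the file is complete before the process exits.
    fs.writeFileSync(path, getCsv());
    exit(0);
  };
  process.once('SIGINT', handler);
  return handler;
}
```

If the rows have to be fetched from the database asynchronously, the handler would need to await that query before writing and exiting.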

Usage

 
  1. Fork the repository.
  2. Open a terminal and run git clone https://github.com//Assignment-Airtribe.git
  3. cd Assignment-Airtribe
  4. npm install
  5. Create a .env file and copy the contents of config.env into it.
  6. npm start
  
