
hi-imcodeman / flipkart-scraper


This package helps you scrape all Flipkart products through the Flipkart Affiliate API.

Home Page: https://hi-imcodeman.github.io/flipkart-scraper

License: MIT License

JavaScript 1.77% TypeScript 98.23%
flipkart flipkart-api scraper typescript crawler flipkart-scraper flipkart-products


Flipkart Scraper


Please refer to the API documentation here.

See the examples here.

Installation

Install using 'npm'

npm i flipkart-scraper

Install using 'yarn'

yarn add flipkart-scraper

Usage

import { FlipkartScraper } from "flipkart-scraper";

const scraper = new FlipkartScraper(
  "<Affiliate-Id-Here>",
  "<Affiliate-Token-Here>"
);

// 'data' event handler
scraper.on("data", (data) => {
  console.log(data.products.length);
});

// Start the scraper
scraper.start();

Class: FlipkartScraper

This class scrapes all Flipkart products through the Flipkart Affiliate API.

constructor(affiliateId, affiliateToken, [options])

Creates an instance of the FlipkartScraper class with the required authentication parameters.

  • options {object}
    • concurrency? {number} Number of requests processed in parallel by the queue. Default: 2.
    • maxRequest? {number} Maximum number of requests sent to the Flipkart affiliate server. Default: 0 (unlimited).
    • maxPage? {number} Maximum number of pages to scrape per category. Default: 0 (unlimited).
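The documented defaults can be modelled as a small TypeScript sketch. This is not the library's source: `ScraperOptions`, `resolveOptions`, and `effectiveLimit` are hypothetical names, and the code only encodes what the list above states (the default values, and that 0 means unlimited).

```typescript
// Hypothetical shape of the constructor options described above.
interface ScraperOptions {
  concurrency?: number; // parallel requests, default 2
  maxRequest?: number; // total request cap, 0 = unlimited
  maxPage?: number; // pages per category, 0 = unlimited
}

type ResolvedOptions = Required<ScraperOptions>;

const DEFAULT_OPTIONS: ResolvedOptions = {
  concurrency: 2,
  maxRequest: 0,
  maxPage: 0,
};

// Fill in any option the caller left out (or passed as undefined).
function resolveOptions(options: ScraperOptions = {}): ResolvedOptions {
  return {
    concurrency: options.concurrency ?? DEFAULT_OPTIONS.concurrency,
    maxRequest: options.maxRequest ?? DEFAULT_OPTIONS.maxRequest,
    maxPage: options.maxPage ?? DEFAULT_OPTIONS.maxPage,
  };
}

// A limit of 0 means "no limit", so treat it as Infinity when comparing.
function effectiveLimit(limit: number): number {
  return limit === 0 ? Infinity : limit;
}
```

With this convention a check such as `pagesFetched < effectiveLimit(options.maxPage)` works the same way whether or not a limit was set.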

Example

const scraper = new FlipkartScraper(
  "<Affiliate-Id-Here>",
  "<Affiliate-Token-Here>",
  {
    /**
     * Make up to 5 parallel requests to Flipkart.
     * Optional; default is 2.
     **/
    concurrency: 5,
    /**
     * Stop after 500 requests to Flipkart; the program then ends.
     * Optional; default is 0 (unlimited).
     **/
    maxRequest: 500,
    /**
     * Scrape at most 3 pages per category.
     * Optional; default is 0 (unlimited).
     **/
    maxPage: 3,
  }
);
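To make the meaning of `concurrency` and `maxRequest` concrete, here is a minimal worker-pool sketch. It is an assumption for illustration, not how `FlipkartScraper` schedules its requests: `runLimited` is a hypothetical helper that keeps at most `concurrency` tasks in flight and starts no more than `maxRequest` tasks in total (0 = unlimited).

```typescript
// Hypothetical runner: at most `concurrency` tasks in flight, and no more
// than `maxRequest` tasks started overall (0 = unlimited).
async function runLimited<T>(
  tasks: Array<() => Promise<T>>,
  concurrency: number,
  maxRequest = 0,
): Promise<T[]> {
  const cap = maxRequest === 0 ? tasks.length : Math.min(maxRequest, tasks.length);
  const results = new Array<T>(cap);
  let next = 0;

  // Each worker claims the next index synchronously, so no two workers
  // ever run the same task.
  async function worker(): Promise<void> {
    while (next < cap) {
      const index = next++;
      results[index] = await tasks[index]();
    }
  }

  const workers = Array.from({ length: Math.min(concurrency, cap) }, () => worker());
  await Promise.all(workers);
  return results;
}
```

With `concurrency: 5` and `maxRequest: 500`, this sketch would keep five requests open at a time and stop after the 500th has been started.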

scraper.start([categoriesToScrape])

Starts scraping through the Flipkart Affiliate API.

  • categoriesToScrape? {string[]} List of categories to scrape. Default: [] (all categories).

Example

scraper.start(["televisions", "mobiles"]); // Scrapes only the specified categories
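The "empty list means all categories" rule can be sketched as follows. `selectCategories` is a hypothetical helper, and reporting unknown names separately is an assumption: the documentation does not say what the library does with a category name it does not recognise (for example, a misspelling).

```typescript
// Hypothetical result: which categories will be scraped, and which
// requested names did not match any available category.
interface CategorySelection {
  selected: string[];
  unknown: string[];
}

function selectCategories(requested: string[], available: string[]): CategorySelection {
  // An empty request list selects every available category.
  if (requested.length === 0) return { selected: [...available], unknown: [] };
  const known = new Set(available);
  return {
    selected: requested.filter((category) => known.has(category)),
    unknown: requested.filter((category) => !known.has(category)),
  };
}
```

Surfacing `unknown` lets a caller warn about a misspelled category instead of silently scraping nothing for it.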

scraper.stats(showAsNumbers?=false)

Returns the scraper's statistics. By default, values are formatted as human-readable strings such as 3.1k or 1.45GB.

  • showAsNumbers? {boolean} When true, stats are returned as raw numbers instead of formatted strings.
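The human-readable format could be produced with formatters like the ones below. This is a sketch, not the library's code: `formatCount` and `formatBytes` are hypothetical, and the 1024-based byte units are an assumption since the documentation does not state which base is used.

```typescript
// Round to two decimals; Number() drops trailing zeros ("3.10" -> 3.1).
function round2(value: number): number {
  return Number(value.toFixed(2));
}

// Divide by `base` until the rounded value fits, then append the unit.
function formatScaled(value: number, base: number, units: string[]): string {
  let scaled = value;
  let unit = 0;
  // Compare the rounded value so 999999 becomes "1M" rather than "1000k".
  while (Math.abs(round2(scaled)) >= base && unit < units.length - 1) {
    scaled /= base;
    unit++;
  }
  return `${round2(scaled)}${units[unit]}`;
}

function formatCount(value: number): string {
  return formatScaled(value, 1000, ["", "k", "M", "B"]);
}

// Assumes binary (1024-based) units.
function formatBytes(bytes: number): string {
  return formatScaled(bytes, 1024, ["B", "KB", "MB", "GB", "TB"]);
}
```

For example, `formatCount(19540)` gives `"19.54k"`, matching the style of `productsCount` in the sample below.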

Sample Stats

/*
{
  "startTime": "2021-02-21T08:14:29.445Z",
  "endTime": undefined, // endTime will be available once scraping finished
  "status": "inprogress",
  "concurrency": 30,
  "waitingRequests": 0,
  "productsCount": "19.54k",
  "elapsed": "0:00:06 10ms",
  "durationPerMillionProducts": "0:05:08 558ms",
  "productsPerSec": "3.25k products/sec",
  "avgResponseTime": "147ms",
  "requestPerSec": "7/sec",
  "requestedCount": "43.00",
  "processedCount": "41.00",
  "errorCount": "0.0",
  "retryCount": "0.0",
  "retryHaltCount": "0.0",
  "pendingCategory": 2,
  "completedCategory": 1,
  "downloadSize": "84.78MB",
  "downloadedSpeed": "14.11MB/sec",
  "info": {
    "pendingCategories": [
      {
        "category": "mens_clothing",
        "startTime": "2021-02-21T08:14:29.602Z",
        "noOfPages": 18,
        "elapsed": 5668,
        "totalProducts": 9000
      },
      {
        "category": "mobiles",
        "startTime": "2021-02-21T08:14:29.603Z",
        "noOfPages": 13,
        "elapsed": 5771,
        "totalProducts": 6500
      }
    ],
    "completedCategories": [
      {
        "category": "laptops",
        "noOfPages": 10,
        "totalProducts": 4500,
        "elapsed": 3878
      }
    ]
  }
}
*/
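Several fields in the sample are rates derived from raw counters. The sketch below shows one plausible way to compute them; the formulas are inferred from the field names, not taken from the library. The sample values are roughly consistent with them: 19.54k products in about 6.01 s is about 3.25k products/sec. `RawCounters`, `productsPerSec`, `durationPerMillionMs`, and `formatDuration` are hypothetical names.

```typescript
// Hypothetical raw counters a scraper might track.
interface RawCounters {
  productsCount: number;
  elapsedMs: number;
}

// Throughput so far, in products per second.
function productsPerSec({ productsCount, elapsedMs }: RawCounters): number {
  return elapsedMs > 0 ? productsCount / (elapsedMs / 1000) : 0;
}

// Projected time to scrape one million products at the current rate.
function durationPerMillionMs({ productsCount, elapsedMs }: RawCounters): number {
  return productsCount > 0 ? (elapsedMs * 1e6) / productsCount : Infinity;
}

// Format milliseconds as "H:MM:SS Nms", the style used by "elapsed" above.
function formatDuration(ms: number): string {
  const total = Math.round(ms);
  const millis = total % 1000;
  const totalSeconds = Math.floor(total / 1000);
  const hours = Math.floor(totalSeconds / 3600);
  const minutes = Math.floor((totalSeconds % 3600) / 60);
  const seconds = totalSeconds % 60;
  const pad = (n: number) => String(n).padStart(2, "0");
  return `${hours}:${pad(minutes)}:${pad(seconds)} ${millis}ms`;
}
```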

events: 'response'

Emitted when a successful HTTP response is received from the Flipkart Affiliate server.

Example

// 'response' event handler
scraper.on("response", (response) => {
  console.log(response);
});

events: 'data'

Emitted when products are returned from the Flipkart Affiliate API.

Example

// 'data' event handler
scraper.on("data", (data) => {
  console.log(data.apiData.products);
});

events: 'categoryCompleted'

Emitted when all products in a category have been scraped.

Example

// 'categoryCompleted' event handler
scraper.on("categoryCompleted", (completedCategoryInfo) => {
  console.log(completedCategoryInfo);
});

events: 'finished'

Emitted when the scraper finishes.

Example

// 'finished' event handler
scraper.on("finished", (info) => {
  console.log(info);
});
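If you prefer `await` over callbacks, the `finished` event can be wrapped in a Promise. This is a minimal sketch: `waitForFinished` and `ScraperLike` are hypothetical, and the only thing assumed about the scraper is the `on(event, handler)` method used in the examples above. It does not reject on `error`, because the documentation does not say whether an error ends the run (errors may be followed by retries).

```typescript
// The only capability this sketch relies on: an `on(event, handler)` method.
interface ScraperLike {
  on(event: string, handler: (payload: unknown) => void): unknown;
}

// Resolve with the 'finished' payload once the scraper reports completion.
function waitForFinished<T = unknown>(scraper: ScraperLike): Promise<T> {
  return new Promise<T>((resolve) => {
    scraper.on("finished", (info) => resolve(info as T));
  });
}
```

Inside an async function you could call `const done = waitForFinished(scraper);`, then `scraper.start();`, then `const info = await done;`. Registering the listener before `start()` avoids missing an early `finished` event.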

events: 'error'

Emitted when an error occurs.

Example

// Triggered when an error occurs
scraper.on("error", (error) => {
  console.error(error);
});

events: 'retry'

Emitted when a request is retried.

Example

// Triggered when a request is retried
scraper.on("retry", (retryInfo) => {
  console.log(retryInfo);
});

events: 'retryHalted'

Emitted when retries have failed 10 times and the scraper stops retrying.

Example

// Triggered when retries have failed 10 times
scraper.on("retryHalted", (retryHaltInfo) => {
  console.error(retryHaltInfo);
});
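The `retry` / `retryHalted` pair describes a common pattern: report each retry, then give up after a fixed number of failures. The sketch below illustrates that pattern under stated assumptions. `retryWithHalt` is a hypothetical helper, the limit of 10 comes from the text above, and the exponential backoff delay is an assumption; the documentation does not describe the library's retry timing.

```typescript
// Hypothetical retry loop: call onRetry before each retry, and call onHalt
// (then rethrow) once `maxRetries` retries have also failed.
async function retryWithHalt<T>(
  operation: () => Promise<T>,
  options: {
    maxRetries?: number; // the documented limit is 10
    baseDelayMs?: number; // assumed backoff base, not from the docs
    onRetry?: (attempt: number, error: unknown) => void;
    onHalt?: (error: unknown) => void;
  } = {},
): Promise<T> {
  const { maxRetries = 10, baseDelayMs = 100, onRetry, onHalt } = options;
  for (let attempt = 0; ; attempt++) {
    try {
      return await operation();
    } catch (error) {
      if (attempt >= maxRetries) {
        onHalt?.(error);
        throw error;
      }
      onRetry?.(attempt + 1, error);
      // Exponential backoff: baseDelayMs, 2x, 4x, ...
      await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** attempt));
    }
  }
}
```

Here `onRetry` and `onHalt` play the roles that the `retry` and `retryHalted` events play for the scraper.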

Contributors

asrafalih, athena2207, dependabot[bot]

flipkart-scraper's Issues

Provide life cycle methods

Acceptance

  • Able to pause and resume
  • Able to stop scraping
  • Terminate scraping if paused for more than 24 hours

Provide stats for scraping data

Acceptance

  • Total No. of products scraped
  • Elapsed time
  • Products scraped per second
  • Approx. time to scrape 1 million products
  • Current concurrency
  • Average API response time
  • Completed categories with product count, elapsed time, and no. of pages
  • Ongoing categories with product count
  • Should have an option to get all stats as numbers; otherwise use a human-readable format like (3.1K, 1H 20M)

Implement CLI support

Acceptance

  • Provide CLI support
  • Download results as a CSV file with the major fields
  • Limit the number of products to download (Max.: 10K, Default: 1K)
  • Filter downloads by criteria such as keyword, category, and price range
  • Show list of root categories
