
prachi911 / amazon-scraper


This project forked from moffi-bit/amazon-scraper


Simple Amazon scraper with some cool functionality.

License: GNU General Public License v3.0

Python 100.00%

amazon-scraper's Introduction

Amazon-Scraper: Find your perfect item!


Functionality: scraping multiple Amazon web pages for your item, setting a price range for your item, CSV/JSON output, and more!

Technology Needed:

Python Dependencies Needed:

  • bs4 (BeautifulSoup4)
  • requests
  • lxml
  • rich

How to use (NOTE: -i or --item and -n or --num are required fields):

Get the repository:

mkdir "Amazon Scraper"
cd "Amazon Scraper"
git clone https://github.com/Moffi-bit/Amazon-Scraper.git

Install the dependencies:

py -m pip install bs4 requests lxml rich

If you encounter issues running commands with "py", try "python" or "python3" instead. Your system's PATH environment variable may also be the cause.

Moving into the cloned repository:

cd Amazon-Scraper

Usage:

usage: scrape.py [-h] [-i ITEM [ITEM ...]] [-l LOWER] [-u UPPER] [-n NUM] [-o OUT] [-c]
Note: Adding -c to the arguments will cause the program to print the cheapest item at the end of scraping
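A parser that produces a usage line of this shape could be sketched with argparse as below. This is only an illustration and not necessarily how scrape.py is written: the value types, the "out" default, and the `cheapest` attribute name are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented flags (hypothetical sketch)."""
    parser = argparse.ArgumentParser(prog="scrape.py")
    # nargs="+" lets multi-word items such as "yoga mats" be passed unquoted.
    parser.add_argument("-i", "--item", nargs="+", help="item to search for")
    # Price bounds are assumed to be numeric (USD).
    parser.add_argument("-l", "--lower", type=float, help="minimum price")
    parser.add_argument("-u", "--upper", type=float, help="maximum price")
    parser.add_argument("-n", "--num", type=int, help="number of item links to scrape")
    # Base name for the output files; "out" matches the documented default.
    parser.add_argument("-o", "--out", default="out", help="output file name")
    # -c is a plain switch; "cheapest" is an assumed attribute name.
    parser.add_argument("-c", dest="cheapest", action="store_true",
                        help="print the cheapest item after scraping")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

Passing an explicit list, for example `build_parser().parse_args(["-i", "xbox", "-n", "100"])`, is a convenient way to try the parser without a shell.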

Individual Commands:

-i or --item:

py scrape.py -i xbox 

OR

py scrape.py --item xbox 

Tells the program that the item you're looking for is an Xbox.

-l or --lower:

py scrape.py -l 50

OR

py scrape.py --lower 50

Tells the program that the price minimum (lower bound) is 50.

-u or --upper:

py scrape.py -u 500

OR

py scrape.py --upper 500

Tells the program that the price maximum (upper bound) is 500.
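The two bounds combine into a simple range check. The sketch below assumes both bounds are inclusive and that an omitted bound means "no limit"; the function name `in_price_range` is hypothetical and not taken from the repository.

```python
from typing import Optional


def in_price_range(price: float,
                   lower: Optional[float] = None,
                   upper: Optional[float] = None) -> bool:
    """Return True if price lies within the optional bounds (inclusive, by assumption)."""
    if lower is not None and price < lower:
        return False
    if upper is not None and price > upper:
        return False
    return True
```

With `-l 50 -u 500`, a price of 50 or 500 would be kept under this assumption, while 49.99 or 500.01 would be dropped.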

-n or --num:

py scrape.py -n 100

OR

py scrape.py --num 100

Tells the program that the number of item links you want to pull data from is 100.

-c:

py scrape.py -c

Tells the program that you want it to output the cheapest item after it has scraped all links.
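Picking the cheapest item amounts to taking a minimum over the scraped prices. The sketch below assumes each product is a dict with a numeric "price" key (or None when no price was found); that data shape and the function name are assumptions for illustration.

```python
from typing import Optional


def cheapest(products: list) -> Optional[dict]:
    """Return the product with the lowest price, ignoring entries without one."""
    priced = [p for p in products if p.get("price") is not None]
    if not priced:
        return None
    return min(priced, key=lambda p: p["price"])
```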

-o or --out:

py scrape.py -o test

OR

py scrape.py --out test

Tells the program to write the product information to a CSV/JSON file named test. If this argument is not provided, the information is written to the default files: "out.csv"/"out.json".

Examples of Possible Run Commands:

Get all items within a price range (USD):

py scrape.py -i xbox s -l 200 -u 400 -n 100

Get all items above a price (USD):

py scrape.py -i yoga mats -l 10 -n 150

Get all items below a price (USD):

py scrape.py -i playstation -u 400 -n 100

Get all items no matter the price:

py scrape.py -i car tires -n 100

Get the cheapest item of the items scraped and write the information to a csv/json named "gfxcards":

py scrape.py -i rtx 3090 -n 50 -c -o gfxcards
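Several of the commands above pass a multi-word item such as "yoga mats" or "rtx 3090". One way such words could be turned into a search URL is sketched below. The `https://www.amazon.com/s?k=...` pattern and the function name `search_url` are assumptions for illustration and may differ from what scrape.py actually requests.

```python
from urllib.parse import urlencode


def search_url(item_words: list) -> str:
    """Join the item words and URL-encode them as a search query (assumed URL pattern)."""
    query = " ".join(item_words)
    # urlencode replaces spaces with "+" and escapes reserved characters.
    return "https://www.amazon.com/s?" + urlencode({"k": query})
```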

CSV:

Format:

title,price,rating,reviews,availability,url

The CSV contains ALL of the relevant items scraped.
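Rows in this format can be produced with the standard csv module. The sketch below is a minimal illustration that assumes each product is a dict keyed by the column names above; the sample values and the function name `to_csv` are made up.

```python
import csv
import io

# Column order taken from the documented CSV header.
FIELDS = ["title", "price", "rating", "reviews", "availability", "url"]


def to_csv(products: list) -> str:
    """Serialize product dicts to CSV text with the documented header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(products)
    return buffer.getvalue()


if __name__ == "__main__":
    sample = [{"title": "Example item", "price": 9.99, "rating": 4.5,
               "reviews": 10, "availability": "In Stock",
               "url": "https://example.com/item"}]
    print(to_csv(sample))
```

To write a file instead, the same writer can be pointed at `open("out.csv", "w", newline="")`.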

JSON:

Format:

(image: Amazon Scraper JSON output)

Future Improvements

  • Pulling product information
  • CSV Output and functionality for choosing which CSV the data goes to
  • Multiple page scraping
  • Return the cheapest item
  • Dynamic headers (Special thanks to @mumanye for adding this functionality)
  • Using proxies?
  • JSON Output and functionality for choosing which JSON the data goes to
  • Improve the consistency of finding product information
  • Adding class functionality so you do not have to use args (e.g., see demo.py for a code example)
  • Functionality so the user can choose to scrape additional n number of links (updated: this is now done through the CLI)
  • Individual folders for all of these csvs and jsons (to keep the directory clean)
  • Processing product information using multiple threads to increase speed

Please report any issues/bugs you come across when using the scraper! I'm always looking for feedback, suggestions for what to add, and ways to improve. You can let me know by creating a new issue for the repository.

amazon-scraper's People

Contributors

moffi-bit, mumanye
