nathaniafernandes / amazon-web-scraping

Developed a web scraping tool using Python's BeautifulSoup library to extract essential product data from Amazon.

Python 100.00%

amazon-web-scraping's Introduction

Web Scraping Project with Python

This project provides Python scripts for web scraping product information from Amazon. The project consists of two main scripts:

  1. scrape.py: This script scrapes product details, such as product titles and prices, from Amazon search results and saves the data to an Excel file.

  2. brand.py: This script extends the functionality of the previous script by scraping additional information, specifically the brand of each product, and adds it to a new Excel file.
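As a rough illustration of the parsing step that scrape.py is described as performing, the sketch below extracts titles and prices from a small inline HTML sample with Beautiful Soup. The sample markup, the class names (`result`, `title`, `price`), and the function name `parse_products` are assumptions made for this example; they are not taken from the repository, and Amazon's real markup differs and changes often.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for a search results page; real Amazon markup differs.
SAMPLE_HTML = """
<div class="result"><span class="title">USB-C Cable</span><span class="price">$9.99</span></div>
<div class="result"><span class="title">Wireless Mouse</span><span class="price">$19.50</span></div>
<div class="result"><span class="title">No Price Item</span></div>
"""


def parse_products(html):
    """Return one {"title", "price"} dict per result block in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    products = []
    for block in soup.find_all("div", class_="result"):
        title = block.find("span", class_="title")
        price = block.find("span", class_="price")
        products.append({
            # A missing element becomes None instead of raising an error.
            "title": title.get_text(strip=True) if title else None,
            "price": price.get_text(strip=True) if price else None,
        })
    return products


products = parse_products(SAMPLE_HTML)
```

A list of dicts like this can be passed to `pandas.DataFrame(products)` and written out with `to_excel("products.xlsx", index=False)`, provided an Excel engine such as openpyxl is installed.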

Features

  • User-Agent Rotation: The scripts rotate through a list of user-agent strings to mimic different browsers, so that requests look less like they come from a single automated client.

  • Random Delay: A random delay is introduced between requests to avoid overloading the server with too many requests in a short time.
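The two features above could be sketched as follows. This is a minimal illustration, not the repository's code: it uses the standard library's `urllib.request` instead of Requests so that it runs without extra packages, it picks a user agent at random instead of cycling through the list in order, and the user-agent strings, the delay bounds, and the function names (`build_request`, `polite_pause`, `fetch_all`) are all assumptions.

```python
import random
import time
import urllib.request

# Illustrative user-agent strings; the repository's actual list is not shown.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.0 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0",
]


def build_request(url, rng=random):
    """Build a request whose User-Agent is picked at random from USER_AGENTS."""
    headers = {"User-Agent": rng.choice(USER_AGENTS)}
    return urllib.request.Request(url, headers=headers)


def polite_pause(min_s=1.0, max_s=3.0, rng=random, sleep=time.sleep):
    """Sleep for a random duration between min_s and max_s seconds; return it."""
    delay = rng.uniform(min_s, max_s)
    sleep(delay)
    return delay


def fetch_all(urls, opener=urllib.request.urlopen):
    """Fetch each URL in turn, pausing a random interval between requests."""
    pages = []
    for i, url in enumerate(urls):
        if i:  # no pause is needed before the first request
            polite_pause()
        with opener(build_request(url)) as resp:
            pages.append(resp.read())
    return pages
```

The `rng`, `sleep`, and `opener` parameters exist only so the behavior can be exercised without real waiting or network access; a script would normally call `fetch_all(urls)` with the defaults.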

Prerequisites

Before running the scripts, ensure you have the following prerequisites:

  • Python 3.x (Python 3.3 or later is recommended)
  • venv, which ships with Python 3.3 and later (optional but recommended for isolating dependencies)

Create a virtual environment (Optional):

  • python -m venv venv

  • Activate the virtual environment:

    On Windows: venv\Scripts\activate

    On macOS and Linux: source venv/bin/activate

Install project dependencies:

  • pip install requests beautifulsoup4 openpyxl pandas

  • Or install the dependencies from the requirements.txt file: pip install -r requirements.txt

Run the scripts as needed:

  1. To run scrape.py, use:
  • python scrape.py
  • Enter the URL of the Amazon search results page when prompted in the console.

This script scrapes product titles and prices from Amazon search results and saves the data to an Excel file named products.xlsx.

  2. To run brand.py, use:
  • python brand.py

This script extends the functionality by scraping brand information and adding it to an Excel file named final.xlsx.
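The step of adding brand information to the existing product data might look roughly like the sketch below. It assumes the brands have already been scraped into a list in the same order as the product rows; the function name `add_brand_column`, the column names, and the sample values are hypothetical and not taken from brand.py.

```python
import pandas as pd


def add_brand_column(products, brands):
    """Return a copy of `products` with a "brand" column aligned by row order."""
    if len(products) != len(brands):
        raise ValueError("expected exactly one brand per product row")
    result = products.copy()  # leave the caller's DataFrame untouched
    result["brand"] = list(brands)
    return result


# Hypothetical rows standing in for the contents of products.xlsx.
products = pd.DataFrame({
    "title": ["USB-C Cable", "Wireless Mouse"],
    "price": ["$9.99", "$19.50"],
})

# A brand that could not be found is recorded as None.
final = add_brand_column(products, ["ExampleBrand", None])
```

In a script, `products` would typically come from `pd.read_excel("products.xlsx")` and the result would be written with `final.to_excel("final.xlsx", index=False)`; both calls need an Excel engine such as openpyxl.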

Deactivate the virtual environment when you're done:

  • deactivate

Project Structure

  1. scrape.py: The main Python script for scraping product titles and prices.
  2. brand.py: The script for scraping brand information and writing it to a new Excel file.
  3. README.md: This documentation file.

Acknowledgments

  1. Requests - For making HTTP requests.
  2. Beautiful Soup - For parsing HTML content.
  3. openpyxl - For working with Excel files.
  4. pandas - For data manipulation and handling DataFrames.
