This project provides Python scripts for web scraping product information from Amazon. The project consists of two main scripts:
- scrape.py: This script scrapes product details, such as product titles and prices, from Amazon search results and saves the data to an Excel file.
- brand.py: This script extends the functionality of the previous script by scraping additional information, specifically the brand of each product, and adds it to a new Excel file.
- User-Agent Rotation: The scripts rotate through a list of user-agent strings to mimic different browsers, so the requests look less like automated traffic.
- Random Delay: A random delay is introduced between requests to avoid overloading the server with too many requests in a short time.
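The two features above can be sketched roughly as follows. This is a minimal illustration, not the code from scrape.py or brand.py: the user-agent strings are placeholders, and the names `next_headers` and `polite_get`, the delay bounds, and the timeout are all assumptions made for the example.

```python
import itertools
import random
import time

# Placeholder user-agent strings; the list the scripts actually use is not
# shown in this README.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0",
]

# Endless iterator that walks the list in order and starts over at the end.
_ua_cycle = itertools.cycle(USER_AGENTS)


def next_headers():
    """Build request headers using the next user agent in the rotation."""
    return {"User-Agent": next(_ua_cycle)}


def polite_get(session, url, min_delay=1.0, max_delay=3.0):
    """Sleep for a random interval, then GET `url` with a rotated user agent.

    `session` is any object with a requests-style get() method,
    for example a requests.Session (assumed here, not imported).
    """
    time.sleep(random.uniform(min_delay, max_delay))
    return session.get(url, headers=next_headers(), timeout=10)
```

With Requests installed, a call might look like `polite_get(requests.Session(), url)`. Picking a user agent with `random.choice(USER_AGENTS)` instead of cycling through the list would work equally well for this purpose.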
Before running the scripts, ensure you have the following prerequisites:
- Python 3.x (Python 3.3 or later is recommended)
- virtualenv (optional but recommended for managing dependencies)
- Create a virtual environment (optional):
  python -m venv venv
- Activate the virtual environment:
  On Windows:
  venv\Scripts\activate
  On macOS and Linux:
  source venv/bin/activate
- Install the dependencies:
  pip install requests beautifulsoup4 openpyxl pandas
- Or, to install the dependencies from the requirements.txt file, use:
  pip install -r requirements.txt
- To run scrape.py, use:
  python scrape.py
- When prompted, enter the URL of the Amazon search results page in the console.
This script scrapes product titles and prices from Amazon search results and saves the data to an Excel file named products.xlsx.
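As a rough sketch of the parsing step, the example below extracts titles and prices from a small HTML snippet with Beautiful Soup. The markup, the CSS selectors, the column names, and the function name `parse_products` are all hypothetical; Amazon's real pages use different markup that changes often, so scrape.py will not look exactly like this.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a search results page.
SAMPLE_HTML = """
<div class="result">
  <h2 class="title">USB-C Cable 2m</h2>
  <span class="price">$9.99</span>
</div>
<div class="result">
  <h2 class="title">Wireless Mouse</h2>
</div>
"""


def parse_products(html):
    """Return a list of {"Title": ..., "Price": ...} dicts, one per result."""
    soup = BeautifulSoup(html, "html.parser")
    products = []
    for card in soup.select("div.result"):
        title = card.select_one("h2.title")
        price = card.select_one("span.price")
        if title is None:
            continue  # skip cards that carry no title
        products.append({
            "Title": title.get_text(strip=True),
            # Some results have no price; keep the row and leave it empty.
            "Price": price.get_text(strip=True) if price is not None else None,
        })
    return products
```

Rows in this shape could then be saved with `pandas.DataFrame(rows).to_excel("products.xlsx", index=False)`, which relies on an Excel engine such as openpyxl.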
- To run brand.py, use:
  python brand.py
This script extends scrape.py by scraping brand information and saving the result to an Excel file named final.xlsx.
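One way the brand step could be structured is sketched below, assuming the first script's output has a "Title" column. The function names, the column names, and the `lookup_brand` callable (which stands in for whatever scraping brand.py performs per product) are illustrative assumptions, not taken from the script.

```python
import pandas as pd


def add_brands(products, lookup_brand):
    """Return a copy of `products` with an added "Brand" column.

    `lookup_brand` is a hypothetical callable that takes a product title
    and returns the brand found for it.
    """
    result = products.copy()
    result["Brand"] = [lookup_brand(title) for title in result["Title"]]
    return result


def build_final_workbook(lookup_brand, source="products.xlsx", target="final.xlsx"):
    """Read the first script's output, add brands, and save a new file."""
    products = pd.read_excel(source)
    add_brands(products, lookup_brand).to_excel(target, index=False)
```

Keeping the lookup separate from the spreadsheet handling means the original products.xlsx is left untouched and the brand logic can be tried on an in-memory DataFrame first.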
- To deactivate the virtual environment when you are finished, use:
  deactivate
- scrape.py: The main Python script for scraping product titles and prices.
- brand.py: The script for scraping brand information and writing it to final.xlsx.
- README.md: This documentation file.
- Requests - For making HTTP requests.
- Beautiful Soup - For parsing HTML content.
- openpyxl - For working with Excel files.
- pandas - For data manipulation and handling DataFrames.