This Python project scrapes product information from Flipkart for a search query and a number of result pages that you choose. It is built on the Scrapy framework, which handles requests and structures the scraped output.
- Scraping Flipkart: The script scrapes product details (e.g., name, price, rating) from Flipkart based on your search query.
- Customizable Page Limit: You can specify the number of pages to scrape for more extensive results.
- Structured Data: The scraped data is organized into structured output formats (e.g., JSON, CSV).
Before running the script, make sure you have the following installed:
- Python 3.x
- Scrapy framework (install via pip, e.g. pip install scrapy)
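If you want to confirm the prerequisites before going further, a small check along these lines may help. It is only a sketch: the function name check_prerequisites is hypothetical and not part of this project.

```python
import importlib.util
import sys


def check_prerequisites():
    """Return a list of human-readable problems; an empty list means ready to run."""
    problems = []
    if sys.version_info < (3,):
        problems.append("Python 3.x is required")
    # find_spec returns None when the package cannot be found on the import path
    if importlib.util.find_spec("scrapy") is None:
        problems.append("Scrapy is not installed (try: pip install scrapy)")
    return problems


if __name__ == "__main__":
    issues = check_prerequisites()
    print("\n".join(issues) if issues else "All prerequisites found.")
```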
- Clone this repository to your local machine:
git clone https://github.com/yourusername/flipkart-scrapy.git
- Navigate to the project directory:
cd flipkart
- Modify the Scrapy spider, located in flipkart/spiders/flipkart_spider.py, to include your search query and any additional details you want to scrape.
- In the project's root directory, run the Scrapy spider with the following command:
scrapy crawl flipkart -a search_query="your_query" -a pages_to_scrape=5 -o output.json
- Replace your_query with the search query you want to use.
- Adjust pages_to_scrape to specify the number of pages to scrape.
- The scraped data will be saved in the specified output file (output.json in this example).
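To illustrate how the two -a arguments might turn into page requests, here is a minimal sketch that builds one search URL per page. The helper name build_search_urls and the URL pattern (a q and a page query parameter on https://www.flipkart.com/search) are assumptions for illustration; the actual spider may construct its requests differently. Scrapy passes -a values to the spider as strings, so the page count is converted to an integer first.

```python
from urllib.parse import urlencode

# Assumed base URL for Flipkart search results; verify it against the spider.
SEARCH_URL = "https://www.flipkart.com/search"


def build_search_urls(search_query, pages_to_scrape=1):
    """Return one search URL per result page, starting at page 1."""
    # Values given with -a arrive as strings, so convert before counting.
    page_count = int(pages_to_scrape)
    if page_count < 1:
        raise ValueError("pages_to_scrape must be at least 1")
    return [
        f"{SEARCH_URL}?{urlencode({'q': search_query, 'page': page})}"
        for page in range(1, page_count + 1)
    ]


if __name__ == "__main__":
    for url in build_search_urls("your_query", "5"):
        print(url)
```

In a spider, a list like this could be yielded from start_requests, one request per URL.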
To extract other fields or add more scraping functionality, edit the spider script.
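One common customization is normalizing the extracted text before it is written to the output file. The sketch below assumes prices arrive as whole-rupee strings such as "₹54,990" and ratings as strings such as "4.3"; the function names are hypothetical and the real page text may differ.

```python
import re


def clean_price(raw):
    """Convert text such as '₹54,990' to an int, or None if no digits are found.

    Assumes whole-rupee prices; a decimal part would be merged into the number.
    """
    digits = re.sub(r"[^\d]", "", raw or "")
    return int(digits) if digits else None


def clean_rating(raw):
    """Convert text such as '4.3' to a float, or None if no number is found."""
    match = re.search(r"\d+(?:\.\d+)?", raw or "")
    return float(match.group()) if match else None


def clean_item(name, price, rating):
    """Build one output record from raw strings extracted by the spider."""
    return {
        "name": (name or "").strip(),
        "price": clean_price(price),
        "rating": clean_rating(rating),
    }
```

Returning None for missing values keeps incomplete listings in the output rather than failing the whole crawl.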
If you'd like to contribute to this project, please follow these steps:
- Fork the repository.
- Create a new branch (git checkout -b feature/fooBar).
- Make your changes.
- Commit your changes (git commit -am 'Add some fooBar').
- Push to the branch (git push origin feature/fooBar).
- Create a new Pull Request.
This project is licensed under the MIT License - see the LICENSE.md file for details.