This Python script scrapes headlines from specific categories on BBC News and saves them to CSV files.
To run this script, you'll need the following Python libraries:
- requests
- beautifulsoup4
- csv
- datetime
You can install them using the following command in your terminal:
pip install requests beautifulsoup4 csv datetime
- Clone or download this repository.
- Open a terminal or command prompt and navigate to the directory containing the script (scrap.py).
- Run the script using the following command:
python3 scrap.py
This will scrape headlines from the predefined categories (technology, business, sport, culture, travel, and innovation) and save them to separate CSV files named according to the category and date (e.g., technology_2024-06-06.csv).
The script defines two main functions:
- scrape_bbc_headlines(category): This function takes a category as input and retrieves headlines from the corresponding BBC News page. It employs:
- User-Agent Rotation: Simulates browser behavior by employing a list of user agents to evade detection.
- Error Handling: Handles potential network issues with retries for enhanced reliability.
- BeautifulSoup Parsing: Efficiently parses the HTML content using BeautifulSoup to extract potential headline elements.
- Headline Uniqueness: Maintains a set to store unique headlines, avoiding duplicates.
- save_to_csv(headlines, category): This function takes a set of headlines and a category as input and saves them to a CSV file. Key aspects include:
- Dynamic File Naming: Generates a descriptive filename incorporating the category and current date.
- CSV Writing: Utilizes the csv module to write headlines efficiently to the CSV file.
.
You can modify the categories list in the script to scrape headlines from different BBC News sections.
The script uses basic HTML parsing techniques to extract headlines.
BBC's website structure might change, requiring adjustments to the code.
Consider ethical scraping practices by respecting BBC's robots.txt and terms of service.
I hope this improved response provides a comprehensive and informative README for your BBC News headline scraper!
Textual Description: Provide a step-by-step explanation in the README outlining how to run the script and interpret the output CSV files. How the script work: video link
For an in-depth understanding of BeautifulSoup usage for web scraping, consider these valuable resources:
BeautifulSoup4 Official Documentation: https://www.crummy.com/software/BeautifulSoup/??
Tutorials and Examples: Numerous online tutorials and examples demonstrate how to leverage BeautifulSoup for various web scraping tasks. Search engines like Google can assist you in finding these.