BBC News Headline Scraper

This Python script scrapes headlines from specific categories on BBC News and saves them to CSV files.

Installation

To run this script, you'll need the following Python libraries:

requests
beautifulsoup4
csv
datetime

You can install them using the following command in your terminal:

pip install requests beautifulsoup4 csv datetime

Usage

Clone or download this repository.
Open a terminal or command prompt and navigate to the directory containing the script (scrap.py).
Run the script using the following command:

python3 scrap.py

This will scrape headlines from the predefined categories (technology, business, sport, culture, travel, and innovation) and save them to separate CSV files named according to the category and date (e.g., technology_2024-06-06.csv).

Script Details

The script defines two main functions:

scrape_bbc_headlines(category): This function takes a category as input and retrieves headlines from the corresponding BBC News page. It employs:
- User-Agent Rotation: Simulates browser behavior by employing a list of user agents to evade detection.
- Error Handling: Handles potential network issues with retries for enhanced reliability.
- BeautifulSoup Parsing: Efficiently parses the HTML content using BeautifulSoup to extract potential headline elements.
- Headline Uniqueness: Maintains a set to store unique headlines, avoiding duplicates.
save_to_csv(headlines, category): This function takes a set of headlines and a category as input and saves them to a CSV file. Key aspects include:
- Dynamic File Naming: Generates a descriptive filename incorporating the category and current date.
- CSV Writing: Utilizes the csv module to write headlines efficiently to the CSV file.

Additional Notes

You can modify the categories list in the script to scrape headlines from different BBC News sections.

The script uses basic HTML parsing techniques to extract headlines.

BBC's website structure might change, requiring adjustments to the code.

Consider ethical scraping practices by respecting BBC's robots.txt and terms of service.

I hope this improved response provides a comprehensive and informative README for your BBC News headline scraper!

Demo and BeautifulSoup Tutorial

Textual Description: Provide a step-by-step explanation in the README outlining how to run the script and interpret the output CSV files. How the script work: video link

For an in-depth understanding of BeautifulSoup usage for web scraping, consider these valuable resources:

BeautifulSoup4 Official Documentation: https://www.crummy.com/software/BeautifulSoup/??

Tutorials and Examples: Numerous online tutorials and examples demonstrate how to leverage BeautifulSoup for various web scraping tasks. Search engines like Google can assist you in finding these.

babita-pal / bbc-scraping Goto Github PK

bbc-scraping's Introduction

BBC News Headline Scraper

Installation

Usage

Script Details

Additional Notes

Demo and BeautifulSoup Tutorial

bbc-scraping's People

Contributors

Watchers

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent

Jobs