GithubHelp home page GithubHelp logo

bbc-scraping's Introduction

BBC News Headline Scraper

This Python script scrapes headlines from specific categories on BBC News and saves them to CSV files.

Installation

To run this script, you'll need the following Python libraries:

  • requests
  • beautifulsoup4
  • csv
  • datetime

You can install them using the following command in your terminal:

pip install requests beautifulsoup4 csv datetime

Usage

  1. Clone or download this repository.
  2. Open a terminal or command prompt and navigate to the directory containing the script (scrap.py).
  3. Run the script using the following command:
python3 scrap.py

This will scrape headlines from the predefined categories (technology, business, sport, culture, travel, and innovation) and save them to separate CSV files named according to the category and date (e.g., technology_2024-06-06.csv).

Script Details

The script defines two main functions:

  • scrape_bbc_headlines(category): This function takes a category as input and retrieves headlines from the corresponding BBC News page. It employs:
    • User-Agent Rotation: Simulates browser behavior by employing a list of user agents to evade detection.
    • Error Handling: Handles potential network issues with retries for enhanced reliability.
    • BeautifulSoup Parsing: Efficiently parses the HTML content using BeautifulSoup to extract potential headline elements.
    • Headline Uniqueness: Maintains a set to store unique headlines, avoiding duplicates.
  • save_to_csv(headlines, category): This function takes a set of headlines and a category as input and saves them to a CSV file. Key aspects include:
    • Dynamic File Naming: Generates a descriptive filename incorporating the category and current date.
    • CSV Writing: Utilizes the csv module to write headlines efficiently to the CSV file.

.

Additional Notes

You can modify the categories list in the script to scrape headlines from different BBC News sections.

The script uses basic HTML parsing techniques to extract headlines.

BBC's website structure might change, requiring adjustments to the code.

Consider ethical scraping practices by respecting BBC's robots.txt and terms of service.

I hope this improved response provides a comprehensive and informative README for your BBC News headline scraper!

Demo and BeautifulSoup Tutorial

Textual Description: Provide a step-by-step explanation in the README outlining how to run the script and interpret the output CSV files. How the script work: video link

For an in-depth understanding of BeautifulSoup usage for web scraping, consider these valuable resources:

BeautifulSoup4 Official Documentation: https://www.crummy.com/software/BeautifulSoup/??

Tutorials and Examples: Numerous online tutorials and examples demonstrate how to leverage BeautifulSoup for various web scraping tasks. Search engines like Google can assist you in finding these.

bbc-scraping's People

Contributors

babita-pal avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.