johnbumgarner / newshound

This Python package can be used to systematically extract multiple data elements (e.g., title, keywords, text) from news sources around the world in over 50 languages.

article-extracting article-extractor data-science datascience data-extraction text-mining news news-aggregator python3 python-newspaper

newshound's Introduction

Currently under development. BETA will be released soon.

NewsHound


Description

NewsHound is a Python 3 module designed for high-quality news and article extraction from sources in multiple languages.

For instance, NewsHound cleanly parses article content from the BBC in English, the Dainik Bhaskar in Hindi, the People's Daily in Chinese, the Malayala Manorama in Malayalam and the Khaosod in Thai.

The built-in extraction architecture is designed to systematically parse specific data elements from the underlying navigation structure of either an online web page or an offline file containing HTML content.

These data elements are:

  • Title/Headline
  • Description/Summary
  • Keywords
  • Name(s) of Author(s)
  • Main Text/Content
  • ISO Language
  • Language Name
  • Published Date
  • Modified Date
  • Canonical HREF
  • Top Image HREF
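
To make the list above more concrete, the following is a minimal sketch of how several of these data elements could be read from offline HTML content. It uses only the Python standard library and is not NewsHound's API, which has not been released yet. The names `MetadataParser`, `extract_elements` and `META_KEYS` are hypothetical, and the sketch assumes the page exposes its metadata through common `<meta>` and `<link>` tags.

```python
from html.parser import HTMLParser

# Assumed mapping from common <meta> names/properties to data elements.
META_KEYS = {
    "description": "description",
    "keywords": "keywords",
    "author": "author",
    "og:image": "top_image",
    "article:published_time": "published_date",
    "article:modified_time": "modified_date",
}


class MetadataParser(HTMLParser):
    """Collects a few article data elements from an HTML document."""

    def __init__(self):
        super().__init__()
        self.elements = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "html" and attrs.get("lang"):
            self.elements["language"] = attrs["lang"]
        elif tag == "title":
            self._in_title = True
        elif tag == "meta":
            key = attrs.get("name") or attrs.get("property")
            content = attrs.get("content")
            if key in META_KEYS and content:
                self.elements[META_KEYS[key]] = content
        elif tag == "link" and attrs.get("rel") == "canonical":
            self.elements["canonical"] = attrs.get("href")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Title text may arrive in several chunks, so accumulate it.
        if self._in_title:
            self.elements["title"] = self.elements.get("title", "") + data


def extract_elements(html):
    """Return a dict of the data elements found in the given HTML string."""
    parser = MetadataParser()
    parser.feed(html)
    parser.close()
    elements = parser.elements
    if "title" in elements:
        elements["title"] = elements["title"].strip()
    if "keywords" in elements:
        # Keywords are conventionally a comma-separated string.
        elements["keywords"] = [
            k.strip() for k in elements["keywords"].split(",") if k.strip()
        ]
    return elements
```

Calling `extract_elements` on the contents of a saved HTML file returns a dictionary holding only the elements that were present in the markup. Main text extraction is deliberately left out, since it needs source-specific logic.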

Installation

NewsHound requires Python >=3.6. This package can be installed using pip3.

pip3 install newshound

Usage and Documentation

For detailed information on NewsHound please refer to the documentation.

Predefined Extraction

The maintainers of NewsHound have developed and tested multiple predefined extraction modules for various news sources around the world. These specific extractors were developed to ensure consistent and accurate parsing from the news sources being queried. Additional sources will be added periodically to this predefined extraction list.
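
One way to picture source-specific extractors is a registry that maps a hostname to an extraction function, with a generic fallback for unknown sources. The sketch below is only an illustration of that idea and makes no claim about how NewsHound organizes its predefined modules. Every name in it (`EXTRACTORS`, `register`, `extract_bbc`, `generic_extractor`, `select_extractor`) is hypothetical, and the extractor bodies are placeholders.

```python
from urllib.parse import urlparse

# Hypothetical registry: hostname -> extraction function.
EXTRACTORS = {}


def register(hostname):
    """Decorator that records an extractor for the given hostname."""
    def decorator(func):
        EXTRACTORS[hostname] = func
        return func
    return decorator


@register("www.bbc.com")
def extract_bbc(html):
    # Placeholder: a real extractor would parse BBC-specific markup here.
    return {"source": "bbc", "length": len(html)}


def generic_extractor(html):
    # Placeholder fallback for sources without a predefined extractor.
    return {"source": "generic", "length": len(html)}


def select_extractor(url):
    """Pick the predefined extractor for a URL, or the generic fallback."""
    host = urlparse(url).hostname or ""
    return EXTRACTORS.get(host, generic_extractor)
```

With this layout, adding a source means writing one function and decorating it with `register`, and callers never need to know which extractor handled a given URL.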

Development

If you would like to contribute to the NewsHound project, please read the contributing guidelines.

Items currently under development:

  • TBD after BETA release

Issues

This repository is actively maintained. Feel free to open any issues related to bugs, coding errors, broken links or enhancements.

You can also contact me, John Bumgarner, with any issues or enhancement requests.

Sponsorship

If you would like to contribute financially to the development and maintenance of the NewsHound project, please read the sponsorship information.

newshound's Issues

how to use

Newspaper3k has been unable to read the news content, would like to try newshound
