This package contains a variety of scripts to make working with the Project Gutenberg body of public domain texts easier.
The functionality provided by this package includes:
- Downloading texts from Project Gutenberg.
- Cleaning the texts: stripping the Project Gutenberg headers, footers, and license boilerplate, leaving just the body of the work behind.
- Making meta-data about the texts easily accessible.
The package has been tested with Python 2.6, 2.7, and 3.4.
This project is on PyPI, so I'd recommend that you just install everything from there using your favourite Python package manager.
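Assuming a standard pip setup, that amounts to a one-liner (the package is published on PyPI under the name gutenberg):

```shell
pip install gutenberg
```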
If you want to install from source or modify the package, you'll need to clone this repository:
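A typical checkout would look like the following (the URL assumes the project's GitHub repository; adjust it if you are working from a fork):

```shell
git clone https://github.com/c-w/Gutenberg.git
cd Gutenberg
```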
This package depends on Berkeley DB, so you'll need to install that first:
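Package names vary by platform; the following are common choices, not the only ones:

```shell
# macOS (Homebrew)
brew install berkeley-db4

# Debian/Ubuntu
sudo apt-get install libdb++-dev
```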
Now, you should probably install the dependencies for the package and verify your checkout by running the tests.
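From inside your checkout, a typical sequence would be the following; the requirements file name and the test runner are assumptions about the repository layout, so check the repo before running them:

```shell
pip install -r requirements.txt   # assumed filename; see the repository
python -m pytest                  # or the project's preferred test runner
```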
Title and author meta-data can be queried:
Note: The first time that one of the functions from gutenberg.query is called, the library will create a rather large database of meta-data about the Project Gutenberg texts. This one-off process takes quite a while to complete (18 hours on my machine), but once it is done, any subsequent calls to get_etexts or get_metadata will be very fast.
This project deliberately does not include any natural language processing functionality. Consuming and processing the text is the responsibility of the client; this library merely focuses on offering a simple, easy-to-use interface to the works in the Project Gutenberg corpus. Any linguistic processing can easily be done client-side, for example using the TextBlob library.