
document-generator-backend's Introduction

Document Generator

Document Generator generates randomized documents from paraphrased content retrieved from Wikipedia. The generated documents can be used in various contexts, most notably for uploading to CourseHero as study materials.

Installation

Use the package manager pip to install dependencies.

You will also need to download the NLTK data packages. The script nltkdownload.py automatically downloads all the NLTK packages you need.

The command below installs all dependencies listed in requirements.txt and then downloads the NLTK packages.

On macOS:

python3 -m pip install -r requirements.txt && python3 nltkdownload.py

If lxml fails to install on macOS, the Xcode Command Line Tools may be missing. They can be installed with the following command:

xcode-select --install

Usage

When run with no arguments, the program generates one batch of 10 documents from randomly selected Wikipedia pages.

usage: generate_documents.py [-h] [-t TITLE] [-n NUMBER] [-b BATCH]
                             [-c CLASS_NAMES] [-s SENTENCES]

optional arguments:
  -h, --help            show this help message and exit
  -t TITLE, --title TITLE
                        The wikipedia page title to use for generating
                        documents (default: None)
  -n NUMBER, --number NUMBER
                        The number of documents to generate in a batch
                        (default: 10)
  -b BATCH, --batch BATCH
                        The number of batches to generate (default: 1)
  -c CLASS_NAMES, --class_names CLASS_NAMES
                        The class names to use for generating documents.
                        You should parenthesize the class name if it contains white spaces.
                        (default: ['CS 1', 'CS 2', 'CS 3', 'CS 4', 'CS 5', 'CS
                        6', 'CS 7', 'CS 8', 'CS 9', 'CS 10'])
  -s SENTENCES, --sentences SENTENCES
                        The number of sentences to use for each chunk
                        (default: 25)
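
The options above could be declared with argparse roughly as follows. This is a minimal sketch, not the project's actual code: the names build_parser, parse_args, chunk_sentences, and DEFAULT_CLASS_NAMES are hypothetical, and it assumes that -c may be repeated to supply several class names and that -s controls how a list of sentences is split into fixed-size chunks.

```python
import argparse

# Default class names, as listed in the help text above.
DEFAULT_CLASS_NAMES = [f"CS {i}" for i in range(1, 11)]


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented options (hypothetical helper)."""
    parser = argparse.ArgumentParser(prog="generate_documents.py")
    parser.add_argument("-t", "--title", default=None,
                        help="The wikipedia page title to use for generating documents")
    parser.add_argument("-n", "--number", type=int, default=10,
                        help="The number of documents to generate in a batch")
    parser.add_argument("-b", "--batch", type=int, default=1,
                        help="The number of batches to generate")
    # Assumption: each -c adds one class name. The default stays None here
    # because argparse's "append" action would otherwise extend the default list.
    parser.add_argument("-c", "--class_names", action="append", default=None,
                        help="The class names to use for generating documents")
    parser.add_argument("-s", "--sentences", type=int, default=25,
                        help="The number of sentences to use for each chunk")
    return parser


def parse_args(argv=None) -> argparse.Namespace:
    """Parse argv and fall back to the default class names when -c is absent."""
    args = build_parser().parse_args(argv)
    if args.class_names is None:
        args.class_names = list(DEFAULT_CLASS_NAMES)
    return args


def chunk_sentences(sentences, size):
    """Split a list of sentences into consecutive chunks of at most `size` items."""
    return [sentences[i:i + size] for i in range(0, len(sentences), size)]
```

On the command line, a class name that contains spaces has to be quoted so the shell passes it as a single argument, for example -c "CS 101".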

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Please make sure to update tests as appropriate.
