pcfg

Probabilistic Context-Free Grammar Parser

This is an implementation of a probabilistic context-free grammar (PCFG) parser. pcfg_parser.py reads sentences from stdin, one per line, and prints a parse tree for each.

Usage

Here is an example of how to use this parser:

echo "the man saw the dog with the telescope" | python pcfg_parser.py counts_file.sample
["S", ["NP", ["DET", "the"], ["NOUN", "man"]], ["VP", ["VERB", "saw"], ["NP", ["NP", ["DET", "the"], ["NOUN", "dog"]], ["PP", ["ADP", "with"], ["NP", ["DET", "the"], ["NOUN", "telescope"]]]]]]

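The parser outputs the parse tree of each input sentence under the probabilistic model derived from the counts file given as its argument. Each tree is printed as a nested list in JSON format: the first element of every list is a nonterminal label, followed either by a single word (for a part-of-speech node such as ["NOUN", "man"]) or by the node's child subtrees.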
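Because each tree is plain JSON, it can be post-processed with Python's standard json module. The sketch below recovers the input words from a tree and renders it as a Penn Treebank-style bracketed string. The helper names leaves and to_brackets are hypothetical (they are not part of pcfg_parser.py), and the code assumes that any two-element list whose second item is a string is a part-of-speech node.

```python
import json

def is_preterminal(node):
    # A part-of-speech node looks like ["NOUN", "man"]: a label plus one word.
    return len(node) == 2 and isinstance(node[1], str)

def leaves(tree):
    """Return the terminal words of a tree, left to right."""
    if is_preterminal(tree):
        return [tree[1]]
    words = []
    for child in tree[1:]:
        words.extend(leaves(child))
    return words

def to_brackets(tree):
    """Render a JSON tree as a bracketed string, e.g. (NP (DET the) (NOUN dog))."""
    if is_preterminal(tree):
        return f"({tree[0]} {tree[1]})"
    children = " ".join(to_brackets(child) for child in tree[1:])
    return f"({tree[0]} {children})"

# The example output line from the Usage section.
line = ('["S", ["NP", ["DET", "the"], ["NOUN", "man"]], ["VP", ["VERB", "saw"], '
        '["NP", ["NP", ["DET", "the"], ["NOUN", "dog"]], ["PP", ["ADP", "with"], '
        '["NP", ["DET", "the"], ["NOUN", "telescope"]]]]]]')
tree = json.loads(line)
print(" ".join(leaves(tree)))
print(to_brackets(tree))
```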

Counts File

To train your own model, prepare a counts file containing the frequency of each nonterminal and each rule (including the rules that produce words) in your training corpus. The counts file has the following format:
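As one possible way to produce such a file, the following sketch counts events in training trees written in the same JSON format that the parser outputs. It assumes, hypothetically, that the training trees are already binarized, so that every node either dominates a single word or has exactly two children; the function names count_tree and counts_lines are illustrative only, and the order of lines in the output is assumed not to matter.

```python
import json
from collections import Counter

def count_tree(tree, counts):
    """Add the NONTERMINAL, UNARYRULE and BINARYRULE events of one tree to counts."""
    label = tree[0]
    counts[("NONTERMINAL", label)] += 1
    if len(tree) == 2 and isinstance(tree[1], str):
        # Part-of-speech node such as ["NOUN", "man"]: unary rule label -> word.
        counts[("UNARYRULE", label, tree[1])] += 1
    elif len(tree) == 3:
        # Binary node: label -> left-child-label right-child-label.
        left, right = tree[1], tree[2]
        counts[("BINARYRULE", label, left[0], right[0])] += 1
        count_tree(left, counts)
        count_tree(right, counts)
    else:
        raise ValueError(f"node {label!r} is not binarized")

def counts_lines(json_trees):
    """Count every event in an iterable of JSON trees; return counts-file lines."""
    counts = Counter()
    for line in json_trees:
        count_tree(json.loads(line), counts)
    return [f"{n} {' '.join(key)}" for key, n in sorted(counts.items())]

trees = ['["S", ["NP", ["DET", "the"], ["NOUN", "man"]], ["VP", ["VERB", "saw"], '
         '["NP", ["DET", "the"], ["NOUN", "dog"]]]]']
for line in counts_lines(trees):
    print(line)
```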

1 NONTERMINAL ADVP+DET
14 UNARYRULE NP+NOUN Stocks
37 BINARYRULE VP VERB NP+PRON

The first column is the count and the second column is the entry type. There are three entry types, each with its own format:
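The column layout can be checked mechanically. In this sketch, the number of symbol columns expected after the count and type is taken from the three entry descriptions in this section: one for NONTERMINAL, two for UNARYRULE and three for BINARYRULE. The names parse_counts_line, Entry and ARITY are hypothetical.

```python
from typing import NamedTuple, Tuple

# Symbol columns that follow the count and type, per entry type.
ARITY = {"NONTERMINAL": 1, "UNARYRULE": 2, "BINARYRULE": 3}

class Entry(NamedTuple):
    count: int
    kind: str
    symbols: Tuple[str, ...]

def parse_counts_line(line: str) -> Entry:
    """Split one counts-file line into (count, type, symbols) and validate it."""
    fields = line.split()
    if len(fields) < 2:
        raise ValueError(f"malformed line: {line!r}")
    count, kind, symbols = int(fields[0]), fields[1], tuple(fields[2:])
    if kind not in ARITY:
        raise ValueError(f"unknown entry type: {kind!r}")
    if len(symbols) != ARITY[kind]:
        raise ValueError(f"{kind} expects {ARITY[kind]} symbols, got {len(symbols)}")
    return Entry(count, kind, symbols)

# The three sample lines from the format description above.
for sample in ["1 NONTERMINAL ADVP+DET",
               "14 UNARYRULE NP+NOUN Stocks",
               "37 BINARYRULE VP VERB NP+PRON"]:
    print(parse_counts_line(sample))
```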

NONTERMINAL

This type records how often each nonterminal, such as S, VP or NP, occurs. The entry has one additional column: the nonterminal symbol.
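If the counts were collected from binarized trees, where every nonterminal node either produces one word or has exactly two children, each NONTERMINAL count should equal the total of the UNARYRULE and BINARYRULE counts that have that symbol on the left-hand side. The hypothetical check below reports symbols that violate this, which may help catch mistakes in a hand-built counts file. It is a sanity check under that assumption, not something pcfg_parser.py is known to perform.

```python
from collections import Counter

def check_nonterminal_counts(lines):
    """Return {symbol: (NONTERMINAL count, summed rule count)} for mismatches."""
    declared = Counter()
    expanded = Counter()
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        count, kind, lhs = int(fields[0]), fields[1], fields[2]
        if kind == "NONTERMINAL":
            declared[lhs] += count
        else:
            # UNARYRULE and BINARYRULE both list the left-hand side first.
            expanded[lhs] += count
    return {x: (declared[x], expanded[x])
            for x in set(declared) | set(expanded)
            if declared[x] != expanded[x]}

# Toy input; a real file could be passed as check_nonterminal_counts(open(path)).
toy = ["3 NONTERMINAL NP", "2 BINARYRULE NP DET NOUN", "1 UNARYRULE NP Stocks"]
print(check_nonterminal_counts(toy))
```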

UNARYRULE

This type is for unary rules, which assign a nonterminal to a terminal word, e.g. VERB -> said, VERB -> look, ADP -> after and NOUN -> point. The entry has two additional columns: the nonterminal and the terminal word.
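A common way to turn these counts into probabilities is the maximum-likelihood estimate, in which a rule's probability is its count divided by the count of its left-hand-side nonterminal. We assume, without confirmation from the source, that the parser does something similar. For a unary rule X -> w:

$$
q(X \rightarrow w) = \frac{\mathrm{Count}(X \rightarrow w)}{\mathrm{Count}(X)}
$$

With the sample line 14 UNARYRULE NP+NOUN Stocks, the numerator is 14 and the denominator is the NONTERMINAL count of NP+NOUN.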

BINARYRULE

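This type is for binary rules such as S -> NP VP. The entry has three additional columns: the left-hand-side nonterminal followed by the two right-hand-side nonterminals, in order. For example, 37 BINARYRULE VP VERB NP+PRON says that the rule VP -> VERB NP+PRON occurs 37 times.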
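When every rule is either a unary word rule or a binary rule, the standard algorithm for finding the highest-probability parse is CKY. The following is a minimal sketch of it, not the code in pcfg_parser.py. It estimates rule probabilities by maximum likelihood from counts-file lines, assumes the root symbol is S, and returns None when no parse exists (for example, when a word has no UNARYRULE entry; the real parser may handle unknown words differently). The names load_model, cky and build, and the toy counts, are hypothetical.

```python
import json
from collections import defaultdict

def load_model(lines):
    """Turn counts-file lines into maximum-likelihood rule probabilities."""
    nonterminal, unary, binary = defaultdict(int), defaultdict(int), defaultdict(int)
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        count, kind, symbols = int(fields[0]), fields[1], tuple(fields[2:])
        if kind == "NONTERMINAL":
            nonterminal[symbols[0]] += count
        elif kind == "UNARYRULE":
            unary[symbols] += count
        elif kind == "BINARYRULE":
            binary[symbols] += count
    # q(X -> ...) = Count(X -> ...) / Count(X); assumes every X has a NONTERMINAL entry.
    q_unary = {(x, w): c / nonterminal[x] for (x, w), c in unary.items()}
    q_binary = {(x, y, z): c / nonterminal[x] for (x, y, z), c in binary.items()}
    return q_unary, q_binary

def cky(words, q_unary, q_binary, root="S"):
    """Return the highest-probability tree as a nested list, or None."""
    n = len(words)
    # best[(i, j)][X] = (probability, backpointer) for the span words[i..j].
    best = defaultdict(dict)
    for i, w in enumerate(words):
        for (x, word), p in q_unary.items():
            if word == w:
                best[(i, i)][x] = (p, w)
    for length in range(1, n):
        for i in range(n - length):
            j = i + length
            cell = best[(i, j)]
            for (x, y, z), p in q_binary.items():
                for s in range(i, j):  # split point: [i..s] and [s+1..j]
                    left, right = best[(i, s)].get(y), best[(s + 1, j)].get(z)
                    if left and right:
                        score = p * left[0] * right[0]
                        if score > cell.get(x, (0.0,))[0]:
                            cell[x] = (score, (s, y, z))
    if root not in best[(0, n - 1)]:
        return None
    return build(best, 0, n - 1, root)

def build(best, i, j, x):
    """Follow backpointers to rebuild the JSON-style nested list."""
    _, back = best[(i, j)][x]
    if i == j:
        return [x, back]
    s, y, z = back
    return [x, build(best, i, s, y), build(best, s + 1, j, z)]

TOY_COUNTS = [
    "1 NONTERMINAL S", "1 BINARYRULE S NP VP",
    "2 NONTERMINAL NP", "2 BINARYRULE NP DET NOUN",
    "1 NONTERMINAL VP", "1 BINARYRULE VP VERB NP",
    "2 NONTERMINAL DET", "2 UNARYRULE DET the",
    "2 NONTERMINAL NOUN", "1 UNARYRULE NOUN man", "1 UNARYRULE NOUN dog",
    "1 NONTERMINAL VERB", "1 UNARYRULE VERB saw",
]
q_unary, q_binary = load_model(TOY_COUNTS)
print(json.dumps(cky("the man saw the dog".split(), q_unary, q_binary)))
```

Multiplying raw probabilities can underflow on long sentences; summing log-probabilities instead is the usual remedy.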

Contributors

usami
