Gramstat : Grammar Statistics

by Tim Henderson ([email protected])

What?

Computes statistics about the structure of a grammar based on a set of operational inputs.

Usage

usage: stat.py [Options] [FILE]+
Explanation

    Generates statistics on parse trees. "[FILE]+" is a list of files containing
    serialized parse trees. The format for the parse tree is a pre-order
    enumeration.

    grammar

        nodes := nodes node
        nodes := node
        node := NUM COLON STRING NEWLINE

        COLON = r':'
        NUM = r'[0-9]+'
        STRING = r'.+$'
        NEWLINE = r'\n'

        NB: Whitespace is significant, but STRING matches whitespace (except for
            newline).

    e.g.

        2:root
        2:left side
        0:x
        1:y
        0:z
        3:right side
        0:a
        0:b
        0:c

    corresponds to
                      root
                     /    \
                    /      \
            left side      right side
             /    \         /   |   \
            x      y       a    b    c
                   |
                   z
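Because each line records its node's child count, a single pre-order pass is enough to rebuild the tree. The following is a minimal sketch of such a reader, not `stat.py`'s own parser; `Node` and `parse_preorder` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)


def parse_preorder(text: str) -> Node:
    """Rebuild one tree from its pre-order "NUM:STRING" serialization.

    NUM is the node's number of children and STRING is its label.
    """
    lines = text.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # a trailing newline does not start another node
    pos = 0

    def build() -> Node:
        nonlocal pos
        if pos >= len(lines):
            raise ValueError("input ended before all children were read")
        # Split on the first colon only; the label keeps any later colons
        # and all of its whitespace, since whitespace is significant.
        num, label = lines[pos].split(":", 1)
        if not num.isdigit():
            raise ValueError(f"bad child count {num!r} on line {pos + 1}")
        pos += 1
        node = Node(label)
        for _ in range(int(num)):
            node.children.append(build())
        return node

    root = build()
    if pos != len(lines):
        raise ValueError("extra lines after the end of the tree")
    return root
```

Feeding it the example above yields a `root` node whose children are `left side` and `right side`, with `z` as the only child of `y`. Because it recurses once per node, very deep trees can hit Python's default recursion limit; an explicit stack would avoid that.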

Options

    -h, --help                          print this message
    -v, --version                       print the version
    -g, --grammar=<file>                supply a known grammar to annotate
    -o, --outdir=<directory>            supply a path to a non-existent
                                        directory
                                        [default: ./gramstats]
    -i, --imgs=<bool>                   generate images
                                        [default: true]
    -t, --tables=<bool>                 generate statistic tables (as csv files)
                                        [default: true]
    -a, --artifacts                     list what artifacts `stat.py` can
                                        generate
    -A, --artifact=<artspec>            generate a specific artifact only.
                                        Multiple '-A' flags allowed.
                                        [overrides -o, -i, and -t]
    -T, --usetables=<directory>         look for pre-existing statistic tables
                                        in this directory. With this option
                                        no other files are required; however,
                                        if more examples are given, the tables
                                        are updated. The new tables will only
                                        overwrite the old tables if
                                        "-o <dirname>" == "-T <dirname>"
    -s, --stdin                         accept ASTs on standard input, with
                                        blank lines separating trees.
                                        Supplying files together with this
                                        flag is an error.
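With `-s`, several serialized trees arrive on standard input separated by blank lines. A minimal sketch of that splitting step follows; `split_trees` is a hypothetical helper, and it assumes a whitespace-only line also counts as a separator (a valid node line always starts with digits, so this cannot swallow a node).

```python
from typing import List


def split_trees(text: str) -> List[str]:
    """Split stdin text into one serialized tree per blank-line-separated block."""
    blocks: List[str] = []
    current: List[str] = []
    for line in text.splitlines():
        if line.strip():
            current.append(line)
        elif current:
            # A blank line closes the current tree; repeated blanks are skipped.
            blocks.append("\n".join(current) + "\n")
            current = []
    if current:
        blocks.append("\n".join(current) + "\n")
    return blocks
```

In a script this would typically be called as `split_trees(sys.stdin.read())`, with each returned block then parsed as a single tree.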

Specs

    <file>                              the path to a file
    <directory>                         the path to a directory
    <bool>                              either "true" or "false"
    <artspec>                           <artifact>:<file>
    <artifact>                          an artifact in the list generated by
                                        --artifacts
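The `<bool>` and `<artspec>` specs are simple enough to validate directly. The sketch below assumes artifact names never contain a colon, so it splits an `<artspec>` at the first colon and leaves any later colons (for example in a Windows path) in the file part. `parse_bool`, `parse_artspec`, and the artifact name `counts` used below are hypothetical.

```python
from typing import Tuple


def parse_bool(value: str) -> bool:
    """Accept exactly the two <bool> spellings: "true" or "false"."""
    if value == "true":
        return True
    if value == "false":
        return False
    raise ValueError(f"expected 'true' or 'false', got {value!r}")


def parse_artspec(spec: str) -> Tuple[str, str]:
    """Split an <artspec> of the form <artifact>:<file>."""
    artifact, sep, path = spec.partition(":")
    if not sep or not artifact or not path:
        raise ValueError(f"expected <artifact>:<file>, got {spec!r}")
    return artifact, path
```

For instance, `parse_artspec("counts:out/counts.csv")` returns `("counts", "out/counts.csv")`.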

gramstat's Issues

Merge ASTs into a combined AST for an estimated grammar.

Currently, when no grammar is specified (and since grammar specification is unimplemented, none ever is), there is no indication of what the grammar should look like. Merging ASTs may let us infer an estimate of the grammar from the trees themselves.
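One possible starting point, sketched under stated assumptions: treat each node label as a grammar symbol and each interior node as evidence for a production `label -> child labels`, then count those productions across all trees. This is a hypothetical approach rather than anything gramstat implements, and `Node`, `productions`, and `estimate_grammar` are illustrative names.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Iterable, List, Tuple


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)


# A production is (left-hand symbol, tuple of right-hand symbols).
Production = Tuple[str, Tuple[str, ...]]


def productions(node: Node) -> Iterable[Production]:
    """Yield one production per interior node; leaves are treated as terminals."""
    if node.children:
        yield node.label, tuple(c.label for c in node.children)
    for child in node.children:
        yield from productions(child)


def estimate_grammar(trees: Iterable[Node]) -> Counter:
    """Count how often each observed production occurs across all trees."""
    counts: Counter = Counter()
    for tree in trees:
        counts.update(productions(tree))
    return counts
```

The resulting counts are only an estimate: productions that never occur in the sample are missing, and two different grammar symbols that share a label would be merged.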

Allow Artifacts to Depend on other Artifacts

Create a dependency structure between artifacts. This could get pretty hairy (as it did in hackerframe), but it is essential for avoiding duplicate computations.

The dependency graph must be acyclic. Make sure to assert this property and raise an error if it fails.
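Python's standard `graphlib` module (3.9+) gives both properties at once: a topological order guarantees each artifact is computed after its dependencies, and `static_order()` raises `CycleError` when the graph is not acyclic. A minimal sketch, assuming dependencies are declared as a mapping from artifact name to the names it depends on; `build_order` and the artifact names in the example are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Set


def build_order(deps: Dict[str, Set[str]]) -> List[str]:
    """Return artifacts ordered so each comes after everything it depends on.

    deps maps an artifact name to the set of artifact names it depends on.
    Raises ValueError if the dependency graph contains a cycle.
    """
    try:
        return list(TopologicalSorter(deps).static_order())
    except CycleError as err:
        # err.args[1] holds the list of nodes forming the detected cycle.
        raise ValueError(f"artifact dependencies are cyclic: {err.args[1]}") from err
```

Computing artifacts in this order, and caching each result by name, means a shared dependency is computed only once.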

Symbol Count Histograms

Produce histograms for the symbol count tables and fit them to the normal distribution.
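The layout of the symbol count tables is not specified here, so the sketch below assumes a plain list of per-tree counts for one symbol. It builds the histogram with `collections.Counter`, fits a normal distribution with `statistics.NormalDist.from_samples` (sample mean and sample standard deviation), and derives the expected count per integer bin for comparison against the histogram. The function names are hypothetical.

```python
from collections import Counter
from statistics import NormalDist
from typing import Dict, List


def histogram(counts: List[int]) -> Dict[int, int]:
    """Map each observed symbol count to the number of trees with that count."""
    return dict(sorted(Counter(counts).items()))


def fit_normal(counts: List[int]) -> NormalDist:
    """Fit a normal distribution using the sample mean and sample standard deviation."""
    # NormalDist.from_samples needs at least two data points.
    return NormalDist.from_samples(counts)


def expected_counts(dist: NormalDist, bins: List[int], n: int) -> Dict[int, float]:
    """Expected number of trees per integer count under the fitted normal.

    Each integer k is treated as the interval [k - 0.5, k + 0.5).
    """
    return {k: n * (dist.cdf(k + 0.5) - dist.cdf(k - 0.5)) for k in bins}
```

If every tree has the same count, the fitted standard deviation is zero and `cdf` is undefined, so that case needs separate handling.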

Production Coverage

For each AST, create a "production coverage" estimate against the real grammar. Depends on issue #7.
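One plausible definition, offered as an assumption rather than the project's: coverage is the fraction of the grammar's productions that occur at least once in the AST. The sketch represents a production as a `(left-hand symbol, tuple of right-hand symbols)` pair; `production_coverage` and the example grammar are hypothetical.

```python
from typing import Set, Tuple

# A production is (left-hand symbol, tuple of right-hand symbols).
Production = Tuple[str, Tuple[str, ...]]


def production_coverage(used: Set[Production], grammar: Set[Production]) -> float:
    """Fraction of the grammar's productions that appear in one AST."""
    if not grammar:
        raise ValueError("grammar has no productions")
    return len(used & grammar) / len(grammar)
```

With a three-production grammar for `E` and an AST that only uses `E -> NUM`, the coverage is 1/3. Productions found in the AST but absent from the grammar (`used - grammar`) are worth reporting separately, since they indicate a mismatch between the trees and the supplied grammar.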

Create a new STDIN format

Create a new standard input format for ./stat.py. The current format is very limiting: it only allows syntax trees to be passed to the program. Now that gramstat can take other inputs, such as coverage information, that information needs to be expressible on stdin.

Proposal: use a JSON format.

e.g.

[
  a list of {
    'filename': str,
    'ast': str,
    'coverage': [
      a list of {
        'filename': str,
        'total lines': int,
        'executed lines': [list of int]
      }
    ]
  }
]
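The proposal is a schema sketch rather than literal JSON (JSON requires double-quoted keys). A loader for it could look like the minimal sketch below, which uses Python's standard `json` module and checks each field's type. `load_stdin_records` and `_expect` are hypothetical names, and the sketch assumes the coverage key is spelled `"executed lines"`.

```python
import json
from typing import Any, Dict, List


def _expect(obj: Any, fields: Dict[str, type]) -> None:
    """Raise ValueError unless obj is a JSON object with the given typed fields."""
    if not isinstance(obj, dict):
        raise ValueError(f"expected an object, got {type(obj).__name__}")
    for key, typ in fields.items():
        if not isinstance(obj.get(key), typ):
            raise ValueError(f"field {key!r} must be of type {typ.__name__}")


def load_stdin_records(text: str) -> List[Dict[str, Any]]:
    """Parse and validate the proposed stdin format."""
    records = json.loads(text)  # json.JSONDecodeError is a ValueError subclass
    if not isinstance(records, list):
        raise ValueError("the top-level value must be a list")
    for rec in records:
        _expect(rec, {"filename": str, "ast": str, "coverage": list})
        for cov in rec["coverage"]:
            _expect(cov, {"filename": str, "total lines": int, "executed lines": list})
            if not all(isinstance(n, int) for n in cov["executed lines"]):
                raise ValueError("'executed lines' must contain only integers")
    return records
```

The `ast` field would carry the same pre-order serialization as the file format, with newlines escaped as `\n` inside the JSON string.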
