GithubHelp home page GithubHelp logo

iatorm / grime Goto Github PK

View Code? Open in Web Editor NEW
13.0 1.0 0.0 74 KB

A language for matching two-dimensional patterns, based on Boolean grammars.

License: MIT License

Haskell 99.24% Shell 0.76%
haskell two-dimensional programming-language pattern-matching

grime's Introduction

Grime

Introduction

Grime is a two-dimensional pattern matching language based on Boolean grammars. The basic idea is to construct rectangular patterns from smaller components, and check whether they are found in the input matrix.

You can try Grime online, courtesy of Dennis from PPCG. Be sure to check the tutorial too.

Usage

A Grime program is called a grammar, and the correct file extension for a grammar is .gr, although this is not enforced. The program is called like

runhaskell grime.hs [options] grammarfile matrixfile

where matrixfile is a file containing the character matrix. For example, the digits program would be called like

runhaskell grime.hs digits.gr digit-matrix

If you want a speed boost, you should compile Grime with the command

ghc -O2 grime.hs

and then run the resulting executable:

grime [options] grammarfile matrixfile

There's also a bash script rungrime that compiles Grime if needed and then calls the executable with the provided arguments.

By default, the interpreter prints the first match it finds, but this can be controlled using the option flags:

  • -e: match only the entire matrix, print 1 for match and 0 for no match.
  • -n: print the number of matches, or the entire matrix if -e is also given.
  • -a: print all matches.
  • -p: print also the positions of the matches, in the format (x,y,w,h).
  • -s: do not print the matches themselves.
  • -b: include borders of width 1.
  • -d or -d0: print debug information.
  • -d1: print logs from the matcher.

These can be inserted from the command line, or in the beginning of any line of a grammar file, separated from the rest of the line by a backtick `.

Syntax and Semantics

A Grime grammar consists of one or more definitions, each on a separate line. Each of them defines the value of a nonterminal, and one of them must define the anonymous toplevel nonterminal. The syntax of a definition is either N=E or E, where N is an uppercase letter and E is an expression.

Expressions are constructed as follows.

  • Any character escaped with \ matches any 1x1 rectangle containing that character.
  • . matches any single character.
  • b matches a 1x1 rectangle outside the character matrix. It's only relevant if the -b option is used, or if the input is not rectangular.
  • e matches a 0-height or 0-width segment of the input's edge.
  • $ always matches.
  • f matches any 0-height (flat) rectangle, and t matches any 0-width (thin) rectangle.
  • The pre-defined character groups are digit, uppercase, lowercase, alphabetic, alphanumeric, and symbol.
  • New character classes can be defined by the syntax [a-prt-w,d-gu]. The letters on the left are included, and those on the right are excluded, so this matches exactly the letters abchijklmnoprtvw. If the left side is empty, it is taken to contain all characters and the border. The comma can be omitted if the right side is empty. The characters [],-\ must be escaped with \. \b matches a border.
  • An unescaped uppercase letter is a nonterminal, and matches the expression it is assigned.
  • _ is the anonymous toplevel nonterminal.
  • If P and Q are expressions, then PQ is just their horizontal concatenation, and P/Q is their vertical concatenation, with P on top.
  • P Q is like PQ, but with lower precedence than /. This sometimes saves parentheses.
  • P+ is one or more Ps aligned horizontally, and P/+ is the same aligned vertically.
  • The Boolean operations are denoted P|Q, P&Q and P!. Exclusive OR is P~Q, and difference is P-Q.
  • P? is shorthand for t|P, P/? for f|P, P* for t|P+, and P/* for f|P/+.
  • P{a-b,c-d}, where abcd are nonnegative integers, is a size constraint on P. If P is a character class, then the expression matches any mxn rectangle containing only those characters, provided that m is between a and b inclusive, and n is between c and d inclusive. In other cases, the expression matches any rectangle that has the correct size and that P also matches. If a or c are omitted, they are taken to be 0, and if b or d are omitted, they are infinite. If the hyphen between a and b is omitted, then we use the same number for both ends of the interval. If the entire c-d part is omitted, both axes are constrained. To clarify, {-b} is equivalent to {0-b,0-b}, and {a-,c} is equivalent to {a-infinity,c-c}. The } can be omitted.
  • P:a-b,c-d} is a grid specifier. It matches a rectangle that can be divided into an mxn grid of matches of P, where m is between a and b inclusive, and n is between c and d inclusive. The ranges and } can be omitted, with the same defaulting behavior as in the size constraint, except that the beginning of a range defaults to 1 instead of 0.
  • P#a-b} is a counting specifier. It matches a rectangle that contains between a and b possibly overlapping matches of P, inclusive. The } is optional, a defaults to 1 and b to infinity.
  • PoS, where S is a (greedily parsed) string of the characters 01234567, optionally terminated with }, is an orientation modifier. Each of the characters stands for a rotation (0123) or reflection (4567), which is applied to the expression P. The transformed expressions are combined with Boolean disjunctions. The characters OXNTKHADC encode convenient classes of orientations. The character F fixes an expression, so that its orientation cannot be modified.
  • <P> is a context bracket. It matches any rectangle r that's contained in a larger rectangle that matches P. The rectangle r can be matched by a digit in P; this is 0 by default, and increases with the nesting depth (0 matches the anchor of the innermost brackets, 1 the next one etc).

You can add the prefix ^ to any infix operator or chain of postfix operators to raise its precedence higher than all other operators, or v to lower it; for example, \a\b^|\cv+ is parsed as (\a(\b|\c))+. If i is any infix operator apart from concatenation or /, and p a chain of postfix operators, then PipQ is a shorthand for (PiQ)p; for example, \a|+\b is equivalent to (\a|\b)+. Double quotes "" are used for swapping the \-escaped status of all characters inside them, except for the characters "\/. They allow text grids like "this/is a/grid". Non-literal syntax elements, like nonterminals, operations and parentheses, can be used in quoted expressions by escaping them. Note that quotes define syntax elements, so "ab/cd"+/E will be parsed as (\a\b/\c\d)+/E. All open parentheses, brackets and quotes are implicitly closed at the end of a line. Lines beginning with | are comments, and the parser ignores them.

Notes

Grime does allow paradoxical definitions of the form A=A!; these will result in no match. In general, the matching engine can resolve any grammar where no match of a single nonterminal on a single rectangle depends on itself (although for complex grammars, this can take a while). A nontrivial example of a resolvable grammar is

A=E?\a
B=[ab]+&A!
E=A|B
a`A

Given a one-dimensional string of as and bs, this grammar will match every substring that ends in a. Note how the nonterminal A is self-referential in more than one way (via E directly, and via E through B), but since the E in A must be a proper substring, the system can always be resolved.

Grime supports non-rectangular inputs; the rows are simply aligned to the left, and the gaps can be matched using b.

grime's People

Contributors

iatorm avatar ilkka-torma avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar

grime's Issues

Can no longer compile

Here is what I'm seeing when trying to compile.

Newer source files detected, compiling.
[1 of 5] Compiling PrattParser      ( PrattParser.hs, PrattParser.o )
[2 of 5] Compiling Expression       ( Expression.hs, Expression.o )
[3 of 5] Compiling Matcher          ( Matcher.hs, Matcher.o )
[4 of 5] Compiling Parser           ( Parser.hs, Parser.o )

Parser.hs:251:56: error:
    Data constructor not in scope: HPlus :: Expr -> Expr

Parser.hs:252:65: error:
    Data constructor not in scope: VPlus :: Expr -> Expr
./rungrime: line 13: ./grime: No such file or directory

This is with the latest haskell https://haskell.org/platform/download/8.0.2/haskell-platform-8.0.2-unknown-posix--full-x86_64.tar.gz on Fedora 24

Context brackets ignore borders

The b option does not affect context brackets, which only search inside the original input grid. This should probably be changed.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.