GithubHelp home page GithubHelp logo

akiarie / thompson-regex Goto Github PK

View Code? Open in Web Editor NEW
7.0 2.0 0.0 66 KB

A simple regex-to-golang compiler based on Thompson's construction algorithm.

License: MIT License

Go 100.00%
compilers golang thompson-construction

thompson-regex's Introduction

A Thompson-style Regex-to-Golang Compiler.

This repo contains a command-line app that takes in a simple regular expression (letters and digits with no spaces and the operators "|", "*" and "+" together with parenthesization) and outputs the source to a Go program which parses inputs for matches to the same expression.

It is intended for educational purposes and not for use in any production system.

Usage.

We will use the example from Thompson's paper for demonstrative purposes. Executing

./thompson-regex 'a(b|c)*d'

should produce output like the below (after running go fmt):

package main

import (
	"fmt"
	"log"
	"os"
)

/* trimmed some stuff up here */

func main() {
	if len(os.Args) != 2 {
		log.Fatalln("must supply input string")
	}
	input := []rune(os.Args[1])

	expmatcher := concat{
		concat{
			char('a'),
			closure{
				or{
					char('b'),
					char('c'),
				},
				0,
			},
		},
		char('d'),
	}

	matches := []string{}
	for i := 0; i < len(input); {
		if ok, n := expmatcher.match(input[i:]); ok {
			matches = append(matches, string(input[i:i+n]))
			i += n
			continue
		}
		i++
	}

	fmt.Printf("%q\n", matches)
}

Additional output languages can be added here. For example,

./thompson-regex 'a(b|c)*d' -l python3
import sys

# trimmed a ton of stuff from here

if len(sys.argv) != 2:
    print("must supply input string")
    sys.exit()
inputstr = sys.argv[1]

exprmatcher = ((Char('a') + (Char('b') | Char('c')) ** 0) + Char('d'))

matches = []

i = 0
while i < len(inputstr):
    ismatch, n = exprmatcher.match(inputstr[i:])
    if ismatch:
        matches += [inputstr[i:i+n]]
        i += n
    else:
        i += 1

print(matches)

Purpose.

Ken Thompson's famous paper on implementing regular expressions is surprisingly simple and yet extremely sophisticated at the same time. The "heart" of the method he proposed was the use of a stack in order to compile the expression to code that parses it, which in this case was IBM 7094 object code.

I find this idea beautiful, and wanted a chance to play around with it for myself, as well as demonstrate to anyone interested the way the algorithm works. Since the aim is demonstration rather than a concrete (or usable) implementation, it makes sense to compile to a high-level language, which is why I have started by targeting Go.

Note on terminology.

I have used the term assembler to refer to the portion of the code that produces Go source (and hopefully other sources in future), which may be slightly confusing given its usual association with the program that produces machine code out of assembly langauge. The aim is to draw an analogy: Thompson's goal was efficiency, so his implementation produced object code; mine is illustration and education, so it makes sense to target a high level language with the human mind being the real intended machine.

Contributing.

Feel free to create issues to share feedback or make suggestions. The way the codebase is designed it is quite easy to add other output formats, provided they are text-based: see here.

thompson-regex's People

Contributors

akiarie avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.