zaidharoon001 / multi-lexgen

Multi LexGen is an easy-to-integrate lexer generator that works with multiple languages

License: MIT License

Haskell 33.06% Python 16.68% F# 13.77% Rust 25.65% TypeScript 9.67% JavaScript 1.18%

Multi-LexGen

Multi LexGen is an easy-to-integrate lexer generator that works with multiple languages. The officially supported languages currently include Python, Rust, and F# (and consequently .NET). It has an intuitive recursive interface, like an LL(*) recursive descent parser generator, but uses it to create a lexer instead. Every rule in a lexer is specified with a regular-expression-like syntax that can reference any rule, including itself. The following example shows how you would detect numbers in Multi-LexGen.

digit = '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' | '0'
num = digit+
decimal = num ('.' num)?
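
For readers more used to conventional regular expressions, the three rules above correspond roughly to the pattern sketched below. This is only an analogy written with Python's `re` module, not the code Multi-LexGen generates; the names `DIGIT`, `NUM`, `DECIMAL`, and `decimal_re` are chosen here for illustration.

```python
import re

# Rough regex equivalents of the rules above (illustrative only).
DIGIT = r"[0-9]"                    # digit = '1' | '2' | ... | '0'
NUM = rf"{DIGIT}+"                  # num = digit+
DECIMAL = rf"{NUM}(?:\.{NUM})?"     # decimal = num ('.' num)?

decimal_re = re.compile(DECIMAL)

# A trailing or leading period is rejected because '.' must be followed
# and preceded by at least one digit.
for text in ["7", "31.3", "1.", ".5"]:
    print(text, bool(decimal_re.fullmatch(text)))
```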

A critical question for this kind of tool is how it gives its tokens names. This is exactly where data extractors come in. To mark every decimal, you can simply write

NUM = decimal

There can be multiple extractors. They are applied to the input one after another in the given order, and the first one that matches is returned. Here's how multiple extractors are written for the lexer generator.

digit = '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' | '0'
num = digit+
decimal = num ('.' num)?

NUM = decimal
PLUS = '+'
MINUS = '-'
MUL = '*'
DIV = '/'

This lexer can tokenize any input of the following kind.

1+2-31.3/1*4

and the output will be

[NUM:1]
[PLUS:+]
[NUM:2]
[MINUS:-]
[NUM:31.3]
[DIV:/]
[NUM:1]
[MUL:*]
[NUM:4]
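
The first-match behaviour described above can be sketched in plain Python. This is a hypothetical stand-in for what a generated lexer does, assuming each extractor is tried in declaration order at the current position; `EXTRACTORS` and `tokenize` are names invented for this sketch, and the regular expressions are hand-written equivalents of the rules.

```python
import re

# Extractors in declaration order; the first one that matches wins.
# (Hypothetical Python stand-in, not Multi-LexGen's generated code.)
EXTRACTORS = [
    ("NUM", re.compile(r"[0-9]+(?:\.[0-9]+)?")),
    ("PLUS", re.compile(r"\+")),
    ("MINUS", re.compile(r"-")),
    ("MUL", re.compile(r"\*")),
    ("DIV", re.compile(r"/")),
]

def tokenize(text):
    """Return (name, lexeme) pairs, trying extractors in order at each position."""
    tokens = []
    pos = 0
    while pos < len(text):
        for name, pattern in EXTRACTORS:
            match = pattern.match(text, pos)
            if match:
                tokens.append((name, match.group()))
                pos = match.end()
                break
        else:
            # No extractor matched at this position.
            raise ValueError(f"no extractor matches at position {pos}")
    return tokens

for name, lexeme in tokenize("1+2-31.3/1*4"):
    print(f"[{name}:{lexeme}]")
```

Running the sketch on the input above prints the same nine tokens listed in the expected output.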

Even though this can be considered a sufficient solution, there is one more feature to discuss: data modifiers. Here's an example of how a data modifier can be used to remove whitespace.

digit = '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' | '0'
num = digit+
decimal = num ('.' num)?

NUM = decimal
WHITESPACE = '\r' | '\n' | '\t' | ' '
PLUS = '+'
MINUS = '-'
MUL = '*'
DIV = '/'

WHITESPACE ?= _

Now the output will remain the same even if the input changes to

1+2 - 31.3 / 1 * 4
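
As a minimal sketch of the effect, assuming that `WHITESPACE ?= _` means "discard every WHITESPACE token" (which is what the example suggests), the modifier behaves like a filter over the token stream. The token list, `DISCARDED`, and `apply_discards` are hypothetical names used only for this illustration.

```python
# Hypothetical token stream as it might look before modifiers run.
raw_tokens = [
    ("NUM", "1"), ("PLUS", "+"), ("NUM", "2"), ("WHITESPACE", " "),
    ("MINUS", "-"), ("WHITESPACE", " "), ("NUM", "31.3"),
]

DISCARDED = {"WHITESPACE"}  # models `WHITESPACE ?= _`

def apply_discards(tokens):
    """Drop every token whose name is marked as discarded."""
    return [(name, lexeme) for name, lexeme in tokens if name not in DISCARDED]

for name, lexeme in apply_discards(raw_tokens):
    print(f"[{name}:{lexeme}]")
```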

You can also add conditionals here. Consider the following:

digit = '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' | '0'
num = digit+
decimal = num ('.' num)*

NUM = decimal
WHITESPACE = ' '

WHITESPACE ?= _
NUM ?= contains '.' 1 ? FLOAT
NUM ?= contains '.' 3 ? IPV4
NUM ?= contains '.' 0 ? INT
NUM ?= !'Unsupported number of periods found'

These conditions are applied in the given order for a specific token, but no order is guaranteed across different input tokens. The lexer above, given this input

1 1.1.1.1 1.1

will output

[INT:1]
[IPV4:1.1.1.1]
[FLOAT:1.1]

but if the given input is

1.1.1

then the last modifier produces an error (how it is reported is language dependent: in F# or Rust, for example, an error is returned, while in Python an exception is raised) with the following message

Unsupported number of periods found
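
The conditional chain can be modelled in Python as follows. This sketch assumes that `contains '.' n` holds when the token contains exactly `n` periods, which is the reading consistent with the outputs shown above, and it uses `ValueError` as a placeholder for whatever exception the Python runtime actually raises. The function name `classify_num` is hypothetical.

```python
def classify_num(lexeme):
    """Apply the NUM modifiers in order; the first condition that holds wins."""
    rules = [
        (1, "FLOAT"),  # NUM ?= contains '.' 1 ? FLOAT
        (3, "IPV4"),   # NUM ?= contains '.' 3 ? IPV4
        (0, "INT"),    # NUM ?= contains '.' 0 ? INT
    ]
    periods = lexeme.count(".")
    for count, name in rules:
        if periods == count:
            return (name, lexeme)
    # NUM ?= !'Unsupported number of periods found'
    raise ValueError("Unsupported number of periods found")

for lexeme in "1 1.1.1.1 1.1".split():
    name, text = classify_num(lexeme)
    print(f"[{name}:{text}]")
```

Calling `classify_num("1.1.1")` falls through all three conditions and raises the error.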

Functions such as contains come from a helper module written in the language being used; a helper module containing a few functions exists for every language runtime. The command line tool takes commands with the following syntax:

[tool-name] [output-language] [source-file] [output-file-path] [helper-module]

Functions in these modules should take a token followed by whatever arguments they need, and must always return a boolean so that the lexer can decide whether the condition matched. More information about integration is available in the Integration section.
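
A helper module for Python might look like the sketch below. The contract comes from the description above (token first, extra arguments after, boolean result); everything else is an assumption made for illustration, including the token being passed as a plain string, the exact-count meaning of `contains`, and the second helper `starts_with`, which does not appear in the original project description.

```python
# Hypothetical helper module for the Python runtime.

def contains(token: str, needle: str, count: int) -> bool:
    """True when `needle` occurs exactly `count` times in `token` (assumed semantics)."""
    return token.count(needle) == count

def starts_with(token: str, prefix: str) -> bool:
    """True when `token` begins with `prefix` (an invented example helper)."""
    return token.startswith(prefix)

# A modifier such as `NUM ?= contains '.' 1 ? FLOAT` would then
# correspond to a call like this one:
print(contains("31.3", ".", 1))
```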

Integration

Coming soon...
