
hedhyw / rex


Flexible regular expressions constructor for Golang.

License: MIT License

Makefile 0.78% Go 99.22%
builder-pattern constructor go golang regexp regexp-builder regular-expression regular-expressions dsl dsl-syntax

rex's Introduction

Rex


This is a regular expressions builder for gophers!

Why?

It improves readability and helps to construct regular expressions from human-friendly building blocks. It also allows commenting and reusing blocks, which improves code quality. It provides a convenient way to use parameterized patterns, and it is easy to implement custom patterns or combine existing ones.

It is just a builder, so it returns a standard *regexp.Regexp.

The library supports groups, composites, classes, flags and repetitions, and you can even use raw regular expressions anywhere. It also contains a set of predefined helpers with patterns for number ranges, phones, emails, etc.

Let's see an example of validating or matching an identifier like someid[#] using a verbose pattern:

re := rex.New(
    rex.Chars.Begin(), // `^`
    // ID should begin with lowercased character.
    rex.Chars.Lower().Repeat().OneOrMore(), // `[a-z]+`
    // ID should contain number inside brackets [#].
    rex.Group.NonCaptured( // (?:)
        rex.Chars.Single('['),                   // `[`
        rex.Chars.Digits().Repeat().OneOrMore(), // `[0-9]+`
        rex.Chars.Single(']'),                   // `]`
    ),
    rex.Chars.End(), // `$`
).MustCompile()

Yes, it requires more code, but it has its advantages.

More, but simpler code, fewer bugs.
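Because rex only builds the pattern and returns a standard *regexp.Regexp, the result behaves like any other compiled expression. The sketch below hand-writes the pattern that the inline comments of the example describe and checks a few inputs with the standard regexp package. The exact string rex emits may differ (for instance in how `[` and `]` are escaped), so treat the literal as an assumption.

```go
package main

import (
	"fmt"
	"regexp"
)

// Hand-written equivalent of the commented tokens in the rex example
// (an assumption, not rex output): ^ [a-z]+ (?: \[ [0-9]+ \] ) $
var idPattern = regexp.MustCompile(`^[a-z]+(?:\[[0-9]+\])$`)

func main() {
	for _, s := range []string{"someid[42]", "someid[]", "SomeID[7]", "someid"} {
		fmt.Println(s, idPattern.MatchString(s))
	}
}
```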

You can still use raw regular expressions as-is anywhere. Here is an example of matching numbers between -111.99 and 1111.99 using a combination of patterns and a raw regular expression:

re := rex.New(
    rex.Common.Raw(`^`),
    rex.Helper.NumberRange(-111, 1111),
    rex.Common.RawVerbose(`
        # RawVerbose is a synonym for Raw,
        # but ignores comments, spaces and new lines.
        \.        # Decimal delimiter.
        [0-9]{2}  # Only two digits.
        $         # The end.
    `),
).MustCompile()

// Produces:
// ^((?:\x2D(?:0|(?:[1-9])|(?:[1-9][0-9])|(?:10[0-9])|(?:11[0-1])))|(?:0|(?:[1-9])|(?:[1-9][0-9])|(?:[1-9][0-9][0-9])|(?:10[0-9][0-9])|(?:110[0-9])|(?:111[0-1])))\.[0-9]{2}$

The style you prefer is up to you.
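The produced string is an ordinary Go regular expression. As a quick sanity check, the sketch below compiles it (copied verbatim from the "Produces:" comment) with the standard regexp package and probes the boundaries of the range. The sample inputs are illustrative only.

```go
package main

import (
	"fmt"
	"regexp"
)

// Pattern copied verbatim from the "Produces:" comment in the README.
const produced = `^((?:\x2D(?:0|(?:[1-9])|(?:[1-9][0-9])|(?:10[0-9])|(?:11[0-1])))|(?:0|(?:[1-9])|(?:[1-9][0-9])|(?:[1-9][0-9][0-9])|(?:10[0-9][0-9])|(?:110[0-9])|(?:111[0-1])))\.[0-9]{2}$`

var decimalRange = regexp.MustCompile(produced)

func main() {
	// Inside the range: -111.99, 0.50, 1111.99. Outside or malformed: the rest.
	for _, s := range []string{"-111.99", "0.50", "1111.99", "1112.00", "-112.00", "12.5"} {
		fmt.Println(s, decimalRange.MatchString(s))
	}
}
```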


FAQ

  1. It is too verbose. Too much code.

    More, but simpler code, fewer bugs. Anyway, you can still use the raw regular expressions syntax in combination with helpers.

    rex.New(
        rex.Chars.Begin(),
        rex.Group.Define(
            // `Raw` can be placed anywhere in blocks.
            rex.Common.Raw(`[a-z]+\d+[A-Z]*`),
        ),
        rex.Chars.End(),
    )

    Or just raw regular expression with comments:

    rex.Common.RawVerbose(`
        ^                # Start of the line.
        [a-zA-Z0-9]+     # Local part.
        @                # Delimiter.
        [a-zA-Z0-9\.]+   # Domain part.
        $                # End of the line.
    `)
  2. Should I know regular expressions?

    Knowing them helps you use this library most effectively, but it is not strictly necessary.

  3. Is it language-dependent? Is it transferable to other languages?

    This library can only be used from Go. To reuse a pattern elsewhere, call rex.New(...).String() and copy the generated regular expression.

  4. What about my favourite DSL?

    Go IDEs provide auto-completion, so all helpers of this library are easy to discover and use out of the box. Creating custom parameterized helpers is also easy.

  5. Is it stable?

    It is at version 0.X.Y, but there are some backward compatibility guarantees:

    • rex.Chars helpers can change output to an alternative synonym.
    • rex.Common helpers can be deprecated, but not removed.
    • rex.Group: some methods can be deprecated.
    • rex.Helper can receive breaking changes due to specification complexities.
    • Test coverage should stay at ~100%, excluding test helpers and cmd.
    • Breaking changes will be avoided as much as possible.

    These guarantees may not hold when upgrading to a new major version.

  6. I have another question. I found an issue. I have a feature request. I want to contribute.

    Please create an issue.

License

rex's People

Contributors

dependabot[bot], gaissmai, hedhyw, ryzheboka


rex's Issues

Support edge markers

Symbol  Suggestion                    Description
\A      Chars.BeginOfText()           at beginning of text
\b      Chars.ASCIIWordBoundary()     at ASCII word boundary (\w on one side and \W, \A, or \z on the other)
\B      Chars.NotASCIIWordBoundary()  not at ASCII word boundary
\z      Chars.EndOfText()             at end of text

feat: support flags

Definition

(?flags)       set flags within current group; non-capturing
(?flags:re)    set flags during re; non-capturing

Flag syntax is xyz (set) or -xyz (clear) or xy-z (set xy, clear z). The flags are:

i              case-insensitive (default false)
m              multi-line mode: ^ and $ match begin/end line in addition to begin/end text (default false)
s              let . match \n (default false)
U              ungreedy: swap meaning of x* and x*?, x+ and x+?, etc (default false)

Example methods

rex.Group.Flags(...dialect.GroupFlag) GroupToken // `(?flags)`

type GroupToken struct {
    prefix string
    tokens []dialect.Token
}

GroupToken.WithFlags(...base.GroupFlag) dialect.Token // prefix -> `?flags:`

Flag structure

// ab-cd
// means to set `a` and `b`, reset `c` and `d`.
//
// -abc
// reset `a`, `b` and `c`
//
// abc
// set `a`, `b` and `c`

// In package base:
type GroupFlag struct {
    val   rune // i, m, s or U
    reset bool
}

rex.Group.FlagCaseInsensitive()
rex.Group.FlagMultiLineMode()
rex.Group.FlagAnyMatchEndLine()
rex.Group.FlagUngreedy()
// WithReset() can be used for any flag.
rex.Group.FlagUngreedy().WithReset()

Support grouping

Definition

(re)           numbered capturing group (submatch)
(?P<name>re)   named & numbered capturing group (submatch)
(?:re)         non-capturing group

Example methods

rex.Group.Define(...dialect.Token) GroupToken    // `(tokens)`
rex.Group.Composite(...dialect.Token) GroupToken // `(a|b|c)`

type GroupToken struct {
    prefix string
    tokens []dialect.Token
}

GroupToken.WithName(name string) dialect.Token // prefix -> `?P<name>`
GroupToken.NonCaptured() dialect.Token         // prefix -> `?:`

Support unicode character class

\p{Greek}

Helper:
Chars.Unicode(*unicode.RangeTable)

Look the table up in unicode.Categories or unicode.Scripts and emit its name as \p{Name}.
If the RangeTable is not found, emit nothing.

Docs:

Unicode character classes are those in unicode.Categories and unicode.Scripts.

Support ASCII classes

[[:alnum:]]    alphanumeric (== [0-9A-Za-z])
[[:alpha:]]    alphabetic (== [A-Za-z])
[[:ascii:]]    ASCII (== [\x00-\x7F])
[[:blank:]]    blank (== [\t ])
[[:cntrl:]]    control (== [\x00-\x1F\x7F])
[[:digit:]]    digits (== [0-9])
[[:graph:]]    graphical (== [!-~] == [A-Za-z0-9!"#$%&'()*+,\-./:;<=>?@[\\\]^_`{|}~])
[[:lower:]]    lower case (== [a-z])
[[:print:]]    printable (== [ -~] == [ [:graph:]])
[[:punct:]]    punctuation (== [!-/:-@[-`{-~])
[[:space:]]    whitespace (== [\t\n\v\f\r ])
[[:upper:]]    upper case (== [A-Z])
[[:word:]]     word characters (== [0-9A-Za-z_])
[[:xdigit:]]   hex digit (== [0-9A-Fa-f])

They can be defined as [:lower:] in TokenClass.

Restrict usage of characters in `Class`, `NotClass`

Currently, anything can be passed to rex.Common.Class and rex.Common.NotClass. We need a way to restrict it.

Idea

Create an abstraction:

type ClassToken interface {
   Token
   Unwrap() Token
   // ....
}

Make Raw a ClassToken?
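One way to restrict the arguments at compile time is a narrower interface that only class-safe tokens implement. This sketch swaps in an unexported marker method instead of the `Unwrap` method listed above; `token`, `classToken`, `charRange`, `text`, `class` and `notClass` are all hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// token is a hypothetical stand-in for dialect.Token.
type token interface{ String() string }

// classToken is a hypothetical narrower interface: only tokens that are
// valid inside [...] implement the marker method.
type classToken interface {
	token
	classMember()
}

// charRange renders as "a-z" and is a valid class member.
type charRange struct{ from, to rune }

func (r charRange) String() string { return string(r.from) + "-" + string(r.to) }
func (charRange) classMember()      {}

// text is a plain token that is NOT a class member.
type text string

func (t text) String() string { return string(t) }

// class only accepts classToken values, so class(text("abc")) does not compile.
func class(members ...classToken) string {
	var b strings.Builder
	b.WriteByte('[')
	for _, m := range members {
		b.WriteString(m.String())
	}
	b.WriteByte(']')
	return b.String()
}

// notClass is the negated counterpart: "[...]" becomes "[^...]".
func notClass(members ...classToken) string {
	return "[^" + class(members...)[1:]
}

func main() {
	fmt.Println(class(charRange{'a', 'z'}, charRange{'0', '9'})) // [a-z0-9]
	fmt.Println(notClass(charRange{'A', 'Z'}))                   // [^A-Z]
	// class(text("abc")) // compile error: text does not implement classToken
}
```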

Support custom Chars.Range

Preparation:

  1. Make all Chars helpers return class tokens.
  2. Add WithoutBrackets to ClassToken.
  3. In a ClassToken, remove the brackets of child tokens that implement WithoutBrackets().

Support:

  1. [A-Z], [a-z], ...
  2. NotClass

Bonus:

  1. Another name for Class?
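Step 3 of the preparation can be sketched as follows: a class token keeps its contents without brackets, and a parent class concatenates the bracketless contents of its children. `classToken`, `charsRange`, `class` and `notClass` are hypothetical names. Merging a negated child is deliberately rejected, because flattening it would change its meaning.

```go
package main

import "fmt"

// classToken is a hypothetical sketch: a character class that remembers its
// contents without brackets so a parent class can merge it.
type classToken struct {
	body    string // contents without brackets, e.g. "a-z"
	negated bool
}

// charsRange is a hypothetical stand-in for the proposed custom Chars.Range.
func charsRange(from, to rune) classToken {
	return classToken{body: string(from) + "-" + string(to)}
}

// WithoutBrackets returns the inner part of the class, e.g. "a-z".
func (c classToken) WithoutBrackets() string { return c.body }

func (c classToken) String() string {
	if c.negated {
		return "[^" + c.body + "]"
	}
	return "[" + c.body + "]"
}

// class merges children by dropping their brackets, producing "[a-z0-9]"
// instead of "[[a-z][0-9]]", which Go would read as a different pattern.
func class(children ...classToken) classToken {
	var body string
	for _, ch := range children {
		if ch.negated {
			// A negated child cannot be flattened without changing its meaning.
			panic("negated class cannot be merged into another class")
		}
		body += ch.WithoutBrackets()
	}
	return classToken{body: body}
}

// notClass is the negated form, e.g. "[^A-Z]".
func notClass(children ...classToken) classToken {
	c := class(children...)
	c.negated = true
	return c
}

func main() {
	fmt.Println(class(charsRange('a', 'z'), charsRange('0', '9'))) // [a-z0-9]
	fmt.Println(notClass(charsRange('A', 'Z')))                     // [^A-Z]
}
```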

Add Group.NonCaptured

It will be an alias to rex.Group.Define(...dialect.Token).NonCaptured()

Group.NonCaptured(...dialect.Token)

Helper for numbers ranges

From: https://www.reddit.com/r/golang/comments/vfqp2y/comment/icykgwa/?utm_source=share&utm_medium=web2x&context=3

rex.Helper.NumberRange(from, to int)

Split the range by digits and create a composite of number groups.

  • with or without leading zeros?
  • support negative numbers.

More details:

Group.Define( // 250-255.

Example generator: https://www.npmjs.com/package/to-regex-range
Algorithm: https://stackoverflow.com/a/33539325

50 - 1230: (5[0-9]|[6-9][0-9]|[1-9][0-9]{2}|1[01][0-9]{2}|12[0-2][0-9]|1230)

Trace method for debug purposes

It should print something like:

1: Chars.Begin: ^
2: Chars.Class: [A-Z]
    3: Chars.Range: A-Z
4: Chars.End: $

Add Name and Children to Token, or a separate type Traceable interface{}?
Hide it by default, and check at runtime whether a token implements it.

Support composite

Composite  Description        Suggestion
x|y        x or y (prefer x)  Common.Composite(Common.Single('x'), Common.Single('y'))

It should accept multiple tokens and group them with |.

Group.Composite(Chars.Digits().OneOrMore(), Common.Text("world"))
// Output: ([0-9]+|world)

Relates to #8

Create a GroupToken, that will add ( and ), and an optional prefix: (<prefix><tokens>). It is similar to ClassToken.

Add it to the README and add unit tests.

examples_test.go doesn't show up at pkg.go.dev

Hi, maybe you should put examples_test.go one directory deeper:

├── go.mod
└── pkg
     ├── dialect
     ├── examples_test.go
     └── rex

to

├── go.mod
└── pkg
    ├── dialect
    └── rex
        └── examples_test.go

Thanks for rex!

Make Repeatable a Token

  1. Idea

    type Repeatable struct {
       Token dialect.Token
       Suffix string
    }
    
    newRepeatable(token dialect.Token) // Default Suffix is `""`.
    
    OneOrMore() // And others should return `dialect.Token`.

    WriteTo writes the Token, then the Suffix.

  2. ClassRepeatable

    type ClassRepeatable struct {
       dialect.ClassToken
       Repeatable
    }

    Return ClassRepeatable for all Chars.* helpers and for Common.Raw.
