francisrstokes / super-expressive

🦜 Super Expressive is a zero-dependency JavaScript library for building regular expressions in (almost) natural language

License: MIT License

JavaScript 100.00%

super-expressive's Introduction

Hi there 👋 I'm Francis Stokes

  • 📽 I'm creating low level programming videos as Low Byte Productions on YouTube
  • 🇳🇱 I'm a firmware engineer living in the Netherlands
  • 🤖 I love working at the hardware/software interface
  • 🎨 I like generative art and programmatic animation
  • 🦜 You can find me on twitter

super-expressive's People

Contributors

0xflotus, bassim, cogentredtester, dependabot[bot], francisrstokes, jcao219, jimmyaffatigato, kiesun, nartc, pogromistdev, skratchdot, timothygillespie, w3bdesign

super-expressive's Issues

Support named capture groups and backreferences

First of all, cool concept! Something I would find useful is the ability to use named capture groups.

I think naming a group could be as simple as

.capture('my_name')

Backreferences could be one of the following:

.backreference('my_name')
.group('my_name')
.captureGroup('my_name')

Bonus points if you support numeric backreferences, but with naming them being so easy, I don't see too much use:

.backreference(2)
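
For reference, native JavaScript (ES2018 and later) already has syntax for both features, so the proposed methods would only need to emit it. The sketch below uses plain RegExp literals rather than Super Expressive; the sample strings and variable names are illustrative only.

```javascript
// Named capture group plus named backreference: finds a repeated word.
const doubledWord = /\b(?<word>\w+)\s+\k<word>\b/;
const doubled = doubledWord.exec('this is is a test');
console.log(doubled.groups.word);

// Numeric backreference: \1 must repeat whatever quote character group 1 matched.
const quoted = /(['"])(.*?)\1/;
const quotedMatch = quoted.exec('say "hi" now');
console.log(quotedMatch[2]);
```

Named groups are read from `match.groups`, while numbered groups stay available by index on the match array.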

Is there a way of compiling regex at build time?

I'm one of those people who avoids regex at all times because of its readability issues, but I love its conciseness.

From the readme it appears that this library is runtime only, but I was wondering if there could be a way of compiling into regex for production to combine the best of both worlds?
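
One possible approach, sketched here under assumptions: because a builder ultimately yields an ordinary RegExp, a build script could evaluate it once and write the literal into a generated module. The helper name `emitRegexModule` and the export name are hypothetical, and a hand-written RegExp stands in for the builder's result so the sketch has no dependencies.

```javascript
// Stand-in for the RegExp that a builder chain would return at build time.
const hexWord = new RegExp('^(?:0x)?([A-Fa-f0-9]{4})$');

// Hypothetical build helper: serialise a RegExp as an ES module export.
function emitRegexModule(exportName, regex) {
  // RegExp.prototype.toString() yields the literal form, including any flags.
  return `export const ${exportName} = ${regex.toString()};\n`;
}

const generated = emitRegexModule('hexWord', hexWord);
console.log(generated);
```

The generated text could then be written to a file by the build script, so production code imports only the plain literal.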

Stateful URL

Rewrite anythingButString in terms of a lookahead/consume any character

Right now anythingButString is implemented in a very non-ideal way (see #58). The plan is to replace the existing function, and potentially add one more.

anythingButString('aeiou') will produce output like:

// non-capturing group, containing a lookahead for exact string, then matching any characters repeatedly for inputString.length
/(?:(?!aeiou).{5})/

This implementation will only work predictably for ASCII-type strings, because length actually counts UTF-16 code units rather than characters. The same Unicode characters can also be encoded in multiple distinct ways, because UTF-16 strings are not normalised.

To provide an API that is also able to deal with unicode strings, something like anythingButStringUnicode(inputString, numCharactersToMatch) could be added. In this case, the user would be expected to provide the actual number of characters that should be matched after the lookahead. This is kind of fraught in itself due to normalisation, and the fact that whatever string you'd want to match in place may not match the number of code points anyway.

I imagine that this API would still cause confusion with users, both those looking explicitly to match unicode strings, and those who assume they should use this version of the function because why wouldn't you use unicode? In that case, it may be better to skip it altogether, and allow the user to use the group/assertAhead/anyChar/exactly APIs to build the equivalent manually. Though in that case, it still might be worth adding an anyDataUnit as a low-level API for unicode matching.
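
The length problem described above can be seen directly in plain JavaScript. This is a small illustration, not part of the library; the sample characters are chosen arbitrarily and written as escapes to keep the source ASCII-only.

```javascript
const emoji = '\u{1F600}';      // one code point, stored as two UTF-16 code units
const composed = '\u00e9';      // "e with acute" as a single code point
const decomposed = 'e\u0301';   // "e" followed by a combining acute accent

const lengths = {
  emojiUnits: emoji.length,         // counts UTF-16 code units
  emojiPoints: [...emoji].length,   // string iteration walks code points
  composed: composed.length,
  decomposed: decomposed.length,
};
console.log(lengths);

// Without the u flag "." consumes one code unit; with it, one code point.
const matchesWithoutFlag = /^.$/.test(emoji);
const matchesWithFlag = /^.$/u.test(emoji);
console.log(matchesWithoutFlag, matchesWithFlag);
```

So a `.{n}` derived from `inputString.length` can over-count or under-count depending on how the input happens to be encoded and whether the `u` flag is set.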

Prettier formatting

Hi. Thanks for this package. I really struggle with regex and I'll certainly be using this in the future.

I've only got as far as your example but I see a potential issue. You are using indentation as a guide, which is great but does not work well in a Prettier world. For example, Prettier reformats your example to have no nesting:

const SuperExpressive = require('super-expressive')

const myRegex = SuperExpressive()
  .startOfInput.optional.string('0x')
  .capture.exactly(4)
  .anyOf.range('A', 'F')
  .range('a', 'f')
  .range('0', '9')
  .end()
  .end()
  .endOfInput.toRegex()

I'm not quite sure what the answer is here, other than something like this maybe:

const SuperExpressive = require('super-expressive')

const myRegex = SuperExpressive()
  .startOfInput((se) =>
    se.optional
      .string('0x')
      .capture((se) =>
        se.exactly(4).anyOf.range('A', 'F').range('a', 'f').range('0', '9'),
      ),
  )
  .toRegex()

or

const se = require('super-expressive')

const myRegex = se()
  .startOfInput(
    se.optional
      .string('0x')
      .capture(
        se.exactly(4).anyOf.range('A', 'F').range('a', 'f').range('0', '9'),
      ),
  )
  .toRegex()

But either would be a major shake-up to your API 🤷

Expand documentation with examples section

To aid in showing how Super Expressive can be used in a practical context, it would be nice to add an examples folder, with various common regex use cases.

  • Using string.replace (#5)
  • Using string.match to capture information
    • Indexed groups
    • Named groups
  • Code organisation with subexpressions

Each example should be a separate .md document, with a title, short introduction, some commented example code, and related links sections (to things like MDN, or regex101). The examples folder should contain an index.md that acts as the contents page - linking to each example with a one sentence description.

Include React example

It is considered best practice to use import instead of require in React and ES6, which is why it could be nice to include an example with React and ES6 import.

Mixing import and require is a bad practice, which is why I thought it would be good to include a simple example using Create React App, which could be handy for React beginners (or people experienced with React).

See: #18

Does anythingButString work as intended?

.anythingButString(str) is documented like this:

Matches any string the same length as str, except the characters sequentially defined in str.

SuperExpressive() .anythingButString('aeiou') .toRegex(); // -> /(?:[^a][^e][^i][^o][^u])/

But what case does it cover? It does not match any string that is not equal to str: it rejects any word having one (or more) of the listed characters at the corresponding position (as said here). Is this how it was planned?
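
The difference can be checked with plain regular expressions. In this sketch the first pattern is the documented output and the second is a lookahead-based alternative; the `^` and `$` anchors and the sample words are additions for the comparison and are not something the library emits.

```javascript
// Documented output: each position excludes exactly one character.
const perPosition = /^(?:[^a][^e][^i][^o][^u])$/;
// Lookahead alternative: any 5 characters, as long as they are not "aeiou".
const lookahead = /^(?:(?!aeiou).{5})$/;

const samples = ['aeiou', 'apple', 'hello', 'xyzzy'];
const results = samples.map((word) => ({
  word,
  perPosition: perPosition.test(word),
  lookahead: lookahead.test(word),
}));
console.log(results);
```

"apple" and "hello" are rejected by the per-position form (an "a" in first place, an "e" in second place) even though neither equals "aeiou"; the lookahead form accepts both.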

lookBehind API missing from playground

The docs mention assertBehind and assertNotBehind but they don't work in the playground.

Also, JS didn't use to support lookbehind, only lookahead. It might be worth documenting when this changed?
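
For context, lookbehind assertions were standardised in ES2018, so support depends on the engine running the code. A minimal sketch of how calling code might feature-detect them; the function name and sample strings are hypothetical, and patterns are built with the RegExp constructor so that an older engine fails inside the try block instead of at parse time.

```javascript
// Returns true when the engine can compile a lookbehind assertion.
function supportsLookbehind() {
  try {
    new RegExp('(?<=a)b');
    return true;
  } catch (err) {
    return false;
  }
}

let amount = null;
let unpriced = null;
if (supportsLookbehind()) {
  // Positive lookbehind: digits directly preceded by a dollar sign.
  amount = new RegExp('(?<=\\$)\\d+').exec('cost: $42')[0];
  // Negative lookbehind: a whole number not preceded by a dollar sign.
  unpriced = new RegExp('(?<!\\$)\\b\\d+').exec('$42 and 17')[0];
}
console.log(amount, unpriced);
```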

How to capture from the previously found non-capturing group using Super Expressive?

Test string:

if ( typeof(NewContent) != 'undefined' ) {NewContent('005056B6F1202A0A45AC13010029C376','2',{'title':'Audio Transcript','y':0,'x':0,'height':450,'width':400,'modal':false,'fixedcenter':false});}">Audio Transcript</a></p>

Trying to capture the value of NewContent function in single quotes. There can be one or more instances, thus need to use named capture groups.

SuperExpressive methods used:

SuperExpressive()
    .optional.group
        .string("{NewContent('")
    .end()
    .capture
        .word
    .end()
    .optional.group
        .string("'")
    .end()
    .toRegex()

Current output:

/(?:\{NewContent\(')?(\w)(?:')?/

Expected output:

(?:\{NewContent\(')(\w+)(?:')

Could you please suggest which method I should use to start the capture from the previously found non-capturing group?
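
A possible explanation, offered as a sketch rather than a confirmed answer: the current output has `(\w)` because `.word` carries no quantifier, and the `?` after each group comes from `.optional`. Dropping `.optional` and writing `.capture.oneOrMore.word.end()` would be the natural way to ask for `(\w+)`, assuming `.oneOrMore` applies to `.word` the same way it is applied to `.digit` elsewhere on this page. The code below checks the expected pattern with a plain RegExp; the test string is abbreviated and the group name `id` is illustrative.

```javascript
// Abbreviated version of the test string from the question.
const html =
  "if ( typeof(NewContent) != 'undefined' ) " +
  "{NewContent('005056B6F1202A0A45AC13010029C376','2',{'title':'Audio Transcript'});}";

// Expected pattern, with a named group and the g flag to find every instance.
const newContentCall = /\{NewContent\('(?<id>\w+)'/g;

// matchAll yields one match object per occurrence, each with its own groups.
const ids = [...html.matchAll(newContentCall)].map((m) => m.groups.id);
console.log(ids);
```

Nothing has to be captured "from" the non-capturing group: the literal prefix only anchors the position, and the capture group that follows it collects the value.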

Better unicode support

Include primitives to support matching:

  • Unicode any char (\X)
  • Single data unit (\C)
  • Unicode newlines (\R)
  • Unicode properties and categories (\pX / \p{..,})
  • Hex characters (\xXX / \xXXXX)
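
As background for this list: `\X`, `\C` and `\R` are PCRE-style escapes that JavaScript regular expressions do not provide, whereas property escapes and hex escapes are available natively (property escapes need the `u` flag). The sketch below shows the native pieces and one hand-written approximation of `\R`; the variable names are illustrative and the approximation is an assumption about what a primitive could emit.

```javascript
// Unicode general category: a single uppercase letter.
const upperCaseLetter = /^\p{Lu}$/u;
// Unicode script property.
const greekScript = /\p{Script=Greek}/u;
// Two-digit hex escape plus a code point escape (the latter needs the u flag).
const hexEscape = /^\x41\u{1F600}$/u;
// Approximation of \R: CRLF as one unit, or any single line-break character.
const unicodeNewline = /\r\n|[\n\v\f\r\x85\u2028\u2029]/;

console.log(upperCaseLetter.test('\u00c9'));   // E with acute accent
console.log(greekScript.test('\u03b1'));       // Greek small letter alpha
console.log(hexEscape.test('A\u{1F600}'));
console.log(unicodeNewline.test('a\u2028b'));
```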

Some questions about the `anyOf` operator.

I want the pattern /xxx|yyy/. If I use the anyOf operator I get /(?:xxx|yyy)/, which is a non-capturing group, so I can't get the real groups:

SuperExpressive()
.anyOf
    .string('xxx')
    .string('yyy')
.end()
.toRegex()

get:

> /(?:xxx|yyy)/.exec('xxxx234xxxsdf')
> ["xxx", index: 0, input: "xxxx234xxxsdf", groups: undefined]

> /(xxx|yyy)/.exec('xxxx234xxxsdf')
> (2) ["xxx", "xxx", index: 0, input: "xxxx234xxxsdf", groups: undefined]

Anyway, I can't get the simple capture group by prefixing the capture operator:

SuperExpressive()
.capture.anyOf
    .string('xxx')
    .string('yyy')
.end()
.end()
.toRegex()

get

> /((?:xxx|yyy))/.exec('xxxx234xxxsdf')
> (2) ["xxx", "xxx", index: 0, input: "xxxx234xxxsdf", groups: undefined]

Is there any other way to get the simple one?
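
It may help to note that the nested form behaves the same as the simple one, because a non-capturing group adds no entry to the match array. A quick check with plain regular expressions, using the input from the question:

```javascript
const input = 'xxxx234xxxsdf';

// Output of capture + anyOf: a capture group wrapping a non-capturing group.
const nested = /((?:xxx|yyy))/.exec(input);
// The hand-written "simple" form.
const simple = /(xxx|yyy)/.exec(input);

// Both arrays hold the full match at index 0 and one captured group at index 1.
console.log(nested.length, simple.length);
console.log(nested[1] === simple[1]);
```

The only difference is the length of the pattern source, not what is captured.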

Improve the github workflow

  • Introduce a workflow for running the tests and reporting the coverage
  • Create a PR template with a checklist
    • Is the change purely for documentation purposes?
      • Please include a summary of changes and an explanation
    • Does the change introduce new functionality?
      • Does the code style reasonably match the existing code?
      • Are the changes tested (using the existing format, as far as is possible?)
      • Are the changes documented in the readme with a suitable example?
      • Is the table of contents updated?
      • Is the index.d.ts file updated, using the same description as the readme?
    • This PR introduces some other kind of change
      • Please explain the change below

Proposal: include support for plugins

It would be nice to support user-definable plugins, e.g. "Phone Number"

Something like:

const myRegex = SuperExpressive({ plugins: [ phoneNumber ] })
  .startOfInput
  .optional.string('0x')
  .capture
    .exactly(4).group
      .plugin.phoneNumber()
      .newline
    .end()
  .end()
  .endOfInput
  .toRegex();

Compare with similar libraries

VerbalExpressions is a library that's been around for a long time, with a lot of stars and a lot of ports, but there are reasons to choose SuperExpressive instead:

  • VerbalExpressions are incompatible with Promises and Async/Await, because they contain a non-compliant .then() method. If you return a verbal expression from a promise, it's not going to behave as expected.
  • No support for various regular expression features:
    • non-capturing groups
    • named groups
    • backreferences
    • lookaheads
    • etc
  • No support for sub expressions, so creating reusable parts is not possible
  • VerbalExpressions are mutable, so creating a new expression based on an existing one is not possible (see example below)
expr.maybe('s');
const expr2 = expr.maybe('t');

// expr is now: expr.maybe('s').maybe('t')
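
The mutability point can be illustrated with two toy builders. Both classes are hypothetical sketches written for this comparison (neither is the real code of either library), and for brevity `maybe` does not escape special characters in its argument.

```javascript
// Mutable style: every call changes the same object and returns it.
class MutableBuilder {
  constructor() {
    this.source = '';
  }
  maybe(text) {
    this.source += `(?:${text})?`;
    return this;
  }
}

// Immutable style: every call returns a new, frozen object.
class ImmutableBuilder {
  constructor(source = '') {
    this.source = source;
    Object.freeze(this);
  }
  maybe(text) {
    return new ImmutableBuilder(`${this.source}(?:${text})?`);
  }
}

const mutableBase = new MutableBuilder().maybe('s');
const mutableDerived = mutableBase.maybe('t');   // also alters mutableBase

const immutableBase = new ImmutableBuilder().maybe('s');
const immutableDerived = immutableBase.maybe('t');   // immutableBase is untouched

console.log(mutableBase.source, mutableDerived === mutableBase);
console.log(immutableBase.source, immutableDerived.source);
```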

Numeric Range

Hello, I was wondering if there is any capability to generate numeric ranges within your library. Specifically, if we consider a range with a start of 1 and an end of 17, the desired output would be something like ([1-9]|1[0-7]).

I'm curious if such functionality is already supported in your library, or if I might have overlooked it. I find everything about super-expressive to be great, and a feature like this would be incredibly useful.

If implementing numeric ranges proves to be challenging, could you consider adding the ability to use string subexpressions? This way, when using packages like to-regex-range for generating numeric ranges one could seamlessly insert their output just like in currently existing .subexpression but with string argument type instead of SuperExpressive instance.
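
To make the request concrete, the sketch below checks the pattern from the question with a plain RegExp and shows how a range produced as a string could be spliced into a larger pattern. The anchors, the `id-` prefix and the variable names are illustrative assumptions; the range source is written by hand here rather than generated by a package.

```javascript
// Hand-written source for the range 1..17, as in the question.
const rangeSource = '[1-9]|1[0-7]';

// Anchored so that, for example, the "1" inside "18" does not count as a match.
const oneToSeventeen = new RegExp(`^(${rangeSource})$`);

const accepted = [];
for (let n = 0; n <= 20; n += 1) {
  if (oneToSeventeen.test(String(n))) accepted.push(n);
}
console.log(accepted);

// Splicing the same string into a larger pattern, the way a string
// subexpression would; the non-capturing group keeps the alternation contained.
const idPattern = new RegExp(`^id-(?:${rangeSource})$`);
console.log(idPattern.test('id-12'), idPattern.test('id-18'));
```

The wrapping group matters: without it, the `|` in the inserted string would split the whole surrounding pattern into two alternatives.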

For reference:
https://3widgets.com/
https://www.npmjs.com/package/to-regex-range

include replace examples

It would be great to demonstrate how to use it in a search and replace scenario, like in:
str.replace(/(Cap1.+)_(Cap2.+)/, "Capture group 1 is $1, Capture group 2 is $2")
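
A runnable version of that scenario might look like the following, using a plain RegExp; the input string and the named-group variant are illustrative additions.

```javascript
const input = 'Cap1abc_Cap2xyz';

// Numbered groups are referenced as $1 and $2 in the replacement string.
const numbered = input.replace(
  /(Cap1.+)_(Cap2.+)/,
  'Capture group 1 is $1, Capture group 2 is $2',
);
console.log(numbered);

// Named groups are referenced as $<name>, which also makes reordering readable.
const named = input.replace(
  /(?<first>Cap1.+)_(?<second>Cap2.+)/,
  '$<second> then $<first>',
);
console.log(named);
```

Any RegExp works as the first argument to `replace`, so a pattern produced by a builder could be passed in the same position.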

Can't use multiple endOfInput

This is probably a niche case, but it's still something that I need and I can't express with this library (or can't figure out how to).

My case is that I have three groups that are all optional, but it should match at least one of those.

The regex I need is:
/^(?!$)(?:\d+g)?(?:\s*\d+s)?(?:\s*\d+c)?$/i

The playground code that I currently have for this is:

SuperExpressive()
    .caseInsensitive
    .startOfInput
    .assertNotAhead
        .endOfInput
        // I also tried this
        // .subexpression(SuperExpressive().endOfInput, { ignoreStartAndEnd: false })
    .end()
    .optional.group
    .oneOrMore.digit
    .char('g')
    .end()
    .optional.group
        .zeroOrMore.whitespaceChar
        .oneOrMore.digit
        .char('s')
    .end()
    .optional.group
        .zeroOrMore.whitespaceChar
        .oneOrMore.digit
        .char('c')
    .end()
    .endOfInput
    .toRegex()

If there's another way to write this regex it's much appreciated. The (?!$) trick is the only answer I could find that works with my case of all optional groups.
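
For readers comparing options, the sketch below checks the target regex against a few inputs and tries one alternative spelling: a positive lookahead for "at least one character" in place of the negative lookahead for end of input. Whether the library can express either form is not confirmed here; the sample inputs are illustrative.

```javascript
// The regex from the question: three optional parts, but not the empty string.
const original = /^(?!$)(?:\d+g)?(?:\s*\d+s)?(?:\s*\d+c)?$/i;
// Alternative spelling: require that some character follows the start.
const alternative = /^(?=.)(?:\d+g)?(?:\s*\d+s)?(?:\s*\d+c)?$/i;

const samples = ['', '5g', '5g 3s 10c', '3S', '10c 5g', ' '];
const comparison = samples.map((text) => ({
  text,
  original: original.test(text),
  alternative: alternative.test(text),
}));
console.log(comparison);
```

The two forms agree on these samples, but they are not identical in general: `.` does not match a newline, so an input that begins with a line break (which `\s*` would otherwise accept) is rejected only by the alternative.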

incomplete *Lazy coverage

While you have covered all variants of {,}, including * and +, the lazy alternatives are missing for exactly and atLeast.
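
To show what the missing variants would change, here is the behaviour of the underlying syntax in plain JavaScript; the input string is illustrative.

```javascript
const input = 'aaaaaa';

// "At least 2": greedy takes as many as possible, lazy takes as few as possible.
const greedyAtLeast = input.match(/a{2,}/)[0];
const lazyAtLeast = input.match(/a{2,}?/)[0];

// "Exactly 3": the count is fixed, so the lazy marker has nothing to shrink.
const greedyExactly = input.match(/a{3}/)[0];
const lazyExactly = input.match(/a{3}?/)[0];

console.log(greedyAtLeast, lazyAtLeast, greedyExactly, lazyExactly);
```

A lazy atLeast changes what is matched, whereas a lazy exactly is valid syntax but matches the same text as the greedy form.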

Porting for Ruby

Hi. Thank you for the wonderful library!

I usually write programs using Ruby.
So I made a port for Ruby so that I can use super-expressive in Ruby as well.
If you don't mind, I would like to publish a super-expressive for Ruby.
Is that okay?
