Comments (13)

no-defun-allowed commented on May 29, 2024

I think dab9f14 correctly implements submatching now. It also gave me an excuse to separate code generation and DFA generation, which is nice.

from one-more-re-nightmare.

no-defun-allowed commented on May 29, 2024

We used to break with «a*», but I fixed that. We still struggle with the nth derivative of «a*»* growing in size proportional to 2^n, which shouldn't happen.

no-defun-allowed commented on May 29, 2024

The almost-last change was to rewrite a + [blah]a → a (or generally any a + b where (remove-tags a) = (remove-tags b)) to prevent the latter RE from growing in size pointlessly, but we have a problem where we optimise too eagerly now!

CL-USER> (one-more-re-nightmare:all-string-matches "«a»|«a»" "aaa")
("a" "a" "a")
(#("a") #("a") #("a"))

While a nice gesture to the programmer to save them some consing from redundant registers, they were probably expecting to have both registers. The parser already maintains a count of registers, which we could use to specify the layout of a correct register vector.
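The over-eager rewrite can be reproduced in miniature. The following is a rough sketch in Python (not the library's Common Lisp), and every name in it (`Lit`, `Group`, `Either`, `remove_tags`, `either`, `registers`) is hypothetical rather than taken from one-more-re-nightmare. It assumes groups carry an explicit register index and that the rewrite keeps the left alternative whenever both sides are equal once their tags are removed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lit:
    char: str


@dataclass(frozen=True)
class Group:
    """A submatch «inner» that writes to register `index` (hypothetical layout)."""
    index: int
    inner: object


@dataclass(frozen=True)
class Either:
    left: object
    right: object


def remove_tags(re):
    """Strip every group, leaving only the plain regular expression."""
    if isinstance(re, Group):
        return remove_tags(re.inner)
    if isinstance(re, Either):
        return Either(remove_tags(re.left), remove_tags(re.right))
    return re


def either(a, b):
    """Smart constructor: a + b collapses to a when both are equal without tags."""
    if remove_tags(a) == remove_tags(b):
        return a
    return Either(a, b)


def registers(re):
    """Collect the register indexes that survive in an expression."""
    if isinstance(re, Group):
        return {re.index} | registers(re.inner)
    if isinstance(re, Either):
        return registers(re.left) | registers(re.right)
    return set()


# «a»|«a»: a parser would have counted two registers here.
optimised = either(Group(1, Lit("a")), Group(2, Lit("a")))
print(optimised)
print(registers(optimised))
```

Only register 1 survives the rewrite, so a register vector sized from the optimised expression has one slot, whereas one sized from the parser's count would have the two slots the programmer expects.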

no-defun-allowed commented on May 29, 2024

At this rate, we probably should encapsulate regular expressions in an object with the information we want, as well as an optimised RE. This would also solve the problem where the group constructor requires an index, as we can't assign indexes automatically. But then that also suggests we should have a second representation which is more literal; we could throw in an S-expression input syntax like cl-ppcre does.

no-defun-allowed commented on May 29, 2024

Normal matching is broken too! Try to scan for (ab)* on ababa and it will fail, even though it should match the abab prefix. Baumann also provided a better way to implement grep, and what you want instead of backtracking, in the paper, so I will go and implement that.
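For reference, the expected behaviour on this input can be checked with a small derivative-based matcher. This is a sketch in Python, not the library's code and not necessarily the construction from the paper: it takes plain Brzozowski derivatives one character at a time and remembers the last position at which the remaining expression was nullable, so it never backtracks. The tuple encoding and all function names are made up for illustration.

```python
# Regular expressions as tuples:
#   ("empty",) matches nothing, ("eps",) matches the empty string,
#   ("lit", c), ("seq", r, s), ("alt", r, s), ("star", r)
EMPTY, EPS = ("empty",), ("eps",)


def nullable(r):
    """Does r match the empty string?"""
    kind = r[0]
    if kind in ("eps", "star"):
        return True
    if kind in ("empty", "lit"):
        return False
    if kind == "seq":
        return nullable(r[1]) and nullable(r[2])
    return nullable(r[1]) or nullable(r[2])  # alt


def seq(r, s):
    """Concatenation with the usual simplifications."""
    if r == EMPTY or s == EMPTY:
        return EMPTY
    if r == EPS:
        return s
    if s == EPS:
        return r
    return ("seq", r, s)


def alt(r, s):
    """Alternation with the usual simplifications."""
    if r == EMPTY:
        return s
    if s == EMPTY or r == s:
        return r
    return ("alt", r, s)


def derivative(r, c):
    """The expression matching what may follow c in a match of r."""
    kind = r[0]
    if kind in ("empty", "eps"):
        return EMPTY
    if kind == "lit":
        return EPS if r[1] == c else EMPTY
    if kind == "alt":
        return alt(derivative(r[1], c), derivative(r[2], c))
    if kind == "star":
        return seq(derivative(r[1], c), r)
    left = seq(derivative(r[1], c), r[2])  # seq
    if nullable(r[1]):
        return alt(left, derivative(r[2], c))
    return left


def longest_prefix(r, text):
    """End position of the longest prefix of text matched by r, or None."""
    last_end = 0 if nullable(r) else None
    for position, c in enumerate(text):
        r = derivative(r, c)
        if r == EMPTY:
            break  # no longer match is possible
        if nullable(r):
            last_end = position + 1
    return last_end


ab_star = ("star", ("seq", ("lit", "a"), ("lit", "b")))
print(longest_prefix(ab_star, "ababa"))  # 4, i.e. the abab prefix
```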

no-defun-allowed commented on May 29, 2024

I am slowly getting close to having useful machines. The current idea is to use an alpha wrapper expression to record history (recall we don't backtrack with a DFA), and then slam that into a grep machine which clones it for each character. We use tags to record the start and end of a match. After some fiddling, we get a sufficiently small DFA with few statements and assignments:

[Figure: A DFA for the regular expression abcd]

Some assignments don't look right, but I'm happy to get something this neat at this point.

no-defun-allowed commented on May 29, 2024

More impressive is the DFA for ab* of course:

[Figure: The DFA for ab*]

If one traces out abc on this machine, one will see the final state failed but remembered the last winning end position.
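That trace can be mimicked with a hand-written table. The sketch below is in Python and is not the generated machine from the graph: the state names, the transition table, and the single `last_end` register are assumptions made for illustration, and it only matches ab* anchored at a given start position rather than grepping.

```python
# A hypothetical DFA for ab*: any missing transition leads to failure.
TRANSITIONS = {
    ("start", "a"): "in-match",
    ("in-match", "b"): "in-match",
}
ACCEPTING = {"in-match"}


def scan(text, start=0):
    """Run the DFA from `start`, remembering the last accepting end position."""
    state = "start"
    last_end = None  # plays the role of the end-of-match register
    for position in range(start, len(text)):
        state = TRANSITIONS.get((state, text[position]), "fail")
        if state == "fail":
            break  # the machine has failed, but last_end is kept
        if state in ACCEPTING:
            last_end = position + 1
    return None if last_end is None else (start, last_end)


print(scan("abc"))  # (0, 2): fails on c, but the match ab was recorded
```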

no-defun-allowed commented on May 29, 2024

As of 111eb8f the assignments appear correct:

[Figure: Another DFA graph of abcd]

no-defun-allowed commented on May 29, 2024

The assignments for ab* required more shaking out, but I think we got there in the end?

[Figure: The DFA for ab*]

Perhaps I need a real Graphviz interface somewhere rather than using one-more-re-nightmare::print-dfa and making up a digraph file and fixing up escaping...

no-defun-allowed commented on May 29, 2024

I got an interpreter working properly for «ab»* which I regard as a big win. So all that is left is to write a code generator.

no-defun-allowed commented on May 29, 2024

The DFA for a grep for ca«ab»* looks about right. I decided to make graphs using cl-dot, which eliminated some hair-pulling to get nice-looking output.

[Figure: The DFA for this regular expression.]

no-defun-allowed commented on May 29, 2024

The new compiler seems to work; I just have to make it work with the protocol functions.

It would also be useful to generate type-split scanners at runtime, as cl-ppcre does. Then we could also lint the RE, detecting pointless expressions and unmatchable subgroups and that sort of thing.

no-defun-allowed commented on May 29, 2024

All done methinks. As of 8a1b5d2 the compile time is bearable. Might as well merge the new compiler now.
