Comments (13)
I think dab9f14 correctly implements submatching now. It also gave me an excuse to separate code generation and DFA generation, which is nice.
from one-more-re-nightmare.
We used to break with «a*», but I fixed that. But we still struggle with the nth derivative of «a*»* growing in size proportional to 2^n, which shouldn't happen.
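For reference, here is a minimal Python sketch of Brzozowski derivatives over untagged regular expressions, showing why repeated derivatives of (a*)* should stay bounded once alternation is simplified. It is not the library's code: the tuple representation and every function name are made up for illustration, and it ignores tags entirely.

```python
# Hypothetical representation: ("empty",) matches nothing, ("eps",) matches "",
# ("chr", c), ("alt", r, s), ("seq", r, s), ("star", r).
EMPTY = ("empty",)
EPS = ("eps",)

def alt(r, s):
    """Alternation with the usual similarity rules: r + 0 = r and r + r = r."""
    if r == EMPTY:
        return s
    if s == EMPTY:
        return r
    if r == s:
        return r
    return ("alt", r, s)

def seq(r, s):
    """Concatenation with 0.r = 0 and eps.r = r."""
    if r == EMPTY or s == EMPTY:
        return EMPTY
    if r == EPS:
        return s
    if s == EPS:
        return r
    return ("seq", r, s)

def nullable(r):
    """True if r matches the empty string."""
    tag = r[0]
    if tag in ("eps", "star"):
        return True
    if tag in ("empty", "chr"):
        return False
    if tag == "alt":
        return nullable(r[1]) or nullable(r[2])
    return nullable(r[1]) and nullable(r[2])  # seq

def derivative(r, c):
    """Brzozowski derivative of r with respect to the character c."""
    tag = r[0]
    if tag in ("empty", "eps"):
        return EMPTY
    if tag == "chr":
        return EPS if r[1] == c else EMPTY
    if tag == "alt":
        return alt(derivative(r[1], c), derivative(r[2], c))
    if tag == "star":
        return seq(derivative(r[1], c), r)
    left = seq(derivative(r[1], c), r[2])  # seq
    return alt(left, derivative(r[2], c)) if nullable(r[1]) else left

def size(r):
    """Number of nodes in the expression tree."""
    return 1 + sum(size(x) for x in r[1:] if isinstance(x, tuple))

# (a*)* built without any simplification of the nested star.
r = ("star", ("star", ("chr", "a")))
sizes = []
for _ in range(10):
    r = derivative(r, "a")
    sizes.append(size(r))
```

In this untagged setting every derivative after the first is the same expression, because the `r == s` check in `alt` merges the two identical branches. With tags, the branches differ only in their tags, so a plain equality check no longer fires, which is presumably where the growth comes from.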
The almost-last change was to rewrite a + [blah]a → a (or generally any a + b where (remove-tags a) = (remove-tags b)) to prevent the latter RE from growing in size pointlessly, but we have a problem where we optimise too eagerly now!
CL-USER> (one-more-re-nightmare:all-string-matches "«a»|«a»" "aaa")
("a" "a" "a")
(#("a") #("a") #("a"))
While a nice gesture to the programmer to save them some consing from redundant registers, they were probably expecting to have both registers. The parser already maintains a count of registers, which we could use to specify the layout of a correct register vector.
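A rough Python sketch of the problem and of the proposed fix follows. The tuple representation, `eager_alt`, `group_indices` and `register_layout` are all hypothetical names for illustration, not the library's API. The idea is that the optimised RE may drop groups, but a register count kept from parsing still says how wide the result vector must be.

```python
def remove_tags(r):
    """Strip ("group", index, body) wrappers, leaving the untagged RE."""
    tag = r[0]
    if tag == "group":
        return remove_tags(r[2])
    if tag in ("alt", "seq"):
        return (tag, remove_tags(r[1]), remove_tags(r[2]))
    if tag == "star":
        return ("star", remove_tags(r[1]))
    return r  # leaves such as ("chr", c) carry no tags

def eager_alt(r, s):
    """The too-eager rule: a + b becomes a whenever the untagged forms agree."""
    return r if remove_tags(r) == remove_tags(s) else ("alt", r, s)

def group_indices(r):
    """Set of group indices that still appear in r."""
    found = {r[1]} if r[0] == "group" else set()
    for child in r[1:]:
        if isinstance(child, tuple):
            found |= group_indices(child)
    return found

def register_layout(register_count, surviving):
    """One slot per group the parser counted; None marks a group optimised away."""
    return [i if i in surviving else None
            for i in range(1, register_count + 1)]

# «a»|«a»: the parser counted two groups, but the rewrite keeps only the first.
left = ("group", 1, ("chr", "a"))
right = ("group", 2, ("chr", "a"))
optimised = eager_alt(left, right)
layout = register_layout(2, group_indices(optimised))
```

Here `layout` is `[1, None]`, so a caller would still get a two-element register vector, with the second register reported as unmatched rather than silently missing.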
At this rate, we probably should encapsulate regular expressions in an object with the information we want, as well as an optimised RE. This would also solve the problem where the group constructor requires an index, as we can't assign indexes automatically. But then that also suggests we should have a second representation which is more literal; we could throw in an S-expression input syntax like cl-ppcre does.
Normal matching is broken too! Try to scan for (ab)* on ababa and it will fail, even though it should match the abab prefix. Baumann also provided a better way to implement grep, and what you want instead of backtracking, in the paper, so I will go and implement that.
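To make the expected behaviour concrete, here is a small Python sketch of longest-prefix matching on a DFA that remembers the last accepting position instead of backtracking. The transition table for (ab)* is written by hand and the function name is made up; it illustrates the general technique and is not a transcription of the paper's construction.

```python
def longest_prefix_match(transitions, accepting, start, text):
    """Return the end index of the longest matching prefix, or None.

    transitions maps (state, character) to the next state; a missing
    entry means the machine has failed and can never match again.
    """
    state = start
    last_end = 0 if start in accepting else None
    for position, char in enumerate(text):
        state = transitions.get((state, char))
        if state is None:
            break  # dead state: keep whatever end position we remembered
        if state in accepting:
            last_end = position + 1
    return last_end

# Hand-written DFA for (ab)*: state 0 is the start and the only accepting state.
AB_STAR = {(0, "a"): 1, (1, "b"): 0}

end = longest_prefix_match(AB_STAR, {0}, 0, "ababa")
matched = "ababa"[:end]
```

On ababa the machine ends in the non-accepting state 1 after the trailing a, but the remembered end position still gives the abab prefix.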
I am slowly getting close to having useful machines. The current idea is to use an alpha wrapper expression to record history (recall we don't backtrack with a DFA), and then slam that into a grep machine which clones it for each character. We use tags to record the start and end of a match. After some fiddling, we get a sufficiently small DFA with few statements and assignments:
Some assignments don't look right, but I'm happy to get something this neat at this point.
More impressive is the DFA for ab*, of course:
If one traces out abc on this machine, one will see the final state failed but remembered the last winning end position.
As of 111eb8f the assignments appear correct:
The assignments for ab* required more shaking out, but I think we got there in the end?
Perhaps I need a real Graphviz interface somewhere rather than using one-more-re-nightmare::print-dfa and making up a digraph file and fixing up escaping...
I got an interpreter working properly for «ab»*, which I regard as a big win. So all that is left is to write a code generator.
The DFA for a grep for ca«ab»* looks about right. I decided to make graphs using cl-dot, which eliminated some hair-pulling to get nice-looking output.
The new compiler seems to work; I just have to make it work with the protocol functions.
It would also be useful to generate type-split scanners at runtime, as cl-ppcre does. Then we could also lint the RE, detecting pointless expressions, unmatchable subgroups and that sort of thing.
All done methinks. As of 8a1b5d2 the compile time is bearable. Might as well merge the new compiler now.