Comments (20)
You should use the latest master branch for now, instead of the latest pypi release. I plan to make a release soon, once I see all the recent breaking changes are stable.
from lark.
Regarding line and column for the default parser, I haven't implemented it yet, but I know it's a crucial feature. Feel free to open a new issue for it if you like, and I'll try to get to it soon.
from lark.
That regex doesn't seem to work with LALR or with Earley?
Regarding LALR: I can see you have different terminals matching the same string. LALR relies on a lexer, which creates tokens before they reach the parser, and so every terminal has to be chosen without awareness of the structure of the grammar.
I did write a slight improvement which makes the lexer a bit more contextually aware. You can try it with parser="lalr", lexer="contextual"
, but I doubt it will work for your case.
The error you got happens because the token COMMAND matches everything, and it happens to take priority over some of the other terminals.
If the contextual lexer doesn't solve it, you can solve this specific error by giving it priority zero, which will make it match last:
COMMAND.0: "whatever"
However, this issue repeats with a few other terminals too. You will probably have to disentangle them.
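To make the priority point concrete, here is a toy tokenizer in plain Python. It is only a rough model and not Lark's actual lexer: the `tokenize` helper, the two terminal patterns, and the default priority of 1 are all assumptions made for this sketch. It shows how a catch-all terminal shadows a more specific one until its priority is lowered.

```python
import re

def tokenize(text, terminals):
    """Toy lexer: terminals is a list of (name, pattern, priority).

    Higher priority is tried first; ties keep declaration order.
    Whitespace between tokens is skipped.
    """
    ordered = sorted(terminals, key=lambda t: -t[2])  # stable sort
    compiled = [(name, re.compile(pattern)) for name, pattern, _ in ordered]
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        for name, regex in compiled:
            m = regex.match(text, pos)
            if m and m.end() > pos:
                tokens.append((name, m.group()))
                pos = m.end()
                break
        else:
            raise ValueError(f"no terminal matches at position {pos}")
    return tokens

# Hypothetical terminals: COMMAND is a catch-all, ALL is a keyword.
same_priority = [("COMMAND", r"[A-Za-z!]+", 1), ("ALL", r"ALL\b", 1)]
lowered = [("COMMAND", r"[A-Za-z!]+", 0), ("ALL", r"ALL\b", 1)]

shadowed = tokenize("ALL ls", same_priority)
fixed = tokenize("ALL ls", lowered)
print(shadowed)  # COMMAND swallows the keyword
print(fixed)     # ALL is tried first, COMMAND matches last
```

In the first call the keyword is tokenized as COMMAND because nothing tells the lexer that ALL was expected there. In the second call the lowered priority makes COMMAND the last resort, which is the effect `COMMAND.0` aims for.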
from lark.
Thank you for the quick response, Erez! I do appreciate that.
That particular error happens only when using LALR. I will try to implement your suggestions in short order. The thought of LALR relying on token uniqueness for context did cross my mind, but I quickly dismissed it along with 10 other theories in my head at that moment :) I don't have much experience with parsing, but it's super interesting.
from lark.
Yes, it's super interesting, and also super confusing :)
Let me know how it works out.
from lark.
Will do. Quick question, if I'm not using LALR and given my grammar above, do you know why the parser is having trouble with:
DBA ALL = (oracle) ALL, !SU : ALL = (postgres) ALL, !SU
It looks like /[^,:\n]+/
for the COMMAND token consumes the space before the ':' and that causes an issue. Modifying the rule to /[^,:\n]+(?!:)/
(negative lookahead preventing the consumption of the space) does help, but grinds the parser to a halt on even a few rules.
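The difference between the two patterns can be checked outside the parser with Python's `re` module. This is a minimal sketch that only looks at what each regexp matches at the `!SU` position of the line above. It says nothing about parser speed, and the variable names are made up for the example.

```python
import re

line = "DBA ALL = (oracle) ALL, !SU : ALL = (postgres) ALL, !SU"
start = line.index("!SU")  # position of the first !SU

greedy = re.compile(r"[^,:\n]+")
with_lookahead = re.compile(r"[^,:\n]+(?!:)")

greedy_match = greedy.match(line, start).group()
lookahead_match = with_lookahead.match(line, start).group()

print(repr(greedy_match))     # includes the space before ':'
print(repr(lookahead_match))  # backtracks by one character
```

Note that the lookahead only inspects the single character that follows the match, so with two spaces before the ':' the match would still end in a space.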
from lark.
Well, it sounds like a bug. But why would consuming the space be an issue?
It's probably not the regexp that grinds it to a halt, but rather the parser. I have recently found a performance issue in the Earley parser when dealing with ambiguous grammars. I have written a fix for it, but I haven't merged it to master yet, because I want to run a few more tests on it.
In the meanwhile, check-out the branch earley_fix2
(https://github.com/erezsh/lark/tree/earley_fix2) , and see if it fixes your issue.
from lark.
In the meanwhile, check-out the branch earley_fix2
Awesome (🥇 ) Will try by tonight. Tomorrow morning (American morning :))
So you think %ignore /[\\\\]/
is the correct syntax to instruct the parser to ignore a backslash at the end of the line?
Also, if I say /^/
and /$/
in a regex, does it mean beginning and end of the whole input string or the line being parsed?
from lark.
Whereas it took the current master version almost 2 minutes to parse 10 lines, your dev branch got a 300 line file parsed in less than 3 secs. I'm hoping not much else changed. I had to modify some code in my transformer, but it seems fine otherwise. Woot woot! Thanks again.
from lark.
Glad to hear it!
%ignore /[\\\\]/
can ignore any backslash, not just at the end of the line. By default, ^ and $ refer to the whole file. But if you use latest master, you can try:
%ignore /\\\\$/m
The 'm' flag tells the regexp to match $ to newlines. I also merged the Earley optimization to master, so you shouldn't have any problem with that.
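The effect of the flag can be seen with Python's `re` module directly. This is a small sketch that assumes the grammar's regexps behave like ordinary Python patterns. The sample text is invented for the example.

```python
import re

# Three lines; the first and the last end with a backslash.
text = "first \\\nsecond\nthird \\"

# Without the flag, ^ and $ anchor to the whole string.
default_ends = re.findall(r"\\$", text)
default_starts = re.findall(r"^\w+", text)

# With MULTILINE (the 'm' flag), they anchor to every line.
multiline_ends = re.findall(r"\\$", text, flags=re.MULTILINE)
multiline_starts = re.findall(r"^\w+", text, flags=re.MULTILINE)

print(len(default_ends), default_starts)
print(len(multiline_ends), multiline_starts)
```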
But, I must say, it's a pretty strange language, if backslashes do nothing in it. Usually it has some structural role, such as continuing the line. In which case you will have to ignore the newline as well. Perhaps like this?
%ignore /\\\\$\n/m
I must say I never tried something like this, so let me know if it doesn't work, and I'll try to figure out what to do next.
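If ignoring the continuation inside the grammar turns out to be awkward, one possible fallback is to join continued lines before the text reaches the parser. The sketch below uses the same pattern with Python's `re` module. The `join_continued_lines` helper and the sample input are hypothetical and not part of Lark.

```python
import re

def join_continued_lines(text):
    # Drop a backslash that ends a line, together with the newline after it.
    return re.sub(r"\\$\n", "", text, flags=re.MULTILINE)

source = (
    "DBA ALL = (oracle) ALL, \\\n"
    "    !SU\n"
    "DBA ALL2 = (oracle2) ALL\n"
)
joined = join_continued_lines(source)
print(joined)
```

A trade-off of pre-processing is that line numbers in the joined text no longer match the original file.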
from lark.
%ignore /[\\]/ can ignore any backslash, not just at the end of the line. By default, ^ and $ refer to the whole file. But if you use latest master, you can try:
%ignore /\\$/m
Excellent
The 'm' flag tells the regexp to match $ to newlines. I also merged the Earley optimization to master, so you shouldn't have any problem with that.
Cool
But, I must say, it's a pretty strange language, if backslashes do nothing in it. Usually it has some structural role, such as continuing the line. In which case you will have to ignore the newline as well. Perhaps like this?
%ignore /\\$\n/m
Indeed the backslash is a line continuation. I will try to modify and watch out for any problems.
I must say I never tried something like this, so let me know if it doesn't work, and I'll try to figure out what to do next.
Will do.
There's one small thing I discovered yesterday that I wanted to ask you about. Given this input:
DBA ALL = (oracle) ALL
the tree looks like this:
user_list
  user DBA
host_list
  host ALL
cmnd_spec
  runas_list
    user oracle
  cmnd_alias_name ALL
After I apply my transformations, which are very simple, I get back a tuple that looks like this:
('user_spec', [['DBA'], ['ALL'], [['oracle'], 'ALL']])
So far so good. If the source is two lines like so:
DBA ALL = (oracle) ALL
DBA ALL2 = (oracle2) ALL
the tree looks like this:
user_spec
  user_list
    user DBA
  host_list
    host ALL
  cmnd_spec
    runas_list
      user oracle
    cmnd_alias_name ALL
user_spec
  user_list
    user DBA
  host_list
    host ALL2
  cmnd_spec
    runas_list
      user oracle2
    cmnd_alias_name ALL
For more than a single rule, the output comes out as an array of two tuples:
[
  ('user_spec', [['DBA'], ['ALL'], [['oracle'], 'ALL']]),
  ('user_spec', [['DBA'], ['ALL2'], [['oracle2'], 'ALL']])
]
As a result, I have to first check if the result is a tuple or an array. If it's a tuple, I create a 1-element array from the tuple.
Is it expected that a single rule returns a tuple while two rules return an array? Is it something about my grammar? This can be a tough question and since it's not critical, you can ignore it :)
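The workaround described above can be written as a small helper. This is only a sketch of the check-and-wrap step. The name `as_spec_list` is made up, and the sample values are the two results shown earlier.

```python
def as_spec_list(result):
    """Return a list of user_spec tuples, whatever shape came back."""
    if isinstance(result, tuple):
        # A single rule came back bare: wrap it in a one-element list.
        return [result]
    return list(result)

single = ('user_spec', [['DBA'], ['ALL'], [['oracle'], 'ALL']])
double = [
    ('user_spec', [['DBA'], ['ALL'], [['oracle'], 'ALL']]),
    ('user_spec', [['DBA'], ['ALL2'], [['oracle2'], 'ALL']]),
]

specs_from_single = as_spec_list(single)
specs_from_double = as_spec_list(double)
print(len(specs_from_single), len(specs_from_double))
```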
from lark.
Well, I assume user_spec
always returns a tuple as it should. But you match it in ?sudo_item
, and the ?
sign literally means: If there is only one item, return it instead of appending it as a branch.
Does that answer your question?
from lark.
With the default parser, is there a way to see which line of input caused the problem?
from lark.
I get the error below when trying to ignore the backslash with %ignore /[\\\\]$/m
lark.common.GrammarError: Rules aren't allowed inside tokens (m in __IGNORE_0)
from lark.
Last question: is there a built in function to read the grammar from a file?
from lark.
Any reason why the default open(filename).read() isn't good enough?
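For completeness, a slightly more careful variant of that one-liner closes the file and fixes the encoding. This is a minimal sketch. The `load_grammar_text` helper, the file name, and the tiny grammar text are hypothetical, and the temporary file only exists to make the example self-contained.

```python
import os
import tempfile

def load_grammar_text(path):
    # Read the whole grammar file as text and close it afterwards.
    with open(path, encoding="utf-8") as f:
        return f.read()

# Write a throwaway grammar file so the sketch can run on its own.
with tempfile.TemporaryDirectory() as tmp:
    grammar_path = os.path.join(tmp, "sudoers.g")
    with open(grammar_path, "w", encoding="utf-8") as f:
        f.write('start: "a"+\n')
    grammar_text = load_grammar_text(grammar_path)

print(grammar_text)
```

The returned string can then be passed to the parser constructor in the same way as an inline grammar.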
from lark.
No reason really. Just thought there might be a convenience function. All good. I think we can close this ticket. Thanks again!
from lark.
Well, not sure if it makes any difference to convenience, but you can pass file objects to Lark, so something like Lark(open(filename))
will work just as well.
Thanks for the bug reports and suggestions!
from lark.
Yep, that's quite good enough.
from lark.