Hello Erez, I'm having trouble coming up with the correct rule when

Question (possibly bug?) about lark HOT 20 CLOSED

gerases commented on July 18, 2024

Question (possibly bug?)

from lark.

Comments (20)

erezsh commented on July 18, 2024 1

You should use the latest master branch for now, instead of the latest pypi release. I plan to make a release soon, once I see all the recent breaking changes are stable.

from lark.

erezsh commented on July 18, 2024 1

Regarding line and column for the default parser, I haven't implemented it yet, but I know it's a crucial feature. Feel free to open a new issue for it if you like, and I'll try to get to it soon.

from lark.

erezsh commented on July 18, 2024

That regex doesn't seem to work with LALR or with Earley?

Regarding LALR: I can see you have different terminals matching the same string. LALR relies on a lexer, which creates tokens before they reach the parser, and so every terminal has to be chosen without awareness to the structure of the grammar.

I did write a slight improvement which makes the lexer a bit more contextually aware. You can try it with parser="lalr", lexer="contextual", but I doubt it will work for your case.

The error you got happens because the token COMMAND matches everything, and it happens to take priority over some of the other terminals.

If the contextual lexer doesn't solve it, you can solve this specific error by giving it priority zero, which will make it match last:

COMMAND.0: "whatever"

However, this issue repeats with a few other terminals too. You will probably have to disentangle them.

from lark.

gerases commented on July 18, 2024

Thank you for the quick response, Erez! I do appreciate that.

That particular error happens only when using LALR. I will try to implement your suggestions in short order. The though of LALR relying on token uniqueness for context did cross my mind, but I quickly dismissed it along with other 10 theories in my head at that moment :) I don't have much experience with parsing, but it's super interesting.

from lark.

erezsh commented on July 18, 2024

Yes, it's super interesting, and also super confusing :)

Let me know how it works out.

from lark.

gerases commented on July 18, 2024

Will do. Quick question, if I'm not using LALR and given my grammar above, do you know why the parser is having trouble with:

DBA ALL = (oracle) ALL, !SU : ALL = (postgres) ALL, !SU

It looks like /[^,:\n]+/ for the COMMAND token consumes the space before the ':' and that causes an issue. Modifying the rule to/[^,:\n]+(?!:)/ (negative lookahead preventing the consumption of the space) does help, but grinds the parser to a halt on even a few rules.

from lark.

erezsh commented on July 18, 2024

Well, it sounds like a bug. But why would consuming the space be an issue?

It's probably not the regexp that grinds it to a halt, but rather the parser. I have recently found a performance issue in the Earley parser when dealing with ambiguous grammars. I have written a fix for it, but I haven't merged it to master yet, because I want to run a few more tests on it.

In the meanwhile, check-out the branch earley_fix2 (https://github.com/erezsh/lark/tree/earley_fix2) , and see if it fixes your issue.

from lark.

gerases commented on July 18, 2024

In the meanwhile, check-out the branch earley_fix2

Awesome (🥇 ) Will try by tonight. Tomorrow morning (American morning :))

So you think %ignore /[\\\\]/ is the correct syntax to instruct the parser to ignore a backslash at the end of the line?

Also, if I say /^/ and /$ in a regex, does it mean beginning and end of the whole input string or the line being parsed?

from lark.

gerases commented on July 18, 2024

Whereas it took the current master version almost 2 minutes to parse 10 lines, your dev branch got a 300 line file parsed in less than 3 secs. I'm hoping not much else changed. I had to modify some code in my transformer, but it seems fine otherwise. Woot woot! Thanks again.

from lark.

erezsh commented on July 18, 2024

Glad to hear it!

%ignore /[\\\\]/ can ignore any backslash, not just at the end of the line. By default, ^ and $ refer to the whole file. But if you use latest master, you can try:

%ignore /\\\\$/m

The 'm' flag tells the regexp to match $ to newlines. I also merged the Earley optimization to master, so you shouldn't have any problem with that.

But, I must say, it's a pretty strange language, if backslashes do nothing in it. Usually it has some structural role, such as continuing the line. In which case you will have to ignore the newline as well. Perhaps like this?

%ignore /\\\\$\n/m

I must say I never tried something like this, so let me know if it doesn't work, and I'll try to figure out what to do next.

from lark.

gerases commented on July 18, 2024

%ignore /[\\]/ can ignore any backslash, not just at the end of the line. By default, ^ and $ refer to the whole file. But if you use latest master, you can try:

%ignore /\\$/m

Excellent

The 'm' flag tells the regexp to match $ to newlines. I also merged the Earley optimization to master, so you shouldn't have any problem with that.

Cool

But, I must say, it's a pretty strange language, if backslashes do nothing in it. Usually it has some structural role, such as continuing the line. In which case you will have to ignore the newline as well. Perhaps like this?

%ignore /\\$\n/m

Indeed the backslash is a line continuation. I will try to modify and watch out for any problems.

I must say I never tried something like this, so let me know if it doesn't work, and I'll try to figure out what to do next.

Will do.

There's one small thing I discovered yesterday that I wanted to ask you about. Given this input:

DBA ALL = (oracle) ALL

the tree looks like this:

  user_list
    user        DBA
  host_list
    host        ALL
  cmnd_spec
    runas_list
      user      oracle
    cmnd_alias_name     ALL

After I apply my transformations, which are very simple, I get back a tuple that looks like this:

('user_spec', [['DBA'], ['ALL'], [['oracle'], 'ALL']])

So far so good.. If the source is two lines like so:

DBA ALL = (oracle) ALL
DBA ALL2 = (oracle2) ALL

three looks like this:

user_spec
    user_list
      user      DBA
    host_list
      host      ALL
    cmnd_spec
      runas_list
        user    oracle
      cmnd_alias_name   ALL
  user_spec
    user_list
      user      DBA
    host_list
      host      ALL2
    cmnd_spec
      runas_list
        user    oracle2
      cmnd_alias_name   ALL

or any single rule, the output comes out as an array of two tuples:

[
   ('user_spec', [['DBA'], ['ALL'], [['oracle'], 'ALL']]),
    ('user_spec', [['DBA'], ['ALL2'], [['oracle2'], 'ALL']])
]

As a result, I have to first check if the result is a tuple or an array. If it's a tuple, I create a 1-element array from the tuple.

Is that expected that a single rule might return a tuple while two will return an array? Is it something about my grammar? This can be a tough question and since it's not critical, you can ignore it :)

from lark.

erezsh commented on July 18, 2024

Well, I assume user_spec always returns a tuple as it should. But you match it in ?sudo_item, and the ? sign literally means: If there is only one item, return it instead of appending it as a branch.

Does that answer your question?

from lark.

gerases commented on July 18, 2024

yep, it certainly does answer my question. thank you!

…

On Nov 16, 2017, at 1:03 PM, Erez Shinan ***@***.***> wrote: Well, I assume user_spec always returns a tuple as it should. But you match it in ?sudo_item, and the ? sign literally means: If there is only one item, return it instead of appending it as a branch. Does that answer your question? — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub, or mute the thread.

from lark.

gerases commented on July 18, 2024

With the default parser, is there a way to see which line of input caused the problem?

from lark.

gerases commented on July 18, 2024

I get the error below when trying to ignore the backslash with %ignore /[\\\\]$/m

lark.common.GrammarError: Rules aren't allowed inside tokens (m in __IGNORE_0)

from lark.

gerases commented on July 18, 2024

Last question: is there a built in function to read the grammar from a file?

from lark.

erezsh commented on July 18, 2024

Any reason why the default open(filename).read() isn't good enough?

from lark.

gerases commented on July 18, 2024

No reason really. Just thought there might be a convenience function. All good. Thanks ago. I think we can close this ticket. Thanks again!

from lark.

erezsh commented on July 18, 2024

Well, not sure if it makes any difference to convenience, but you can pass file objects to Lark, so something like Lark(open(filename)) will work just as well.

Thanks for the bug reports and suggestions!

from lark.

gerases commented on July 18, 2024

Yep, that's quite good enough.

from lark.

Question (possibly bug?) about lark HOT 20 CLOSED

Comments (20)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent

Jobs