GithubHelp home page GithubHelp logo

Question (possibly bug?) about lark HOT 20 CLOSED

gerases avatar gerases commented on July 18, 2024
Question (possibly bug?)

from lark.

Comments (20)

erezsh avatar erezsh commented on July 18, 2024 1

You should use the latest master branch for now, instead of the latest pypi release. I plan to make a release soon, once I see all the recent breaking changes are stable.

from lark.

erezsh avatar erezsh commented on July 18, 2024 1

Regarding line and column for the default parser, I haven't implemented it yet, but I know it's a crucial feature. Feel free to open a new issue for it if you like, and I'll try to get to it soon.

from lark.

erezsh avatar erezsh commented on July 18, 2024

That regex doesn't seem to work with LALR or with Earley?

Regarding LALR: I can see you have different terminals matching the same string. LALR relies on a lexer, which creates tokens before they reach the parser, and so every terminal has to be chosen without awareness to the structure of the grammar.

I did write a slight improvement which makes the lexer a bit more contextually aware. You can try it with parser="lalr", lexer="contextual", but I doubt it will work for your case.

The error you got happens because the token COMMAND matches everything, and it happens to take priority over some of the other terminals.

If the contextual lexer doesn't solve it, you can solve this specific error by giving it priority zero, which will make it match last:

COMMAND.0: "whatever"

However, this issue repeats with a few other terminals too. You will probably have to disentangle them.

from lark.

gerases avatar gerases commented on July 18, 2024

Thank you for the quick response, Erez! I do appreciate that.

That particular error happens only when using LALR. I will try to implement your suggestions in short order. The though of LALR relying on token uniqueness for context did cross my mind, but I quickly dismissed it along with other 10 theories in my head at that moment :) I don't have much experience with parsing, but it's super interesting.

from lark.

erezsh avatar erezsh commented on July 18, 2024

Yes, it's super interesting, and also super confusing :)

Let me know how it works out.

from lark.

gerases avatar gerases commented on July 18, 2024

Will do. Quick question, if I'm not using LALR and given my grammar above, do you know why the parser is having trouble with:

DBA ALL = (oracle) ALL, !SU : ALL = (postgres) ALL, !SU

It looks like /[^,:\n]+/ for the COMMAND token consumes the space before the ':' and that causes an issue. Modifying the rule to/[^,:\n]+(?!:)/ (negative lookahead preventing the consumption of the space) does help, but grinds the parser to a halt on even a few rules.

from lark.

erezsh avatar erezsh commented on July 18, 2024

Well, it sounds like a bug. But why would consuming the space be an issue?

It's probably not the regexp that grinds it to a halt, but rather the parser. I have recently found a performance issue in the Earley parser when dealing with ambiguous grammars. I have written a fix for it, but I haven't merged it to master yet, because I want to run a few more tests on it.

In the meanwhile, check-out the branch earley_fix2 (https://github.com/erezsh/lark/tree/earley_fix2) , and see if it fixes your issue.

from lark.

gerases avatar gerases commented on July 18, 2024

In the meanwhile, check-out the branch earley_fix2

Awesome (🥇 ) Will try by tonight. Tomorrow morning (American morning :))

So you think %ignore /[\\\\]/ is the correct syntax to instruct the parser to ignore a backslash at the end of the line?

Also, if I say /^/ and /$ in a regex, does it mean beginning and end of the whole input string or the line being parsed?

from lark.

gerases avatar gerases commented on July 18, 2024

Whereas it took the current master version almost 2 minutes to parse 10 lines, your dev branch got a 300 line file parsed in less than 3 secs. I'm hoping not much else changed. I had to modify some code in my transformer, but it seems fine otherwise. Woot woot! Thanks again.

from lark.

erezsh avatar erezsh commented on July 18, 2024

Glad to hear it!

%ignore /[\\\\]/ can ignore any backslash, not just at the end of the line. By default, ^ and $ refer to the whole file. But if you use latest master, you can try:

%ignore /\\\\$/m

The 'm' flag tells the regexp to match $ to newlines. I also merged the Earley optimization to master, so you shouldn't have any problem with that.

But, I must say, it's a pretty strange language, if backslashes do nothing in it. Usually it has some structural role, such as continuing the line. In which case you will have to ignore the newline as well. Perhaps like this?

%ignore /\\\\$\n/m

I must say I never tried something like this, so let me know if it doesn't work, and I'll try to figure out what to do next.

from lark.

gerases avatar gerases commented on July 18, 2024

%ignore /[\\]/ can ignore any backslash, not just at the end of the line. By default, ^ and $ refer to the whole file. But if you use latest master, you can try:

%ignore /\\$/m

Excellent

The 'm' flag tells the regexp to match $ to newlines. I also merged the Earley optimization to master, so you shouldn't have any problem with that.

Cool

But, I must say, it's a pretty strange language, if backslashes do nothing in it. Usually it has some structural role, such as continuing the line. In which case you will have to ignore the newline as well. Perhaps like this?

%ignore /\\$\n/m

Indeed the backslash is a line continuation. I will try to modify and watch out for any problems.

I must say I never tried something like this, so let me know if it doesn't work, and I'll try to figure out what to do next.

Will do.

There's one small thing I discovered yesterday that I wanted to ask you about. Given this input:

DBA ALL = (oracle) ALL

the tree looks like this:

  user_list
    user        DBA
  host_list
    host        ALL
  cmnd_spec
    runas_list
      user      oracle
    cmnd_alias_name     ALL

After I apply my transformations, which are very simple, I get back a tuple that looks like this:

('user_spec', [['DBA'], ['ALL'], [['oracle'], 'ALL']])

So far so good.. If the source is two lines like so:

DBA ALL = (oracle) ALL
DBA ALL2 = (oracle2) ALL

three looks like this:

user_spec
    user_list
      user      DBA
    host_list
      host      ALL
    cmnd_spec
      runas_list
        user    oracle
      cmnd_alias_name   ALL
  user_spec
    user_list
      user      DBA
    host_list
      host      ALL2
    cmnd_spec
      runas_list
        user    oracle2
      cmnd_alias_name   ALL

or any single rule, the output comes out as an array of two tuples:

[
   ('user_spec', [['DBA'], ['ALL'], [['oracle'], 'ALL']]),
    ('user_spec', [['DBA'], ['ALL2'], [['oracle2'], 'ALL']])
]

As a result, I have to first check if the result is a tuple or an array. If it's a tuple, I create a 1-element array from the tuple.

Is that expected that a single rule might return a tuple while two will return an array? Is it something about my grammar? This can be a tough question and since it's not critical, you can ignore it :)

from lark.

erezsh avatar erezsh commented on July 18, 2024

Well, I assume user_spec always returns a tuple as it should. But you match it in ?sudo_item, and the ? sign literally means: If there is only one item, return it instead of appending it as a branch.

Does that answer your question?

from lark.

gerases avatar gerases commented on July 18, 2024

from lark.

gerases avatar gerases commented on July 18, 2024

With the default parser, is there a way to see which line of input caused the problem?

from lark.

gerases avatar gerases commented on July 18, 2024

I get the error below when trying to ignore the backslash with %ignore /[\\\\]$/m

lark.common.GrammarError: Rules aren't allowed inside tokens (m in __IGNORE_0)

from lark.

gerases avatar gerases commented on July 18, 2024

Last question: is there a built in function to read the grammar from a file?

from lark.

erezsh avatar erezsh commented on July 18, 2024

Any reason why the default open(filename).read() isn't good enough?

from lark.

gerases avatar gerases commented on July 18, 2024

No reason really. Just thought there might be a convenience function. All good. Thanks ago. I think we can close this ticket. Thanks again!

from lark.

erezsh avatar erezsh commented on July 18, 2024

Well, not sure if it makes any difference to convenience, but you can pass file objects to Lark, so something like Lark(open(filename)) will work just as well.

Thanks for the bug reports and suggestions!

from lark.

gerases avatar gerases commented on July 18, 2024

Yep, that's quite good enough.

from lark.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.