Comments (6)

erezsh commented on September 28, 2024

> I have explicitly added white space tokens, rather than ignoring white space in general.

Why? For what purpose?
LALR(1), as its name suggests, only has a look-ahead of 1 token. If you add whitespace to the grammar, you are effectively turning it into LR(0), which is significantly less powerful.

I should confess that there is a bug here, which I should fix. You can bypass it easily with:
_WS.2: WS_INLINE
But as you'll see, it doesn't solve the bigger issue, of LALR's limited lookahead (which Earley, of course, doesn't have).

from lark.

polwel commented on September 28, 2024

Hi, thanks for the quick answer.

I was suspecting that the grammar was no longer parsable by Earley. I just wanted to point out that Lark should have warned me about that. Your explanation makes sense to me.

> Why? For what purpose?

I was just playing around. Say I have a language where whitespace is generally ignored, except in a few constructs, e.g. in numbers with SI prefixes, like 12u, standing for 12e-6. Here, whitespace between the numeral and the prefix is specifically disallowed. Is there an easy way to do that other than polluting the grammar with optional whitespace, or capturing these numbers with a regexp and then parsing them separately?

What exactly does _WS.2: WS_INLINE do? AFAICT, it doesn't help LALR, and it was working fine with Earley already.

erezsh commented on September 28, 2024

You're right, these are the two options. You can use regexp groups to avoid parsing the same string twice, but that's a very minor optimization in a language like Python.
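For illustration, here is a minimal sketch of the regexp-group approach in plain Python (not Lark). The names SI_PREFIXES, NUMBER and parse_si_number are hypothetical, and the prefix table covers only a few prefixes. The numeral and the prefix are captured in one match, so the string is not scanned a second time, and a space between them is rejected:

```python
import re

# Hypothetical prefix table for the example; extend as needed.
SI_PREFIXES = {"k": 1e3, "m": 1e-3, "u": 1e-6, "n": 1e-9}

# One pattern, two named groups: the numeral and an optional prefix.
# No whitespace is allowed between the two groups.
NUMBER = re.compile(r"(?P<mantissa>\d+(?:\.\d+)?)(?P<prefix>[kmun])?")


def parse_si_number(text):
    """Convert a string such as '12u' to a float, reusing the match groups."""
    match = NUMBER.fullmatch(text)
    if match is None:
        raise ValueError(f"not an SI-prefixed number: {text!r}")
    value = float(match.group("mantissa"))
    prefix = match.group("prefix")
    # Scale only when a prefix was actually captured.
    return value * SI_PREFIXES[prefix] if prefix else value
```

With this sketch, parse_si_number("12u") gives a value close to 12e-6, while parse_si_number("12 u") raises ValueError.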
There is a mechanism called "lexer states", which allows switching between different modes according to key tokens. So, for example, when you see a number, you'll switch to a state that doesn't ignore whitespace. But it's a little tricky to use, and anyway, Lark doesn't support it right now.
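A rough sketch of the lexer-states idea, hand-written in plain Python since Lark does not offer it here. The state names, the RULES table and the tokenize function are assumptions made for this example, not part of any library. In the default state whitespace is skipped; a NUMBER token switches to a state where whitespace is not skipped, so a prefix only counts when it is glued to the number:

```python
import re

# state -> list of (token type or None to skip, pattern, next state)
RULES = {
    "default": [
        (None, re.compile(r"\s+"), "default"),           # whitespace ignored here
        ("NUMBER", re.compile(r"\d+"), "after_number"),  # key token: switch state
        ("NAME", re.compile(r"[A-Za-z_]\w*"), "default"),
    ],
    "after_number": [
        # No whitespace rule in this state: the prefix must follow immediately.
        ("PREFIX", re.compile(r"[kmun](?!\w)"), "default"),
    ],
}


def tokenize(text):
    state, pos, tokens = "default", 0, []
    while pos < len(text):
        for kind, pattern, next_state in RULES[state]:
            match = pattern.match(text, pos)
            if match:
                if kind is not None:
                    tokens.append((kind, match.group()))
                pos, state = match.end(), next_state
                break
        else:
            if state == "default":
                raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
            # Nothing state-specific matched: fall back without consuming input.
            state = "default"
    return tokens
```

Here tokenize("12u") produces a NUMBER followed by a PREFIX, whereas tokenize("12 u") produces a NUMBER followed by a NAME, which a parser could then reject or treat differently.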

> What exactly does _WS.2: WS_INLINE do?

It helps the lexer.

psboyce commented on September 28, 2024

I have a language called SVF that I need to parse, where LALR finds unexpected tokens even though the expected token would also work, and I'm also explicitly ignoring whitespace tokens. (I'm using LALR because SVF has C-style comments and issue #24 precludes the use of the Earley parser.) In SVF, the last character in a statement must be a semicolon, and the semicolon must not be preceded by a whitespace character.

Your explanation about why using explicit whitespace tokens causes trouble with LALR makes sense to me, but I'm not clear about how to work around this issue with LALR.

Edit: Disregard. I think this is a problem with the lexer and not LALR; I have terminals that are ambiguous.

erezsh commented on September 28, 2024

Please see my comment on issue #24.

erezsh commented on September 28, 2024

I'm closing this issue, since I fixed issue #24. If the problem persists, feel free to re-open this issue or open a new one.
