Below you will find an adapted version of the calc example. I have explicitly added white space tokens, rather than ignoring white space in general.

Grammar works as expected with Earley, parser fails with LALR (lark issue, closed)

lark-parser commented on September 28, 2024

Comments (6)

erezsh commented on September 28, 2024

I have explicitly added white space tokens, rather than ignoring white space in general.

Why? For what purpose?
LALR(1), as its name suggests, only has a lookahead of 1 token. If you add whitespace tokens to the grammar, that one token is often just whitespace, so you are effectively turning it into LR(0), which is significantly less powerful.
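To make the lookahead point concrete, here is a toy tokenizer in plain Python. It is not Lark's lexer, and the token names NUMBER, OP and WS are illustrative assumptions. It shows which single token a one-token-lookahead parser would see right after the first number, with and without whitespace tokens in the stream.

```python
import re

# Toy token definitions for a calc-like language (illustrative assumption,
# not the grammar from the original issue).
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("OP", r"[+\-*/]"),
    ("WS", r"[ \t]+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text, ignore_ws):
    """Return a list of (type, value) pairs, optionally dropping WS tokens."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        kind = m.lastgroup  # name of the alternative that matched
        if not (ignore_ws and kind == "WS"):
            tokens.append((kind, m.group()))
        pos = m.end()
    return tokens


def lookahead_after_first_number(tokens):
    """The one token of lookahead available right after the first NUMBER."""
    return tokens[1][0]


print(lookahead_after_first_number(tokenize("1 + 2", ignore_ws=True)))   # OP
print(lookahead_after_first_number(tokenize("1 + 2", ignore_ws=False)))  # WS
```

With whitespace ignored, the parser's single lookahead token is the operator it needs to decide on; with explicit whitespace tokens, it only learns that a space follows.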

I should confess that there is also a bug here, which I need to fix. You can easily work around it with:
_WS.2: WS_INLINE
But as you'll see, it doesn't solve the bigger issue, of LALR's limited lookahead (which Earley, of course, doesn't have).

polwel commented on September 28, 2024

Hi, thanks for the quick answer.

I suspected that the grammar was no longer parsable with LALR. I just wanted to point out that Lark should have warned me about that. Your explanation makes sense to me.

Why? For what purpose?

I was just playing around. Say I have a language where whitespace is generally ignored, except in a few constructs, e.g. numbers with SI prefixes, like 12u, standing for 12e-6. There, whitespace between the numeral and the prefix is specifically disallowed. Is there an easy way to do that other than polluting the grammar with optional whitespace, or capturing these numbers with a regexp and then parsing them separately?

What exactly does _WS.2: WS_INLINE do? AFAICT, it doesn't help LALR, and it was working fine with Earley already.

erezsh commented on September 28, 2024

You're right, these are the two options. You can use regexp groups to avoid parsing the same string twice, but that's a very minor optimization in a language like Python.
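A minimal sketch of the regexp-groups idea, using plain Python `re` rather than a Lark terminal and a hand-picked table of SI prefixes (both assumptions for illustration). One pattern captures the numeral and the prefix as separate named groups, so the value can be computed from the match without parsing the string a second time. Because nothing in the pattern matches whitespace, `12 u` is rejected.

```python
import re

# Illustrative subset of SI prefixes (an assumption; extend as needed).
SI_PREFIXES = {
    "f": 1e-15, "p": 1e-12, "n": 1e-9, "u": 1e-6, "m": 1e-3,
    "k": 1e3, "M": 1e6, "G": 1e9,
}

# Numeral and optional prefix are captured as named groups in one pass.
# Nothing in the pattern matches whitespace, so "12 u" cannot match.
SI_NUMBER_RE = re.compile(r"(?P<num>\d+(?:\.\d+)?)(?P<prefix>[fpnumkMG])?")


def parse_si_number(text):
    """Convert a string such as '12u' to a float, using the regexp groups directly."""
    m = SI_NUMBER_RE.fullmatch(text)
    if m is None:
        raise ValueError(f"not a number with an SI prefix: {text!r}")
    value = float(m.group("num"))
    prefix = m.group("prefix")
    return value * SI_PREFIXES[prefix] if prefix else value


print(parse_si_number("12u"))
print(parse_si_number("3.3k"))
```

`parse_si_number("12u")` yields approximately 12e-6, while `parse_si_number("12 u")` raises `ValueError`. How such a pattern would be wired into a Lark grammar is not shown here.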
There is a mechanism called "lexer states", which allows switching between different modes according to key tokens. So, for example, when you see a number, you'll switch to a state that doesn't ignore whitespace. But it's a little tricky to use, and anyway, Lark doesn't support it right now.
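The lexer-states idea can be sketched as a hand-written lexer in plain Python. This is hypothetical and not a Lark feature: in the `default` state whitespace is skipped, and right after a NUMBER the lexer enters an `after_number` state in which a prefix letter is only recognized if it touches the numeral. The token names and prefix letters are assumptions for illustration.

```python
import re

# Hypothetical two-state lexer; token names and prefix letters are assumptions.
DEFAULT_RULES = [
    ("WS", re.compile(r"[ \t]+")),
    ("NUMBER", re.compile(r"\d+(?:\.\d+)?")),
    ("NAME", re.compile(r"[A-Za-z_]\w*")),
    ("OP", re.compile(r"[+\-*/]")),
]
# A prefix is a single letter that is not the start of a longer name.
PREFIX_RE = re.compile(r"[fpnumkMG](?!\w)")


def lex(text):
    tokens = []
    pos = 0
    state = "default"
    while pos < len(text):
        if state == "after_number":
            # Whitespace is significant here: only a prefix touching the
            # numeral is accepted. Either way, fall back to the default state.
            state = "default"
            m = PREFIX_RE.match(text, pos)
            if m:
                tokens.append(("PREFIX", m.group()))
                pos = m.end()
                continue
        for kind, pattern in DEFAULT_RULES:
            m = pattern.match(text, pos)
            if m:
                break
        else:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        pos = m.end()
        if kind == "WS":
            continue  # whitespace is ignored in the default state
        tokens.append((kind, m.group()))
        if kind == "NUMBER":
            state = "after_number"
    return tokens


print(lex("12u + x"))  # [('NUMBER', '12'), ('PREFIX', 'u'), ('OP', '+'), ('NAME', 'x')]
print(lex("12 u"))     # [('NUMBER', '12'), ('NAME', 'u')]
```

The grammar itself never mentions whitespace; the decision about whether a space is allowed lives entirely in the lexer's state.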

What exactly does _WS.2: WS_INLINE do?

It helps the lexer.
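The `.2` in `_WS.2` is Lark's syntax for a terminal priority, which the lexer takes into account when more than one terminal could match. The toy lexer below is not Lark's implementation (Lark applies further tie-break rules). It only sketches the general idea of resolving a collision by priority first and match length second, using a hypothetical keyword terminal `IF` that collides with `NAME`.

```python
import re
from typing import NamedTuple


class Terminal(NamedTuple):
    name: str
    pattern: re.Pattern
    priority: int = 0  # higher wins in this toy model


def next_token(terminals, text, pos):
    """Pick the matching terminal with the highest (priority, match length).

    Remaining ties go to the terminal listed first, because max() keeps the first maximum.
    """
    candidates = []
    for term in terminals:
        m = term.pattern.match(text, pos)
        if m:
            candidates.append((term.priority, len(m.group()), term.name, m.group()))
    if not candidates:
        raise SyntaxError(f"no terminal matches at offset {pos}")
    best = max(candidates, key=lambda c: (c[0], c[1]))
    return best[2], best[3]


def tokenize(terminals, text):
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1  # whitespace is simply skipped in this sketch
            continue
        name, value = next_token(terminals, text, pos)
        tokens.append((name, value))
        pos += len(value)
    return tokens


# NAME is listed first, so without priorities it wins the tie on "if".
FLAT = [Terminal("NAME", re.compile(r"[a-z]+")), Terminal("IF", re.compile(r"if"))]
PRIORITIZED = [Terminal("NAME", re.compile(r"[a-z]+")), Terminal("IF", re.compile(r"if"), 2)]

print(tokenize(FLAT, "if x"))         # [('NAME', 'if'), ('NAME', 'x')]
print(tokenize(PRIORITIZED, "if x"))  # [('IF', 'if'), ('NAME', 'x')]
```

The parser is unchanged in both runs; only the lexer's choice between two equally long matches differs.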

psboyce commented on September 28, 2024

I need to parse a language called SVF, and LALR reports unexpected tokens even where the expected token would also work. I'm also explicitly ignoring whitespace tokens. (I'm using LALR because SVF has C-style comments, and issue #24 precludes the use of the Earley parser.) In SVF, the last character of a statement must be a semicolon, and the semicolon must not be preceded by a whitespace character.

Your explanation about why using explicit whitespace tokens causes trouble with LALR makes sense to me, but I'm not clear about how to work around this issue with LALR.

Edit: Disregard this. I think the problem is in the lexer, not in LALR: some of my terminals are ambiguous.
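For the SVF rule in the comment above (every statement ends with a semicolon that must not be preceded by whitespace), a standalone check could look like the following minimal sketch in plain Python. It is a simplification under stated assumptions: it ignores comments and any semicolons inside them, and it is not how Lark or any SVF tool necessarily handles the rule. The sample input is illustrative only.

```python
def split_statements(source):
    """Split source on ';', rejecting a ';' that is preceded by whitespace.

    Simplifying assumption: comments (and any ';' inside them) are not handled.
    """
    statements = []
    start = 0
    for i, ch in enumerate(source):
        if ch != ";":
            continue
        if i == 0 or source[i - 1].isspace():
            raise SyntaxError(f"';' at offset {i} must directly follow a non-whitespace character")
        statements.append(source[start:i + 1].strip())
        start = i + 1
    if source[start:].strip():
        raise SyntaxError("the last statement is not terminated by ';'")
    return statements


print(split_statements("SIR 8 TDI (AA);\nRUNTEST 100 TCK;"))
# ['SIR 8 TDI (AA);', 'RUNTEST 100 TCK;']
```

Checking the rule outside the grammar keeps whitespace out of the token stream, so the LALR parser's single lookahead token is never spent on a space.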

erezsh commented on September 28, 2024

Please see my comment on issue #24

erezsh commented on September 28, 2024

I'm closing this issue, since I fixed issue #24. If the problem persists feel free to re-open this issue, or open a new one.
