I have explicitly added white space tokens, rather than ignoring white space in general.
Why? For what purpose?
LALR(1), as its name suggests, has a lookahead of only one token. If you add whitespace to the grammar, that one token of lookahead is often just a whitespace token, which tells the parser nothing, so you are effectively turning it into LR(0), which is significantly less powerful.
I should confess that there is a bug here, which I will fix. In the meantime, you can easily bypass it with:
_WS.2: WS_INLINE
But as you'll see, it doesn't solve the bigger issue of LALR's limited lookahead (which Earley, of course, doesn't have).
from lark.
Hi, thanks for the quick answer.
I suspected that the grammar was no longer parsable by LALR. I just wanted to point out that Lark should have warned me about that. Your explanation makes sense to me.
Why? For what purpose?
I was just playing around. Say I have a language where whitespace is generally ignored, except in a few constructs, e.g. in numbers with SI prefixes, like 12u, standing for 12e-6. Here whitespace between the numeral and the prefix is specifically disallowed. Is there an easy way to do that other than polluting the grammar with optional whitespace, or capturing these numbers with a regexp and then parsing them separately?
What exactly does _WS.2: WS_INLINE do? AFAICT, it doesn't help LALR, and it was working fine with Earley already.
You're right, these are the two options. You can use regexp groups to avoid parsing the same string twice, but that's a very minor optimization in a language like Python.
There is a mechanism called "lexer states", which allows switching between different modes according to key tokens. So, for example, when you see a number, you'll switch to a state that doesn't ignore whitespace. But it's a little tricky to use, and anyway, Lark doesn't support it right now.
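The regexp option can be sketched in plain Python: a single pattern matches the whole literal, and its named groups separate the numeral from the prefix, so no further splitting of the string is needed. The prefix table and number format below are hypothetical (an illustrative subset), and this is not Lark API; in a Lark grammar the same pattern could be declared as a terminal and the conversion applied to the resulting token.

```python
import re

# Hypothetical SI prefix table (illustrative subset, not exhaustive).
SI_PREFIXES = {
    "p": 1e-12, "n": 1e-9, "u": 1e-6, "m": 1e-3,
    "k": 1e3, "M": 1e6, "G": 1e9,
}

# One regex for the whole literal: the optional prefix letter must
# immediately follow the digits, so "12 u" can never match as one number.
SI_NUMBER = re.compile(r"(?P<value>\d+(?:\.\d+)?)(?P<prefix>[pnumkMG])?")


def parse_si_number(text: str) -> float:
    """Convert a literal like '12u' to a float (about 12e-6) via the named groups."""
    m = SI_NUMBER.fullmatch(text)
    if m is None:
        raise ValueError(f"not an SI number: {text!r}")
    value = float(m.group("value"))
    prefix = m.group("prefix")
    return value * SI_PREFIXES[prefix] if prefix else value


print(parse_si_number("12u"))
print(parse_si_number("1.5k"))
```

Because the whitespace rule lives inside the terminal's pattern, the rest of the grammar can keep ignoring whitespace everywhere else.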
What exactly does _WS.2: WS_INLINE do?
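It helps the lexer: the .2 raises the priority of the _WS terminal, so the lexer prefers it when another terminal could match the same text.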
I have a language called SVF that I need to parse, where LALR reports unexpected tokens even though the expected token would also work, and I'm also explicitly ignoring whitespace tokens. (I'm using LALR because SVF has C-style comments, and issue #24 precludes the use of the Earley parser.) In SVF, the last character of a statement must be a semicolon, and the semicolon must not be preceded by a whitespace character.
Your explanation about why using explicit whitespace tokens causes trouble with LALR makes sense to me, but I'm not clear about how to work around this issue with LALR.
Edit: Disregard that. I think this is a problem with the lexer, not LALR: I have ambiguous terminals.
Please see my comment on issue #24.
I'm closing this issue, since I fixed issue #24. If the problem persists feel free to re-open this issue, or open a new one.