I know its a long shot and not really something earley was meant for, but are there an

Hmm, <a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url

Limiting Parsings for Ambiguous Grammars (Bocages) about nearley HOT 23 CLOSED

YafahEdelman commented on August 23, 2024

Limiting Parsings for Ambiguous Grammars (Bocages)

from nearley.

Comments (23)

kach commented on August 23, 2024 1

That's not a well-written grammar, though. You should write num -> num "+" [0-9]:+ | [0-9]:+. If there's an ambiguous grammar, the assumption is that you want all possible parsings (I can think of use-cases where this is a legitimate assumption).

from nearley.

YafahEdelman commented on August 23, 2024

But it would be nice to have there be an option rather than have nearley just make this assumption. Do you have any ideas on how to use this?

from nearley.

kach commented on August 23, 2024

Well, I think it would be better to write a linting service that warns you if your grammar is ambiguous. In fact, that would be really nice! Also, algorithmically nontrivial, because I can't quickly think of a way to statically determine if a grammar is ambiguous.

from nearley.

YafahEdelman commented on August 23, 2024

After a bit of searching around because making such an algorithm did seem quiet interesting I discovered that it was extremely non-trivial, aka, not computable. Although maybe it will be possible to make a heuristic that works in a lot of basic cases, I'll have to research some more.
Edit: Maybe its only not computable for general grammars? I'm not sure, I'll keep researching.

from nearley.

tjvr commented on August 23, 2024

Well, I think it would be better to write a linting service that warns you if your grammar is ambiguous.

You know that's undecidable in general, right? As JacobEdelman points out above.

from nearley.

YafahEdelman commented on August 23, 2024

Although its undecidable in general it is possible to add a mode that allows precedence and will only find one possible parsing. I'm still not sure exactly what kind of speed increase this would give you but I've implemented a (toy) parser that can do it.

from nearley.

tjvr commented on August 23, 2024

Because Earley builds a parse table with all possible states, how can you avoid computing all possible parsings?

I'm not sure what you mean by adding precedence, but that sounds like a bad idea. @Hardmath123 is a fan of using BNF to express operator precedence :-)

from nearley.

YafahEdelman commented on August 23, 2024

A general outline is that you can selectively compute parts of the parse table in an attempt to compute the final parsing. If your parsing was not successful you compute more of the parse table. I'll look at doing that but I do agree that if it does end up causing a dramatic speed increase it won't be worth it.

from nearley.

patrickhuber commented on August 23, 2024

You could build the shared packed parse forest (SPPF) as you go. Elizabeth Scott noted this in her 2008 paper "SPPF-Style Parsing From Earley Recognisers" .

Marpa's Jeffery Kegler uses the term Bocage to denote this structure as you can see here .

from nearley.

kach commented on August 23, 2024

Hmm, @patrickhuber, that looks pretty interesting. From what I can tell, this should allow us to create some sort of lazy iterable data structure where each iteration results in one valid parsing. The obvious advantage is that for very ambiguous grammars with exponential parses, it would be more memory-efficient. On the other hand, building an SPPF takes time, and so there's a speed tradeoff. This sounds like a fun experiment.

(Again, my first choice is that people just learn to write unambiguous grammars—in 99.9% of cases, it's very obvious how to write your grammar unambiguously and you don't actually want to enumerate all possible parsings anyway.)

from nearley.

patrickhuber commented on August 23, 2024

Near the bottom of the paper she creates a inline representation that runs at the same time complexity as the Earley algorithm. Jeffery Kegler stated in the marpa forums that building the parse tree using binary, reusable nodes follows the same complexity as recognition. On trivial samples, you may notice a very small bit of overhead, but as the size of the parse grows, the overhead will be inconsequential.

I agree that the easy option is to rewrite the grammar, but part of the appeal of the Earley algorithm is its ability to handle ambiguity with ease.

from nearley.

kach commented on August 23, 2024

Whoo, okay, I'm sold. Time to dig into some papers and find someone smart enough to teach me how to implement it (@JacobEdelman ? @blob8108 ? @rwindelz ?).

from nearley.

YafahEdelman commented on August 23, 2024

I'll look it over later today/tomorrow.
On Apr 14, 2015 6:48 PM, "Hardmath123" [email protected] wrote:

Whoo, okay, I'm sold. Time to dig into some papers and find someone smart
enough to teach me how to implement it (@JacobEdelman
https://github.com/JacobEdelman ? @blob8108
https://github.com/blob8108 ? @rwindelz https://github.com/rwindelz
?).

—
Reply to this email directly or view it on GitHub
#53 (comment).

from nearley.

patrickhuber commented on August 23, 2024

If it helps, I have a implementation here in C#. https://github.com/patrickhuber/Pliant/tree/shared-packed-parse-forest/libraries/Pliant. It doesn't currently work for Leo Items. Main body of code is in the PulseRecognizer.cs.

from nearley.

kach commented on August 23, 2024

Ah, nice to know. This will be a handy reference (the collection of links in your README is also very useful!).

(We haven't implemented Leo optimization yet, so that's not an issue. I should really go read about that soon.)

from nearley.

erezsh commented on August 23, 2024

Personally, I would be very happy if there was some boolean parameter I could give the parser (say, "disable_ambiguity"), that would throw an exception if detecting an ambiguous input.

Obviously we shouldn't write ambiguous grammars if we don't want them, but we also shouldn't write bugs in general, and that still happens some time. And some grammars can be pretty big. It'd be nice to have a graceful exit, instead of a hang / memory error.

from nearley.

kach commented on August 23, 2024

See, the problem with this is that since nearley is streaming, you never know if the input is "ambiguous" because the ambiguity might be resolved later. Let me give you an example:

main -> cow_or_colorado | cone_or_colorado
cow_or_colorado -> "CO" "W":?
cone_or_colorado -> "CO" "NE":?

Suppose I gave this parser "C" and "O". Now, it thinks you could either have a cow_or_colorado or a cone_or_colorado—but it can't throw an error yet! This is because if you feed the parser "W" or "NE", the ambiguity will get resolved in favor of the former or latter, respectively. Nearley never knows when you're done giving it input.

This means, by the way, that you could always detect ambiguity on your end by looking at the results array and ensuring that its length is exactly 1.

from nearley.

erezsh commented on August 23, 2024

Well, the reason I ask is that I wrote a python version of nearley (just the parser) for my parsing library, and I want to be able to protect my users from writing ambiguous grammars that practically never end. Right now it's very hard to discover exactly which rule is ambiguous.

You can see it here btw: https://github.com/erezsh/plyplus/blob/master/plyplus/pearley.py
It works very well, but unfortunately it's still very slow (10x time compared to PLY).

from nearley.

tjvr commented on August 23, 2024

@erezsh On the subject of optimisation: I wrote a python version a while ago, it might have some useful optimisations

from nearley.

kach commented on August 23, 2024

@erezsh pearley looks really cool! Very exciting.

I notice most of the code seems to be a direct port of nearley's—in this case, I think I can answer your comment on line 67 (https://github.com/erezsh/plyplus/blob/master/plyplus/pearley.py#L67). Let me know if you'd like that.

Again, as I've said, you can't detect ambiguity automatically. Not possible. The solution is to inspect the output with small test cases—then it becomes really easy to see, intuitively, what is getting parsed in multiple ways. Perhaps you can figure out a clever way to automate this process—if you do, let me know and I would be happy to include it in the nearley package. :-)

from nearley.

wizzard0 commented on August 23, 2024

My 5 cents: being able to enumerate exponential number of parses is very important and is what makes nearley, um, the parser of choice (for NLP and for generated grammars at least).
Of course it makes it slower at times... but hey, universality is almost always at odds with performance.

btw the grammar num -> num "+" num | [0-9]:+ does not in fact generate ambiguous parses in playground at least, how can get my 2^n parses there? :)

from nearley.

kach commented on August 23, 2024

It's certainly supposed to show all parses… I think the author is using an outdated version of nearley, a version which had a bug that suppressed ambiguous output in some cases (including your case). Not sure, though, because the client JS is packed.

from nearley.

kach commented on August 23, 2024

Closing because #163 can continue this discussion.

from nearley.

Limiting Parsings for Ambiguous Grammars (Bocages) about nearley HOT 23 CLOSED

Comments (23)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent

Jobs