GithubHelp home page GithubHelp logo

Comments (23)

kach avatar kach commented on August 23, 2024 1

That's not a well-written grammar, though. You should write num -> num "+" [0-9]:+ | [0-9]:+. If there's an ambiguous grammar, the assumption is that you want all possible parsings (I can think of use-cases where this is a legitimate assumption).

from nearley.

YafahEdelman avatar YafahEdelman commented on August 23, 2024

But it would be nice to have there be an option rather than have nearley just make this assumption. Do you have any ideas on how to use this?

from nearley.

kach avatar kach commented on August 23, 2024

Well, I think it would be better to write a linting service that warns you if your grammar is ambiguous. In fact, that would be really nice! Also, algorithmically nontrivial, because I can't quickly think of a way to statically determine if a grammar is ambiguous.

from nearley.

YafahEdelman avatar YafahEdelman commented on August 23, 2024

After a bit of searching around because making such an algorithm did seem quiet interesting I discovered that it was extremely non-trivial, aka, not computable. Although maybe it will be possible to make a heuristic that works in a lot of basic cases, I'll have to research some more.
Edit: Maybe its only not computable for general grammars? I'm not sure, I'll keep researching.

from nearley.

tjvr avatar tjvr commented on August 23, 2024

Well, I think it would be better to write a linting service that warns you if your grammar is ambiguous.

You know that's undecidable in general, right? As JacobEdelman points out above.

from nearley.

YafahEdelman avatar YafahEdelman commented on August 23, 2024

Although its undecidable in general it is possible to add a mode that allows precedence and will only find one possible parsing. I'm still not sure exactly what kind of speed increase this would give you but I've implemented a (toy) parser that can do it.

from nearley.

tjvr avatar tjvr commented on August 23, 2024

Because Earley builds a parse table with all possible states, how can you avoid computing all possible parsings?

I'm not sure what you mean by adding precedence, but that sounds like a bad idea. @Hardmath123 is a fan of using BNF to express operator precedence :-)

from nearley.

YafahEdelman avatar YafahEdelman commented on August 23, 2024

A general outline is that you can selectively compute parts of the parse table in an attempt to compute the final parsing. If your parsing was not successful you compute more of the parse table. I'll look at doing that but I do agree that if it does end up causing a dramatic speed increase it won't be worth it.

from nearley.

patrickhuber avatar patrickhuber commented on August 23, 2024

You could build the shared packed parse forest (SPPF) as you go. Elizabeth Scott noted this in her 2008 paper "SPPF-Style Parsing From Earley Recognisers" .

Marpa's Jeffery Kegler uses the term Bocage to denote this structure as you can see here .

from nearley.

kach avatar kach commented on August 23, 2024

Hmm, @patrickhuber, that looks pretty interesting. From what I can tell, this should allow us to create some sort of lazy iterable data structure where each iteration results in one valid parsing. The obvious advantage is that for very ambiguous grammars with exponential parses, it would be more memory-efficient. On the other hand, building an SPPF takes time, and so there's a speed tradeoff. This sounds like a fun experiment.

(Again, my first choice is that people just learn to write unambiguous grammars—in 99.9% of cases, it's very obvious how to write your grammar unambiguously and you don't actually want to enumerate all possible parsings anyway.)

from nearley.

patrickhuber avatar patrickhuber commented on August 23, 2024

Near the bottom of the paper she creates a inline representation that runs at the same time complexity as the Earley algorithm. Jeffery Kegler stated in the marpa forums that building the parse tree using binary, reusable nodes follows the same complexity as recognition. On trivial samples, you may notice a very small bit of overhead, but as the size of the parse grows, the overhead will be inconsequential.

I agree that the easy option is to rewrite the grammar, but part of the appeal of the Earley algorithm is its ability to handle ambiguity with ease.

from nearley.

kach avatar kach commented on August 23, 2024

Whoo, okay, I'm sold. Time to dig into some papers and find someone smart enough to teach me how to implement it (@JacobEdelman ? @blob8108 ? @rwindelz ?).

from nearley.

YafahEdelman avatar YafahEdelman commented on August 23, 2024

I'll look it over later today/tomorrow.
On Apr 14, 2015 6:48 PM, "Hardmath123" [email protected] wrote:

Whoo, okay, I'm sold. Time to dig into some papers and find someone smart
enough to teach me how to implement it (@JacobEdelman
https://github.com/JacobEdelman ? @blob8108
https://github.com/blob8108 ? @rwindelz https://github.com/rwindelz
?).


Reply to this email directly or view it on GitHub
#53 (comment).

from nearley.

patrickhuber avatar patrickhuber commented on August 23, 2024

If it helps, I have a implementation here in C#. https://github.com/patrickhuber/Pliant/tree/shared-packed-parse-forest/libraries/Pliant. It doesn't currently work for Leo Items. Main body of code is in the PulseRecognizer.cs.

from nearley.

kach avatar kach commented on August 23, 2024

Ah, nice to know. This will be a handy reference (the collection of links in your README is also very useful!).

(We haven't implemented Leo optimization yet, so that's not an issue. I should really go read about that soon.)

from nearley.

erezsh avatar erezsh commented on August 23, 2024

Personally, I would be very happy if there was some boolean parameter I could give the parser (say, "disable_ambiguity"), that would throw an exception if detecting an ambiguous input.

Obviously we shouldn't write ambiguous grammars if we don't want them, but we also shouldn't write bugs in general, and that still happens some time. And some grammars can be pretty big. It'd be nice to have a graceful exit, instead of a hang / memory error.

from nearley.

kach avatar kach commented on August 23, 2024

See, the problem with this is that since nearley is streaming, you never know if the input is "ambiguous" because the ambiguity might be resolved later. Let me give you an example:

main -> cow_or_colorado | cone_or_colorado
cow_or_colorado -> "CO" "W":?
cone_or_colorado -> "CO" "NE":?

Suppose I gave this parser "C" and "O". Now, it thinks you could either have a cow_or_colorado or a cone_or_colorado—but it can't throw an error yet! This is because if you feed the parser "W" or "NE", the ambiguity will get resolved in favor of the former or latter, respectively. Nearley never knows when you're done giving it input.

This means, by the way, that you could always detect ambiguity on your end by looking at the results array and ensuring that its length is exactly 1.

from nearley.

erezsh avatar erezsh commented on August 23, 2024

Well, the reason I ask is that I wrote a python version of nearley (just the parser) for my parsing library, and I want to be able to protect my users from writing ambiguous grammars that practically never end. Right now it's very hard to discover exactly which rule is ambiguous.

You can see it here btw: https://github.com/erezsh/plyplus/blob/master/plyplus/pearley.py
It works very well, but unfortunately it's still very slow (10x time compared to PLY).

from nearley.

tjvr avatar tjvr commented on August 23, 2024

@erezsh On the subject of optimisation: I wrote a python version a while ago, it might have some useful optimisations

from nearley.

kach avatar kach commented on August 23, 2024

@erezsh pearley looks really cool! Very exciting.

I notice most of the code seems to be a direct port of nearley's—in this case, I think I can answer your comment on line 67 (https://github.com/erezsh/plyplus/blob/master/plyplus/pearley.py#L67). Let me know if you'd like that.

Again, as I've said, you can't detect ambiguity automatically. Not possible. The solution is to inspect the output with small test cases—then it becomes really easy to see, intuitively, what is getting parsed in multiple ways. Perhaps you can figure out a clever way to automate this process—if you do, let me know and I would be happy to include it in the nearley package. :-)

from nearley.

wizzard0 avatar wizzard0 commented on August 23, 2024

My 5 cents: being able to enumerate exponential number of parses is very important and is what makes nearley, um, the parser of choice (for NLP and for generated grammars at least).
Of course it makes it slower at times... but hey, universality is almost always at odds with performance.

btw the grammar num -> num "+" num | [0-9]:+ does not in fact generate ambiguous parses in playground at least, how can get my 2^n parses there? :)

from nearley.

kach avatar kach commented on August 23, 2024

It's certainly supposed to show all parses… I think the author is using an outdated version of nearley, a version which had a bug that suppressed ambiguous output in some cases (including your case). Not sure, though, because the client JS is packed.

from nearley.

kach avatar kach commented on August 23, 2024

Closing because #163 can continue this discussion.

from nearley.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.