GithubHelp home page GithubHelp logo

Comments (11)

aslatter avatar aslatter commented on July 18, 2024

I don't have a lot of experience with the TokenParser module, but I believe the intent was that the different lexeme parsers would work together in concert and jointly handle whitespace among them - the original guide gives some useful examples and background:

https://web.archive.org/web/20140529211116/http://legacy.cs.uu.nl/daan/download/parsec/parsec.html#Lexical%20analysis

from parsec.

albertnetymk avatar albertnetymk commented on July 18, 2024

Am I posting this in the wrong place? I thought this issue tracker is for parsec.

Thanks for the link; I am not aware of it. The lexeme part is identical, AFAIC.

According to both of them, lexeme parsers could take care of all trailing spaces, but not leading spaces.

The only point where the whiteSpace parser should be called explicitly is the start of the main parser in order to skip any leading white space.

Therefore, in the original example I provided, natural, integer, and integer', which are the main parsers, for they are used for parse input directly, should fail for input with leading spaces, but the actual behavior is not, as shown in above snippet.

Hope I made myself clear now.

from parsec.

mrkkrp avatar mrkkrp commented on July 18, 2024

@albertnetymk, I would consider it a bug. All lexemes are expected to consume trailing, but not leading whitespace. In this case, in some circumstances integer won't fail when it should, thus some alternative branch of parsing logic won't be used, for example. While this is not a severe bug, it's still a bug.

This happens because sign is parsed in int parser (in Text.Parsec.Token) as a lexeme and sign can be missing, in this case we consume leading whitespace:

    integer         = lexeme int      <?> "integer"

    int             = do{ f <- lexeme sign
                        ; n <- nat
                        ; return (f n)
                        }

    sign            =   (char '-' >> return negate)
                    <|> (char '+' >> return id)
                    <|> return id

This is easy to fix, though. I can fix it when I will be fixing #35, what do you think, @aslatter? Is it desirable? If you want to keep present behavior, you should document it in description of integer parser.

from parsec.

albertnetymk avatar albertnetymk commented on July 18, 2024

@mrkkrp It's good that we reach consensus on this. However, I think it's best that the fix for this goes into a separate PR, for this issue is different from #35, I think.

from parsec.

mrkkrp avatar mrkkrp commented on July 18, 2024

@albertnetymk, yes, of course. I meant that I would do it at the same time, but not in the same PR.

from parsec.

aslatter avatar aslatter commented on July 18, 2024

To make sure I'm understanding this - the bug is that int consumes leading whitespace because of the lexeme combinator wrapping the sign parser, which is internally structure to optionally parse a sign. The the sign combinator consumes nothing we still consume the trailing spaces, which leads to wrong errors (or wrong behavior) when int is used as part of some other branching structure.

It seems to me that the whole point of the structure of the TokeParser module is to defer white-space consuming as late as possible, so fixing this would be good.

from parsec.

mrkkrp avatar mrkkrp commented on July 18, 2024

@aslatter, great. Should we add new test for this bug or it's not necessary? What do you think?

from parsec.

albertnetymk avatar albertnetymk commented on July 18, 2024

@aslatter I agree with your understanding above. @mrkkrp I think it makes sense to add one new test. My original test snippet might be of any use.

from parsec.

mrkkrp avatar mrkkrp commented on July 18, 2024

I think the best way to fix this is by rewriting of sign parser so it consumes only + or -, but fails when they are not present. This sounds very intuitive to me. Next, we can just rewrite parsing of sign in int as f <- option id (lexeme sign). Of course we should make sure that new version of sign is used properly in exponent' too (to allow valid inputs like 0.5e3 as well as 0.5e+3, etc).

from parsec.

albertnetymk avatar albertnetymk commented on July 18, 2024

Exactly. That's my thought as well. See the top snippet in this ticket.

from parsec.

mrkkrp avatar mrkkrp commented on July 18, 2024

@albertnetymk, oh indeed. I just paid more attention to tests that demonstrate the flaw, I should have noticed your solution. Anyway I'm done with it, but I'm waiting decision on my current PR, so I won't push it until this PR is merged. I could create a separate branch, though, if necessary.

I've added test that checks against these inputs:

shouldFail :: [String]
shouldFail = [" 1", " +1", " -1"]

shouldSucceed :: [String]
shouldSucceed = ["1", "+1", "-1", "+ 1 ", "- 1 ", "1 "]

this should be ample.

from parsec.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.