integer doesn't fail for leading white spaces

haskell commented on August 17, 2024

integer doesn't fail for leading white spaces

from parsec.

Comments (11)

aslatter commented on August 17, 2024

I don't have a lot of experience with the TokenParser module, but I believe the intent was that the different lexeme parsers would work in concert and handle whitespace jointly. The original guide gives some useful examples and background:

https://web.archive.org/web/20140529211116/http://legacy.cs.uu.nl/daan/download/parsec/parsec.html#Lexical%20analysis

albertnetymk commented on August 17, 2024

Am I posting this in the wrong place? I thought this issue tracker is for parsec.

Thanks for the link; I was not aware of it. The lexeme part is identical, AFAICT.

According to both of them, lexeme parsers could take care of all trailing spaces, but not leading spaces.

The only point where the whiteSpace parser should be called explicitly is the start of the main parser in order to skip any leading white space.

Therefore, in the original example I provided, natural, integer, and integer', which are the main parsers since they are used to parse the input directly, should fail on input with leading spaces. In practice they do not, as shown in the snippet above.

Hope I made myself clear now.

mrkkrp commented on August 17, 2024

@albertnetymk, I would consider it a bug. All lexemes are expected to consume trailing, but not leading, whitespace. Here, integer sometimes doesn't fail when it should, so, for example, some alternative branch of the parsing logic won't be used. While this is not a severe bug, it's still a bug.

This happens because the int parser (in Text.Parsec.Token) parses sign as a lexeme, and sign can be missing. In that case sign succeeds without consuming anything, and lexeme then skips the whitespace that follows, which is the leading whitespace of the number:

    integer         = lexeme int      <?> "integer"

    int             = do{ f <- lexeme sign
                        ; n <- nat
                        ; return (f n)
                        }

    sign            =   (char '-' >> return negate)
                    <|> (char '+' >> return id)
                    <|> return id

This is easy to fix, though. I can fix it when I fix #35. What do you think, @aslatter? Is it desirable? If you want to keep the present behavior, you should document it in the description of the integer parser.
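The effect can be reproduced without the full TokenParser machinery. The sketch below is a simplified model, not the library source: it assumes `lexeme` is just `p <* spaces` and `nat` is plain decimal digits, while `sign`, `int`, and `integer` follow the shape of the definitions quoted above. The helper `accepts` is a hypothetical name introduced for illustration.

```haskell
import Data.Either (isRight)
import Text.Parsec (char, digit, eof, many1, parse, spaces, (<?>), (<|>))
import Text.Parsec.String (Parser)

-- Assumption: simplified stand-in for the library's lexeme (run p, then skip whitespace).
lexeme :: Parser a -> Parser a
lexeme p = p <* spaces

-- Assumption: simplified stand-in for nat (decimal digits only).
nat :: Parser Integer
nat = read <$> many1 digit

-- Same shape as the quoted sign: succeeds without consuming input when no sign is present.
sign :: Parser (Integer -> Integer)
sign =   (char '-' >> return negate)
     <|> (char '+' >> return id)
     <|> return id

-- Same shape as the quoted int: lexeme wraps a parser that may match nothing.
int :: Parser Integer
int = do { f <- lexeme sign; n <- nat; return (f n) }

integer :: Parser Integer
integer = lexeme int <?> "integer"

-- Hypothetical helper: does the whole input parse as an integer?
accepts :: String -> Bool
accepts = isRight . parse (integer <* eof) ""

main :: IO ()
main = mapM_ report [" 1", " -1", "1 "]
  where report s = putStrLn (show s ++ " accepted: " ++ show (accepts s))
```

In this model `" 1"` is accepted even though it starts with a space, because `sign` succeeds on no input and `lexeme` then skips the space. `" -1"` is still rejected, but only because the sign is looked for before the whitespace is skipped.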

albertnetymk commented on August 17, 2024

@mrkkrp It's good that we have reached consensus on this. However, I think the fix should go into a separate PR, since this issue is different from #35.

mrkkrp commented on August 17, 2024

@albertnetymk, yes, of course. I meant that I would do it at the same time, but not in the same PR.

aslatter commented on August 17, 2024

To make sure I'm understanding this: the bug is that int consumes leading whitespace because of the lexeme combinator wrapping the sign parser, which is internally structured to optionally parse a sign. When the sign combinator consumes nothing, we still consume the spaces that follow it, which leads to wrong errors (or wrong behavior) when int is used as part of some other branching structure.

It seems to me that the whole point of the structure of the TokenParser module is to defer whitespace consumption as late as possible, so fixing this would be good.
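A small sketch of the branching problem, under the same simplifying assumptions as before (`lexeme` modelled as `p <* spaces`, `nat` as decimal digits). The names `intLeaky`, `intPlain`, and `numberOrWord` are hypothetical and exist only for this comparison; `intPlain` is not a proposed fix, just a variant that never skips whitespace before the digits.

```haskell
import Data.Either (isLeft, isRight)
import Text.Parsec (char, digit, eof, letter, many1, parse, spaces, (<|>))
import Text.Parsec.String (Parser)

-- Assumption: simplified stand-ins for the library's lexeme and nat.
lexeme :: Parser a -> Parser a
lexeme p = p <* spaces

nat :: Parser Integer
nat = read <$> many1 digit

-- Sign that is allowed to match nothing, as in the quoted definition.
sign :: Parser (Integer -> Integer)
sign = (char '-' >> return negate) <|> (char '+' >> return id) <|> return id

-- int as quoted: whitespace may be consumed before any digit is seen.
intLeaky :: Parser Integer
intLeaky = do { f <- lexeme sign; n <- nat; return (f n) }

-- Comparison variant: no lexeme around sign, so nothing is consumed on failure.
intPlain :: Parser Integer
intPlain = do { f <- sign; n <- nat; return (f n) }

-- A number, or else a word that may be preceded by whitespace.
-- There is deliberately no `try` around the first branch.
numberOrWord :: Parser Integer -> Parser String
numberOrWord p = ((show <$> p) <|> (spaces *> many1 letter)) <* eof

main :: IO ()
main = do
  let input = "  abc"
  putStrLn ("intLeaky reaches the word branch: "
            ++ show (isRight (parse (numberOrWord intLeaky) "" input)))
  putStrLn ("intPlain reaches the word branch: "
            ++ show (isRight (parse (numberOrWord intPlain) "" input)))
```

With `intLeaky`, the first branch fails after having consumed the two spaces, so `<|>` never tries the word branch and the whole parse fails. With `intPlain`, the first branch fails without consuming input and the word branch succeeds.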

mrkkrp commented on August 17, 2024

@aslatter, great. Should we add a new test for this bug, or is that not necessary? What do you think?

albertnetymk commented on August 17, 2024

@aslatter I agree with your understanding above. @mrkkrp I think it makes sense to add one new test. My original test snippet might be of some use.

mrkkrp commented on August 17, 2024

I think the best way to fix this is to rewrite the sign parser so that it consumes only + or - and fails when neither is present. This sounds very intuitive to me. Next, we can rewrite the parsing of sign in int as f <- option id (lexeme sign). Of course, we should make sure that the new version of sign is used properly in exponent' too (to allow valid inputs like 0.5e3 as well as 0.5e+3, etc).
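A minimal sketch of that rewrite, again assuming `lexeme` is `p <* spaces` and `nat` is decimal digits rather than the library's own definitions. `exponentPart` is a hypothetical, much-simplified stand-in for exponent' that parses only the `e`, the optional sign, and the digits; `accepts` is likewise a helper introduced here.

```haskell
import Data.Either (isRight)
import Text.Parsec (char, digit, eof, many1, oneOf, option, parse, spaces, (<|>))
import Text.Parsec.String (Parser)

-- Assumption: simplified stand-ins for the library's lexeme and nat.
lexeme :: Parser a -> Parser a
lexeme p = p <* spaces

nat :: Parser Integer
nat = read <$> many1 digit

-- Proposed sign: consumes '+' or '-' and fails (without consuming) otherwise.
sign :: Parser (Integer -> Integer)
sign = (char '-' >> return negate) <|> (char '+' >> return id)

-- Whitespace is skipped only after a sign that was actually present.
int :: Parser Integer
int = do { f <- option id (lexeme sign); n <- nat; return (f n) }

-- Hypothetical simplified exponent: the sign stays optional, no whitespace allowed.
exponentPart :: Parser Integer
exponentPart = do { _ <- oneOf "eE"; f <- option id sign; e <- nat; return (f e) }

-- Hypothetical helper: does the whole input parse with p?
accepts :: Parser a -> String -> Bool
accepts p = isRight . parse (p <* eof) ""

main :: IO ()
main = do
  print (map (accepts int) [" 1", "1", "+ 1"])
  print (map (accepts exponentPart) ["e3", "e+3", "e-3"])
```

Under these assumptions `int` rejects `" 1"` and accepts `"1"` and `"+ 1"`, while `exponentPart` accepts all three exponent forms, so inputs like 0.5e3 and 0.5e+3 would both remain valid.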

albertnetymk commented on August 17, 2024

Exactly. That's my thought as well. See the top snippet in this ticket.

mrkkrp commented on August 17, 2024

@albertnetymk, oh indeed. I paid more attention to the tests that demonstrate the flaw; I should have noticed your solution. Anyway, I'm done with it, but I'm waiting for a decision on my current PR, so I won't push it until that PR is merged. I could create a separate branch, though, if necessary.

I've added a test that checks against these inputs:

    shouldFail :: [String]
    shouldFail = [" 1", " +1", " -1"]

    shouldSucceed :: [String]
    shouldSucceed = ["1", "+1", "-1", "+ 1 ", "- 1 ", "1 "]

This should be ample.
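For anyone who wants to try these inputs locally, here is one way they could be exercised against whatever parsec version is installed. This is a sketch, not the test from the PR: the `accepts` helper, the use of `emptyDef`, and the trailing `eof` are assumptions made for illustration.

```haskell
import Data.Either (isRight)
import Text.Parsec (eof, parse)
import Text.Parsec.Language (emptyDef)
import Text.Parsec.String (Parser)
import qualified Text.Parsec.Token as Tok

-- integer from the installed parsec, built from the empty language definition.
integer :: Parser Integer
integer = Tok.integer (Tok.makeTokenParser emptyDef)

-- Hypothetical helper: does the whole input parse as an integer?
accepts :: String -> Bool
accepts = isRight . parse (integer <* eof) ""

shouldFail :: [String]
shouldFail = [" 1", " +1", " -1"]

shouldSucceed :: [String]
shouldSucceed = ["1", "+1", "-1", "+ 1 ", "- 1 ", "1 "]

main :: IO ()
main = do
  -- Inputs listed here behave differently from what the test lists expect.
  putStrLn ("unexpectedly accepted: " ++ show (filter accepts shouldFail))
  putStrLn ("unexpectedly rejected: " ++ show (filter (not . accepts) shouldSucceed))
```

Which entries of `shouldFail` show up as unexpectedly accepted depends on whether the installed version still wraps an optional sign in lexeme.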
