Comments (11)
I don't have a lot of experience with the `TokenParser` module, but I believe the intent was that the different lexeme parsers work in concert and jointly handle the whitespace between them. The original guide gives some useful examples and background:
from parsec.
Am I posting this in the wrong place? I thought this issue tracker is for parsec.
Thanks for the link; I was not aware of it. The lexeme part is identical, AFAICT.
According to both of them, lexeme parsers should consume all trailing white space, but not leading white space.
The only point where the whiteSpace parser should be called explicitly is the start of the main parser in order to skip any leading white space.
Therefore, in the original example I provided, `natural`, `integer`, and `integer'`, which are the main parsers because they are used to parse the input directly, should fail on input with leading spaces, but they don't, as shown in the snippet above.
Hope I made myself clear now.
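The convention described in this comment can be sketched with the real `Text.Parsec.Token` API: `whiteSpace` is called once at the start of the main parser, and each lexeme (here `natural`) consumes only the white space that follows it. The language definition (`emptyDef`) and the names `lexer` and `numbers` are assumptions chosen for illustration.

```haskell
import Text.Parsec
import Text.Parsec.Language (emptyDef)
import Text.Parsec.String (Parser)
import qualified Text.Parsec.Token as P

-- Hypothetical token parser built from the library's empty language definition.
lexer :: P.TokenParser ()
lexer = P.makeTokenParser emptyDef

-- Hypothetical main parser: whiteSpace is called explicitly once, at the start,
-- to skip leading white space; each lexeme (P.natural) then consumes only the
-- white space after it, and eof checks that no input is left over.
numbers :: Parser [Integer]
numbers = P.whiteSpace lexer *> many (P.natural lexer) <* eof
```

For example, `parse numbers "" "  1 2  3 "` returns `Right [1,2,3]`.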
@albertnetymk, I would consider it a bug. All lexemes are expected to consume trailing, but not leading, whitespace. Here, in some circumstances `integer` won't fail when it should, so, for example, an alternative branch of the parsing logic won't be tried. While this is not a severe bug, it's still a bug.
This happens because the sign is parsed in the `int` parser (in `Text.Parsec.Token`) as a lexeme, and the sign can be missing; in that case we consume leading whitespace:
```haskell
integer = lexeme int <?> "integer"

int = do{ f <- lexeme sign   -- if no sign is present, lexeme still skips the spaces that follow
        ; n <- nat
        ; return (f n)
        }

sign =   (char '-' >> return negate)
     <|> (char '+' >> return id)
     <|> return id
```
This is easy to fix, though. I can fix it when I fix #35; what do you think, @aslatter? Is it desirable? If you want to keep the present behavior, it should be documented in the description of the `integer` parser.
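To observe the effect without depending on a particular parsec version, the quoted definitions can be copied into a standalone module. In this sketch `lexeme` and `nat` are simplified, hypothetical stand-ins (the library's versions are more general); only the shape of `sign`, `int` and `integer` follows the excerpt.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical stand-in for the token parser's lexeme: run p, then skip trailing spaces.
lexeme :: Parser a -> Parser a
lexeme p = p <* spaces

-- Hypothetical stand-in for nat: decimal digits only.
nat :: Parser Integer
nat = read <$> many1 digit

-- Same structure as the quoted excerpt: sign always succeeds.
sign :: Parser (Integer -> Integer)
sign =   (char '-' >> return negate)
     <|> (char '+' >> return id)
     <|> return id

int :: Parser Integer
int = do
  f <- lexeme sign -- with no sign present, this still skips leading spaces
  n <- nat
  return (f n)

integer :: Parser Integer
integer = lexeme int <?> "integer"
```

With these definitions `parse integer "" " 1"` succeeds with `1`, although by the lexeme convention it should fail, while `" -1"` fails because the skipped space is followed by `-` rather than a digit.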
@mrkkrp It's good that we've reached consensus on this. However, I think it's best that the fix goes into a separate PR, since this issue is different from #35.
@albertnetymk, yes, of course. I meant that I would do it at the same time, but not in the same PR.
To make sure I'm understanding this: the bug is that `int` consumes leading whitespace because of the `lexeme` combinator wrapping the `sign` parser, which is internally structured to optionally parse a sign. When the `sign` combinator consumes nothing, we still consume the spaces that follow it, which leads to wrong errors (or wrong behavior) when `int` is used as part of some other branching structure.
It seems to me that the whole point of the structure of the `TokenParser` module is to defer consuming white space as late as possible, so fixing this would be good.
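The "wrong behavior" point can be illustrated with a hypothetical `field` parser that accepts either an integer or the keyword `none` preceded by optional spaces. Because `(<|>)` only tries its second branch when the first fails without consuming input, the current structure (`intOld`) commits to the integer branch as soon as it skips a space. `intNew` uses the `option id (lexeme sign)` shape proposed later in this thread. All names, and the simplified `lexeme` and `nat`, are assumptions for illustration.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical stand-ins: skip trailing spaces; parse decimal digits only.
lexeme :: Parser a -> Parser a
lexeme p = p <* spaces

nat :: Parser Integer
nat = read <$> many1 digit

-- Proposed sign: fails without consuming input when no '+' or '-' is present.
signNew :: Parser (Integer -> Integer)
signNew = (char '-' >> return negate) <|> (char '+' >> return id)

-- Current sign: falls back to id, so it always succeeds.
signOld :: Parser (Integer -> Integer)
signOld = signNew <|> return id

intOld, intNew :: Parser Integer
intOld = do { f <- lexeme signOld; n <- nat; return (f n) }
intNew = do { f <- option id (lexeme signNew); n <- nat; return (f n) }

-- Hypothetical branching parser: an integer, or "none" (read as 0) after optional spaces.
field :: Parser Integer -> Parser Integer
field intP = lexeme intP <|> (spaces *> string "none" *> return 0)
```

On the input `" none"`, `field intOld` fails because the integer branch has already consumed the space, whereas `field intNew` falls through to the second branch and returns `0`.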
@aslatter, great. Should we add a new test for this bug, or is it not necessary? What do you think?
@aslatter I agree with your understanding above. @mrkkrp I think it makes sense to add one new test. My original test snippet might be of some use.
I think the best way to fix this is to rewrite the `sign` parser so that it consumes only `+` or `-` and fails when neither is present. This seems very intuitive to me. Next, we can simply rewrite the parsing of the sign in `int` as `f <- option id (lexeme sign)`. Of course, we should make sure that the new version of `sign` is also used properly in `exponent'` (to allow valid inputs like `0.5e3` as well as `0.5e+3`, etc.).
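A minimal sketch of the proposed fix, under simplified assumptions: `lexeme` skips spaces with `spaces`, `decimal` reads only decimal digits, and `exponentPart` is a hypothetical, simplified stand-in for `exponent'` that returns just the signed exponent. `sign` no longer has a `return id` fallback; the optional case moves into `option`.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical stand-ins for the token parser's helpers.
lexeme :: Parser a -> Parser a
lexeme p = p <* spaces

decimal :: Parser Integer
decimal = read <$> many1 digit

-- Proposed sign: consumes exactly one '+' or '-', and fails without
-- consuming input when neither is present.
sign :: Parser (Integer -> Integer)
sign = (char '-' >> return negate) <|> (char '+' >> return id)

-- The missing-sign case is handled by option, outside of lexeme, so white
-- space is skipped only after a sign that was actually read.
int :: Parser Integer
int = do
  f <- option id (lexeme sign)
  n <- decimal
  return (f n)

integer :: Parser Integer
integer = lexeme int <?> "integer"

-- Simplified exponent part: the sign is optional and is not a lexeme.
exponentPart :: Parser Integer
exponentPart = do
  _ <- oneOf "eE"
  f <- option id sign
  e <- decimal <?> "exponent"
  return (f e)
```

With this shape `integer` rejects `" 1"` but still accepts `"- 1 "`, and `exponentPart` accepts `e3`, `e+3` and `e-3`.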
Exactly. That's my thought as well. See the top snippet in this ticket.
@albertnetymk, oh, indeed. I paid more attention to the tests that demonstrate the flaw; I should have noticed your solution. Anyway, I'm done with it, but I'm waiting for a decision on my current PR, so I won't push this until that PR is merged. I could create a separate branch, though, if necessary.
I've added a test that checks against these inputs:

```haskell
shouldFail :: [String]
shouldFail = [" 1", " +1", " -1"]

shouldSucceed :: [String]
shouldSucceed = ["1", "+1", "-1", "+ 1 ", "- 1 ", "1 "]
```

This should be ample.
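A check along these lines could be expressed as a predicate over the two input lists. This is a hypothetical harness, not the test that was actually added: it parses each whole input with `integer <* eof`, where `integer` follows the fixed `option id (lexeme sign)` structure with simplified whitespace and digit handling.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical stand-in: run p, then skip trailing spaces.
lexeme :: Parser a -> Parser a
lexeme p = p <* spaces

-- Hypothetical fixed integer lexeme: a missing sign consumes nothing.
integer :: Parser Integer
integer = lexeme int
  where
    sign :: Parser (Integer -> Integer)
    sign = (char '-' >> return negate) <|> (char '+' >> return id)
    int :: Parser Integer
    int = do
      f <- option id (lexeme sign)
      n <- read <$> many1 digit
      return (f n)

shouldFail :: [String]
shouldFail = [" 1", " +1", " -1"]

shouldSucceed :: [String]
shouldSucceed = ["1", "+1", "-1", "+ 1 ", "- 1 ", "1 "]

-- True when the whole input parses as an integer.
accepts :: String -> Bool
accepts s = either (const False) (const True) (parse (integer <* eof) "" s)

-- True when every input behaves as the lists expect.
allPass :: Bool
allPass = all (not . accepts) shouldFail && all accepts shouldSucceed
```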
Related Issues (20)
- (>>=) leaks memory
- Documentation regarding updatePosChar does not match the function's behaviour
- tokenise comments
- add a parameter to makeTokenParser to specify options for treating space by lexeme
- `updatePosChar` does not increment line number like doccumentation says
- GHC 9.2.1 release?
- Which unfoldM is meant in "unfoldM uncons gives the [t] corresponding to the stream"?
- Compatibility with mtl-2.3
- string function not working correctly with (<|>) when the head of the strings are the same but their tails are not
- Link in readme broken
- How to handle include?
- Parsec crashes
- cabal build -c 'mtl == 2.2.1' fails because of Safe Haskell
- Generate syntax highlighting files
- Parameterized enclosing for char and string literals
- The recent fix of the `(>>=)` memory leak seems to cause an enormous performance degradation.
- Track consumed token count in ParserState (and ParseError)
- Haddock combinator 'many' is applied to a parser that accepts an empty string.
- End of Line Parser
- `anyToken` breaks source position state