Comments (12)
Is indentGuard flexible enough to allow different styles of indentation? E.g., maybe you want a space to normally count as white space, but not allow it in indentation. Or maybe you want indentation to be any sequence of white space characters (tabs/spaces), but fixed for any given block of code.
I don't like the naming of integer: in particular, there is a very strong consensus among mathematicians that an integer can be negative (i.e. is signed). Mathematicians call a positive integer a natural, and there is also Data.Natural. There's also Data.Word for an Int-sized positive number.
Similarly, I think a float should be able to be negative, although this is less of an issue, since when you're going to deal with floating point numbers, you should really look at implementation details anyway. Then again, it outputs a Double, which suggests it accepts negative input values.
In the same spirit, if you claim that a number is either an Integer or a Double, then we should be talking about the signed variants.
Why can the new lexer read integer-style numbers with unlimited precision (Integer), but floating point-style numbers only with finite precision (Double)? Either:
- use Int instead of Integer
- use the appropriate typeclass instead of Integer
- use some kind of arbitrary precision real number instead of Double
- emphasize that one is arbitrary size while the other is not
Regarding octal: C people would say starting with 0 suffices. I think your documentation here is sufficiently clear about the difference, but it's worth mentioning somewhere, so here you go.
I like the new design and think it is much more generally useful. Indeed, essentially the only usage of the old TokenParser stuff is in example code, while this can be used more widely.
I read most of the code, and algebraically it looks correct. Thanks again for your work!
from megaparsec.
(I will address your points in separate comments.)
I think indentGuard is flexible enough for the vast majority of cases. After all, you can supply a custom parser to consume white space. This parser can accept only tabs, or something like that.
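As a rough illustration of supplying a custom white space parser, here is a minimal sketch written against the current Text.Megaparsec.Char.Lexer module, which may differ from the version discussed in this thread. The names tabsOnly and indentedWord are hypothetical and exist only for this example.

```haskell
import Data.Functor (void)
import Data.Void (Void)
import Text.Megaparsec (Parsec, parseMaybe, pos1, some, takeWhileP)
import Text.Megaparsec.Char (letterChar)
import qualified Text.Megaparsec.Char.Lexer as L

type Parser = Parsec Void String

-- Hypothetical white space consumer that accepts tabs only.
tabsOnly :: Parser ()
tabsOnly = void (takeWhileP (Just "tab") (== '\t'))

-- Skips tabs, then succeeds only if the current column is past column 1.
-- A leading space is not consumed by tabsOnly, so it does not count as
-- indentation here.
indentedWord :: Parser String
indentedWord = L.indentGuard tabsOnly GT pos1 *> some letterChar
```

With these definitions, a word preceded by a tab is accepted, while a word preceded by a space, or by nothing at all, is rejected by the guard.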
It's currently not possible (using only the built-in indentGuard) to make sure that every indentation in an indentation block consists of an identical sequence of white space characters. To make this happen, the user will need to write about 4 lines of original Haskell, I guess.
I would be much more concerned with the fact that the tab width is hard-coded in the updatePosChar function. Not sure how to make it configurable.
from megaparsec.
The idea to parse only unsigned values originated here:
it turns out that the Haskell report doesn't say anything about a sign in numeric literals, so the “basic” versions (those that are supposed to parse things according to the Haskell report) don't parse a sign. After all, you may want to parse only positive numbers.
In any case, these functions should preferably return values of signed types, so they can be easily turned into parsers for signed numbers with the help of the signed combinator.
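A minimal sketch of that idea, assuming the current Text.Megaparsec.Char.Lexer API (where the unsigned integer parser is called decimal; the names in the version discussed here may differ). The parser names below are made up for illustration.

```haskell
import Data.Void (Void)
import Text.Megaparsec (Parsec, parseMaybe)
import qualified Text.Megaparsec.Char.Lexer as L

type Parser = Parsec Void String

-- The "basic" parser: digits only, no sign.
unsignedInteger :: Parser Integer
unsignedInteger = L.decimal

-- Wrapping it in L.signed accepts an optional leading '+' or '-'.
-- The first argument consumes white space between the sign and the
-- number; `pure ()` allows none.
signedInteger :: Parser Integer
signedInteger = L.signed (pure ()) L.decimal

-- The same wrapper works for floating point numbers.
signedFloat :: Parser Double
signedFloat = L.signed (pure ()) L.float
```

Because the result type is already signed, no conversion is needed when the sign is added on top of the unsigned parser.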
About the size of integers: this is mostly taken from Parsec without changes. The choice of data types is not that unusual: Integer can be used to parse arbitrarily sized integers, which is better than bounded Ints. It can be downgraded after all, but Int cannot be “upgraded”.
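To illustrate the “downgrade” direction, here is a small sketch using only base (toIntegralSized from Data.Bits, available since base 4.8). The helper name toInt is hypothetical.

```haskell
import Data.Bits (toIntegralSized)

-- Narrow an arbitrary-size Integer to a bounded Int.
-- Returns Nothing when the value does not fit, instead of silently
-- wrapping around the way fromInteger does.
toInt :: Integer -> Maybe Int
toInt = toIntegralSized
```

A parser that yields Integer can be post-processed with such a function, whereas a parser that yields Int has already lost any digits that did not fit.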
Use some kind of arbitrary precision real number instead of Double.
Haskell doesn't come with this sort of thing by default AFAIK. Otherwise yes, it would make sense given how the float parser is defined (it accepts an unlimited run of digits in both the whole and fractional parts).
from megaparsec.
About octal: indeed, different languages vary in these subtle details. I think I will make these parsers (octal and hexadecimal) parse “raw” values without prefixes, so the programmer will be able to prefix them with any sort of parser. This will be more flexible.
from megaparsec.
Done in 3de3f69.
from megaparsec.
I just wanted to say that I'm trying this out, in max 2 weeks I should have some feedback.
I have to parse a pseudo-asm, so the parsing part is practically non-existent, it's all about lexing. Parsec eating newlines was troublesome because newline has to be used as separator between asm statements.
from megaparsec.
@doppioandante, great. Please post descriptions of any difficulties you experience here, so we can improve the design of the lexer if necessary.
from megaparsec.
Re: naturals vs integers. base >= 4.8 has Numeric.Natural, so you can use that.
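For instance, a checked conversion from Integer to Natural might look like the sketch below. It uses only base; the helper name toNatural is hypothetical.

```haskell
import Numeric.Natural (Natural)

-- Convert an Integer to a Natural, rejecting negative values.
-- Calling fromInteger directly on a negative number would throw an
-- arithmetic underflow exception, so the sign is checked first.
toNatural :: Integer -> Maybe Natural
toNatural n
  | n >= 0 = Just (fromInteger n)
  | otherwise = Nothing
```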
from megaparsec.
It'd be pretty nice to be able to get
parens = between (symbol "(") (symbol ")")
braces = between (symbol "{") (symbol "}")
angles = between (symbol "<") (symbol ">")
brackets = between (symbol "[") (symbol "]")
and so on without defining them all manually every time, but there's an obstacle: all those definitions need to have access to the whitespace parser, which would result in a lot of passing-stuff-around a la Parsec:
parens = L.parens spaceConsumer
braces = L.braces spaceConsumer
angles = L.angles spaceConsumer
brackets = L.brackets spaceConsumer
This could probably be fixed by letting them get “whitespace configuration” (and possibly other things?) from MonadReader, but then we make it impossible for others to use Reader in the transformer stack. An alternative is defining our own MonadParserConfig as a Reader newtype, but that's too much, and I'm not actually proposing that this end up in the main library.
Just throwing this in because maybe somebody would have a better idea – and if so, I'd like to know about it.
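For context, the per-project boilerplate being discussed looks roughly like the following sketch. It assumes the current Text.Megaparsec.Char.Lexer API; spaceConsumer, lexeme, symbol and parens are user-defined names, and this particular spaceConsumer skips white space only (no comments).

```haskell
import Control.Applicative (empty)
import Data.Void (Void)
import Text.Megaparsec (Parsec, between, parseMaybe)
import Text.Megaparsec.Char (space1)
import qualified Text.Megaparsec.Char.Lexer as L

type Parser = Parsec Void String

-- White space policy for this grammar: skip spaces, no comments.
spaceConsumer :: Parser ()
spaceConsumer = L.space space1 empty empty

-- Every lexeme and symbol skips trailing white space with that policy.
lexeme :: Parser a -> Parser a
lexeme = L.lexeme spaceConsumer

symbol :: String -> Parser String
symbol = L.symbol spaceConsumer

-- The definition that has to be repeated in every project, because it
-- depends on the local `symbol`.
parens :: Parser a -> Parser a
parens = between (symbol "(") (symbol ")")
```

Since parens closes over the locally defined symbol, a library cannot ship it ready-made without taking the space consumer as an argument or reading it from some shared configuration.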
from megaparsec.
@mrkkrp, the hassle is in the fact that once symbol is defined, you also have to define
parens = between (symbol "(") (symbol ")")
where symbol refers to your definition. So, everyone keeps writing parens = between (symbol "(") (symbol ")") over and over, and yet this definition can't be reused.
Actually, I think it's something Backpack might fix once it's released, so maybe the question is moot.
from megaparsec.
@neongreen, I think the idea of “white space configuration” in any form is an unnecessary complication. We could provide definitions like parens by default, where they would take a space-consuming parser as an argument. Yes, passing the space-consuming parser around may be boilerplate in most cases, but don't forget that it allows you to tune the white space consumption policy on a per-lexeme basis.
from megaparsec.
but don't forget that it allows you to tune white space consumption policy on per-lexeme basis
Yep, but so does the Reader solution (with its local function). I also think it could be interesting to be able to do the following:
-- This is possible already.
expr = ... -- parses “1+2+3”, “( 1 + 2 + 3 ) ”, etc
-- This is possible as well.
angles expr -- parses “<1+2+3>”, “< 1+2 +3 >”, etc
-- This is trivial with the Reader solution.
noSpaces expr -- parses “(1+2+3)”
-- but not “(1+(2+ 3))”
-- I have no idea how to achieve this, and it's probably useless.
(glued angles) expr -- parses “<1+2+3>”, “<1 + 2 + 3>”, etc
-- but not “<1+2+3 >” or “< 1+2+3>”
I think the idea with “white space configuration” in any form is an unnecessary complication.
Maybe. I'd like to stress once again that it's not something I'm proposing or even have any use for – I merely wonder how much flexibility we can get out of the lexer without it becoming too complicated/weird.
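One way the Reader idea could be sketched, purely as an illustration and not as anything the library provides: the space consumer lives in a ReaderT environment, and local swaps it out for a sub-parser. This assumes the mtl and transformers packages and the current megaparsec API; all names below (sc, lexeme, symbol, expr, noSpaces, run) are hypothetical.

```haskell
import Control.Monad.Reader (ReaderT, ask, local, runReaderT)
import Control.Monad.Trans.Class (lift)
import Data.Void (Void)
import Text.Megaparsec (Parsec, parseMaybe, sepBy1)
import Text.Megaparsec.Char (space)
import qualified Text.Megaparsec.Char.Lexer as L

type Base = Parsec Void String

-- The environment is the current white space consumer.
type Parser = ReaderT (Base ()) Base

-- Run whatever space consumer is currently configured.
sc :: Parser ()
sc = ask >>= lift

lexeme :: Parser a -> Parser a
lexeme = L.lexeme sc

symbol :: String -> Parser String
symbol = L.symbol sc

-- A toy expression: numbers separated by '+', evaluated to their sum.
expr :: Parser Integer
expr = sum <$> sepBy1 (lexeme L.decimal) (symbol "+")

-- Disable white space skipping inside the given parser.
noSpaces :: Parser a -> Parser a
noSpaces = local (const (pure ()))

-- Run with the default policy: skip any amount of white space.
run :: Parser a -> String -> Maybe a
run p = parseMaybe (runReaderT p space)
```

With this setup, expr accepts spaces around the operators by default, while noSpaces expr accepts only the tightly written form. The cost mentioned earlier still applies: the Reader slot is now taken by the lexer configuration.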
from megaparsec.