Comments (11)
Hello. Yes, `Text.Parsec.Token` is not very flexible. Part of the problem is that it's designed to parse a certain class of languages (Haskell-like, although at the same time it cannot deal with indentation). While you actually can have a multiline string literal in Haskell, it requires ugly escaping, so if you want something like Common Lisp's multiline strings, there is no easy way to get it.
Apart from indentation (which will certainly involve working with state), most problems can be solved by replacing string parameters with actual parsers. We will also need more parameters overall.
So, for example, to solve your problem we could introduce the following parameters in `LanguageDef` (`GenLanguageDef` in Parsec):
- `stringStart`: a parser that parses the start of a string, e.g. the `"` symbol.
- `stringEnd`: a parser that parses the end of a string, e.g. again `"`.
- Since in most cases there will be a need for escape characters, etc., this should also be configurable. We could probably have defaults similar to Parsec's, i.e. ones that parse things according to the Haskell Report.
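A minimal sketch of how such parser-valued fields might look, written against plain Parsec. The record `StringDef`, its field `stringChar`, and the values `haskellLike` and `tripleQuoted` are hypothetical names made up for illustration; only `stringStart` and `stringEnd` come from the proposal, and none of this is part of Parsec's `GenLanguageDef`.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical record: fields are parsers rather than strings.
data StringDef = StringDef
  { stringStart :: Parser ()    -- consumes the opening delimiter
  , stringEnd   :: Parser ()    -- consumes the closing delimiter
  , stringChar  :: Parser Char  -- parses one (possibly escaped) character
  }

-- Simplified Haskell-like defaults: double quotes, a few backslash
-- escapes, and no raw newlines inside the literal.
haskellLike :: StringDef
haskellLike = StringDef
  { stringStart = () <$ char '"'
  , stringEnd   = () <$ char '"'
  , stringChar  = escaped <|> noneOf "\"\\\n"
  }
  where
    escaped = char '\\' *> (unescape <$> oneOf "\\\"nt")
    unescape 'n' = '\n'
    unescape 't' = '\t'
    unescape c   = c

-- Override only the fields that differ: triple-quoted, multiline strings.
tripleQuoted :: StringDef
tripleQuoted = haskellLike
  { stringStart = delim
  , stringEnd   = delim
  , stringChar  = anyChar
  }
  where
    delim = () <$ try (string "\"\"\"")

-- Compose the small parsers from the record into a string-literal parser.
stringLiteral :: StringDef -> Parser String
stringLiteral def =
  stringStart def *> manyTill (stringChar def) (stringEnd def)
```

With this shape, `parse (stringLiteral tripleQuoted) "" input` accepts a literal that spans several lines, while `stringLiteral haskellLike` rejects a raw newline. The escape handling here is deliberately tiny and does not follow the Haskell Report.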
There is also a different topic that I came across when I studied all existing Parsec issues in order to understand what Parsec users want. Here is an interesting issue: haskell/parsec#15. People working on PureScript needed to implement doc-strings similar to those used by Haddock. Again, there is no way to do it without copying the entire `Text.Parsec.Token`.
Initially I thought that replacing the `commentStart` and `commentEnd` string fields with parser fields in `LanguageDef` could solve the problem, but this would not be the best solution. You see, what if you would like to know what's inside the comments? Parsec just ignores them altogether. Maybe we could use state here and somehow record comments, so the user can access them when needed? Then the PureScript devs could check whether a comment starts with a `|` and they are done.
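The idea of recording comments in state could be sketched with Parsec's existing user state, roughly as below. The names `blockComment`, `whiteSpace`, `docComments`, and `collectComments` are hypothetical, the comment syntax (`{- ... -}`, no nesting) is just an assumption for the example, and this is not how Parsec's token module behaves today.

```haskell
import Data.List (isPrefixOf)
import Text.Parsec

-- User state: the body of every comment seen so far, most recent first.
type Parser = Parsec String [String]

-- Parse a block comment and remember its contents instead of
-- throwing them away. Nested comments are not handled.
blockComment :: Parser ()
blockComment = do
  _    <- try (string "{-")
  body <- manyTill anyChar (try (string "-}"))
  modifyState (body :)

-- Skip white space and comments, recording the comments in user state.
whiteSpace :: Parser ()
whiteSpace = skipMany (blockComment <|> skipMany1 space)

-- Hypothetical helper: keep only comments whose body starts with " |".
docComments :: [String] -> [String]
docComments = filter (" |" `isPrefixOf`)

-- Run whiteSpace and return the recorded comments in source order.
collectComments :: String -> Either ParseError [String]
collectComments = runParser (whiteSpace *> (reverse <$> getState)) [] ""
```

Because the comments live in Parsec's own user state, they are rolled back together with the input when the parser backtracks, so a comment skipped inside a failed `try` branch is not recorded twice.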
The intermediate conclusion is that we need:
- Many little parsers to control the details of how various tokens are parsed.
- Some sort of state that would include commonly useful things.
I'm not sure yet how exactly it will be implemented. Currently one contributor is working on benchmarks and I myself am working on `Text.Megaparsec.Expr` (including tests; tests for everything else except `Text.Megaparsec.Expr` and `Text.Megaparsec.Token` are done).
from megaparsec.
- Many parsers is a good idea. However, they must be somehow composable and it must be possible to replace them. This means you may have to put them all in a record (as currently done in token) and ensure that you can also replace the subparsers, e.g. replace charLetter etc. An alternative would be to use higher-order functions for all the literal parsers to make them more flexible.
- I think state should be done as it is done now in Parsec, with the state monad transformer. What do we need additional state for here? I think this issue is more an issue of composability of parsers.
from megaparsec.
@minad, well, I didn't say additional state. In fact, Parsec's built-in state is the only way to go because it backtracks.
from megaparsec.
ok, then everything is fine :)
from megaparsec.
To point 1: that's what I meant. You have many little parsers, and `makeLexer` (`makeTokenParser` in Parsec) will compose them to build things like `stringLiteral`.
from megaparsec.
Yes, this is also how I understand it and what I proposed. However, this might get very ugly.
from megaparsec.
We could provide a default language definition so the user would need to replace only some fields. This shouldn't get too ugly.
from megaparsec.
@minad, the feature you're requesting can be implemented with the new lexer (see `Text.Megaparsec.Lexer` in the `new-lexer` branch) as follows:
stringLiteral = char '"' >> manyTill charLiteral (char '"')
The new lexer is minimalistic and doesn't impose any assumptions on you; as you can see, you can even have string literals that are quoted differently. `charLiteral` just helps to parse escape codes and other hairy stuff (by the way, it uses built-in Haskell support rather than re-implementing the whole thing, because that may get very buggy; I'm not sure this part of Parsec is bug-free). The rest is up to you. The philosophy of the new lexer is to give you more freedom in every aspect, although this means that you'll possibly need to write a bit more glue to make it work.
Note that `charLiteral` doesn't parse quotes either, because some languages have different syntax for character literals. Again, to use it to parse character literals you need to handle these details manually.
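To make the division of labour concrete, here is a sketch written against plain Parsec rather than the `new-lexer` branch. The `charLiteral` below is a stand-in written for this example, not the lexer's actual definition: it assumes that "built-in Haskell support" can be approximated with `Data.Char.readLitChar`, and the 10-character lookahead limit is an arbitrary choice. `singleQuotedString` and `charLit` are hypothetical names.

```haskell
import Data.Char (readLitChar)
import Text.Parsec
import Text.Parsec.String (Parser)

-- Stand-in for charLiteral: parse one character or escape sequence using
-- Haskell's own rules. It deliberately knows nothing about quotes.
charLiteral :: Parser Char
charLiteral = do
  ahead <- take 10 <$> getInput            -- peek without consuming
  case readLitChar ahead of
    (c, rest) : _ ->
      -- consume exactly the characters readLitChar used up
      c <$ count (length ahead - length rest) anyChar
    [] -> parserFail "invalid literal character"

-- The string literal from the thread: double quotes around charLiteral.
stringLiteral :: Parser String
stringLiteral = char '"' >> manyTill charLiteral (char '"')

-- The same building block with different delimiters; the quoting style
-- is entirely the caller's choice.
singleQuotedString :: Parser String
singleQuotedString = char '\'' >> manyTill charLiteral (char '\'')

-- A character literal has to supply its own quotes around charLiteral.
charLit :: Parser Char
charLit = between (char '\'') (char '\'') charLiteral
```

The glue is one line per literal form, which is the trade-off described above: nothing is assumed about delimiters, so each language definition spells them out itself.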
You can criticize the current decisions here: #28. I'll close this issue once the `new-lexer` branch is merged.
from megaparsec.
@mrkkrp great! :)
from megaparsec.
Can you say something about when this library will stabilize enough and be available on Hackage? Some kind of progress/status?
from megaparsec.
@minad I think it will be released by October.
from megaparsec.