Hi, I would like to have such a feature, however using that in parse

multiline string literals about megaparsec HOT 11 CLOSED

mrkkrp commented on June 9, 2024

multiline string literals

from megaparsec.

Comments (11)

mrkkrp commented on June 9, 2024

Hello. Yes, Text.Parsec.Token is not very flexible. Part of the problem is that it's designed to parse a certain class of languages (Haskell-like, although at the same time it cannot deal with indentation). While you actually can have a multiline string literal in Haskell, it requires ugly escaping, so if you want something like Common Lisp's multiline strings, there is no easy way to get it.

Apart from indentation (which will certainly involve working with state), most problems can be solved by replacing string parameters with actual parsers. We will also need more parameters overall.

So, for example, to solve your problem we could introduce the following parameters in LanguageDef (GenLanguageDef in Parsec):

stringStart: a parser that parses the start of a string, e.g. the " symbol.
stringEnd: a parser that parses the end of a string, e.g. again ".

Since in most cases there will be a need for escape characters, etc., this should also be configurable. We could probably have defaults similar to Parsec's, i.e. ones that parse things according to the Haskell Report.
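A minimal sketch of what such a record could look like, written against the megaparsec API as it exists in later releases (Text.Megaparsec.Char.Lexer is assumed here). The LanguageDef type below, the stringChar field and the names haskellDef and rawDef are hypothetical and only illustrate the idea; they are not part of any released API.

```haskell
import Data.Void (Void)
import Text.Megaparsec (Parsec, anySingle, manyTill, parseMaybe)
import Text.Megaparsec.Char (char, string)
import qualified Text.Megaparsec.Char.Lexer as L

type Parser = Parsec Void String

-- Hypothetical record: stringStart/stringEnd follow the proposal above,
-- stringChar is an extra assumed field for the configurable escapes.
data LanguageDef = LanguageDef
  { stringStart :: Parser ()    -- consumes the opening delimiter
  , stringEnd   :: Parser ()    -- consumes the closing delimiter
  , stringChar  :: Parser Char  -- parses one (possibly escaped) character
  }

-- Default in the spirit of the Haskell Report: double quotes, Haskell escapes.
haskellDef :: LanguageDef
haskellDef = LanguageDef
  { stringStart = () <$ char '"'
  , stringEnd   = () <$ char '"'
  , stringChar  = L.charLiteral
  }

-- Compose the configured pieces into a string literal parser.
stringLiteral :: LanguageDef -> Parser String
stringLiteral def = stringStart def *> manyTill (stringChar def) (stringEnd def)

-- Override only what differs: triple-quoted strings with no escape handling.
rawDef :: LanguageDef
rawDef = haskellDef
  { stringStart = () <$ string "\"\"\""
  , stringEnd   = () <$ string "\"\"\""
  , stringChar  = anySingle
  }

-- Two sample runs, one per definition.
examples :: [Maybe String]
examples =
  [ parseMaybe (stringLiteral haskellDef) "\"a\\tb\""
  , parseMaybe (stringLiteral rawDef) "\"\"\"one\ntwo\"\"\""
  ]
```

With a default value in place, a user who wants a different string syntax only touches the fields that change and inherits the rest.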

There is also a different topic that I came across when I studied all of Parsec's existing issues in order to understand what users of Parsec want. Here is an interesting issue: haskell/parsec#15. People working on PureScript needed to implement doc-strings similar to those used by Haddock. Again, there is no way to do it without copying the entire Text.Parsec.Token.

Initially I thought that replacing the commentStart and commentEnd string fields with parser fields in LanguageDef could solve the problem, but this would not be the best solution. What if you would like to know what's inside the comments? Parsec just ignores them altogether. Maybe we could use state here and somehow record comments, so the user can access them when needed? Then the PureScript devs could check whether a comment starts with a | and they would be done.
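One way the "record comments in state" idea could look, sketched with Parsec's built-in user state. The parser names (lineComment, spaceConsumer, program), the "--" comment syntax and the toy grammar are all assumptions made for illustration.

```haskell
import Text.Parsec

-- User state: the comment bodies seen so far, most recent first.
type Parser = Parsec String [String]

-- Parse a "--" line comment and record its body in the user state
-- instead of discarding it.
lineComment :: Parser ()
lineComment = do
  _ <- try (string "--")
  body <- many (noneOf "\n")
  modifyState (body :)

-- Skip whitespace and comments, recording every comment on the way.
spaceConsumer :: Parser ()
spaceConsumer = skipMany (skipMany1 space <|> lineComment)

-- Toy "program": identifiers separated by whitespace and comments.
-- Returns the identifiers together with the recorded comments in source order.
program :: Parser ([String], [String])
program = do
  spaceConsumer
  names <- many (many1 letter <* spaceConsumer)
  eof
  comments <- getState
  pure (names, reverse comments)

-- A doc comment here is one whose body starts with '|' after optional spaces.
isDocComment :: String -> Bool
isDocComment c = take 1 (dropWhile (== ' ') c) == "|"

parseProgram :: String -> Either ParseError ([String], [String])
parseProgram = runParser program [] "<input>"
```

Because the comments live in the parser's own state, a branch that fails and is abandoned does not leave its comments behind.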

The intermediate conclusion is that we need:

1. Many little parsers to control the details of how various tokens are parsed.
2. Some sort of state that would include commonly useful things.

I'm not sure yet how exactly it will be implemented. Currently one contributor is working on benchmarks and I am working on Text.Megaparsec.Expr (including tests; tests for everything except Text.Megaparsec.Expr and Text.Megaparsec.Token are done).

minad commented on June 9, 2024

Many parsers is a good idea. However, they must be somehow composable and it must be possible to replace them. This means you may have to put them all in a record (as is currently done in Token) and ensure that the subparsers can also be replaced, e.g. charLetter. An alternative would be to use higher-order functions for all the literal parsers to make them more flexible.
I think state should be done as it is done now in Parsec, with the state monad transformer. What do we need additional state for here? I think this issue is more an issue of composability of parsers.
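The higher-order alternative could be sketched as follows, again assuming the megaparsec API of later releases. The function stringLiteralWith and the two parsers built from it are hypothetical names, not library functions.

```haskell
import Data.Void (Void)
import Text.Megaparsec (Parsec, anySingle, manyTill, parseMaybe)
import Text.Megaparsec.Char (char)
import qualified Text.Megaparsec.Char.Lexer as L

type Parser = Parsec Void String

-- Hypothetical helper: the caller passes the delimiters and the
-- per-character parser as arguments instead of setting record fields.
stringLiteralWith :: Parser open -> Parser close -> Parser Char -> Parser String
stringLiteralWith open close item = open *> manyTill item close

-- Haskell-style: double quotes, escapes handled by charLiteral.
haskellString :: Parser String
haskellString = stringLiteralWith (char '"') (char '"') L.charLiteral

-- Backtick-delimited: every character is taken verbatim, newlines included.
backtickString :: Parser String
backtickString = stringLiteralWith (char '`') (char '`') anySingle

sample :: Maybe String
sample = parseMaybe backtickString "`first\nsecond`"
```

Compared with a record, there is nothing to override here; each literal style is just another application of the same function.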

mrkkrp commented on June 9, 2024

@minad, well, I didn't say additional state. In fact, Parsec's built-in state is the only way to go because it backtracks.

minad commented on June 9, 2024

ok, then everything is fine :)

mrkkrp commented on June 9, 2024

To point 1: that's what I meant, you have many little parsers and makeLexer (makeTokenParser in Parsec) will compose them to build things like stringLiteral.

minad commented on June 9, 2024

Yes, this is also how I understand it and what I proposed. However this might get very ugly.

mrkkrp commented on June 9, 2024

We could provide a default language definition so the user would only need to replace some fields. This shouldn't get too ugly.

mrkkrp commented on June 9, 2024

@minad, the feature you're requesting can be implemented with the new lexer (see Text.Megaparsec.Lexer in the new-lexer branch) as follows:

stringLiteral = char '"' >> manyTill charLiteral (char '"')

The new lexer is minimalistic and doesn't impose any assumptions on you; as you can see, you can even have string literals that are quoted differently. charLiteral just helps to parse escape codes and other hairy stuff (by the way, it uses built-in Haskell support instead of re-implementing the whole thing, because that could get very buggy; I'm not sure this part of Parsec is bug-free). The rest is up to you. The philosophy of the new lexer is to give you more freedom in every aspect, although this means that you may need to write a bit more glue to make it work.

Note that charLiteral doesn't parse quotes either, because some languages have different syntax for character literals; to use it to parse character literals you need to handle these details manually.
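For reference, a self-contained version of the parser above, plus a character literal built by wrapping charLiteral in quotes manually. This sketch assumes the module layout of later megaparsec releases, where the lexer lives in Text.Megaparsec.Char.Lexer rather than Text.Megaparsec.Lexer; charLit and multiline are illustrative names.

```haskell
import Data.Void (Void)
import Text.Megaparsec (Parsec, between, manyTill, parseMaybe)
import Text.Megaparsec.Char (char)
import qualified Text.Megaparsec.Char.Lexer as L

type Parser = Parsec Void String

-- The parser from the comment above, with the module qualifier added.
stringLiteral :: Parser String
stringLiteral = char '"' >> manyTill L.charLiteral (char '"')

-- charLiteral does not consume quotes, so a character literal
-- has to supply them itself.
charLit :: Parser Char
charLit = between (char '\'') (char '\'') L.charLiteral

-- A raw newline inside the quotes is accepted like any other character,
-- which is what makes the literal multiline.
multiline :: Maybe String
multiline = parseMaybe stringLiteral "\"first line\nsecond line\""
```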

You can criticize the current decisions here: #28. I'll close this issue once the new-lexer branch is merged.

minad commented on June 9, 2024

@mrkkrp great! :)

minad commented on June 9, 2024

Can you say when this library will be stable enough and available on Hackage? Some kind of progress/status update?

mrkkrp commented on June 9, 2024

@minad I think it will be released by October.
