multiline string literals (megaparsec issue, closed)

mrkkrp commented on June 9, 2024
multiline string literals

from megaparsec.

Comments (11)

mrkkrp commented on June 9, 2024

Hello. Yes, Text.Parsec.Token is not very flexible. Part of the problem is that it is designed to parse a certain class of languages (Haskell-like, although at the same time it cannot deal with indentation). While you actually can have a multiline string literal in Haskell, it requires ugly escaping, so if you want something like Common Lisp's multiline strings, there is no easy way to get it.

Apart from indentation (which will certainly involve working with state), most problems can be solved by replacing string parameters with actual parsers. We will also need more parameters overall.

So, for example, to solve your problem we could introduce the following parameters in LanguageDef (GenLanguageDef in Parsec):

  • stringStart: a parser that parses the start of a string, e.g. the " symbol.
  • stringEnd: a parser that parses the end of a string, e.g. " again.
  • Since in most cases escape characters etc. will be needed, this should also be configurable. We could probably have defaults similar to Parsec's, i.e. ones that parse things according to the Haskell report.
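The proposal above could be sketched roughly as follows. This is only an illustration written against Parsec as it exists today: the StringDef record, its field stringChar, and the names mkStringLiteral, defaultStringDef and lispLike are assumptions made for this example, and the escape handling is deliberately much simpler than what the Haskell report requires.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical record: stringStart and stringEnd come from the proposal,
-- stringChar is an assumed name. None of this is in Parsec's GenLanguageDef.
data StringDef = StringDef
  { stringStart :: Parser ()    -- opening delimiter
  , stringEnd   :: Parser ()    -- closing delimiter
  , stringChar  :: Parser Char  -- one (possibly escaped) character
  }

-- Compose the little parsers into a string-literal parser.
mkStringLiteral :: StringDef -> Parser String
mkStringLiteral def =
  stringStart def *> manyTill (stringChar def) (stringEnd def)

-- Simplified escapes: only \" \\ and \n are understood here.
escaped :: Parser Char
escaped = char '\\' *> (unescape <$> oneOf "\"\\n")
  where
    unescape 'n' = '\n'
    unescape c   = c

-- Default: double quotes, raw control characters (newlines) rejected.
defaultStringDef :: StringDef
defaultStringDef = StringDef
  { stringStart = () <$ char '"'
  , stringEnd   = () <$ char '"'
  , stringChar  = escaped <|> satisfy (>= ' ')
  }

-- Override a single field to allow raw newlines inside the literal.
lispLike :: StringDef
lispLike = defaultStringDef { stringChar = escaped <|> anyChar }

main :: IO ()
main = do
  print (parse (mkStringLiteral defaultStringDef) "" "\"a\\nb\"")  -- Right "a\nb"
  print (parse (mkStringLiteral lispLike) "" "\"two\nlines\"")      -- Right "two\nlines"
  -- The default definition rejects the raw newline, so this is a parse error:
  print (parse (mkStringLiteral defaultStringDef) "" "\"two\nlines\"")
```

Only the field that differs has to be replaced; everything else is inherited from the default definition.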

There is also a different topic that I came across when I studied all existing Parsec issues in order to understand what Parsec users want. Here is an interesting issue: haskell/parsec#15. People working on PureScript needed to implement doc-strings similar to those used by Haddock. Again, there is no way to do it without copying the entire Text.Parsec.Token module.

Initially I thought that replacing the commentStart and commentEnd string fields with parser fields in LanguageDef could solve the problem, but this would not be the best solution. What if you would like to know what is inside the comments? Parsec just ignores them altogether. Maybe we could use state here and somehow record comments, so the user can access them when needed? Then the PureScript devs could check whether a comment starts with a | and they would be done.
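One way the "record comments in state" idea might look with Parsec's existing user state is sketched below. The parsers lineComment, whiteSpace and program and the helper docComments are hypothetical names for this example, and the "-- |" convention is borrowed from Haddock purely for illustration.

```haskell
import Data.List (isPrefixOf)
import Text.Parsec

-- User state: the text of every comment seen so far, most recent first.
type Parser = Parsec String [String]

-- Parse a "-- ..." line comment and record its text in the user state.
lineComment :: Parser ()
lineComment = do
  _ <- try (string "--")
  body <- manyTill anyChar (() <$ endOfLine <|> eof)
  modifyState (body :)

-- Skip white space and comments, recording each comment on the way.
whiteSpace :: Parser ()
whiteSpace = skipMany (() <$ space <|> lineComment)

-- Toy "program": identifiers separated by white space and comments.
-- Returns the identifiers and the recorded comments in source order.
program :: Parser ([String], [String])
program = do
  whiteSpace
  names <- many (many1 letter <* whiteSpace)
  eof
  comments <- getState
  return (names, reverse comments)

-- Keep only comments that start with " |" (doc comments in this sketch).
docComments :: [String] -> [String]
docComments = filter (" |" `isPrefixOf`)

main :: IO ()
main =
  case runParser program [] "" "-- | Doc for foo\nfoo -- plain\nbar" of
    Left err           -> print err
    Right (names, cs)  -> do
      print names             -- ["foo","bar"]
      print (docComments cs)  -- [" | Doc for foo"]
```

The lexer still skips comments, but nothing is lost: whoever needs them reads them back from the state after parsing.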

The intermediate conclusion is that we need:

  1. Many little parsers to control the parsing of various tokens in detail.
  2. Some sort of state that would include commonly useful things.

I'm not sure yet how exactly it will be implemented. Currently one contributor is working on benchmarks and I am working on Text.Megaparsec.Expr (including tests; tests for everything except Text.Megaparsec.Expr and Text.Megaparsec.Token are done).

minad commented on June 9, 2024
  1. Many parsers is a good idea. However, they must be composable somehow, and it must be possible to replace them. This means you may have to put them all in a record (as is currently done in Token) and ensure that the subparsers can also be replaced, e.g. charLetter. An alternative would be to use higher-order functions for all the literal parsers to make them more flexible.
  2. I think state should be handled as it is now in Parsec, with the state monad transformer. What do we need additional state for here? I think this issue is more about the composability of parsers.
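The higher-order alternative mentioned in point 1 might look something like this. The name stringLiteralWith and its argument order are assumptions for the sake of the example, not an existing Parsec function.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical higher-order literal parser: the caller passes the opening
-- delimiter, the closing delimiter and the parser for a single character.
stringLiteralWith :: Parser open -> Parser close -> Parser Char -> Parser String
stringLiteralWith open close ch = open *> manyTill ch close

-- Ordinary double-quoted strings; raw newlines are allowed by anyChar.
doubleQuoted :: Parser String
doubleQuoted = stringLiteralWith (char '"') (char '"') anyChar

-- The same function reused with different delimiters.
angleQuoted :: Parser String
angleQuoted = stringLiteralWith (char '<') (char '>') anyChar

main :: IO ()
main = do
  print (parse doubleQuoted "" "\"two\nlines\"")  -- Right "two\nlines"
  print (parse angleQuoted "" "<abc>")            -- Right "abc"
```

Compared with a record of parsers, there is no default to inherit from, but each literal parser can be customized at the call site without touching a shared definition.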

mrkkrp commented on June 9, 2024

@minad, well, I didn't say additional state. In fact, Parsec's built-in state is the only way to go because it backtracks together with the parser.
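A small sketch of what "backtracking state" means in practice, using Parsec's user state. The parser name branchy is made up for the example.

```haskell
import Text.Parsec

-- User state: a simple counter.
type Parser = Parsec String Int

-- The first branch bumps the counter and then fails after consuming input.
-- `try` rewinds the input, and the state update is discarded along with it,
-- so the second branch starts from the original state.
branchy :: Parser Int
branchy =
      try (char 'a' *> modifyState (+ 1) *> char 'b' *> getState)
  <|> (char 'a' *> char 'c' *> getState)

main :: IO ()
main = do
  print (runParser branchy 0 "" "ab")  -- Right 1
  print (runParser branchy 0 "" "ac")  -- Right 0
```

State kept in a monad underneath the parser would generally not be rolled back this way, which is presumably why the built-in state is preferred here.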

minad commented on June 9, 2024

ok, then everything is fine :)

mrkkrp commented on June 9, 2024

To point 1: that's what I meant. You have many little parsers, and makeLexer (makeTokenParser in Parsec) will compose them to build things like stringLiteral.

minad commented on June 9, 2024

Yes, this is also how I understand it and what I proposed. However, this might get very ugly.

mrkkrp commented on June 9, 2024

We could provide a default language definition, so the user would only need to replace some fields. This shouldn't get too ugly.

mrkkrp commented on June 9, 2024

@minad, the feature you're requesting can be implemented with the new lexer (see Text.Megaparsec.Lexer in the new-lexer branch) as follows:

stringLiteral = char '"' >> manyTill charLiteral (char '"')

The new lexer is minimalistic and doesn't impose any assumptions on you; as you can see, you can even have string literals that are quoted differently. charLiteral just helps to parse escape codes and other hairy stuff (by the way, it uses built-in Haskell support instead of re-implementing the whole thing, because that could get very buggy; I'm not sure this part of Parsec is bug-free). The rest is up to you. The philosophy of the new lexer is to give you more freedom in every aspect, although this means that you may need to write a bit more glue to make it work.

Note that charLiteral doesn't parse the quotes either, because some languages have a different syntax for character literals. So, to use it to parse character literals, you need to handle these details manually.
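For readers on a released megaparsec, the snippet above might be written as in the following sketch. It assumes megaparsec 6 or later, where charLiteral lives in Text.Megaparsec.Char.Lexer rather than the Text.Megaparsec.Lexer module of the new-lexer branch; the Parser alias and the name charLit are choices made for this example.

```haskell
import Data.Void (Void)
import Text.Megaparsec
import Text.Megaparsec.Char (char)
import qualified Text.Megaparsec.Char.Lexer as L

-- Assumed parser type: no custom error component, String input.
type Parser = Parsec Void String

-- The definition from the comment above, with current module names.
stringLiteral :: Parser String
stringLiteral = char '"' >> manyTill L.charLiteral (char '"')

-- charLiteral does not consume quotes, so a character literal needs
-- its delimiters handled explicitly.
charLit :: Parser Char
charLit = between (char '\'') (char '\'') L.charLiteral

main :: IO ()
main = do
  -- An escape sequence is decoded by charLiteral.
  print (parseMaybe stringLiteral "\"tab\\there\"")
  -- A raw newline inside the quotes, i.e. a multiline literal.
  print (parseMaybe stringLiteral "\"two\nlines\"")
  print (parseMaybe charLit "'\\n'")
```

Swapping the two `char '"'` parsers for other delimiters is all it takes to quote string literals differently.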

You can criticize the current decisions here: #28. I'll close this issue once the new-lexer branch is merged.

minad commented on June 9, 2024

@mrkkrp great! :)

minad commented on June 9, 2024

Can you say something about when this library will be stable enough and available on Hackage? Some kind of progress/status?

mrkkrp commented on June 9, 2024

@minad I think it will be released by October.
