GithubHelp home page GithubHelp logo

Comments (36)

mrkkrp avatar mrkkrp commented on May 31, 2024

@neongreen, You have proposed many useful things, without these propositions Megaparsec would be different. I consider it significant contribution to the project. I would like to include you into the list of contributors, should I include you as "Artyom"?

from megaparsec.

neongreen avatar neongreen commented on May 31, 2024

@mrkkrp, yep (and thanks). Just “Artyom” would be ambiguous, but since we're on Github anyway, you could just add a link to my Github nick:

Artyom (@neongreen)

from megaparsec.

mrkkrp avatar mrkkrp commented on May 31, 2024

@neongreen, Can you point out where (<|>) is defined for StateT as:

m <|> n = StateT $ \ s -> runStateT m s <|> runStateT n s

?

from megaparsec.

neongreen avatar neongreen commented on May 31, 2024

@mrkkrp, here <|> is defined as mplus, and here mplus is defined as follows:

m `mplus` n = StateT $ \ s -> runStateT m s `mplus` runStateT n s

So, yeah, the original definition uses mplus instead of <|>, but I didn't want to introduce extra baggage when pasting, so I rewrote it with <|>.

from megaparsec.

mrkkrp avatar mrkkrp commented on May 31, 2024

@neongreen, I see. I will try to define mtl-style MonadParsec (although in Megaparsec we have more low-level functions than in Parsec (parsers is modeled after Parsec I guess) because I decided that we could define some of them clearer on lower level).

I'll try to eliminate user state, so user will need to combine monads to get backtracking state. This should work. Changes will live in monad-transformer branch until it's clear that everything is all right.

from megaparsec.

mrkkrp avatar mrkkrp commented on May 31, 2024

I will also make combinators more general, many of them can work with Applicative and Alternative instances, not necessarily with MonadParsec instances.

from megaparsec.

mrkkrp avatar mrkkrp commented on May 31, 2024

I had to make other minor changes to make it work smoothly. Also, it seems we could put functions to get current state (parser state, position in input stream, etc) into the class too. This way most part of the library will be defined for any instance of MonadParsec. Tests will need to be corrected too. This requires more work then it seemed.

from megaparsec.

mrkkrp avatar mrkkrp commented on May 31, 2024

OK, this is almost done. I want to re-work some tests for Text.Megaparsec.Prim though. I think it's good idea to add tests that check backtracking abilities of StateT MyStateType Parser a monad combination. Once it's done I'll organize and push my changes.

@neongreen, I'll ask you to try the improvements and see if Megaparsec is flexible enough now.

from megaparsec.

mrkkrp avatar mrkkrp commented on May 31, 2024

@neongreen, please try out code in monad-transformer branch. I also ask you to write one or two tests that you deem realistic and practical to test backtracking state achieved by the method you mention. I suggest you first test this locally and tell me about results, then I merge this into master. Then you can open a pull request adding the tests.

from megaparsec.

neongreen avatar neongreen commented on May 31, 2024

Okay, here go some observations.

  1. <?> should be moved out of the class, I think. Alternatively, there should be a MINIMAL pragma saying that only one of {label, <?>} is needed. Justification: I got bit by this when I was defining an instance of MonadParsec, forgot about both label and <?>, GHC didn't warn me (because they both have default definitions), and as the result I spent 10m trying to find out why my parser was going in an endless loop.

  2. We need instances for MonadParsec s (WriterT w m) c and so on. Here's a sample instance for WriterT:

    instance (Monoid w, MonadParsec s m c) => MonadParsec s (WriterT w m) c where
      unexpected x        = lift $ unexpected x
      eof                 = lift $ eof
      token x y           = lift $ token x y
      tokens x y z        = lift $ tokens x y z
      getParserState      = lift $ getParserState
      updateParserState f = lift $ updateParserState f
      try           (WriterT m) = WriterT $ try m
      label x       (WriterT m) = WriterT $ label x m
      lookAhead     (WriterT m) = WriterT $ lookAhead m
      notFollowedBy (WriterT m) = WriterT $
        notFollowedBy (fst <$> m) >>= \x -> return (x, mempty)

    The same instance in parsers has {-# INLINE ... #-} for all parsers, so maybe we should inline all the things too. I'll try to benchmark it.

  3. It's more annoying now to write generic parsers. Before:

    whitespace :: Parser ()
    whitespace = void (some (char ' '))

    After:

    whitespace :: MonadParsec s m Char => m ()
    whitespace = void (some (char ' '))

    Small as the difference may be, it might push people towards reusing less code than they used to, and it's bad. I'd like both Text.Megaparsec.String and Text.Megaparsec.Text to export a Parsing that would be respectively MonadParsec String m Char and MonadParsec Text m Char under the hood (after all, they already export different Parsers that are named the same). This would make the code look like this:

    whitespace :: Parsing m => m ()
    whitespace = ...
    
    inParens :: Parsing m => m a -> m a
    inParens = ...

from megaparsec.

neongreen avatar neongreen commented on May 31, 2024

Oh, and another observation: I had to write

warnParse :: WarnParser a -> SourceName -> Text -> Either ParseError (a, [String])
warnParse p src s = parse (runWriterT p) src s

and it feels like it's something that should've been done automatically, but I have no idea how to actually accomplish this.

from megaparsec.

mrkkrp avatar mrkkrp commented on May 31, 2024

I agree with points 1 and 2. As for 3, I think end user can define Parsing type or something like that if he thinks it's necessary. Mostly I don't see any problem in MonadParsec s m Char => m () stuff.


I currently don't think it should be done automatically. You're trying to run code inside monad and usually if you have stack of monads you need to nest functions like parse and runWriterT accordingly. Some functions like runStateT will need to accept additional arguments like initial state, so I don't think all these cases can be handled in uniform (non-hackish) way.

from megaparsec.

neongreen avatar neongreen commented on May 31, 2024

Some functions like runStateT will need to accept additional arguments like initial state, so I don't think all these cases can be handled in uniform (non-hackish) way.

Okay, you're right.


As for 3, I think end user can define Parsing type or something like that if he thinks it's necessary.

Sure, but at least in my case I'm probably going to be defining Parsing literally every time I use Megaparsec with a custom stack, and unlike parse it can be done generically in a non-hackish way. I guess GenParser was included into original parsec for the same reason. (By the way, am I right that GenParser shouldn't be in the monad-transformer branch at all?)

from megaparsec.

mrkkrp avatar mrkkrp commented on May 31, 2024

@neongreen, OK, I'll add Parsing and GenParser is indeed should disappear, just overlooked that.

from megaparsec.

mrkkrp avatar mrkkrp commented on May 31, 2024

@neongreen, To write something like this:

type Parsing m = MonadParsec C.ByteString m Char

You need to enable ConstraintKinds extension. Because it's synonym for a constraint, not type. This once more makes me think that this is not entirely necessary.

I think we will do it this way: for now Parsing won't be added. Then in future releases this might be reconsidered. After all it saves not much typing.

from megaparsec.

neongreen avatar neongreen commented on May 31, 2024

Okay.

from megaparsec.

mrkkrp avatar mrkkrp commented on May 31, 2024

@neongreen, There will be two instances for WriterT: Lazy.WriterT and Strict.WriterT, by the way.

from megaparsec.

mrkkrp avatar mrkkrp commented on May 31, 2024

@neongreen, about inlining: honestly, I think GHC will inline all this stuff anyway. So, you probably won't be able to find any performance difference.

from megaparsec.

mrkkrp avatar mrkkrp commented on May 31, 2024

@neongreen, I've committed all the changes, so you can try again. Is there anything else you want to tell me?

from megaparsec.

neongreen avatar neongreen commented on May 31, 2024

I've committed all the changes, so you can try again.

Okay, thanks. I'll start writing tests now.

from megaparsec.

neongreen avatar neongreen commented on May 31, 2024

@mrkkrp, I have just realised that Parsec's lookAhead and Megaparsec's lookAhead have different semantics – Parsec's lookAhead reverts changes made by the parser to backtracking state, Megaparsec's lookAhead doesn't. It won't matter in most cases when lookAhead is used; however, it might be useful to have a version of lookAhead that adheres to the intuitive definition of “run the parser but don't let it change anything whatsoever”. Would it be easy to write? (I don't know Parsec's internals well enough to answer this question on my own. It may even be impossible, I don't know.)

from megaparsec.

mrkkrp avatar mrkkrp commented on May 31, 2024

@neongreen, parsers seems to be brainy about the stuff. How does lookAhead work there?

from megaparsec.

neongreen avatar neongreen commented on May 31, 2024

@mrkkrp, I already checked – it isn't being brainy in this particular case, unfortunately.

import Text.Parser.LookAhead
import Text.Parser.Combinators
import Text.Parsec.String
import Text.Parsec (parse, runParser, Parsec, setState, getState)
import qualified Control.Monad.State as State

tParsers = parse (State.evalStateT p 0) "" ""
  where p :: State.StateT Integer Parser Integer
        p = do
          State.put 0
          _ <- lookAhead (State.put 1 >> eof)
          State.get

tParsec = runParser p 0 "" ""
  where p = do
          setState 0
          _ <- lookAhead (setState 1 >> eof)
          getState
> tParsers
Right 1

> tParsec
Right 0

from megaparsec.

mrkkrp avatar mrkkrp commented on May 31, 2024

@neongreen, ah I see they just use Parsec's version without reimplementing it.

instance (Parsec.Stream s m t, Show t) => LookAheadParsing (Parsec.ParsecT s u m) where
  lookAhead = Parsec.lookAhead

from megaparsec.

mrkkrp avatar mrkkrp commented on May 31, 2024

@neongreen, I think Megaparec should revert backtracking state, you're right. I'll see how to fix it. Make sure you include something in your tests to check this feature in future.

from megaparsec.

mrkkrp avatar mrkkrp commented on May 31, 2024

@neongreen, Have you tried with simple alternatives à la p <|> n? Does it work as expected?

from megaparsec.

neongreen avatar neongreen commented on May 31, 2024

@mrkkrp, yep, it works alright. E.g. the modified prop_user_backtrack test (that already was there) passes:

prop_user_backtrack :: Integer -> Integer -> Property
prop_user_backtrack n m = runParser (State.evalStateT p 0) "" "" === Right n
  where p = do
          State.put n
          (State.put m >> fail "failed") <|> return ()
          State.get

from megaparsec.

mrkkrp avatar mrkkrp commented on May 31, 2024

OK, first of all, Megaparsec's and Parsec's lookAhead are essentially identical:

-- Megaparsec

pLookAhead :: ParsecT s m a -> ParsecT s m a
pLookAhead p = ParsecT $ \s _ cerr eok eerr ->
  let eok' a _ _ = eok a s mempty
  in unParser p s eok' cerr eok' eerr

-- Parsec

lookAhead :: (Stream s m t) => ParsecT s u m a -> ParsecT s u m a
lookAhead p =
    ParsecT $ \s _ cerr eok eerr -> do
        let eok' a _ _ = eok a s (newErrorUnknown (statePos s))
        unParser p s eok' cerr eok' eerr

Another thing is Megaparsec currently has no built-in user state. But this should not matter. As your little experiment demonstrates, while Parsec's lookAhead generally works OK, that version with StateT transformer doesn't work so great. So I think the problem is in definition of lookAhead for StateT.

from megaparsec.

mrkkrp avatar mrkkrp commented on May 31, 2024

@neongreen, This means that all these definitions of WriterT, StateT, etc. in parsers are not necessarily correct with respect this sort of corner case. We should perhaps correct them in Megaparsec instead of copying them.

Maybe we should also open an issue on GitHub page of parsers library.

from megaparsec.

mrkkrp avatar mrkkrp commented on May 31, 2024

@neongreen, We also should tests all these definitions too, I think. Don't worry about that, I will fix it and later add new tests myself.

from megaparsec.

mrkkrp avatar mrkkrp commented on May 31, 2024

@neongreen, Try with this definition (with TupleSections extension):

instance (MonadPlus m, MonadParsec s m t) =>
         MonadParsec s (L.StateT e m) t where
  label n       (L.StateT m) = L.StateT $ \s -> label n (m s)
  try           (L.StateT m) = L.StateT $ try . m
  lookAhead     (L.StateT m) = L.StateT $ \s ->
    (,s) . fst <$> lookAhead (m s)
  notFollowedBy (L.StateT m) = L.StateT $ \s ->
    notFollowedBy (fst <$> m s) >> return ((),s)
  unexpected                 = lift . unexpected
  eof                        = lift eof
  token  f e                 = lift $ token  f e
  tokens f e ts              = lift $ tokens f e ts
  getParserState             = lift getParserState
  updateParserState f        = lift $ updateParserState f

If this passes the test I will commit necessary corrections.

from megaparsec.

abooij avatar abooij commented on May 31, 2024

I don't have anything to add, but it looks like you two are doing pretty good work - thanks!

from megaparsec.

mrkkrp avatar mrkkrp commented on May 31, 2024

@neongreen, I checked it and it seems to work OK. I've committed the changes so this bug should be eliminated by now.

from megaparsec.

neongreen avatar neongreen commented on May 31, 2024

@mrkkrp, I have the following (passing) tests that use lookAhead, notFollowedBy, and <|>:

-- Monad transformers + Parser

prop_StateT :: Integer -> Integer -> Property
prop_StateT n m = checkParser (State.evalStateT p 0) (Right (n+m)) ""
  where p = do
          State.put n
          State.modify (+ m)
          State.get

prop_StateT_backtrack :: Integer -> Integer -> Property
prop_StateT_backtrack n m = checkParser (State.evalStateT p 0) (Right n) ""
  where p = do
          State.put n
          (State.put m >> fail "failed") <|> return ()
          State.get

-- See <https://github.com/mrkkrp/megaparsec/issues/27#issuecomment-141785141>.
prop_StateT_lookAhead :: Integer -> Integer -> Property
prop_StateT_lookAhead n m = checkParser (State.evalStateT p 0) (Right n) ""
  where p = do
          State.put n
          lookAhead (State.put m >> eof)
          State.get

prop_WriterT :: String -> String -> Property
prop_WriterT pre post = checkParser (Writer.runWriterT p) result "abx:"
  where logged_letter = do
          x <- letterChar
          Writer.tell [x]
          return x
        logged_colon = do
          x <- char ':'
          Writer.tell [x]
          return x
        p = do
          Writer.tell pre
          cs <- Writer.censor (map toUpper) $
                  many (try (logged_letter <* notFollowedBy logged_colon))
          Writer.tell post
          _ <- logged_letter
          _ <- logged_colon
          return cs
        result = Right ("ab", pre ++ "AB" ++ post ++ "x:")

Not claiming the last one is particularly realistic :)

I have a question: how should label be tested, if at all? I.e. can Megaparsec's new error reporting system somehow interfere with monad transformers?

from megaparsec.

mrkkrp avatar mrkkrp commented on May 31, 2024

@neongreen, There are two cases when label has effect:

  1. When parser p succeeds without consuming input and next parser n fails immediately after p without consuming input. “Label” of p will be displayed in the error message.
  2. When labeled parser fails without consuming input.

See existing prop_label for example. Although it's a bit dense in that it tests many behaviors at once, so you should not necessarily take it as base for new tests.


Also, prop_StateT has nothing to do with MonadParsec, it tests features of StateT, which we can safely assume work OK. Two other tests for StateT are OK. Please add one more to test notFollowedBy, I think it's easy. Other functions all should be tested too, preferably…

I advise you enable coverage statistics and see what else you can test. It's a good idea to test everything, although it's more than I can ask from you. Try at least test all functions in instance declaration of StateT. Also, you can have two copies of the same tests, one for lazy version of state monad and another one for strict version.

Coverage dropped because of these new changes and I think we should restore it at least on 80 % level. Initially I naively thought that these “boilerplate” definitions are not necessary to test, but after this issue with lookAhead I think everything should be tested.

from megaparsec.

mrkkrp avatar mrkkrp commented on May 31, 2024

I suppose this may be closed now.

from megaparsec.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.