parsec's Introduction

Parsec

Please refer to the package description on Hackage for more information.

A monadic parser combinator library, written by Daan Leijen. Parsec is designed from scratch as an industrial-strength parser library. It is simple, safe, well documented, has extensive libraries, good error messages, and is fast.

Some links:

By analyzing Parsec's reverse dependencies on Hackage we can find open source projects that make use of Parsec, for example bibtex, ConfigFile, csv and hjson.

Getting started

This requires a working version of cabal and ghci, which are part of any modern installation of Haskell, such as the Haskell Platform.

First install Parsec.

cabal install parsec

Below we show how a very simple parser that tests for matching parentheses was built in GHCi (the interactive GHC environment, started with the ghci command).

Prelude> :m +Text.Parsec
Prelude Text.Parsec> let parenSet = char '(' >> many parenSet >> char ')' :: Parsec String () Char
Loading package transformers-0.3.0.0 ... linking ... done.
Loading package array-0.5.0.0 ... linking ... done.
Loading package deepseq-1.3.0.2 ... linking ... done.
Loading package bytestring-0.10.4.0 ... linking ... done.
Loading package mtl-2.1.3.1 ... linking ... done.
Loading package text-1.1.1.3 ... linking ... done.
Loading package parsec-3.1.5 ... linking ... done.
Prelude Text.Parsec> let parens = (many parenSet >> eof) <|> eof
Prelude Text.Parsec> parse parens "" "()"
Right ()
Prelude Text.Parsec> parse parens "" "()(())"
Right ()
Prelude Text.Parsec> parse parens "" "("
Left (line 1, column 2):
unexpected end of input
expecting "(" or ")"

The Right () results indicate success: the parentheses matched. The Left [...] result indicates a parse failure and comes with a detailed error message.
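The same parser can also live in a source file rather than being typed into GHCi. This is a minimal sketch: the top-level type signatures, using the `Parser` synonym from `Text.Parsec.String`, do the job of the inline `:: Parsec String () Char` annotation, and the definitions are otherwise copied from the session.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- One balanced group: '(' followed by any number of nested groups, then ')'.
parenSet :: Parser Char
parenSet = char '(' >> many parenSet >> char ')'

-- Any number of balanced groups followed by end of input, as in the GHCi session.
parens :: Parser ()
parens = (many parenSet >> eof) <|> eof
```

Loading this module in GHCi and evaluating `parse parens "" "()(())"` should give `Right ()`, as in the transcript.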

For a more thorough introduction to Parsec we recommend the links at the top of this README file.

Contributing

Issues (bugs, feature requests or otherwise feedback) may be reported in the Github issue tracker for this project.

Pull-requests are also welcome.

License

See the LICENSE file in the repository.

parsec's People

Contributors

alissa-tung, aslatter, barufa, benpence, bfrengley, bgamari, bodigrim, bookshelfdave, cdepillabout, chris-martin, code5hot, creswick, daniel-diaz, derekelkins, gbaz, hdgarrood, hvr, int-index, liyishuai, michaelficarra, phadej, ret, ryanglscott, shuhei, simonvandel, sjakobi, slava-sh, talw, unkindpartition, wz1000


parsec's Issues

identifiers in Text.Parsec.Language not polymorphic enough

we have

emptyDef :: LanguageDef st 
type LanguageDef st = GenLanguageDef String st Identity
makeTokenParser :: Stream s m Char => GenLanguageDef s u m -> GenTokenParser s u m 

This means that these definitions can only be used for String parsers, so the following is ill-typed (whatever the contents of { ... }):

lexer :: GenTokenParser ByteString () Identity
lexer = makeTokenParser $ emptyDef { ... }
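Until the exported definitions are generalized, one workaround is to build the language definition directly with the `LanguageDef` record constructor at a stream-polymorphic type. The sketch below assumes that approach; `polyEmptyDef` is a hypothetical name, and its field values are intended to mirror those of `emptyDef`.

```haskell
{-# LANGUAGE FlexibleContexts #-}
import Data.Functor.Identity (Identity)
import qualified Data.ByteString.Char8 as B
import Text.Parsec
import qualified Text.Parsec.Token as P

-- Hypothetical stream-polymorphic copy of emptyDef's fields.
polyEmptyDef :: Stream s m Char => P.GenLanguageDef s u m
polyEmptyDef = P.LanguageDef
  { P.commentStart    = ""
  , P.commentEnd      = ""
  , P.commentLine     = ""
  , P.nestedComments  = True
  , P.identStart      = letter <|> char '_'
  , P.identLetter     = alphaNum <|> oneOf "_'"
  , P.opStart         = P.opLetter polyEmptyDef
  , P.opLetter        = oneOf ":!#$%&*+./<=>?@\\^|-~"
  , P.reservedOpNames = []
  , P.reservedNames   = []
  , P.caseSensitive   = True
  }

-- The declaration from the issue now type-checks with a ByteString stream.
lexer :: P.GenTokenParser B.ByteString () Identity
lexer = P.makeTokenParser polyEmptyDef
```

For example, `parse (P.identifier lexer) "" (B.pack "foo bar")` should return `Right "foo"`.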

integer doesn't fail for leading whitespace

integer doesn't behave properly for input with leading whitespace.

import Text.Parsec
import Text.Parsec.String (Parser)
import qualified Text.Parsec.Token as P
import Text.Parsec.Language (emptyDef)

lexer =
  P.makeTokenParser emptyDef

integer    = P.integer lexer
natural    = P.natural lexer

integer' :: Parser Integer
integer' = do
  f <- option id sign'
  n <- natural
  return $ f n

sign' :: Parser (Integer -> Integer)
sign' = (char '-' >> return negate)
     <|> (char '+' >> return id)

main = do
  print . (parse natural "input") $ "1"
  print . (parse integer "input") $ "1"
  print . (parse integer' "input") $ "1"

  print . (parse natural "input") $ " 1"
  print . (parse integer "input") $ " 1"
  print . (parse integer' "input") $ " 1"

  print . (parse natural "input") $ " 1"
  print . (parse integer "input") $ " +1"
  print . (parse integer' "input") $ " +1"

The primed version (integer') is provided to illustrate the expected behavior. The output of the above program is:

Right 1
Right 1
Right 1
Left "input" (line 1, column 1):
unexpected " "
expecting natural
Right 1
Left "input" (line 1, column 1):
unexpected " "
expecting "-", "+" or natural
Left "input" (line 1, column 1):
unexpected " "
expecting natural
Left "input" (line 1, column 2):
unexpected "+"
expecting digit
Left "input" (line 1, column 1):
unexpected " "
expecting "-", "+" or natural

The integer parser doesn't fail for <space>1, although it should, and it gives a misleading error message for <space>+1. By contrast, the error messages from integer' are easier to understand.
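The output is consistent with integer running its (possibly empty) sign through lexeme: when there is no sign, the whitespace after the empty sign is skipped, which is why " 1" succeeds and why " +1" reports an error at column 2. Assuming that reading of the behavior, a hypothetical guard such as `strictInteger` below rejects leading whitespace and otherwise defers to `P.integer`. It does not change the whitespace allowed between the sign and the digits.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)
import Text.Parsec.Language (emptyDef)
import qualified Text.Parsec.Token as P

lexer :: P.TokenParser ()
lexer = P.makeTokenParser emptyDef

-- Hypothetical helper: fail if the input starts with a whitespace character,
-- otherwise behave exactly like P.integer (so "+ 1" is still accepted).
strictInteger :: Parser Integer
strictInteger = notFollowedBy space >> P.integer lexer
```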

Missing Eq instance for ParseError

If you want to test a parser (e.g. with QuickCheck or HUnit) you need an Eq instance for ParseError. This comes up every now and then. Is there any good reason why we would not want an Eq instance for ParseError?
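Without such an instance, tests can compare errors through the accessors that are exported. The helper below is a hypothetical sketch that treats two errors as equal when they have the same position and the same messages. It compares both the `Message` values and their text, on the assumption that `Message`'s own `Eq` instance looks only at the kind of message.

```haskell
import Text.Parsec
import Text.Parsec.Error (Message, errorMessages, messageString)

-- Hypothetical stand-in for (==) on ParseError.
sameError :: ParseError -> ParseError -> Bool
sameError a b = errorPos a == errorPos b && msgs a == msgs b
  where
    -- Pair each message with its text so that both kind and content are compared.
    msgs :: ParseError -> [(Message, String)]
    msgs e = [ (m, messageString m) | m <- errorMessages e ]
```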

More flexible support for comments in Text.Parsec.Token

We are currently using Text.Parsec.Token for the parser in PureScript, and we are trying to find a way to handle doc string comments.

The current approach is less than ideal: we need to replicate the entire Text.Parsec.Token module and change one line inside oneLineComment to handle the | character at the start of the comment, which indicates a doc string.

I would like to submit a PR that allows the user to specify a parser instead of a String for things like commentLine and commentStart. This would be backwards-compatible in the sense that if the user only specified a String, the old approach would be used.

Before starting work on this, I just wanted to ask whether it is the sort of thing that is likely to be accepted.

the name of `notFollowedBy` is not consistent with the semantics of the other parsers

Just a small point: I'm raising this while learning Parsec, before getting used to it.

Usually a parser's name describes the input it consumes at the current position: char 'a' parses 'a', and so on.

If we read notFollowedBy with the same semantics, a parser like notFollowedBy (char '.') should successfully parse "." because "." is not followed by a dot. It should fail for "a." instead.

In this sense, a parser called notFollowedBy should probably take two parsers as arguments, like manyTill and similar combinators. I find that the current semantics of notFollowedBy are closer to a simple not.
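The current semantics read most naturally when notFollowedBy is used as a negative lookahead guard placed after another parser. A minimal sketch, with `keyword` as a hypothetical helper name:

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical helper: match a keyword only if it is not the prefix of a longer word.
-- notFollowedBy inspects what comes after the current position without consuming it.
keyword :: String -> Parser String
keyword kw = try $ do
  s <- string kw
  notFollowedBy alphaNum
  return s
```

Here `parse (keyword "let") "" "let x"` succeeds, while `"letter"` is rejected.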

Import error of parsec 3.1.11

Hi, I am trying to use parsec in a new project generated by stack new.

Here is the exact error:

   Main.hs:1:1: error:
    Failed to load interface for ‘Text.ParserCombinators.Parsec’
    It is a member of the hidden package ‘parsec-3.1.11’.
    Use -v to see a list of the files searched for.

Here is my .cabal file:

name:                haskell-scheme
version:             0.1.0.0
-- synopsis:
-- description:
homepage:            https://github.com/githubuser/haskell-scheme#readme
license:             BSD3
license-file:        LICENSE
author:              Author name here
maintainer:          [email protected]
copyright:           2017 Author name here
category:            Web
build-type:          Simple
cabal-version:       >=1.10
extra-source-files:  README.md

executable haskell-scheme
  hs-source-dirs:      src
  main-is:             Main.hs
  default-language:    Haskell2010
  build-depends:       base >= 4.7 && < 5, parsec, text

and here is my stack.yaml:

# This file was automatically generated by 'stack init'
#
# Some commonly used options have been documented as comments in this file.
# For advanced use and comprehensive documentation of the format, please see:
# http://docs.haskellstack.org/en/stable/yaml_configuration/

# Resolver to choose a 'specific' stackage snapshot or a compiler version.
# A snapshot resolver dictates the compiler version and the set of packages
# to be used for project dependencies. For example:
#
# resolver: lts-3.5
# resolver: nightly-2015-09-21
# resolver: ghc-7.10.2
# resolver: ghcjs-0.1.0_ghc-7.10.2
# resolver:
#  name: custom-snapshot
#  location: "./custom-snapshot.yaml"
resolver: lts-8.13

# User packages to be built.
# Various formats can be used as shown in the example below.
#
# packages:
# - some-directory
# - https://example.com/foo/bar/baz-0.0.2.tar.gz
# - location:
#    git: https://github.com/commercialhaskell/stack.git
#    commit: e7b331f14bcffb8367cd58fbfc8b40ec7642100a
# - location: https://github.com/commercialhaskell/stack/commit/e7b331f14bcffb8367cd58fbfc8b40ec7642100a
#   extra-dep: true
#  subdirs:
#  - auto-update
#  - wai
#
# A package marked 'extra-dep: true' will only be built if demanded by a
# non-dependency (i.e. a user package), and its test suites and benchmarks
# will not be run. This is useful for tweaking upstream packages.
packages:
- '.'
# Dependency packages to be pulled from upstream that are not in the resolver
# (e.g., acme-missiles-0.3)
extra-deps: [parsec-3.1.11]

# Override default flag values for local packages and extra-deps
flags: {}

# Extra package databases containing global packages
extra-package-dbs: []

# Control whether we use the GHC we find on the path
# system-ghc: true
#
# Require a specific version of stack, using version ranges
# require-stack-version: -any # Default
# require-stack-version: ">=1.4"
#
# Override the architecture used by stack, especially useful on Windows
# arch: i386
# arch: x86_64
#
# Extra directories used by stack for building
# extra-include-dirs: [/path/to/dir]
# extra-lib-dirs: [/path/to/dir]
#
# Allow a newer minor version of GHC than the snapshot specifies
# compiler-check: newer-minor

Parsec returns additional, spurious error messages on parserFail

Consider the following code entered in ghci:

Parsec.parse (Parsec.many1 Parsec.digit <* Parsec.parserFail "foo") "" "123"

Here is the result:

Left (line 1, column 4):
unexpected end of input
expecting digit
foo

The error messages "unexpected end of input" and "expecting digit" should not be there.

This is simplified from a more involved example in which the result of parsing a number is checked, and parserFail is called if the number is out of bounds.
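The bounds-check pattern described above can be sketched as follows. `boundedNat` is a hypothetical name; per the report, the error produced for an out-of-range number is likely to carry the same extra "expecting digit" messages, because many1 digit's final failed attempt at the same position is merged into it.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical bounds-checked number parser of the kind described above.
boundedNat :: Integer -> Parser Integer
boundedNat hi = do
  n <- read <$> many1 digit
  if n > hi
    then parserFail ("number larger than " ++ show hi)
    else return n
```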

MissingH 1.4.0.1 and parsec 3.1.12.0 are not compatible

MissingH 1.4.0.1 and parsec 3.1.12.0 are not compatible because the Text.ParserCombinators.Parsec module is no longer Safe. It used to be Safe in version 3.1.11.

This incompatibility breaks downstream packages such as Gitit (see comment by cidig on 2018-02-04).

Issue in MissingH repo: haskell-hvr/missingh#42

src/Text/ParserCombinators/Parsec/Utils.hs:33:1: error:
    Text.ParserCombinators.Parsec: Can't be safely imported!
    The module itself isn't safe.
   |
33 | import Text.ParserCombinators.Parsec
   | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Argument order in sepBy is ambiguous

Here is the type

sepBy :: Stream s m t => ParsecT s u m a -> ParsecT s u m sep -> ParsecT s u m [a]

Usually, when parsing tokens, the separator parser returns the same type of value as the value parser, i.e. type variables a and sep are the same type. This means the user can write incorrect code that still type-checks, for example:
sepBy separatorParser valueParser vs sepBy valueParser separatorParser

It means the user needs to look at the type variables to understand the order in which the arguments should be passed. But if we change the type to:

sepBy :: Stream s m t => ParsecT s u m a -> ParsecT s u m () -> ParsecT s u m [a]

Then the incorrect code above would not compile, so it would be much less likely to get the argument order wrong (unless you want [()] as the result).

The downside would be the need to add an extra $> () or *> return () at the end of the separator parser.
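The proposed type can be tried out today with a thin wrapper, which also shows the cost mentioned above. `sepByUnit` is a hypothetical name; `void` from Control.Monad stands in for `$> ()`.

```haskell
import Control.Monad (void)
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical wrapper with the stricter type proposed above: the separator must return ().
sepByUnit :: Stream s m t => ParsecT s u m a -> ParsecT s u m () -> ParsecT s u m [a]
sepByUnit = sepBy

-- Comma-separated numbers; void discards the separator's Char result.
-- Swapping the two arguments would no longer type-check.
numbers :: Parser [String]
numbers = many1 digit `sepByUnit` void (char ',')
```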

javaStyle defined as case insensitive

In Text.Parsec.Language the definition for the javaStyle is:

javaStyle  :: LanguageDef st
javaStyle   = emptyDef
                { commentStart   = "/*"
                , commentEnd     = "*/"
                , commentLine    = "//"
                , nestedComments = True
                , identStart     = letter
                , identLetter    = alphaNum <|> oneOf "_'"
                , reservedNames  = []
                , reservedOpNames= []
                , caseSensitive  = False -- <==
}

This makes javaStyle case-insensitive, which is very confusing, since most Java-style languages are in fact case-sensitive. I am not sure whether this is an oversight or whether I am misunderstanding the intention of this language definition.
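The effect is easy to see with `reserved`, and a case-sensitive variant only needs a record update. In this sketch, the two lexer names and the reserved word "class" are illustrative choices, not part of the library.

```haskell
import Text.Parsec
import Text.Parsec.Language (javaStyle)
import qualified Text.Parsec.Token as P

-- javaStyle as shipped (caseSensitive = False): reserved "class" also matches "CLASS".
looseLexer :: P.TokenParser ()
looseLexer = P.makeTokenParser javaStyle { P.reservedNames = ["class"] }

-- Hypothetical case-sensitive variant: override the flag with record update syntax.
strictLexer :: P.TokenParser ()
strictLexer = P.makeTokenParser javaStyle { P.reservedNames = ["class"], P.caseSensitive = True }
```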

Bad error handling: unhelpful "unexpected" error message when parsing set of strings

A parser like

parser :: GenParser Char st String
parser = choice $ fmap (try . string) ["head", "tail", "tales"]

with malformed input "ta " will produce an error message like unexpected "t" and point to the first position of the input. Better output would be unexpected " " (and possibly a list of expected outputs, ["tail", "tales"] in this case).

Please see this much more detailed StackOverflow answer/analysis written by CR Drost (starting with the second headline).
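Until the library reports this better, one manual workaround is to left-factor the shared prefix so that no `try` has to rewind past it. This is a sketch of that technique, not a library fix; `keywordP` is a hypothetical name. With it, the error for "ta " should point at column 3 and list "il" and "les" as expected.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical workaround: share the "ta" prefix by hand, so a failure is
-- reported after "ta" rather than at the start of the input.
keywordP :: Parser String
keywordP = choice
  [ try (string "head")
  , do _ <- string "ta"
       rest <- string "il" <|> string "les"
       return ("ta" ++ rest)
  ]
```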

Expose ParsecT constructor or implement hoisting

Basically I need a function with the following signature:

hoistParsecT :: (m a -> n a) -> (n a -> m a) -> ParsecT s u m a -> ParsecT s u n a

I could do that if the constructor of ParsecT was available, is there a reason it is not exported?
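Assuming a parsec version whose Text.Parsec.Prim exports `mkPT` and `runParsecT`, a hoist can be sketched without the constructor. Unlike the signature requested above, this sketch needs only one direction, but that function must be polymorphic in its argument (hence RankNTypes). It has not been checked for laziness or performance differences compared with the original parser.

```haskell
{-# LANGUAGE RankNTypes #-}
import Data.Functor.Identity (Identity (..))
import Text.Parsec
import Text.Parsec.Prim (mkPT, runParsecT)

-- Sketch: run the parser's step in m, move the result into n with nat, and also
-- map nat over the inner m action carried inside Consumed.
hoistParsecT :: (Monad m, Monad n) => (forall x. m x -> n x) -> ParsecT s u m a -> ParsecT s u n a
hoistParsecT nat p = mkPT $ \st -> fmap (fmap nat) (nat (runParsecT p st))
```

For example, `hoistParsecT (return . runIdentity)` turns a pure `Parsec String () a` into a parser over IO that can be run with `runParserT`.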

parser never returns

This is a primitive CSV parser that skips intermediate blank lines:

parser :: Stream s m Char => ParsecT s u m [[String]]
parser = line `sepEndBy` (some endOfLine) <* eof
  where line = many (noneOf ",\n\r") `sepBy` char ','

and it seems to work:

> parse parser "" "1,2,3\n\n\n4,5,6"
Right [["1","2","3"],["4","5","6"]]

but if I change some to many:

parser :: Stream s m Char => ParsecT s u m [[String]]
parser = line `sepEndBy` (many endOfLine) <* eof
  where line = many (noneOf ",\n\r") `sepBy` char ','

it will never return:

> parse parser "" "1,2,3\n\n\n4,5,6"
^CInterrupted.
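A likely explanation: sepEndBy alternates the row parser and the separator. Both `line` and `many endOfLine` can succeed without consuming anything, so once the real rows are exhausted, sepEndBy keeps matching an empty row and zero newlines at the same position and never makes progress. With `some`, the separator must consume at least one newline, which ends the loop. The sketch below only demonstrates the empty matches; it is not a fix.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- The `line` parser from the issue, given a concrete type.
line :: Parser [String]
line = many (noneOf ",\n\r") `sepBy` char ','

-- Both pieces succeed on empty input without consuming anything:
--   parse line "" ""             gives Right [""]
--   parse (many endOfLine) "" "" gives Right []
```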

Why does Parsec change the fixity of `<|>`?

In Control.Applicative, <|> is defined as having fixity infixl 3, but in Parsec it is defined as infixr 1. Is there a reason for this difference? Should Parsec be changed to match?
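One practical consequence: `>>` is infixl 1, so under Parsec's infixr 1 an unparenthesized `char 'a' >> char 'b' <|> char 'c'` mixes a left- and a right-associative operator of the same precedence, which GHC rejects as a fixity error. With Control.Applicative's infixl 3, the same expression would group as `char 'a' >> (char 'b' <|> char 'c')`. With Parsec's fixity the grouping has to be written out, as in this small sketch (`aThenBOrC` is a hypothetical name).

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- 'a' followed by either 'b' or 'c'; the parentheses are required with infixr 1 <|>.
aThenBOrC :: Parser Char
aThenBOrC = char 'a' >> (char 'b' <|> char 'c')
```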

Applicative interface leaks memory

Using *> instead of >> makes parsers use more memory. I suggest adding these definitions to the Applicative instance for ParsecT:

(*>) = (>>)
p1 <* p2 = do { x1 <- p1 ; p2 ; return x1 }

Below is a program that demonstrates the problem. With the current Parsec behaviour, you get the following memory profile: lots of heap space is used, but when you force the output of the parser most of it goes away.
[Heap profile: appleak-1]
Using >> instead of *> (i.e. what you should get with the suggested change) gives a nicer profile. (Note the change in the vertical scale.)
[Heap profile: appleak-2]

Here is the program that generates the profiles.

import Control.Applicative ((<*), (*>), (<$>))
import Control.Monad
import System.Environment
import Text.Parsec
import Text.Parsec.String

-- Instructions to reproduce graphs:
--
-- First, load this file in ghci and run "generateData".  This will
-- create a file called "ab-input".  Then:
--
--   ghc -O AppLeak -prof -fprof-auto -rtsopts
--   ./AppLeak 1 +RTS -h -i0.03 -RTS < ab-input
--   hp2ps -c AppLeak.hp && mv AppLeak.ps AppLeak-1.ps
--   ./AppLeak 2 +RTS -h -i0.03 -RTS < ab-input
--   hp2ps -c AppLeak.hp && mv AppLeak.ps AppLeak-2.ps
--
-- AppLeak-1.ps shows the heap profile with the leak; AppLeak-2.ps
-- shows the well-behaved profile.
--
--
-- Adding explicit definitions of <* and *> to the Applicative instance
-- for ParsecT will fix the problem:
--
--    instance Applicative.Applicative (ParsecT s u m) where
--        pure = return
--        (<*>) = ap -- TODO: Can this be optimized?
--        (*>) = (>>)
--        p1 <* p2 = do { x1 <- p1 ; p2 ; return x1 }


-- Some whitespace, followed by one or more 'a's, then one 'b'.
ab :: Parser Char
ab = spaces *> many1 (char 'a') *> char 'b'

-- Same as "ab" above, but with ">>" instead of "*>".
ab2 :: Parser Char
ab2 = spaces >> many1 (char 'a') >> char 'b'

main = do
  args <- getArgs
  selection <-
    if length args > 0 && args !! 0 == "2"
    then putStrLn "using monadic >>" >> return ab2
    else putStrLn "using applicative *>" >> return ab
  c <- getContents
  case parse (many selection <* eof) "" c of
    Left e -> print e
    Right xs -> do
         putStrLn "Parsing finished, forcing result..."
         print $ length $ show xs
         putStrLn "Delaying..."
         delay
         -- We do this again so that "xs" doesn't get garbage
         -- collected before the delay.
         print $ length $ show xs

-- Busy-wait to make the heap profile graph clearer.
delay = print $ fib 32
fib 0 = 0
fib 1 = 1
fib n = fib (n-1) + fib (n-2)

-- Generate some sample data.
generateData = writeFile "ab-input" (concat (replicate 100000 "aabaabababab"))

ByteString parser improvements

The ByteString parser seems unusable for binary data because there are no parser functions that operate on raw bytes. When we operate on strings of chars we have useful parsers like "oneOf", "digit" and "spaces". It would be nice to have something like "zeros" (for bytes filled with zeros), "bigEndianInt32" and "byte" to consume bytes from the ByteString stream and convert the results into appropriate data types. Something similar was done in "HCodecs" (and in many other places where people handle specific file formats). Parsec would be a more natural place to look for such general convenience functions.
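Parsers of this kind can already be written on top of the existing ByteString stream, which yields Char8 characters (code points 0 to 255). The sketch below is hypothetical: `byte` and `word32BE` are illustrative names, not part of parsec.

```haskell
import Data.Bits (shiftL, (.|.))
import Data.Char (ord)
import Data.Word (Word32, Word8)
import qualified Data.ByteString.Char8 as B
import Text.Parsec
import Text.Parsec.ByteString (Parser)

-- Hypothetical: one raw byte. Each Char from the Char8 stream stands for one byte.
byte :: Parser Word8
byte = fromIntegral . ord <$> anyChar

-- Hypothetical: a 32-bit unsigned integer stored big-endian (most significant byte first).
word32BE :: Parser Word32
word32BE = foldl step 0 <$> count 4 byte
  where
    step acc b = (acc `shiftL` 8) .|. fromIntegral b
```

For example, `parse word32BE "" (B.pack "\0\0\1\2")` should give `Right 258`.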

lookAhead and error propagation

lookAhead p is a parser that calls p and, if p succeeds, lookAhead p succeeds too, but it also rewinds the input stream to where it was before p was tried. This was implemented in 3839639 by @feuerbach. I claim that this is not enough: lookAhead p should also discard any error messages generated by p. This is best illustrated with an example. Suppose that we have the following:

p = lookAhead (many1 $ char 'a') >> char 'b'

Then parse p "" "a" should fail with the following error message:

Left (line 1, column 1):
unexpected "a"
expecting "b"

but currently it fails with the following:

Left (line 1, column 2):
unexpected end of input
expecting "a"

In other words, if p succeeds, then lookAhead p should behave as if p never occurred, both in terms of the parser state and in terms of the errors generated.

Should include a Monoid instance

The following is useful:

instance Monoid a => Monoid (ParsecT s u m a) where
    mempty = pure mempty
    a `mappend` b = mappend <$> a <*> b

It allows me to write things like this:

time :: Parser TimeOfDay
time = parseTimeM False defaultTimeLocale "%H:%M:%S" =<<
    count 2 digit <> string ":" <> count 2 digit <> string ":" <> count 2 digit

Please consider including this instance.

I would submit a PR but you probably want to choose where the instance is defined.
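Without the instance, the same String can be assembled with `sequence` and `concat`, and the result could then be passed to parseTimeM just as in the example above. A sketch, with `hms` as a hypothetical name:

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Run the pieces in order with sequence, then join their String results.
-- This needs no Monoid instance for ParsecT.
hms :: Parser String
hms = concat <$> sequence [count 2 digit, string ":", count 2 digit, string ":", count 2 digit]
```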

Derive a Data instance for more types

The Data class is, while not ubiquitous, quite useful in a variety of settings. It would be elementary to add a deriving Data clause to (at least) the types for which Typeable is already derived. Would you accept a pull request doing so?

It seems like it would also be a good idea to derive Generic for most types.

conflict between infix and prefix ops in `Text.Parsec.Expr`

The function buildExpressionParser in module Text.Parsec.Expr has a bug.
In the uncommon case where a prefix operator has lower precedence than an infix operator, such as in boolean expressions, e.g.:

  • && is an infix operator.
  • ! is a prefix operator of lower precedence.

The input !a && b is correctly parsed as Not (And a b), but the parser fails to parse a && !b (as And a (Not b)).
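A minimal reproduction, assuming single-letter variables and no whitespace to keep it small (`BExpr`, `table` and `expr` are illustrative names). In an operator table, earlier rows bind tighter, so && is placed above the prefix !.

```haskell
import Data.Functor.Identity (Identity)
import Text.Parsec
import Text.Parsec.Expr
import Text.Parsec.String (Parser)

data BExpr = Var Char | Not BExpr | And BExpr BExpr
  deriving Show

-- Row 1 (&&) binds tighter than row 2 (prefix !).
table :: OperatorTable String () Identity BExpr
table =
  [ [Infix (And <$ try (string "&&")) AssocLeft]
  , [Prefix (Not <$ char '!')]
  ]

expr :: Parser BExpr
expr = buildExpressionParser table (Var <$> letter)
```

Here `parse expr "" "!a&&b"` yields `Not (And (Var 'a') (Var 'b'))`, while `parse expr "" "a&&!b"` fails, because the right operand of && is parsed at the && level, where ! is not available.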

Allow more expressive functions to be given as input in operator tables for `buildExpressionParser`

Currently, the Operator constructors used for building expression parsers are required to be pure functions wrapped in the ParsecT monad, i.e.:

Infix (ParsecT s u m (a -> a -> a)) Assoc    
Prefix (ParsecT s u m (a -> a))  
Postfix (ParsecT s u m (a -> a))

This, however, limits what can be done with the operators. For example, it is not possible to perform additional checks on the parsed arguments in the ParsecT monad and then fail using fail or unexpected, or to build up new information in the complete term from the parsed sub-terms.

I would therefore like to request a feature that accepts more expressive functions, where only the resulting term needs to be in the ParsecT monad. To avoid breaking backwards compatibility, each new constructor could get an M suffix, so the following constructors would be added:

InfixM (a -> a -> ParsecT s u m a) Assoc     
PrefixM (a -> ParsecT s u m a)   
PostfixM (a -> ParsecT s u m a)

`decimal`, `hexadecimal`, and `octal` parsers should consume trailing whitespace

Since these are lexemes (or at least, they should be), they should behave like lexemes: consume trailing whitespace (but not leading whitespace). Currently they don't, and I find this confusing. If you decide to fix it, be careful: the definitions of these parsers are used to define other parsers, so it is easy to introduce new bugs.

The descriptions of these parsers carefully avoid calling them lexemes, but this is not enough. If you want to preserve the current behavior, you should explicitly state that they are not lexemes and, unlike all other members of GenTokenParser, do not consume trailing whitespace.
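In user code, the missing behavior can be recovered by wrapping these parsers in the token parser's own `lexeme`. A sketch, where the `...L` names are hypothetical:

```haskell
import Text.Parsec
import Text.Parsec.Language (emptyDef)
import Text.Parsec.String (Parser)
import qualified Text.Parsec.Token as P

lexer :: P.TokenParser ()
lexer = P.makeTokenParser emptyDef

-- Hypothetical lexeme versions: run the parser, then skip trailing whitespace,
-- as the other GenTokenParser members do.
decimalL, hexadecimalL, octalL :: Parser Integer
decimalL     = P.lexeme lexer (P.decimal lexer)
hexadecimalL = P.lexeme lexer (P.hexadecimal lexer)
octalL       = P.lexeme lexer (P.octal lexer)
```

With this, `parse (decimalL <* eof) "" "42  "` succeeds, whereas the same input with plain `P.decimal lexer` fails at the trailing space.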

Update readme

Hi,
the "Getting started" section in the Readme seems to need an update.
When trying to instantiate let parenSet = char '(' >> many parenSet >> char ')', I only get an error Non type-variable argument in the constraint: Stream s m Char (Use FlexibleContexts to permit this). I needed to append :: Parsec String () Char to make it work.

Also, can you explain what the difference between (many parenSet >> eof) <|> eof and just many parenSet >> eof is? I cannot seem to get different output on any examples I tried.

It's pretty hard to make a lexer using Data.Text

Although the parser itself is fairly easy to use with Text from Data.Text, the Text.Parsec.Token library is rather hard to use for creating a lexer over Text.

I'm not yet sure how to make the change (I'd love to be guided through it), but it would be wonderful if we could somehow parameterize either makeTokenParser or emptyDef so that it is easier to create a lexer that handles the Text type rather than String from the Prelude.

I would love for someone to correct me if I'm wrong and this is doable with the current library, but at the same time I feel it could be a reason someone might look elsewhere. An easy way of creating a Text-consuming lexer could be a boon to Parsec.

Cut a release for GHC 8.4

The GHC 8.4.1 release is quickly approaching and I would like to have all of the submodules finalized by the next alpha. For this we'll need to cut a new release.

Why sourcePos has line and column, but no token number?

The SourcePos data structure seems to expose line number and column number, which is great for error messages, but it'd be really nice to also have the character number in the file. This would enable things like the combinator discussed here.

Is there a reason this doesn't exist? Would it be hard to add? Is there a better way to write that combinator?
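One workaround within the current API is to measure consumption from the remaining input rather than from SourcePos. The helper below is a hypothetical sketch for String input; `length` walks the whole remaining input, so each call costs time proportional to what is left.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical: count how many characters p consumed by comparing the
-- remaining input before and after running it.
withConsumed :: Parser a -> Parser (Int, a)
withConsumed p = do
  before <- getInput
  x <- p
  after <- getInput
  return (length before - length after, x)
```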

Parsers in Text.Parsec.Char accept non-ascii letter characters

The built-in parsers letter, lower, upper, and alphaNum accept non-ascii letter characters, such as greek letters and letters with accents. For example:

parseTest lower "π" -- prints '\960'
parseTest upper "Å" -- prints '\197'

This is not necessarily a bug, but it is an inaccuracy in the documentation. The docs claim that lower accepts characters "between 'a' and 'z'", that upper accepts characters "between 'A' and 'Z'", etc.

Parsec inherits its support for non-ascii letters from predicate functions defined in Data.Char, such as isLower and isUpper.
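Where ASCII-only behaviour is wanted, `satisfy` with the ASCII predicates from Data.Char gives it directly. The names below are hypothetical:

```haskell
import Data.Char (isAsciiLower, isAsciiUpper)
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical ASCII-only counterparts of lower and upper.
asciiLower :: Parser Char
asciiLower = satisfy isAsciiLower <?> "lowercase ASCII letter"

asciiUpper :: Parser Char
asciiUpper = satisfy isAsciiUpper <?> "uppercase ASCII letter"
```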

Off-by-one error in Token charControl

In Text/Parsec/Token.hs we have the default parsing of control characters in literals:

charControl = do{ char '^'
                    ; code <- upper
                    ; return (toEnum (fromEnum code - fromEnum 'A'))
                    }

For example, the string literal "hello^Zworld" should parse the same as "hello\x1Aworld"; that is, ^Z stands for ASCII character 26.

Similarly, the other letters stand for the corresponding control characters: ^A for ASCII 1, ^B for ASCII 2, and so on.

However, there is an off-by-one error here: for ^A, toEnum (fromEnum 'A' - fromEnum 'A') is 0, not 1. This should probably be toEnum (fromEnum code - fromEnum 'A' + 1).
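A standalone version of the parser with the proposed +1 applied can be checked directly. This is a sketch: in parsec the original is a local definition inside the token parser, so the top-level `charControl` here is a copy, and like the original it uses `upper`.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Standalone copy of charControl with the proposed fix:
-- ^A maps to '\1', ..., ^Z maps to '\26'.
charControl :: Parser Char
charControl = do
  _ <- char '^'
  code <- upper
  return (toEnum (fromEnum code - fromEnum 'A' + 1))
```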

Add pTrace function

Here's a basic tool I hacked up back in the day to help in debugging parsers, and I've found it indispensable.

@pchiusano added a wrapper to show backtracking too.

I don't have a strong opinion on where it lands in the parsec package, but it seems that a few people have now found it useful and it only really exists as folklore. It would be nice to add it to the parsec package proper in an appropriate location:

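import Debug.Trace (trace)
import Text.Parsec
import Text.Parsec.String (Parser)

-- Print the label and the remaining input (if any), then succeed without consuming anything.
pTrace :: String -> Parser ()
pTrace s = pt <|> return ()
    where pt = try $
               do
                 x <- try $ many1 anyChar
                 trace (s++": " ++x) $ try $ char 'z'
                 fail x

-- Trace on entry; trace again if p fails without consuming input.
pTraced :: String -> Parser a -> Parser a
pTraced s p = do
  pTrace s
  p <|> trace (s ++ " backtracked") (fail s)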

Documentation for alphaNum is wrong

alphaNum claims:

Parses a letter or digit (a character between '0' and '9') according to isAlphaNum. Returns the parsed character.

However isAlphaNum says:

Selects alphabetic or numeric digit Unicode characters.

Note that numeric digits outside the ASCII range are selected by this function but not by isDigit. Such digits may be part of identifiers but are not used by the printer and reader to represent numbers.
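A concrete case of the mismatch: U+0663 (ARABIC-INDIC DIGIT THREE) is numeric according to isAlphaNum but is not an ASCII digit. The binding names below are illustrative.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- U+0663 ARABIC-INDIC DIGIT THREE.
arabicThree :: String
arabicThree = "\x0663"

-- alphaNum accepts it (via isAlphaNum), while digit, which uses isDigit, rejects it.
viaAlphaNum, viaDigit :: Either ParseError Char
viaAlphaNum = parse alphaNum "" arabicThree
viaDigit    = parse digit "" arabicThree
```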

`notFollowedBy` and parsers which don't consume input.

Currently notFollowedBy always succeeds with parsers that don't consume input:

-- This parser succeeds.
> parseTest (lookAhead (string "a")) "abc"
"a"

-- Therefore this parser should fail – but it doesn't.
> parseTest (notFollowedBy (lookAhead (string "a"))) "abc"
()

Is this bug old enough to be considered a feature? (Even if so, this behavior should probably be documented.) If not, here's a version that works (but no idea how much slower, if at all):

import Control.Monad (join)
import Text.Parsec

notFollowedBy' :: (Stream s m t, Show a) => ParsecT s u m a -> ParsecT s u m ()
notFollowedBy' p = try $ join $
      do {a <- try p; return (unexpected (show a));}
  <|> return (return ())

Re-publish & update old parsec manual

Extracted from #22.

@gbaz:

It would actually be very good if we could also pull the documentation from the homepage (as hosted by web.archive.org: http://web.archive.org/web/20140329210442/http://legacy.cs.uu.nl/daan/download/parsec/parsec.html), port the parts that aren't just haddocks into a new parsec documentation module or the like, and put some very simple "getting started" bits from them at the top of the main Text.Parsec module (having updated them to work with the latest parsec).

Additionally there should be some top level documentation in the package indicating that that is the main entry point, and also which modules only remain for compatibility...

@aslatter:

I haven't heard from Daan Leijen that it would be okay to copy his text verbatim elsewhere, so anything pulled in to the package description or module documentation would need to be new text.

@gbaz:

did he just not reply to your email? did you try with a relatively recent address such as that in this paper (http://research.microsoft.com/pubs/210640/paper.pdf) ?

I hope we can get in touch with him, because it would be much easier to do things that way, and I can't imagine he would object, even if it is slow to get a response and get his approval...

@aslatter:

Yeah, that's the email I tried. It was a ways back.

Token parser consumes newlines

As described in this SO question, many of the token parser functions consume newlines, which is often not the desired behaviour. It's also not easy, without copying most of the code, to change that. It'd be nice if this behaviour were configurable!
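Outside Text.Parsec.Token, one workaround is a hand-rolled lexeme that skips only spaces and tabs, so that newlines remain in the input for the grammar to handle. This does not make the token parser configurable; `lexemeH` and `wordsLine` are hypothetical names.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical lexeme that skips only spaces and tabs after p.
lexemeH :: Parser a -> Parser a
lexemeH p = p <* skipMany (oneOf " \t")

-- Hypothetical: one line of space-separated words, terminated by a newline.
wordsLine :: Parser [String]
wordsLine = many (lexemeH (many1 letter)) <* endOfLine
```

For example, `parse (many wordsLine) "" "ab cd\nef\n"` keeps the line structure and should give `Right [["ab","cd"],["ef"]]`.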

Regression in parsec-3.1.6

Consider

import Control.Applicative ((<*), (<$>), (<$))
import Text.Parsec
import Text.Parsec.Language (haskellStyle)
import Text.Parsec.String (Parser)
import Text.Parsec.Expr
import qualified Text.Parsec.Token as P

data Expr = Const Integer | Op Expr Expr
  deriving Show

{-------------------------------------------------------------------------------
  Syntax analysis
-------------------------------------------------------------------------------}

parseTopLevel :: Parser Expr
parseTopLevel = parseExpr <* eof

parseExpr :: Parser Expr
parseExpr = buildExpressionParser table (Const <$> integer)
  where
    table = [[ Infix (Op <$ reserved ">>>") AssocLeft ]]

{-------------------------------------------------------------------------------
  Lexical analysis
-------------------------------------------------------------------------------}

lexer = P.makeTokenParser haskellStyle { P.reservedOpNames = [">>>"] }

integer    = P.integer    lexer
reserved   = P.reserved   lexer
reservedOp = P.reservedOp lexer

with parsec-3.1.5 and below we get

*Main> parseTest parseTopLevel "4 >> 5"
parse error at (line 1, column 3):
unexpected '>'
expecting operator or end of input

but with parsec-3.1.6 we get

*Main> parseTest parseTopLevel "4 >> 5"
parse error at (line 1, column 5):
unexpected " "
expecting operator

Alternative formulation of Stream

Here is what kind of Stream I came to independently, when I was playing with purescript-parsing:

class Stream f c | f -> c where
  uncons :: f -> Maybe { head :: c, tail :: f, updatePos :: Position -> Position }
  stripPrefix :: Prefix f -> f -> Maybe {  rest :: f, updatePos :: Position -> Position }

class HasUpdatePosition a where
  updatePos :: Position -> a -> Position

newtype Prefix a = Prefix a


-- example implementation for list
instance (Eq a, HasUpdatePosition a) => Stream (List.List a) a where
  uncons f = L.uncons f <#> \({ head, tail}) ->
    { head, tail, updatePos: (_ `updatePos` head)}
  stripPrefix (Prefix p) s = List.stripPrefix (List.Pattern p) s <#> \rest ->
    { rest, updatePos: unwrap (fold (p <#> (flip updatePos >>> Endo)))}

purescript-contrib/purescript-parsing#62

I think this formulation is nicer because you don't need to carry an updatePosition function around, and I don't see why you need the m in uncons either.

I would like your thoughts on this formulation.
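For comparison with Parsec's own `Stream` class, the uncons half of the proposal can be rendered in Haskell roughly as follows. This is a hypothetical translation: the PureScript record is replaced by a tuple, stripPrefix is omitted, and the `Position` type with its newline handling is an assumption made for the example.

```haskell
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE FunctionalDependencies #-}

-- Hypothetical position type: line and column, both starting at 1.
data Position = Position { posLine :: Int, posColumn :: Int }
  deriving (Eq, Show)

class HasUpdatePosition a where
  updatePos :: Position -> a -> Position

-- uncons returns the token, the rest of the stream, and how to advance the position,
-- so no separate position-update function has to be passed around.
class Stream f c | f -> c where
  uncons :: f -> Maybe (c, f, Position -> Position)

instance HasUpdatePosition Char where
  updatePos (Position l _) '\n' = Position (l + 1) 1
  updatePos (Position l c) _    = Position l (c + 1)

instance HasUpdatePosition a => Stream [a] a where
  uncons []       = Nothing
  uncons (x : xs) = Just (x, xs, (`updatePos` x))
```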
