haskell / parsec
A monadic parser combinator library
Home Page: https://hackage.haskell.org/package/parsec
License: Other
Currently, the Operator constructors used for building expression parsers are required to be pure functions wrapped in the ParsecT monad, i.e.:
Infix (ParsecT s u m (a -> a -> a)) Assoc
Prefix (ParsecT s u m (a -> a))
Postfix (ParsecT s u m (a -> a))
This, however, limits what can be done with the operators. For example, it is not possible to perform additional checks on the parsed argument in the ParsecT monad and then fail using fail or unexpected, or to build up new information in the complete term using the parsed sub-terms.
I would therefore like to request a feature that allows more expressive functions, where only the resulting term needs to be in the ParsecT monad. In order not to break backwards compatibility, one could imagine adding an M suffix to each new constructor, so that the following constructors are added:
InfixM (a -> a -> ParsecT s u m a) Assoc
PrefixM (a -> ParsecT s u m a)
PostfixM (a -> ParsecT s u m a)
We are currently using Text.Parsec.Token for the parser in PureScript, and we are trying to find a way to handle doc string comments.
The current approach is less than ideal: we need to replicate the entire Text.Parsec.Token module and change one line inside oneLineComment to handle the | character at the start of the comment, which indicates a doc string.
I would like to submit a PR which allows the user to specify a parser instead of a String for things like commentLine and commentStart. This would be backwards-compatible in the sense that if the user only specified a String, the old approach would be used.
I just wanted to ask, before starting work on this, if it is the sort of thing that is likely to be accepted.
There may be a bug in Text.Parsec.Token.float. Please see this SO question for a comprehensive description:
http://stackoverflow.com/questions/29820870/floating-point-numbers-precision-and-parsec
Here's a basic tool I hacked up back in the day to help in debugging parsers, and I've found it indispensable.
@pchiusano added a wrapper to show backtracking too.
I don't have a strong opinion on where it lands in the parsec package, but it seems that a few people have now found it useful and it only really exists as folklore. It would be nice to add it to the parsec package proper in an appropriate location:
pTrace s = pt <|> return ()
    where pt = try $
               do
                 x <- try $ many1 anyChar
                 trace (s++": " ++x) $ try $ char 'z'
                 fail x
pTraced s p = do
  pTrace s
  p <|> trace (s ++ " backtracked") (fail s)
Hi,
the "Getting started" section in the Readme seems to need an update.
When trying to define let parenSet = char '(' >> many parenSet >> char ')', I only get the error Non type-variable argument in the constraint: Stream s m Char (Use FlexibleContexts to permit this). I needed to append :: Parsec String () Char to make it work.
Also, can you explain what the difference between (many parenSet >> eof) <|> eof and just many parenSet >> eof is? I cannot seem to get different output on any examples I tried.
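A minimal sketch of the annotated parser described above, assuming the input is a plain String. The names parens and balanced are hypothetical helpers introduced only for illustration. Since many already succeeds when it matches nothing, the extra <|> eof alternative does not appear to change the outcome in this sketch.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- The explicit signature fixes the stream type, so GHC no longer needs
-- FlexibleContexts to infer a constraint such as "Stream s m Char".
parenSet :: Parser Char
parenSet = char '(' >> many parenSet >> char ')'

-- 'many' accepts zero occurrences, so empty input is handled without
-- a separate "<|> eof" branch.
parens :: Parser ()
parens = many parenSet >> eof

-- Hypothetical helper: True when the whole input is balanced parentheses.
balanced :: String -> Bool
balanced = either (const False) (const True) . parse parens ""
```

In GHCi, balanced can be applied to a few strings to compare behaviour with the (many parenSet >> eof) <|> eof variant.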
The GHC 8.4.1 release is quickly approaching and I would like to have all of the submodules finalized by the next alpha. For this we'll need to cut a new release.
The following is useful:
instance Monoid a => Monoid (ParsecT s u m a) where
    mempty = pure mempty
    a `mappend` b = mappend <$> a <*> b
It allows me to write things like this:
time :: Parser TimeOfDay
time = parseTimeM False defaultTimeLocale "%H:%M:%S" =<<
    count 2 digit <> string ":" <> count 2 digit <> string ":" <> count 2 digit
Please consider including this instance.
I would submit a PR but you probably want to choose where the instance is defined.
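For readers who want the same style without adding an instance, here is a small sketch that uses a locally defined operator instead. The operator <+> and the parser hhmmss are hypothetical names chosen for this example; they are not part of parsec.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical operator: run both parsers in sequence and append their
-- results, which is what the proposed Monoid instance would do via mappend.
(<+>) :: Monoid a => Parser a -> Parser a -> Parser a
a <+> b = mappend <$> a <*> b
infixr 6 <+>

-- Collects the raw "HH:MM:SS" text; converting it to a time value is left out.
hhmmss :: Parser String
hhmmss = count 2 digit <+> string ":" <+> count 2 digit <+> string ":" <+> count 2 digit
```

The design trade-off is that a local operator avoids an orphan instance, at the cost of not composing with generic Monoid code such as mconcat.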
And now I can only find Parsec: Direct Style Monadic Parser Combinators for the Real World instead of Parsec, a fast combinator parser.
What's the difference between them? Which one should I read first if I want to understand Parsec ?
In particular I was looking for a parser for qualified Haskell identifiers and operators.
Hi!
I'm a wee baby Haskell programmer, but I noticed that the example at the bottom here:
https://hackage.haskell.org/package/parsec-3.1.9/docs/Text-Parsec-Expr.html
Doesn't seem to type check. I don't think the types of parens / reservedOp are agreeing here. Perhaps they changed?
https://hackage.haskell.org/package/parsec-3.1.9/docs/Text-Parsec-Token.html#v:reservedOp
Perhaps the example needs an update, or more clarification?
Thanks!
Hi, I am trying to use parsec in a new project generated by stack new.
Here is the exact error:
Main.hs:1:1: error:
Failed to load interface for ‘Text.ParserCombinators.Parsec’
It is a member of the hidden package ‘parsec-3.1.11’.
Use -v to see a list of the files searched for.
Here is my .cabal file:
name: haskell-scheme
version: 0.1.0.0
-- synopsis:
-- description:
homepage: https://github.com/githubuser/haskell-scheme#readme
license: BSD3
license-file: LICENSE
author: Author name here
maintainer: [email protected]
copyright: 2017 Author name here
category: Web
build-type: Simple
cabal-version: >=1.10
extra-source-files: README.md
executable haskell-scheme
  hs-source-dirs:      src
  main-is:             Main.hs
  default-language:    Haskell2010
  build-depends:       base >= 4.7 && < 5, parsec, text
and my stack.yaml:
# This file was automatically generated by 'stack init'
#
# Some commonly used options have been documented as comments in this file.
# For advanced use and comprehensive documentation of the format, please see:
# http://docs.haskellstack.org/en/stable/yaml_configuration/
# Resolver to choose a 'specific' stackage snapshot or a compiler version.
# A snapshot resolver dictates the compiler version and the set of packages
# to be used for project dependencies. For example:
#
# resolver: lts-3.5
# resolver: nightly-2015-09-21
# resolver: ghc-7.10.2
# resolver: ghcjs-0.1.0_ghc-7.10.2
# resolver:
# name: custom-snapshot
# location: "./custom-snapshot.yaml"
resolver: lts-8.13
# User packages to be built.
# Various formats can be used as shown in the example below.
#
# packages:
# - some-directory
# - https://example.com/foo/bar/baz-0.0.2.tar.gz
# - location:
# git: https://github.com/commercialhaskell/stack.git
# commit: e7b331f14bcffb8367cd58fbfc8b40ec7642100a
# - location: https://github.com/commercialhaskell/stack/commit/e7b331f14bcffb8367cd58fbfc8b40ec7642100a
# extra-dep: true
# subdirs:
# - auto-update
# - wai
#
# A package marked 'extra-dep: true' will only be built if demanded by a
# non-dependency (i.e. a user package), and its test suites and benchmarks
# will not be run. This is useful for tweaking upstream packages.
packages:
- '.'
# Dependency packages to be pulled from upstream that are not in the resolver
# (e.g., acme-missiles-0.3)
extra-deps: [parsec-3.1.11]
# Override default flag values for local packages and extra-deps
flags: {}
# Extra package databases containing global packages
extra-package-dbs: []
# Control whether we use the GHC we find on the path
# system-ghc: true
#
# Require a specific version of stack, using version ranges
# require-stack-version: -any # Default
# require-stack-version: ">=1.4"
#
# Override the architecture used by stack, especially useful on Windows
# arch: i386
# arch: x86_64
#
# Extra directories used by stack for building
# extra-include-dirs: [/path/to/dir]
# extra-lib-dirs: [/path/to/dir]
#
# Allow a newer minor version of GHC than the snapshot specifies
# compiler-check: newer-minor
A parser like
parser :: GenParser Char st String
parser = choice $ fmap (try . string) ["head", "tail", "tales"]
with malformed input "ta " will produce an error message like unexpected "t" and point to the first position of the input. Better output would be unexpected " " (and possibly a list of expected inputs, ["tail", "tales"] in this case).
Please see this much more detailed StackOverflow answer/analysis written by CR Drost (starting with the second headline).
On Hackage, there are no generated docs: https://hackage.haskell.org/package/parsec
Using *> instead of >> makes parsers use more memory. I suggest adding these definitions to the Applicative instance for ParsecT:
(*>) = (>>)
p1 <* p2 = do { x1 <- p1 ; p2 ; return x1 }
Below is a program that demonstrates the problem. With the current Parsec behaviour, you get this memory profile. You can see that lots of heap space is used, but when you force the output of the parser most of it goes away.
Using >> instead of *> (i.e. what you should get with the suggested change) gives this nicer profile. (Note the change in the vertical scale.)
Here is the program that generates the profiles.
import Control.Applicative ((<*), (*>), (<$>))
import Control.Monad
import System.Environment
import Text.Parsec
import Text.Parsec.String
-- Instructions to reproduce graphs:
--
-- First, load this file in ghci and run "generateData". This will
-- create a file called "ab-input". Then:
--
-- ghc -O AppLeak -prof -fprof-auto -rtsopts
-- ./AppLeak 1 +RTS -h -i0.03 -RTS < ab-input
-- hp2ps -c AppLeak.hp && mv AppLeak.ps AppLeak-1.ps
-- ./AppLeak 2 +RTS -h -i0.03 -RTS < ab-input
-- hp2ps -c AppLeak.hp && mv AppLeak.ps AppLeak-2.ps
--
-- AppLeak-1.ps shows the heap profile with the leak; AppLeak-2.ps
-- shows the well-behaved profile.
--
--
-- Adding explicit definitions of <* and *> to the Applicative instance
-- for ParsecT will fix the problem:
--
--   instance Applicative.Applicative (ParsecT s u m) where
--       pure = return
--       (<*>) = ap -- TODO: Can this be optimized?
--       (*>) = (>>)
--       p1 <* p2 = do { x1 <- p1 ; p2 ; return x1 }
-- Some whitespace, followed by one or more 'a's, then one 'b'.
ab :: Parser Char
ab = spaces *> many1 (char 'a') *> char 'b'
-- Same as "ab" above, but with ">>" instead of "*>".
ab2 :: Parser Char
ab2 = spaces >> many1 (char 'a') >> char 'b'
main = do
  args <- getArgs
  selection <-
    if length args > 0 && args !! 0 == "2"
      then putStrLn "using monadic >>" >> return ab2
      else putStrLn "using applicative *>" >> return ab
  c <- getContents
  case parse (many selection <* eof) "" c of
    Left e -> print e
    Right xs -> do
      putStrLn "Parsing finished, forcing result..."
      print $ length $ show xs
      putStrLn "Delaying..."
      delay
      -- We do this again so that "xs" doesn't get garbage
      -- collected before the delay.
      print $ length $ show xs
-- Busy-wait to make the heap profile graph clearer.
delay = print $ fib 32
fib 0 = 0
fib 1 = 1
fib n = fib (n-1) + fib (n-2)
-- Generate some sample data.
generateData = writeFile "ab-input" (concat (replicate 100000 "aabaabababab"))
In Text.Parsec.Language the definition of javaStyle is:
javaStyle :: LanguageDef st
javaStyle = emptyDef
    { commentStart   = "/*"
    , commentEnd     = "*/"
    , commentLine    = "//"
    , nestedComments = True
    , identStart     = letter
    , identLetter    = alphaNum <|> oneOf "_'"
    , reservedNames  = []
    , reservedOpNames= []
    , caseSensitive  = False -- <==
    }
This defines Java style to be case-insensitive, which is very confusing, as most Java-style languages are in fact case-sensitive. I'm not sure whether this is an oversight or whether I am misunderstanding the intention of this language definition.
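Until the default is settled, a record update is one way to get a case-sensitive variant. This is only a sketch: caseSensitiveJava and lexer are hypothetical names, and the reserved word "class" is added purely so the effect can be observed.

```haskell
import Text.Parsec (parse)
import Text.Parsec.Language (javaStyle)
import qualified Text.Parsec.Token as P

-- Hypothetical definition: javaStyle with case sensitivity switched on.
-- The reserved name is an illustrative assumption, not part of javaStyle.
caseSensitiveJava :: P.LanguageDef st
caseSensitiveJava = javaStyle
  { P.caseSensitive = True
  , P.reservedNames = ["class"]
  }

lexer :: P.TokenParser st
lexer = P.makeTokenParser caseSensitiveJava
```

With this lexer, P.reserved lexer "class" should accept "class" and reject "CLASS".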
The SourcePos data structure seems to expose line number and column number, which is great for error messages, but it'd be really nice to also have the character number in the file. This would enable things like the combinator discussed here.
Is there a reason this doesn't exist? Would it be hard to add? Is there a better way to write that combinator?
The built-in parsers letter, lower, upper, and alphaNum accept non-ASCII letter characters, such as Greek letters and letters with accents. For example:
parseTest lower "π" -- prints '\960'
parseTest upper "Å" -- prints '\197'
This is not necessarily a bug, but it is an inaccuracy in the documentation. The docs claim that lower accepts characters "between 'a' and 'z'", that upper accepts characters "between 'A' and 'Z'", etc.
Parsec inherits its support for non-ASCII letters from predicate functions defined in Data.Char, such as isLower and isUpper.
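If ASCII-only behaviour is what a grammar needs, it can be sketched with satisfy and the ASCII predicates from Data.Char. The names asciiLower and asciiUpper below are hypothetical and not part of parsec.

```haskell
import Data.Char (isAsciiLower, isAsciiUpper)
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical ASCII-only counterparts of 'lower' and 'upper'.
asciiLower :: Parser Char
asciiLower = satisfy isAsciiLower <?> "ASCII lowercase letter"

asciiUpper :: Parser Char
asciiUpper = satisfy isAsciiUpper <?> "ASCII uppercase letter"
```

Unlike lower, asciiLower rejects a character such as 'π', because isAsciiLower only covers 'a' through 'z'.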
integer doesn't behave properly for input with leading whitespace.
import Text.Parsec
import Text.Parsec.String (Parser)
import qualified Text.Parsec.Token as P
import Text.Parsec.Language (emptyDef)
lexer =
  P.makeTokenParser emptyDef
integer = P.integer lexer
natural = P.natural lexer
integer' :: Parser Integer
integer' = do
  f <- option id sign'
  n <- natural
  return $ f n
sign' :: Parser (Integer -> Integer)
sign' = (char '-' >> return negate)
    <|> (char '+' >> return id)
main = do
  print . (parse natural "input") $ "1"
  print . (parse integer "input") $ "1"
  print . (parse integer' "input") $ "1"
  print . (parse natural "input") $ " 1"
  print . (parse integer "input") $ " 1"
  print . (parse integer' "input") $ " 1"
  print . (parse natural "input") $ " 1"
  print . (parse integer "input") $ " +1"
  print . (parse integer' "input") $ " +1"
The prime version (integer') is provided to illustrate the expected behavior. The output of the above program is:
Right 1
Right 1
Right 1
Left "input" (line 1, column 1):
unexpected " "
expecting natural
Right 1
Left "input" (line 1, column 1):
unexpected " "
expecting "-", "+" or natural
Left "input" (line 1, column 1):
unexpected " "
expecting natural
Left "input" (line 1, column 2):
unexpected "+"
expecting digit
Left "input" (line 1, column 1):
unexpected " "
expecting "-", "+" or natural
The integer parser doesn't fail for <space>1, while it should, and it gives a misleading error message for <space>+1. On the other hand, the error message for integer' is more understandable.
This causes xmlhtml-0.1.5.2 to fail to compile due to
src/Text/XmlHtml/TextParser.hs:42:5:
    `uncons' is not a (visible) method of class `P.Stream'
with parsec-3.1.6 and parsec-3.1.7 (while it still worked with parsec-3.1.5)
λ parseTest (operator haskell) "∧"
parse error at (line 1, column 1):
unexpected "\8743"
expecting operator
https://gist.github.com/deflexor/247f97f4a59de0de5109
Here is an example of strange behaviour. The parser starts parsing '<html...' with try, but when it fails it backtracks to 'h' rather than to '<' as expected.
LTS 8.x has HUnit 1.5, but this package requires HUnit >=1.2 && <1.4. Would it be possible to extend support for later versions of HUnit?
Although the parsers are rather easy to use with Text from Data.Text, the Text.Parsec.Token library is rather hard to use to create a lexer over Text.
I'm not exactly sure how to make the change yet (I'd love to be guided into making it), but it would be wonderful if we could somehow parameterize either makeTokenParser or emptyDef so that it would be easier to create a lexer that can handle the Text type rather than the String type from Prelude.
I would love for someone to correct me if I'm wrong and this is doable with the current library, but at the same time I feel that it could potentially be a reason someone might look somewhere else. Perhaps having an easy way of creating a Text-consuming lexer could be a boon to Parsec.
Another thing:
Basically I need a function with the following signature:
hoistParsecT :: (m a -> n a) -> (n a -> m a) -> ParsecT s u m a -> ParsecT s u n a
I could do that if the constructor of ParsecT were available. Is there a reason it is not exported?
alphaNum claims:
Parses a letter or digit (a character between '0' and '9') according to isAlphaNum. Returns the parsed character.
However, isAlphaNum says:
Selects alphabetic or numeric digit Unicode characters.
Note that numeric digits outside the ASCII range are selected by this function but not by isDigit. Such digits may be part of identifiers but are not used by the printer and reader to represent numbers.
Extracted from #22.
It would actually be very good if we could also pull the documentation from the homepage (as hosted by web.archive.org: http://web.archive.org/web/20140329210442/http://legacy.cs.uu.nl/daan/download/parsec/parsec.html) and ported the parts that aren't just haddocks into a new parsec documentation module or the like, and put some very simple "getting started" bits from them at the top of the main Text.Parsec module. (having updated them to work with the latest parsec).
Additionally there should be some top level documentation in the package indicating that that is the main entry point, and also which modules only remain for compatibility...
I haven't heard from Daan Leijen that it would be okay to copy his text verbatim elsewhere, so anything pulled in to the package description or module documentation would need to be new text.
Did he just not reply to your email? Did you try a relatively recent address, such as the one in this paper (http://research.microsoft.com/pubs/210640/paper.pdf)?
I hope we can get in touch with him, because it would be much easier to do things that way, and I can't imagine he would object, even if it is slow to get a response and get his approval...
Yeah, that's the email I tried. It was a ways back.
The function buildExpressionParser in module Text.Parsec.Expr has a bug.
In the uncommon case where a prefix operator has lower precedence than an infix operator, such as in boolean expressions, e.g.
&& is an infix operator.
! is a prefix operator of lower precedence.
The input !a && b is correctly parsed as Not (And a b), but the parser fails to parse a && !b (as And a (Not b)).
The docs for parsec-3.1.6 are missing on Hackage.
Just a small thing, if you want; I'm pointing this out while first approaching Parsec, before getting used to it.
Usually a parser's name refers to the current parser input: char 'a' will parse 'a', and so on.
If we read notFollowedBy with the same semantics, a parser like notFollowedBy '.' should correctly parse "." because "." is not followed by a dot. It should fail for "a." instead.
In this sense, a parser called notFollowedBy should probably take two parsers as arguments, like manyTill etc. I find that the current semantics of notFollowedBy are closer to a simple not.
Currently notFollowedBy always succeeds with parsers that don't consume input:
-- This parser succeeds.
> parseTest (lookAhead (string "a")) "abc"
"a"
-- Therefore this parser should fail – but it doesn't.
> parseTest (notFollowedBy (lookAhead (string "a"))) "abc"
()
Is this bug old enough to be considered a feature? (Even if so, this behavior should probably be documented.) If not, here's a version that works (but no idea how much slower, if at all):
notFollowedBy' :: (Stream s m t, Show a) => ParsecT s u m a -> ParsecT s u m ()
notFollowedBy' p = try $ join $
      do {a <- try p; return (unexpected (show a));}
  <|> return (return ())
As described in this SO question, many of the token parser functions consume newlines, which is often not the desired behaviour. It's also not easy, without copying most of the code, to change that. It'd be nice if this behaviour were configurable!
We have
emptyDef :: LanguageDef st
type LanguageDef st = GenLanguageDef String st Identity
makeTokenParser :: Stream s m Char => GenLanguageDef s u m -> GenTokenParser s u m
This means that this can only be used for String parsers, and the following is ill-typed (whatever the contents of { ... }):
lexer :: GenTokenParser ByteString () Identity
lexer = makeTokenParser $ emptyDef { ... }
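One workaround is to spell out a language definition at the desired stream type instead of reusing emptyDef. The sketch below does this for Data.Text and assumes the text package is available. The names textDef and textLexer are hypothetical, and the field values are intended to mirror emptyDef but should be checked against the installed parsec version.

```haskell
import Data.Functor.Identity (Identity)
import qualified Data.Text as T
import Text.Parsec
import qualified Text.Parsec.Token as P

-- Hypothetical definition: an emptyDef-like record whose parsers run over Text.
textDef :: P.GenLanguageDef T.Text st Identity
textDef = P.LanguageDef
  { P.commentStart    = ""
  , P.commentEnd      = ""
  , P.commentLine     = ""
  , P.nestedComments  = True
  , P.identStart      = letter <|> char '_'
  , P.identLetter     = alphaNum <|> oneOf "_'"
  , P.opStart         = opChars
  , P.opLetter        = opChars
  , P.reservedNames   = []
  , P.reservedOpNames = []
  , P.caseSensitive   = True
  }
  where
    -- Operator characters, assumed to match the ones used by emptyDef.
    opChars = oneOf ":!#$%&*+./<=>?@\\^|-~"

textLexer :: P.GenTokenParser T.Text st Identity
textLexer = P.makeTokenParser textDef
```

The same pattern should carry over to ByteString by changing the stream type in the two signatures, since the field parsers are themselves polymorphic in the stream.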
The ByteString parser seems unusable because there are no parser functions that operate on ByteString. When we operate on strings of chars we have useful parsers like "oneOf", "digit", "spaces". It would be nice to have something like "zeros" (for bytes filled with zeros), "bigEndianInt32", or "byte" to consume bytes from the ByteString stream and write the results into appropriate data types. Something similar was done in "HCodecs" (and many other places where people try to handle some specific file formats). Parsec would be a more natural place to look for such convenient general functions.
In Control.Applicative, <|> is defined as having fixity infixl 3, but in Parsec it is defined as infixr 1. Is there a reason for this difference? Should Parsec be changed to match?
Consider the following code entered in ghci:
Parsec.parse (Parsec.many1 Parsec.digit <* Parsec.parserFail "foo") "" "123"
Here is the result:
Left (line 1, column 4):
unexpected end of input
expecting digit
foo
The error messages "unexpected end of input" and "expecting digit" should not be there.
This is simplified from a more involved example in which the result of parsing a number is checked, and parserFail is called if the number is out of bounds.
If you want to test a parser (e.g. with QuickCheck or HUnit) you need an Eq instance for ParseError. This comes up every now and then. Is there any good reason why we would not want an Eq instance for ParseError?
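In the meantime, tests can compare a projection of the error rather than the ParseError itself. The sketch below is one such projection; errorSummary and parseSummary are hypothetical helper names, and the choice of fields to compare is an assumption about what a test cares about.

```haskell
import Text.Parsec
import Text.Parsec.Error (errorMessages, messageString)

-- Hypothetical helper: reduce a ParseError to plain data that has an Eq instance
-- (line, column, and the text of each message).
errorSummary :: ParseError -> (Int, Int, [String])
errorSummary e = (sourceLine pos, sourceColumn pos, map messageString (errorMessages e))
  where
    pos = errorPos e

-- Hypothetical helper: run a parser and summarise any failure.
parseSummary :: Parsec String () a -> String -> Either (Int, Int, [String]) a
parseSummary p input = either (Left . errorSummary) Right (parse p "" input)
```

Two results of parseSummary can then be compared with == in an HUnit or QuickCheck property.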
Here is the type
sepBy :: Stream s m t => ParsecT s u m a -> ParsecT s u m sep -> ParsecT s u m [a]
Usually when parsing some tokens, the separator parser produces the same type of value as the value parser, i.e. the type variables a and sep will be the same type, which means the user can write incorrect code, for example:
sepBy separatorParser valueParser
vs sepBy valueParser separatorParser
This means the user needs to look at the type variables to understand the order in which the arguments should be passed. But if we change the type to:
sepBy :: Stream s m t => ParsecT s u m a -> ParsecT s u m () -> ParsecT s u m [a]
Then the incorrect code from above will not compile. As a result, it would be much harder to mix up the argument order (unless you want to get [()] as the result).
The downside would be the need for an extra $> () or *> return () at the end of the separator parser.
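The proposed signature can be tried out today as a thin wrapper, without changing sepBy itself. In this sketch, sepByUnit and numbers are hypothetical names used only for illustration.

```haskell
import Control.Monad (void)
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical wrapper: identical to sepBy, but the separator must return ().
sepByUnit :: Stream s m t => ParsecT s u m a -> ParsecT s u m () -> ParsecT s u m [a]
sepByUnit = sepBy

-- Swapping the two arguments here would be a type error, because
-- 'many1 digit' does not return ().
numbers :: Parser [String]
numbers = many1 digit `sepByUnit` void (char ',')
```

The cost mentioned above is visible in the call to void on the separator.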
Here is what kind of Stream I came to independently, when I was playing with purescript-parsing:
class Stream f c | f -> c where
uncons :: f -> Maybe { head :: c, tail :: f, updatePos :: Position -> Position }
stripPrefix :: Prefix f -> f -> Maybe { rest :: f, updatePos :: Position -> Position }
class HasUpdatePosition a where
updatePos :: Position -> a -> Position
newtype Prefix a = Prefix a
-- example implementation for list
instance (Eq a, HasUpdatePosition a) => Stream (List.List a) a where
uncons f = L.uncons f <#> \({ head, tail}) ->
{ head, tail, updatePos: (_ `updatePos` head)}
stripPrefix (Prefix p) s = List.stripPrefix (List.Pattern p) s <#> \rest ->
{ rest, updatePos: unwrap (fold (p <#> (flip updatePos >>> Endo)))}
purescript-contrib/purescript-parsing#62
I think this formulation is nicer as you don't need to carry the updatePosition function around, and I don't see why you need the m in uncons either.
Would like your thoughts on this formulation.
Consider
import Control.Applicative ((<*), (<$>), (<$))
import Text.Parsec
import Text.Parsec.Language (haskellStyle)
import Text.Parsec.String (Parser)
import Text.Parsec.Expr
import qualified Text.Parsec.Token as P
data Expr = Const Integer | Op Expr Expr
deriving Show
{-------------------------------------------------------------------------------
Syntax analysis
-------------------------------------------------------------------------------}
parseTopLevel :: Parser Expr
parseTopLevel = parseExpr <* eof
parseExpr :: Parser Expr
parseExpr = buildExpressionParser table (Const <$> integer)
where
table = [[ Infix (Op <$ reserved ">>>") AssocLeft ]]
{-------------------------------------------------------------------------------
Lexical analysis
-------------------------------------------------------------------------------}
lexer = P.makeTokenParser haskellStyle { P.reservedOpNames = [">>>"] }
integer = P.integer lexer
reserved = P.reserved lexer
reservedOp = P.reservedOp lexer
With parsec-3.1.5 and below we get
*Main> parseTest parseTopLevel "4 >> 5"
parse error at (line 1, column 3):
unexpected '>'
expecting operator or end of input
but with parsec-3.1.6 we get
*Main> parseTest parseTopLevel "4 >> 5"
parse error at (line 1, column 5):
unexpected " "
expecting operator
It's not immediately obvious to me why there is this restriction.
In particular, I'd like to use the haskell TokenParser on a Text stream.
According to the Haskell 2010 Language Report, floating literals should be parsed as Rational. However, Text.Parsec.Language.haskell parses them as Double.
To fix this we would need to change the types of TokenParser.float and TokenParser.naturalOrFloat from Double to Rational. This is a breaking change that will also affect other language definitions. Is that an option?
Since these are lexemes (at least, they should be), they should behave like lexemes: consume trailing white space (but not leading). Currently they don't do it and I find it confusing. If you care to fix it, be careful, definitions of these parsers are used to define other things, so it's easy to introduce new bugs if you're not careful.
The descriptions of these parsers carefully avoid calling them lexemes, but this is not enough. If you want to preserve the current behavior, you should explicitly state that these are not lexemes and that, unlike all other members of GenTokenParser, they do not consume trailing whitespace.
The Data class is, while not ubiquitous, quite useful in a variety of settings. It would be elementary to add a deriving Data clause to (at least) the types for which Typeable is already derived. Would you accept a pull request doing so?
It seems like it would also be a good idea to derive Generic for most types.
This is a primitive CSV parser which skips intermediate blank lines:
parser :: Stream s m Char => ParsecT s u m [[String]]
parser = line `sepEndBy` (some endOfLine) <* eof
  where line = many (noneOf ",\n\r") `sepBy` char ','
and looks like it works:
\> parse parser "" "1,2,3\n\n\n4,5,6"
Right [["1","2","3"],["4","5","6"]]
but if I change some to many:
parser :: Stream s m Char => ParsecT s u m [[String]]
parser = line `sepEndBy` (many endOfLine) <* eof
  where line = many (noneOf ",\n\r") `sepBy` char ','
it will never return:
\> parse parser "" "1,2,3\n\n\n4,5,6"
^CInterrupted.
While I see where this is from, please, don't make such changes in a minor version.
MissingH 1.4.0.1 and parsec 3.1.12.0 are not compatible because the Text.ParserCombinators.Parsec module is no longer Safe. It used to be Safe in version 3.1.11.
This incompatibility breaks upstream packages such as Gitit (see comment by cidig on 2018-02-04).
Issue in MissingH repo: haskell-hvr/missingh#42
src/Text/ParserCombinators/Parsec/Utils.hs:33:1: error:
Text.ParserCombinators.Parsec: Can't be safely imported!
The module itself isn't safe.
|
33 | import Text.ParserCombinators.Parsec
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Can we add deriving Typeable for ParseError and also:
instance Exception ParseError
That would be helpful for me.
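Where the installed parsec version does not provide such an instance, a newtype wrapper avoids defining an orphan instance. This is only a sketch: ParseFailure and parseOrThrow are hypothetical names, and Control.Exception is imported qualified because its try would otherwise clash with Parsec's try.

```haskell
import qualified Control.Exception as E
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical wrapper type so that parse errors can be thrown as exceptions.
newtype ParseFailure = ParseFailure ParseError

instance Show ParseFailure where
  show (ParseFailure e) = show e

instance E.Exception ParseFailure

-- Hypothetical helper: parse the input, throwing ParseFailure on error.
parseOrThrow :: Parser a -> String -> IO a
parseOrThrow p input = either (E.throwIO . ParseFailure) return (parse p "" input)
```

Callers can then recover the error with E.try or E.catch at the ParseFailure type.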
In Text/Parsec/Token.hs we have the default parsing of control characters in literals:
charControl = do{ char '^'
                ; code <- upper
                ; return (toEnum (fromEnum code - fromEnum 'A'))
                }
For example, the string literal "hello^Zworld" should parse the same as "hello\x1Aworld" - that is, ^Z stands in for ASCII character 26.
Similarly, the other letters stand in for the corresponding control characters: ^A for ASCII 1, ^B for ASCII 2, etc.
However, we have an off-by-one error here: toEnum (fromEnum 'A' - fromEnum 'A') is equal to zero, not 1. This should probably be toEnum (fromEnum code - fromEnum 'A' + 1).
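The corrected arithmetic can be checked in isolation with a standalone sketch. charControlFixed is a hypothetical name, and the sketch restricts the letter to ASCII 'A' through 'Z' with satisfy rather than using upper, which is an assumption of this example and not what the library does.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical standalone version of charControl with the "+ 1" applied,
-- so that ^A maps to code 1 and ^Z maps to code 26.
charControlFixed :: Parser Char
charControlFixed = do
  _ <- char '^'
  code <- satisfy (\c -> c >= 'A' && c <= 'Z')
  return (toEnum (fromEnum code - fromEnum 'A' + 1))
```

For ^Z the computation is 90 - 65 + 1 = 26, which is '\x1A' as described above.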
lookAhead p is a parser that calls p, and if it succeeds then lookAhead p succeeds too, but it also rewinds the input stream back to where it was before p was tried. This was implemented in 3839639 by @feuerbach. I claim that this is not enough: lookAhead p should also discard any error messages generated by p. This is best illustrated with an example. Suppose that we have the following:
p = lookAhead (many1 $ char 'a') >> char 'b'
Then parse p "" "a" should fail with the following error message:
Left (line 1, column 1):
unexpected "a"
expecting "b"
but currently it fails with the following:
Left (line 1, column 2):
unexpected end of input
expecting "a"
In other words, if p succeeds, then lookAhead p should behave as if p never occurred, both in terms of the parser state and in terms of the errors generated.