
elm / parser


A parsing library, focused on simplicity and great error messages

Home Page: https://package.elm-lang.org/packages/elm/parser/latest

License: BSD 3-Clause "New" or "Revised" License

Elm 96.33% JavaScript 3.67%
elm parser

parser's Introduction

Parser

Regular expressions are quite confusing and difficult to use. This library provides a coherent alternative that handles more cases and produces clearer code.

The particular goals of this library are:

  • Make writing parsers as simple and fun as possible.
  • Produce excellent error messages.
  • Go pretty fast.

This is achieved with a couple concepts that I have not seen in any other parser libraries: parser pipelines, backtracking, and tracking context.

Parser Pipelines

To parse a 2D point like ( 3, 4 ), you might create a point parser like this:

import Parser exposing (Parser, (|.), (|=), succeed, symbol, float, spaces)

type alias Point =
  { x : Float
  , y : Float
  }

point : Parser Point
point =
  succeed Point
    |. symbol "("
    |. spaces
    |= float
    |. spaces
    |. symbol ","
    |. spaces
    |= float
    |. spaces
    |. symbol ")"

All the interesting stuff is happening in point. It uses two operators:

  • (|.) means “parse this, but ignore the result”
  • (|=) means “parse this, and keep the result”

So the Point function only gets the results of the two float parsers.

The theory is that |= introduces more “visual noise” than |., making it pretty easy to pick out which lines in the pipeline are important.

I recommend having one line per operator in your parser pipeline. If you need multiple lines for some reason, use a let or make a helper function.
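As a sketch of the "make a helper function" advice, the repeated spaces/symbol lines in point can be factored out. paddedSymbol is a hypothetical name introduced here; it is not part of the package:

```elm
import Parser exposing (Parser, (|.), (|=), float, spaces, succeed, symbol)


type alias Point =
    { x : Float
    , y : Float
    }


-- Hypothetical helper (not in elm/parser): a symbol with optional
-- whitespace on both sides. It produces (), so it pairs with (|.).
paddedSymbol : String -> Parser ()
paddedSymbol str =
    succeed ()
        |. spaces
        |. symbol str
        |. spaces


-- Same behavior as the point parser above, one operator per line.
point : Parser Point
point =
    succeed Point
        |. paddedSymbol "("
        |= float
        |. paddedSymbol ","
        |= float
        |. paddedSymbol ")"
```

With this, Parser.run point "( 3, 4 )" should give Ok { x = 3, y = 4 }, and the two |= lines that feed Point stand out even more.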

Backtracking

To make fast parsers with precise error messages, none of the parsers in this package backtrack by default. Once you start going down a path, you keep going down it.

This is nice in a string like [ 1, 23zm5, 3 ] where you want the error at the z. If we had backtracking by default, you might get the error on [ instead. That is way less specific and harder to fix!

So the defaults are nice, but sometimes the easiest way to write a parser is to look ahead a bit and see what is going to happen. It is definitely more costly to do this, but it can be handy if there is no other way. This is the role of backtrackable parsers. Check out the semantics page for more details!
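Here is a minimal sketch of where backtrackable helps. It assumes a made-up input format where a value is either 5 or a pair 5,6. Both alternatives start with an integer, so the first branch must be able to give up after reading it:

```elm
import Parser exposing (Parser, (|.), (|=), backtrackable, int, map, oneOf, succeed, symbol)


-- Hypothetical input format: "5" or "5,6".
type Value
    = Single Int
    | Pair Int Int


value : Parser Value
value =
    oneOf
        [ succeed Pair
            -- backtrackable: reading the first int does not commit us
            -- to this branch, so oneOf may still try the next one.
            |= backtrackable int
            -- Chomping the comma does commit, so an error after it is
            -- reported right there instead of backtracking to the start.
            |. symbol ","
            |= int
        , map Single int
        ]
```

On "5" the first branch fails at the missing comma without having committed, so oneOf falls through to Single. On "5,x" the comma has already committed the parser, so the error points at the x rather than at the start of the input.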

Tracking Context

Most parsers tell you the row and column of the problem:

Something went wrong at (4:17)

That may be true, but it is not how humans think. It is how text editors think! It would be better to say:

I found a problem with this list:

    [ 1, 23zm5, 3 ]
         ^
I wanted an integer, like 6 or 90219.

Notice that the error message says this list. That is context! That is the language my brain speaks, not rows and columns.

Once you get comfortable with the Parser module, you can switch over to Parser.Advanced and use inContext to track exactly what your parser thinks it is doing at the moment. You can let the parser know “I am trying to parse a "list" right now” so if an error happens anywhere in that context, you get the hand annotation!

This technique is used by the parser in the Elm compiler to give more helpful error messages.
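Here is a minimal Parser.Advanced sketch of the idea. The Context and Problem types and the describe function are made up for this example. inContext, sequence, Token, int, spaces, and DeadEnd come from Parser.Advanced:

```elm
import Parser.Advanced as A


-- Made-up context and problem types for this sketch.
type Context
    = InList


type Problem
    = ExpectingOpen
    | ExpectingComma
    | ExpectingClose
    | ExpectingInt
    | InvalidNumber


type alias Parser a =
    A.Parser Context Problem a


intList : Parser (List Int)
intList =
    -- Every dead end produced inside carries InList on its contextStack.
    A.inContext InList <|
        A.sequence
            { start = A.Token "[" ExpectingOpen
            , separator = A.Token "," ExpectingComma
            , end = A.Token "]" ExpectingClose
            , spaces = A.spaces
            , item = A.int ExpectingInt InvalidNumber
            , trailing = A.Forbidden
            }


-- Hypothetical reporter: phrase the error in terms of its context.
describe : A.DeadEnd Context Problem -> String
describe deadEnd =
    if List.any (\frame -> frame.context == InList) deadEnd.contextStack then
        "I found a problem with this list"

    else
        "I found a problem"
```

A real reporter would also use the row, col, and problem fields of each dead end to draw the caret under the offending text.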

parser's People

Contributors

cbenz, evancz, janiczek, jinjor, markdblackwell, methodgrab


parser's Issues

Infinite recursion runtime error links to 404 https://elm-lang.org/0.19.0/halting-problem

Using Elm 0.19.0, I got the following runtime error:

Some top-level definitions from `Test` are causing infinite recursion:

  ┌─────┐
  │    f1
  │     ↓
  │    f2
  │     ↓
  │    f3
  │     ↓
  │    f4
  │     ↓
  │    f2
  └─────┘

These errors are very tricky, so read https://elm-lang.org/0.19.0/halting-problem to learn how to fix

But https://elm-lang.org/0.19.0/halting-problem returns the Poem 404 page.

Parser.int and Parser.float handle leading zeros unexpectedly

Hi everyone,

I stumbled upon odd behaviour regarding leading zeros:

Observed

> Parser.run Parser.int "0123"
Ok 0 : Result (List Parser.DeadEnd) Int
> Parser.run Parser.float "0123"
Ok 0 : Result (List Parser.DeadEnd) Float

Expected

The documentation suggests that this would produce an Err result (at least for Parser.int).

I personally expected Python-like behaviour:

>>> int("0123")
123
>>> float("0123")
123.0

In any case, Ok 0 makes the user think the input was parsed correctly, when in fact it was not.

Final Thoughts

I want to thank you for your work in the elm community and especially the nice design of the elm/parser. It's a blast to work with so far.

I'd be happy to contribute to a fix, if you too see this as a bug.
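Until the behaviour is settled, one possible workaround is to chomp the digits yourself and convert them with String.toInt. This assumes that Python-like semantics (leading zeros allowed, decimal only, no sign) are what you want. decimalInt is a hypothetical helper name, not part of the package:

```elm
import Parser exposing (Parser, (|.), andThen, chompIf, chompWhile, getChompedString, problem, succeed)


-- Hypothetical helper: one or more ASCII digits, leading zeros allowed.
-- Unlike the Parser.int behaviour reported above, "0123" yields 123
-- instead of stopping silently after the first 0.
decimalInt : Parser Int
decimalInt =
    getChompedString (chompIf Char.isDigit |. chompWhile Char.isDigit)
        |> andThen
            (\digits ->
                case String.toInt digits of
                    Just n ->
                        succeed n

                    -- Not expected to happen, since only digits were chomped.
                    Nothing ->
                        problem ("expected an integer, got " ++ digits)
            )
```

Add |. Parser.end after it if the whole input must be a number. Otherwise "0123abc" still succeeds with 123.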

Example for Parser.float does not type check

elmFloat : Parser Float
elmFloat =
  oneOf
    [ symbol "."
        |. problem "floating point numbers must start with a digit, like 0.25"
    , float
    ]

should be

elmFloat : Parser Float
elmFloat =
  oneOf
    [ symbol "."
        |> andThen (always <| problem "floating point numbers must start with a digit, like 0.25")
    , float
    ]

The (|.) operator keeps left and drops right and thus gives something of type Parser () in the first listing which clashes with Parser Float.

expose Parser.Advanced

some docs seem to speak as though Parser.Advanced should be exposed

parser/src/Parser.elm

Lines 133 to 134 in 828a790

**Note:** If you feel limited by this type (i.e. having to represent custom
problems as strings) I highly recommend switching to `Parser.Advanced`. It

but it isn't exposed

parser/elm.json

Lines 7 to 9 in 828a790

"exposed-modules": [
    "Parser"
],

Parser.int accepts input starting with `e`

I'm not exactly sure what is going on, or what is being accepted, but here is an example of Parser.int eating input starting with the letter e (although it doesn't succeed, it needs to be backtracked in order for a following parser to succeed).

https://ellie-app.com/4cD7gfngmNVa1

module Main exposing (main)

import Browser
import Html exposing (Html, br, button, div, text)
import Parser exposing (..)
import Set


{-| An Ellie example showing some parsing madness
-}
type Expr
    = Int Int
    | Func String String


main =
    div
        []
        [ text "This is what we expect: "
        , run expr """a("arg")""" |> Debug.toString |> text
        , br [] []
        , text "And this: "
        , run expr """b("something")""" |> Debug.toString |> text
        , br [] []
        , text "And this too: "
        , run expr """myFuncName("stuff")""" |> Debug.toString |> text
        , br [] []
        , br [] []
        , text "But, if the function name start with an `e`:"
        , run expr """e("what the devil?")""" |> Debug.toString |> text
        , br [] []
        , text "Another example:"
        , run expr """exerciseScoreToInt("blahblah")""" |> Debug.toString |> text
        , br [] []
        , br [] []
        , text "This doesn't happen with any other letter, and if you take away the `int` parse then it works."
        ]


expr : Parser Expr
expr =
    oneOf
        [ backtrackable int |> map Int
        , func
        ]


func : Parser Expr
func =
    succeed Func
        |= backtrackable
            (variable
                { start = Char.isLower
                , inner = Char.isAlphaNum
                , reserved = Set.empty
                }
            )
        |. backtrackable (symbol "(")
        |. symbol "\""
        |= getChompedString (chompWhile (\c -> c /= '"'))
        |. symbol "\""
        |. symbol ")"
        |. end

Is it worth introducing `optionally`?

Dear maintainers of elm/parser,

First of all, thanks for your hard work. It is a very nice library. It could use some love here and there, so if you need some help, don't be shy to ask me.

I was creating an example to answer a question on slack and wanted to transform a Parser a into a Parser (Maybe a). I was expecting to find an optionally but could not find anything similar in the package.

I implemented it like this

optionally : Parser a -> Parser (Maybe a)
optionally parser =
    oneOf
        [ parser |> map Just
        , succeed Nothing
        ]

Is this something that is interesting to provide through the package? And if so, is my implementation missing a use case?

Thanks in advance for considering my question. I hope the new year brings a lot of joy.
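A quick check of the proposed optionally, copied unchanged from above, so it is still user code and not package API. One thing worth knowing: oneOf only moves on to succeed Nothing when the first parser fails without chomping anything. A parser that fails part-way through therefore still produces an error. Wrapping it in backtrackable is one way to change that, at the cost of less precise error locations. signedInt, pair, and optionallyBacktrackable are hypothetical names for illustration:

```elm
import Parser exposing (Parser, (|.), (|=), backtrackable, int, map, oneOf, succeed, symbol)


-- The issue's optionally, unchanged.
optionally : Parser a -> Parser (Maybe a)
optionally parser =
    oneOf
        [ parser |> map Just
        , succeed Nothing
        ]


-- Hypothetical usage: an Int with an optional leading minus sign.
signedInt : Parser Int
signedInt =
    succeed
        (\minus n ->
            case minus of
                Just () ->
                    negate n

                Nothing ->
                    n
        )
        |= optionally (symbol "-")
        |= int


-- Hypothetical parser that can fail after consuming input ("1,x").
pair : Parser ( Int, Int )
pair =
    succeed Tuple.pair
        |= int
        |. symbol ","
        |= int


-- Hypothetical variant: also yields Nothing when the wrapped parser
-- fails after consuming input, by making that failure non-committing.
optionallyBacktrackable : Parser a -> Parser (Maybe a)
optionallyBacktrackable parser =
    optionally (backtrackable parser)
```

With the plain version, optionally pair on "1,x" is an error. With the backtrackable variant it succeeds with Nothing and consumes nothing.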

Float parser committing to leading 'e'

Similar to #28, the float parser commits when it encounters an 'e', making it impossible to have a parser that reads either a float or some text:

floatOrText =
    oneOf
        [ float |> map (\n -> "Number: " ++ String.fromFloat n)
        , chompUntilEndOr "\n" |> getChompedString
        ]

run floatOrText "not starting with e" --> Ok "not starting with e"
run floatOrText "1e10" --> Ok "Number: 10000000000"
run floatOrText "e should be text" --> Err: ExpectingFloat

(Try it on Ellie: https://ellie-app.com/8yp6MxtzSsna1)
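Until the number parser changes, a workaround sketch is to wrap float in backtrackable, as the int example in #28 does. This turns the committed failure on a leading 'e' into a non-committed one, so oneOf can try the next branch. It works around the behaviour rather than fixing it:

```elm
import Parser exposing (Parser, backtrackable, chompUntilEndOr, float, getChompedString, map, oneOf)


-- backtrackable float: if float fails (even after chomping an 'e'),
-- oneOf still tries the text branch from the original position.
floatOrText : Parser String
floatOrText =
    oneOf
        [ backtrackable float |> map (\n -> "Number: " ++ String.fromFloat n)
        , chompUntilEndOr "\n" |> getChompedString
        ]
```

Because the second branch accepts any text, the less precise errors that backtrackable causes do not matter much here.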

Readme example imports unexposed functions

when depending on "elm/json": "1.0.0"

The `Parser` module does not expose `ignore`:

10| import Parser exposing (Parser, (|.), (|=), succeed, symbol, float, ignore, zeroOrMore)

removing ignore gives

The `Parser` module does not expose `zeroOrMore`:

10| import Parser exposing (Parser, (|.), (|=), succeed, symbol, float, zeroOrMore)

chompUntil does not consume the last string

chompUntil and chompUntilEndOr advance the row/column position past the token, but do not consume the token itself. chompIf and chompWhile work as expected.

SSCCE: https://ellie-app.com/3MJnNtV3KDFa1

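import Html exposing (Html, text)
import Parser exposing ((|=), Parser, chompUntil, getChompedString, getPosition, succeed)


main : Html msg
main =
    text (Debug.toString (Parser.run chomper "foobar"))


chomper : Parser (String, ( Int, Int ))
chomper =
    succeed Tuple.pair
        |= getChompedString (chompUntil "bar")
        |= getPosition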

expected:

Ok ("foobar",(1,7))

actual:

Ok ("foo",(1,7))
  • elm/parser: 1.1.0

Related: #2
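A workaround sketch that keeps offset and row/column in sync: step through the input one character at a time with loop, trying token at each position. chompThrough is a hypothetical name. Unlike chompUntil, it consumes the token as well, which is what the expected output above assumes. It is also slower than the kernel's indexOf search:

```elm
import Parser exposing (Parser, (|=), Step(..), chompIf, getChompedString, getPosition, loop, map, oneOf, succeed, token)


-- Hypothetical replacement for chompUntil: chomps up to and including
-- `str`, one character at a time, so offset and row/col move together.
-- Fails if `str` never appears before the end of the input.
chompThrough : String -> Parser ()
chompThrough str =
    loop ()
        (\_ ->
            oneOf
                [ token str |> map (\_ -> Done ())
                , chompIf (\_ -> True) |> map (\_ -> Loop ())
                ]
        )


chomper : Parser ( String, ( Int, Int ) )
chomper =
    succeed Tuple.pair
        |= getChompedString (chompThrough "bar")
        |= getPosition
```

token does not consume input when it fails, so each step either matches the whole token or chomps exactly one character.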

lazy doesn't work with always instead of (\_ -> ..)

Hello,
I ran across this:

This works:

parser : Parser ()
parser =
    Parser.lazy (\_ -> parser)

But this

parser : Parser ()
parser =
    Parser.lazy (always parser)

throws the error "The x value is defined directly in terms of itself, causing an infinite loop."

I guess it would be enough to add a hint about it, so that if someone runs into this they don't have to debug it.
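For context on why the lambda matters: Elm evaluates function arguments eagerly. always parser has to evaluate parser before lazy is even called, while \_ -> parser only refers to parser inside a function body. Here is a small recursive grammar written the working way. The Nested type is made up for illustration:

```elm
import Parser exposing (Parser, Trailing(..), int, lazy, map, oneOf, sequence, spaces)


-- Made-up example type: an Int, or a bracketed list of Nested values.
type Nested
    = Num Int
    | Many (List Nested)


-- `lazy (\_ -> nested)` delays the self-reference until parse time.
-- Writing `lazy (always nested)` instead triggers the infinite-loop
-- error described in this issue.
nested : Parser Nested
nested =
    oneOf
        [ map Num int
        , map Many
            (sequence
                { start = "["
                , separator = ","
                , end = "]"
                , spaces = spaces
                , item = lazy (\_ -> nested)
                , trailing = Forbidden
                }
            )
        ]
```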

Parser.multiComment does not chomp comment terminator in NotNestable mode

Build and run the following:

module Main exposing (..)

import Browser
import Html exposing (Html)
import Html.Attributes
import Parser exposing ((|.), (|=), Parser)


commentByItself : String
commentByItself =
    "/*abc*/"


commentWithTrailingText : String
commentWithTrailingText =
    "/*abc*/def"


nestable : Parser ()
nestable =
    Parser.multiComment "/*" "*/" Parser.Nestable |. Parser.end


notNestable : Parser ()
notNestable =
    Parser.multiComment "/*" "*/" Parser.NotNestable |. Parser.end


notNestableWorkaround : Parser ()
notNestableWorkaround =
    Parser.multiComment "/*" "*/" Parser.NotNestable
        |. Parser.token "*/"
        |. Parser.end


chompUntil : Parser ()
chompUntil =
    Parser.token "/*" |. Parser.chompUntil "*/" |. Parser.end


chompUntilWorkaround : Parser ()
chompUntilWorkaround =
    Parser.token "/*"
        |. Parser.chompUntil "*/"
        |. Parser.token "*/"
        |. Parser.end


main : Program () () ()
main =
    Browser.staticPage <|
        Html.table [ Html.Attributes.style "border-collapse" "collapse" ]
            [ Html.tr []
                [ Html.th borderAttributes [ Html.text "Parser" ]
                , Html.th borderAttributes [ Html.text commentByItself ]
                , Html.th borderAttributes [ Html.text commentWithTrailingText ]
                ]
            , exampleRow "nestable" nestable
            , exampleRow "notNestable" notNestable
            , exampleRow "notNestableWorkaround" notNestableWorkaround
            , exampleRow "chompUntil" chompUntil
            , exampleRow "chompUntilWorkaround" chompUntilWorkaround
            ]


exampleRow : String -> Parser () -> Html msg
exampleRow name parser =
    Html.tr []
        [ Html.td borderAttributes [ Html.text name ]
        , Html.td borderAttributes
            [ Html.text <|
                Debug.toString (Parser.run parser commentByItself)
            ]
        , Html.td borderAttributes
            [ Html.text <|
                Debug.toString (Parser.run parser commentWithTrailingText)
            ]
        ]


borderAttributes : List (Html.Attribute msg)
borderAttributes =
    [ Html.Attributes.style "border" "1px solid black" ]

You should see

Parser                | /*abc*/                                             | /*abc*/def
nestable              | Ok ()                                               | Err [{ col = 8, problem = ExpectingEnd, row = 1 }]
notNestable           | Err [{ col = 8, problem = ExpectingEnd, row = 1 }]  | Err [{ col = 8, problem = ExpectingEnd, row = 1 }]
notNestableWorkaround | Ok ()                                               | Err [{ col = 10, problem = ExpectingEnd, row = 1 }]
chompUntil            | Err [{ col = 8, problem = ExpectingEnd, row = 1 }]  | Err [{ col = 8, problem = ExpectingEnd, row = 1 }]
chompUntilWorkaround  | Ok ()                                               | Err [{ col = 10, problem = ExpectingEnd, row = 1 }]

which has a couple issues:

  • nestable works as expected, notNestable and chompUntil do not
  • adding an extra |. Parser.token "*/" to notNestable 'fixes' the issue, but produces an incorrect error column number for the "/*abc*/def" case

I think the expected results should be:

Parser                | /*abc*/                                               | /*abc*/def
nestable              | Ok ()                                                 | Err [{ col = 8, problem = ExpectingEnd, row = 1 }]
notNestable           | Ok ()                                                 | Err [{ col = 8, problem = ExpectingEnd, row = 1 }]
notNestableWorkaround | Err [{ col = 8, problem = Expecting "*/", row = 1 }]  | Err [{ col = 8, problem = Expecting "*/", row = 1 }]
chompUntil            | Ok ()                                                 | Err [{ col = 8, problem = ExpectingEnd, row = 1 }]
chompUntilWorkaround  | Err [{ col = 8, problem = Expecting "*/", row = 1 }]  | Err [{ col = 8, problem = Expecting "*/", row = 1 }]

I'm on Windows 10.

Feature Request: recover

Hey, I'd like to propose the ability to recover from a failed parser. This would be similar to oneOf, but would let you capture the context/problem of the parser that failed.

TLDR:
Add a function to recover from a parser that looks like:

recover :
    (context -> problem -> value)
    -> Parser context problem value
    -> Parser context problem value

Use case:
My particular use case for such a feature is writing an error-tolerant Elm parser. For example, say we want to parse import MyModule exposing (hello, $myInvalidValue$, World(..)). In this case, we want to capture the context/problem of the invalid value while still capturing the other valid values (like the fact that it's importing hello and World(..) from MyModule). I see two options to achieve this. Either capture this data as state in the parser (like how column, row, and indent are stored, kind of like a warning in the Elm compiler) and keep parsing, or store the context/problem as successfully parsed data.

The former is tricky, because extending Parser to hold that state would require exposing its constructor, which makes changes to the internals of this library more likely to be breaking changes. I also can't re-implement this parser and add this feature outside of the elm GitHub organization, because it uses infix operators and a kernel module to make it faster. For the infix operators, I could use named functions and pipelines, but I see no solution for the kernel module.

The second option, which is the feature proposed, would be to add a recover function. We'll use the import MyModule exposing (hello, $myInvalidValue$, World(..)) string as an example.

If we structured our data for the import statement like this:

type alias Parser value =
     Parser.Advanced.Parser Context Problem value

type Problem = ...

type Context = ...

type ModuleImport =
    ModuleImport ModuleName ExposingList

type ModuleName = ...

type ExposingList
    = ExposingExplicit (List ExposedValue)
    | ExposingAll

type ExposedValue
    = ExposedValue ...
    | ExposedConstructor ...
    | ...

We can parse the import statement like this:

moduleImport : Parser ModuleImport
moduleImport =
    Parser.succeed ModuleImport
        |. ...
        |= moduleName
        |. ...
        |= exposingList

moduleName : Parser ModuleName
moduleName = ...

exposingList : Parser ExposingList
exposingList =
    Parser.map ExposingExplicit
       (Parser.sequence
            { start = Parser.token "("
            , end = Parser.token ")"
            , item = exposingValue
            , spaces = ...
            , trailing = Parser.Optional
            }
        )

exposingValue : Parser ExposedValue
exposingValue = ...

exposingList will parse exposed items in a list, but if one item fails then the whole parser will fail. We could make exposingValue optional like so:

exposingList : Parser ExposingList
exposingList =
    Parser.map ExposingExplicit
       (Parser.sequence
            { ...
            , item = 
                Parser.oneOf
                    [ Parser.map Ok exposingValue
                    , Parser.succeed (Err "Invalid list item")
                        -- Skip to the next list item
                        |. Parser.chompWhile (\c -> c /= ',' && c /= ')')
                    ]
            }
        )

And this works. If there's an invalid value, we ignore it and move on to the next one. However, this loses the context/problem of the failed parser. The function I'm proposing would have the type signature:

recover :
    (context -> problem -> value)
    -> Parser context problem value
    -> Parser context problem value

With this, we could rewrite exposingList and extend ExposedValue to recover from the failure and transform the context/problem into a successfully parsed value.

type ExposedValue
    = ...
    | ExposedValueProblem Context Problem

exposingList : Parser ExposingList
exposingList =
    Parser.map ExposingExplicit
       (Parser.sequence
            { ...
            , item = 
                Parser.recover (\context problem -> ExposedValueProblem context problem)
                    exposingValue
            }
        )

Now, we capture the reason that the value failed, while continuing to parse the other values!
I understand that this use case is pretty specific, however I think that the ability to recover from a parser could be helpful in other cases beyond this one.

I'm sorry if this issue is a bit wordy, I thought it would be best to layout a clear and specific example of this feature and how it would be helpful. If there is a different way to solve this problem that I'm not seeing, please let me know! I'm super willing to PR this feature, but wanted to get feedback before doing so!
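For comparison, here is roughly how the oneOf fallback described above can be written with today's Parser module. The Exposed type and the helpers are hypothetical. As the issue points out, the fallback only keeps the raw text of the bad item, not the context or problem that recover would provide:

```elm
import Parser exposing (Parser, (|.), Trailing(..), chompIf, chompWhile, getChompedString, map, oneOf, sequence, spaces, variable)
import Set


-- Hypothetical result type: a parsed name, or the raw text of an item
-- that failed to parse.
type Exposed
    = Exposed String
    | Invalid String


isItemChar : Char -> Bool
isItemChar c =
    c /= ',' && c /= ')'


-- Fallback branch: skip to the next ',' or ')' and keep the skipped text.
-- This sketch does not handle nested parentheses such as World(..).
exposedItem : Parser Exposed
exposedItem =
    oneOf
        [ map Exposed
            (variable
                { start = Char.isAlpha
                , inner = \c -> Char.isAlphaNum c || c == '_'
                , reserved = Set.empty
                }
            )
        , map Invalid (getChompedString (chompIf isItemChar |. chompWhile isItemChar))
        ]


exposingList : Parser (List Exposed)
exposingList =
    sequence
        { start = "("
        , separator = ","
        , end = ")"
        , spaces = spaces
        , item = exposedItem
        , trailing = Forbidden
        }
```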

Add some examples that show a parser alternative for regex

Because elm/parser is recommended over using regex, there should be some examples showing how to make a parser for both simple and complex regex.

If you guys agree, I'll make a draft PR when I get started. If I have not done so yet, anyone can take this issue.
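As a small illustration of the kind of example meant here, this is a sketch of a parser standing in for the regex ^\d{4}-\d{2}-\d{2}$. The names digits and isoDate are hypothetical. Like the regex, it does not check that the month and day are in range, but it returns the numbers instead of a match:

```elm
import Parser exposing (Parser, (|.), (|=), Step(..), andThen, chompIf, end, getChompedString, loop, map, problem, succeed, symbol)


-- Hypothetical helper: exactly `count` ASCII digits, like the regex \d{count}.
digits : Int -> Parser Int
digits count =
    getChompedString
        (loop 0
            (\n ->
                if n == count then
                    succeed (Done ())

                else
                    chompIf Char.isDigit |> map (\_ -> Loop (n + 1))
            )
        )
        |> andThen
            (\str ->
                case String.toInt str of
                    Just value ->
                        succeed value

                    Nothing ->
                        problem ("expected " ++ String.fromInt count ++ " digits")
            )


-- Roughly ^\d{4}-\d{2}-\d{2}$ : `end` plays the role of the $ anchor.
isoDate : Parser ( Int, Int, Int )
isoDate =
    succeed (\y m d -> ( y, m, d ))
        |= digits 4
        |. symbol "-"
        |= digits 2
        |. symbol "-"
        |= digits 2
        |. end
```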

lineComment messes up newline tracking

Parser Version

1.1.0

Description

Following the doc on specifying spaces for Elm:

sps : Parser ()
sps =
  loop 0 <| ifProgress <|
    oneOf
      [ lineComment "--"
      , multiComment "{-" "-}" Nestable
      , spaces
      ]

ifProgress : Parser a -> Int -> Parser (Step Int ())
ifProgress p offset =
  succeed identity
    |. p
    |= getOffset
    |> map (\newOffset -> if offset == newOffset then Done () else Loop newOffset)

The above works well until we introduce position tracking:

type alias Located a =
  { from : (Int, Int)
  , value : a
  , to : (Int, Int)
  }

located : Parser a -> Parser (Located a)
located p =
  succeed Located
    |= getPosition
    |= p
    |= getPosition

Now if we specify a simple test parser that uses the sps function:

parser : Parser (Located (), Located ())
parser =
    succeed (Tuple.pair)
        |= (located <| token "one")
        |. sps
        |= (located <| token "two")

and call parser on a simple source string:

source : String
source =
    """one -- comment
two
    """

result =
    run parser source

We unexpectedly got back:

Ok ({ from = (1,1), to = (1,4), value = () },{ from = (3,1), to = (3,4), value = () })

Notice that two is on the second line, but elm/parser gives back its location on the third line:

{ from = (3,1), to = (3,4), value = () }

We are expecting:

Ok ({ from = (1,1), to = (1,4), value = () },{ from = (2,1), to = (2,4), value = () })

Workaround for lineComment:
Parser version:

lineCommentWorkAround : String -> Parser ()
lineCommentWorkAround start =
    succeed () |. symbol start |. chompWhile (\c -> c /= '\n')

Parser.Advanced version:

lineCommentWorkAround : Token x -> Parser c x ()
lineCommentWorkAround start =
    succeed () |. symbol start |. chompWhile (\c -> c /= '\n')

Reason

This issue is likely caused by chompUntil and chompUntilEndOr not consuming the matched string. lineComment is defined using chompUntilEndOr as follows:

lineComment : Token x -> Parser c x ()
lineComment start =
  ignorer (token start) (chompUntilEndOr "\n")

Because chompUntilEndOr updates the row/column position but does not consume the matched string (in this case \n), the newline is counted twice in the sps function. The second count comes from the third option of sps, spaces, which consumes the leftover newline at the end of the line comment. #20 is a PR that solves this bug, but it has not been merged yet. Can we review and merge it?

See and run the full test code including the workaround in Ellie.

Full source code for the test

module Main exposing (main)

import Browser
import Html exposing (Html, pre, text, h1)
import Html.Events exposing (onClick)
import Parser exposing (..)

type alias Model =
    {}

initialModel : Model
initialModel =
    {}

type Msg
    = DoNothing

update : Msg -> Model -> Model
update msg model =
    model

source1 : String
source1 =
    """one -- comment before newline
two -- comment before end of file"""

source2 : String
source2 =
    """one -- comment before newline
two - comment before end of file"""

view : Model -> Html Msg
view model =
    pre []
        [ h1 [] [ text "Normal source string without error" ]
        , text <| "Source string:\n"
        , text <| source1 ++ "\n\n"
        , text "lineComment:\n"
        , text <| (Debug.toString <| run parser source1) ++ "\n"
        , text <| "Notice that two is on the 2nd line but lineComment gives back its location on the 3rd line" ++ "\n\n"
        , text "lineCommentWorkAround:\n"
        , text <| (Debug.toString <| run parserWorkAround source1) ++ "\n"
        , text <| "lineCommentWorkAround gives back the correct location\nfor both comments before newline and before end of file"
        , h1 [] [ text "misspelled '--'" ]
        , text <| "Source string:\n"
        , text <| source2 ++ "\n\n"
        , text "lineComment\n"
        , text <| (Debug.toString <| run parser source2) ++ "\n"
        , text <| "Notice that the error happens at the end of the 2nd line but lineComment thinks it's on the 3rd line" ++ "\n\n"
        , text "lineCommentWorkAround\n"
        , text <| Debug.toString <| run parserWorkAround source2
        , text <| "lineCommentWorkAround gives back the correct location"
        ]

parser : Parser (Located (), Located ())
parser =
    succeed (Tuple.pair)
        |= (located <| token "one")
        |. sps
        |= (located <| token "two")
        |. sps
        |. end

parserWorkAround : Parser (Located (), Located ())
parserWorkAround =
    succeed (Tuple.pair)
        |= (located <| token "one")
        |. spsWorkAround
        |= (located <| token "two")
        |. sps
        |. end

type alias Located a =
    { from : (Int, Int)
    , value : a
    , to : (Int, Int)
    }

located : Parser a -> Parser (Located a)
located p =
    succeed Located
        |= getPosition
        |= p
        |= getPosition

sps : Parser ()
sps =
    loop 0 <| ifProgress <|
        oneOf
            [ lineComment "--"
            , multiComment "{-" "-}" Nestable
            , spaces
            ]

spsWorkAround : Parser ()
spsWorkAround =
    loop 0 <| ifProgress <|
        oneOf
            [ lineCommentWorkAround "--"
            , multiComment "{-" "-}" Nestable
            , spaces
            ]

lineCommentWorkAround : String -> Parser ()
lineCommentWorkAround start =
    succeed () |. symbol start |. chompWhile (\c -> c /= '\n')

ifProgress : Parser a -> Int -> Parser (Step Int ())
ifProgress p offset =
    succeed identity
        |. p
        |= getOffset
        |> map (\newOffset -> if offset == newOffset then Done () else Loop newOffset)

main : Program () Model Msg
main =
    Browser.sandbox
        { init = initialModel
        , view = view
        , update = update
        }

Inconsistent internal parser state

This issue describes a bug in Elm.Kernel.Parser.findSubString.


Note: the following issues describe symptoms of this bug:

In the same way, the following pull request tries to fix the symptoms:


The Elm Parser internally keeps track of the current position in two ways:

  • as a row and a column (like a code editor)
  • as an offset into the source string.

Normally both kinds of position infos (row and column vs. offset) are in sync with each other.
(For a given source string, you can calculate both row and column from the offset and vice versa.)

The bug in Elm.Kernel.Parser.findSubString breaks this synchronicity, though.
This affects the following parsers:

  • lineComment
  • multiComment
  • chompUntil
  • chompUntilEndOr

They set...

  • row and column after the (closing) token
  • the offset before the (closing) token

Here's an example with chompUntil:

import Parser exposing ((|.), (|=), Parser)

testParser : Parser { row : Int, col : Int, offset : Int }
testParser =
    Parser.succeed (\row col offset -> { row = row, col = col, offset = offset })
        |. Parser.chompUntil "token"
        |= Parser.getRow
        |= Parser.getCol
        |= Parser.getOffset

Parser.run testParser "< token >"
--> Ok { row = 1, col = 8, offset = 2 }

The state after the test parser is run:

  • row = 1, col = 8 (corresponding to offset = 7) --> after the token
  • offset = 2 (corresponding to row = 1, col = 3) --> before the token

The root cause for these bugs lies in the Elm.Kernel.Parser.findSubString function:

var _Parser_findSubString = F5(function(smallString, offset, row, col, bigString)
{
    var newOffset = bigString.indexOf(smallString, offset);
    var target = newOffset < 0 ? bigString.length : newOffset + smallString.length;
    while (offset < target)
    {
        var code = bigString.charCodeAt(offset++);
        code === 0x000A /* \n */
            ? ( col=1, row++ )
            : ( col++, (code & 0xF800) === 0xD800 && offset++ )
    }
    return __Utils_Tuple3(newOffset, row, col);
});

If the smallString is found, the returned newOffset is at the position before the smallString (the result of the indexOf function), but the new row and col after the smallString (at the target position).


Note: the following pull request tries to fix the comment of the Elm.Kernel.Parser.findSubString function
to correctly describe the buggy behavior:

Omissions in comparison with prior work

The comparison with prior work section in the docs starts off with:

I have not seen the parser pipeline or the context stack ideas in other libraries, but backtracking relate to prior work.

But it seems like there's actually a lot of prior work. (|=) and (|.) seem to have exactly the same semantics as Haskell's (<*>) and (<*). To translate the example into Haskell & Parsec:

import Text.ParserCombinators.Parsec

data Point = Point
    { x :: Float
    , y :: Float
    }
    deriving(Show)

point :: Parser Point
point =
    pure Point
        <* string "("
        <* spaces
        <*> float
        <* spaces
        <* string ","
        <* spaces
        <*> float
        <* spaces
        <* string ")"

-- Not part of the original example, but I wanted this to be runnable:
float :: Parser Float
float =
    fmap read
        ( pure (++)
            <*> many digit
            <*> choice
                [ pure (:)
                    <*> char '.'
                    <*> many digit
                , pure ""
                ]
        )

Parsec also defines an operator for adding context information:

https://hackage.haskell.org/package/parsec-3.1.13.0/docs/Text-ParserCombinators-Parsec-Prim.html#v:-60--63--62-

...it looks like Elm's version of this is much more sophisticated, but it might be worth discussing how it improves on things like <?>.

Incorrect problem propagated from finalizeFloat

The problem passed to the float handler of the number parser is not used; the generic invalid problem is used instead. In other words, these lines:

case floatSettings of
      Err x ->
        Bad True (fromState s invalid)

Should probably look like this:

case floatSettings of
      Err x ->
        Bad True (fromState s x) -- note the `x` instead of `invalid`
