marwes / combine

A parser combinator library for Rust

Home Page: https://docs.rs/combine/*/combine/
License: MIT License
Could you provide a comparison with nom?
`count`, `none_of`, and `one_of` all have the same documentation description:

> Extract one token and succeeds if it is part of tokens.

which I assume is correct for `one_of` but not the other two. :) Additionally, the docs for `count` give an example of `none_of`:
```rust
let result = many(none_of("abc".chars()))
    .parse("edc");
assert_eq!(result, Ok((String::from("ed"), "c")));
```
The `impl`s of `Stream` for `&str` and `&[T]` now conflict with the one for `I: Iterator + Clone`. This is due to the changes in rust-lang/rust#23867, in particular the negative reasoning as demonstrated in the `Replacer` example. The problem in this case, iiuc, is that `&str` and `&[T]` could one day decide to implement `Iterator`.
Package names now have to use underscores rather than dashes. `IntoCow` is feature gated.
Should only need to add the `?Sized` bound for `P` at https://github.com/Marwes/parser-combinators/blob/master/src/primitives.rs#L230.
Blocked by rust-lang/rust#21379.
I needed to parse something that has a rather simple structure, but I needed one element to be a grapheme (the rest is mostly ASCII symbols with specific meanings). No parser library provides such a parser out of the box, and that is not surprising; the functionality to split a string into graphemes is only provided by the unicode-segmentation crate.
I had something already slapped together in nom, except I was not (and still am not) happy with it, because it should operate on `&str` but has to operate on `&[u8]`, since nom lacks `&str` variants of some important primitives. So I considered rewriting it in combine, but gave up, because I had trouble figuring out how to write the function that would work with `parser()` regarding:

1. How generic/specific it can be. The unicode-segmentation crate only works on `&str` (because of shortcomings of Rust iterators; another story) and I always have that to provide, but I didn't see documentation about what might or might not come in the inner parser. The example in the description of `parse()` appears to take just `&str` in the closure, but I didn't see a description.
2. How to properly construct the error. The `ParseError`/`Error`/`Info` construct is pretty complicated and would deserve some explanation, but I failed to run across any. And perhaps also some helpers to create the Expected and Unexpected primary errors from a stream state and (in the first case; Unexpected does not need one) a message easily.
It would be nice if there was an easy way to construct indentation aware parsers as in http://hackage.haskell.org/package/IndentParser-0.2.1 (Disclaimer: I haven't used this library seriously so I don't know how easy/well it works).
This would likely be in a separate crate such as https://github.com/Marwes/combine-language.
I can't understand how to define something simple, for example an `oneOf` equivalent. Can you help me out?
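For reference, a minimal sketch of one way this is often expressed with the existing primitives, assuming the combine 1.x-era API where `satisfy` is exported at the crate root (the library also has a `one_of` parser, per the documentation issue earlier in this section):

```rust
extern crate combine;
use combine::{satisfy, Parser};

fn main() {
    // Succeeds on any single character contained in the given set,
    // similar to parsec's oneOf.
    let mut one_of_abc = satisfy(|c| "abc".chars().any(|x| x == c));
    assert_eq!(one_of_abc.parse("b rest"), Ok(('b', " rest")));
}
```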
It would be nice to support reading from streams where the input is produced incrementally, to avoid needing to read the entire input into memory (such as from files). `Stream` types must be able to be cloned, which makes it simple to support arbitrary look-ahead, but it makes it impossible to use an iterator such as `::std::io::Chars`.

Without arbitrary look-ahead it would be trivial to support just LL(1) parsers by adding a peek function to `Stream`, but I don't think it's worth it given how useful the `try` parser can be (even if it is inefficient).
Not quite sure how to implement this efficiently yet though.
Waiting for rust-lang/rust#21705
Chomp is another parser combinator library for Rust. It aims to be as fast as or faster than parsers hand-written in C.
How does Chomp compare to Combine?
This issue was automatically generated. Feel free to close without ceremony if
you do not agree with re-licensing or if it is not possible for other reasons.
Respond to @cmr with any questions or concerns, or pop over to
#rust-offtopic
on IRC to discuss.
You're receiving this because someone (perhaps the project maintainer)
published a crates.io package with the license as "MIT" xor "Apache-2.0" and
the repository field pointing here.
TL;DR the Rust ecosystem is largely Apache-2.0. Being available under that
license is good for interoperation. The MIT license as an add-on can be nice
for GPLv2 projects to use your code.
The MIT license requires reproducing countless copies of the same copyright
header with different names in the copyright field, for every MIT library in
use. The Apache license does not have this drawback. However, this is not the
primary motivation for me creating these issues. The Apache license also has
protections from patent trolls and an explicit contribution licensing clause.
However, the Apache license is incompatible with GPLv2. This is why Rust is
dual-licensed as MIT/Apache (the "primary" license being Apache, MIT only for
GPLv2 compat), and doing so would be wise for this project. This also makes
this crate suitable for inclusion and unrestricted sharing in the Rust
standard distribution and other projects using dual MIT/Apache, such as my
personal ulterior motive, the Robigalia project.
Some ask, "Does this really apply to binary redistributions? Does MIT really
require reproducing the whole thing?" I'm not a lawyer, and I can't give legal
advice, but some Google Android apps include open source attributions using
this interpretation. Others also agree with
it.
But, again, the copyright notice redistribution is not the primary motivation
for the dual-licensing. It's stronger protections to licensees and better
interoperation with the wider Rust ecosystem.
To do this, get explicit approval from each contributor of copyrightable work
(as not all contributions qualify for copyright, due to not being a "creative
work", e.g. a typo fix) and then add the following to your README:
## License
Licensed under either of
* Apache License, Version 2.0, ([LICENSE-APACHE](LICENSE-APACHE) or http://www.apache.org/licenses/LICENSE-2.0)
* MIT license ([LICENSE-MIT](LICENSE-MIT) or http://opensource.org/licenses/MIT)
at your option.
### Contribution
Unless you explicitly state otherwise, any contribution intentionally submitted
for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any
additional terms or conditions.
and in your license headers, if you have them, use the following boilerplate
(based on that used in Rust):
// Copyright 2016 combine Developers
//
// Licensed under the Apache License, Version 2.0, <LICENSE-APACHE or
// http://apache.org/licenses/LICENSE-2.0> or the MIT license <LICENSE-MIT or
// http://opensource.org/licenses/MIT>, at your option. This file may not be
// copied, modified, or distributed except according to those terms.
It's commonly asked whether license headers are required. I'm not comfortable
making an official recommendation either way, but the Apache license
recommends it in their appendix on how to use the license.
Be sure to add the relevant LICENSE-{MIT,APACHE} files. You can copy these from the Rust repo for a plain-text version.
And don't forget to update the license metadata in your `Cargo.toml` to:

```toml
license = "MIT OR Apache-2.0"
```
I'll be going through projects which agree to be relicensed and have approval by the necessary contributors and doing these changes, so feel free to leave the heavy lifting to me!

To agree to relicensing, comment with:
I license past and future contributions under the dual MIT/Apache-2.0 license, allowing licensees to chose either at their option.
Or, if you're a contributor, you can check the box in this repo next to your
name. My scripts will pick this exact phrase up and check your checkbox, but
I'll come through and manually review this issue later as well.
I wonder if you use or could use something like https://github.com/shepmaster/jetscii for some of the matching primitives?
So on attempting to fix a couple of other parser combinators' behaviours on error, I've come to realise that `parse_lazy` doesn't have the right semantics to correctly support what we want. Let's look at a motivating example.
```rust
let mut emptyok_then_emptyerr = (value(()).message("not expected"), char('x'));
assert_eq!(emptyok_then_emptyerr.parse(State::new("hi")),
           Err(ParseError {
               position: SourcePosition { line: 1, column: 1 },
               errors: vec![Error::Unexpected('h'.into()),
                            Error::Expected('x'.into())],
           }));
// assert fails: actual error contains the message "not expected"
```
Here we sequentially chain two parsers, the first of which returns `EmptyOk` and the second returns `EmptyErr`. Now this combined tuple parser will return `EmptyErr`, meaning "no input was consumed, and one of my child parsers failed". However, due to the semantics of `parse_lazy`, this will also imply "the first of my child parsers failed". Hence, when `add_error` is called, the message "not expected" will be added, which is clearly wrong.

What's the fix? `parse_lazy` should not conflate "no input was consumed" with "the first parser failed". So it seems like we have two options:
1. Make the `parse_lazy` semantics tighter. This will probably entail returning a separate flag, "was actually lazy", which becomes the thing that specifies whether `add_error` should be called or not. (One thing I'm not sure about in this scenario is whether the Consumed flag is still useful or not. Is it currently important anywhere other than for determining laziness?)
2. Keep the semantics of `parse_lazy` but rewrite some current parsers so they are not lazy. This loses us some amount of speed. Notably the `tuple` and `then` combinators will have to become non-lazy.
It doesn't seem possible to use `choice` with different types. Consider an assignment of a variable: `a = b`. I would express this with `(env.identifier(), env.symbol("="), choice([env.integer(), env.identifier()]))`, but the compiler complains that it expects an `i64` instead of a `String`.

How can I express this without writing two separate parsers?
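One common workaround (a sketch under my own assumptions, not a statement about how the combine-language `env.*` parsers are meant to be used) is to map every alternative into a shared enum first, so that all branches have the same `Output` type; shown here with plain character parsers and the combine 1.x-style API, and a hypothetical `Rhs` enum:

```rust
extern crate combine;
use combine::{digit, letter, many1, Parser, ParserExt};

// Hypothetical enum unifying the two possible right-hand sides.
#[derive(Debug, PartialEq)]
enum Rhs {
    Int(i64),
    Ident(String),
}

fn main() {
    // Map each branch to the shared type before combining them with `or`/`choice`.
    let int = many1::<String, _>(digit()).map(|s: String| Rhs::Int(s.parse().unwrap()));
    let ident = many1::<String, _>(letter()).map(Rhs::Ident);
    let mut rhs = int.or(ident);

    assert_eq!(rhs.parse("42"), Ok((Rhs::Int(42), "")));
    assert_eq!(rhs.parse("foo"), Ok((Rhs::Ident("foo".into()), "")));
}
```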
What's the right way to parse input from stdin? I'm hitting lifetime issues when reading into a `String` and passing that to `parse`.
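For what it's worth, a minimal sketch of the pattern that usually avoids the lifetime error: keep the `String` buffer in a binding that outlives the parse result, since the result (and its errors) may borrow from the input. This assumes the combine 1.x-style API with `digit`/`many1` at the crate root; the specific parser is only for illustration.

```rust
extern crate combine;
use combine::{digit, many1, Parser};
use std::io::Read;

fn main() {
    // The buffer must live at least as long as anything borrowed from the parse result.
    let mut buffer = String::new();
    std::io::stdin().read_to_string(&mut buffer).unwrap();

    let mut number = many1::<String, _>(digit());
    match number.parse(&buffer[..]) {
        Ok((value, rest)) => println!("parsed {:?}, leftover {:?}", value, rest),
        Err(err) => println!("parse error: {}", err),
    }
}
```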
Bug found by @avwhite with a minimal reproducing example.
Should be able to fix this later today.
EDIT: The example is actually wrong since you should expect it to return Consumed due to spaces consuming one char. Updated with the correct example (which needs to manually create a state or this won't work).
extern crate "parser-combinators" as parser;
use parser::{many, any_char, Parser};
use parser::primitives::{State, Consumed, SourcePosition};
fn main() {
let state = State { position: SourcePosition { line: 1, column: 1 }, input: "", consumed: Consumed::Consumed };
let result = many(any_char as fn (_) -> _)
.parse_state(state);
assert!(result.is_ok());
assert_eq!(result.unwrap().1.consumed, Consumed::Empty);
}
```rust
stream_parser((skip_many(digit()), token('.'), skip_many(digit()))).parse("123.456+-")
// The output of the parser is a string: "123.456"
```
https://gitter.im/Marwes/combine?at=58a0813cf045df0a223a0f0c
There should probably be two variants of this parser: one which accumulates the tokens of these parsers using `FromIterator` (similar to `many`) and one which is zero-copy (similar to `take_while`).
I'd like to understand some of the API decisions the library makes. Like in #74, my motivating use case is to parse a token stream (not a character stream), and my tokens do not implement `Copy` because they include `String` data, like:

```rust
enum Token {
    Keyword(String),
    ...
}
```
While implementing `RangeStream` for a newtype wrapping `&'a [Token]` (like `SliceStream` but with item type `Token` rather than item type `&'a Token`), I've run into the following question. Why does `uncons_while` take an `FnMut(Item)` rather than `FnMut(&Item)` at https://github.com/Marwes/combine/blob/master/src/primitives.rs#L562?

In general, the Rust-y pattern seems to be that predicate filters take references to the elements in their containing collection -- but I'm not an experienced Rust user so I'm probably missing something. Any guidance appreciated.
Does your crate support Unicode character classes and normalization?
Below is a list of (hopefully) all the parsers which exist in parsec but not in this library. I added some comments on a few of them about their usefulness. If there is a parser you would argue for its inclusion (or exclusion) please leave a comment.
```haskell
option :: Stream s m t => a -> ParsecT s u m a -> ParsecT s u m a
optionMaybe :: Stream s m t => ParsecT s u m a -> ParsecT s u m (Maybe a)
optional :: Stream s m t => ParsecT s u m a -> ParsecT s u m ()
```
The `optionMaybe` parser is called `optional` in this lib, which can cover all cases.
```haskell
count :: Stream s m t => Int -> ParsecT s u m a -> ParsecT s u m [a]
endBy :: Stream s m t => ParsecT s u m a -> ParsecT s u m sep -> ParsecT s u m [a]
endBy1 :: Stream s m t => ParsecT s u m a -> ParsecT s u m sep -> ParsecT s u m [a]
chainl :: Stream s m t => ParsecT s u m a -> ParsecT s u m (a -> a -> a) -> a -> ParsecT s u m a
chainr :: Stream s m t => ParsecT s u m a -> ParsecT s u m (a -> a -> a) -> a -> ParsecT s u m a
manyTill :: Stream s m t => ParsecT s u m a -> ParsecT s u m end -> ParsecT s u m [a]
lookAhead :: Stream s m t => ParsecT s u m a -> ParsecT s u m a
oneOf :: Stream s m Char => [Char] -> ParsecT s u m Char
noneOf :: Stream s m Char => [Char] -> ParsecT s u m Char
endOfLine :: Stream s m Char => ParsecT s u m Char
```
`satisfy` covers all of these cases. These parsers could give better error reporting though, which is an argument for their inclusion.
```haskell
newline :: Stream s m Char => ParsecT s u m Char
crlf :: Stream s m Char => ParsecT s u m Char
tab :: Stream s m Char => ParsecT s u m Char
upper :: Stream s m Char => ParsecT s u m Char
lower :: Stream s m Char => ParsecT s u m Char
alphaNum :: Stream s m Char => ParsecT s u m Char
letter :: Stream s m Char => ParsecT s u m Char
digit :: Stream s m Char => ParsecT s u m Char
hexDigit :: Stream s m Char => ParsecT s u m Char
octDigit :: Stream s m Char => ParsecT s u m Char
char :: Stream s m Char => Char -> ParsecT s u m Char
chainr1 :: Stream s m t => ParsecT s u m a -> ParsecT s u m (a -> a -> a) -> ParsecT s u m a
sepEndBy :: Stream s m t => ParsecT s u m a -> ParsecT s u m sep -> ParsecT s u m [a]
sepEndBy1 :: Stream s m t => ParsecT s u m a -> ParsecT s u m sep -> ParsecT s u m [a]
skipMany :: ParsecT s u m a -> ParsecT s u m ()
skipMany1 :: Stream s m t => ParsecT s u m a -> ParsecT s u m ()
```
Equivalent to `many(parser.map(|_| ()))` since the `many` parser should not allocate the vector for zero-sized values. Added since the above example will not compile without type annotations any longer and it is not obvious that it does not allocate any memory.
```haskell
choice :: Stream s m t => [ParsecT s u m a] -> ParsecT s u m a
```
Added as `choice_slice` and `choice_vec`; might be generalized further later on.
```haskell
anyToken :: (Stream s m t, Show t) => ParsecT s u m t
```

Added as the `any` parser.
```haskell
eof :: (Stream s m t, Show t) => ParsecT s u m ()
```

Added as the `eof` parser.
```
/home/travis/.cargo/registry/src/github.com-1ecc6299db9ec823/parser-combinators-0.2.4/src/combinator.rs:156:36: 156:37 error: unable to infer enough type information about `_`; type annotations required [E0282]
/home/travis/.cargo/registry/src/github.com-1ecc6299db9ec823/parser-combinators-0.2.4/src/combinator.rs:156     NotFollowedBy(try(parser).then(f as fn (_) -> _)
                                                                                                                                                ^
error: aborting due to previous error
Build failed, waiting for other jobs to finish...
```
See here for the whole build. I don't think this one's my fault.
It is more standard, and it allows them to be run easily.
First, thanks for the library. It's a fun way to learn Rust!
I'd like to understand some of the API decisions the library makes. First, my motivating use case is to parse a token stream (not a character stream), and my tokens do not implement `Copy` because they include `String` data, like:

```rust
enum Token {
    Keyword(String),
    ...
}
```
If I use `SliceStream`, I can parse `&[Token]`. However, I can't really use `token`, because the `Item` type of `SliceStream` is `&'a Token` and that's not something I can create with the correct lifetime. It seems to me like `token` should have `where Item: Clone` and explicitly `.clone()` its argument. I see at https://github.com/Marwes/combine/blob/master/src/combinator.rs#L200 that we can't actually parse without `Item: Clone`.

Is this deliberate? An oversight? Am I using `SliceStream` incorrectly?
I am using my own type `Token` as the item for my parser, with a type `AST` as the output. `Token` has a `TokenPositioner` which is an exact copy of `BytePositioner`. I have a top-level parser with the signature

```rust
fn command<I>(input: State<I>) -> ParseResult<AST, I, Token> where I: Stream<Item=Token>
```

When I try to use this as a parser on a `Vec` named `tokens` like so:

```rust
parser(command).parse(from_iter(tokens.iter()))
```
It prints the error:

```
src/lexer.rs:127:21: 127:52 error: type mismatch resolving `<parser_combinators::primitives::IteratorStream<core::slice::Iter<'_, tokenizer::Token>> as parser_combinators::primitives::Stream>::Item == tokenizer::Token`:
 expected &-ptr,
    found enum `tokenizer::Token` [E0271]
src/lexer.rs:127     parser(command).parse(from_iter(tokens.iter()))
                                           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/lexer.rs:127:21: 127:52 help: run `rustc --explain E0271` to see a detailed explanation
```
Any advice you can offer would be helpful. Thanks.
I have a 400-line nom parser that I'm considering converting to combine (mostly for the better error messages). How would you organize it? I like being able to cleanly separate each small parser. How can I do that with combine?
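One pattern that shows up in the other issues here (see the `fn ... -> ParseResult<_, I>` signatures elsewhere in this section) is to write each small parser as a free function and compose them through the `parser()` adapter. A minimal sketch of that style, assuming the combine 1.x-era API (`char`, `digit`, `many1`, `parser` at the crate root plus `ParserExt`); the `number`/`pair` names are my own:

```rust
extern crate combine;
use combine::{char, digit, many1, parser, Parser, ParserExt};
use combine::primitives::{ParseResult, State, Stream};

// A small, named parser: an unsigned integer.
fn number<I>(input: State<I>) -> ParseResult<i64, I>
    where I: Stream<Item = char>
{
    many1::<String, _>(digit())
        .map(|s| s.parse::<i64>().unwrap())
        .parse_state(input)
}

// A larger parser built from the smaller one: "(12,34)".
fn pair<I>(input: State<I>) -> ParseResult<(i64, i64), I>
    where I: Stream<Item = char>
{
    char('(')
        .with(parser(number::<I>))
        .skip(char(','))
        .and(parser(number::<I>))
        .skip(char(')'))
        .parse_state(input)
}

fn main() {
    assert_eq!(parser(pair).parse("(12,34)"), Ok(((12, 34), "")));
}
```

Each piece stays individually testable, and the top-level parser reads as a list of named sub-parsers.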
Since all parsers are currently functions or methods, I find that large parsers often become a bit of a word soup; long chains of parsers can become rather hard to read. I have some ideas for what might be useful to implement, which I document below.

Tuples could allow parsers which should be applied in sequence to be written as:

```rust
string("if").with(expr()).skip(string("then")).and(expr()).skip(string("else")).and(expr())
    .map(|(b, (t, f))| Expr::IfElse(Box::new(b), Box::new(t), Some(Box::new(f))))

// With tuples
(string("if"), expr(), string("then"), expr(), string("else"), expr())
    .map(|(_, b, _, t, _, f)| Expr::IfElse(Box::new(b), Box::new(t), Some(Box::new(f))))
```
Strings and character literals could implement `Parser` directly, allowing them to be written without `string` and `char`:

```rust
("if", expr(), "then", expr(), "else", expr())
    .map(|(_, b, _, t, _, f)| Expr::IfElse(Box::new(b), Box::new(t), Some(Box::new(f))))
```
The most likely candidate here is overloading `|` to work the same as the `or` parser. Unfortunately this won't work directly without changing the library rather radically, since it is not possible to implement it as below.
```rust
impl <A: Parser, B: Parser> BitOr<B> for A {
    type Output = Or<A, B>;
    fn bitor(self, other: B) -> Or<A, B> { self.or(other) }
}
```
This will not work for `A` and `B` due to coherence (`A` and `B` must appear inside some local type). The same applies to any other operator.
Since all of these could be seen as being too clever with the syntax, it would be nice to have some feedback on which of these (if any) may be good to implement.
Hi @Marwes, you've been so helpful understanding `combine` that I wonder if you could save me some more experimentation. Can you explain why `PhantomType` and `PhantomData` are present in the codebase? I'm aware this works around some limitations in the Rust type checker, but it would help to spell out the motivating case for `combine`. If you can explain or link to the relevant changesets, I will try to add a note to the README. Thanks!
I need to parse any of a set of characters, then return different values depending on that character. To do this, I use `satisfy` and then `map`. Simplified example:
```rust
satisfy(|c| c == 'x' || c == 'y' || c == 'z')
    .map(|c| match c { 'x' => 42, 'y' => 17, 'z' => 0 })
```
Is there an existing way to eliminate the redundancy in my code, while ensuring the parser only consumes known characters? If not, the API I'm looking for is something like:

```rust
satisfy_mapped(|c| match c { 'x' => Some(42), 'y' => Some(17), 'z' => Some(0), _ => None })
```
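In the meantime, one way to avoid writing the character set twice (a sketch using only `satisfy` and `map`, not a new library API; `lookup` is a hypothetical helper) is to drive both the predicate and the mapping from a single lookup function:

```rust
extern crate combine;
use combine::{satisfy, Parser, ParserExt};

// Hypothetical helper: the single source of truth for the accepted characters.
fn lookup(c: char) -> Option<i32> {
    match c {
        'x' => Some(42),
        'y' => Some(17),
        'z' => Some(0),
        _ => None,
    }
}

fn main() {
    // The parser only consumes characters the lookup knows about,
    // and the mapping reuses the same table.
    let mut p = satisfy(|c| lookup(c).is_some()).map(|c| lookup(c).unwrap());
    assert_eq!(p.parse("xyz"), Ok((42, "yz")));
}
```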
Here's a feature request for something I think would be useful.
The `map` function can be used to apply a function to the successful result of a parser. I propose adding a function which applies a function returning `Result` to the successful result of a parser. If that function returns an error, the error value would be carried over as the `ParseResult`'s error value; otherwise, it carries over the new success value.

I'm not quite sure what the idiomatic Rust name for this would be. In Scala, which I'm more familiar with, this is referred to as `flatMap`, while Rust's `Result` type calls it `and_then`. Whatever it's called, I think this would be a pretty useful feature.
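To illustrate the motivation with a small sketch of my own (assuming the combine 1.x-style API): with only `map`, a fallible conversion has to panic, whereas the proposed combinator would let the `Err` surface as the parser's own error value.

```rust
extern crate combine;
use combine::{digit, many1, Parser, ParserExt};

fn main() {
    // Today: an out-of-range number can only panic here (or the parser's Output
    // has to become a Result that the caller unpacks later).
    let mut number = many1::<String, _>(digit()).map(|s| s.parse::<i32>().unwrap());
    assert_eq!(number.parse("128"), Ok((128, "")));
    // With the proposed combinator, a failing `s.parse::<i32>()` would instead be
    // reported as a parse error at the current position.
}
```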
I'm having trouble implementing higher order functions that create parsers. My latest attempt looks like this:
```rust
pub fn keyword(kw: &'static str) -> Box<Fn(&str) -> ParseResult<(), &str>>
{
    use combine::{ParserExt, string, value};
    assert!(KEYWORDS.contains(&kw));
    Box::new(|input| {
        string(kw)
            .with(value(()))
            .parse_state(input)
    })
}
```
Later I want to compose `keyword` in different ways. Here's an attempted use in a unit test:

```rust
parser(keyword("let")).skip(eof()).parse(test_case)
```
Variations on the types I've tried include:

* the `Box<…>` from the return value of `keyword` - `Sized` is not satisfied.
* `keyword("let")` - `Sized` is not satisfied.

How can I make this work?
Note: if there's a better way to implement `keyword` itself, that's of secondary interest. My primary interest is to learn how to create higher-order parser generators.
Hi @Marwes, first of all thanks for the library! Also I apologize if this is not the appropriate place to ask for help.

I'm trying to parse an amount of chars based on a parsed length. For instance, the parser I'm trying to build should return `announce` for an input like `8:announce`, where `8` is the length of the string after `:`. I tried a couple of things, like parsing the length and then folding as many `letter()`s as the size to combine several steps into one parser. I'm probably missing something obvious since I'm not an expert in Rust, but I would appreciate any input from your side.
```rust
fn string<I>(input: State<I>) -> ParseResult<BValue, I> where I: Stream<Item=char>
{
    let length = many1(digit());
    length
        .map(|string: str::String| {
            string.parse::<i32>().unwrap()
        })
        .skip(char(':'))
        .then(|decimal| {
            (0..decimal).fold(letter().map(|l| l), move |p, _| {
                p.and(letter()).map(|(l1, l2)| l1 + l2)
            })
        })
        .parse_state(input)
}
```
Code above doesn't compile and probably doesn't make sense. However, I hope it will give you the picture of what I'm trying to accomplish.
Thanks!
I'm quite new to Rust so excuse me if this makes no sense, but would it be possible to make all the combined parsers (e.g. `Choice`) implement the `Clone` trait?
Over in mozilla/mentat#164 I'm trying to use `combine` to parse token streams. The tokens are equivalent to Serde's `json::Value` type. My `Stream` is therefore `&'a [Value]` where `Value` is third-party.

Sadly, this means that I can't implement `Positioner` on `Value`, since the trait originates in a `combine` module (crate?) and the type originates in a `serde` module (crate?), and Rust doesn't allow trait implementations if you don't originate at least one of the trait or the type.
In #rust, two solutions were posed:

1. Define the implementation in the `Value` crate. We can do that since we originate the `Value` type, but we want to ship that crate as an independent library and it makes no sense to require `combine` for that purpose. Perhaps there is some way to opt in using Rust features?
2. Wrap `Value` in a newtype and implement `Positioner` on the newtype. We can do this, but our types and parser are already pretty damn verbose and unwrapping yet another layer of indirection is a frustration.
Can you suggest another solution? Can you suggest a way to associate a `Positioner` that would do better things for `&'a [T]`, arbitrary `T`? (Note that we want a real `Positioner` that uses information from the implementing `Value` type because eventually we're going to keep parsed ranges in the `Value` type.)

Thanks!
Using the last example from https://docs.rs/combine/2.0.0/combine/,
```rust
fn main() {
    if let Err(e) = parser(expr).parse("[") {
        println!("{}", e);
    }
}
```
outputs

```
Parse error at 94334017221597
Unexpected end of input
Expected ]
```

instead of the beautiful `line: 1, column: 1` I was promised XD
Over in https://github.com/mozilla/mentat, we're using `combine` to parse `&'a [edn::Value]` slices. Here, `edn::Value` is a generic value type conceptually similar to JSON. What is happening here is not specific to `edn::Value`, so I'll use `T`: we're using `combine` to parse `&'a [T]` streams.

The error type of such a `Parser` is `combine::primitives::ParseError<&'a [T]>`. Now, around line 449 in 6f2cec6, `ParseError` implements `std::error::Error` only when the underlying `Stream::Range` is `std::fmt::Display`. However, no slice `&'a [T]` can ever satisfy this trait bound due to rust-lang/rust#39328.
I can think of the following ways to work around this issue:

* Implement `std::error::Error` for `&'a [edn::Value]`.
* Wrap `&'a [T]` streams in a newtype implementing `Stream`, and wrap the `Stream::Range` type as well.
* Extend the `Range` trait specifically for wrapping or otherwise helping format `Range` instances. Existing `Range` implementations would have to add the new method, even if they didn't care about this issue (which they can't have been using without significant effort).
* Add a `RangeDisplay` trait in `combine`, define it for the `Range` types in `combine`, and expect that in the `std::error::Error` implementation. This restricts `Range` implementations to a single display format. I think this is okay, though -- it's already the case that `&str` ranges have a single display format.

@Marwes, have you thought about this problem? Is there an alternate approach you can suggest? Would you accept a patch for one of the final two proposed solutions?
Currently the `Error` enum contains `String` as the type holding the message for each error. This is really easy to work with since, regardless of what we want to say in the message, we can always format it into a string. It is not very efficient, however, when our error messages are static, as they often are. Currently I am leaning towards replacing the `String` type with `Cow<'static, String, str>` inside the `Error` enum, which should at least avoid the allocation for errors with a static string as a message.
Other approaches could be something like allowing functions or boxed closure as "error messages" (calling the function would produce the message) but maybe this is overkill.
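A small illustration of the intended saving (using today's two-parameter `Cow` form for clarity; the text above refers to the older three-parameter spelling): a static message is simply borrowed, while a formatted one still allocates.

```rust
use std::borrow::Cow;

fn main() {
    // No allocation: the &'static str is borrowed directly.
    let static_msg: Cow<'static, str> = Cow::Borrowed("unexpected end of input");
    // An allocation happens only when the message really is dynamic.
    let dynamic_msg: Cow<'static, str> = Cow::Owned(format!("unexpected token {:?}", 'x'));
    println!("{} / {}", static_msg, dynamic_msg);
}
```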
Not sure if this is a bug, but it seems like it should work. This code fails to compile:
```rust
extern crate combine;
use combine::{digit, letter, sep_by, token, Parser, ParserExt};

fn main() {
    let p = sep_by::<Vec<_>, _, _>(digit(), token(','));
    let result = letter().or(p).parse("1,2,3");
}
```
With this error:

```
type mismatch resolving `<&str as combine::primitives::Stream>::Item == collections::vec::Vec<char>`:
 expected char,
    found struct `collections::vec::Vec` [E0271]
src/main.rs:6     let result = letter().or(p).parse("1,2,3");
```
However, this works just fine:

```rust
extern crate combine;
use combine::{digit, letter, token, Parser, ParserExt};

fn main() {
    let result = letter().or(digit()).parse("1,2,3");
}
```
Do you see any way a user could implement first and follow sets using combine? I'm not sure if it's possible to add items into the stream while in the middle of parsing that same stream, but it seems impossible.
I saw this mentioned in #54, but I'm not sure exactly how to get this to work. I'm a bit new to rust, so that's also probably related.
I'm trying to write something that can parse an escaped string, e.g. `"Hello \n \"World\""`, and for this one of my helpers is this:

```rust
let esc_char = char('\\').with(choice([
    char('\\').with(value('\\')),
    char('"').with(value('"')),
    char('0').with(value('\0')),
    char('n').with(value('\n')),
    char('t').with(value('\t')),
    char('r').with(value('\r')),
]));
```
You'll notice that the first 2 `with` method calls are completely useless functionally, but they're the only thing making this expression type-check. I've been trying with

```rust
let esc_char = char('\\').with(choice::<&mut [&mut Parser<Input = &str, Output = char>; 6], _>(&mut [
    char('\\').with(value('\\')),
    char('"').with(value('"')),
```
but I get these errors:

```
src/main.rs:36:9: 36:37 error: mismatched types:
 expected `_`,
    found `combine::combinator::With<combine::combinator::Token<_>, combine::combinator::Value<_, char>>`
(expected &-ptr,
    found struct `combine::combinator::With`) [E0308]
src/main.rs:36         char('\\').with(value('\\')),
                       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/main.rs:36:9: 36:37 help: run `rustc --explain E0308` to see a detailed explanation
src/main.rs:34:100: 42:6 error: mismatched types:
 expected `&mut [&mut combine::Parser<Input=&str, Output=char>; 6]`,
    found `&mut [combine::combinator::With<combine::combinator::Token<_>, combine::combinator::Value<_, char>>; 6]`
(expected &-ptr,
    found struct `combine::combinator::With`) [E0308]
src/main.rs:34     let esc_char = char('\\').with(choice::<&mut [&mut Parser<Input = &str, Output = char>; 6], _>(&mut [
src/main.rs:35         char('\\').with(value('\\')),
src/main.rs:36         char('"').with(value('"')),
```
Do you have any idea what I'm doing wrong?
Is there an uncomplicated way to achieve a parser that does `sep_by(any(), char(','))`?

I currently have something like `sep_by(parser(other), char(','))` where `other` is a separate parser that just matches `any()`, but that blows right through every instance of `,` due to the `any()` parser.

What I want is something that works the same as `sep_by`, but where parsing the separator takes precedence over parsing the separated values. I don't want the `other` parser to need to be aware of the context in which it is used.
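For reference, the usual workaround (which does make the element parser aware of the separator, i.e. exactly what the question hopes to avoid) is to stop the element parser at the separator instead of using `any()`. A sketch against the combine 1.x-style API:

```rust
extern crate combine;
use combine::{char, many1, satisfy, sep_by, Parser};

fn main() {
    // Each element is "one or more characters that are not the separator".
    let element = many1::<String, _>(satisfy(|c| c != ','));
    let mut p = sep_by::<Vec<_>, _, _>(element, char(','));

    assert_eq!(p.parse("a,bb,c"),
               Ok((vec!["a".to_string(), "bb".to_string(), "c".to_string()], "")));
}
```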
Commit 1e500c6 changed the bounds on the implementation of `std::error::Error` for `ParseError<P>`. It now requires, among others, `P::Position: Positioner`, which I think is unnecessarily restrictive and in fact not the case for existing instances of `P`, e.g. `char` and `u8`.
Currently nothing in the library has any stability attributes set, which will be a problem once stable Rust starts to roll out. For the high-level API (thinking about function names and what their purpose is) I feel that it is mostly stable enough that I could commit to the design for a 1.0, so to speak. The lower-level details are more prone to change, however.

* Passing the input by `&mut` instead, i.e. something like `fn parse(&mut self, input: &mut State<I>) -> Result<Output, Consumed<Error>>`. Currently this alternate design is both slower and less intuitive, but when attempting to optimize the error creation I needed some additional state struct `State` which evidently made the struct large enough that passing by reference would be beneficial. Since I only achieved around a 40% speedup I am not convinced that the added complexity makes it worth it.

If nothing changes between now and the release of Rust 1.0 that makes me able to implement these changes, I will likely stabilize what is here now and probably integrate these changes into a potential 2.0 version instead.
I just had a heck of a time figuring out why the following examples both yield errors:
```rust
choice([
    string("one"),
    string("two"),
    string("three"),
]).parse("two");

string("this").or(string("that")).parse("that");
```
From what I can tell, for either of these examples to work properly, they need to be written as follows:
```rust
choice([
    try(string("one")),
    try(string("two")),
    try(string("three")), // optional, but nice for consistency
]).parse("two");

try(string("this")).or(string("that")).parse("that");
```
I'm not sure which behavior is desirable / whether it's the documentation or the code that should be updated. In either case, I would be happy to take a stab at throwing together some pull requests here. Combine has been saving me a ton of hassle.
https://crates.io/crates/parser-combinators links to https://marwes.github.io/parser-combinators/parser_combinators/index.html which doesn't exist.
Having encountered a few problems when using `combine` to implement a new parser for embed_lang, I think it's about time to start thinking about what combine 2.0 would look like, as the changes I would like to make will break the current API. I still plan to merge the [buffered_stream] and [range_stream] features into 1.0, and 2.0 should not be expected to be widely usable for quite some time even after the features below are implemented.

* Move the `Input` associated type on `Parser` to an argument instead. As `Input` is an input type on the trait this will let parsers be a bit more flexible with not much downside.
* Have `Stream` types store the position internally instead of relying on the `State` wrapper. This should let streams be a bit more flexible when it comes to position handling.

If anyone has any other changes or ideas which would break the current API please comment and discuss them in this issue.
I know this issue is a little vague, so please bear with me. As I've started using `parser-combinators`, I've noticed that the compile times for my project have sharply increased; more than I would expect from the amount of code I've written. `cargo build` now takes 8 to 11 minutes to complete. This is on a fairly new machine, too - I have a MacBook Pro with a 2.5GHz quad-core i7, and I've generally seen very good performance from `cargo`/`rustc`.

I guess I'm just curious to know what's causing these very long compile times. Is this to be expected when using this library, am I misusing it in some way that's confusing `rustc`, or is something else wrong? Of course, it's likely difficult to determine exactly what's going on, but I'd welcome any additional information.
I am building a parser for a language of my imagination, and I am having problems with recursive parsers. Take a look at this gist: https://gist.github.com/Ralle/336941d1472a598d1121

I can parse ints and I can parse floats, but as soon as I make a pair using '(int, int)' it expects a float. I don't understand why this happens. I tried switching around the int_parser and float_parser, but then it would obviously not parse floats, as it successfully parses the int and then doesn't know what to do with the '.'.

Please advise.
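A guess at what is going on (a sketch only, since I have not run the gist): both alternatives start with digits, so the float branch fails after it has already consumed input and `or` never falls back to the int branch; wrapping the first alternative in `try` restores the backtracking. Assuming the combine 1.x-era API:

```rust
extern crate combine;
use combine::{char, digit, many1, try, Parser, ParserExt};

fn main() {
    // Both branches start with digits. Without `try`, the float branch fails *after*
    // consuming the leading digits, so `or` never reaches the int branch.
    let float = many1::<String, _>(digit())
        .skip(char('.'))
        .and(many1::<String, _>(digit()))
        .map(|(whole, frac)| format!("{}.{}", whole, frac));
    let int = many1::<String, _>(digit());
    let mut number = try(float).or(int);

    assert_eq!(number.parse("1.5"), Ok(("1.5".to_string(), "")));
    assert_eq!(number.parse("1"), Ok(("1".to_string(), "")));
}
```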