marwes / combine

A parser combinator library for Rust

Home Page: https://docs.rs/combine/*/combine/
License: MIT License
Could you provide a comparison with nom?
`count`, `none_of`, and `one_of` all have the same documentation description:

> Extract one token and succeeds if it is part of tokens.

which I assume is correct for `one_of` but not the other two. :) Additionally, the docs for `count` give an example of `none_of`:
```rust
let result = many(none_of("abc".chars()))
    .parse("edc");
assert_eq!(result, Ok((String::from("ed"), "c")));
```
The `impl`s of `Stream` for `&str` and `&[T]` now conflict with the one for `I: Iterator + Clone`. This is due to the changes in rust-lang/rust#23867, in particular the negative reasoning as demonstrated in the `Replacer` example. The problem in this case, iiuc, is that `&str` and `&[T]` could one day decide to implement `Iterator`.
Package names now have to use underscores rather than dashes. `IntoCow` is feature gated.
Should only need to add the `?Sized` bound for `P` at https://github.com/Marwes/parser-combinators/blob/master/src/primitives.rs#L230.
Blocked by rust-lang/rust#21379.
I needed to parse something that has a rather simple structure, but I needed one element to be a grapheme (the rest is mostly ASCII symbols with specific meanings). No parser library provides such a parser out of the box, and that is not surprising; the functionality to split a string into graphemes is only provided by the unicode-segmentation crate.
I had something already slapped together in nom, except I was not (and still am not) happy with it, because it should operate on `&str` but has to operate on `&[u8]`, since nom lacks `&str` variants of some important primitives. So I considered rewriting it in combine, but gave up, because I had trouble figuring out how to write the function that would work with `parser()` regarding:

1. How generic/specific it can be. The unicode-segmentation crate only works on `&str` (because of shortcomings of Rust iterators; another story) and I always have that to provide, but I didn't see documentation about what might or might not come in the inner parser. The example in the description of `parse()` appears to take just `&str` in the closure, but I didn't see a description.
2. How to properly construct the error. The `ParseError`/`Error`/`Info` construct is pretty complicated and would deserve some explanation, but I failed to run across any. And perhaps also some helpers to create the Expected and Unexpected primary errors from a stream state and (in the first case; Unexpected does not need one) a message easily.
It would be nice if there was an easy way to construct indentation aware parsers as in http://hackage.haskell.org/package/IndentParser-0.2.1 (Disclaimer: I haven't used this library seriously so I don't know how easy/well it works).
This would likely be in a separate crate such as https://github.com/Marwes/combine-language.
I can't understand how to define something simple, for example an `oneOf` equivalent. Can you help me out?
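For reference, a minimal sketch of one way this is often expressed with the existing primitives, assuming the combine 1.x-era API where `satisfy` is exported at the crate root (the library also has a `one_of` parser, per the documentation issue earlier in this section):

```rust
extern crate combine;
use combine::{satisfy, Parser};

fn main() {
    // Succeeds on any single character contained in the given set,
    // similar to parsec's oneOf.
    let mut one_of_abc = satisfy(|c| "abc".chars().any(|x| x == c));
    assert_eq!(one_of_abc.parse("b rest"), Ok(('b', " rest")));
}
```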
It would be nice to support reading from streams where the input is produced incrementally, to avoid needing to read the entire input into memory (such as from files). `Stream` types must be able to be cloned, which makes it simple to support arbitrary look-ahead, but it makes it impossible to use an iterator such as `::std::io::Chars`.

Without arbitrary look-ahead it would be trivial to support just LL(1) parsers by adding a peek function to `Stream`, but I don't think it's worth it given how useful the `try` parser can be (even if it is inefficient).
Not quite sure how to implement this efficiently yet though.
Waiting for rust-lang/rust#21705
Chomp is another parser combinator library for Rust. It aims to be as fast as or faster than parsers hand-written in C.
How does Chomp compare to Combine?
This issue was automatically generated. Feel free to close without ceremony if
you do not agree with re-licensing or if it is not possible for other reasons.
Respond to @cmr with any questions or concerns, or pop over to
#rust-offtopic
on IRC to discuss.
You're receiving this because someone (perhaps the project maintainer)
published a crates.io package with the license as "MIT" xor "Apache-2.0" and
the repository field pointing here.
TL;DR the Rust ecosystem is largely Apache-2.0. Being available under that
license is good for interoperation. The MIT license as an add-on can be nice
for GPLv2 projects to use your code.
The MIT license requires reproducing countless copies of the same copyright
header with different names in the copyright field, for every MIT library in
use. The Apache license does not have this drawback. However, this is not the
primary motivation for me creating these issues. The Apache license also has
protections from patent trolls and an explicit contribution licensing clause.
However, the Apache license is incompatible with GPLv2. This is why Rust is
dual-licensed as MIT/Apache (the "primary" license being Apache, MIT only for
GPLv2 compat), and doing so would be wise for this project. This also makes
this crate suitable for inclusion and unrestricted sharing in the Rust
standard distribution and other projects using dual MIT/Apache, such as my
personal ulterior motive, the Robigalia project.
Some ask, "Does this really apply to binary redistributions? Does MIT really
require reproducing the whole thing?" I'm not a lawyer, and I can't give legal
advice, but some Google Android apps include open source attributions using
this interpretation. Others also agree with
it.
But, again, the copyright notice redistribution is not the primary motivation
for the dual-licensing. It's stronger protections to licensees and better
interoperation with the wider Rust ecosystem.
To do this, get explicit approval from each contributor of copyrightable work
(as not all contributions qualify for copyright, due to not being a "creative
work", e.g. a typo fix) and then add the following to your README:
## License
Licensed under either of
* Apache License, Version 2.0, ([LICENSE-APACHE](LICENSE-APACHE) or http://www.apache.org/licenses/LICENSE-2.0)
* MIT license ([LICENSE-MIT](LICENSE-MIT) or http://opensource.org/licenses/MIT)
at your option.
### Contribution
Unless you explicitly state otherwise, any contribution intentionally submitted
for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any
additional terms or conditions.
and in your license headers, if you have them, use the following boilerplate
(based on that used in Rust):
// Copyright 2016 combine Developers
//
// Licensed under the Apache License, Version 2.0, <LICENSE-APACHE or
// http://apache.org/licenses/LICENSE-2.0> or the MIT license <LICENSE-MIT or
// http://opensource.org/licenses/MIT>, at your option. This file may not be
// copied, modified, or distributed except according to those terms.
It's commonly asked whether license headers are required. I'm not comfortable
making an official recommendation either way, but the Apache license
recommends it in their appendix on how to use the license.
Be sure to add the relevant LICENSE-{MIT,APACHE} files. You can copy these from the Rust repo for a plain-text version.
And don't forget to update the license metadata in your `Cargo.toml` to:

```toml
license = "MIT OR Apache-2.0"
```
I'll be going through projects which agree to be relicensed and have approval by the necessary contributors and doing these changes, so feel free to leave the heavy lifting to me!

To agree to relicensing, comment with:
I license past and future contributions under the dual MIT/Apache-2.0 license, allowing licensees to chose either at their option.
Or, if you're a contributor, you can check the box in this repo next to your
name. My scripts will pick this exact phrase up and check your checkbox, but
I'll come through and manually review this issue later as well.
I wonder if you use or could use something like https://github.com/shepmaster/jetscii for some of the matching primitives?
So on attempting to fix a couple of other parser combinators' behaviours on error, I've come to realise that `parse_lazy` doesn't have the right semantics to correctly support what we want. Let's look at a motivating example.
```rust
let mut emptyok_then_emptyerr = (value(()).message("not expected"), char('x'));
assert_eq!(emptyok_then_emptyerr.parse(State::new("hi")),
           Err(ParseError {
               position: SourcePosition { line: 1, column: 1 },
               errors: vec![Error::Unexpected('h'.into()),
                            Error::Expected('x'.into())],
           }));
// assert fails: actual error contains the message "not expected"
```
Here we sequentially chain two parsers, the first of which returns `EmptyOk` and the second returns `EmptyErr`. Now this combined tuple parser will return `EmptyErr`, meaning "no input was consumed, and one of my child parsers failed". However, due to the semantics of `parse_lazy`, this will also imply "the first of my child parsers failed". Hence, when `add_error` is called, the message "not expected" will be added, which is clearly wrong.

What's the fix? `parse_lazy` should not conflate "no input was consumed" with "the first parser failed". So it seems like we have two options:
1. Make the `parse_lazy` semantics tighter. This will probably entail returning a separate flag, "was actually lazy", which becomes the thing that specifies whether `add_error` should be called or not. (One thing I'm not sure about in this scenario is whether the Consumed flag is still useful or not. Is it currently important anywhere other than for determining laziness?)
2. Keep the semantics of `parse_lazy` but rewrite some current parsers so they are not lazy. This loses us some amount of speed. Notably the `tuple` and `then` combinators will have to become non-lazy.
It doesn't seem possible to use `choice` with different types. Consider an assignment of a variable: `a = b`. I would express this with `(env.identifier(), env.symbol("="), choice([env.integer(), env.identifier()]))`, but the compiler complains that it expects an `i64` instead of a `String`.

How can I express this without writing two separate parsers?
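One common workaround (a sketch under my own assumptions, not a statement about how the combine-language `env.*` parsers are meant to be used) is to map every alternative into a shared enum first, so that all branches have the same `Output` type; shown here with plain character parsers and the combine 1.x-style API, and a hypothetical `Rhs` enum:

```rust
extern crate combine;
use combine::{digit, letter, many1, Parser, ParserExt};

// Hypothetical enum unifying the two possible right-hand sides.
#[derive(Debug, PartialEq)]
enum Rhs {
    Int(i64),
    Ident(String),
}

fn main() {
    // Map each branch to the shared type before combining them with `or`/`choice`.
    let int = many1::<String, _>(digit()).map(|s: String| Rhs::Int(s.parse().unwrap()));
    let ident = many1::<String, _>(letter()).map(Rhs::Ident);
    let mut rhs = int.or(ident);

    assert_eq!(rhs.parse("42"), Ok((Rhs::Int(42), "")));
    assert_eq!(rhs.parse("foo"), Ok((Rhs::Ident("foo".into()), "")));
}
```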
What's the right way to parse input from stdin? I'm hitting lifetime issues when reading into a `String` and passing that to `parse`.
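For what it's worth, a minimal sketch of the pattern that usually avoids the lifetime error: keep the `String` buffer in a binding that outlives the parse result, since the result (and its errors) may borrow from the input. This assumes the combine 1.x-style API with `digit`/`many1` at the crate root; the specific parser is only for illustration.

```rust
extern crate combine;
use combine::{digit, many1, Parser};
use std::io::Read;

fn main() {
    // The buffer must live at least as long as anything borrowed from the parse result.
    let mut buffer = String::new();
    std::io::stdin().read_to_string(&mut buffer).unwrap();

    let mut number = many1::<String, _>(digit());
    match number.parse(&buffer[..]) {
        Ok((value, rest)) => println!("parsed {:?}, leftover {:?}", value, rest),
        Err(err) => println!("parse error: {}", err),
    }
}
```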
Bug found by @avwhite with a minimal reproducing example.
Should be able to fix this later today.
EDIT: The example is actually wrong since you should expect it to return Consumed due to spaces consuming one char. Updated with the correct example (which needs to manually create a state or this won't work).
extern crate "parser-combinators" as parser;
use parser::{many, any_char, Parser};
use parser::primitives::{State, Consumed, SourcePosition};
fn main() {
let state = State { position: SourcePosition { line: 1, column: 1 }, input: "", consumed: Consumed::Consumed };
let result = many(any_char as fn (_) -> _)
.parse_state(state);
assert!(result.is_ok());
assert_eq!(result.unwrap().1.consumed, Consumed::Empty);
}
```rust
stream_parser((skip_many(digit()), token('.'), skip_many(digit()))).parse("123.456+-")
// The output of the parser is a string: "123.456"
```
https://gitter.im/Marwes/combine?at=58a0813cf045df0a223a0f0c
There should probably be two variants of this parser: one which accumulates the tokens of these parsers using `FromIterator` (similar to `many`) and one which is zero-copy (similar to `take_while`).
I'd like to understand some of the API decisions the library makes. Like in #74, my motivating use case is to parse a token stream (not a character stream), and my tokens do not implement `Copy` because they include `String` data, like:

```rust
enum Token {
    Keyword(String),
    ...
}
```
While implementing `RangeStream` for a newtype wrapping `&'a [Token]` (like `SliceStream` but with item type `Token` rather than item type `&'a Token`), I've run into the following question. Why does `uncons_while` take an `FnMut(Item)` rather than `FnMut(&Item)` at https://github.com/Marwes/combine/blob/master/src/primitives.rs#L562?

In general, the Rust-y pattern seems to be that predicate filters take references to the elements in their containing collection -- but I'm not an experienced Rust user so I'm probably missing something. Any guidance appreciated.
Does your crate support Unicode character classes and normalization?
Below is a list of (hopefully) all the parsers which exist in parsec but not in this library. I added some comments on a few of them about their usefulness. If there is a parser you would argue for its inclusion (or exclusion) please leave a comment.
```haskell
option :: Stream s m t => a -> ParsecT s u m a -> ParsecT s u m a
optionMaybe :: Stream s m t => ParsecT s u m a -> ParsecT s u m (Maybe a)
optional :: Stream s m t => ParsecT s u m a -> ParsecT s u m ()
```
The `optionMaybe` parser is called `optional` in this lib, which can cover all cases.
```haskell
count :: Stream s m t => Int -> ParsecT s u m a -> ParsecT s u m [a]
endBy :: Stream s m t => ParsecT s u m a -> ParsecT s u m sep -> ParsecT s u m [a]
endBy1 :: Stream s m t => ParsecT s u m a -> ParsecT s u m sep -> ParsecT s u m [a]
chainl :: Stream s m t => ParsecT s u m a -> ParsecT s u m (a -> a -> a) -> a -> ParsecT s u m a
chainr :: Stream s m t => ParsecT s u m a -> ParsecT s u m (a -> a -> a) -> a -> ParsecT s u m a
manyTill :: Stream s m t => ParsecT s u m a -> ParsecT s u m end -> ParsecT s u m [a]
lookAhead :: Stream s m t => ParsecT s u m a -> ParsecT s u m a
oneOf :: Stream s m Char => [Char] -> ParsecT s u m Char
noneOf :: Stream s m Char => [Char] -> ParsecT s u m Char
endOfLine :: Stream s m Char => ParsecT s u m Char
```
`satisfy` covers all of these cases. These parsers could give better error reporting though, which is an argument for their inclusion.
```haskell
newline :: Stream s m Char => ParsecT s u m Char
crlf :: Stream s m Char => ParsecT s u m Char
tab :: Stream s m Char => ParsecT s u m Char
upper :: Stream s m Char => ParsecT s u m Char
lower :: Stream s m Char => ParsecT s u m Char
alphaNum :: Stream s m Char => ParsecT s u m Char
letter :: Stream s m Char => ParsecT s u m Char
digit :: Stream s m Char => ParsecT s u m Char
hexDigit :: Stream s m Char => ParsecT s u m Char
octDigit :: Stream s m Char => ParsecT s u m Char
char :: Stream s m Char => Char -> ParsecT s u m Char
chainr1 :: Stream s m t => ParsecT s u m a -> ParsecT s u m (a -> a -> a) -> ParsecT s u m a
sepEndBy :: Stream s m t => ParsecT s u m a -> ParsecT s u m sep -> ParsecT s u m [a]
sepEndBy1 :: Stream s m t => ParsecT s u m a -> ParsecT s u m sep -> ParsecT s u m [a]
skipMany :: ParsecT s u m a -> ParsecT s u m ()
skipMany1 :: Stream s m t => ParsecT s u m a -> ParsecT s u m ()
```
Equivalent to `many(parser.map(|_| ()))` since the `many` parser should not allocate the vector for zero-sized values. Added since the above example will not compile without type annotations any longer and it is not obvious that it does not allocate any memory.
```haskell
choice :: Stream s m t => [ParsecT s u m a] -> ParsecT s u m a
```
Added as `choice_slice` and `choice_vec`; might be generalized further later on.
```haskell
anyToken :: (Stream s m t, Show t) => ParsecT s u m t
```

Added as the `any` parser.
```haskell
eof :: (Stream s m t, Show t) => ParsecT s u m ()
```

Added as the `eof` parser.
```
/home/travis/.cargo/registry/src/github.com-1ecc6299db9ec823/parser-combinators-0.2.4/src/combinator.rs:156:36: 156:37 error: unable to infer enough type information about `_`; type annotations required [E0282]
/home/travis/.cargo/registry/src/github.com-1ecc6299db9ec823/parser-combinators-0.2.4/src/combinator.rs:156     NotFollowedBy(try(parser).then(f as fn (_) -> _)
                                                                                                                                                ^
error: aborting due to previous error
Build failed, waiting for other jobs to finish...
```
See here for the whole build. I don't think this one's my fault.
It is more standard, and it allows them to be run easily.
First, thanks for the library. It's a fun way to learn Rust!
I'd like to understand some of the API decisions the library makes. First, my motivating use case is to parse a token stream (not a character stream), and my tokens do not implement `Copy` because they include `String` data, like:

```rust
enum Token {
    Keyword(String),
    ...
}
```
If I use `SliceStream`, I can parse `&[Token]`. However, I can't really use `token`, because the `Item` type of `SliceStream` is `&'a Token` and that's not something I can create with the correct lifetime. It seems to me like `token` should have `where Item: Clone` and explicitly `.clone()` its argument. I see at https://github.com/Marwes/combine/blob/master/src/combinator.rs#L200 that we can't actually parse without `Item: Clone`.

Is this deliberate? An oversight? Am I using `SliceStream` incorrectly?
I am using my own type `Token` as the item for my parser, with a type `AST` as the output. `Token` has a `TokenPositioner` which is an exact copy of `BytePositioner`. I have a top-level parser with the signature

```rust
fn command<I>(input: State<I>) -> ParseResult<AST, I, Token> where I: Stream<Item=Token>
```

When I try to use this as a parser on a `Vec` named `tokens` like so:

```rust
parser(command).parse(from_iter(tokens.iter()))
```
It prints the error:

```
src/lexer.rs:127:21: 127:52 error: type mismatch resolving `<parser_combinators::primitives::IteratorStream<core::slice::Iter<'_, tokenizer::Token>> as parser_combinators::primitives::Stream>::Item == tokenizer::Token`:
 expected &-ptr,
    found enum `tokenizer::Token` [E0271]
src/lexer.rs:127     parser(command).parse(from_iter(tokens.iter()))
                                           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/lexer.rs:127:21: 127:52 help: run `rustc --explain E0271` to see a detailed explanation
```
Any advice you can offer would be helpful. Thanks.
I have a 400-line nom parser that I'm considering converting to combine (mostly for the better error messages). How would you organize it? I like being able to cleanly separate each small parser. How can I do that with combine?
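One pattern that shows up in the other issues here (see the `fn ... -> ParseResult<_, I>` signatures elsewhere in this section) is to write each small parser as a free function and compose them through the `parser()` adapter. A minimal sketch of that style, assuming the combine 1.x-era API (`char`, `digit`, `many1`, `parser` at the crate root plus `ParserExt`); the `number`/`pair` names are my own:

```rust
extern crate combine;
use combine::{char, digit, many1, parser, Parser, ParserExt};
use combine::primitives::{ParseResult, State, Stream};

// A small, named parser: an unsigned integer.
fn number<I>(input: State<I>) -> ParseResult<i64, I>
    where I: Stream<Item = char>
{
    many1::<String, _>(digit())
        .map(|s| s.parse::<i64>().unwrap())
        .parse_state(input)
}

// A larger parser built from the smaller one: "(12,34)".
fn pair<I>(input: State<I>) -> ParseResult<(i64, i64), I>
    where I: Stream<Item = char>
{
    char('(')
        .with(parser(number::<I>))
        .skip(char(','))
        .and(parser(number::<I>))
        .skip(char(')'))
        .parse_state(input)
}

fn main() {
    assert_eq!(parser(pair).parse("(12,34)"), Ok(((12, 34), "")));
}
```

Each piece stays individually testable, and the top-level parser reads as a list of named sub-parsers.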
Since all parsers are currently functions or methods, I find that large parsers often become a bit of a word soup; long chains of parsers can become rather hard to read. I have some ideas for what might be useful to implement, which I document below.

Tuples could allow parsers which should be applied in sequence to be written as:

```rust
string("if").with(expr()).skip(string("then")).and(expr()).skip(string("else")).and(expr())
    .map(|(b, (t, f))| Expr::IfElse(Box::new(b), Box::new(t), Some(Box::new(f))))

// With tuples
(string("if"), expr(), string("then"), expr(), string("else"), expr())
    .map(|(_, b, _, t, _, f)| Expr::IfElse(Box::new(b), Box::new(t), Some(Box::new(f))))
```
Strings and character literals could implement `Parser` directly, allowing them to be written without `string` and `char`:

```rust
("if", expr(), "then", expr(), "else", expr())
    .map(|(_, b, _, t, _, f)| Expr::IfElse(Box::new(b), Box::new(t), Some(Box::new(f))))
```
The most likely candidate here is overloading `|` to work the same as the `or` parser. Unfortunately this won't work directly without changing the library rather radically, since it is not possible to implement it as below.
```rust
impl <A: Parser, B: Parser> BitOr<B> for A {
    type Output = Or<A, B>;
    fn bitor(self, other: B) -> Or<A, B> { self.or(other) }
}
```
This will not work for `A` and `B` due to coherence (`A` and `B` must appear inside some local type). The same applies to any other operator.
Since all of these could be seen as being too clever with the syntax, it would be nice to have some feedback on which of these (if any) may be good to implement.
Hi @Marwes, you've been so helpful understanding `combine` that I wonder if you could save me some more experimentation. Can you explain why `PhantomType` and `PhantomData` are present in the codebase? I'm aware this works around some limitations in the Rust type checker, but it would help to spell out the motivating case for `combine`. If you can explain or link to the relevant changesets, I will try to add a note to the README. Thanks!
I need to parse any of a set of characters, then return different values depending on that character. To do this, I use `satisfy` and then `map`. Simplified example:
```rust
satisfy(|c| c == 'x' || c == 'y' || c == 'z')
    .map(|c| match c { 'x' => 42, 'y' => 17, 'z' => 0 })
```
Is there an existing way to eliminate the redundancy in my code, while ensuring the parser only consumes known characters? If not, the API I'm looking for is something like:

```rust
satisfy_mapped(|c| match c { 'x' => Some(42), 'y' => Some(17), 'z' => Some(0), _ => None })
```
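In the meantime, one way to avoid writing the character set twice (a sketch using only `satisfy` and `map`, not a new library API; `lookup` is a hypothetical helper) is to drive both the predicate and the mapping from a single lookup function:

```rust
extern crate combine;
use combine::{satisfy, Parser, ParserExt};

// Hypothetical helper: the single source of truth for the accepted characters.
fn lookup(c: char) -> Option<i32> {
    match c {
        'x' => Some(42),
        'y' => Some(17),
        'z' => Some(0),
        _ => None,
    }
}

fn main() {
    // The parser only consumes characters the lookup knows about,
    // and the mapping reuses the same table.
    let mut p = satisfy(|c| lookup(c).is_some()).map(|c| lookup(c).unwrap());
    assert_eq!(p.parse("xyz"), Ok((42, "yz")));
}
```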
Here's a feature request for something I think would be useful.
The `map` function can be used to apply a function to the successful result of a parser. I propose adding a function which applies a function returning `Result` to the successful result of a parser. If that function returns an error, the error value would be carried over as the `ParseResult`'s error value; otherwise, it carries over the new success value.

I'm not quite sure what the idiomatic Rust name for this would be. In Scala, which I'm more familiar with, this is referred to as `flatMap`, while Rust's `Result` type calls it `and_then`. Whatever it's called, I think this would be a pretty useful feature.
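To illustrate the motivation with a small sketch of my own (assuming the combine 1.x-style API): with only `map`, a fallible conversion has to panic, whereas the proposed combinator would let the `Err` surface as the parser's own error value.

```rust
extern crate combine;
use combine::{digit, many1, Parser, ParserExt};

fn main() {
    // Today: an out-of-range number can only panic here (or the parser's Output
    // has to become a Result that the caller unpacks later).
    let mut number = many1::<String, _>(digit()).map(|s| s.parse::<i32>().unwrap());
    assert_eq!(number.parse("128"), Ok((128, "")));
    // With the proposed combinator, a failing `s.parse::<i32>()` would instead be
    // reported as a parse error at the current position.
}
```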
I'm having trouble implementing higher order functions that create parsers. My latest attempt looks like this:
```rust
pub fn keyword(kw: &'static str) -> Box<Fn(&str) -> ParseResult<(), &str>>
{
    use combine::{ParserExt, string, value};
    assert!(KEYWORDS.contains(&kw));
    Box::new(|input| {
        string(kw)
            .with(value(()))
            .parse_state(input)
    })
}
```
Later I want to compose `keyword` in different ways. Here's an attempted use in a unit test:

```rust
parser(keyword("let")).skip(eof()).parse(test_case)
```
Variations on the types I've tried include:

* the `Box<…>` from the return value of `keyword` - `Sized` is not satisfied.
* `keyword("let")` - `Sized` is not satisfied.

How can I make this work?
Note: if there's a better way to implement `keyword` itself, that's of secondary interest. My primary interest is to learn how to create higher-order parser generators.
Hi @Marwes, first of all thanks for the library! Also I apologize if this is not the appropriate place to ask for help.

I'm trying to parse an amount of chars based on a parsed length. For instance, the parser I'm trying to build should return `announce` for an input like `8:announce`, where `8` is the length of the string after `:`. I tried a couple of things, like parsing the length and then folding as many `letter()`s as the size to combine several steps into one parser. I'm probably missing something obvious since I'm not an expert in Rust, but I would appreciate any input from your side.
```rust
fn string<I>(input: State<I>) -> ParseResult<BValue, I> where I: Stream<Item=char>
{
    let length = many1(digit());
    length
        .map(|string: str::String| {
            string.parse::<i32>().unwrap()
        })
        .skip(char(':'))
        .then(|decimal| {
            (0..decimal).fold(letter().map(|l| l), move |p, _| {
                p.and(letter()).map(|(l1, l2)| l1 + l2)
            })
        })
        .parse_state(input)
}
```
Code above doesn't compile and probably doesn't make sense. However, I hope it will give you the picture of what I'm trying to accomplish.
Thanks!
I'm quite new to Rust so excuse me if this makes no sense, but would it be possible to make all the combined parsers (e.g. `Choice`) implement the `Clone` trait?
Over in mozilla/mentat#164 I'm trying to use `combine` to parse token streams. The tokens are equivalent to Serde's `json::Value` type. My `Stream` is therefore `&'a [Value]` where `Value` is third-party.

Sadly, this means that I can't implement `Positioner` on `Value`, since the trait originates in a `combine` module (crate?) and the type originates in a `serde` module (crate?), and Rust doesn't allow trait implementations if you don't originate at least one of the trait or the type.
In #rust, two solutions were posed:

1. Define the implementation in the `Value` crate. We can do that since we originate the `Value` type, but we want to ship that crate as an independent library and it makes no sense to require `combine` for that purpose. Perhaps there is some way to opt in using Rust features?
2. Wrap `Value` in a newtype and implement `Positioner` on the newtype. We can do this, but our types and parser are already pretty damn verbose and unwrapping yet another layer of indirection is a frustration.
Can you suggest another solution? Can you suggest a way to associate a `Positioner` that would do better things for `&'a [T]`, arbitrary `T`? (Note that we want a real `Positioner` that uses information from the implementing `Value` type because eventually we're going to keep parsed ranges in the `Value` type.)

Thanks!
Using the last example from https://docs.rs/combine/2.0.0/combine/,
```rust
fn main() {
    if let Err(e) = parser(expr).parse("[") {
        println!("{}", e);
    }
}
```
outputs

```
Parse error at 94334017221597
Unexpected end of input
Expected ]
```

instead of the beautiful `line: 1, column: 1` I was promised XD
Over in https://github.com/mozilla/mentat, we're using `combine` to parse `&'a [edn::Value]` slices. Here, `edn::Value` is a generic value type conceptually similar to JSON. What is happening here is not specific to `edn::Value`, so I'll use `T`: we're using `combine` to parse `&'a [T]` streams.

The error type of such a `Parser` is `combine::primitives::ParseError<&'a [T]>`. Now, around line 449 in 6f2cec6, `ParseError` implements `std::error::Error` only when the underlying `Stream::Range` is `std::fmt::Display`. However, no slice `&'a [T]` can ever satisfy this trait bound due to rust-lang/rust#39328.
I can think of the following ways to work around this issue:

* Implement `std::error::Error` for `&'a [edn::Value]`.
* Wrap `&'a [T]` streams in a newtype implementing `Stream`, and wrap the `Stream::Range` type as well.
* Extend the `Range` trait specifically for wrapping or otherwise helping format `Range` instances. Existing `Range` implementations would have to add the new method, even if they didn't care about this issue (which they can't have been using without significant effort).
* Add a `RangeDisplay` trait in `combine`, define it for the `Range` types in `combine`, and expect that in the `std::error::Error` implementation. This restricts `Range` implementations to a single display format. I think this is okay, though -- it's already the case that `&str` ranges have a single display format.

@Marwes, have you thought about this problem? Is there an alternate approach you can suggest? Would you accept a patch for one of the final two proposed solutions?
Currently the `Error` enum contains `String` as the type holding the message for each error. This is really easy to work with since, regardless of what we want to say in the message, we can always format it into a string. It is not very efficient, however, when our error messages are static, as they often are. Currently I am leaning towards replacing the `String` type with `Cow<'static, String, str>` inside the `Error` enum, which should at least avoid the allocation for errors with a static string as a message.
Other approaches could be something like allowing functions or boxed closure as "error messages" (calling the function would produce the message) but maybe this is overkill.
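A small illustration of the intended saving (using today's two-parameter `Cow` form for clarity; the text above refers to the older three-parameter spelling): a static message is simply borrowed, while a formatted one still allocates.

```rust
use std::borrow::Cow;

fn main() {
    // No allocation: the &'static str is borrowed directly.
    let static_msg: Cow<'static, str> = Cow::Borrowed("unexpected end of input");
    // An allocation happens only when the message really is dynamic.
    let dynamic_msg: Cow<'static, str> = Cow::Owned(format!("unexpected token {:?}", 'x'));
    println!("{} / {}", static_msg, dynamic_msg);
}
```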
Not sure if this is a bug, but it seems like it should work. This code fails to compile:
```rust
extern crate combine;
use combine::{digit, letter, sep_by, token, Parser, ParserExt};

fn main() {
    let p = sep_by::<Vec<_>, _, _>(digit(), token(','));
    let result = letter().or(p).parse("1,2,3");
}
```
With this error:

```
type mismatch resolving `<&str as combine::primitives::Stream>::Item == collections::vec::Vec<char>`:
 expected char,
    found struct `collections::vec::Vec` [E0271]
src/main.rs:6     let result = letter().or(p).parse("1,2,3");
```
However, this works just fine:

```rust
extern crate combine;
use combine::{digit, letter, token, Parser, ParserExt};

fn main() {
    let result = letter().or(digit()).parse("1,2,3");
}
```
Do you see any way a user could implement first and follow sets using combine? I'm not sure if it's possible to add items into the stream while in the middle of parsing that same stream, but it seems impossible.
I saw this mentioned in #54, but I'm not sure exactly how to get this to work. I'm a bit new to rust, so that's also probably related.
I'm trying to write something that can parse an escaped string, e.g. `"Hello \n \"World\""`, and for this one of my helpers is this:

```rust
let esc_char = char('\\').with(choice([
    char('\\').with(value('\\')),
    char('"').with(value('"')),
    char('0').with(value('\0')),
    char('n').with(value('\n')),
    char('t').with(value('\t')),
    char('r').with(value('\r')),
]));
```
You'll notice that the first 2 `with` method calls are completely useless functionally, but they're the only thing making this expression type-check. I've been trying with

```rust
let esc_char = char('\\').with(choice::<&mut [&mut Parser<Input = &str, Output = char>; 6], _>(&mut [
    char('\\').with(value('\\')),
    char('"').with(value('"')),
```
but I get these errors:

```
src/main.rs:36:9: 36:37 error: mismatched types:
 expected `_`,
    found `combine::combinator::With<combine::combinator::Token<_>, combine::combinator::Value<_, char>>`
(expected &-ptr,
    found struct `combine::combinator::With`) [E0308]
src/main.rs:36         char('\\').with(value('\\')),
                       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/main.rs:36:9: 36:37 help: run `rustc --explain E0308` to see a detailed explanation
src/main.rs:34:100: 42:6 error: mismatched types:
 expected `&mut [&mut combine::Parser<Input=&str, Output=char>; 6]`,
    found `&mut [combine::combinator::With<combine::combinator::Token<_>, combine::combinator::Value<_, char>>; 6]`
(expected &-ptr,
    found struct `combine::combinator::With`) [E0308]
src/main.rs:34     let esc_char = char('\\').with(choice::<&mut [&mut Parser<Input = &str, Output = char>; 6], _>(&mut [
src/main.rs:35         char('\\').with(value('\\')),
src/main.rs:36         char('"').with(value('"')),
```
Do you have any idea what I'm doing wrong?
Is there an uncomplicated way to achieve a parser that does `sep_by(any(), char(','))`?

I currently have something like `sep_by(parser(other), char(','))` where `other` is a separate parser that just matches `any()`, but that blows right through every instance of `,` due to the `any()` parser.

What I want is something that works the same as `sep_by`, but where parsing the separator takes precedence over parsing the separated values. I don't want the `other` parser to need to be aware of the context in which it is used.
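For reference, the usual workaround (which does make the element parser aware of the separator, i.e. exactly what the question hopes to avoid) is to stop the element parser at the separator instead of using `any()`. A sketch against the combine 1.x-style API:

```rust
extern crate combine;
use combine::{char, many1, satisfy, sep_by, Parser};

fn main() {
    // Each element is "one or more characters that are not the separator".
    let element = many1::<String, _>(satisfy(|c| c != ','));
    let mut p = sep_by::<Vec<_>, _, _>(element, char(','));

    assert_eq!(p.parse("a,bb,c"),
               Ok((vec!["a".to_string(), "bb".to_string(), "c".to_string()], "")));
}
```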
Commit 1e500c6 changed the bounds on the implementation of `std::error::Error` for `ParseError<P>`. It now requires, among others, `P::Position: Positioner`, which I think is unnecessarily restrictive and in fact not the case for existing instances of `P`, e.g. `char` and `u8`.
Currently nothing in the library has any stability attributes set, which will be a problem once stable Rust starts to roll out. For the high-level API (thinking about function names and what their purpose is) I feel that it is mostly stable enough that I could commit to the design for a 1.0, so to speak. The lower-level details are more prone to change, however.

* Passing the input by `&mut` instead, i.e. something like `fn parse(&mut self, input: &mut State<I>) -> Result<Output, Consumed<Error>>`. Currently this alternate design is both slower and less intuitive, but when attempting to optimize the error creation I needed some additional state struct `State` which evidently made the struct large enough that passing by reference would be beneficial. Since I only achieved around a 40% speedup I am not convinced that the added complexity makes it worth it.

If nothing changes between now and the release of Rust 1.0 that makes me able to implement these changes, I will likely stabilize what is here now and probably integrate these changes into a potential 2.0 version instead.
I just had a heck of a time figuring out why the following examples both yield errors:
```rust
choice([
    string("one"),
    string("two"),
    string("three"),
]).parse("two");

string("this").or(string("that")).parse("that");
```
From what I can tell, for either of these examples to work properly, they need to be written as follows:
```rust
choice([
    try(string("one")),
    try(string("two")),
    try(string("three")), // optional, but nice for consistency
]).parse("two");

try(string("this")).or(string("that")).parse("that");
```
I'm not sure which behavior is desirable / whether it's the documentation or the code that should be updated. In either case, I would be happy to take a stab at throwing together some pull requests here. Combine has been saving me a ton of hassle.
https://crates.io/crates/parser-combinators links to https://marwes.github.io/parser-combinators/parser_combinators/index.html which doesn't exist.
Having encountered a few problems when using `combine` to implement a new parser for embed_lang, I think it's about time to start thinking about what combine 2.0 would look like, as the changes I would like to make will break the current API. I still plan to merge the [buffered_stream] and [range_stream] features into 1.0, and 2.0 should not be expected to be widely usable for quite some time even after the features below are implemented.

* Move the `Input` associated type on `Parser` to an argument instead. As `Input` is an input type on the trait this will let parsers be a bit more flexible with not much downside.
* Have `Stream` types store the position internally instead of relying on the `State` wrapper. This should let streams be a bit more flexible when it comes to position handling.

If anyone has any other changes or ideas which would break the current API please comment and discuss them in this issue.
I know this issue is a little vague, so please bear with me. As I've started using `parser-combinators`, I've noticed that the compile times for my project have sharply increased; more than I would expect from the amount of code I've written. `cargo build` now takes 8 to 11 minutes to complete. This is on a fairly new machine, too - I have a MacBook Pro with a 2.5GHz quad-core i7, and I've generally seen very good performance from `cargo`/`rustc`.

I guess I'm just curious to know what's causing these very long compile times. Is this to be expected when using this library, am I misusing it in some way that's confusing `rustc`, or is something else wrong? Of course, it's likely difficult to determine exactly what's going on, but I'd welcome any additional information.
I am building a parser for a language of my imagination, and I am having problems with recursive parsers. Take a look at this gist: https://gist.github.com/Ralle/336941d1472a598d1121

I can parse ints and I can parse floats, but as soon as I make a pair using '(int, int)' it expects a float. I don't understand why this happens. I tried switching around the int_parser and float_parser, but then it would obviously not parse floats, as it successfully parses the int and then doesn't know what to do with the '.'.

Please advise.
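A guess at what is going on (a sketch only, since I have not run the gist): both alternatives start with digits, so the float branch fails after it has already consumed input and `or` never falls back to the int branch; wrapping the first alternative in `try` restores the backtracking. Assuming the combine 1.x-era API:

```rust
extern crate combine;
use combine::{char, digit, many1, try, Parser, ParserExt};

fn main() {
    // Both branches start with digits. Without `try`, the float branch fails *after*
    // consuming the leading digits, so `or` never reaches the int branch.
    let float = many1::<String, _>(digit())
        .skip(char('.'))
        .and(many1::<String, _>(digit()))
        .map(|(whole, frac)| format!("{}.{}", whole, frac));
    let int = many1::<String, _>(digit());
    let mut number = try(float).or(int);

    assert_eq!(number.parse("1.5"), Ok(("1.5".to_string(), "")));
    assert_eq!(number.parse("1"), Ok(("1".to_string(), "")));
}
```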