Comments (9)
Folks use the m in uncons to allow "streams" that aren't fully in memory, e.g. streaming from disk or pipes.
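To make the point about the m concrete, here is a minimal sketch. It redefines a class with the same shape as parsec's Stream so that it stands alone; HandleStream and firstTwo are hypothetical names made up for this example, and a handle-backed stream like this one does not support backtracking, because reading mutates the handle.

```haskell
{-# LANGUAGE FlexibleInstances      #-}
{-# LANGUAGE FunctionalDependencies #-}
{-# LANGUAGE MultiParamTypeClasses  #-}

import Data.Functor.Identity (runIdentity)
import System.IO (Handle, hGetChar, hIsEOF)

-- Same shape as parsec's Stream class, redefined here so the sketch is
-- self-contained.
class Monad m => Stream s m t | s -> t where
  uncons :: s -> m (Maybe (t, s))

-- In-memory input: works in any monad, no effects needed.
instance Monad m => Stream [t] m t where
  uncons []       = pure Nothing
  uncons (x : xs) = pure (Just (x, xs))

-- Hypothetical wrapper around a file handle or pipe. Fetching the next
-- token is an IO action, which is why uncons returns its result in m.
newtype HandleStream = HandleStream Handle

instance Stream HandleStream IO Char where
  uncons (HandleStream h) = do
    eof <- hIsEOF h
    if eof
      then pure Nothing
      else do
        c <- hGetChar h
        pure (Just (c, HandleStream h))

-- Pure use of the list instance, run in Identity.
firstTwo :: Maybe (Char, String)
firstTwo = runIdentity (uncons "ab")
```

The list instance is polymorphic in m, so pure input never pays for IO, while the handle instance only exists for IO.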
from parsec.
Thanks @ekmett, I have added the m to the PureScript PR.
What do you think about moving Position -> Position into the Stream class?
from parsec.
Also, in mtl we have class Monad m => MonadState s m | m -> s. Should Stream also have the m -> s dependency?
from parsec.
Lots of people use different states that happen to run in the IO monad.
from parsec.
I'm not sure about the Position -> Position thing. To me it seems to entangle a couple of concerns, and it complicates uncons. It seems you'd really want
Position -> f -> Maybe { head :: c, tail :: f, newPos :: Position }
which would allow for different parsing based on position, and avoids capturing a very expensive closure. An example where that might matter is if your lexer prepass did something like C preprocessing, where it might need to know whether it is at the start of the file to properly handle finding # directives.
Regardless, building a function there means that this will never fully unbox into something that just runs on the stack, no matter what instance you work with, unlike the existing solution. You're always paying full price.
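A rough Haskell sketch of that shape, specialised to String. Pos, Step, unconsAt and startsDirective are hypothetical names for this example only, not anything from parsec or the PureScript PR, and the column-1 rule is a simplification of what a real C preprocessor accepts.

```haskell
-- Hypothetical position type for this sketch (not parsec's SourcePos).
data Pos = Pos { line :: !Int, column :: !Int }
  deriving (Eq, Show)

-- Result of one step: a plain record, so no closure is captured.
data Step c f = Step { stepHead :: c, stepTail :: f, stepPos :: !Pos }
  deriving (Eq, Show)

-- Position-aware uncons over a String: the current position comes in as
-- an argument and the updated position goes out as data.
unconsAt :: Pos -> String -> Maybe (Step Char String)
unconsAt _ [] = Nothing
unconsAt (Pos l c) (x : xs) = Just (Step x xs next)
  where
    next
      | x == '\n' = Pos (l + 1) 1
      | otherwise = Pos l (c + 1)

-- Parsing that depends on position: treat '#' as the start of a
-- directive only when it sits in column 1.
startsDirective :: Pos -> String -> Bool
startsDirective pos input =
  case unconsAt pos input of
    Just (Step '#' _ _) -> column pos == 1
    _                   -> False
```

Because the new position is returned as a strict field rather than as a Position -> Position function, the result is ordinary data that the compiler can unbox.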
from parsec.
This is what I came up with:
class Stream s m t | s -> t where
  uncons :: ParserState s -> m (Maybe (Tuple t (ParserState s)))
  stripPrefix :: Prefix s -> ParserState s -> m (Maybe (ParserState s))

type ParserState s = { input :: s, pos :: Position }
from parsec.
Poking at this from the other side:
First, uncons:
You've now converged almost back to the existing design. The major difference is that in the existing design uncons doesn't get fed the parsec SourcePos, but rather just whatever s you ask for. Why? When it is a string, the s can just be a tail of the string. When it is a file, you need something like the number of bytes consumed so far for seeking, or a file handle and a pushback buffer. In neither case do you use the source position, and at present we don't even look at the Char returned.
Here you're bolting in machinery to update the position, but uncons doesn't need to inspect the char it gives back, so there is no "double dipping" on work there. Only the "update position" combinator passed to tokenPrim actually looks at the Char or whatever is given back by your stream.
So the common use-cases don't use the extra information you are supplying here to access the stream.
This consciously leaves open the notion of what a 'column' is for tokenPrim to fill in, enabling you to work with a more traditional notion of tabstops out of the box with the char combinators, etc., or to plug in your own, so you can support, say, modelines, or choose to have every character (even tabs) be one 'column' wide, or use the number of UTF-16 code units seen so far in the line so you can write a language server protocol implementation, etc.
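As an illustration of the different column notions, here is a small stand-alone sketch. It does not use parsec's SourcePos or tokenPrim; Pos, Update, withColumns and the three updaters are hypothetical names, and the tab rule assumes tab stops every 8 columns with 1-based columns.

```haskell
import Data.Char (ord)

-- Hypothetical position type for this sketch (not parsec's SourcePos).
data Pos = Pos { line :: !Int, column :: !Int }
  deriving (Eq, Show)

-- An "update position" function, supplied by the caller, not the stream.
type Update = Pos -> Char -> Pos

-- Shared newline handling; the column rule is the pluggable part.
withColumns :: (Int -> Char -> Int) -> Update
withColumns nextCol (Pos l c) ch
  | ch == '\n' = Pos (l + 1) 1
  | otherwise  = Pos l (nextCol c ch)

-- Traditional tab stops: a tab jumps to the next stop (every 8 columns).
tabStops :: Update
tabStops = withColumns step
  where
    step c '\t' = c + 8 - ((c - 1) `mod` 8)
    step c _    = c + 1

-- Every character, tabs included, is one column wide.
oneColumnEach :: Update
oneColumnEach = withColumns (\c _ -> c + 1)

-- Columns counted in UTF-16 code units: characters outside the Basic
-- Multilingual Plane take two units.
utf16Units :: Update
utf16Units = withColumns step
  where
    step c ch
      | ord ch > 0xFFFF = c + 2
      | otherwise       = c + 1

-- The stream only hands back characters; the caller picks the updater.
advance :: Update -> Pos -> String -> Pos
advance update start input = foldl update start input
```

The same input can then be measured three ways without touching the stream type, which is the separation the current design keeps.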
At present Stream factors out the concern of updating position from getting data. In the scenario you propose we'd couple them more tightly, forcing users to use newtypes to split these cases back apart.
It feels to me that this part of your design couples the stream to the notion of columns in a really tight way that then requires more code duplication (copying the update position function into stream types that previously lived agnostic of parsec's rather awful SourcePos type) and newtypes to tease apart, but doesn't actually pay out in a win in terms of avoiding duplicating any work at runtime.
Now stripPrefix:
Adding something like stripPrefix could be quite useful for performance. The major objections to overcome there, as I see it, are that Parsec currently doesn't make any use of type families, and the aforementioned issues with entangling the update of position with the internals of the stream types in the API you provide.
Here things become more nebulous, as at this point you are looking at the characters given back, and so my argument about how uncons need not concern itself with what is fetched falls apart a bit. Interestingly, there is a second option when evaluating a stripPrefix-like combinator, which is to have whatever calls it take an argument like Prefix s -> SourcePos -> SourcePos so that some calculations can be shared across calls rather than having it come back as part of the state update. This way it can be lifted out by the compiler and computed once for a given prefix you might be trying to strip, e.g. a given keyword "foo" will always be 3 columns wide. For sharing, many practical implementations could represent the SourcePos -> SourcePos using something like a fairly simple
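data Delta = Delta !Int !Int

-- Since base 4.11 a Monoid instance needs a Semigroup instance,
-- so the combining logic lives in (<>).
instance Semigroup Delta where
  Delta a b <> Delta 0 d = Delta a (b + d)
  Delta a _ <> Delta c d = Delta (a + c) d

instance Monoid Delta where
  mempty = Delta 0 0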
Then convert Prefix s -> Delta to get a cacheable result, then apply the delta with Delta -> SourcePos -> SourcePos to get the Prefix s -> SourcePos -> SourcePos while carefully caching that intermediate delta in the environment.
With that in mind, it seems even stripPrefix benefits from going to something like the current approach and splitting position updates from reading from the stream.
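A minimal sketch of how that caching could look for String input. Delta is the type from this comment; Pos, charDelta, prefixDelta, applyDelta and matchPrefix are hypothetical names for this example, and it assumes 1-based columns and that only '\n' starts a new line. None of this is part of parsec.

```haskell
import Data.List (stripPrefix)

-- Hypothetical position type for this sketch (not parsec's SourcePos).
data Pos = Pos { line :: !Int, column :: !Int }
  deriving (Eq, Show)

-- Delta lines cols: move down `lines` lines; cols is relative when
-- lines == 0 and counted from the start of the new line otherwise.
data Delta = Delta !Int !Int
  deriving (Eq, Show)

instance Semigroup Delta where
  Delta a b <> Delta 0 d = Delta a (b + d)
  Delta a _ <> Delta c d = Delta (a + c) d

instance Monoid Delta where
  mempty = Delta 0 0

-- Prefix s -> Delta, specialised to String.
charDelta :: Char -> Delta
charDelta '\n' = Delta 1 0
charDelta _    = Delta 0 1

prefixDelta :: String -> Delta
prefixDelta = foldMap charDelta

-- Delta -> SourcePos -> SourcePos, with Pos standing in for SourcePos.
applyDelta :: Delta -> Pos -> Pos
applyDelta (Delta 0 d) (Pos l c) = Pos l (c + d)
applyDelta (Delta n d) (Pos l _) = Pos (l + n) (1 + d)

-- Build a matcher for one fixed prefix. The delta depends only on the
-- prefix, so it is computed at most once per matcher and then shared by
-- every call, independently of reading from the input.
matchPrefix :: String -> Pos -> String -> Maybe (Pos, String)
matchPrefix prefix =
  let move = applyDelta (prefixDelta prefix)
  in \pos input -> fmap (\rest -> (move pos, rest)) (stripPrefix prefix input)
```

Binding `matchPrefix "foo"` once (say, in a keyword table) and reusing it keeps the position arithmetic out of the per-call path, while Data.List.stripPrefix handles the input on its own.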
from parsec.
Thanks for the detailed response! I need to think about this. Meanwhile, where can I find the definition of Delta?
from parsec.
I wrote it inline above as an example. It isn't used by parsec.
You can read Delta x y as 'move down x lines, and if x > 0 the column y is absolute, otherwise it's relative'.
from parsec.