GithubHelp home page GithubHelp logo

Alternative formulation of Stream about parsec HOT 9 OPEN

haskell avatar haskell commented on June 30, 2024
Alternative formulation of Stream

from parsec.

Comments (9)

ekmett avatar ekmett commented on June 30, 2024

Folks use the m in uncons to allow "streams" that aren't fully in memory. E.g. Streaming from disk or pipes.

from parsec.

safareli avatar safareli commented on June 30, 2024

Thanks @ekmett I have added the m to PureScript PR .

What you think on moving Position -> Position into Stream class?

from parsec.

safareli avatar safareli commented on June 30, 2024

Also in mtl we have class Monad m => MonadState s m | m -> s where should the Stream also have m -> s dep?

from parsec.

ekmett avatar ekmett commented on June 30, 2024

Lots of people use different states that happen to run in the IO monad.

from parsec.

ekmett avatar ekmett commented on June 30, 2024

I'm not sure about the Position -> Position thing. To me it seems to entangle a couple of concerns, and complicates uncons. It seems you'd really want

Position -> f -> Maybe { head :: c, tail :: f, newPos :: Position }

which would allow for different parsing based on position, and avoids capturing a very expensive closure. An example where that might matter is if your lexer prepass did something like c preprocessing where it might need to know if it is at the start of file to know how to properly handle finding # directives. Regardless, building a function there means that this will never fully unbox into something that just runs on the stack no matter what instance you work with unlike the existing solution.

You're always paying full price.

from parsec.

safareli avatar safareli commented on June 30, 2024

This is what i came to:

class Stream s m t | s -> t where
  uncons :: ParserState s -> m (Maybe (Tuple t (ParserState s)))
  stripPrefix :: Prefix s -> ParserState s -> m (Maybe (ParserState s))

type ParserState s = { input :: s, pos :: Position }

from parsec.

ekmett avatar ekmett commented on June 30, 2024

Poking at this from the other side:

First uncons:

You've now converged almost back to the existing design. The major difference is that uncons doesn't get fed the parsec SourcePos, but rather just whatever s you ask for. Why? When it is a string, then the 's' can just be a tail of the string. When it is a file, you need something like a number of bytes consumed so far for seeking or a file handle and a pushback buffer. In neither case do you use the source position and we don't even look at the Char returned at present.

Here you're bolting in machinery to update position, but 'uncons' doesn't need to inspect the char it gives back, so there is no "double dipping" on work, there. Only the "update position" combinator passed to tokenPrim actually looks at the Char or whatever is given back by your stream.

So the common use-cases don't use the extra information you are supplying here to access the stream.

This consciously leaves open the notion of what is a 'column' to tokenPrim to fill in, enabling you to work with a more traditional notion of tabstops out of the box with the char combinators, etc. or plug in your own, so you can support, say, modelines, or choose to have every character (even tabs) be one 'column' wide, or using the number of utf-16 code units seen so far in the line so you can write a language server protocol implementation, etc.

At present Stream factors out the concern of updating position from getting data. In the scenario you propose wed couple them more tightly forcing users to use newtypes to split these cases back apart.

It feels to me that this part of your design couples the stream to the notion of columns in a really tight way that then require more code duplication (copying the update position function into stream types that previously lived agnostic of parsec's rather awful SourcePos type) and newtypes to tease apart, but doesn't actually pay out in a win in terms of avoiding duplicating any work at runtime.

Now stripPrefix:

Adding something like stripPrefix could be quite useful for performance. The major objections to overcome there I'd see is that currently Parsec doesn't make any use of type families and the aforementioned issues with entangling the update of position with the internals of the stream types in the API you provide.

Here things become more nebulous as at this point you are looking at the characters given back and so my argument about how uncons need not concern itself with what is fetched falls apart a bit. Interestingly there is a second option when evaluating stripPrefix-like combinator, which is to have whatever calls it take an argument like Prefix s -> SourcePos -> SourcePos so that some calculations can be shared across calls rather than having it come back as part of the state update. This way it can be lifted out by the compiler and computed once for a given prefix you might be trying to strip, e.g. a given keyword "foo" will always be 3 columns wide. For sharing, many practical implementations could represent the SourcePos -> SourcePos using something like a fairly simple

data Delta = Delta !Int !Int
instance Monoid Delta where
  mempty = Delta 0 0
  mappend (Delta a b) (Delta 0 d) = Delta a (b + d)
  mappend (Delta a _) (Delta c d) = Delta (a + c) d

Then convert Prefix s -> Delta to get a cacheable result, then apply the delta with Delta -> SourcePos -> SourcePos to get the Prefix s -> SourcePos -> SourcePos while carefully caching that intermediate delta in the environment.

With that in mind, it seems even stripPrefix benefits from going to something like the current approach and splitting position updates from reading from the stream.

from parsec.

safareli avatar safareli commented on June 30, 2024

Thanks for detailed response! I would need to think on this, meanwhile, where can I find definition of Delta?

from parsec.

ekmett avatar ekmett commented on June 30, 2024

I wrote it inline above as an example. It isn't used by parsec.

You can read Delta x y as 'move down x lines, and if x is > 1 the column is absolute, otherwise its relative.

from parsec.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.