binary's People

Contributors

23skidoo, andreaspk, andrewthad, aslatter, basvandijk, bergmark, bgamari, bitonic, bodigrim, bos, bringert, daniel-diaz, donsbot, ericson2314, ggreif, harpocrates, hvr, int-index, kolmodin, lemmih, mboes, pcapriotti, phadej, puffnfresh, qnikst, ryanglscott, shimuuar, sjakobi, spencerjanssen, tibbe

binary's Issues

Improve README.md

The README.md file has been converted to Markdown and been slightly extended.

Here are some further changes I'd like to see:

Not sure if it's still portable to Hugs. We claim it is, but I doubt it.

The section about "Using binary". I think this section is not too helpful. It could lay out the alternatives in a more structured way, with links to the relevant haddocks.

Deriving binary instance section. This is outdated and misleading. We can now generate instances with GHC's generics. A small example of how to do so, or a link to the haddocks, would be better suited.
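As a rough illustration of the generics-based approach mentioned above, here is a minimal sketch. The `Trade` record and the `roundTrip` helper are hypothetical names used only for this example; the only library features assumed are the `Generic`-based default methods of the `Binary` class.

```haskell
{-# LANGUAGE DeriveGeneric #-}

import Data.Binary (Binary, decode, encode)
import GHC.Generics (Generic)

-- Hypothetical record, used only for illustration.
data Trade = Trade
  { tradeId :: Int
  , price   :: Double
  , symbol  :: String
  } deriving (Show, Eq, Generic)

-- An empty instance body picks up the Generic-based default
-- implementations of put and get.
instance Binary Trade

-- Encode to a lazy ByteString and decode it again.
roundTrip :: Trade -> Trade
roundTrip = decode . encode
```

In GHCi, `roundTrip (Trade 1 1.5 "ABC")` should give back an equal value.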

runGetState uses toChunks and fromChunks

This quickly gets expensive, and can build up a huge stack, if called multiple times, like so:

countTrades :: BL.ByteString -> Int
countTrades input = stepper (0, input) where
  stepper (!count, !buffer)
    | BL.null buffer = count
    | otherwise      =
        let (trade, rest, _) = runGetState getTrade buffer 0
        in stepper (count+1, rest)

Code from http://stackoverflow.com/questions/9567040/poor-performance-parsing-binary-file-in-haskell/9573661#9573661

Consider using Int64 throughout the API

We use Int in a lot of places, but the Haskell Report only guarantees that Int covers the range [-2^29, 2^29 - 1], so non-negative values are only guaranteed to address up to 512MB.
We should consider using Int64 instead of relying on Int being wider than that.

lookAheadE

Got a request to re-implement lookAheadE.

Apparently it can be used to implement lookAheadM in monad transformer stacks, like in the hackage package bytes.

Relation to blaze-builder

binary contains an implementation of a builder just like the one found in blaze-builder. IIRC, this implementation was actually the original source for blaze-builder.

Would it make any sense to swap out the locally-maintained Builder for that one? If not, why not?

Add isolate function

Taken from cereal documentation:

isolate :: Int -> Get a -> Get a

Isolate an action to operating within a fixed block of bytes. The action is required to consume all the bytes that it is isolated to. 

It's quite a useful function, since the pattern of a byte count N followed by a chunk of N bytes is quite common in binary formats.

I propose to add two variants of isolate. One should have the same semantics as cereal's and require the parser to consume all input. The second should only ensure that the parser consumes no more than N bytes, with the rest discarded. This is useful when you do not want to fully decode the block data.
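The two proposed variants could be approximated outside the library roughly as follows. This is only a sketch under stated assumptions: `isolateExact`, `isolateAtMost` and `example` are hypothetical names, and the approach (read the block with `getLazyByteString`, then run the inner parser on it with `runGetOrFail`) means that `bytesRead` inside the inner parser starts again from zero.

```haskell
import Data.Binary.Get (Get, getLazyByteString, getWord8, runGet, runGetOrFail)
import qualified Data.ByteString.Lazy as BL
import Data.Int (Int64)
import Data.Word (Word8)

-- Strict variant (hypothetical name): the inner parser must consume
-- the whole n-byte block.
isolateExact :: Int64 -> Get a -> Get a
isolateExact n g = do
  block <- getLazyByteString n
  case runGetOrFail g block of
    Left (_, _, msg) -> fail msg
    Right (rest, _, x)
      | BL.null rest -> return x
      | otherwise    -> fail "isolateExact: parser did not consume the whole block"

-- Lenient variant (hypothetical name): the inner parser may stop early;
-- whatever is left of the n-byte block is discarded.
isolateAtMost :: Int64 -> Get a -> Get a
isolateAtMost n g = do
  block <- getLazyByteString n
  case runGetOrFail g block of
    Left (_, _, msg) -> fail msg
    Right (_, _, x)  -> return x

-- Reads one byte from a 3-byte block, skips the rest of the block,
-- then reads the byte that follows the block.
example :: (Word8, Word8)
example = runGet ((,) <$> isolateAtMost 3 getWord8 <*> getWord8) (BL.pack [1, 2, 3, 4])
```

A version inside the library could avoid materialising the block and keep `bytesRead` consistent; this sketch trades that for simplicity.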

binary-0.7.4.0 can't compile its test suite

Citing from http://hydra.cryp.to/build/852686/nixlog/1/raw:

tests/QC.hs:364:28:
    Ambiguous occurrence ‘arbitrarySizedNatural’
    It could refer to either ‘Test.QuickCheck.arbitrarySizedNatural’,
                             imported from ‘Test.QuickCheck’ at tests/QC.hs:21:1-32
                             (and originally defined in ‘Test.QuickCheck.Arbitrary’)
                          or ‘Arbitrary.arbitrarySizedNatural’,
                             imported from ‘Arbitrary’ at tests/QC.hs:26:56-76
                             (and originally defined at tests/Arbitrary.hs:70:1-21)

Calculate the length of a Builder without executing it

In the ELF format, there are absolute offsets pointing to positions after the offset.
Thus, calculating this offset requires knowing the length of several Builders, including the length of the Builder that will contain the offset itself.

With the current API and a naive approach that would force the Builders into lazy ByteStrings, you'd end up with <<loop>>.

One simple solution, suggested by Joe Hendrix, is to wrap the Builder in a type that contains the Builder and its length.

data SizedBuilder = SB Int64 Builder

length (SB l _) = l
… + additional methods exported by Data.Binary.Builder

This API could be exposed from Data.Binary.Builder.Sized.
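A slightly fuller sketch of the wrapper idea might look like this. All `sb`-prefixed names are hypothetical, only two primitives are wrapped, and the code assumes a GHC recent enough to have `Semigroup` in the Prelude (base 4.11 or later).

```haskell
import qualified Data.Binary.Builder as B
import qualified Data.ByteString.Lazy as BL
import Data.Int (Int64)
import Data.Word (Word32, Word8)

-- A Builder paired with the number of bytes it will produce.
data SizedBuilder = SB !Int64 B.Builder

instance Semigroup SizedBuilder where
  SB l1 b1 <> SB l2 b2 = SB (l1 + l2) (B.append b1 b2)

instance Monoid SizedBuilder where
  mempty = SB 0 B.empty

-- The length is available without running the Builder.
sbSize :: SizedBuilder -> Int64
sbSize (SB l _) = l

-- Wrapped primitives record their fixed sizes.
sbSingleton :: Word8 -> SizedBuilder
sbSingleton w = SB 1 (B.singleton w)

sbWord32be :: Word32 -> SizedBuilder
sbWord32be w = SB 4 (B.putWord32be w)

sbToLazyByteString :: SizedBuilder -> BL.ByteString
sbToLazyByteString (SB _ b) = B.toLazyByteString b
```

With this, an offset field can be computed from `sbSize` of the surrounding pieces before any bytes are produced.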

Implement a delimiter for Get's bytesRead

Basically something like delimit :: Get a -> Get a that resets the bytesRead counter on the inner Get. This would primarily be useful in conjunction with the isolate combinator (which can then act undelimited by default, and we can modify it to be delimited if needed) or the alignment combinator I propose in #50.

A size counter for Get

I've often wanted a combinator called sized :: Get a -> Get (Int, a) that behaves the same as its input, except that it also tells you how many bytes it consumed.
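One way to approximate this today is to take the difference of two `bytesRead` readings. The sketch below returns an `Int64` rather than the `Int` in the proposed signature, because that is what `bytesRead` yields; `example` is a hypothetical usage.

```haskell
import Data.Binary.Get (Get, bytesRead, getWord32be, runGet)
import qualified Data.ByteString.Lazy as BL
import Data.Int (Int64)
import Data.Word (Word32)

-- Run a parser and also report how many bytes it consumed.
sized :: Get a -> Get (Int64, a)
sized g = do
  start <- bytesRead
  x     <- g
  end   <- bytesRead
  return (end - start, x)

-- Parses a big-endian Word32 from the first four bytes.
example :: (Int64, Word32)
example = runGet (sized getWord32be) (BL.pack [0, 0, 0, 7, 9])
```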

GHC-7.2 build failure

src/Data/Binary/Put.hs:61:1:
    bytestring-0.9.2.0:Data.ByteString can't be safely imported! The module itself isn't safe.
xcabal: Error: some packages failed to install:
binary-0.7.6.0 failed during the building phase. The exception was:
ExitFailure 1

Possible binary incompatibility between 0.5 and 0.7

Hi,

I can’t actually find anything in the documentation that discusses the binary compatibility between binary versions, i.e. whether code that successfully parses something with binary 0.5 will also do so with 0.7, but I was optimistically assuming so.

Anyways, it does not seem to be the case. Compare https://s3.amazonaws.com/archive.travis-ci.org/jobs/17619621/log.txt with https://api.travis-ci.org/jobs/17619789/log.txt?deansi=true – identical setups, besides the version of binary, and the tests show that Data.Binary.Get behaves differently.

Is there any documentation of incompatibilities between binary versions? I can’t even find a changelog.

Conditionally instantiate NFData for ByteString and L.ByteString

On bytestring < 0.10.0.0 there were no NFData declarations, and we instantiated our own.

Now bytestring >= 0.10.0.0 is shipped with the Haskell Platform - our instances are no longer needed for newer bytestrings.

We have some broken code that checks for GHC version, but it should actually be checking the ByteString version.

Benchmark failing on master branch

When I run the benchmarks on the master branch I get the following error:

Binary (de)serialisation benchmarks:
100MB of Word8  in chunks of 16 (  Host endian): bench: too few bytes. Failed reading at byte position 6553601

This is the command I ran

make -C benchmarks/ clean bench run-bench

I haven't touched the Data.Binary.Get code so I'm not sure what's wrong.

decode (encode NaN) /= NaN

The IEEE "not-a-number" (NaN) value is not encoded properly, since encode uses decodeFloat, which is unspecified for NaN (cf. Prelude).

Examples:

(0 / 0 :: Double) = NaN
(decode (encode (0 / 0 :: Double)) :: Double) = -Infinity
(log (-1) :: Double) = NaN
(decode (encode (log (-1) :: Double)) :: Double) = Infinity
(0 / 0 :: Float) = NaN
(decode (encode (0 / 0 :: Float)) :: Float) = -Infinity

`Data.Binary.Put` lacks `Put`s for `Int` types

Data.Binary.Put and Data.Binary.Builder provide a variety of Puts for various width Word types. I don't see any reason why they shouldn't include similar functionality for the signed types from Data.Int.
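Until such functions are available, signed values can be written through the unsigned primitives, since `fromIntegral` between same-width `Int` and `Word` types preserves the bit pattern. A minimal sketch with hypothetical names:

```haskell
import Data.Binary.Get (Get, getWord16be, runGet)
import Data.Binary.Put (Put, putWord16be, runPut)
import qualified Data.ByteString.Lazy as BL
import Data.Int (Int16)

-- Write an Int16 in big-endian two's complement via the Word16 primitive.
putInt16beViaWord :: Int16 -> Put
putInt16beViaWord = putWord16be . fromIntegral

-- Read it back by reinterpreting the Word16 bits as Int16.
getInt16beViaWord :: Get Int16
getInt16beViaWord = fromIntegral <$> getWord16be

-- Round trip through a lazy ByteString.
roundTrip :: Int16 -> Int16
roundTrip = runGet getInt16beViaWord . runPut . putInt16beViaWord
```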

Export PairS

It would be useful if this would be exported, as I can then take apart the Put monad and reassemble it, without incurring the cost of running the Builder itself.

Support decoding from a Get monad

I prefer not to ask for convenience functions willy-nilly, but I think refactoring decodeFileOrFail with this signature would be useful:

decodeGetFileOrFail :: Get a -> FilePath -> IO (Either (ByteOffset, String) a)

The reason is that there is a nontrivial chunk of code for running the incremental parser that we'd preferably avoid duplicating. I'll submit a PR soon.
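For readers who need this before a PR lands, the function can be sketched on top of `runGetIncremental` as below. This is an assumption about how it could look, not the library's code; the 32 KB chunk size is arbitrary and `readHeader` is a hypothetical usage.

```haskell
import Data.Binary.Get (ByteOffset, Decoder (..), Get, getWord32be, runGetIncremental)
import qualified Data.ByteString as B
import Data.Word (Word32)
import System.IO (IOMode (ReadMode), withBinaryFile)

decodeGetFileOrFail :: Get a -> FilePath -> IO (Either (ByteOffset, String) a)
decodeGetFileOrFail g path =
  withBinaryFile path ReadMode (feed (runGetIncremental g))
  where
    -- Arbitrary read size, chosen only for this sketch.
    chunkSize = 32 * 1024

    feed (Done _ _ x) _     = return (Right x)
    feed (Fail _ pos msg) _ = return (Left (pos, msg))
    feed (Partial k) h      = do
      chunk <- B.hGet h chunkSize
      -- An empty read means end of file, signalled to the decoder as Nothing.
      feed (k (if B.null chunk then Nothing else Just chunk)) h

-- Hypothetical usage: read a big-endian Word32 from the start of a file.
readHeader :: FilePath -> IO (Either (ByteOffset, String) Word32)
readHeader = decodeGetFileOrFail getWord32be
```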

Remove all compilation warnings when compiling binary

From @simonpj:

In the binary library I’m seeing lots of these warnings:

libraries/binary/src/Data/Binary/Get.hs:420:1: warning:

    Rule "getWord16le/readN" may never fire

      because ‘getWord16le’ might inline first

    Probable fix: add an INLINE[n] or NOINLINE[n] pragma on this function

libraries/binary/src/Data/Binary/Builder/Base.hs:510:1: warning:

    Rule "flush/flush" may never fire

      because ‘flush’ might inline first

    Probable fix: add an INLINE[n] or NOINLINE[n] pragma on this function

The warnings look right to me: currently everything is very fragile and may not work as you intend.

Update the example that uses `runGetState` to new API

The tutorial in Data.Binary.Get includes the following example:

 example2 :: BL.ByteString -> [Trade]
 example2 input
   | BL.null input = []
   | otherwise =
      let (trade, rest, _) = runGetState getTrade input 0
      in trade : example2 rest

Unfortunately, runGetState is marked as deprecated, with a suggestion to use runGetIncremental instead. It'd be nice if the tutorial examples showed the recommended usage of the library.
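A version of the example based on `runGetIncremental` could look roughly like the sketch below. The `Trade` layout and `getTrade` are assumptions made so the code is self-contained, and `decodeTrades` is a hypothetical name; it calls `error` on malformed input, which a real tutorial might want to handle differently.

```haskell
import Data.Binary.Get (Decoder (..), Get, getWord16le, getWord32le, runGetIncremental)
import qualified Data.ByteString as B
import qualified Data.ByteString.Lazy as BL
import Data.Word (Word16, Word32)

-- Hypothetical record layout, assumed only for this sketch.
data Trade = Trade
  { timestamp :: Word32
  , price     :: Word32
  , qty       :: Word16
  } deriving (Eq, Show)

getTrade :: Get Trade
getTrade = Trade <$> getWord32le <*> getWord32le <*> getWord16le

-- Decode trades one after another, feeding the decoder chunk by chunk.
decodeTrades :: BL.ByteString -> [Trade]
decodeTrades = start . BL.toChunks
  where
    -- Begin a fresh decoder only while input remains.
    start []     = []
    start chunks = go (runGetIncremental getTrade) chunks

    go (Done leftover _ trade) chunks = trade : start (prepend leftover chunks)
    go (Partial k) (c : cs)           = go (k (Just c)) cs
    go (Partial k) []                 = go (k Nothing) []
    go (Fail _ _ msg) _               = error msg

    -- Put unconsumed bytes back in front of the remaining chunks.
    prepend bs cs
      | B.null bs = cs
      | otherwise = bs : cs
```

The result list is produced lazily, so trades can be consumed as soon as each one is decoded.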

Safe Haskell compilation warnings

These need to be addressed:

src/Data/Binary/Builder/Internal.hs:3:14: Warning:
    ‘Data.Binary.Builder.Internal’ is marked as Trustworthy but has been inferred as safe!
src/Data/Binary/Put.hs:3:14: Warning:
    ‘Data.Binary.Put’ is marked as Trustworthy but has been inferred as safe!
src/Data/Binary/Class.hs:3:14: Warning:
    ‘Data.Binary.Class’ is marked as Trustworthy but has been inferred as safe!
src/Data/Binary/Generic.hs:2:26: Warning:
    ‘Data.Binary.Generic’ is marked as Trustworthy but has been inferred as safe!

A closer look at all Safe Haskell use within binary would be good.

Remove Binary instance for ByteString

I find myself making the same error over and over again. I use get when I mean getByteString, or I use put when I mean putByteString. It has occurred to me that the Binary instance for ByteString is actually a very bad idea. Instead, I suggest wrapping ByteString with a newtype that will define the current instance. Although this may break some code, it would likely save more time than it costs overall for the library's end users.

Risk of stack-overflow in roll

I was just reading through the Binary instance of Integer and stumbled on the roll function:

roll :: (Integral a, Num a, Bits a) => [Word8] -> a
roll   = foldr unstep 0
  where
    unstep b a = a `shiftL` 8 .|. fromIntegral b

There's a risk of a stack-overflow here since it's lazily building the result value. Although the list of bytes will usually not be that big I think it would be better to build the value strictly using something like the following (untested):

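roll :: (Integral a, Num a, Bits a) => [Word8] -> a
roll = foldl' unstep 0 . reverse
  where
    -- The bytes arrive least-significant first, so reverse them to keep
    -- the result identical to the foldr version.
    unstep a b = a `shiftL` 8 .|. fromIntegral b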

Include module and function name in error messages

One of my programs just failed with

too few bytes. Failed reading at byte position 1852252265

It took me a while to figure out that this message was coming from binary. I suggest we always output the module and function name in error messages:

Data.Binary.Get.getBytes: too few bytes. Failed reading at byte position 1852252265
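On the user side, a similar effect can be had with the `label` combinator from Data.Binary.Get, which appends a label to the error message when the wrapped decoder fails. A small sketch, where `getHeader`, `headerError` and the label text are hypothetical:

```haskell
import Data.Binary.Get (Get, getWord32be, label, runGetOrFail)
import qualified Data.ByteString.Lazy as BL
import Data.Word (Word32)

-- Tag the decoder with a name that will show up in failure messages.
getHeader :: Get Word32
getHeader = label "MyFormat.getHeader" getWord32be

-- Return the error message, if any, produced for the given input.
headerError :: BL.ByteString -> Maybe String
headerError input = case runGetOrFail getHeader input of
  Left (_, _, msg) -> Just msg
  Right _          -> Nothing
```

This does not identify which library function raised the error, so it complements the suggestion above rather than replacing it.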

Binary instances for Foreign.C.Types

The data in Foreign.C.Types are just newtype wrappers around types which mostly have Binary instances already. Is there a reason the C types don't have Binary instances?

listUntilEnd

I often find myself needing this function:

listUntilEnd :: (Binary a) => Get [a]
listUntilEnd = do
   done <- isEmpty
   if done then return [] else do
      next <- get
      rest <- listUntilEnd
      return (next:rest)

Add changelog

Hi,

binary (like many other Haskell libraries, unfortunately) does not have a proper changelog file that collects, per release, the user-relevant changes. With hackage now showing links to changelogs, it is a good time to introduce one. It would also prevent me from bothering you with #44...

Thanks,
Joachm

Push stable version tags to github

Hi Lennart,

I assume you have the tags corresponding to the versions of 'binary' on hackage. It would be great, if they were also available in your github repo.

best regards,
Simon

don't re-export Data.Word from Data.Binary

Data.Binary is small enough, and exports names that are unique enough, that it can commonly be simply imported wholesale:

import Data.Binary

However, this also happens to re-export Data.Word, which is surprising, and generates a warning from GHC, if code that uses it also imports Data.Word:

import Data.Binary
import Data.Word

yields:

src/Foo.hs:7:1: Warning:
     The import of ‘Data.Word’ is redundant
      except perhaps to import instances from ‘Data.Word’
    To import instances alone, use: import Data.Word()

In a module that uses both Binary and Word explicitly, it makes for poor developer experience to rely on Data.Binary to export the names from Data.Word. If you move the Data.Binary dependent code out of the module, and delete the import - the remaining Data.Word code doesn't compile.

Instance for Double/Float is absolutely batty

Apparently, binary represents a Double as a tuple of (Integer, Int)? This means that doubles suffer a 3x or larger size explosion, when really you could just record an IEEE floating point value with the proper endianness. This would also fix #64

Backwards compatibility might be a concern for fixing this, however.
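Without touching the existing instance, an IEEE encoding can be sketched with a newtype. This assumes a binary version that provides `putDoublebe` and `getDoublebe` (0.8.4 or later, as far as I know); `IEEEDouble` and `encodedSize` are hypothetical names.

```haskell
import Data.Binary (Binary (..), decode, encode)
import Data.Binary.Get (getDoublebe)
import Data.Binary.Put (putDoublebe)
import qualified Data.ByteString.Lazy as BL
import Data.Int (Int64)

-- Wrapper whose instance stores the raw 8-byte big-endian IEEE 754 value.
newtype IEEEDouble = IEEEDouble Double
  deriving Show

instance Binary IEEEDouble where
  put (IEEEDouble d) = putDoublebe d
  get = IEEEDouble <$> getDoublebe

-- Size in bytes of the encoded form.
encodedSize :: Double -> Int64
encodedSize = BL.length . encode . IEEEDouble

-- Round trip through the wrapper.
roundTrip :: Double -> Double
roundTrip d = case decode (encode (IEEEDouble d)) of
  IEEEDouble d' -> d'
```

Because the bits are copied as-is, `roundTrip (0 / 0)` should still satisfy `isNaN`, which is the behaviour asked for in the NaN issue.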

UTF-8 validation when deserializing

As pointed out in #70 by @ttuegel we don't do validation of UTF-8 when decoding. This needs to be fixed.

Consider introducing getList in Binary if it makes a big difference for performance. With getList we could use some of the faster UTF-8 validators without having to write our own. See how text does UTF-8 validation. Our case might be more difficult though, as we don't know beforehand whether all input bytes are available.

class Binary a where
  -- ...
  getList :: Get [a]
  getList = getDefaultList

getDefaultList :: Binary a => Get [a]
getDefaultList = get >>= getMany

instance Binary Char where
  -- ...
  getList = -- faster code

Support alignment modifiers

I've found myself wanting something like (might be buggy, but you get the idea)

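aligned :: Int -> Get a -> Get a
aligned n g = do
  br <- fromIntegral <$> bytesRead
  -- Skip only the padding needed to reach the next multiple of n;
  -- the outer `rem` avoids skipping n bytes when already aligned.
  skip $ (n - br `rem` n) `rem` n
  g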

Might it be worth adding to the library, with a Builder/Put counterpart? The Builder side of things would require more changes to make it work than what I wrote above, but it's not all that hard.

encode'

Would be useful to provide encode' :: (Binary a) => a -> Data.ByteString.ByteString

Usually I want lazy, but sometimes I do not.
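In the meantime this is a one-liner on top of the existing API, assuming a bytestring version that provides `toStrict` (0.10 or later). The name `encodeStrict` is hypothetical, chosen to avoid clashing with a future `encode'`.

```haskell
import Data.Binary (Binary, encode)
import qualified Data.ByteString as B
import qualified Data.ByteString.Lazy as BL

-- Encode to a strict ByteString by copying the lazy chunks into one buffer.
encodeStrict :: Binary a => a -> B.ByteString
encodeStrict = BL.toStrict . encode
```

Note that this copies the encoded output once; a dedicated implementation in the library might be able to avoid that.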

Performance issue with skip

I'm using the lazy interface of Data.Binary.Get to parse large binaries from disk, where most parts are skipped on the first pass. When using skip directly, I get a severe performance and space usage problem when the skipped byte count is large (many megabytes). I'm not an expert on lazy bytestring internals, but it seems like the input data is held on to for too long before being skipped ("PINNED" memory usage in the -hc heap profile). Using this wrapper around skip makes it several orders of magnitude faster and does not explode the heap (I'm using GHC 7.10.1):

import Control.Monad (replicateM_)
import qualified Data.Binary.Get as G

skipMany :: Int -> G.Get ()
skipMany bytes = 
  replicateM_ rep (G.skip cs) >> G.skip rest
 where
   cs = 1024
   (rep, rest) = bytes `quotRem` cs

get for UArray blows the heap for large arrays

instance (Binary i, Ix i, Binary e, IArray UArray e) => Binary (UArray i e) where
    get = do
        bs <- get
        n  <- get
        xs <- getMany n
        return (listArray bs xs)

getMany is fully strict in the list, since it uses an accumulator and reverses it at the end. The intermediate xs list can be huge in cases where the eventual UArray is much more manageable (eg 28M Booleans).

Two questions:

  1. Is there a known alternative for (un)serializing UArrays to(from) disk? Such an alternative would make this Issue far less important.

  2. Have you considered a version that serializes the bytes directly? I drafted one up; it's tremendously more efficient, though I'm concerned about robustness wrt endianness etc. Furthermore, it requires a base monad that can mutate arrays, which requires an "unsafe" invocation. And lastly it's not portable, using ghc-prim.

HTH. Thanks.

MonadPlus

Could be implemented using Alternative.

Memory consumption of decoding bigger than expected

The following program encodes and decodes a long list of words. Memory consumption seems 4x bigger than what I'd expect. Results shown below. ghc-7.10.2, binary-0.7.6.1.

import Control.Exception (evaluate)
import Control.Monad (void)
import Data.Binary (encode, decode)
import qualified Data.ByteString.Lazy as BSL
import Data.List (isPrefixOf, foldl')
import Data.Word (Word32)
import GHC.Stats
import System.Mem (performGC)

type T = (Word32,[Word32])

main :: IO ()
main = do
  let sz = 1024 * 1024 * 15
      xs = [ (i,[i]) :: T | i <- [0 .. sz] ]
      bs = encode xs

  void $ evaluate $ sum' $ map (\(x, vs) -> x + sum' vs) xs
  putStrLn "After building the value to encode:"
  printMem

  putStrLn $ "Size of the encoded value: " ++
    show (BSL.length bs `div` (1024 * 1024)) ++ " MB"
  putStrLn ""

  putStrLn "After encoding the value:"
  printMem

  let xs' = decode bs :: [T]
  void $ evaluate $ sum' $ map (\(x, vs) -> x + sum' vs) xs'
  putStrLn "After decoding the value:"
  printMem

  -- retain the original list so it is not GC'ed
  void $ evaluate $ last xs
  -- retain the decoded list so it is not GC'ed
  void $ evaluate $ last xs'

printMem :: IO ()
printMem = do
  readFile "/proc/self/status" >>=
    putStr . unlines . filter (\x -> any (`isPrefixOf` x) ["VmHWM", "VmRSS"])
           . lines
  performGC
  stats <- getGCStats
  putStrLn $ "In use according to GC stats: " ++
    show (currentBytesUsed stats `div` (1024 * 1024)) ++ " MB"
  putStrLn $ "HWM according the GC stats: " ++
    show (maxBytesUsed stats `div` (1024 * 1024)) ++ " MB"
  putStrLn ""

sum' :: Num a => [a] -> a
sum' = foldl' (+) 0

Here are the results:

# time ./test +RTS -T
After building the value to encode:
VmHWM:   1557456 kB
VmRSS:   1557456 kB
In use according to GC stats: 1320 MB
HWM according the GC stats: 1320 MB

Size of the encoded value: 240 MB

After encoding the value:
VmHWM:   2791620 kB
VmRSS:   2791620 kB
In use according to GC stats: 1560 MB
HWM according the GC stats: 1560 MB

After decoding the value:
VmHWM:   6229164 kB
VmRSS:   6229164 kB
In use according to GC stats: 2880 MB
HWM according the GC stats: 2880 MB


real    0m27.143s
user    0m25.112s
sys 0m2.016s

The GC reports mostly what I expect. However, the OS reports a much higher memory usage. The discrepancy seems to get worse after decoding.

Any hints appreciated.

0.7.0.2 build broken with ghc 6.10.4

[1 of 8] Compiling Data.Binary.Builder.Base ( src/Data/Binary/Builder/Base.hs, dist/build/Data/Binary/Builder/Base.o )

src/Data/Binary/Builder/Base.hs:68:0:
    Warning: Module `Data.Word' is imported, but nothing from it is used,
               except perhaps instances visible in `Data.Word'
             To suppress this warning, use: import Data.Word()
[2 of 8] Compiling Data.Binary.Builder.Internal ( src/Data/Binary/Builder/Internal.hs, dist/build/Data/Binary/Builder/Internal.o )
[3 of 8] Compiling Data.Binary.Builder ( src/Data/Binary/Builder.hs, dist/build/Data/Binary/Builder.o )
[4 of 8] Compiling Data.Binary.Get.Internal ( src/Data/Binary/Get/Internal.hs, dist/build/Data/Binary/Get/Internal.o )

src/Data/Binary/Get/Internal.hs:251:2:
    `some' is not a (visible) method of class `Alternative'

src/Data/Binary/Get/Internal.hs:252:2:
    `many' is not a (visible) method of class `Alternative'

unsafeReadN holds on to consumed input

In the definition of unsafeReadN:

unsafeReadN :: Int -> (B.ByteString -> a) -> Get a
unsafeReadN !n f = C $ \inp ks -> do
  ks (B.unsafeDrop n inp) $! f inp -- strict return

We pass the rest of the input (B.unsafeDrop n inp) to the success continuation without first forcing it. This could lead to us holding on to input longer than necessary.

In practice it's not much of a problem as the success continuation will most likely evaluate the thunk, but I think it's more correct to do:

unsafeReadN :: Int -> (B.ByteString -> a) -> Get a
unsafeReadN !n f = C $ \inp ks -> do
  let !t = B.unsafeDrop n inp
  ks t $! f inp -- strict return
