
bitbytedata's Introduction

BitByteData

Swift 5.5+

A Swift framework with classes for reading and writing bits and bytes. Supported platforms include Apple platforms, Linux, and Windows.

Installation

BitByteData can be integrated into your project using Swift Package Manager, CocoaPods, or Carthage.

Swift Package Manager

To install using SPM, add BitByteData to your package dependencies and specify it as a dependency for your target, e.g.:

// swift-tools-version:5.5
import PackageDescription

let package = Package(
    name: "PackageName",
    dependencies: [
        .package(url: "https://github.com/tsolomko/BitByteData.git",
                 from: "2.0.0")
    ],
    targets: [
        .target(
            name: "TargetName",
            dependencies: ["BitByteData"]
        )
    ]
)

More details can be found in the Swift Package Manager documentation.

CocoaPods

Add pod 'BitByteData', '~> 2.0' and use_frameworks! lines to your Podfile.

To complete installation, run pod install.

Carthage

Add github "tsolomko/BitByteData" ~> 2.0 to your Cartfile.

Then:

  1. If you use Xcode 12 or later, run carthage update --use-xcframeworks. Then drag and drop the BitByteData.xcframework file from the Carthage/Build/ directory into the "Frameworks, Libraries, and Embedded Content" section of your target's "General" tab in Xcode.

  2. If you use Xcode 11 or earlier, run carthage update. Then drag and drop the BitByteData.framework file from the Carthage/Build/<platform>/ directory into the "Embedded Binaries" section of your target's "General" tab in Xcode.

Migration to 2.0

The 2.0 update contains a number of breaking changes. This section lists the modifications you need to make to your code for it to compile with BitByteData 2.0. For more information, please refer to either the 2.0 Release Notes or the API Reference Documentation.

  1. ByteReader class has been renamed to LittleEndianByteReader.

    Solution: Change all occurrences in your code of ByteReader to LittleEndianByteReader.

  2. BitReader protocol has two new method requirements: signedInt(fromBits:representation:) and advance(by:).

    Solution: If you have your own type that conforms to the BitReader protocol you need to implement these two methods.

  3. BitWriter protocol has two new method requirements: write(unsignedNumber:bitsCount:) and write(signedNumber:bitsCount:representation:).

    Solution: If you have your own type that conforms to the BitWriter protocol you need to implement the write(unsignedNumber:bitsCount:) function (the second function has a default implementation).

  4. The setter of the offset property of the LsbBitReader and MsbBitReader classes will now crash if the reader is not aligned.

    Solution: If you set this property directly, make sure that the reader is aligned, for example, by checking the isAligned property.

  5. The default implementation of the BitWriter.write(number:bitsCount:) function and the write(unsignedNumber:bitsCount:) function of the LsbBitWriter and MsbBitWriter classes now crash if the bitsCount argument exceeds the bit width of the integer type on the current platform.

    Solution: If you use these functions directly, make sure that the bitsCount argument has a valid value.

In addition, BitByteData 2.0 provides new functionality for working with signed integers correctly. If you were working with signed integers before, consider using the new BitReader.signedInt(fromBits:representation:) and BitWriter.write(signedNumber:bitsCount:representation:) functions instead of int(fromBits:) and write(number:bitsCount:), respectively.

Usage

To read bytes, use either the LittleEndianByteReader or the BigEndianByteReader class; both implement the ByteReader protocol.
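As an illustration of the two byte orders, here is a minimal sketch that does not use BitByteData at all. The functions readLittleEndian and readBigEndian are hypothetical helpers that decode an unsigned integer from a plain [UInt8] array (BitByteData itself works with Data).

```swift
// Hypothetical helpers, not part of BitByteData: decode an unsigned integer
// from `count` bytes (at most 8) starting at `offset`.

/// Little Endian: the first byte is the least significant one.
func readLittleEndian(_ bytes: [UInt8], offset: Int, count: Int) -> UInt64 {
    precondition((0...8).contains(count), "count must be in 0...8")
    precondition(offset >= 0 && offset + count <= bytes.count, "not enough bytes")
    var result: UInt64 = 0
    for i in 0..<count {
        result |= UInt64(bytes[offset + i]) << (8 * i)
    }
    return result
}

/// Big Endian: the first byte is the most significant one.
func readBigEndian(_ bytes: [UInt8], offset: Int, count: Int) -> UInt64 {
    precondition((0...8).contains(count), "count must be in 0...8")
    precondition(offset >= 0 && offset + count <= bytes.count, "not enough bytes")
    var result: UInt64 = 0
    for i in 0..<count {
        result = (result << 8) | UInt64(bytes[offset + i])
    }
    return result
}

let sample: [UInt8] = [0x01, 0x02, 0x03, 0x04]
print(String(readLittleEndian(sample, offset: 0, count: 4), radix: 16)) // 4030201
print(String(readBigEndian(sample, offset: 0, count: 4), radix: 16))    // 1020304
```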

For reading bits there are also two classes, LsbBitReader and MsbBitReader, which implement the BitReader protocol for the two bit-numbering schemes ("LSB 0" and "MSB 0", respectively), though they only support the Little Endian byte order. Since the BitReader protocol inherits from ByteReader, you can also use the LsbBitReader and MsbBitReader classes to read bytes (but they must be aligned when doing so; see the documentation for more details).
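The two bit-numbering schemes differ in which end of a byte is read first. The sketch below is hypothetical and is not BitByteData's API. It assumes the common convention that an LSB 0 reader treats the first bit it reads as the least significant bit of the result, while an MSB 0 reader treats it as the most significant one.

```swift
// Hypothetical helpers, not BitByteData's API. `index` counts bits from the
// start of the buffer: `index / 8` selects the byte, `index % 8` the bit in it.

/// "LSB 0": bit 0 of each byte is its least significant bit.
func lsbBit(_ bytes: [UInt8], at index: Int) -> UInt8 {
    return (bytes[index / 8] >> UInt8(index % 8)) & 1
}

/// "MSB 0": bit 0 of each byte is its most significant bit.
func msbBit(_ bytes: [UInt8], at index: Int) -> UInt8 {
    return (bytes[index / 8] >> UInt8(7 - index % 8)) & 1
}

/// Assembles `count` bits, treating the first bit read as the least significant one.
func lsbNumber(_ bytes: [UInt8], start: Int, count: Int) -> UInt {
    var result: UInt = 0
    for i in 0..<count {
        result |= UInt(lsbBit(bytes, at: start + i)) << i
    }
    return result
}

/// Assembles `count` bits, treating the first bit read as the most significant one.
func msbNumber(_ bytes: [UInt8], start: Int, count: Int) -> UInt {
    var result: UInt = 0
    for i in 0..<count {
        result = (result << 1) | UInt(msbBit(bytes, at: start + i))
    }
    return result
}

let packed: [UInt8] = [0b1011_0001]
print(lsbNumber(packed, start: 0, count: 4)) // 1
print(msbNumber(packed, start: 0, count: 4)) // 11
```

Reading a whole byte gives the same value under both schemes. The results only differ when a number spans part of a byte.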

Writing bits is implemented for two bit-numbering schemes as well: the LsbBitWriter and MsbBitWriter classes. Both of them conform to the BitWriter protocol.
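The writers mirror the readers. The following is a minimal, hypothetical sketch of an LSB 0 writer. It is not BitByteData's LsbBitWriter, and its internal layout is an assumption. It shows how a write(unsignedNumber:bitsCount:) method can fill each byte starting from its least significant bit.

```swift
/// Hypothetical "LSB 0" bit writer, not BitByteData's LsbBitWriter.
final class SketchLsbBitWriter {
    private(set) var bytes: [UInt8] = []
    /// Position of the next free bit in the last byte; 8 means a new byte is needed.
    private var bitIndex = 8

    /// Writes the `bitsCount` lowest bits of `number`, least significant bit first.
    func write(unsignedNumber number: UInt, bitsCount: Int) {
        precondition((0...UInt.bitWidth).contains(bitsCount), "invalid bitsCount")
        for i in 0..<bitsCount {
            if bitIndex == 8 {
                bytes.append(0)
                bitIndex = 0
            }
            let bit = UInt8(truncatingIfNeeded: (number >> i) & 1)
            bytes[bytes.count - 1] |= bit << bitIndex
            bitIndex += 1
        }
    }
}

let writer = SketchLsbBitWriter()
writer.write(unsignedNumber: 0b0001, bitsCount: 4)
writer.write(unsignedNumber: 0b1011, bitsCount: 4)
writer.write(unsignedNumber: 0b101, bitsCount: 3)
print(writer.bytes) // [177, 5]
```

Under the same convention, an LSB 0 reader reading 4, 4, and 3 bits from these bytes would get back 1, 11, and 5.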

Note: All readers and writers are intentionally classes rather than structs, which makes it easier to pass them to functions by reference. This eliminates potential copying and avoids extra inout parameters and ampersands all over the code.
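The difference can be seen with two hypothetical toy readers, neither of which is part of BitByteData. With a class, a helper function can advance the caller's reader directly. A struct, by contrast, has to be passed as inout, with an ampersand at every call site.

```swift
/// Hypothetical class-based reader (reference semantics).
final class SketchByteReader {
    let bytes: [UInt8]
    var offset = 0
    init(bytes: [UInt8]) { self.bytes = bytes }

    func byte() -> UInt8 {
        precondition(offset < bytes.count, "no bytes left")
        defer { offset += 1 }
        return bytes[offset]
    }
}

/// The caller's reader is advanced directly: no `inout`, no `&`.
func skipMagic(_ reader: SketchByteReader) -> Bool {
    return reader.byte() == 0x50 && reader.byte() == 0x4B
}

/// Hypothetical struct-based reader (value semantics).
struct SketchValueReader {
    let bytes: [UInt8]
    var offset = 0

    mutating func byte() -> UInt8 {
        precondition(offset < bytes.count, "no bytes left")
        defer { offset += 1 }
        return bytes[offset]
    }
}

/// The same helper for a struct needs `inout`, and callers must write `&`.
func skipMagic(_ reader: inout SketchValueReader) -> Bool {
    return reader.byte() == 0x50 && reader.byte() == 0x4B
}

let classReader = SketchByteReader(bytes: [0x50, 0x4B, 0x03, 0x04])
_ = skipMagic(classReader)
print(classReader.offset) // 2

var valueReader = SketchValueReader(bytes: [0x50, 0x4B, 0x03, 0x04])
_ = skipMagic(&valueReader)
print(valueReader.offset) // 2
```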

Documentation

Every function and type in BitByteData's public API is documented. This documentation can be found at its own website or via a slightly shorter link: bitbytedata.tsolomko.me

Contributing

Whether you find a bug, have a suggestion, idea, feedback or something else, please create an issue on GitHub. If you have any questions, you can ask them on the Discussions page.

If you'd like to contribute, please create a pull request on GitHub.

Note: If you are considering working on BitByteData, please note that the Xcode project (BitByteData.xcodeproj) was created manually and you shouldn't use the swift package generate-xcodeproj command.

Performance and benchmarks

One of the most important goals of BitByteData's development is high performance. To help achieve this goal, there are benchmarks for every function in the project, as well as a handy command-line tool, benchmarks.py, which helps to run, show, and compare benchmarks and their results.

If you are considering contributing to the project please make sure that:

  1. Every new function is accompanied by a new benchmark.
  2. Other changes to existing functionality do not introduce performance regressions or, at the very least, any regressions are small and the performance tradeoff is necessary and justifiable.

Finally, please note that any meaningful comparison can be made only between benchmarks run on the same hardware and software.

bitbytedata's People

Contributors

cowgp, tsolomko

bitbytedata's Issues

[RFC] Plans for 2.0

2.0 Plans

Last updated: 27.06.2021.

In this document I am going to outline the changes that I plan to implement in the next major update of BitByteData, version 2.0. I am writing this document with two goals in mind. First, to provide an opportunity to give feedback on the proposed changes. Second, for historical reasons: if at some point in the future I wonder what the reasons for any particular change were, I will be able to open this document and remind myself of them. Finally, I would like to mention that all these changes are more or less breaking (according to SemVer, which I am trying to follow in this project), which is why I am considering them only for the major update.

Note: For brevity, in this text the phrase "LsbBitReader and MsbBitReader" is shortened to L/MsbBitReader.

Improve class and protocol structure of BitByteData

Motivation

The key idea that was always supposed to be emphasized in the class-protocol structure of BitByteData is that all bit readers are also byte readers. Currently, this idea is expressed via L/MsbBitReader subclassing ByteReader. The disadvantage of this approach is that the BitReader protocol, which both L/MsbBitReader conform to, cannot inherit from the ByteReader class: it is simply impossible in Swift for a protocol to inherit from a class. This becomes a problem when you have a variable of the protocol type BitReader instead of the concrete class L/MsbBitReader. In this situation you cannot use any byte-reading methods, despite L/MsbBitReader being subclasses of ByteReader. To solve this problem, I had to add all byte-reading methods to the BitReader protocol.

The second motivating problem is the evident hole in the API of BitByteData: absence of any means to read bytes in the Big Endian order. The proposed restructuring is a good opportunity to introduce a new class for this purpose, which may be useful in some (though, admittedly, niche) situations.

Proposed solution

  1. Add a new protocol ByteReader (sic!) with the following API:
public protocol ByteReader: AnyObject {

    var size: Int { get }

    var data: Data { get }

    var offset: Int { get set }

    init(data: Data)

    func byte() -> UInt8

    func bytes(count: Int) -> [UInt8]

    func int(fromBytes count: Int) -> Int

    func uint64(fromBytes count: Int) -> UInt64

    func uint32(fromBytes count: Int) -> UInt32

    func uint16(fromBytes count: Int) -> UInt16

}

extension ByteReader {

    public var bytesLeft: Int {
        // ...
    }

    public var bytesRead: Int {
        // ...
    }

    public var isFinished: Bool {
        // ...
    }

}
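The bodies of these computed properties are elided in the proposal. The sketch below shows one plausible way to write them. It assumes, as discussed under "Make offset property zero-based", that offset is an index into data rather than a zero-based counter. It uses a hypothetical, trimmed-down protocol, with ArraySlice<UInt8> standing in for Data so that it runs without Foundation; both types can have a non-zero startIndex.

```swift
/// Hypothetical, trimmed-down version of the proposed protocol.
protocol SketchByteReader: AnyObject {
    var data: ArraySlice<UInt8> { get }
    var offset: Int { get set }
}

extension SketchByteReader {
    /// Bytes that can still be read.
    var bytesLeft: Int { data.endIndex - offset }
    /// Bytes consumed so far, counted from data.startIndex (not necessarily 0).
    var bytesRead: Int { offset - data.startIndex }
    /// True once offset has reached the end of the data.
    var isFinished: Bool { offset >= data.endIndex }
}

final class SliceReader: SketchByteReader {
    let data: ArraySlice<UInt8>
    var offset: Int

    init(data: ArraySlice<UInt8>) {
        self.data = data
        self.offset = data.startIndex // the first valid index, not 0
    }
}

let whole: [UInt8] = [10, 20, 30, 40, 50]
let reader = SliceReader(data: whole[2...]) // indices 2, 3, 4
reader.offset += 1
print(reader.bytesRead, reader.bytesLeft, reader.isFinished) // 1 2 false
```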

Note: This protocol differs from the byte-reading APIs of the current version of the BitReader protocol. These differences are discussed in separate sections of this document.

  2. Rename the ByteReader class to LittleEndianByteReader.

  3. Make the renamed LittleEndianByteReader class conform to the new ByteReader protocol.

  4. Add a new class, BigEndianByteReader, which also conforms to the ByteReader protocol.

  5. Make the BitReader protocol extend the ByteReader protocol.

  6. Make both L/MsbBitReader no longer subclass the ByteReader class (which will be renamed to LittleEndianByteReader).

  7. Change the implementation of both L/MsbBitReader to explicitly read bytes in the Little Endian order (currently, they use the superclass's implementation).

  8. Mark all classes (Little/BigEndianByteReader and L/MsbBitReader) as final.

These changes should solve the issues from "Motivation" section.
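The following is a compressed, hypothetical sketch of the proposed hierarchy. It uses Mini-prefixed names, [UInt8] instead of Data, and only a couple of methods. It illustrates the point of the restructuring: a function that receives the protocol type MiniBitReader can still call byte-reading methods, because the bit reader protocol refines the byte reader protocol instead of relying on class inheritance.

```swift
// Hypothetical, heavily trimmed model of the proposed 2.0 structure.
protocol MiniByteReader: AnyObject {
    var bytes: [UInt8] { get }
    var offset: Int { get set }
    init(bytes: [UInt8])
    func uint16() -> UInt16
}

extension MiniByteReader {
    func byte() -> UInt8 {
        precondition(offset < bytes.count, "no bytes left")
        defer { offset += 1 }
        return bytes[offset]
    }
}

// The bit reader protocol refines the byte reader protocol.
protocol MiniBitReader: MiniByteReader {
    var isAligned: Bool { get }
    func bit() -> UInt8
}

final class MiniLittleEndianByteReader: MiniByteReader {
    let bytes: [UInt8]
    var offset = 0
    init(bytes: [UInt8]) { self.bytes = bytes }

    func uint16() -> UInt16 {
        let low = UInt16(byte())
        let high = UInt16(byte())
        return high << 8 | low
    }
}

final class MiniBigEndianByteReader: MiniByteReader {
    let bytes: [UInt8]
    var offset = 0
    init(bytes: [UInt8]) { self.bytes = bytes }

    func uint16() -> UInt16 {
        let high = UInt16(byte())
        let low = UInt16(byte())
        return high << 8 | low
    }
}

// Not a subclass: it reads bytes in Little Endian order on its own.
final class MiniLsbBitReader: MiniBitReader {
    let bytes: [UInt8]
    var offset = 0
    private var bitIndex = 0 // next bit inside bytes[offset]
    init(bytes: [UInt8]) { self.bytes = bytes }

    var isAligned: Bool { bitIndex == 0 }

    func bit() -> UInt8 {
        let result = (bytes[offset] >> UInt8(bitIndex)) & 1
        bitIndex += 1
        if bitIndex == 8 {
            bitIndex = 0
            offset += 1
        }
        return result
    }

    func uint16() -> UInt16 {
        precondition(isAligned, "bytes can only be read when the reader is aligned")
        let low = UInt16(byte())
        let high = UInt16(byte())
        return high << 8 | low
    }
}

// Accepts the protocol type, yet can still use byte-reading methods.
func readFlagAndLength(_ reader: MiniBitReader) -> (flag: UInt8, length: UInt16) {
    let flag = reader.bit()
    for _ in 0..<7 { _ = reader.bit() } // skip the rest of the byte
    return (flag, reader.uint16())
}

let bitReader = MiniLsbBitReader(bytes: [0x01, 0x34, 0x12])
let header = readFlagAndLength(bitReader)
print(header.flag, String(header.length, radix: 16)) // 1 1234
```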

Alternatives considered

Add a Self constraint requirement to the BitReader protocol

public protocol BitReader where Self: ByteReader {
    // ...
}

Ideally, this should express the idea that only something that is a ByteReader (i.e. something that subclasses it) can conform to the BitReader protocol. Additionally, it naturally allows removing all of ByteReader's methods and properties from the BitReader protocol. Unfortunately, this language feature wasn't implemented until Swift 5 (see [1], [2]). Since BitByteData 2.0 isn't going to support versions before Swift 5.0, we could go this way instead, but the protocol-based solution described above feels more appropriate in Swift.

Note: See one of the comments below for the update on the current state of this approach with Swift 5.0.

Use ByteReaderProtocol as a name for the new protocol

The proposed solution is probably the most obvious one. The only hard part in it is, as always, naming: what should the protocol be called if we already have a class named ByteReader? Following the example of the Swift Standard Library and its IteratorProtocol, we could name our protocol in a similar manner: ByteReaderProtocol. While I haven't tested it, I have a feeling that this name would require more source code changes from BitByteData's users. The reason for this feeling is that I expect the most common usage scenario to be initializing a byte (bit) reader once and then passing it as an argument to other functions:

func f(_ reader: ByteReader) {
    // ...
}

let reader = LsbBitReader(data: data)
f(reader)

Since L/MsbBitReader would no longer be subclasses of ByteReader, users would be required to change the function declaration to:

func f(_ reader: ByteReaderProtocol) {
    //...
}

And if there are a lot of functions with ByteReader arguments, this could quickly get out of control. Additionally, if we were to introduce a new byte reader for the Big Endian byte order (as proposed), we would likely still need to rename the ByteReader class for symmetry with the new Big Endian byte reader, which makes this alternative even worse.

Use Le/BeByteReader as names for the new and renamed classes

I don't think that these names are much better than the full spelling of Little and Big Endian. In practice, as stated above, I don't expect the class names of byte readers to be used extensively, so I don't think that the tradeoff between clarity and brevity is favorable in this case. In contrast, the abbreviations were used for L/MsbBitReader because their full names would be ridiculously long: LeastSignificantBit and MostSignificantBit, respectively.

Add Big-Endian alternatives for L/MsbBitReader

As stated above, both L/MsbBitReader are going to explicitly support only the Little Endian byte order. In theory, we could add classes with names like LsbBigEndianBitReader (LsbBeBitReader?) to support the Big Endian byte order, but this would be an extremely narrow feature for the price of an increased project size (two more classes). Alternatively, we could add a new byte order enum, add a byte order argument to the initializers, and then switch on the byte order at runtime. Unfortunately, this would be detrimental to overall performance, and thus it is not really considered.

Don't mark classes as final

This would allow creating new subclasses of byte and bit readers, but honestly, I can't imagine any use case for this. On the other hand, marking classes as final enables some compiler optimizations.

Add (more) means of conversion between readers

Currently, there are converting initializers from ByteReader to L/MsbBitReader. In the 2.0 update I would like to add a couple more: from L/MsbBitReader to the byte reader(s). While I can't imagine any use cases for these conversions, with the restructuring plan proposed above these initializers would be extremely easy to implement and could potentially even be implemented as extension methods of the corresponding protocols.
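Because the proposed ByteReader protocol requires init(data:), such a conversion could plausibly be written once as a protocol extension. The sketch below is hypothetical. It uses its own trimmed-down protocol, [UInt8] instead of Data, and a converted(to:) method whose name is made up for illustration. A real bit reader would additionally need to be aligned before converting.

```swift
// Hypothetical trimmed-down protocol; `init(bytes:)` plays the role of `init(data:)`.
protocol MiniByteReader: AnyObject {
    var bytes: [UInt8] { get }
    var offset: Int { get set }
    init(bytes: [UInt8])
    func uint16() -> UInt16
}

extension MiniByteReader {
    /// Creates a reader of another type over the same bytes, positioned at the
    /// current offset. Written once, it works for every conforming type.
    func converted<R: MiniByteReader>(to type: R.Type) -> R {
        let other = R(bytes: bytes)
        other.offset = offset
        return other
    }
}

final class MiniLittleEndianReader: MiniByteReader {
    let bytes: [UInt8]
    var offset = 0
    init(bytes: [UInt8]) { self.bytes = bytes }

    func uint16() -> UInt16 {
        defer { offset += 2 }
        return UInt16(bytes[offset + 1]) << 8 | UInt16(bytes[offset])
    }
}

final class MiniBigEndianReader: MiniByteReader {
    let bytes: [UInt8]
    var offset = 0
    init(bytes: [UInt8]) { self.bytes = bytes }

    func uint16() -> UInt16 {
        defer { offset += 2 }
        return UInt16(bytes[offset]) << 8 | UInt16(bytes[offset + 1])
    }
}

let little = MiniLittleEndianReader(bytes: [0x01, 0x02, 0x03, 0x04])
_ = little.uint16()
let big = little.converted(to: MiniBigEndianReader.self)
print(String(big.uint16(), radix: 16)) // 304
```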

Add missing methods to protocols

In the 1.x updates several new methods were introduced to both readers and writers. For reasons of backward compatibility and SemVer, these new methods haven't been added to any of the protocols. The major 2.0 update is the perfect opportunity to introduce them as new protocol requirements. It is also possible that some of the new and old protocol methods could even be provided with default implementations, but I haven't assessed that yet.

List of currently planned additions to the protocols:

  1. BitReader.advance(by:)
  2. BitWriter.write(unsignedNumber:bitsCount:)
  3. Anything else that is added in 2.0 update

Remove no-argument versions of uintX(fromBytes:) methods

Currently, there are two versions of the methods for reading bytes into an unsigned integer: one with an argument, which specifies the number of bytes to read, and one without it, which reads as many bytes as the unsigned integer type can store. Both exist because the no-argument version can be implemented more efficiently. This situation is a bit confusing, and I would like to improve it in the 2.0 update.

There are two potential ways of improvement:

  1. Remove the no-argument versions and add natural default argument values to the remaining ones:
func uint16(fromBytes count: Int = 2) -> UInt16

The problem here is that we still need the performance of no-argument functions to be accessible.

  2. We could instead remove the versions with the argument and let users split big integers into smaller ones themselves. There would be some awkwardness, though, when reading big uints at the tail of the data: a user would probably have to read them as signed integers and then manually convert them into the required uint type. But maybe this is a good thing, since Int is supposed to be the most widely used integer type in Swift.

After additional investigation it's become clear that both versions are useful and should be left in as there is no real benefit to be gained from the removal of one of them. In the ByteReader protocol only the versions with arguments will be included, since they are more general and the existence of no-argument versions is only due to the implementation reasons. If it becomes possible to make with-argument versions as efficient, the no-argument versions can instead be made to call the with-argument ones, and in the future they can be more or less easily removed.
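The following is a hypothetical sketch of that arrangement. MiniReader and its [UInt8] storage are illustrative and are not BitByteData's code. The general with-argument method is the one a protocol would require, and the fixed-size method simply forwards to it. According to the text above, the real no-argument versions exist precisely because they can be implemented more efficiently than this kind of forwarding.

```swift
/// Hypothetical reader, not BitByteData's LittleEndianByteReader.
final class MiniReader {
    let bytes: [UInt8]
    var offset = 0
    init(bytes: [UInt8]) { self.bytes = bytes }

    /// General version: reads `count` (0...4) bytes in Little Endian order.
    func uint32(fromBytes count: Int) -> UInt32 {
        precondition((0...4).contains(count), "count must be in 0...4")
        precondition(bytes.count - offset >= count, "not enough bytes")
        var result: UInt32 = 0
        for i in 0..<count {
            result |= UInt32(bytes[offset + i]) << (8 * i)
        }
        offset += count
        return result
    }

    /// Fixed-size version: always reads 4 bytes. Here it simply forwards to the
    /// general version; a dedicated implementation could avoid the loop.
    func uint32() -> UInt32 {
        return uint32(fromBytes: 4)
    }
}

let miniReader = MiniReader(bytes: [0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07])
print(String(miniReader.uint32(fromBytes: 3), radix: 16)) // 30201
print(String(miniReader.uint32(), radix: 16))             // 7060504
```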

Check if a bit reader is aligned in the willSet observer of the offset property

The current philosophy for byte reading in bit readers is that it is only possible to read bytes (and, probably, the only case when it makes sense) when a bit reader is "aligned" to the byte boundary. This is a carefully maintained invariant in all methods of both L/MsbBitReader, but not in the offset's setter. In other words, it is currently possible to change L/MsbBitReader.offset property when said bit reader is not aligned. This can lead to surprising and unpredictable behavior, and I am going to prevent this by adding a precondition to the willSet observer of the offset property.

Additionally, it seems like having two property observers (both willSet and didSet) instead of only one (didSet in our case) actually improves performance. The reason for this is still unknown to me, but it's a very good argument in support of adding the willSet observer.
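The following is a minimal sketch of such a check. It assumes a hypothetical LSB 0 bit reader that tracks its position inside the current byte with a bitMask, in the spirit of the internal bitMask property mentioned under "Combine bitMask and offset properties of bit readers". Note that bit() resets the mask before advancing offset, so its own offset change happens while the reader is aligned and passes the check.

```swift
/// Hypothetical LSB 0 bit reader, not BitByteData's LsbBitReader.
final class SketchLsbBitReader {
    let bytes: [UInt8]
    /// Selects the next bit inside bytes[offset]; 1 means the reader is aligned.
    private var bitMask: UInt8 = 1

    var offset: Int = 0 {
        willSet {
            // Moving the byte offset in the middle of a byte would silently
            // skip or repeat bits, so it is only allowed when aligned.
            precondition(isAligned, "offset can only be changed when the reader is aligned")
        }
    }

    var isAligned: Bool { bitMask == 1 }

    init(bytes: [UInt8]) { self.bytes = bytes }

    func bit() -> UInt8 {
        let result: UInt8 = (bytes[offset] & bitMask) != 0 ? 1 : 0
        if bitMask == 0x80 {
            bitMask = 1   // become aligned first...
            offset += 1   // ...so that this change passes the willSet check
        } else {
            bitMask <<= 1
        }
        return result
    }
}

let bits = SketchLsbBitReader(bytes: [0b0000_0101, 0xFF])
print(bits.bit(), bits.bit(), bits.bit()) // 1 0 1
print(bits.isAligned) // false: setting `offset` now would trip the precondition
```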

Add methods for reading and writing bits of a negative number

The addition of the L/MsbBitWriter.write(unsignedNumber:bitsCount:) methods made me realize that there is currently no way to correctly read and write negative signed integers from and to bits and bytes. Moreover, the current behavior of the L/MsbBitWriter.write(number:bitsCount:) and L/MsbBitReader.int(fromBits:) functions is unpredictable and unreliable. It also depends, in a very subtle way, on the platform-specific bit width of the Int type and on the values of the arguments. To resolve these problems, I would like to make several changes.

  1. Add a new enum, SignedNumberRepresentation, which will allow choosing an encoding for a signed integer (see the Wikipedia article). Proposed cases of the new enum:
public enum SignedNumberRepresentation {
    case signMagnitude
    case oneComplementNegatives
    case twoComplementNegatives
    case biased(bias: Int)
    case radixNegativeTwo
}

Alternatively, one can name 1's and 2's complement cases as just oneComplement and twoComplement correspondingly. We believe that the current naming is better, since it emphasizes the fact that only negative numbers are encoded as the complements, and positive numbers are encoded as is.

  2. Add instance methods to this new enum that allow querying the minimum and maximum numbers that can be represented by a given representation using the specified number of bits:
public enum SignedNumberRepresentation {
    // ...

    public func minRepresentableNumber(bitsCount: Int) -> Int { ... }

    public func maxRepresentableNumber(bitsCount: Int) -> Int { ... }
}
  3. Add methods to bit readers and writers that allow reading and writing signed integers using the specified representation:
// BitWriter:
func write(signedNumber: Int, bitsCount: Int, representation: SignedNumberRepresentation = .twoComplementNegatives)

// BitReader:
func signedInt(fromBits count: Int, representation: SignedNumberRepresentation = .twoComplementNegatives) -> Int

The default value of the representation argument is .twoComplementNegatives, since it is the most common way to encode signed integers (it is even used internally by Swift).

  4. Modify the behavior of the existing functions as follows:
// BitWriter:
func write(number: Int, bitsCount: Int) {
    self.write(unsignedNumber: UInt(bitPattern: number), bitsCount: bitsCount)
}

// BitReader:
func int(fromBits count: Int) -> Int {
    if MemoryLayout<Int>.size == 8 {
        return Int(truncatingIfNeeded: self.uint64(fromBits: count))
    } else if MemoryLayout<Int>.size == 4 {
        return Int(truncatingIfNeeded: self.uint32(fromBits: count))
    } else {
        fatalError("Unknown Int bit width")
    }
}

This will partially retain the current behavior of these functions, which turns out to be useful in some cases. These implementations will handle positive integers in the same way as before and will provide consistent behavior for negative integers. That said, the usage of these functions will be discouraged (via documentation), since they perform transformations that essentially lose data (the sign of a negative integer).
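To make the proposed enum concrete, the following hypothetical sketch shows how each representation could map a number to n bits and back, together with the minimum and maximum representable values. SketchSignedRepresentation and its encode/decode helpers are not BitByteData's code. To keep the arithmetic free of overflow concerns, the sketch restricts bitsCount to 2...32 and assumes a 64-bit Int.

```swift
/// Hypothetical sketch modelled on the proposed SignedNumberRepresentation enum.
/// Assumes a 64-bit Int and supports only 2...32 bits.
enum SketchSignedRepresentation {
    case signMagnitude
    case oneComplementNegatives
    case twoComplementNegatives
    case biased(bias: Int)
    case radixNegativeTwo

    func minRepresentableNumber(bitsCount: Int) -> Int { bounds(bitsCount).min }
    func maxRepresentableNumber(bitsCount: Int) -> Int { bounds(bitsCount).max }

    private func bounds(_ n: Int) -> (min: Int, max: Int) {
        precondition((2...32).contains(n), "unsupported bitsCount")
        let half = 1 << (n - 1)
        switch self {
        case .signMagnitude, .oneComplementNegatives:
            return (1 - half, half - 1)          // zero has two encodings
        case .twoComplementNegatives:
            return (-half, half - 1)
        case .biased(let bias):
            return (-bias, (1 << n) - 1 - bias)  // the stored value is number + bias
        case .radixNegativeTwo:
            // Digit i has weight (-2)^i: even positions add, odd ones subtract.
            var low = 0, high = 0, weight = 1
            for _ in 0..<n {
                if weight > 0 { high += weight } else { low += weight }
                weight *= -2
            }
            return (low, high)
        }
    }

    /// Encodes `number` into the lowest `n` bits of the result.
    func encode(_ number: Int, bitsCount n: Int) -> UInt {
        let (low, high) = bounds(n)
        precondition((low...high).contains(number), "number is not representable")
        let mask = (UInt(1) << n) - 1
        switch self {
        case .signMagnitude:
            return number >= 0 ? UInt(number) : (UInt(1) << (n - 1)) | UInt(-number)
        case .oneComplementNegatives:
            return number >= 0 ? UInt(number) : ~UInt(-number) & mask
        case .twoComplementNegatives:
            return UInt(bitPattern: number) & mask
        case .biased(let bias):
            return UInt(number + bias)
        case .radixNegativeTwo:
            var value = number, result: UInt = 0, position = 0
            while value != 0 {
                var digit = value % -2   // Swift's % keeps the sign of `value`
                value /= -2
                if digit < 0 { digit += 2; value += 1 }
                result |= UInt(digit) << position
                position += 1
            }
            return result
        }
    }

    /// Decodes the lowest `n` bits of `bits` back into a number.
    func decode(_ bits: UInt, bitsCount n: Int) -> Int {
        precondition((2...32).contains(n), "unsupported bitsCount")
        let mask = (UInt(1) << n) - 1
        let value = bits & mask                   // ignore anything above n bits
        let signBit = UInt(1) << (n - 1)
        let isNegative = (value & signBit) != 0
        switch self {
        case .signMagnitude:
            let magnitude = Int(value & (signBit - 1))
            return isNegative ? -magnitude : magnitude
        case .oneComplementNegatives:
            return isNegative ? -Int(~value & mask) : Int(value)
        case .twoComplementNegatives:
            return isNegative ? Int(value) - (1 << n) : Int(value)
        case .biased(let bias):
            return Int(value) - bias
        case .radixNegativeTwo:
            var result = 0, weight = 1
            for i in 0..<n {
                if (value >> i) & 1 == 1 { result += weight }
                weight *= -2
            }
            return result
        }
    }
}

let rep = SketchSignedRepresentation.twoComplementNegatives
let encoded = rep.encode(-3, bitsCount: 4)
print(encoded, rep.decode(encoded, bitsCount: 4)) // 13 -3
```

For example, in four bits -3 becomes 1101 in two's complement, 1100 in one's complement, 1011 in sign-magnitude, 0100 with a bias of 7, and, coincidentally, 1101 in base -2 (1 + 4 - 8).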

Technically speaking, byte readers have the same potential problem with reading negative integers. However, reading negative integers correctly requires insight into the bits that compose the bytes, which byte readers by definition cannot provide. The best that can be done is to streamline the implementation of int(fromBytes:) similarly to the new implementation of int(fromBits:) and to update the documentation accordingly to prevent any misunderstanding. The documentation should refer to the signedInt functions as the proper way to read negative integers.
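The following small illustration shows why a byte reader alone cannot recover the sign. It uses hypothetical helper functions that are not BitByteData's API. The same two bytes are 65534 when read as an unsigned value, and become -2 only once the caller states that they hold a 16-bit two's complement number.

```swift
/// Hypothetical helper: interprets up to 8 bytes as a Little Endian unsigned value.
func littleEndianValue(_ bytes: [UInt8]) -> UInt64 {
    precondition(bytes.count <= 8, "at most 8 bytes fit into UInt64")
    var result: UInt64 = 0
    for (i, byte) in bytes.enumerated() {
        result |= UInt64(byte) << (8 * i)
    }
    return result
}

/// Hypothetical helper: sign-extends a two's complement value that occupies
/// the lowest `byteCount` bytes (1...8) of `raw`.
func signExtended(_ raw: UInt64, byteCount: Int) -> Int64 {
    precondition((1...8).contains(byteCount), "byteCount must be in 1...8")
    let shift = 64 - 8 * byteCount
    // Move the value's top bit into bit 63, then shift back arithmetically.
    return Int64(bitPattern: raw << shift) >> shift
}

let raw = littleEndianValue([0xFE, 0xFF])
print(raw)                             // 65534
print(signExtended(raw, byteCount: 2)) // -2
```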

Other crazy ideas

This is the list of ideas that came to my mind, but they are either too breaking or I haven't assessed them at all. For these reasons they are very unlikely to be implemented in 2.0 update (or at all, FWIW), but I still decided to briefly mention them for completeness.

Make offset property zero-based

Currently, the offset property of all the readers mirrors Data's behavior: it is not necessarily zero-based. This "feature" is somewhat inconvenient from the implementation point of view. We could change it to be zero-based, but this would be an extremely breaking change, and in a very subtle way. On the other hand, the current behavior is consistent with the behavior of Data, which may be a good thing.

Remove ByteReader.offset property

In the Swift world, indices in general aren't necessarily zero-based. This leads to a problem: the value of a reader's offset is somewhat useless without knowing its starting value.

Combine bitMask and offset properties of bit readers

It is possible to add another property that represents a continuous bit offset and compute the byte offset from it (via / 8). With this change, the internal bitMask property could be eliminated, and we wouldn't have to check and update the byte offset on every bit read. Unfortunately, there is a problem: the maximum byte offset in this case would be Int.max / 8, which is less than what is currently possible (Int.max).
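The following is a minimal sketch of this idea. BitPosition is a hypothetical type, and the sketch assumes a non-negative bit offset. The byte offset, the position inside the byte, and an LSB 0 mask (the kind of value a separate bitMask property would otherwise hold) are all derived from the single counter.

```swift
/// Hypothetical: one continuous bit offset replaces the pair (offset, bitMask).
struct BitPosition {
    var bitOffset: Int

    /// Byte that contains the next bit.
    var byteOffset: Int { bitOffset / 8 }
    /// Index of the next bit inside that byte (0...7).
    var bitInByte: Int { bitOffset % 8 }
    /// For an LSB 0 reader, the mask that selects the next bit.
    var lsbMask: UInt8 { UInt8(1) << bitInByte }
    var isAligned: Bool { bitInByte == 0 }
}

// The trade-off: byte offsets beyond Int.max / 8 can no longer be expressed.
let largestByteOffset = Int.max / 8

var position = BitPosition(bitOffset: 0)
position.bitOffset += 19
print(position.byteOffset, position.bitInByte, position.isAligned) // 2 3 false
```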

More ideas for BitByteData's architecture

Inclusion instead of inheritance

One other approach to the design of BitByteData is to use inclusion instead of inheritance to connect bit and byte readers. I haven't assessed this idea at all, and with the changes proposed above it is very unlikely ever to be implemented, but it would still be an interesting thought experiment.

ByteReader as Data subclass

One could try to make ByteReader a subclass of Data. The only problem here is that Data is a struct, and it is impossible to subclass structs in Swift. Alternatively, one could try to subclass NSData instead, but this raises another set of completely unexplored questions (what about NSMutableData? what is the future of NSData? what about its Objective-C legacy? what about the relationship between Linux and NSData?).

Make ByteReader conform to Sequence/IteratorProtocol

It seems that byte and bit readers do have certain features of a Sequence (or perhaps of some other stdlib protocol). In theory, it may be useful to conform the readers to these protocols, or even to make BitByteData's protocols extend them. This idea is actually one of the least crazy ones and is likely to be considered at some point in the (distant?) future.
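As a rough illustration, a hypothetical byte reader (not BitByteData's) can conform to IteratorProtocol by returning nil once it is finished, and then gets Sequence conformance for free. Because the reader is a class, iterating over it also advances its shared offset. That behavior would need careful thought before adopting this idea.

```swift
/// Hypothetical byte reader conforming to IteratorProtocol and Sequence.
final class IterableByteReader: IteratorProtocol, Sequence {
    let bytes: [UInt8]
    var offset = 0
    init(bytes: [UInt8]) { self.bytes = bytes }

    /// Returns the next byte, or nil once all bytes have been consumed.
    func next() -> UInt8? {
        guard offset < bytes.count else { return nil }
        defer { offset += 1 }
        return bytes[offset]
    }
}

let iterable = IterableByteReader(bytes: [0x01, 0x02, 0x03, 0x04])
_ = iterable.next()
// Sequence algorithms now work and consume the remaining bytes.
let total = iterable.reduce(0) { $0 + Int($1) }
print(total, iterable.offset) // 9 4
```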

Change offset property's type to UInt64

In theory, there is a problem: some extremely big Data instances can lead to undesirable behavior, since offset is of type Int and Int.max < UInt64.max. Luckily, such big Data instances are very rare in practice (I haven't seen any). The other problem is that Data.Index == Int. Overall, this is a problem to tackle in the very, very distant future.

uint32() -> UInt32 crash due to range overflow on v.2.0.1

Not sure if this is necessary, but I got this crash on v.2.0.1:

Thread 0
#0	(null) in Data._Representation.subscript.getter ()
#1	(null) in Data.subscript.getter ()
#2	0x0000000102e30ee0 in LittleEndianByteReader.uint32() at /SourcePackages/checkouts/BitByteData/Sources/LittleEndianByteReader.swift:84
#3	0x0000000102e24fe8 in specialized static ZipContainer.infoWithHelper(_:) at /SourcePackages/checkouts/SWCompression/Sources/ZIP/ZipContainer.swift:172
#4	(null) in static ZipContainer.infoWithHelper(_:) ()
#5	0x0000000102e25f00 in specialized static ZipContainer.info(container:) at /SourcePackages/checkouts/SWCompression/Sources/ZIP/ZipContainer.swift:140
#6	(null) in static ZipContainer.info(container:) ()
#7	0x0000000102ab9b94 in specialized processZipToDb(_:) <<--My project entry point

As I understand, public func uint32() -> UInt32 had not enough bytes to read and crashed, and it lacked available range checks.

Xcode 11 support?

Hi ,

Will there be a release that supports Xcode 11? Right now I'm pointing my Cartfile in SWCompression to "develop" just to get the latest codebase that was built with Swift 5.x.

Thanks!

Replace `precondition` with throwing

First of all, thank you for such a cool library. It saves me lots of work!

What I want to propose is to replace the precondition statements in the code with simply marking the methods as throws:

  • In production, we rarely want to crash accidentally.
  • It can be pretty tedious to check the remaining bit size every time before performing a read.
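The following is a minimal, hypothetical sketch of the throwing style proposed here. The SketchReaderError type and ThrowingByteReader class are made up for illustration. With this style, a short read like the uint32() crash in the previous report becomes an error that the caller can handle. The tradeoff is an extra bounds check on every read, which may matter given the project's focus on performance.

```swift
/// Hypothetical error type and reader, not BitByteData's API.
enum SketchReaderError: Error {
    case notEnoughBytes(needed: Int, available: Int)
}

final class ThrowingByteReader {
    let bytes: [UInt8]
    private(set) var offset = 0
    init(bytes: [UInt8]) { self.bytes = bytes }

    var bytesLeft: Int { bytes.count - offset }

    /// Reads a Little Endian UInt32, throwing instead of trapping
    /// when fewer than 4 bytes remain.
    func uint32() throws -> UInt32 {
        guard bytesLeft >= 4 else {
            throw SketchReaderError.notEnoughBytes(needed: 4, available: bytesLeft)
        }
        var result: UInt32 = 0
        for i in 0..<4 {
            result |= UInt32(bytes[offset + i]) << (8 * i)
        }
        offset += 4
        return result
    }
}

let truncated = ThrowingByteReader(bytes: [0x50, 0x4B, 0x03])
do {
    _ = try truncated.uint32()
} catch {
    print("Could not read a UInt32: \(error)")
}
```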
