Hi. I'm spox and this is a modified version of bindata to
work on Ruby 1.9. It is not thoroughly tested, and I am sure
there are still incompatibility bugs hiding out in the
source. It has been modified just enough to work with
my modified version of packetfu (http://github.com/spox/packetfu-spox)
on 1.9.

  git clone git://github.com/spox/bindata-spox
  cd bindata-spox
  gem build bindata.gemspec
  gem install bindata-spox-x.x.x.gem

= BinData

A declarative way to read and write structured binary data.

== What is it for?

Do you ever find yourself writing code like this?

  io = File.open(...)
  len = io.read(2).unpack("v").first
  name = io.read(len)
  width, height = io.read(8).unpack("VV")
  puts "Rectangle #{name} is #{width} x #{height}"

It's ugly, violates DRY and feels like you're writing Perl, not Ruby.
There is a better way.

  class Rectangle < BinData::MultiValue
    uint16le :len
    string   :name, :read_length => :len
    uint32le :width
    uint32le :height
  end

  io = File.open(...)
  r = Rectangle.read(io)
  puts "Rectangle #{r.name} is #{r.width} x #{r.height}"

BinData makes it easy to specify the structure of the data you are
manipulating.
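To try the hand-written version without a real file, a StringIO holding sample bytes can stand in for the File.  This is only a sketch: the +read_rectangle+ helper and the sample values are made up for illustration, and the byte layout is assumed from the Rectangle declaration above.

```ruby
require "stringio"

# Hypothetical helper: parses one rectangle record by hand.
# Assumed layout: 16-bit little-endian length, the name bytes,
# then two 32-bit little-endian unsigned integers.
def read_rectangle(io)
  len = io.read(2).unpack("v").first   # unpack returns an Array
  name = io.read(len)
  width, height = io.read(8).unpack("VV")
  [name, width, height]
end

# Build sample bytes in the same layout (made-up values).
sample = "box"
data = [sample.bytesize, sample, 640, 480].pack("va*VV")

name, width, height = read_rectangle(StringIO.new(data))
puts "Rectangle #{name} is #{width} x #{height}"
```

Every field is described twice here, once when packing and once when unpacking, which is the repetition the declaration removes.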

Read on for the tutorial, or go straight to the
download[http://rubyforge.org/frs/?group_id=3252] page.

== Syntax

BinData declarations are easy to read.  Here's an example.

  class MyFancyFormat < BinData::MultiValue
    stringz :comment
    uint8   :count, :check_value => lambda { (value % 2) == 0 }
    array   :some_ints, :type => :int32be, :initial_length => :count
  end

The structure of the data in this example is:
1. A zero terminated string
2. An unsigned 8bit integer which must be even
3. A sequence of signed 32bit integers in big endian form, the total
   number of which is determined by the value of the 8bit integer.

The BinData declaration matches the English description closely.  Just for
fun, let's look at how we'd implement this using #pack and #unpack.  Here's
the writing code; have a go at the reading code.

  comment = "this is a comment"
  some_ints = [2, 3, 8, 9, 1, 8]
  File.open(...) do |io|
    io.write([comment, some_ints.size, *some_ints].pack("Z*CN*"))
  end
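For comparison, here is one possible take on the reading side in plain Ruby.  It is a sketch rather than the only way to do it: the +read_fancy_format+ name is made up, a StringIO stands in for the file, and "N*" is used to mirror the writing code above (it decodes unsigned values, so negative numbers would need a signed directive instead).

```ruby
require "stringio"

# Hypothetical helper: reads what pack("Z*CN*") wrote, namely a
# zero-terminated comment, a one-byte count, then that many
# 32-bit big-endian integers.
def read_fancy_format(io)
  comment = io.gets("\0").chomp("\0")      # read up to and drop the NUL
  count = io.read(1).unpack("C").first
  raise "count must be even" unless count.even?  # mirrors :check_value
  some_ints = io.read(4 * count).unpack("N*")
  [comment, some_ints]
end

data = ["this is a comment", 6, 2, 3, 8, 9, 1, 8].pack("Z*CN*")
comment, some_ints = read_fancy_format(StringIO.new(data))
p comment    # => "this is a comment"
p some_ints  # => [2, 3, 8, 9, 1, 8]
```

The reading code has to restate the terminator, the count and the integer width that the format string already implied, and nothing keeps the two halves in sync.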


The general format of a BinData declaration is a class containing one or more
fields.

  class MyName < BinData::MultiValue
    type field_name, :param1 => "foo", :param2 => bar, ...
    ...
  end

*type* is the name of a supplied type (e.g. <tt>uint32be</tt>,  +string+)
or a user defined type.  For user defined types, convert the class name
from CamelCase to lowercase underscore_style.

*field_name* is the name by which you can access the data.  Use either a
String or a Symbol.  You may specify a name as nil, but this is described
later in the tutorial.

Each field may have *parameters* for how to process the data.  The
parameters are passed as a Hash using Symbols for keys.

== Handling dependencies between fields

A common occurrence in binary file formats is one field depending upon the
value of another, e.g. a string preceded by its length.

As an example, let's assume a Pascal style string where the byte preceding
the string contains the string's length.

  # reading
  io = File.open(...)
  len = io.getbyte   # on Ruby 1.9, getc returns a String, not an Integer
  str = io.read(len)
  puts "string is " + str

  # writing
  io = File.open(...)
  str = "this is a string"
  io.putc(str.length)
  io.write(str)

Here's how we'd implement the same example with BinData.

  class PascalString < BinData::MultiValue
    uint8  :len,  :value => lambda { data.length }
    string :data, :read_length => :len
  end

  # reading
  io = File.open(...)
  ps = PascalString.new
  ps.read(io)
  puts "string is " + ps.data

  # writing
  io = File.open(...)
  ps = PascalString.new
  ps.data = "this is a string"
  ps.write(io)

This syntax needs explaining.  Let's simplify by examining reading and
writing separately.

  class PascalStringReader < BinData::MultiValue
    uint8  :len
    string :data, :read_length => :len
  end

This states that when reading the string, the initial length of the string
(and hence the number of bytes to read) is determined by the value of the
+len+ field.

Note that <tt>:read_length => :len</tt> is syntactic sugar for
<tt>:read_length => lambda { len }</tt>, but more on that later.

  class PascalStringWriter < BinData::MultiValue
    uint8  :len, :value => lambda { data.length }
    string :data
  end

This states that the value of +len+ is always equal to the length of +data+.
+len+ may not be manually modified.

Combining these two definitions gives the definition for +PascalString+ as
previously defined.
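To make the byte layout concrete, here is a plain Ruby round trip of the same format.  This is only a sketch of what the declaration describes, not how BinData works internally; the two helper names are made up, and a StringIO is assumed in place of a file.

```ruby
require "stringio"

# Hypothetical helper: writes one length byte followed by the string bytes.
def write_pascal_string(io, str)
  raise ArgumentError, "string longer than 255 bytes" if str.bytesize > 255
  io.write([str.bytesize].pack("C"))
  io.write(str)
end

# Hypothetical helper: the length byte says how many bytes to read next.
def read_pascal_string(io)
  len = io.readbyte
  io.read(len)
end

io = StringIO.new
write_pascal_string(io, "this is a string")
io.rewind
puts "string is " + read_pascal_string(io)
```

The writer derives the length from the data and the reader derives the read size from the length, which is the pair of rules that <tt>:value</tt> and <tt>:read_length</tt> express in the declaration.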

One thing to note with dependencies is that a field can only depend on a
field declared before it.  You can't have a string which has the characters
first and the length afterwards.

== Predefined Types

These are the predefined types.  Custom types can be created by composing
these types.

BinData::String::   A sequence of bytes.
BinData::Stringz::  A zero terminated sequence of bytes.

BinData::Array::    A list of objects of the same type.
BinData::Choice::   A choice between several objects.
BinData::Struct::   An ordered collection of named objects.

BinData::Int8::     Signed  8 bit integer.
BinData::Int16le::  Signed 16 bit integer (little endian).
BinData::Int16be::  Signed 16 bit integer (big endian).
BinData::Int32le::  Signed 32 bit integer (little endian).
BinData::Int32be::  Signed 32 bit integer (big endian).
BinData::Int64le::  Signed 64 bit integer (little endian).
BinData::Int64be::  Signed 64 bit integer (big endian).

BinData::Uint8::    Unsigned  8 bit integer.
BinData::Uint16le:: Unsigned 16 bit integer (little endian).
BinData::Uint16be:: Unsigned 16 bit integer (big endian).
BinData::Uint32le:: Unsigned 32 bit integer (little endian).
BinData::Uint32be:: Unsigned 32 bit integer (big endian).
BinData::Uint64le:: Unsigned 64 bit integer (little endian).
BinData::Uint64be:: Unsigned 64 bit integer (big endian).

BinData::Bit1::     1 bit unsigned integer (big endian).
BinData::Bit2::     2 bit unsigned integer (big endian).
...
BinData::Bit63::    63 bit unsigned integer (big endian).

BinData::Bit1le::   1 bit unsigned integer (little endian).
BinData::Bit2le::   2 bit unsigned integer (little endian).
...
BinData::Bit63le::  63 bit unsigned integer (little endian).

BinData::FloatLe::  Single precision floating point number (little endian).
BinData::FloatBe::  Single precision floating point number (big endian).
BinData::DoubleLe:: Double precision floating point number (little endian).
BinData::DoubleBe:: Double precision floating point number (big endian).

BinData::Rest::     Consumes the rest of the input stream.

== Parameters

  class PascalStringWriter < BinData::MultiValue
    uint8  :len, :value => lambda { data.length }
    string :data
  end

Revisiting the Pascal string writer, we see that a field can take
parameters.  Parameters are passed as a Hash, where the key is a symbol.
It should be noted that parameters are designed to be lazily evaluated,
possibly multiple times.  This means that any parameter value must not have
side effects.

Here are some examples of legal values for parameters.

  * :param => 5
  * :param => lambda { 5 + 2 }
  * :param => lambda { foo + 2 }
  * :param => :foo

The simplest case is when the value is a literal value, such as 5.

If the value is not a literal, it is expected to be a lambda.  The lambda
will be evaluated in the context of the parent; in this case the parent is
an instance of +PascalStringWriter+.

If the value is a symbol, it is taken as syntactic sugar for a lambda
containing the value of the symbol.
e.g. <tt>:param => :foo</tt> is <tt>:param => lambda { foo }</tt>
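The three kinds of value can be illustrated with a few lines of plain Ruby.  This is a sketch of the idea only and makes no claim about BinData's actual implementation; +resolve_param+ and the Struct standing in for the parent are made up for the example.

```ruby
# Hypothetical helper, not part of BinData: works out a parameter's
# value against a parent object each time it is asked.
def resolve_param(parent, value)
  case value
  when Proc   then parent.instance_exec(&value)  # lambda runs with parent as self
  when Symbol then parent.send(value)            # sugar for lambda { name }
  else value                                     # literal, used as-is
  end
end

# A stand-in parent with a single field called foo.
parent = Struct.new(:foo).new(3)

p resolve_param(parent, 5)                     # => 5
p resolve_param(parent, lambda { 5 + 2 })      # => 7
p resolve_param(parent, lambda { foo + 2 })    # => 5
p resolve_param(parent, :foo)                  # => 3

# Resolution is lazy: changing the field changes the next result.
parent.foo = 10
p resolve_param(parent, lambda { foo + 2 })    # => 12
```

Because a value may be resolved any number of times, a lambda with side effects would produce results that depend on how often it happened to be called.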

== Saving Typing

The endianness of numeric types must be explicitly defined so that the code
produced is independent of architecture.  Explicitly specifying the
endianness of each numeric type can become tedious, so the following
shortcut is provided.

  class A < BinData::MultiValue
    endian :little

    uint16   :a
    uint32   :b
    double   :c
    uint32be :d
    array    :e, :type => :int16
  end

is equivalent to:

  class A < BinData::MultiValue
    uint16le  :a
    uint32le  :b
    double_le :c
    uint32be  :d
    array     :e, :type => :int16le
  end

Using the endian keyword improves the readability of the declaration as well
as reducing the amount of typing necessary.  Note that the endian keyword will
cascade to nested types, as illustrated with the array in the above example.
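To see why the byte order has to be stated somewhere, compare the bytes produced for the same 16 bit value in each order.  The sketch below uses only Array#pack; the +encode_uint16+ helper is made up for illustration and is not a BinData method.

```ruby
# Hypothetical helper: encodes an unsigned 16-bit value in the given
# byte order using Array#pack ("v" = little endian, "n" = big endian).
def encode_uint16(value, endian)
  directive = (endian == :little) ? "v" : "n"
  [value].pack(directive)
end

p encode_uint16(0x0102, :little).bytes  # => [2, 1]
p encode_uint16(0x0102, :big).bytes     # => [1, 2]
```

The same number gives two different byte sequences, so a declaration that left the order to the host machine would read and write different files on different architectures.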

== Creating custom types

Custom types should be created by subclassing BinData::MultiValue or
BinData::SingleValue.  Occasionally it may be useful to subclass
BinData::Single.  Subclassing other classes may have unexpected results
and is unsupported.

Let us revisit the Pascal String example.

  class PascalString < BinData::MultiValue
    uint8  :len,  :value => lambda { data.length }
    string :data, :read_length => :len
  end

We'd like to make PascalString a custom type that behaves like a
BinData::Single object so we can use :initial_value etc.  Here's an
example usage of what we'd like:

  class Favourites < BinData::MultiValue
    pascal_string :language, :initial_value => "ruby"
    pascal_string :os,       :initial_value => "unix"
  end

  f = Favourites.new
  f.os = "freebsd"
  f.to_s #=> "\004ruby\007freebsd"

We create this type of custom string by inheriting from BinData::SingleValue
and implementing the #get and #set methods.

  class PascalString < BinData::SingleValue
    uint8  :len,  :value => lambda { data.length }
    string :data, :read_length => :len

    def get;   self.data; end
    def set(v) self.data = v; end
  end

If the type we are creating represents a single value then inherit from
BinData::SingleValue, otherwise inherit from BinData::MultiValue.

== License

BinData is released under the same license as Ruby.

Copyright (c) 2007, 2008 Dion Mendel
