
Kaitai Struct: runtime for Awkward Arrays

License: MIT License



Kaitai Struct: runtime library for Awkward

This library builds Awkward Arrays from binary data described with Kaitai Struct. It is a Kaitai Struct runtime for the Awkward target, written in C++/STL.

Steps

1. Write a .ksy file for your custom file format. Refer to the Kaitai User Guide for more details.

Here, we use animal.ksy as an example:

meta:
  id: animal
  endian: le
  license: CC0-1.0
  ks-version: 0.8

seq:
  - id: entry
    type: animal_entry
    repeat: eos

types:
  animal_entry:
    seq:
      - id: str_len
        type: u1

      - id: species
        type: str
        size: str_len
        encoding: UTF-8

      - id: age
        type: u1

      - id: weight
        type: u2
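
To make the byte layout described by animal.ksy concrete, here is a minimal sketch in plain Python that writes and reads entries by hand: a u1 length prefix, a UTF-8 string of that length, a u1 age, and a little-endian u2 weight, repeated until end of stream. The helper names `pack_animal` and `unpack_animals` are hypothetical and are not part of this library. The sample values are taken from the output shown later in this README.

```python
import struct


def pack_animal(species: str, age: int, weight: int) -> bytes:
    """Serialize one animal_entry: u1 length, UTF-8 name, u1 age, u2 weight (little-endian)."""
    name = species.encode("utf-8")
    if len(name) > 255:
        raise ValueError("species must fit in a u1 length prefix")
    return struct.pack("<B", len(name)) + name + struct.pack("<BH", age, weight)


def unpack_animals(data: bytes) -> list:
    """Parse entries until the end of the buffer, mirroring `repeat: eos`."""
    entries, pos = [], 0
    while pos < len(data):
        str_len = data[pos]
        pos += 1
        species = data[pos:pos + str_len].decode("utf-8")
        pos += str_len
        age, weight = struct.unpack_from("<BH", data, pos)
        pos += 3  # one byte for age, two bytes for weight
        entries.append(
            {"str_len": str_len, "species": species, "age": age, "weight": weight}
        )
    return entries


raw = b"".join(
    pack_animal(*animal)
    for animal in [("cat", 5, 12), ("dog", 3, 43), ("turtle", 10, 5)]
)
print(unpack_animals(raw))
```

A hand-written parser like this is only useful for checking a small schema. The generated C++ code does the same work for arbitrary .ksy files.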

2. Clone the kaitai_struct_awkward_runtime repository:

git clone --recursive https://github.com/det-lab/kaitai_struct_awkward_runtime.git
cd kaitai_struct_awkward_runtime
git checkout ManasviGoyal/test

3. Update the submodule and compile the Scala code (only needed the first time):

git submodule update --init
cd kaitai_struct_compiler
sbt package
cd ../
chmod u+x kaitai-struct-compiler

4. Generate the source and header files for the Awkward target:

./kaitai-struct-compiler -t awkward --outdir src-animal example_data/schemas/animal.ksy

5. Install the library

pip install awkward-kaitai

6. Build awkward-kaitai by passing the path of the main .cpp file from the generated code.

If pip warned during installation that the install location is not on your PATH, you may first need to add it (then restart the shell or source ~/.bashrc):

echo 'export PATH=$PATH:/whatever/path/python/says' >> ~/.bashrc

Then run the build:

awkward-kaitai-build src-animal/animal.cpp -b build

Note:

awkward-kaitai-build [-h] [-d DEST] [-b BUILD] file

options:

  • -h, --help: shows help message
  • -d DEST, --dest DEST: explicitly specify a destination for the built shared library.
  • -b BUILD, --build BUILD: explicitly specify a build location.

7. Open python and print the returned ak.Array:

python
import awkward_kaitai

animal = awkward_kaitai.Reader("./src-animal/libanimal.so") # pass the path of the shared library
awkward_array = animal.load("example_data/data/animal.raw")

awkward_array.to_list()

Output

[{'animalA__Zentry': [
    {'animal_entryA__Zstr_len': 3, 'animal_entryA__Zspecies': 'cat', 'animal_entryA__Zage': 5, 'animal_entryA__Zweight': 12},
    {'animal_entryA__Zstr_len': 3, 'animal_entryA__Zspecies': 'dog', 'animal_entryA__Zage': 3, 'animal_entryA__Zweight': 43},
    {'animal_entryA__Zstr_len': 6, 'animal_entryA__Zspecies': 'turtle', 'animal_entryA__Zage': 10, 'animal_entryA__Zweight': 5}
]}]
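
The field names in this output combine the type name and the attribute id, joined by `A__Z`. If shorter keys are more convenient, the nested lists and dicts can be post-processed in plain Python. The sketch below assumes that `A__Z` is the separator, which is inferred only from the output above, and `strip_prefix` is a hypothetical helper, not part of awkward_kaitai.

```python
def strip_prefix(value, sep="A__Z"):
    """Recursively drop everything up to and including `sep` from dict keys."""
    if isinstance(value, dict):
        return {key.split(sep, 1)[-1]: strip_prefix(item, sep) for key, item in value.items()}
    if isinstance(value, list):
        return [strip_prefix(item, sep) for item in value]
    return value  # numbers and strings are returned unchanged


# First record from the output above, as returned by to_list().
records = [
    {
        "animalA__Zentry": [
            {
                "animal_entryA__Zstr_len": 3,
                "animal_entryA__Zspecies": "cat",
                "animal_entryA__Zage": 5,
                "animal_entryA__Zweight": 12,
            }
        ]
    }
]

print(strip_prefix(records))
```

Keys that do not contain the separator are left as they are, so the helper is safe to apply to already-clean data.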

Related Papers and Talks

  1. Describe Data to get Science-Data-Ready Tooling: Awkward as a Target for Kaitai Struct YAML, Advanced Computing and Analysis Techniques for Physics Research Workshop 2024, New York, US.
  2. Awkward Target for Kaitai Struct, PyHEP Users Workshop 2023.

Contributors

manasvigoyal, agoose77, zonca, maramaraschino, dependabot[bot], pibion

kaitai_struct_awkward_runtime's Issues

`scdms_v8.ksy` does not compile

If I try to compile scdms_v8.ksy with:

java -cp kaitai_struct_compiler/jvm/target/scala-2.12/kaitai-struct-compiler_2.12-0.11-SNAPSHOT.jar:/usr/share/kaitai-struct-compiler/lib/* io.kaitai.struct.JavaMain -t awkward --outdir test_artifacts example_data/schemas/scdms_v8.ksy

I get some warnings but no error:

 example_data/schemas/scdms_v8.ksy: /seq/2/id:
        warning: use `len_odb` instead of `odb_size`, given that it's only used as a byte size of `odb` (see https://doc.kaitai.io/ksy_style_guide.html#attr-id)

example_data/schemas/scdms_v8.ksy: /types/sdu_channel_block/seq/1/id:
        warning: use `num_sdu_channel_sample` instead of `sdu_num_samples`, given that it's only used as repeat count of `sdu_channel_sample` (see https://doc.kaitai.io/ksy_style_guide.html#attr-id)

example_data/schemas/scdms_v8.ksy: /types/sdu_block/instances/num_sdu_channels/id:
        warning: use `num_sdu_channel_blk` instead of `num_sdu_channels`, given that it's only used as repeat count of `sdu_channel_blk` (see https://doc.kaitai.io/ksy_style_guide.html#attr-id)

example_data/schemas/scdms_v8.ksy: /types/midas_header/seq/7/encoding:
        warning: use canonical encoding name `UTF-8` instead of `utf-8` (see https://doc.kaitai.io/ksy_style_guide.html#encoding-name)

However, the scdms_v8.cpp file is not created.

feat: add support for `import`

The current implementation does not support ksy files which have an import (e.g. midas.ksy). Also fix the multiple definition error for C functions.

fix: `kaitai_struct_compiler` upstream issues

After merging the kaitai_struct_compiler upstream, the kaitai-struct-compiler tool fails. With the upstream changes merged, we also need to port the new changes in CppCompiler.scala to AwkwardCompiler.scala, since the latter is a modified copy of the former.

Dependency on jar files for testing purposes

@ManasviGoyal @jpivarski

I see that for testing you needed to include some jar files in the repository:

$ ls lib/
com.github.scopt.scopt_2.12-3.6.0.jar       com.lihaoyi.sourcecode_2.12-0.1.4.jar
com.lihaoyi.fastparse-utils_2.12-1.0.0.jar  org.scala-lang.scala-library-2.12.12.jar
com.lihaoyi.fastparse_2.12-1.0.0.jar        org.yaml.snakeyaml-1.28.jar

Is there a way to define those packages as requirements in the sbt build script of the compiler?

fix: `UnionBuilder` for `SwitchType`

For ksy files with SwitchType, we need a UnionBuilder. We currently have an implementation, but it only works when the UnionBuilder types are primitive. It fails for UserType.

Kaitai build error

While creating an example file to use with the instructions for the Kaitai Struct runtime library for Awkward, I wrote "ParametricParser.ksy" in a separate repository. It is meant to parse key-value pairs with lengths of 3 or 8.

Following the instructions, I replaced the command:

./kaitai-struct-compiler -t awkward --outdir src-animal example_data/schemas/animal.ksy

with:

./kaitai-struct-compiler -t awkward --outdir src-parametric ~/dataReaderWriter/kaitai/ksy/ParametricParser.ksy

Then, instead of running:

awkward-kaitai-build src-animal/animal.cpp -b build

I ran:

awkward-kaitai-build src-parametric/kv_pairs.cpp -b build

which produces two errors:

error: no matching function for call to ‘kv_pairs_t::kv_pair_t::kv_pair_t(awkward::LayoutBuilder::Record<std::map<long unsigned int, std::__cxx11::basic_string<char> >, awkward::LayoutBuilder::Field<0, awkward::LayoutBuilder::ListOffset<long int, awkward::LayoutBuilder::Numpy<unsigned char> > >, awkward::LayoutBuilder::Field<1, awkward::LayoutBuilder::ListOffset<long int, awkward::LayoutBuilder::Numpy<unsigned char> > > >&, int, kaitai::kstream*&, kv_pairs_t*, kv_pairs_t*&)’

error: no matching function for call to ‘kv_pairs_t::kv_pair_t::kv_pair_t(awkward::LayoutBuilder::Record<std::map<long unsigned int, std::__cxx11::basic_string<char> >, awkward::LayoutBuilder::Field<0, awkward::LayoutBuilder::ListOffset<long int, awkward::LayoutBuilder::Numpy<unsigned char> > >, awkward::LayoutBuilder::Field<1, awkward::LayoutBuilder::ListOffset<long int, awkward::LayoutBuilder::Numpy<unsigned char> > > >&, int, kaitai::kstream*&, kv_pairs_t*, kv_pairs_t*&)’
The two errors point to lines 36 and 45 of the generated file respectively.

test: add more tests

I think it would be good to have more tests that the user can run with:

python -m pytest tests

The tests can be added for the following ksy files:

  • fake.ksy
  • index_option.ksy
  • hello_world.ksy (need to create raw data for this)
  • numpy.ksy
  • pixie4e.ksy
  • records.ksy
  • scdms_v8.ksy
  • scdms.ksy (scdms_v_two_trigger.bin can be renamed to scdms.bin and scdms_v_two_trigger.ksy can be removed)

Create documentation on readthedocs

The current documentation lives in the README.md of the repository.

It would be good to set up readthedocs with Sphinx (like https://awkward-array.org/doc/main/getting-started/index.html), mkdocs, or similar.

We can start from README.md for the "Install" and "Getting started" pages, then add details on the examples we have in https://github.com/ManasviGoyal/kaitai_struct_awkward_runtime/tree/main/example_data.

@pibion's student can provide feedback on the docs.

Let's postpone this until we have moved the 2 repositories under an organization.

fix: improve error message in case of file not found

Currently, if a file is missing, the process aborts with an uncaught C++ I/O exception:

In [6]: import awkward_kaitai

In [7]: reader = awkward_kaitai.Reader("test_artifacts/libpixie4e.so")

In [8]: awkward_array = reader.load("example_data/data/pixie4e.raw")

terminate called after throwing an instance of 'std::__ios_failure'
  what():  basic_ios::clear: iostream error
Aborted

Not urgent, but we could catch this in Python and give a better error message.
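
One possible way to do this is to check both paths before handing them to the C++ side, so that a missing file raises an ordinary Python exception instead of aborting the interpreter. The sketch below is only an illustration: `load_checked` is a hypothetical wrapper, and the reader class is passed in as an argument so the snippet does not depend on awkward_kaitai being installed. It assumes only the `Reader(path)` and `.load(path)` calls shown in the README.

```python
from pathlib import Path


def load_checked(reader_cls, library_path, data_path):
    """Validate both paths, then delegate to reader_cls(library_path).load(data_path)."""
    for label, path in (("shared library", library_path), ("data file", data_path)):
        if not Path(path).is_file():
            raise FileNotFoundError(f"{label} not found: {path}")
    return reader_cls(str(library_path)).load(str(data_path))


# Intended usage (requires awkward_kaitai and the built library):
#   import awkward_kaitai
#   array = load_checked(
#       awkward_kaitai.Reader,
#       "test_artifacts/libpixie4e.so",
#       "example_data/data/pixie4e.raw",
#   )
```

A check like this covers only missing files. Other I/O failures inside the C++ code would still need to be translated into Python exceptions at the binding level.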

feat: add `IndexedBuilder` for `EnumType` cases

Currently we use NumpyBuilder for EnumType cases, which stores only the integer values. Eventually we want to use an IndexedBuilder so that the strings associated with the enum values are stored as well. This requires adding an IndexedBuilder to LayoutBuilder.h in awkward.
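
To illustrate the idea behind an indexed layout, here is a minimal sketch in plain Python. It is not the LayoutBuilder API: the enum values are hypothetical, and `encode_enum` and `decode_enum` are made-up helper names. Each raw enum code is replaced by a position in a shared list of labels, and codes that are not in the enum map to -1.

```python
def encode_enum(codes, enum_map):
    """Return (index, labels): one position per code, plus the shared label list."""
    keys = sorted(enum_map)
    labels = [enum_map[key] for key in keys]
    position = {key: i for i, key in enumerate(keys)}
    index = [position.get(code, -1) for code in codes]  # -1 marks an unknown code
    return index, labels


def decode_enum(index, labels):
    """Look each position up in the label list; unknown codes become None."""
    return [labels[i] if i >= 0 else None for i in index]


# Hypothetical enum, as it might appear under `enums:` in a .ksy file.
animal_kind = {4: "cat", 7: "dog", 12: "turtle"}

index, labels = encode_enum([7, 4, 7, 99], animal_kind)
print(index, labels)
print(decode_enum(index, labels))
```

The strings are stored once in `labels`, while the per-entry data stays a compact integer array, which is what makes this layout attractive compared with storing a string per entry.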

Interactive debugging of cpp files generated by kaitai

I would like to be able to run the cpp files generated by kaitai through an interactive debugger, to simplify troubleshooting but also development in general.

Any experience doing that / suggestions?

What I have tried below
