
kaitai_struct_python_runtime's Introduction

Kaitai Struct: runtime library for Python


This library implements Kaitai Struct API for Python.

Kaitai Struct is a declarative language used to describe various binary data structures laid out in files or in memory: e.g. binary file formats, network stream packet formats, etc.

It is similar to Python's Construct 2.10, but it is language-agnostic. The format description is written in the YAML-based .ksy format, which can then be compiled into a wide range of target languages.


kaitai_struct_python_runtime's People

Contributors

4m1g0, antonv6, arekbulski, athre0z, cugu, dgelessus, dgladkov, felixonmars, generalmimon, greycat, kolanich, mgorny, wbarnha


kaitai_struct_python_runtime's Issues

read->write->read raises UnsupportedOperation: read error on a file but not on a BytesIO

Hi,

I read, write, and then read a file, but it throws an error.

from kaitaistruct import KaitaiStream
from png import Png
data = Png.from_file('1x1.png')
data._read()
buffer = open('out.png', 'wb')
buffer.truncate(data._io.size())
_io = KaitaiStream(buffer)
data._write(_io)
_io.seek(0)
data._read()

throws

File ~/projects/speedier/jupyter_env/lib/python3.10/site-packages/kaitaistruct.py:392, in KaitaiStream._read_bytes_not_aligned(self, n)
    389     is_satisfiable = (n <= num_bytes_available)
    391 if is_satisfiable:
--> 392     r = self._io.read(n)
    393     num_bytes_available = len(r)
    394     is_satisfiable = (n <= num_bytes_available)

UnsupportedOperation: read

If I use a BytesIO buffer instead:

from kaitaistruct import KaitaiStream
import io
from png import Png
data = Png.from_file('1x1.png')
data._read()
buffer = io.BytesIO(bytearray(data._io.size()))
_io = KaitaiStream(buffer)
data._write(_io)
_io.seek(0)
data._read()

No error is thrown.

I am running with kaitai_struct_python_runtime at 92a2d71519ba.
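The traceback suggests the file handle itself is the problem: `open('out.png', 'wb')` returns a write-only object, so the final `_read()` cannot read from it, while BytesIO is always readable. A likely fix (a sketch of the idea, not verified against the original PNG repro) is to open the output file in `"w+b"` mode instead:

```python
import os
import tempfile

# "wb" gives a write-only handle, so a later read() raises
# io.UnsupportedOperation. "w+b" also truncates the file, but the
# handle stays readable, so read-after-write works.
path = os.path.join(tempfile.mkdtemp(), "out.bin")
with open(path, "w+b") as buffer:
    buffer.write(b"\x89PNG")  # stand-in for data._write(_io)
    buffer.seek(0)
    reread = buffer.read(4)   # re-reading now succeeds
```

With the runtime, the same `buffer` would be wrapped in `KaitaiStream(buffer)` as in the original snippet.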

KaitaiStream ( 'str' object has no attribute 'read')

According to the documentation, we can use KaitaiStream for reading data from a stream. But...


$ virtualenv  test
$ source test/bin/activate

$ pip2 install  kaitaistruct
....
Successfully installed kaitaistruct-0.7
$ pip2 install  enum34
...
Successfully installed enum34-1.1.6


$ ksc --version
kaitai-struct-compiler 0.7
$ ksc -t python  test.ksy
$ python2.7 testrun.py

Traceback (most recent call last):
  File "testrun.py", line 9, in <module>
    obj = Testparser(stream)
  File "<skip>/testparser.py", line 20, in __init__
    self.key_header = self._root.KeyHeader(self._io, self, self._root)
  File "<skip>/testparser.py", line 27, in __init__
    self.magic = self._io.ensure_fixed_contents(struct.pack('5b', 77, 65, 71, 73, 67))
  File "<skip>.local/lib/python2.7/site-packages/kaitaistruct.py", line 271, in ensure_fixed_contents
    actual = self._io.read(len(expected))
AttributeError: 'str' object has no attribute 'read'

$ cat test.ksy

meta:
  id: testparser
  endian: be
  encoding: ASCII

seq:
  - id: key_header
    type: key_header

types:
   key_header:
     seq:
       - id: magic
         contents: MAGIC
$ cat  testrun.py
__author__ = '@ret5et'

from kaitaistruct import KaitaiStream
from testparser import  Testparser

data =  "MAGIC\x00\x41\x41\x00"

stream = KaitaiStream(data)
obj = Testparser(stream)
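The error comes from passing a plain `str` where a file-like object is expected: the runtime calls `.read()` on whatever it was given. A sketch of the fix (shown without the generated parser, using plain BytesIO so it stands alone) is to wrap the payload first:

```python
from io import BytesIO

# KaitaiStream wraps a seekable file-like object, not a str.
# With the runtime installed, the call would be
# KaitaiStream(BytesIO(data)); on Python 3 the payload must also
# be a bytes literal rather than a str.
data = b"MAGIC\x00\x41\x41\x00"
stream = BytesIO(data)
magic = stream.read(5)  # the parser would consume bytes like this
```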

Unable to install 0.9 with Pipenv

Using pipenv --version
pipenv, version 2020.11.15

pipenv run python --version
Python 3.6.3

When I try to run the following command to install kaitaistruct version 0.9, I get the following error.

$pipenv run python -m pip install kaitaistruct==0.9
Collecting pip
Using cached https://nexus.fanops.net/nexus/repository/pypi-all/packages/pip/20.3.1/pip-20.3.1-py2.py3-none-any.whl
Installing collected packages: pip
Found existing installation: pip 9.0.1
Uninstalling pip-9.0.1:
Successfully uninstalled pip-9.0.1
Successfully installed pip-20.3.1
Looking in indexes: http://pypi/nexus/repository/pypi-all/simple
Collecting kaitaistruct
Using cached https://nexus.fanops.net/nexus/repository/pypi-all/packages/kaitaistruct/0.9/kaitaistruct-0.9.tar.gz (5.5 kB)
WARNING: Generating metadata for package kaitaistruct produced metadata for project name unknown. Fix your #egg=kaitaistruct fragments.
ERROR: Requested unknown from https://nexus.fanops.net/nexus/repository/pypi-all/packages/kaitaistruct/0.9/kaitaistruct-0.9.tar.gz#sha256=3d5845817ec8a4d5504379cc11bd570b038850ee49c4580bc0998c8fb1d327ad has different name in metadata: 'UNKNOWN'

When I try to run the following command to install kaitaistruct version 0.8, I get no errors.
$pipenv run python -m pip install kaitaistruct==0.8
Collecting pip
Using cached https://nexus.fanops.net/nexus/repository/pypi-all/packages/pip/20.3.1/pip-20.3.1-py2.py3-none-any.whl
Installing collected packages: pip
Found existing installation: pip 9.0.1
Uninstalling pip-9.0.1:
Successfully uninstalled pip-9.0.1
Successfully installed pip-20.3.1
Looking in indexes: http://pypi/nexus/repository/pypi-all/simple
Collecting kaitaistruct==0.8
Downloading https://nexus.fanops.net/nexus/repository/pypi-all/packages/kaitaistruct/0.8/kaitaistruct-0.8.tar.gz (5.2 kB)
Building wheels for collected packages: kaitaistruct
Building wheel for kaitaistruct (setup.py): started
Building wheel for kaitaistruct (setup.py): finished with status 'done'
Created wheel for kaitaistruct: filename=kaitaistruct-0.8-py2.py3-none-any.whl size=7059 sha256=44206a142e396c5afe17da3b3e7f1873e8b803f38c8aabe95f28b01e20d148d1
Stored in directory: /var/lib/jenkins/.cache/pip/wheels/71/b6/db/5f356e240aeacbe63f3e4cd874a16f15b4ee21f6ec05c317d3
Successfully built kaitaistruct
Installing collected packages: kaitaistruct
Successfully installed kaitaistruct-0.8

Could somebody help me figure out what I can do to resolve this?

Define a custom exception class instead of raising generic exceptions

Currently, when a Kaitai Struct parser detects that the input data is invalid, it can raise a variety of exceptions: ValueError, EOFError, generic Exception, and possibly some others. This makes it very difficult for callers to specifically catch parse errors.

It would be helpful to have a custom exception class (e.g. kaitaistruct.ParseError) that is consistently raised for all invalid input data. This would allow the calling code to catch and handle that exception specifically, without catching other unrelated errors by accident.
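A minimal sketch of what such a hierarchy could look like (all class and attribute names here are assumptions, not the runtime's actual API):

```python
class KaitaiStructError(Exception):
    """Hypothetical common base class for all parse errors."""

    def __init__(self, msg, src_path=None):
        # src_path could identify the field that failed, e.g. "/seq/0"
        super().__init__("%s: %s" % (src_path, msg) if src_path else msg)
        self.src_path = src_path


class ValidationFailedError(KaitaiStructError):
    """Raised when a `contents`/`valid` check fails (hypothetical)."""


# Callers can then catch parse errors specifically, instead of
# catching ValueError/EOFError/Exception and hoping for the best:
try:
    raise ValidationFailedError("unexpected magic", src_path="/seq/0")
except KaitaiStructError as exc:
    message = str(exc)
```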

KaitaiStream.bytes_io is not needed

I don't see a purpose for the KaitaiStream.bytes_io static method. The following examples are equivalent:

from kaitaistruct import BytesIO
BytesIO(data)

from kaitaistruct import KaitaiStream
KaitaiStream.bytes_io(data)

Alternative runtime without `struct`

Python's struct module seems to be pretty inefficient for our purposes. Namely, in all APIs it provides, it requires passing a format string into an unpack-like function, which then parses that format string at runtime, calls the relevant unpack methods, and constructs a tuple with a single value, which we extract right away.

Actually, struct internally has everything we need: for example, there are C functions that read ("unpack") integers directly, but they are not exposed as a Python API.

Would it make sense to introduce an alternative, native Kaitai Struct API written in C that would be faster than the existing one?
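The per-call overhead can be illustrated within pure Python using precompiled struct.Struct objects (a rough sketch; note that struct caches recently used format strings internally, so the measurable gain from precompiling varies):

```python
import struct
import timeit

data = b"\x01\x02\x03\x04"
u4be = struct.Struct(">I")  # format string parsed once, up front

# Both decode the same value; the precompiled version skips the
# per-call format-string lookup the module-level function performs.
value_plain = struct.unpack(">I", data)[0]
value_precompiled = u4be.unpack(data)[0]

t_plain = timeit.timeit(lambda: struct.unpack(">I", data), number=10000)
t_pre = timeit.timeit(lambda: u4be.unpack(data), number=10000)
```

Both variants still build a one-element tuple, which is exactly the remaining overhead a C-level runtime could avoid.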

Cc @koczkatamas @KOLANICH @arekbulski

MappedBinaryIO, test implementation for an alternative KaitaiStream - maybe

Hello everyone,

I'm fairly new to working with binary files and Kaitai Struct. I love it, but I unfortunately don't like the ReadWriteStruct.

I created a different approach based on the Python runtime, and I would like some feedback about possible improvements (and/or why it's not suitable for Kaitai).

Please be kind with me; this is my first "package" and definitely the first mmap implementation I've created.

The overall intention (if you like the approach) is that I would try to convert it and improve it further (and create a new/different compiler mode).

If you see mistakes or illogical implementations, please tell me. I want to learn!

Edit 1: Note, there are obviously a lot of functions missing that Kaitai needs. This is just built around my current use case. Take it as a prototype for a possible mmap approach.

Edit 2: About the performance: I can't really say much at the moment, but just by testing this I already noticed a gain in speed (the IDE runs the code a lot faster). That's obviously a really bad comparison, but if someone is interested, I could do proper tests as well.

import os
import struct
from mmap import mmap, ACCESS_COPY
from typing import List, Union


class Parser:
    """Parser class for binary data"""

    struct_mapping = {
        "u2be": struct.Struct(">H"),
        "u4be": struct.Struct(">I"),
        "u8be": struct.Struct(">Q"),
        "u2le": struct.Struct("<H"),
        "u4le": struct.Struct("<I"),
        "u8le": struct.Struct("<Q"),
        "s1": struct.Struct("b"),
        "s2be": struct.Struct(">h"),
        "s4be": struct.Struct(">i"),
        "s8be": struct.Struct(">q"),
        "s2le": struct.Struct("<h"),
        "s4le": struct.Struct("<i"),
        "s8le": struct.Struct("<q"),
        "f4be": struct.Struct(">f"),
        "f8be": struct.Struct(">d"),
        "f4le": struct.Struct("<f"),
        "f8le": struct.Struct("<d"),
        "u1": struct.Struct("B"),
    }

    range_mapping = {
        "u2be": (0, 65535),
        "u4be": (0, 4294967295),
        "u8be": (0, 18446744073709551615),
        "u2le": (0, 65535),
        "u4le": (0, 4294967295),
        "u8le": (0, 18446744073709551615),
        "s1": (-128, 127),
        "s2be": (-32768, 32767),
        "s4be": (-2147483648, 2147483647),
        "s8be": (-9223372036854775808, 9223372036854775807),
        "s2le": (-32768, 32767),
        "s4le": (-2147483648, 2147483647),
        "s8le": (-9223372036854775808, 9223372036854775807),
        "u1": (0, 255),
        "f4be": (-3.4e38, 3.4e38),
        "f8be": (-1.8e308, 1.8e308),
        "f4le": (-3.4e38, 3.4e38),
        "f8le": (-1.8e308, 1.8e308),
    }

    @classmethod
    def is_value_in_range(cls, pattern_id: str, value: Union[int, float]) -> bool:
        """Check if value is in range of pattern_id"""
        min_value, max_value = cls.range_mapping.get(pattern_id, (None, None))
        if min_value is None or max_value is None:
            raise ValueError(f"Pattern ID {pattern_id} not found.")
        return min_value <= value <= max_value

    @classmethod
    def pack_value(cls, pattern_id: str, value: Union[int, float]) -> bytes:
        """Convert value to bytes"""
        if not cls.is_value_in_range(pattern_id, value):
            raise ValueError(f"Value {value} out of range for pattern ID {pattern_id}.")
        struct_pattern = cls.struct_mapping.get(pattern_id)
        if struct_pattern is None:
            raise ValueError(f"Invalid pattern ID {pattern_id}.")
        return struct_pattern.pack(value)

    def read(self, data: bytes, pattern_id: str) -> bytes:
        """Read bytes from data"""
        size = self.struct_mapping.get(pattern_id, struct.Struct("")).size
        return data[:size]

    def read_value(self, data: bytes, pattern_id: str) -> Union[int, float]:
        """Read value from data"""
        packed_data = self.read(data, pattern_id)
        return self.struct_mapping[pattern_id].unpack(packed_data)[0]

    def read_array(
        self, data: bytes, count: int, pattern_id: str
    ) -> List[Union[int, float]]:
        """Read array of values from data"""
        size = self.struct_mapping[pattern_id].size
        return [
            self.read_value(data[i : i + size], pattern_id)
            for i in range(0, count * size, size)
        ]


class BaseMappedBinary:
    def __init__(self, file_path: str, output_file_path: str = None):
        self.file_path = file_path
        self.output_file_path = output_file_path
        if not os.path.exists(self.file_path):
            self.file = open(self.file_path, "w+b")
        else:
            self.file = open(self.file_path, "r+b")
        self.mapped_file = mmap(self.file.fileno(), 0, access=ACCESS_COPY)
        self.offset = 0
        self.parser = Parser()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()

    def _read_from_offset(self, size: int) -> bytes:
        return self.mapped_file[self.offset : self.offset + size]

    def _update_offset(self, size: int):
        self.offset += size

    def close(self):
        self.mapped_file.close()
        self.file.close()

    def seek(self, offset: int) -> int:
        """Seek to offset"""
        self.offset = offset
        return self.offset

    def tell(self) -> int:
        """Return current offset"""
        return self.offset

    def flush(self):
        self.mapped_file.flush()


class MappedBinaryReader(BaseMappedBinary):
    def __init__(self, file_path: str):
        super().__init__(file_path, output_file_path=None)

    def read(self, pattern_id: str) -> bytes:
        return self.parser.read(
            self._read_from_offset(self.parser.struct_mapping[pattern_id].size),
            pattern_id,
        )

    def read_value(self, pattern_id: str) -> Union[int, float]:
        size = self.parser.struct_mapping[pattern_id].size
        value = self.parser.read_value(self._read_from_offset(size), pattern_id)
        self._update_offset(size)
        return value

    def read_array(self, count: int, pattern_id: str) -> List[Union[int, float]]:
        size = self.parser.struct_mapping[pattern_id].size
        values = self.parser.read_array(
            self._read_from_offset(count * size), count, pattern_id
        )
        self._update_offset(count * size)
        return values

    def read_string(self, count: int) -> str:
        """Read string from data"""
        value = self._read_from_offset(count).decode("utf-8")
        self._update_offset(count)
        return value

    def read_string_array(self, count: int) -> List[str]:
        """Read array of strings from data"""
        return [self.read_string(count) for _ in range(count)]

    def read_string_array_with_count(self) -> List[str]:
        """Read array of strings from data"""
        count = self.read_value("u4le")
        return self.read_string_array(count)

    def read_string_with_count(self) -> str:
        """Read string from data"""
        count = self.read_value("u4le")
        return self.read_string(count)

    def read_bytes(self, count: int) -> bytes:
        """Read bytes from data"""
        return self._read_from_offset(count)

    def read_bytes_with_count(self) -> bytes:
        """Read bytes from data"""
        count = self.read_value("u4le")
        return self._read_from_offset(count)

    def read_value_array_with_count(self, pattern_id: str) -> List[Union[int, float]]:
        """Read array of values from data"""
        count = self.read_value("u4le")
        return self.read_array(count, pattern_id)

    def read_value_array(self, count: int, pattern_id: str) -> List[Union[int, float]]:
        """Read array of values from data"""
        return self.read_array(count, pattern_id)


class MappedBinaryWriter(BaseMappedBinary):
    def __init__(self, file_path: str):
        super().__init__(file_path, output_file_path=None)
        self.data = b""

    def get_data(self) -> bytes:
        """Return the collected data as bytes"""
        return self.data

    def write(self, pattern_id: str, value: Union[int, float]) -> None:
        """Write value to data"""
        self.data += self.parser.pack_value(pattern_id, value)

    def write_value(self, pattern_id: str, value: Union[int, float]) -> None:
        """Write value to data"""
        self.write(pattern_id, value)

    def write_array(self, pattern_id: str, values: List[Union[int, float]]) -> None:
        """Write array of values to data"""
        for value in values:
            self.write_value(pattern_id, value)

    def write_value_array(
        self, pattern_id: str, values: List[Union[int, float]]
    ) -> None:
        """Write array of values to data"""
        self.write_array(pattern_id, values)

    def write_bytes(self, value: bytes) -> None:
        """Write bytes to data"""
        self.data += value

    def write_bytes_with_count(self, value: bytes) -> None:
        """Write bytes to data"""
        self.write_value("u4le", len(value))
        self.write_bytes(value)

    def write_string(self, value: str) -> None:
        """Write string to data"""
        self.data += value.encode("utf-8")

    def write_string_array(self, values: List[str]) -> None:
        """Write array of strings to data"""
        for value in values:
            self.write_string(value)

    def write_string_array_with_count(self, values: List[str]) -> None:
        """Write array of strings to data"""
        self.write_value("u4le", len(values))
        self.write_string_array(values)

    def write_string_with_count(self, value: str) -> None:
        """Write string to data"""
        self.write_value("u4le", len(value))
        self.write_string(value)

    def write_value_array_with_count(
        self, pattern_id: str, values: List[Union[int, float]]
    ) -> None:
        """Write array of values to data"""
        self.write_value("u4le", len(values))
        self.write_array(pattern_id, values)


class MappedBinaryIO(MappedBinaryReader, MappedBinaryWriter):
    def __init__(self, file_path: str, output_file_path: str = None):
        self.file_path = file_path

        if output_file_path is None:
            self.output_file_path = file_path + ".bin"
        else:
            self.output_file_path = output_file_path
        self.reader = MappedBinaryReader(self.file_path)
        self.writer = MappedBinaryWriter(self.file_path)

    def read_value(self, pattern_id: str) -> Union[int, float]:
        return self.reader.read_value(pattern_id)

    def write_value(self, pattern_id: str, value: Union[int, float]) -> None:
        self.writer.write_value(pattern_id, value)

    def flush(self) -> None:
        self.writer.flush()

    def seek(self, offset: int) -> int:
        return self.reader.seek(offset)

    def tell(self) -> int:
        return self.reader.tell()

    def close(self) -> None:
        self.reader.close()
        self.writer.close()

and a test file class:



class ExpFile(MappedBinaryIO):
    def __init__(self, file_path: str, output_file_path: str = None):
        super().__init__(file_path)
        self._read()
        self.data = self.writer.get_data()
        if output_file_path is None:
            self.output_file_path = file_path + ".bin"
        else:
            self.output_file_path = output_file_path
        self.mapped_file = self.reader.mapped_file

    def _read(self):
        self.magic = self.reader.read_string(4)
        self.version = self.reader.read_value("u2le")
        self.uk = self.reader.read_value("u4le")
        self.header_size = self.reader.read_value("u4le")

    def __repr__(self):
        return (
            f"ExpFile({self.magic=}, {self.version=}, {self.uk=}, {self.header_size=})"
        )

    def _write(self):
        self.writer.write_string(self.magic)
        self.writer.write("u2le", self.version)
        self.writer.write("u4le", self.uk)
        self.writer.write("u4le", self.header_size)
        return self.writer.get_data()

    def write_to_file(self):
        with open(self.output_file_path, "wb") as f:
            f.write(self._write())


if __name__ == "__main__":
    mt = ExpFile(r"D:\binparser\eso0001.dat")
    mt.write_to_file()
    print(mt)
    print(mt.tell())

read_bytes(): add extra length check

read_bytes() could benefit from an extra length check. I am using Kaitai Struct for parsing, and there are plenty of false positives for certain file types, so I sometimes end up reading a lot of extra data. With an extra check to see whether the number of bytes to be read is smaller than the stream size or file size, this would be avoided:

    def read_bytes(self, n):
        if n < 0:
            raise ValueError(
                "requested invalid %d amount of bytes" %
                (n,)
            )
        r = self._io.read(n)
        if len(r) < n:
            raise EOFError(
                "requested %d bytes, but got only %d bytes" %
                (n, len(r))
            )
        return r

for example could be rewritten to something like:

    def read_bytes(self, n):
        if n < 0:
            raise ValueError(
                "requested invalid %d amount of bytes" %
                (n,)
            )
        if n > self.size():
            raise ValueError(
                 "requested to read %d bytes, but only %d available" %
                 (n, self.size()))
        r = self._io.read(n)
        if len(r) < n:
            raise EOFError(
                "requested %d bytes, but got only %d bytes" %
                (n, len(r))
            )
        return r

or something similar.

Right now I am trying to work around this by adding extra boundary checks in the .ksy files that look at the size of the file, but that's a rather ugly hack.

Missing Runtime Dependency on `pkg_resources`

Kaitai's generated Python parsers currently depend on pkg_resources, which is not specified as a dependency of kaitaistruct. This problem isn't very visible as most environments still have setuptools around (which includes pkg_resources), but if setuptools is not around in the current virtualenv, importing parsers crashes with an ImportError.

One could argue that Kaitai's behavior is technically correct: the actual import is done in the downstream codebase -- in the Python file generated by ksc -- and not within the kaitaistruct package. However, all generated Kaitai parsers have this dependency, and it seems rather brittle to have every single Kaitai user add an additional dependency on pkg_resources themselves. Long story short, I think kaitaistruct should add the packages that are required by all ksc-generated parsers to its own dependency list.

With this aside, @asottile (who initially reported this to us in @mitmproxy) made the very helpful suggestion in mitmproxy/mitmproxy#4918 (comment) that the generated parser should ideally just use importlib.metadata from the stdlib instead of depending on pkg_resources. Unfortunately, importlib.metadata was only added in Python 3.8, so just changing ksc is not enough. Long story short, I propose the following changes:

  1. Make the kaitaistruct PyPI package depend on the importlib-metadata PyPI package for Python <3.8 (see https://setuptools.pypa.io/en/latest/userguide/dependency_management.html#platform-specific-dependencies).
    This means no new (de facto fewer) dependencies for anyone on a recent Python version.
  2. Once a new kaitaistruct version is released on PyPI that includes these changes, update ksc to emit parsers that use importlib.metadata instead of pkg_resources.
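The version-dependent dependency in step 1 would pair with an import like this in generated code (a sketch of the proposal, not current ksc output):

```python
import sys

# On Python >= 3.8, use the stdlib module; otherwise fall back to the
# importlib-metadata backport that step 1 would add as a dependency.
if sys.version_info >= (3, 8):
    from importlib.metadata import version
else:
    from importlib_metadata import version

# e.g. version("kaitaistruct") would then replace the current
# pkg_resources-based runtime version lookup.
```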

Issue while installing with setuptools==46

Hi guys,

when I tried to install kaitaistruct for python I got this error message:

E:\Documents\SourceCode\vwgroup\EthernetTraceAnalysis\ethernet_evaluation_meb>pip install --upgrade --pre git+https://github.com/kaitai-io/kaitai_struct_python_runtime.git
Collecting git+https://github.com/kaitai-io/kaitai_struct_python_runtime.git
  Cloning https://github.com/kaitai-io/kaitai_struct_python_runtime.git to c:\users\harryk~1\appdata\local\temp\pip-req-build-uaees51r
  Running command git clone -q https://github.com/kaitai-io/kaitai_struct_python_runtime.git 'C:\Users\HARRYK~1\AppData\Local\Temp\pip-req-build-uaees51r'
    ERROR: Command errored out with exit status 1:
     command: 'c:\users\harrykane\appdata\local\programs\python\python35\python.exe' -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\\Users\\HARRYK~1\\AppData\\Local\\Temp\\pip-req-build-uaees51r\\setup.py'"'"'; __file__='"'"'C:\\Users\\HARRYK~1\\AppData\\Local\\Temp\\pip-req-build-uaees51r\\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base 'C:\Users\HARRYK~1\AppData\Local\Temp\pip-req-build-uaees51r\pip-egg-info'
         cwd: C:\Users\HARRYK~1\AppData\Local\Temp\pip-req-build-uaees51r\
    Complete output (21 lines):
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "C:\Users\HARRYK~1\AppData\Local\Temp\pip-req-build-uaees51r\setup.py", line 10, in <module>
        setup(**cfg)
      File "c:\users\harrykane\appdata\local\programs\python\python35\lib\site-packages\setuptools\__init__.py", line 144, in setup
        return distutils.core.setup(**attrs)
      File "c:\users\harrykane\appdata\local\programs\python\python35\lib\distutils\core.py", line 108, in setup
        _setup_distribution = dist = klass(attrs)
      File "c:\users\harrykane\appdata\local\programs\python\python35\lib\site-packages\setuptools\dist.py", line 425, in __init__
        k: v for k, v in attrs.items()
      File "c:\users\harrykane\appdata\local\programs\python\python35\lib\distutils\dist.py", line 281, in __init__
        self.finalize_options()
      File "c:\users\harrykane\appdata\local\programs\python\python35\lib\site-packages\setuptools\dist.py", line 706, in finalize_options
        ep.load()(self)
      File "c:\users\harrykane\appdata\local\programs\python\python35\lib\site-packages\setuptools\dist.py", line 713, in _finalize_setup_keywords
        ep.load()(self, ep.name, value)
      File "c:\users\harrykane\appdata\local\programs\python\python35\lib\site-packages\setuptools\dist.py", line 288, in check_specifier
        packaging.specifiers.SpecifierSet(value)
      File "c:\users\harrykane\appdata\local\programs\python\python35\lib\site-packages\setuptools\_vendor\packaging\specifiers.py", line 572, in __init__
        specifiers = [s.strip() for s in specifiers.split(",") if s.strip()]
    AttributeError: 'SpecifierSet' object has no attribute 'split'
    ----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.

This disappeared when I downgraded my setuptools version from 46.0.0 to 40.0.0.

Best regards,
Martin

AttributeError: 'SpecifierSet' object has no attribute 'split'

I'm unable to install this via pip since commit b18ba4f

# pip install --upgrade --pre git+https://github.com/kaitai-io/kaitai_struct_python_runtime.git@b18ba4f4ab0c87d5cc04009451d0968beba31f0e
Collecting git+https://github.com/kaitai-io/kaitai_struct_python_runtime.git@b18ba4f4ab0c87d5cc04009451d0968beba31f0e
  Cloning https://github.com/kaitai-io/kaitai_struct_python_runtime.git (to revision b18ba4f4ab0c87d5cc04009451d0968beba31f0e) to /tmp/pip-req-build-4kvsrrc6
  Running command git clone -q https://github.com/kaitai-io/kaitai_struct_python_runtime.git /tmp/pip-req-build-4kvsrrc6
  Running command git checkout -q b18ba4f4ab0c87d5cc04009451d0968beba31f0e
    ERROR: Command errored out with exit status 1:
     command: /usr/local/bin/python -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-req-build-4kvsrrc6/setup.py'"'"'; __file__='"'"'/tmp/pip-req-build-4kvsrrc6/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-pip-egg-info-gi747nqb
         cwd: /tmp/pip-req-build-4kvsrrc6/
    Complete output (21 lines):
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-req-build-4kvsrrc6/setup.py", line 10, in <module>
        setup(**cfg)
      File "/usr/local/lib/python3.8/site-packages/setuptools/__init__.py", line 144, in setup
        return distutils.core.setup(**attrs)
      File "/usr/local/lib/python3.8/distutils/core.py", line 108, in setup
        _setup_distribution = dist = klass(attrs)
      File "/usr/local/lib/python3.8/site-packages/setuptools/dist.py", line 425, in __init__
        _Distribution.__init__(self, {
      File "/usr/local/lib/python3.8/distutils/dist.py", line 292, in __init__
        self.finalize_options()
      File "/usr/local/lib/python3.8/site-packages/setuptools/dist.py", line 717, in finalize_options
        ep(self)
      File "/usr/local/lib/python3.8/site-packages/setuptools/dist.py", line 724, in _finalize_setup_keywords
        ep.load()(self, ep.name, value)
      File "/usr/local/lib/python3.8/site-packages/setuptools/dist.py", line 289, in check_specifier
        packaging.specifiers.SpecifierSet(value)
      File "/usr/local/lib/python3.8/site-packages/setuptools/_vendor/packaging/specifiers.py", line 572, in __init__
        specifiers = [s.strip() for s in specifiers.split(",") if s.strip()]
    AttributeError: 'SpecifierSet' object has no attribute 'split'
    ----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.

It seems related to pypa/setuptools#1869

Open .ksy files directly

http://doc.kaitai.io/lang_python.html doesn't specify how exactly to import .ksy files from Python. I would expect something like this.

import kaitai
Gif = kaitai.compile('gif.ksy')
g = Gif.from_file('some.gif')

If I understand it right, direct loading of .ksy files is not supported, because the .ksy parser is not written in Python.
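A possible workaround today is to shell out to the compiler and import what it generates. This is a sketch under stated assumptions: ksc must be on the PATH (its `-t python` and `--outdir` options are real), while both helper names below are hypothetical:

```python
import importlib
import subprocess
import sys


def ksy_id_to_class_name(ksy_id):
    # A ksy meta/id like "gif" or "key_header" maps to a generated class
    # named "Gif" or "KeyHeader". (Helper name is hypothetical.)
    return "".join(part.capitalize() for part in ksy_id.split("_"))


def compile_and_import(ksy_path, ksy_id, out_dir="."):
    # Run ksc to generate <ksy_id>.py, then import the module and
    # return the generated class. (Hypothetical helper.)
    subprocess.run(["ksc", "-t", "python", "--outdir", out_dir, ksy_path],
                   check=True)
    sys.path.insert(0, out_dir)
    module = importlib.import_module(ksy_id)
    return getattr(module, ksy_id_to_class_name(ksy_id))


# Usage (requires ksc installed and a real gif.ksy):
#   Gif = compile_and_import("gif.ksy", "gif")
#   g = Gif.from_file("some.gif")
```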

PyPI project page does not link to GitHub repo

I've just noticed that https://pypi.org/project/kaitaistruct/ doesn't contain any backlink to the source repository, i.e. https://github.com/kaitai-io/kaitai_struct_python_runtime. The "Homepage" in Project links leads to https://kaitai.io:


..., which I guess is fine because it is indeed the homepage of the project, but the repo with the source code of the PyPI package is not directly reachable.

I wonder how it can be expressed in setup.cfg - it looks like project_urls is intended for that purpose?

List additional relevant URLs about your project. This is the place to link to bug trackers, source repositories, or where to support package development. The string of the key is the exact text that will be displayed on PyPI.

project_urls needs setuptools >= 38.3.0, which is fine, because we already require an even later version (because of the long_description_content_type we're using):

setup_requires =
    setuptools >= 38.6.0
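For instance, the backlinks could look like this in setup.cfg (the labels and URL selection are just a suggestion):

```ini
[metadata]
project_urls =
    Source = https://github.com/kaitai-io/kaitai_struct_python_runtime
    Bug Tracker = https://github.com/kaitai-io/kaitai_struct_python_runtime/issues
```

PyPI renders each key as the link text in the "Project links" sidebar.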

AttributeError: 'SpecifierSet' object has no attribute 'split'

Maybe it is more of a setuptools-related issue, but I thought I'd drop a message so you are aware that this tends to happen, and maybe it will help anyone who faces a similar problem.
When trying to build a wheel or install with pip directly from GitHub, there is some sort of problem reading the declarative config, which requires me to manually patch it each time.
The following is thrown:

ERROR: Command errored out with exit status 1:
     command: 'C:\tools\Anaconda3\envs\kissyb\python.exe' -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:
\\Users\\KCZARN~1\\AppData\\Local\\Temp\\pip-req-build-r8io7rtk\\setup.py'"'"'; __file__='"'"'C:\\Users\\KCZARN~1\\Ap
pData\\Local\\Temp\\pip-req-build-r8io7rtk\\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.
read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-
base 'C:\Users\KCZARN~1\AppData\Local\Temp\pip-req-build-r8io7rtk\pip-egg-info'
         cwd: C:\Users\KCZARN~1\AppData\Local\Temp\pip-req-build-r8io7rtk\
    Complete output (21 lines):
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "C:\Users\KCZARN~1\AppData\Local\Temp\pip-req-build-r8io7rtk\setup.py", line 10, in <module>
        setup(**cfg)
        return distutils.core.setup(**attrs)
      File "C:\tools\Anaconda3\envs\kissyb\lib\distutils\core.py", line 108, in setup
      File "C:\tools\Anaconda3\envs\kissyb\lib\site-packages\setuptools\dist.py", line 426, in __init__
        k: v for k, v in attrs.items()
      File "C:\tools\Anaconda3\envs\kissyb\lib\distutils\dist.py", line 292, in __init__
        self.finalize_options()
      File "C:\tools\Anaconda3\envs\kissyb\lib\site-packages\setuptools\dist.py", line 717, in finalize_options
        ep(self)
      File "C:\tools\Anaconda3\envs\kissyb\lib\site-packages\setuptools\dist.py", line 724, in _finalize_setup_keywords
        ep.load()(self, ep.name, value)
      File "C:\tools\Anaconda3\envs\kissyb\lib\site-packages\setuptools\dist.py", line 289, in check_specifier
        packaging.specifiers.SpecifierSet(value)
      File "C:\tools\Anaconda3\envs\kissyb\lib\site-packages\setuptools\_vendor\packaging\specifiers.py", line 572, in __init__
        specifiers = [s.strip() for s in specifiers.split(",") if s.strip()]
    AttributeError: 'SpecifierSet' object has no attribute 'split'

What solves it is disabling the python_requires line in the declarative config like so:

[options]
zip_safe = True
include_package_data = True
py_modules = kaitaistruct
# python_requires = >=2.7,!=3.0.*,!=3.1.*,!=3.2.*,!=3.3.*

So it seems parsing of these version specifiers doesn't work as it should.
I made sure to update setuptools and I'm using a Python 3.7.6 interpreter (Anaconda, Windows).

Python API and docs

I could not find any docs on Python API. Are there any helpful methods beyond mere field accessors? For example, I need:

  1. Get the size of a substructure
  2. Dump or pretty print parsed structure with values from binary file
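There is no built-in pretty printer, but one is easy to improvise by walking the parsed object's public attributes. A minimal sketch (not an official API; it assumes generated classes keep parsed fields as plain instance attributes and prefix internals like `_io` with an underscore):

```python
def pretty_print(obj, indent=0):
    # Recursively dump public attributes of a parsed object; anything with
    # its own non-empty __dict__ (e.g. a nested substructure) is recursed into.
    pad = " " * indent
    for name, value in vars(obj).items():
        if name.startswith("_"):
            continue  # skip internals such as _io, _parent, _root
        if hasattr(value, "__dict__") and vars(value):
            print("%s%s:" % (pad, name))
            pretty_print(value, indent + 2)
        else:
            print("%s%s = %r" % (pad, name, value))
```

For sizes, one common workaround is comparing the stream position (`obj._io.pos()`) before and after a substructure is read, rather than relying on a dedicated API.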

Enums in "switch-on" with extra math

In my ksy I have a sequence like this:

- id: message_type
  type: b4
  enum: message_type
- id: is_update_request
  type: b1
- id: is_from_server
  type: b1
- id: params
  type:
    switch-on: "message_type.as<u1> + (is_from_server.as<u1> << 4) + (is_update_request.as<u1> << 5)"
    cases:
      32: ping_device

And below is the generated code for this switch:

_on = ((self.message_type + (self.is_from_server << 4)) + (self.is_update_request << 5))

I had to change self.message_type to self.message_type.value to get this working. The same applies to instances too.
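The underlying reason is that Python's plain `enum.Enum` members don't support arithmetic or bit shifts, while `IntEnum` members (and `.value`) do. A self-contained illustration (the enum below is a stand-in, not the generated one):

```python
import enum

class MessageType(enum.Enum):
    ping_device = 0

class MessageTypeInt(enum.IntEnum):
    ping_device = 0

# Plain Enum: arithmetic raises TypeError, which is why the generated
# expression fails until .value is appended.
try:
    MessageType.ping_device + (1 << 5)
except TypeError:
    print("plain Enum rejects arithmetic")

# Both of these evaluate to 32, matching the ping_device case above:
print(MessageType.ping_device.value + (1 << 5))
print(MessageTypeInt.ping_device + (1 << 5))
```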

Python style enforcement

Python runtime might use some Python style enforcement. I actually don't know a thing about Python coding, so I'm asking for help from knowledgeable folks out there — cc @KOLANICH @koczkatamas @arekbulski @dgladkov.

I see that there is PEP8, PEP257, Google style guide, and a handful of less widespread style guides. There are tons of tools that offer some kind of enforcement:

Currently, we have codacy.com checking our Python runtime with Bandit, Prospector & PyLint, but their configuration is mostly defaults chosen by the Codacy team.

Which coding style / tool should we choose? Let's decide on this and add some sort of config for it into this repository?

read_bytes_term() should raise EOFError instead of Exception

read_bytes_term() should raise EOFError instead of Exception to be consistent with read_bytes().

Btw. it's funny that I found this issue 1h later than the person who reported #40

Even if #40 is implemented using a custom class, I'd suggest having both read_bytes_term() and read_bytes() raise the same type of error.

My use case is that I'm trying to detect when the packet that I'm parsing has been truncated so it would be helpful to be able to just do except EOFError (or except kaitaistruct.EOFError) instead of having to parse the exception message.
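One compatible middle ground (echoing the custom-class suggestion in #40) is a library-specific exception that subclasses the built-in, so either except clause works. The class name below is hypothetical, not something the runtime currently defines:

```python
class KaitaiEOFError(EOFError):
    """Hypothetical library error that still satisfies `except EOFError`."""

try:
    raise KaitaiEOFError("requested 4 bytes, but only 1 left in stream")
except EOFError as exc:
    # Callers can catch the built-in without parsing the message.
    print("caught truncated packet:", exc)
```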

size() does not work in Python 2 in a KaitaiStream backed by a 'file' object

As @armijnhemel suggested in #61 (comment), in Python 3 the io.tell() is redundant:

# Seek to the end of the File object
io.seek(0, SEEK_END)
# Remember position, which is equal to the full length
full_size = io.tell()

and can be simplified to this (because io.IOBase.seek returns the new absolute position already):

# Seek to the end of the stream and remember the full length
full_size = io.seek(0, SEEK_END)

I applied this suggestion in 255f5b7. This has been released in 0.10.


However, I was reading through some old discussions and came across this comment by @arekbulski:

Unfortunately I also found out that on Python 2, seek method doesnt return current offset but None. That precludes that optimisation. Eh.

I tried this in my Python 2.7 installation and indeed - if you open a file using the built-in open() function, you get a 'file' object, and if you use it to initialize the KaitaiStream, the size() returns None:

Python 2.7.18 (v2.7.18:8d21aa21f2, Apr 20 2020, 13:25:05) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from kaitaistruct import KaitaiStream
>>> with open('C:\\temp\\kaitai_struct\\tests\\src\\position_to_end.bin', 'rb') as f:
...     print(type(f))
...     _io = KaitaiStream(f)
...     print(_io.size())
...
<type 'file'>
None

This is documented in Python 2 docs (https://docs.python.org/2/library/stdtypes.html#file.seek):

file.seek(offset[, whence])
Set the file’s current position, like stdio’s fseek(). (...) There is no return value.

On the other hand, the typical from_file helper works because it uses io.open() (as recommended in https://docs.python.org/3/howto/pyporting.html#text-versus-binary-data since it's consistent from Python 2 to 3) instead of open():

Python 2.7.18 (v2.7.18:8d21aa21f2, Apr 20 2020, 13:25:05) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from position_to_end import PositionToEnd
>>> with PositionToEnd.from_file('C:\\temp\\kaitai_struct\\tests\\src\\position_to_end.bin') as r:
...     print(type(r._io._io))
...     print(r._io.size())
...
<type '_io.BufferedReader'>
21

I'm not sure why none of the tests caught this - yes, they typically use the from_file helper, but I remember that the read_bytes method was raising errors in almost all tests (IIRC) in the CI when I tried calling _io.seekable() unconditionally:

# in Python 2, there is a common error ['file' object has no
# attribute 'seekable'], so we need to make sure that seekable() exists
and callable(getattr(self._io, 'seekable', None))
and self._io.seekable()

and that error clearly mentioned the 'file' object.
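A `size()` that works on both kinds of objects therefore has to read the position back with `tell()` instead of trusting `seek()`'s return value. A minimal standalone sketch (not the runtime's actual method), which also restores the caller's position:

```python
from os import SEEK_END

def stream_size(f):
    # f.seek() returns the new position on Python 3 io objects but None on
    # Python 2 'file' objects, so query tell() explicitly.
    cur = f.tell()
    f.seek(0, SEEK_END)
    full_size = f.tell()
    f.seek(cur)  # restore the caller's position
    return full_size
```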

io.UnsupportedOperation: read - When write stream to file with size-eos:true

Hello, I was trying to use the serialization feature to generate a BGP keepalive packet with the Python runtime. I defined a size-eos: true attribute and tried to write the stream to a file. Then there is an exception: io.UnsupportedOperation: read.
The ksy file:

meta:
  id: bgp_packet
  title: BGP (Border Gateway Protocol) packet
  xref:
    rfc: 4271
  encoding: utf-8
  endian: be
seq:
  - id: marker
    size: 16
  - id: length
    type: u2
  - id: bgp_type
    type: u1
    enum: bgp_packet_type
  - id: body
    type:
      switch-on: bgp_type
      cases:
        "bgp_packet_type::open": open_msg
        "bgp_packet_type::update": update_msg
        "bgp_packet_type::notification": notification_msg
        "bgp_packet_type::keepalive": keepalive_msg
        "bgp_packet_type::route_refresh": routerefresh_msg
enums:
  bgp_packet_type:
    1: open
    2: update
    3: notification
    4: keepalive
    5: route_refresh
types:
  ...
  keepalive_msg:
    seq:
      - id: data
        size-eos: true
  ...

After kaitai_struct_compiler/jvm/target/universal/stage/bin/kaitai-struct-compiler --read-write --no-auto-read -t python ./bgp_packet.ksy, The py file:

    ...
    class KeepaliveMsg(ReadWriteKaitaiStruct):
        def __init__(self, _io=None, _parent=None, _root=None):
            self._io = _io
            self._parent = _parent
            self._root = _root

        def _read(self):
            self.data = self._io.read_bytes_full()


        def _fetch_instances(self):
            pass


        def _write__seq(self, io=None):
            super(BgpPacket.KeepaliveMsg, self)._write__seq(io)
            self._io.write_bytes(self.data)
            if not self._io.is_eof():
                raise kaitaistruct.ConsistencyError(u"data", self._io.size() - self._io.pos(), 0)


        def _check(self):
            pass
    ...

My test:

def test_write():
    np = BgpPacket()
    np.marker = b'\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff'
    np.length = 19
    np.bgp_type = BgpPacket.BgpPacketType.keepalive
    kp = BgpPacket.KeepaliveMsg(None, np, np._root)
    kp.data = b''
    kp._check()
    np.body = kp
    np._check()

    nf = open('./new_bgp_packet.bin', 'wb')
    nf.truncate(19)

    with KaitaiStream(nf) as _io:
        np._write(_io)

I ran it and got exception:

Traceback (most recent call last):
  File "bgp_parser.py", line 53, in <module>
    test_write()
  File "bgp_parser.py", line 48, in test_write
    np._write(_io)
  File "kaitaistruct.py", line 64, in _write
    self._write__seq(io)
  File "bgp_packet.py", line 88, in _write__seq
    self.body._write__seq(self._io)
  File "bgp_packet.py", line 419, in _write__seq
    if not self._io.is_eof():
  File "kaitaistruct.py", line 110, in is_eof
    t = io.read(1)
io.UnsupportedOperation: read

In the function _write__seq(), there is a function self._io.is_eof(), the function is in the kaitaistruct.py:
https://github.com/kaitai-io/kaitai_struct_python_runtime/blob/master/kaitaistruct.py#L75-L85
In this function, it tries to read a byte to check whether it has reached EOF, but the stream was opened as 'wb'.
I changed the code as follows and it works:

        # io = self._io
        # t = io.read(1)
        # if t == b'':
        #     return True
        if self.size() == self.pos():
            return True
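The failure itself can be reproduced without Kaitai at all: a handle opened 'wb' rejects read(1), which is exactly what is_eof()'s probe does, while 'w+b' permits it. A small standalone demonstration (the scratch file path is arbitrary):

```python
import io
import os
import tempfile

path = os.path.join(tempfile.gettempdir(), "kaitai_eof_demo.bin")

with open(path, "wb") as f:        # write-only, like the failing test above
    f.write(b"\x00")
    try:
        f.read(1)                  # the same probe is_eof() performs
    except io.UnsupportedOperation as exc:
        print("write-only handle:", exc)

with open(path, "w+b") as f:       # read/write sidesteps the exception
    f.write(b"\x00")
    f.seek(0)
    print(f.read(1))

os.remove(path)
```

So besides patching is_eof() to compare pos() with size(), opening the output file in 'w+b' mode is a workaround on the caller's side.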

XOR benchmarks

Followup of discussion in 04289cc

Python 2 bytearray mutations

== run_benchmark_process_xor.py
max = 2147483543
34.8218421936
0.414654970169

Python 3 bytearray mutations

== run_benchmark_process_xor.py
max = 2147483543
33.529802142000335
0.39478560099996685

Python 3 iterating bytes

== run_benchmark_process_xor.py
max = 2147483543
32.42236607199993
0.39769275300022855

Python 2 numpy bitwise_xor

== run_benchmark_process_xor.py
max = 2147483543
27.3349659443
0.424569129944

Python 3 numpy bitwise_xor

== run_benchmark_process_xor.py
max = 2147483543
24.56664780200026
0.4790448710000419

Conclusions:

  • On a real-world example, the difference between bytearray mutations and mapping over bytes in Python 3 is not worth maintaining a separate code path.
  • numpy's bitwise_xor gives a significant performance increase, but it requires manual key wrapping, as bitwise_xor expects arrays of the same size.

As numpy contains C extensions and may require compilation, I suggest adding numpy as an optional dependency and falling back to native Python bytearray mutations + xor if numpy is not installed.
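Such a fallback could look like the sketch below (illustrative only, not the runtime's actual implementation); `np.resize` handles the key wrapping by repeating the key to the data's length:

```python
try:
    import numpy as np
except ImportError:  # numpy stays optional
    np = None

def process_xor(data, key):
    # XOR `data` against a repeating multi-byte `key`.
    if np is not None:
        data_arr = np.frombuffer(bytes(data), dtype=np.uint8)
        key_arr = np.frombuffer(bytes(key), dtype=np.uint8)
        wrapped = np.resize(key_arr, data_arr.shape)  # repeat key to length
        return np.bitwise_xor(data_arr, wrapped).tobytes()
    result = bytearray(data)  # pure-Python fallback
    klen = len(key)
    for i in range(len(result)):
        result[i] ^= key[i % klen]
    return bytes(result)
```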

Use int.from_bytes() instead of struct for various ints (u1, u2, u4, etc.)

I am filing this issue as something to possibly consider for the future.

Currently struct is used for converting bytes to ints (and a few other values like floats, but this issue is only about ints):

https://github.com/kaitai-io/kaitai_struct_python_runtime/blob/master/kaitaistruct.py#L170

For ints Python 3 has a different mechanism for converting from byte strings to ints namely from_bytes():

https://docs.python.org/3/library/stdtypes.html#int.from_bytes

What is very convenient is that it allows arbitrary length byte strings, so it becomes absolutely trivial to implement something like u3 or u5 or u11.
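For illustration, reading unsigned and signed 3-byte integers this way (Python 3 only):

```python
raw = b"\x01\x02\x03"

u3_be = int.from_bytes(raw, byteorder="big")     # 0x010203 == 66051
u3_le = int.from_bytes(raw, byteorder="little")  # 0x030201 == 197121

neg = b"\xff\xfe\xfd"
s3_be = int.from_bytes(neg, byteorder="big", signed=True)  # -259

print(u3_be, u3_le, s3_be)
```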

No kaitaistruct pypi package?

Hello,

I am trying to use Kaitai Struct with Python. However, I am unable to install it through pip on both Python 2 and Python 3:

dc@dc:~$ pip --version
pip 8.1.2 from /usr/local/lib/pypy2.7/dist-packages (python 2.7)
dc@dc:~$ pip install kaitaistruct
Collecting kaitaistruct
  Could not find a version that satisfies the requirement kaitaistruct (from versions: )
No matching distribution found for kaitaistruct
dc@dc:~$ pip3 --version
pip 1.5.6 from /usr/lib/python3/dist-packages (python 3.4)
dc@dc:~$ pip3 install kaitaistruct
Downloading/unpacking kaitaistruct
  Could not find any downloads that satisfy the requirement kaitaistruct
Cleaning up...
No distributions at all found for kaitaistruct
Storing debug log for failure in /home/dc/.pip/pip.log
dc@dc:~$ 

Pypi search also gives no results: https://pypi.python.org/pypi?%3Aaction=search&term=kaitaistruct&submit=search

Upload prebuilt wheel distributions to PyPI

It looks like currently the PyPI kaitaistruct package only contains a source tarball distribution, but not a prebuilt wheel distribution. Wheels are Python's format for built packages, and their advantage is that a wheel is a static collection of files, which can be installed (by Python package managers like pip) by just extracting them into the proper location. In contrast, installing a package from a source tarball requires running a build script/tool, and potentially installing extra dependencies for the build. This all happens automatically, but is a bit slower than just extracting a wheel. Wheels also have some other advantages besides just installation speed, but they're a bit complicated to explain. In any case, it's considered good practice to provide wheels, and almost every package on PyPI has them.

In the case of pure Python modules like kaitaistruct, the difference between installing a source tarball and a wheel actually isn't that big - because Python code doesn't need to be compiled, the build from source has no special requirements and normally takes less than a second to complete. On the other hand, this also means that it's very quick and easy as a developer to build and upload a wheel distribution, and it makes the installation a little bit faster for everyone 🙂

The process for building and uploading a wheel is very straightforward - it should be enough to go into a source directory for version 0.9 of the runtime and run:

$ python3 setup.py bdist_wheel
$ twine check dist/kaitaistruct-0.9-*.whl # runs a few validity checks on the wheel's metadata
$ twine upload dist/kaitaistruct-0.9-*.whl

This should add the built wheel in addition to the existing source tarball for version 0.9. I think PyPI lets you retroactively upload additional files for existing releases, as long as no existing files would get overwritten. I've never actually tried this myself though (I always upload source and wheel distributions at the same time) so it might not actually be possible to add a wheel to an existing release like this.
