equinor / dlisio Goto Github PK

Python library for working with the well log formats Digital Log Interchange Standard (DLIS V1) and Log Information Standard (LIS79)

Home Page: https://dlisio.readthedocs.io/en/latest/

License: Other

CMake 0.80% C 1.69% C++ 47.14% Python 50.37%

dlis rp66v1 lis lis79

dlisio's People

Contributors

Stargazers

Watchers

dlisio's Issues

curves access iostream error

Hi, I am trying to access some curves in "206_05a-_3_DWL_DWL_WIRE_258276501.DLIS" example file in the following code:

import dlisio    
with dlisio.load("206_05a-_3_DWL_DWL_WIRE_258276501.DLIS") as files:
    for f in files:header = f.fileheader
    
#------------------------------------------------------------------#

print(f.object_sets['FRAME']) # just to find the fingerprints
print(f.curves('T.FRAME-I.2000T-O.2-C.0'))

,
and I'm getting the following error:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-6-4aad290d18a0> in <module>
      1 print(f.object_sets['FRAME'])
----> 2 print(f.curves('T.FRAME-I.2000T-O.2-C.0'))

~/miniconda3/lib/python3.6/site-packages/dlisio/__init__.py in curves(self, fingerprint)
    133         indices = self.fdata_index[fingerprint]
    134         a = np.empty(shape = len(indices), dtype = frame.dtype)
--> 135         core.read_all_fdata(fmt, self.file, indices, a)
    136         return a
    137 

RuntimeError: basic_ios::clear: iostream error

,
on Linux, and RuntimeError: ios_base::failbit set:iostream stream error on Windows.

'utf-8' codec can't decode byte 0xf4 in position 24: invalid continuation byte

dlisio interpret fails to assure enough space for strings of variable length

For types where data size is not known by default (for example, IDENT), current code doesn't assure enough space is allocated to fit the whole string.

Channels metadata in Python

Build an interface for channels metadata, so users don't have to deal with object set pools, type filtering and all that jazz.

Improve buildsystem for python extension

So the build system for the python extension is pretty awful and clumsy for anything non-trivial, in particular after linking etc. kicks in. sckit has an interesting project https://github.com/scikit-build/scikit-build that integrates setuptools and cmake a bit more, and there are a couple of other resources out there too on how to leverage cmake to do the native stuff, managed from setuptools.

It could certainly be a viable avenue for us, and no doubt less painful than vanilla setuptools.

Running valgrind on the testuite uncovered some memory issues

Exhaustive testing of dlis_packf

The dlis_packf function (in lib/include/dlisio.h + lib/src/dlisio.cpp) interprets bytes, transforms them to corresponding native data types (int8_t, uint32_t, float, double etc.), and packs the data in an untyped output array.

The types are enumerated in the RP66 documentation in Appendix B
http://w3.energistics.org/rp66/V1/Toc/main.html

What's needed is an exhaustive test suite for all kinds of input format strings, to verify things are correctly packed, and a safety net for further development.

A few has already been written, but could use some more verification too probably. Tests should go in lib/test/protocol.cpp

Bad message when opening none existing file

File "C:\appl\repos\Python\dlisiotest.py", line 3, in
files = dlisio.load("NONE_EXISTING_FILE")
File "C:\ProgramData\Anaconda3\lib\site-packages\dlisio_init_.py", line 376, in load
mmap.map(path)
RuntimeError: The system cannot find the path specified.

Invalid format specifier

Not implemented yet: invalid format specifier in fSSSSffffffffffSSSSfffffffff

Exhaustive testing of dlis_index_records

Similar to #36, the test coverage of dlis_index_records is just not good enough, and I would like to change that.

Multiple frames in one FDATA

Not implemented yet: multiple frames in one FDATA

Possible regression on dlisio.load()

When I try to load many of my DLIS files on DLISIO 0.1.9 as (for example):

import dlisio
test_file = r'R:\3-BRSA-1053-RJS\PERFIS_DIGITAIS\DLIS\3-brsa-1053-rjs_8aittdd_dsi-7845.dlis'
f = dlisio.load(test_file)

I get the following error:

ERROR:root:multiple distinct objects in set FILE-HEADER (). Duplicate fingerprint = T.FILE-HEADER-I.1-O.35-C.0
WARNING:root:continuing with the last object
ERROR:root:multiple distinct objects in set ORIGIN (). Duplicate fingerprint = T.ORIGIN-I.DLIS_DEFINING_ORIGIN-O.35-C.0
WARNING:root:continuing with the last object
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-3-0faac13e171f> in <module>()
      1 test_file = r'R:\3-BRSA-1053-RJS\PERFIS_DIGITAIS\DLIS\3-brsa-1053-rjs_8aittdd_dsi-7845.dlis'
----> 2 f = dlisio.load(test_file)

C:\ProgramData\Anaconda3\lib\site-packages\dlisio\__init__.py in load(path)
    334     try:
    335         stream.reindex(tells, residuals)
--> 336         f = dlis(stream, explicits, sul_offset = sulpos)
    337 
    338         explicits = set(explicits)

C:\ProgramData\Anaconda3\lib\site-packages\dlisio\__init__.py in __init__(self, stream, explicits, sul_offset)
     40             'COMPUTATION'            : plumbing.Computation.create,
     41         }
---> 42         self.load()
     43 
     44     def __enter__(self):

C:\ProgramData\Anaconda3\lib\site-packages\dlisio\__init__.py in load(self, sets)
    103 
    104                     logging.info(duplicate.format(fingerprint))
--> 105                     if original.attic != obj.attic:
    106                         msg = problem + where
    107                         msg = msg.format(os.type, os.name, fingerprint)

TypeError: __eq__(): incompatible function arguments. The following argument types are supported:
    1. (self: dlisio.core.objref, arg0: Tuple[str, Tuple[int, int, str]]) -> bool

Invoked with: dlisio.core.objref(fingerprint=T.TOOL-I.AITM-O.35-C.0), dlisio.core.objref(fingerprint=T.TOOL-I.AITM-O.35-C.0)

This seems to be some problem on fingertip comparison. I did some tests, this problem started occurring on DLISIO 0.1.5 onwards. On DLISIO 0.1.4 I could read the same file without further problems and even read all the curves inside the frames.

.describe() for python object-classes

The generic print of objects, currently handle by BasicObject's str is messy, at best! A describe() that uses python's pritty-print or similar should be implemented on each object-type.

'utf-8' codec can't decode byte 0xb0 in position 2: invalid start byte

Hi,

I was using this library to read the dlis files. I iterate over the records one by one and extract the headers as:

    dl = dlisio.load(dlis_file_name)
    ...
    record = dl.fp.eflr(dl.bookmarks[i])

As we know, there can be multiple PARAMETER headers in a dlis file. So the first occurrence of parameter header is throwing this exception:

'utf-8' codec can't decode byte 0xb0 in position 2: invalid start byte

other occurrences of parameters have been extracted fine.

The error is in the line:

    record = dl.fp.eflr(dl.bookmarks[i])

where i is the loop variable that I used to refer to the records one by one.

I checked a few other dlis files and similar happens there also. Please point me if I am doing something wrong and let me know if you need me to provide any more information by debugging or anything else.

Thanks.

Output all channels in frame as ndarray

f.logs[:] -> np.ndarray(dtype = ['i4, f4' ...]

Essentially, be able to an ndarray of all rows (corresponding to all frames) for some object, with an intuitive syntax.

Unify code for python object linkage

Current situation:
"link" method in many objects, like Tools, Calibration, etc goes through all the objects that have to be linked and performs linkage, that is very similar for all the objects. We expect more of this in future. That leads to way too much code duplication.

Suggestion is to implement "link" method similar to "load" one and allow 'link' in baseobject perform general linking. Responsibility of children classes should be only to provide information on what should be linked to what.

Deal with ENCRYPTED attribute in FRAME

Per 5.7.1 Frame Objects Encrypted attribute is special

If attribute is absent (code 000 or just not in the template), we should consider frame data not being encrypted.
-However, mere presence of this attribute should make record encrypted. It shouldn't matter if value is never explicitly specified, or if count is set to 0, or the value itself is 0/"".

Right now we drop all absent attributes anyway and do not proceed with attribute load if value is None. Hence we don't always process the value to explicitly make it True or False.

Current tests work because we specified encrypted = False by default, not None.

Consistent model for missing or absent attributes

Current situation:

count = 0, value bit not set
f.objects[key].attic["MY_ATTRIBUTE"] == []

count = 0, value bit set (no value provided, obviously)
f.objects[key].attic["MY_ATTRIBUTE"] == None

if MY_ATTRIBUTE is marked as absent in object,
f.objects[key].attic["MY_ATTRIBUTE"] will throw KeyError

Plan is to make cases 1 & 2 both return None, but behaviour should be revisited as we need a consistent model for missing or absent attributes.

Setup CircleCi

Implement remaining rp66 objects in Python

The following object-types are not yet implemented:

Well reference
Path
Group
Process
Splice

Update dlis_index_records docs "next and explicits are optional" part

"next and explicits are optional, and can be NULL. However, unless deciding information is sourced elsewhere, it will now be impossible to distinguish information is sourced elsewhere, it will now be impossible to distinguish some cases"
It's unclear what case are need to be distinguished.

Logical file structure is lost when creating the objectpool in python

rp66 defines that within a logical file, all objects must have an unique type/name tuple (type, id, origin, copynumber), not counting redundant and replacement sets. However, the type/name tuple is not required to be unique across logical files, as these are considered to be independent of each other.

As of now, dlisio creates a single objectpool for all objects from all logical files. The objectpool stores the objects in a dict, thus loosing the order of the objects. When the order is lost, there is currently no way of determining which logical file a object belongs to. This may lead to incorrect linking where objects are linked across logical files.

Objects should be split into groups based on logical files before they are added to an objectpool. We should possibly have one objectpool for each logical file

More debugging tools/information for dealing with issues in bytes processing

Let say DLISIO throws an error on opening some file.
If user wants to dig deeper into binary and figure out where is the problem, user might appreciate more information than is currently provided.

It would be nice if user knew exact byte position where failure happened (or at least values of neighbour bytes to be able to identify failure position in file on their own).
It would be beneficial to have easy access to already processed attributes in the same set, because actual failure might happen many bytes before reported place (imagine if bytes were incorrectly interpreted as 100-bytes long indent. Failure would be thrown at least 100 bytes after the actual error occurred.)

Hence it might be worth considering to extend support for binary debug.

Add stash attribute for python objects

Starting point:
Let's say that DLIS file has an Equipment object with various attributes set. But in one of them there is a typo, "AGULAR-DRIFT" instead of "ANGULAR-DRIFT".
Current code would iterate through all the attributes in DLIS file, would reach "AGULAR-DRIFT" attribute and won't be able to map it to any attribute in the "attributes" dictionary. Hence current code would throw KeyError, which eventually will be caught and object would be transformed into Unknown object.

Suggestion is to add "stash" attribute to objects. Stash should be a dictionary which contains all the DLIS attributes that were not mapped into known ones and be populated along in the load method.
Thus custom/mistyped attributes present in official objects won't turn objects into Unknown.

Additionally Unknown object can be reconstructed:

to have all its values in stash
to have empty "attributes" field to correspond with "attributes" field in other objects (know it has totally different meaning, so some confusion).
This should simplify the code and allow us to use the same "load" method for all the objects.

Make dlis.types a class attribute to be able to patch the dlis-object with new object-types

dlis.types should be a class attribute in the same manner as the attributes dict for object classes. This will enable us to patch in new object-types that's not implemented by dlisio, in the same way as we can patch object-types with new attributes.

Ability to query frame containing values for each channel

When getting the channels from a file - it would be helpful to know which frame the channel has its curve values in. For example TDEP (depth curve) can be in multiple frames with different sampling rates. The problem is that I don't know which TDEP long description from channel.long_name matches which TDEP in the numpy array. I can't match on name (names are identical) so do you parse in a specific order so I can at least sort it out by giving the channel an index which matches the reading order of the frames?

Make dlisio pypi-deployable on travis

The deploy to pypi on unix fails because the version built ends up being something like v0.0.1aN+1-dev-hash, rather than v0.0.1aN

https://travis-ci.org/Statoil/dlisio/jobs/478279853

This is setuptools_scm reasoning to the wrong version number, probably because it ends up dirtying the source tree while building.

Simplify object construction and invariants

So now record objects live in a hazy unspecified state for quite a while, which makes them awkward to work with. Improve this situation by making various states more well defined, and rely on named constructors to read from unknown-dicts, not the raw init.

Simplify python object model

Right now the various record types are in a mixed state of unstructured global lookup-then-filter, and linking. It should be only linked, and the has-family of functions could maybe go away.

dlis_pack_size uses incorrect size for fshort

For fshort type dlis_pack_size adds 2 bytes, but dlis_fshort assumes float (4 bytes).
Test also tests for 2 bytes instead of 4.

Use intrinsics for byte-swap when available

Falling back to shift-or-shift is nice, but it's probably an improvement to use intrinsics when available.

Replace core.basic_object with a dict

Some rudimentary measurements indicates that working with a lot of pybind/core types is rather slow, meaning basic_object -> enriched object mapping is pretty slow.

These could probably be pure python dicts, populated from the extension, which could speed things up considerably.

Defining origin

A logical file can contain multiple origins, but only one defining origin. The defining origin is defined as the first origin in the first origin set of each logical file [1]. The current implementation puts all objects in a dict, hence the defining origin is lost.

[1] rp66 v1 - 5.2.1: The first Object in the first ORIGIN Set is the Defining Origin for the Logical File in which it is contained

Update the sphinx documentation

The sphinx docs could use some improments.

An introduction to the project
Some concepts of dlis cannot be abstracted away by dlisio, hence should be explained
A quick guide to get started with using dlisio. Keep an emphasis on referring to the module docstrings for more detailed documentation. It is preferable that these are the main source of documentation as they are way easier to keep up to date.
General improvements for the existing docs

Some of this have deliberately been postponed because we keep breaking the interface with new changes, but now seems a good time to at least give some sort of guide to how to use dlisio

Behaviour change between 0.1.10 and 0.1.11: Batch object has no attribute [...]

Support for multiple logical files starts with this release, as described in #73. That means dlisio.load no longer returns a file handle, it returns a tuple of file handles.

That means code like this no longer works:

with dlisio.load('.dlis') as f:
    for channel in f.channels:
        ...

because there could be more logical files present.

The smoothest change is to replace these blocks with:

with dlisio.load('.dlis') as files:
    for f in files:    
        for channel in f.channels:
            ...

and dlisio will behave as before. Since files is a tuple, comma-based unpacking also works for it.

Bad segment trims

Example:
padbytes (which is 168) >= segment.size() (which is 136)

Measure trade-off for findoffsets

Suggestion is to write some kind of tests to measure trade-off for used memory and time (number of reallocations) for findoffsets feature.
Suggested parameters:

Initial logical record size guess (currently 4K)
Growth factor (currently 1.5)

Measure:

time of indexing
how much memory we take during indexing.

Note:
Acceptable memory size during indexing of 3.5GB file is couple of 10MB. 100MB is too much.

Frame metadata

Build an interface for querying various stuff related to the frames of this particular log.

Output subset of channels in log query

f.logs(channels = ['TDEP', '*CBL'])[:], i.e. pre-filter channels before reading and parsing frames, and return it as a larger numpy array.

Various faults that makes load crash from Python

Version: 0.1.11

Faults:
-bad segment trim: padbytes (which is 232) >= segment.size() (which is 136)
-record-length in record 135571 corrupted
-invalid argument
-file truncated
-'utf-8' codec can't decode byte 0xfd in position 43: invalid start byte
-record-length in record 1 corrupted
-unexpected end-of-record

set and remove functions for objects needs tests

Better dealing with duplicated mnemonics when retrieving log values

When I try to load the curve values from a frame with repeated mnemonics, something like:

fingerprint = fr.fingerprint # A frame fingerprint
curves = f.curves(fingerprint) # Load all curves

I get the following error: ValueError: field 'SPT4' occurs more than once. SPT4 may be any duplicated mnemonic.

This kind of issue is really common with DLIS files. I suggest using the same approach from lasio, any duplicates will have ':1', ':2', ':3', etc., appended to them.

That way the API user can inspect the data and decide what which logs he really wants to work with.

I'm using DLISIO version 0.1.4, installed directly using pip.

Setup CicleCI for macOS and Windows

CircleCi now support all platforms (Linux, macOS, Windows). We should aim to move away from travis (and appveyor) and do all our building, testing and deploying from CircleCI.

This task can be devided into 3 parts:

Apply for (free) access to macOS, build and test on all supported Python versions (3.5, 3.6, 3.7)
Migrate from appveyor to Circle
Build, test and deploy python wheel. Currently we build and test wheels with multibuild [1] in travis. multibuld is originally attended for building on travis (and appveyor) but maybe it can still be used for circleci? An alternative is to checkout cibuildwheel [2], which do support circleci

[1] https://github.com/matthew-brett/multibuild
[2] https://github.com/joerick/cibuildwheel#it-didnt-work

Move attribute docstring from init to class docstring

help(class) does not pick up docstrings on the form #:. It would be beneficial to move all the attribute docstrings (in the Python implementation of rp66 object-types) from init to the class docstring, using numpy-style documentation

Performance test linking

So linking defines a new function every time.

dlisio/python/dlisio/plumbing/basicobject.py

Line 186 in 3207f16

def get_link(v):

It's likely to be a sufficiently small set of link types to make that it's ok, but it's worth trying out an alternative implementation.

equinor / dlisio Goto Github PK

dlisio's People

Contributors

Stargazers

Watchers

Forkers

dlisio's Issues

Recommend Projects

Recommend Topics

Recommend Org

Jobs