
segpy's Introduction

Segpy 2

Segpy is open source software created by Sixty North and licensed under the GNU Affero General Public License.

Alternative commercial license terms are available from Sixty North AS if you wish to redistribute Segpy as part of a proprietary closed source product, or to deliver software as a service (SaaS) using Segpy as part of a proprietary closed source service.

Status

Build status: https://travis-ci.org/sixty-north/segpy.svg?branch=master
Documentation status badge
Test coverage: https://coveralls.io/repos/github/sixty-north/segpy/badge.svg?branch=master

Installation

The segpy package is available on the Python Package Index (PyPI):

The package supports Python 3 only. To install:

$ pip install segpy

What is Segpy?

The SEG-Y file format is one of several standards developed by the Society of Exploration Geophysicists for storing geophysical seismic data. It is an open standard, and is controlled by the SEG Technical Standards Committee, a non-profit organization.

This project aims to implement an open SEG-Y module in Python 3 for transporting seismic data between SEG-Y files and Python data structures in pure Python.

Basic Usage

Here's a short example which converts non-standard little-endian SEG-Y to standard big-endian SEG-Y:

from segpy.reader import create_reader
from segpy.writer import write_segy

with open('seismic_little.sgy', 'rb') as segy_in_file:
    # The seg_y_dataset is a lazy-reader, so keep the file open throughout.
    seg_y_dataset = create_reader(segy_in_file, endian='<')  # Non-standard Rev 1 little-endian
    print(seg_y_dataset.num_traces())
    # Write the seg_y_dataset out to another file, in big-endian format
    with open('seismic_big.sgy', 'wb') as segy_out_file:
        write_segy(segy_out_file, seg_y_dataset, endian='>')  #  Standard Rev 1 big-endian

The create_reader() function creates a Dataset which lazily fetches traces from the file, which is why the file must stay open for reading throughout the use of the dataset. We override the default endian parameter to specify that the SEG-Y file we're reading is in non-standard little-endian byte order. On the last line of the example we write the Dataset out to a different file, this time with standards-compliant big-endian byte order. Note that the input file must remain open, because write_segy() requests only one trace at a time from the input dataset. Overall memory usage is therefore very low, and the program can handle arbitrarily large SEG-Y files.
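Because the dataset is lazy, traces can also be pulled one at a time for processing. Here is a short illustrative sketch; it assumes the reader's trace_indexes() and trace_samples() accessors, which appear elsewhere on this page:

from segpy.reader import create_reader

with open('seismic_little.sgy', 'rb') as segy_in_file:
    dataset = create_reader(segy_in_file, endian='<')
    for trace_index in dataset.trace_indexes():
        samples = dataset.trace_samples(trace_index)  # fetched on demand
        # ... process one trace at a time ...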

Contributing

The easiest way to contribute is to use Segpy and submit reports for defects or any other issues you come across. Please see CONTRIBUTING.rst for more details.

Development

Segpy was created by – and to meet the needs of – Sixty North. If you require additional features, improved performance, portability to earlier versions of Python, or specific defects fixed (such defects are marked 'unfunded' in the GitHub issue tracker) Sixty North's experienced Segpy maintainers may be available to perform funded development work. Enquire with Sixty North at http://sixty-north.com.

Segpy Versions

Segpy 2.0 is a complete re-imagining of a SEG-Y reader in Python 3 and represents a complete break from any and all older versions of Segpy. No attempt has been made to maintain API compatibility with earlier versions of Segpy and no code is shared across versions. Although earlier versions of Segpy were open source, they were never 'released' as such. Earlier versions of Segpy are deprecated and completely unsupported.

Development

Deployment

$ pip install -e .[dev]
$ bumpversion minor
$ python setup.py sdist bdist_wheel
$ twine upload --config-file <path>/sixty-north.pypirc dist/*
$ git push origin

segpy's People

Contributors

abingham, drmaciver, rob-smallshire, rth, t4mmi, wassname


segpy's Issues

how to get data

Hi, I want to read 3D seismic data in a Python environment, but after searching for a long time I haven't found out how. I already have the file.
[screenshot attached]

Implement IBM float in C++

This would almost certainly add a significant speed boost, and maybe some space savings as well. Of course this complicates the deployment/distribution story, so it might be good to keep the existing Python implementation around as a fall-back, i.e. for people who can't build C++ extensions.
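A minimal sketch of the fall-back arrangement suggested here. The extension module name _speedups and the function name ibm2ieee are illustrative placeholders, not segpy's actual layout:

try:
    # Prefer the compiled C++ extension when it has been built and installed.
    from segpy._speedups import ibm2ieee
except ImportError:
    # Pure-Python fall-back for users who can't build C++ extensions.
    from segpy.ibm_float import ibm2ieee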

util.minmax() can be around 25% faster

Depending on whether util.minmax() figures large in the runtime profile, it may be worth optimizing. The faster algorithm takes two elements at a time from the input sequence: first compare them to one another, then compare the smaller to the current minimum and the larger to the current maximum, updating the min/max as appropriate. This amortizes to 3 comparisons per 2 input elements rather than the current 4. However, it's a somewhat more complex algorithm, so it may not be worth the time.
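A sketch of the pairwise algorithm described above (illustrative; not the current segpy code):

def minmax(iterable):
    """Return (minimum, maximum) using ~3 comparisons per 2 elements."""
    it = iter(iterable)
    try:
        lo = hi = next(it)
    except StopIteration:
        raise ValueError("minmax() arg is an empty sequence")
    for a in it:
        b = next(it, a)      # take elements two at a time; reuse a if the count is odd
        if a > b:
            a, b = b, a      # one comparison orders the pair
        if a < lo:           # smaller element vs current minimum
            lo = a
        if b > hi:           # larger element vs current maximum
            hi = b
    return lo, hi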

Sporadic health-check failures for `test_regular_mapping`

I've seen a Hypothesis health-check failure related to slow data generation for test_catalog.TestCatalogBuilder.test_regular_mapping on a few occasions, both locally and on Travis. It doesn't happen with every build, and it looks like this:

self = <test.test_catalog.TestCatalogBuilder object at 0x110ef3cf8>

    @given(start=integers(),
>          num=integers(0, 1000),
           step=integers(-1000, 1000),
           values=data())
    def test_regular_mapping(self, start, num, step, values):
E   hypothesis.errors.FailedHealthCheck: Data generation is extremely slow: Only produced 8 valid examples in 1.07 seconds (0 invalid ones and 0 exceeded maximum size). Try decreasing size of the data you're generating (with e.g.max_size or max_leaves parameters).
E   See https://hypothesis.readthedocs.io/en/latest/healthchecks.html for more information about this. If you want to disable just this health check, add HealthCheck.too_slow to the suppress_health_check settings for this test.

test/test_catalog.py:71: FailedHealthCheck
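For reference, the suppression the error message itself suggests looks like this (a sketch against Hypothesis's settings API; the real test is a method, written here as a function with the body elided):

from hypothesis import HealthCheck, given, settings
from hypothesis.strategies import data, integers

@settings(suppress_health_check=[HealthCheck.too_slow])
@given(start=integers(),
       num=integers(0, 1000),
       step=integers(-1000, 1000),
       values=data())
def test_regular_mapping(start, num, step, values):
    ...  # body as in test_catalog.py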

Question about TRACE_HEADER_DEF and TraceIdentificationCode

I'm just perusing the code right now so I may be off-base here, but the type for "TraceIdentificationCode" in TRACE_HEADER_DEF is uint16. But in the SEGY_REVISION_1 description map you use -1 as one of the keys. Are these keys supposed to be of the same type as the "type" field in the header def? If so, it looks like there's a mismatch.

`header()` strategy produces unusable binary reel headers

It seems that the test.strategies.header() strategy can produce BinaryReelHeaders with e.g. negative values in num_samples. The field machinery is working correctly, but a negative sample count is physically impossible. I imagine other fields in this and other headers also can't hold arbitrary values from the entire range that their data type supports.

In trying to test the reader functionality, I was running into cases where header() was generating invalid headers, and I think this is the cause. I'm still wrapping my head around all of the parts, but it seems header is the (or a) culprit.

Is there a way to constrain field values? Am I using this function as expected? I was calling it like this:

binary_header = draw(header(BinaryReelHeader,
                            data_sample_format=data_sample_format))
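If header() returns an ordinary Hypothesis strategy (as the draw() usage suggests), one hedged way to constrain fields without changing the strategy itself is Hypothesis's filter() combinator; num_samples is just the field from this report:

binary_header = draw(
    header(BinaryReelHeader, data_sample_format=data_sample_format)
        .filter(lambda h: h.num_samples > 0))

Note that filtering is rejection-based, so heavy constraints could trigger the too_slow health check seen in another issue on this page.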

Unable to read shot_gather.sgy file with error: ValueError: Assigned value 43690 for num_samples

@rob-smallshire Please accept my apology if my question annoys you.
I have read your guide in #87 and tried to make use of custom_header.py, but couldn't make it work.
I'm trying to read a shot-gather.sgy file with the following script.
Could you please guide me on which number should be replaced?

Thank you in advance

from segpy.binary_reel_header import BinaryReelHeader
from segpy.datatypes import SegYType
from segpy.field_types import IntFieldMeta
from segpy.header import field
from segpy.reader import create_reader

# try to solve error below
# ValueError: Assigned value 43690 for num_samples attribute must be convertible to NNInt16: NNInt16 value 43690 outside range 0 to 32767
# read sample file from: https://github.com/sixty-north/segpy/blob/master/examples/custom_header.py

# Standard SEG-Y does not support 16-bit unsigned integer values in headers.
# This section customises SEG-Y to support them.
class UInt16(int,
            metaclass=IntFieldMeta,
            min_value=0,       # Use the full-range for unsigned
            max_value=65535,   # 16-bit integers
            seg_y_type=SegYType.NNINT16):   # The underlying NNINT16 type is actually read as an unsigned type.
    """16-bit unsigned integer."""
    pass

# Subclass the standard reel header to specialize one of its fields to have a type of UInt16.
class CustomBinaryReelHeader(BinaryReelHeader):
    num_samples = field(
        UInt16, offset=3221, default=0, documentation=
        """Number of samples per data trace. Mandatory for all types of data.
        Note: The sample interval and number of samples in the Binary File Header should be for the primary set of
        seismic data traces in the file."""
    )

# ============================================= RUN below =============================================
'''
The way I tried
- Changed `default` variable of class `CustomBinaryReelHeader` to my value(43690): Failed
- Changed `endian='<'`, got similar error with different value(59395)
'''
with open('./file/shot-gather.sgy', 'rb') as segy_in_file:
    # The seg_y_dataset is a lazy-reader, so keep the file open throughout.
    # seg_y_dataset = create_reader(segy_in_file, endian='<')  # Non-standard Rev 1 little-endian
    seg_y_dataset = create_reader(segy_in_file, binary_reel_header_format=CustomBinaryReelHeader)
    print(seg_y_dataset.num_traces())

can't read 3d seismic segy data

Hi,
I am trying to read a small 3D SEG-Y file; unfortunately segpy does not understand it to be 3D!
The dimensionality given is 0 and it does not recognise the inlines and xlines in the header.

Python program:

import argparse
import os
import sys
import traceback

import numpy as np
from segpy.reader import *
from segpy_numpy.dtypes import make_dtype

with open('TEST.sgy', 'rb') as segy:
    segy_reader = create_reader(segy)

    print("Dimensionality: ", segy_reader.dimensionality)
    print("Header: ", segy_reader.trace_header(0))
    print("Number of traces: ", segy_reader.num_traces())
    print("DATA_SAMPLE_FORMAT: ", segy_reader.data_sample_format)
    print("Filename: ", segy_reader.filename)
    print("SEG Y revision: ", segy_reader.revision)

    print("inine: ", segy_reader.num_inlines())
    print("inine: ", segy_reader.num_xlines())

The following is the error:

Dimensionality: 0
Number of traces: 7
DATA_SAMPLE_FORMAT: float32
Filename: TEST.sgy
SEG Y revision: 0
Traceback (most recent call last):
  File "test3.py", line 25, in <module>
    print("inine: ", segy_reader.num_inlines())
AttributeError: 'SegYReader' object has no attribute 'num_inlines'

The Python file used for reading and the small 3D SEG-Y file are attached for your reference.
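From the traceback, num_inlines()/num_xlines() simply don't exist on the generic/2D reader object, so a defensive pattern is to check the detected dimensionality first. A hedged sketch, assuming the 3D reader is the only class exposing these methods:

from segpy.reader import create_reader

with open('TEST.sgy', 'rb') as segy:
    segy_reader = create_reader(segy)
    if segy_reader.dimensionality == 3:
        print("inlines:", segy_reader.num_inlines())
        print("xlines:", segy_reader.num_xlines())
    else:
        print("Not detected as 3D; dimensionality =", segy_reader.dimensionality)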

Supplied test_float fails

running tests:
~/.local/bin/py.test-3.3 test
...snip...
test/test_float.py:221: in test_ceil
    self.assertEqual(math.ceil(ibm), i + 1)
E   AssertionError: -16777216 != -16777215
...snip...
test/test_float.py:221: in test_ceil
    self.assertEqual(math.ceil(ibm), i + 1)
E   AssertionError: -16777216 != -16777215
...snip...

PEP 8 compliance?

Much of the existing code is not PEP 8 compliant. To date, I've done nothing about this in case backwards API compatibility was an issue for anybody. This seems unlikely given the relatively low profile of the project so far.

Consider putting some test support code inside the segpy package

In developing an extension module related to segpy, I found I wanted to reuse some of the hypothesis strategies used in the segpy test suite. I'm just copying those strategies over to my project right now, but it would be nice if I could access those strategies from segpy itself. Would it be possible and/or a good idea to put e.g. any_ibm_compatible_floats into some module like segpy.testing?

Cannot read Seisware SEG-Y

This issue received by email from a geophysicist in Calgary:

I am a geophysicist out of Calgary and I use SeisWare. For some reason, I cannot seem to load SeisWare SEG-Y. I was hoping you might be able to point me in the right direction.

[screenshot attached]

Use plugins and named extensions for trace headers

After running segpy over all of the SEGY files we've got, I've a better sense of why we need to let users specify different trace headers in different situations. Our new, strict header definitions don't work for some of our SEGY, but slightly looser definitions would work. So I was thinking about how to let users define and select which header definition to use.

One approach that worked well for cosmic ray is plugins and named extensions. In short, each header definition would be registered (e.g. using stevedore) with a given name, and these extensions would be provided by plugins. We could package several with segpy itself: Rev1-strict, Rev1-relaxed, etc. Stevedore would allow users to provide their own if they wanted/needed.
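A hedged sketch of the stevedore-based lookup described above; the namespace 'segpy.trace_header' and the extension names are hypothetical, not an agreed design:

from stevedore import driver

def load_trace_header_format(name):
    """Load a registered trace-header definition by extension name."""
    manager = driver.DriverManager(
        namespace='segpy.trace_header',  # hypothetical entry-point group
        name=name,                       # e.g. 'rev1-strict' or 'rev1-relaxed'
        invoke_on_load=False,            # return the class itself
    )
    return manager.driver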

The other part of the puzzle would be deciding how we let them select a header. One option would be on the command line, another would be with a config file, and we could also combine these approaches if that made the most sense.

I just wanted to get this idea out there. It's worked pretty well for Cosmic Ray, so it might be appropriate for segpy as well.

Python <= 3.4 compatibility

Segpy has got to the point where supporting Python 3.4 requires compatibility hacks and effort, so 3.4 support is dropped unless somebody comes forward to fund the work or to maintain it themselves.

segy 3d read error

Dear Respected Brother,

Good afternoon. I am facing a problem reading and displaying a 3D SEG-Y file. Can you help me work out what this error is? Please.

ValueError                                Traceback (most recent call last)
~/.local/lib/python3.5/site-packages/segpy/header.py in __set__(self, instance, value)
    298         try:
--> 299             self._instance_data[instance] = self._named_field._value_type(value)
    300         except ValueError as e:

~/.local/lib/python3.5/site-packages/segpy/field_types.py in class_new(cls, *args, **kwargs)
     13                 cls.__name__, instance,
---> 14                 cls.MINIMUM, cls.MAXIMUM))
     15         return instance

ValueError: NNInt16 value 51614 outside range 0 to 32767

The above exception was the direct cause of the following exception:

ValueError                                Traceback (most recent call last)
<ipython-input> in <module>()
     19
     20 f = open(file_near, 'rb')
---> 21 segy = create_reader(f)
     22
     23 ntraces = segy.num_traces()

~/.local/lib/python3.5/site-packages/segpy/reader.py in create_reader(fh, encoding, binary_reel_header_format, trace_header_format, endian, progress, cache_directory, dimensionality)
    151     if reader is None:
    152         reader = _make_reader(fh, encoding, binary_reel_header_format, trace_header_format,
--> 153                               endian, progress_callback, dimensionality)
    154     if cache_file_path is not None:
    155         _save_reader_to_cache(reader, cache_file_path)

~/.local/lib/python3.5/site-packages/segpy/reader.py in _make_reader(fh, encoding, binary_reel_header_format, trace_header_format, endian, progress, dimensionality)
    256         encoding = ASCII
    257     textual_reel_header = read_textual_reel_header(fh, encoding)
--> 258     binary_reel_header = read_binary_reel_header(fh, binary_reel_header_format, endian=endian)
    259     extended_textual_header = read_extended_textual_headers(fh, binary_reel_header, encoding)
    260     bps = bytes_per_sample(binary_reel_header)

~/.local/lib/python3.5/site-packages/segpy/toolkit.py in read_binary_reel_header(fh, binary_reel_header_format, endian)
    180     header_packer = make_header_packer(binary_reel_header_format, endian)
    181     buffer = fh.read(binary_reel_header_format.LENGTH_IN_BYTES)
--> 182     reel_header = header_packer.unpack(buffer)
    183     return reel_header
    184

~/.local/lib/python3.5/site-packages/segpy/packer.py in unpack(self, buffer)
    198                 str(e).capitalize())) from e
    199         else:
--> 200             return self._unpack(values)
    201
    202     @abstractmethod

~/.local/lib/python3.5/site-packages/segpy/packer.py in _unpack(self, values)
    222
    223     def _unpack(self, values):
--> 224         return self._header_format_class(*values)
    225
    226

~/.local/lib/python3.5/site-packages/segpy/header.py in __init__(self, *args, **kwargs)
     29         """
     30         for keyword, arg in zip(self.ordered_field_names(), args):
---> 31             setattr(self, keyword, arg)
     32
     33         for keyword, arg in kwargs.items():

~/.local/lib/python3.5/site-packages/segpy/header.py in __set__(self, instance, value)
    300         except ValueError as e:
    301             raise ValueError("Assigned value {!r} for {} attribute must be convertible to {}: {}"
--> 302                              .format(value, self._name, self._named_field._value_type.__name__, e)) from e
    303
    304     def __delete__(self, instance):

ValueError: Assigned value 51614 for ensemble_fold attribute must be convertible to NNInt16: NNInt16 value 51614 outside range 0 to 32767

Please, can anyone help me to resolve this?

Looking forward.

Confusion about dimensionality detection heuristic

I'm trying to write a test for create_reader's ability to guess the dimensionality of a dataset, but I'm confused about how or if the heuristic for 2D works.

The heuristic says "if there's a cdp_catalog but no line_catalog, then the dimensionality is 2". However, looking at toolkit.catalog_traces(), it seems we can't ever actually create that situation. In short, in every case where we can create a cdp_catalog, we're also able to create (at least) an alt_line_catalog, and this alt_line_catalog will get used as the line catalog; i.e. a valid cdp_catalog implies a valid line_catalog.

It's entirely possible that I'm missing some important point about catalogs, so let me know if that's the case.

Initial read and subsequently reading headers is slow

Hi there,

I am fairly new to Python and even more so to Segpy.
I am using Segpy on a VSP dataset rather than surface seismic, and I find the initial loading fairly slow (I can't see any reason that VSP rather than surface seismic would make much of a difference):
roughly 230 seconds for a 600 MB file, compared to 30 seconds with ObsPy.

Once loaded, I find reading one specific header across all traces just as slow.
If I wanted, for example, to plot the source X and Y coordinates of all traces in the SEG-Y, I would pull the X coordinates with the following code:
np.array([segy_reader.trace_header(trace_index).source_x for trace_index in segy_reader.trace_indexes()])

Again, this takes not far from 230 seconds, when I would have imagined that the headers should be in memory and we wouldn't need to load them again (with ObsPy it's almost instant).

I like Segpy a lot because it's much faster than ObsPy at reading the actual samples of a trace, and it seems much easier to modify the samples and then save them, although I haven't tried that yet.

I would appreciate any help, as I'm planning to upscale the use of this code from SEG-Y files with about ~300,000 traces to millions.

Fix invalid license change in `segpy-lite`

Since the segpy-lite fork has no issue tracker, I'm putting this issue in here, in the upstream repo.

In commit d8132ed on the segpy-lite fork of segpy, @whimian changed the license of the fork from AGPL to MIT. This appears to be in violation of section 5 of the AGPL, and we need the license to be changed back to AGPL.

@whimian Was this change made in order to facilitate inclusion in anaconda? We would like to support the inclusion of segpy in anaconda, but we can't allow the license to be changed like this.

"segpy help metadata" output not report expected

The output from --help suggests that I can request help from subcommands:

$ segpy --help
segpy

Use segpy to read data about SEGY files.

Usage: segpy [options] <command> [<args> ...]

Options:
  -h --help     Show this screen.

Available commands:
  metadata

See 'segpy help <command>' for help on specific commands.

However, this doesn't appear to yield anything useful:

$ segpy help metadata
usage: segpy help [<command>]
    Get the top-level help, or help for <command> if specified.

Cannot install the package?

To whom it may concern:

I am new to Python and currently working on a project which needs to read SEG-Y data. I am currently using Anaconda on Windows.
I tried two ways to install it:
1. Download the package, unzip the segy.zip file and run: python setup.py install
2. pip install git+https://github.com/sixty-north/segpy.git
After installation, both attempts give me the following message:
.................................................................................
requirement already satisfied

But the segpy package is not actually installed; I cannot find it in my Anaconda environment. Did I do something wrong?

Best,

Youli

Test that C++ extension is used when available

It would be nice to have a test verifying that the C++ extension module is used when requested. I think we should be able to use unittest.mock to do this. We would create a mock for the extension functions and then check that they were called when expected. They wouldn't have to actually do anything - and in fact I think we wouldn't even need the extension installed - but we'd only want to check that they were accessed appropriately.
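A minimal, self-contained sketch of the unittest.mock approach described above. The dispatcher and the fast_unpack name are illustrative stand-ins, not segpy's real structure:

from unittest import mock

def unpack_samples(buffer, accel):
    """Toy dispatcher standing in for segpy's un/pack entry point."""
    return accel.fast_unpack(buffer)

def test_accelerated_routine_is_used():
    accel = mock.Mock()
    accel.fast_unpack.return_value = [1.0, 2.0]
    result = unpack_samples(b'\x00\x01', accel)
    accel.fast_unpack.assert_called_once_with(b'\x00\x01')
    assert result == [1.0, 2.0]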

`_cdp_catalog` can be `None` in `SegYReader2D.cdp_numbers`.

In SegYReader2D.cdp_numbers, it's possible for self._cdp_catalog to be None. In this case, the call to self._cdp_catalog.keys() fails with an AttributeError.

It seems to be legitimate that _cdp_catalog is None since CatalogBuilder.create documents that it can return None. By that same token, any catalog might be None, so perhaps we need to check for other places where null catalogs are being dereferenced.
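One hedged shape for the fix in SegYReader2D.cdp_numbers, written here as a free function for illustration:

def cdp_numbers(cdp_catalog):
    """Return CDP numbers, tolerating a catalog that was never built."""
    if cdp_catalog is None:
        return iter(())  # or raise a descriptive, documented exception
    return cdp_catalog.keys()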

install fails on python 2

I git-cloned the code and built it successfully. But when I ran 'python setup.py install --user', I got errors:
========= errors ====================
File "build/bdist.linux-x86_64/egg/segpy/binary_reel_header.py", line 5
class BinaryReelHeader(metaclass=FormatMeta):
^
SyntaxError: invalid syntax
......
byte-compiling build/bdist.linux-x86_64/egg/segpy/toolkit.py to toolkit.pyc
File "build/bdist.linux-x86_64/egg/segpy/toolkit.py", line 93
raise ValueError("Could not decode data sample format {!r}".format(dsf)) from e
^
SyntaxError: invalid syntax

byte-compiling build/bdist.linux-x86_64/egg/segpy/datatypes.py to datatypes.pyc
byte-compiling build/bdist.linux-x86_64/egg/segpy/reader.py to reader.pyc
File "build/bdist.linux-x86_64/egg/segpy/reader.py", line 801
print(self._character * (required - existing), end='')
^
SyntaxError: invalid syntax

byte-compiling build/bdist.linux-x86_64/egg/segpy/__init__.py to __init__.pyc
byte-compiling build/bdist.linux-x86_64/egg/segpy/textual_reel_header.py to textual_reel_header.pyc
byte-compiling build/bdist.linux-x86_64/egg/segpy/util.py to util.pyc
File "build/bdist.linux-x86_64/egg/segpy/util.py", line 24
yield from zip(a, b)
^
SyntaxError: invalid syntax

byte-compiling build/bdist.linux-x86_64/egg/segpy/trace_header.py to trace_header.pyc
File "build/bdist.linux-x86_64/egg/segpy/trace_header.py", line 5
class TraceHeaderRev0(metaclass=FormatMeta):
^
SyntaxError: invalid syntax

byte-compiling build/bdist.linux-x86_64/egg/segpy/header.py to header.pyc
File "build/bdist.linux-x86_64/egg/segpy/header.py", line 36
.format(keyword, self.__class__.__name__)) from e
^
SyntaxError: invalid syntax

byte-compiling build/bdist.linux-x86_64/egg/segpy/types.py to types.pyc
byte-compiling build/bdist.linux-x86_64/egg/segpy/catalog.py to catalog.pyc
File "build/bdist.linux-x86_64/egg/segpy/catalog.py", line 271
yield from ((i, j) for i in self._i_range for j in self._j_range)
^
SyntaxError: invalid syntax

byte-compiling build/bdist.linux-x86_64/egg/segpy/dataset.py to dataset.pyc
File "build/bdist.linux-x86_64/egg/segpy/dataset.py", line 4
class Dataset(metaclass=ABCMeta):
^
SyntaxError: invalid syntax

byte-compiling build/bdist.linux-x86_64/egg/segpy/writer.py to writer.pyc
creating build/bdist.linux-x86_64/egg/EGG-INFO
copying segpy.egg-info/PKG-INFO -> build/bdist.linux-x86_64/egg/EGG-INFO
copying segpy.egg-info/SOURCES.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying segpy.egg-info/dependency_links.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying segpy.egg-info/entry_points.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying segpy.egg-info/requires.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying segpy.egg-info/top_level.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
zip_safe flag not set; analyzing archive contents...
creating 'dist/segpy-2.0.0a3-py2.7.egg' and adding 'build/bdist.linux-x86_64/egg' to it
removing 'build/bdist.linux-x86_64/egg' (and everything under it)
Processing segpy-2.0.0a3-py2.7.egg
Removing /home/weiliu/.local/lib/python2.7/site-packages/segpy-2.0.0a3-py2.7.egg
Copying segpy-2.0.0a3-py2.7.egg to /home/weiliu/.local/lib/python2.7/site-packages
segpy 2.0.0a3 is already the active version in easy-install.pth
============ end of errors ================
I'm using Python 2.7.6 on Ubuntu 14.04 64-bit. Is there anything I can try?

It would be great to have a working Python library for SEG-Y data.

Proposal: switch to pytest for tests

While I'm largely of the opinion at this point that pytest is "just better", that wouldn't be enough reason to switch over. However, I've run into a concrete issue that seems like a good motivation. I would like to be able to run some of the tests using both the Python and (if available) C++ versions of the un/pack routines. The obvious way to do this is with test parameterization.

Unfortunately, test parameterization via subTest() wasn't added to unittest until Python 3.4, so our Travis test suite is failing on 3.3.

We have a few alternatives, including:

  1. Use a non-parameterized approach, i.e. just brute forcing things a bit
  2. Use the backport package unittest2
  3. Use pytest

All of these will work, but ultimately I think I'd prefer just switching to pytest.
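A hedged sketch of what the pytest parameterization could look like; segpy_cpp is a hypothetical extension module name, used only for illustration:

import pytest

IMPLEMENTATIONS = ['python']
try:
    import segpy_cpp  # hypothetical C++ extension module
    IMPLEMENTATIONS.append('cpp')
except ImportError:
    pass

@pytest.mark.parametrize('impl', IMPLEMENTATIONS)
def test_unpack_roundtrip(impl):
    ...  # exercise the Python or C++ un/pack routine selected by impl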

Make numpy an optional dependency

While I see the point of producing a module that depends only on the standard library, in practice I'm sure that the vast majority of use cases consist of something like: read SEG-Y and return some numpy arrays, or, given some numpy arrays, write the data to SEG-Y. There is just no way of doing anything remotely related to science or data analysis in Python without numpy.
In terms of interface, one would then want something like:

from segpy.reader import create_reader

with open(filename, 'rb') as fh:
    segy_reader = create_reader(fh)

    for trace_index in segy_reader.trace_indexes():
        segy_reader.trace_samples_numpy(trace_index)  # would return a numpy ndarray

This could easily be achieved by adding a couple of methods to SegYReader; however, it is made more difficult by numpy-related things being exported to segpy_ext. Of course, one could do,

try:
    import segpy.ext
except ImportError:
    print('Warning: numpy not available')
except:
    raise

in the corresponding method, but then one could do the same thing with numpy and just add it as an optional dependency of the segpy module.

My point is that the fact that we can't use any numpy functions in the main segpy code (even if the function in question is never run when numpy is not present) makes things much more difficult.
What's your opinion on this?

Can read ibm float but not write it

This library is very useful. One problem I have run into is that it can read IBM float (by converting to IEEE) but not write it. Different ways to deal with this would be:

  1. Write to ieee, by changing cformat and also the reel_header
  2. Implement ieee2ibm and write it that way

The workaround I am using is adding:

    # Write Data. Cannot write ibm float so use ieee instead.
    if ctype == 'ibm':
        ctype='f'

to line 398 in segypy.py as a dirty hack, as I haven't managed to work out how to correctly change the reel header.
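For option 2, the core ieee2ibm conversion is compact. Here is a hedged, unoptimized sketch of packing a Python float into 4-byte big-endian IBM single precision (no rounding, and exponent range checks are omitted for brevity):

import struct

def ieee2ibm(value):
    """Pack a Python float as 4-byte big-endian IBM single precision."""
    if value == 0.0:
        return b'\x00\x00\x00\x00'
    sign = 0x80 if value < 0 else 0x00
    mantissa = abs(value)
    exponent = 64  # IBM exponent is base 16 with a bias of 64
    while mantissa >= 1.0:
        mantissa /= 16.0
        exponent += 1
    while mantissa < 0.0625:  # normalize so 1/16 <= mantissa < 1
        mantissa *= 16.0
        exponent -= 1
    fraction = int(mantissa * 0x1000000)  # 24-bit fraction
    return struct.pack('>BBH', sign | exponent,
                       (fraction >> 16) & 0xFF, fraction & 0xFFFF)

As a spot check, ieee2ibm(1.0) gives b'\x41\x10\x00\x00' and ieee2ibm(-118.625) gives b'\xc2\x76\xa0\x00', the textbook IBM encodings of those values.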

Datatypes

Should the keys in LIMITS and PY_TYPES in datatypes.py be 'ibm' and not 'ibm:' ?

Rewrite does not produce identical file

I'm trying to read a SEG-Y files, write it back and read back the result in order to check the consistency with the original file. No errors are raised, and according to segpy, the data and the header contents are identical in both files (tested with this script, however the resulting binary has some small differences, as illustrated on the image below,
untitled
this happens periodically I think for every trace, and the string 0xf4252414b in the original file is allways replaced by zeros 0x000000000. My guess is that a field in the trace header is not being written back as it should, although I have not been able to figure out the exact origin of this problem.

Also, I think segpy.header.are_equal does not check all the fields, since it relies on self.ordered_field_names, and that function seems to return only a subset of all the fields in my case (although I might be mistaken and/or have missed something).

Note: the script above requires the PR #20 , that fixes an issue in segpy/writer.py.

Packaging

Segpy should be properly packaged and uploaded to PyPI.

Automated tests

It's hard to work on this code confidently in the absence of automated unit tests. We need tests, and we need test data in the form of real-world SEG Y files, although we should also check that we can round-trip data through the writer and reader losslessly.
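A hedged sketch of the round-trip check suggested above. It assumes the reader's trace_indexes() and trace_samples() accessors, and compares decoded traces rather than raw bytes, since another issue on this page shows byte-identical output is not guaranteed:

from segpy.reader import create_reader
from segpy.writer import write_segy

def assert_roundtrip(in_path, out_path):
    # Copy in_path to out_path via segpy...
    with open(in_path, 'rb') as fin, open(out_path, 'wb') as fout:
        write_segy(fout, create_reader(fin))
    # ...then re-read both files and compare the decoded traces.
    with open(in_path, 'rb') as a, open(out_path, 'rb') as b:
        r1, r2 = create_reader(a), create_reader(b)
        assert r1.num_traces() == r2.num_traces()
        for i in r1.trace_indexes():
            assert r1.trace_samples(i) == r2.trace_samples(i)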

Python 2 Compatibility

I have seen a lot of recent changes, including bringing the rewrite branch into master.
Where are things currently with Python 2.7 and 3 support? Should we expect the code to still work on both, or have you moved to supporting 3 only?

I note the use of zip_longest over izip_longest, for example, with no attempt at backwards compatibility.

Remove the GUI

It's poor separation of concerns to include a GUI in what should be a straightforward SEG Y loading / saving package.

I propose to remove the GUI.

Effect of precision on stride calculation and catalog construction

A lot of the savings from e.g. LinearCatalog comes from detecting regular strides between values. But if the values in a catalog are floating-point then we need to think about precision problems when calculating the stride. As things stand, it looks like we'll never find a regular stride between floating-point values (we're just using pairwise subtraction). Maybe we never want to lose precision, or maybe we'll never see floating-point values. I'm really not sure, but I thought I'd flag the issue.
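One hedged approach to the precision problem described above: treat strides as regular if successive differences agree within a tolerance, rather than requiring exact equality (illustrative, not segpy's current check):

import math

def regular_stride(values, rel_tol=1e-9, abs_tol=1e-12):
    """Return the common stride if all successive differences agree
    within tolerance, else None."""
    diffs = [b - a for a, b in zip(values, values[1:])]
    if not diffs:
        return None
    first = diffs[0]
    if all(math.isclose(d, first, rel_tol=rel_tol, abs_tol=abs_tol) for d in diffs):
        return first
    return None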

Progress bar

Hi there,

Is the progress bar really implemented for the reader and writer?
I've looked at the code and there doesn't seem to be much in there.

Thanks.

How to process the massive data?

ValueError: Assigned value 40967 for data_traces_per_ensemble attribute must be convertible to NNInt16: NNInt16 value 40967 outside range 0 to 32767

Is this error caused by the size of the data?
How can I overcome it and read the data correctly?

catalog_traces seems to have a memory leak

When I read a large SEG-Y file (~130 GB), I found that Python's memory usage keeps increasing (observed with the top command), even during the catalog_traces stage. It reached 30 GB before I killed the job.

From the code, catalog_traces reads each trace from the SEG-Y data and then constructs a few Python data structures for inline/xline numbers etc. I don't have experience profiling Python memory usage, so I was stuck here.

I can provide more information if it helps.
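As a starting point for the investigation, the standard library's tracemalloc can locate the top allocation sites (a hedged sketch, not a diagnosis):

import tracemalloc

tracemalloc.start()
# ... open the file and run create_reader(...) up to the catalog stage ...
snapshot = tracemalloc.take_snapshot()
for stat in snapshot.statistics('lineno')[:10]:
    print(stat)  # top allocation sites by source line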

AGPL

Why use the Affero GPL? This seems a toxic choice to me, with several companies banning its use, including the one I work at.

Is it possible to change the license?

Or alternatively, how much is a commercial license, and does that remove the AGPL restrictions?

Improve test coverage

We could use better test coverage. I'd like to use this issue as a place to keep track of what we need to do. At the very least I'd like a list of areas that could use some work. We could also use this as a meta-issue and link it to separate issues for specific testing efforts.

Small error in segpy-numpy

I believe line 7 of extract.py in segpy-numpy should be:
from segpy_numpy.util import ensure_superset

and not

from segpy.util import ensure_superset
