ucberkeleyseti / blimpy

Breakthrough Listen I/O Methods for Python

Home Page: https://blimpy.readthedocs.io

License: BSD 3-Clause "New" or "Revised" License

Python 59.12% Shell 0.24% Jupyter Notebook 37.62% Dockerfile 0.10% TeX 2.92%

blimpy's Issues

.fil files imported with Filterbank() unable to produce sliced waterfall plots

Using blimpy cloned from the Git repository from May 15th, 2018 (1.19?):

The plot_waterfall() method used on .fil files imported with the Filterbank() method in blimpy is only able to produce waterfall plots that use the full, original frequency range of the imported filterbank file. This can be seen in cell 7 of a recreated voyager tutorial notebook and cell 3 of a recreated filterbank tutorial notebook where the waterfall with the full spectrum is replotted, instead of the requested slices (although the x-axes do update to have the appropriate labels).

Python3 support

J. Wright got filterbank.py working in Python 3, but it required some changes due to how Py3 handles strings. Here's the diff:

151c151
<     f = open(filename, 'r')
---
>     f = open(filename, 'rb')
154c154
<     header_str = ''
---
>     header_str = b""
159,160c159,160
<         if 'HEADER_START' in header_sub:
<             idx_start = header_sub.index('HEADER_START') + len('HEADER_START')
---
>         if b"HEADER_START" in header_sub:
>             idx_start = header_sub.index(b"HEADER_START") + len(b"HEADER_START")
163c163
<         if 'HEADER_END' in header_sub:
---
>         if b"HEADER_END" in header_sub:
165c165
<             idx_end = header_sub.index('HEADER_END')
---
>             idx_end = header_sub.index(b"HEADER_END")
181c181
<     f = open(filename, 'r')
---
>     f = open(filename, 'rb')
188,189c188,189
<         if 'HEADER_END' in header_sub:
<             idx_end = header_sub.index('HEADER_END') + len('HEADER_END')
---
>         if b"HEADER_END" in header_sub:
>             idx_end = header_sub.index(b"HEADER_END") + len(b"HEADER_END")
212c212,213
<     for keyword in header_keyword_types.keys():
---
>     for keyword_str in list(header_keyword_types.keys()):
>         keyword = keyword_str.encode("utf-8")
214c215
<             dtype = header_keyword_types.get(keyword, 'str')
---
>             dtype = header_keyword_types.get(keyword_str, 'str')
216c217
<             dtype = header_keyword_types[keyword]
---
>             dtype = header_keyword_types[keyword_str]
219c220
<                 header_dict[keyword] = val
---
>                 header_dict[keyword_str] = val
222c223
<                 header_dict[keyword] = val
---
>                 header_dict[keyword_str] = val
226c227
<                 header_dict[keyword] = str_val
---
>                 header_dict[keyword_str] = str_val.decode("utf-8")
230,231c231
<                 
<                 if keyword == 'src_raj':
---
>                 if keyword_str == 'src_raj':
235c235
<                 header_dict[keyword] = val                
---
>                 header_dict[keyword_str] = val                
312,313c312,313
<             print dd.shape
<             print n_ints, n_ifs, n_chans
---
>             print(dd.shape)
>             print(n_ints, n_ifs, n_chans)
334c334
<         for key, val in self.header.items():
---
>         for key, val in list(self.header.items()):
339c339
<             print "%16s : %32s" % (key, val)
---
>             print("%16s : %32s" % (key, val))
431c431
<         print plot_data.shape, plot_f.shape
---
>         print(plot_data.shape, plot_f.shape)
487c487
<             print "Start freq: %2.2f" % args.f_start
---
>             print("Start freq: %2.2f" % args.f_start)
491c491
<             print "Stop freq: %2.2f" % args.f_stop
---
>             print("Stop freq: %2.2f" % args.f_stop)
494,495c494,495
<         print "Error: Start and stop frequencies must lie inside file's frequency range."
<         print "i.e. between %2.2f-%2.2f MHz." % (fil.freqs[0], fil.freqs[-1])
---
>         print("Error: Start and stop frequencies must lie inside file's frequency range.")
>         print("i.e. between %2.2f-%2.2f MHz." % (fil.freqs[0], fil.freqs[-1]))

Probably requires some Py2/Py3 conditionals in the code
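The core of the diff above is to open the file in binary mode and decode each keyword once, at the point it is read. A minimal sketch of that idea, assuming the usual sigproc layout of a little-endian int32 length followed by that many ASCII bytes; `read_sigproc_string` and `encode_sigproc_string` are hypothetical names, not blimpy functions:

```python
import io
import struct


def encode_sigproc_string(text):
    """Hypothetical helper: int32 length prefix (little-endian assumed) + ASCII bytes."""
    data = text.encode("ascii")
    return struct.pack("<i", len(data)) + data


def read_sigproc_string(fh):
    """Hypothetical helper: read one length-prefixed string from a file opened in 'rb' mode.

    Decoding happens here, at the I/O boundary, so callers only ever see str
    and can compare against plain 'HEADER_START' / 'HEADER_END' literals.
    """
    raw_len = fh.read(4)
    if len(raw_len) < 4:
        raise EOFError("file ended inside the header")
    (n,) = struct.unpack("<i", raw_len)
    raw = fh.read(n)
    if len(raw) < n:
        raise EOFError("file ended inside a header string")
    return raw.decode("ascii")


buf = io.BytesIO(encode_sigproc_string("HEADER_START") + encode_sigproc_string("nchans"))
print(read_sigproc_string(buf))  # HEADER_START
print(read_sigproc_string(buf))  # nchans
```

On Python 3 this removes the need for parallel `keyword`/`keyword_str` variables; on Python 2, `decode()` returns `unicode` rather than `str`, so a shim such as `six` would still be needed there.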

requantization support

Support for requantization of data into different datatypes.
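As a starting point, requantization can be a linear rescale onto the target integer range. A sketch assuming a single global min/max scale; `requantize` is a hypothetical helper, not part of blimpy:

```python
import numpy as np


def requantize(data, dtype=np.uint8):
    """Hypothetical helper: map data.min()..data.max() linearly onto the range of an integer dtype."""
    info = np.iinfo(dtype)
    data = np.asarray(data, dtype=np.float64)
    lo, hi = data.min(), data.max()
    if hi == lo:
        # Constant input: nothing to scale, return the dtype's minimum everywhere
        return np.full(data.shape, info.min, dtype=dtype)
    scaled = (data - lo) / (hi - lo) * (info.max - info.min) + info.min
    return np.round(scaled).astype(dtype)


spectrum = np.array([0.0, 2.5, 10.0], dtype=np.float32)
print(requantize(spectrum).tolist())            # [0, 64, 255]
print(requantize(spectrum, np.int16).dtype)     # int16
```

A global scale is the simplest choice; per-channel scaling, or clipping a few bright outlier channels first, would likely preserve more dynamic range for typical spectra.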

confirm if unicode keys are ever populated in the header dictionary

In response to pull request UCBerkeleySETI/turbo_seti#10, which has to do extra checks for unicode keywords in the header dictionary.

I think it is probably this function:
https://github.com/UCBerkeleySETI/blimpy/blob/master/blimpy/file_wrapper.py#L383

if six.PY3:
    key = bytes(key, 'ascii')

could be changed to:

if six.PY2:
    key = str(key)
else:
    key = bytes(key, 'ascii')

Why do Filterbank data arrays contain an empty dimension?

This is a very minor inconvenience but I have been wondering about it for a while: why is it that extracting the data array from a Filterbank object with self.data yields a numpy array containing an extra empty dimension?

For example, taking a file with 273 integrations and 65536 channels (typical values for files ending in '0002.fil'), turning it into a Filterbank object, and running self.data yields an array of shape (273, 1, 65536). Why the extra dimension instead of just being (273, 65536)?

I also notice that this does not happen when using the grab_data function. Is there a particular reason for this?
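The Python 3 diff above prints `n_ints, n_ifs, n_chans`, which suggests the middle axis is the IF (polarization) axis, of length 1 for single-IF files. If so, it can be dropped explicitly. A sketch using a small stand-in array rather than a real Filterbank object:

```python
import numpy as np

# Stand-in for Filterbank.data with shape (n_ints, n_ifs, n_chans) and a single IF
data = np.zeros((273, 1, 16), dtype=np.float32)

# Option 1: pick IF 0 explicitly (still valid if a file ever has n_ifs > 1)
spectra = data[:, 0, :]

# Option 2: drop axis 1 only; np.squeeze raises ValueError if that axis is not length 1
spectra_sq = np.squeeze(data, axis=1)

print(spectra.shape, spectra_sq.shape)  # (273, 16) (273, 16)
```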

installation weirdness?

I'm getting this issue with umran's code.

/usr/local/lib/python2.7/dist-packages/astropy/utils/introspection.py:153: UserWarning: Module blimpy was already imported from /opt/pyve/woodpy/norwegian/software/blimpy/blimpy/__init__.py, but /opt/pyve/woodpy/lib/python2.7/site-packages/blimpy-1.1.0-py2.7.egg is being added to sys.path
from pkg_resources import parse_version

Is time axis flipped in plot_all()?

In the attached, I reckon the burst in the waterfall should line up with the burst in the time vs power plot. So the time vs power on the RHS may need to be flipped?

Importing filterbank breaks matplotlib throughout entire script

All of a sudden, using the latest version of filterbank.py, I am noticing that, when not using X-forwarding, importing the filterbank module (or a class or function therein) messes up the matplotlib backend, resulting in a $DISPLAY error.

(This may seem similar to a previous issue regarding $DISPLAY errors; note that the previous issue occurred when filterbank.py was the main program, whereas this issue occurs when importing it as a module.)

The problem occurs regardless of whether or not the filterbank module is actually used in one's script, and it occurs even if one attempts to forcibly set a matplotlib backend after importing filterbank.

To see what I mean, try running the following code on either the BL head node or storage node, without X-forwarding:

import filterbank
import random

import matplotlib
matplotlib.use('pdf') #forcibly defining the backend does not fix the problem
import matplotlib.pyplot as plt

x = random.sample(range(1,100), 10)
y = random.sample(range(1,100), 10)
plt.plot(x,y)
plt.savefig('myplot.pdf', format='pdf')

Even though the filterbank module is used nowhere in the script, the import filterbank line breaks matplotlib.

Comment out the import filterbank line and you will notice that the script works as expected.

Your help is greatly appreciated.

not all install dependencies listed

8-bit data should be read as uint8

Describe the bug
Currently data are read as int8 instead of uint8.

To Reproduce
Steps to reproduce the behavior:
Download http://www.stevecroft.co.uk/spliced_blc0001020304050607101112131415161720212223242526273031323334353637_guppi_58001_23486_3C48_0006.gpuspec.8.0001.fil
Run and plot.

Expected behavior
Data should be returned as uint8
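The difference is easy to see on raw bytes. A sketch with numpy.frombuffer (a file-based reader would pass the same dtype argument to numpy.fromfile):

```python
import numpy as np

raw = bytes([0, 1, 127, 128, 200, 255])  # example 8-bit samples

as_int8 = np.frombuffer(raw, dtype=np.int8)
as_uint8 = np.frombuffer(raw, dtype=np.uint8)

# int8 wraps every byte >= 128 to a negative value (128 -> -128, 200 -> -56, 255 -> -1),
# while uint8 keeps the 0..255 range this issue expects for 8-bit samples.
print(as_int8.tolist())   # [0, 1, 127, -128, -56, -1]
print(as_uint8.tolist())  # [0, 1, 127, 128, 200, 255]
```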


python3 issue: _d_type.

We got a python3 issue. From Travis:

        blob_start = self._find_blob_start()
>       blob = np.zeros(updated_blob_dim,dtype=self._d_type)
E       TypeError: 'numpy.float64' object cannot be interpreted as an integer
../../../../virtualenv/python3.5.6/lib/python3.5/site-packages/blimpy/file_wrapper.py:653: TypeError

This to me sounds related to the implementation of _d_type?
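The message itself points at the shape rather than the dtype: numpy raises `'numpy.float64' object cannot be interpreted as an integer` when a dimension passed to np.zeros is a float, which is what `/` produces in Python 3 (Python 2 integer division returned ints). So `updated_blob_dim` may be worth checking alongside `_d_type`. A sketch with a hypothetical shape helper:

```python
import numpy as np


def blob_shape(n_bytes_total, n_ifs, n_chans, n_bits):
    """Hypothetical helper: integer (n_ints, n_ifs, n_chans) shape for a data block."""
    # Header values may arrive as floats (e.g. numpy.float64); cast explicitly and
    # use floor division, because `/` always returns a float on Python 3.
    n_ifs, n_chans, n_bits = int(n_ifs), int(n_chans), int(n_bits)
    bytes_per_sample = n_bits // 8
    n_ints = int(n_bytes_total) // (n_ifs * n_chans * bytes_per_sample)
    return (n_ints, n_ifs, n_chans)


shape = blob_shape(np.float64(4096), np.float64(1), 64, 32)
blob = np.zeros(shape, dtype=np.float32)
print(shape)  # (16, 1, 64)
```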

Add support for 2-bit filterbanks

2-bit is used by HTRU archival data. Currently only 8-bit, 16-bit and 32-bit are supported. We have some code to unpack 2-bit data to a numpy datatype:
https://github.com/UCBerkeleySETI/blimpy/blob/master/blimpy/utils.py#L45

But this currently isn't actually called anywhere. Adding it in should be relatively straightforward.
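The unpacking itself is a few vectorised bit operations. A sketch of what such a step could look like; the function name and the bit order within each byte are assumptions, not the behaviour of blimpy's utils.py:

```python
import numpy as np


def unpack_2bit(packed, msb_first=True):
    """Hypothetical helper: expand each byte into four 2-bit samples (values 0..3) as uint8.

    msb_first=True assumes the first sample sits in the two most significant bits;
    the real ordering must be checked against known data.
    """
    packed = np.asarray(packed, dtype=np.uint8).ravel()
    shifts = np.array([6, 4, 2, 0] if msb_first else [0, 2, 4, 6], dtype=np.uint8)
    # Broadcast: (n_bytes, 1) >> (4,) -> (n_bytes, 4), then keep the low two bits
    samples = (packed[:, None] >> shifts) & np.uint8(0b11)
    return samples.reshape(-1)


print(unpack_2bit([0b00011011]).tolist())                   # [0, 1, 2, 3]
print(unpack_2bit([0b00011011], msb_first=False).tolist())  # [3, 2, 1, 0]
```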

Rewrite GuppiRaw() reader to not raise EndOfFileError and use Context Management

Using blimpy cloned from the Git repository from May 15th, 2018 (1.19?):

The GuppiRaw() method in blimpy is unable to import any raw file I have downloaded from the open data search website. Different .raw files produce different errors based on the size of the raw file it is trying to import (the larger files cannot be reshaped, while the smaller file reaches an end of file error).

It may be the case that the chosen files are special and others do work with the method, but it is not clear to me that this is the case.

This can be seen in the following notebook:
https://github.com/christiangil/SETI-Final/blob/master/GuppiRaw_Testing.ipynb
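A minimal sketch of the iterator-plus-context-manager shape the title asks for, assuming GUPPI raw blocks are a header of 80-character `KEY = value` cards terminated by an `END` card, followed by `BLOCSIZE` bytes of data. It ignores DIRECTIO padding and does no reshaping; `GuppiBlockReader` is a hypothetical name, not blimpy's GuppiRaw API:

```python
import io

CARD_LEN = 80  # each header card is 80 ASCII characters, FITS style


def _read_header(fh):
    """Read cards up to END; return a dict, or None at a clean end of file."""
    header = {}
    while True:
        card = fh.read(CARD_LEN)
        if not card:
            if header:
                raise ValueError("file ended inside a header")
            return None  # clean EOF between blocks: signal it, don't raise
        if len(card) < CARD_LEN:
            raise ValueError("truncated header card")
        text = card.decode("ascii")
        key = text[:8].strip()
        if key == "END":
            return header
        # "KEY     = value": the value follows the '= ' indicator in columns 9-10
        header[key] = text[10:].strip().strip("'").strip()


class GuppiBlockReader(object):
    """Hypothetical reader yielding (header, raw_bytes) per block."""

    def __init__(self, source):
        # Accept a path (opened and owned here) or an already-open binary file
        if isinstance(source, str):
            self._fh, self._owns_fh = open(source, "rb"), True
        else:
            self._fh, self._owns_fh = source, False

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False  # propagate any exception raised inside the with-block

    def close(self):
        if self._owns_fh:
            self._fh.close()

    def __iter__(self):
        while True:
            header = _read_header(self._fh)
            if header is None:
                return  # end of file ends iteration; no EndOfFileError
            blocsize = int(header["BLOCSIZE"])
            data = self._fh.read(blocsize)
            if len(data) < blocsize:
                return  # assumption: silently stop at a truncated final block
            yield header, data


def make_block(payload):
    """Build one synthetic block for testing: BLOCSIZE card, END card, payload."""
    cards = ["BLOCSIZE= %d" % len(payload), "END"]
    return b"".join(c.ljust(CARD_LEN).encode("ascii") for c in cards) + payload


raw = io.BytesIO(make_block(b"\x01\x02\x03\x04") + make_block(b"\x05\x06"))
with GuppiBlockReader(raw) as reader:
    for header, data in reader:
        print(header["BLOCSIZE"], len(data))  # "4 4", then "2 2"
```

Because iteration simply stops at end of file, `for header, data in reader:` needs no try/except around it, and the `with` block guarantees the file is closed when the reader opened it.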

Readthedocs

Add readthedocs!

Create a 2-bit to 8-bit utility per @telegraphic

f = Waterfall('ParkesHTRU.fil', t_start=7800000, t_stop=7850000)
f.info() -> Returns expected values from the header.
f.plot_waterfall() -> May or may not be doing the right thing, but at least the axis labels look right.
But
f.plot_spectrum()
/opt/anaconda/lib/python3.6/site-packages/blimpy/filterbank.py in plot_spectrum(self, t, f_start, f_stop, logged, if_id, c, **kwargs)
546 plot_data = plot_data.mean()
547 else:
--> 548 raise RuntimeError("Unknown integration %s" % t)
549
550 # Rebin to max number of points
RuntimeError: Unknown integration all

I wonder if this is actually an error in unpacking the 2-bit data? If I try to read the data values I see values like
-4.6255555765569165e+303
6.807427090242793e+301
-1.570574712372553e+108
but plot_waterfall() seems to be working around it. Or is it? Is there a way to get access to the data array and directly imshow it, for example? When I do
d = f.data[:,0,:].transpose()
then d.shape is (1024, 50000), as expected (1024 channels, 50,000 time samples) but I get garbage when trying to imshow(d) because it looks like the 2-bit values aren't unpacked correctly.

Please let me know how to better document this error so that you can see the issue! Thanks.

Are we using a new matplotlib?

/opt/pyve/woodpy/lib/python2.7/site-packages/blimpy-1.1.0-py2.7.egg/blimpy/filterbank.py:709: MatplotlibDeprecationWarning: The set_axis_bgcolor function was deprecated in version 2.0. Use set_facecolor instead.
axHeader.set_axis_bgcolor('white')

unable to use command line utilities.

I am unable to run the utilities from the command line:

Traceback (most recent call last):
  File "/opt/pyve/eepy/bin/watutil", line 11, in <module>
    load_entry_point('blimpy==1.3.4', 'console_scripts', 'watutil')()
  File "/home/eenriquez/software/bl-soft/blimpy/blimpy/waterfall.py", line 585, in cmd_tool
    fil = Waterfall(filename, f_start=parse_args.f_start, f_stop=parse_args.f_stop, t_start=parse_args.t_start, t_stop=parse_args.t_stop, load_data=load_data, max_load=parse_args.max_load)
  File "/home/eenriquez/software/bl-soft/blimpy/blimpy/waterfall.py", line 120, in __init__
    load_data=load_data, max_load=max_load)
  File "/home/eenriquez/software/bl-soft/blimpy/blimpy/file_wrapper.py", line 770, in open_file
    raise NotImplementedError('Cannot open this type of file with Waterfall')
NotImplementedError: Cannot open this type of file with Waterfall

This is related to this line in waterfall.py:
parse_args = parser.parse_args(args)
There is a Stack Overflow post on this topic:
https://stackoverflow.com/questions/17118999/python-argparse-unrecognized-arguments
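A common pattern around that Stack Overflow thread is to give the entry point an optional argument list and pass it straight to parse_args: with args=None, argparse falls back to sys.argv[1:], while tests can pass an explicit list. The option names below are placeholders, not watutil's real flags:

```python
import argparse


def cmd_tool(args=None):
    """Hypothetical CLI entry point; the flags are placeholders, not watutil's."""
    parser = argparse.ArgumentParser(description="Inspect a filterbank or HDF5 file.")
    parser.add_argument("filename", type=str, help="Path to the input file")
    parser.add_argument("-b", dest="f_start", type=float, default=None, help="Start frequency (MHz)")
    parser.add_argument("-e", dest="f_stop", type=float, default=None, help="Stop frequency (MHz)")
    # args=None makes argparse read sys.argv[1:] itself; tests pass a list instead.
    return parser.parse_args(args)


opts = cmd_tool(["voyager.fil", "-b", "8419.5"])
print(opts.filename, opts.f_start, opts.f_stop)  # voyager.fil 8419.5 None
```

setuptools `console_scripts` wrappers call the function with no arguments, so the `args=None` path is the one that runs from the installed command.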

.fil files imported with Filterbank() unable to produce sliced spectrum plots

Using blimpy cloned from the Git repository from May 15th, 2018 (1.19?):

The plot_spectrum() method used on .fil files imported with the Filterbank() method in blimpy is only able to produce spectrum plots that use the full, original frequency range of the imported filterbank file. This can be seen in cells 4 and 5 of a recreated voyager tutorial notebook and cells 6 and 8 of a recreated filterbank tutorial notebook where the spectrum is unable to be plotted due to a dimensional mismatch ("ValueError: x and y must have same first dimension, but have shapes (24L,) and (65536L,)").

Also of note is that once these commands end in an exception, the info() of the variable is modified (cell 7 of the recreated filterbank tutorial notebook) and it has to be reimported.
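The ValueError says the frequency axis (24 points) and the spectrum (65536 points) were sliced differently. Whatever the fix inside blimpy looks like, the invariant is that both must be cut with the same channel selection. A sketch of that invariant with a hypothetical helper, assuming channels are the last axis of the data array:

```python
import numpy as np


def select_band(freqs, data, f_start=None, f_stop=None):
    """Hypothetical helper: cut the frequency axis and the data with one shared mask.

    freqs: 1-D array of channel frequencies (ascending or descending);
    data: array whose last axis is channels, e.g. (n_ints, n_ifs, n_chans).
    """
    freqs = np.asarray(freqs)
    lo = freqs.min() if f_start is None else f_start
    hi = freqs.max() if f_stop is None else f_stop
    mask = (freqs >= min(lo, hi)) & (freqs <= max(lo, hi))
    return freqs[mask], data[..., mask]


freqs = 1500.0 - 0.5 * np.arange(8)            # descending axis, foff < 0
data = np.arange(2 * 1 * 8, dtype=np.float32).reshape(2, 1, 8)
f_sel, d_sel = select_band(freqs, data, 1498.0, 1499.5)
print(f_sel.tolist(), d_sel.shape)             # [1499.5, 1499.0, 1498.5, 1498.0] (2, 1, 4)
```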

Add travis-ci testing integration

Needed to check if we break stuff!

tighter astropy integration and better time slicing support

Astropy has ndarray-based classes for time, coordinates, and quantities. We could use these and return things like frequency axis data with units attached.

An extension of this is to allow plotting axes with desired units and formats (e.g. Hz, MHz, MJD or unix time).

We should also allow time slices by index (t_start=0, t_stop=16384) or by other formats (t_start=57767.1234, t_format='MJD') or something like that.
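Converting between the two slicing styles only needs the header's start time and sampling interval. A sketch assuming sigproc-style `tstart` (MJD of the first sample) and `tsamp` (seconds per integration); both helpers are hypothetical:

```python
SECONDS_PER_DAY = 86400.0


def mjd_to_index(mjd, tstart, tsamp):
    """Hypothetical helper: nearest integration index for an MJD timestamp."""
    return int(round((mjd - tstart) * SECONDS_PER_DAY / tsamp))


def index_to_mjd(index, tstart, tsamp):
    """Inverse mapping: MJD at the start of integration `index`."""
    return tstart + index * tsamp / SECONDS_PER_DAY


tstart, tsamp = 57767.0, 18.253611008
t_mjd = index_to_mjd(10, tstart, tsamp)
print(mjd_to_index(t_mjd, tstart, tsamp))  # 10
```

A float64 MJD near 57767 resolves time to roughly a microsecond, which is ample for integration-level slicing; astropy.time.Time (formats such as 'mjd' and 'unix') could handle the format conversions the issue mentions.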

Differences in Python 2 vs. Python 3 str objects cause many errors

I am running Python 3.6.4 and have blimpy 1.1.9 installed. When I try to create a Waterfall object, I get the following error:

from blimpy import Waterfall
obs = Waterfall('voyager_f1032192_t300_v2.fil')

/usr/local/Homebrew/Cellar/python3/3.6.4_2/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Traceback (most recent call last):
  File "test.py", line 3, in <module>
    obs = Waterfall('voyager_f1032192_t300_v2.fil')
  File "/usr/local/Homebrew/Cellar/python3/3.6.4_2/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/blimpy-1.1.9-py3.6.egg/blimpy/waterfall.py", line 123, in __init__
    self.container = fw.open_file(filename, f_start=f_start, f_stop=f_stop,t_start=t_start, t_stop=t_stop,load_data=load_data,max_load=max_load)
  File "/usr/local/Homebrew/Cellar/python3/3.6.4_2/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/blimpy-1.1.9-py3.6.egg/blimpy/file_wrapper.py", line 861, in open_file
    return FIL_reader(filename,f_start=f_start, f_stop=f_stop,t_start=t_start, t_stop=t_stop,load_data=load_data,max_load=max_load)
  File "/usr/local/Homebrew/Cellar/python3/3.6.4_2/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/blimpy-1.1.9-py3.6.egg/blimpy/file_wrapper.py", line 451, in __init__
    self.header = self._read_header()
  File "/usr/local/Homebrew/Cellar/python3/3.6.4_2/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/blimpy-1.1.9-py3.6.egg/blimpy/file_wrapper.py", line 637, in _read_header
    keyword, value, idx = self._read_next_header_keyword(fh)
  File "/usr/local/Homebrew/Cellar/python3/3.6.4_2/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/blimpy-1.1.9-py3.6.egg/blimpy/file_wrapper.py", line 601, in _read_next_header_keyword
    dtype = self._header_keyword_types[keyword]
KeyError: b'HEADER_START'

This seems to be a problem with how strings are treated in Python 2 vs. in Python 3. In Python 3, the "keyword" variable is assigned to an object of type "bytes", so when it is compared (on line 598) with the "str" object 'HEADER_START', it doesn't recognize them as being the same when it should. This is just one area where this difference causes problems. http://python3porting.com/preparing.html#richcomparisons
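The comparison failure is easy to reproduce in isolation. A sketch that normalizes keys to text before any lookup; `to_str` and the small keyword table are hypothetical stand-ins for blimpy's internals:

```python
def to_str(value):
    """Return text for dictionary lookups, decoding bytes as ASCII (an assumption)."""
    if isinstance(value, bytes):
        return value.decode("ascii")
    return value


# Hypothetical subset of a keyword table keyed by str
header_keyword_types = {"HEADER_START": None, "nchans": "int"}

keyword = b"HEADER_START"                       # what reading in 'rb' mode yields on Python 3
print(keyword == "HEADER_START")                # False: bytes never compare equal to str
print(to_str(keyword) in header_keyword_types)  # True
```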

Setting max_load in file_wrapper.py

When setting max_load in file_wrapper equal to anything greater than 1, the following error is thrown:
NameError: global name 'MAX_DATA_ARRAY_SIZE_UNIT' is not defined

.fil files imported with Waterfall() unable to produce sliced waterfall plots

Using blimpy cloned from the Git repository from May 15th, 2018 (1.19?):

The plot_waterfall() method used on .fil files imported with the Waterfall() method in blimpy is only able to produce waterfall plots that use the full, original frequency range of the imported filterbank file. This can be seen in cell 7 of a recreated voyager tutorial notebook and cell 3 of a recreated filterbank tutorial notebook where the waterfall with the full spectrum is replotted, instead of the requested slices.

python setup.py install broken?

It seems I can't do python setup.py install.

$ python setup.py install
Download error on https://pypi.org/simple/pytest-runner/: [SSL: TLSV1_ALERT_PROTOCOL_VERSION] tlsv1 alert protocol version (_ssl.c:661) -- Some packages may not be found!
Couldn't find index page for 'pytest-runner' (maybe misspelled?)
Download error on https://pypi.org/simple/: [SSL: TLSV1_ALERT_PROTOCOL_VERSION] tlsv1 alert protocol version (_ssl.c:661) -- Some packages may not be found!
No local packages or working download links found for pytest-runner
Traceback (most recent call last):
  File "setup.py", line 66, in <module>
    test_suite="tests",
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/setuptools/__init__.py", line 130, in setup
    _install_setup_requires(attrs)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/setuptools/__init__.py", line 125, in _install_setup_requires
    dist.fetch_build_eggs(dist.setup_requires)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/setuptools/dist.py", line 514, in fetch_build_eggs
    replace_conflicting=True,
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pkg_resources/__init__.py", line 773, in resolve
    replace_conflicting=replace_conflicting
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pkg_resources/__init__.py", line 1056, in best_match
    return self.obtain(req, installer)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pkg_resources/__init__.py", line 1068, in obtain
    return installer(requirement)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/setuptools/dist.py", line 581, in fetch_build_egg
    return cmd.easy_install(req)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/setuptools/command/easy_install.py", line 670, in easy_install
    raise DistutilsError(msg)
distutils.errors.DistutilsError: Could not find suitable distribution for Requirement.parse('pytest-runner')

Conflict with SIGPROC

There is an annoying collision with sigproc, as sigproc has a file named filterbank.py. If you already have sigproc installed, then sigproc/filterbank.py will be imported, and no fun is had. I am not sure what the solution is, maybe:

rename the module
or, note in the documentation that this conflict occurs and then provide a solution

Add relative position to the sun.

Describe the solution you'd like
Given the RA and Dec and the time of the observation, calculate the angular distance to the Sun (particularly if the observation is during the day).

Additional context
This is important to know in order to calculate the scintillation effects caused by the IPM (interplanetary medium).
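Once the Sun's apparent RA and Dec at the observation time are known (from an ephemeris, for example astropy.coordinates.get_sun), the remaining step is a great-circle distance. A sketch of that last step only, with angles in degrees; the function name is hypothetical:

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees between two RA/Dec positions given in degrees.

    Uses the haversine form, which stays numerically accurate for small separations.
    """
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    # min() guards against h creeping just above 1.0 through rounding
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


print(round(angular_separation_deg(0.0, 0.0, 90.0, 0.0), 6))  # 90.0
```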

setup requirements

Could you please list the external module requirements and the preferred version/setup method?

They seem to be h5py, Cython, and bitshuffle. bitshuffle 0.2.4 via pip does not seem to compile on Ubuntu.

Add support for 'tim' filterbank files

Time series files in sigproc, such as the output of dedisperse, are a type of filterbank file, data_type=2 in the header. The main difference is that they are frequency collapsed so there is no number of channels (nchans) or channel bandwidth (foff).

Using Filterbank() to load a tim filterbank file fails with:

In [1]: from filterbank import Filterbank

In [2]: Filterbank('Beam6_fb_D20150907T194703.buffer24.d32.dd262.tim')

KeyError Traceback (most recent call last)
in ()
----> 1 Filterbank('Beam6_fb_D20150907T194703.buffer24.d32.dd262.tim')

/usr/local/lib/python2.7/dist-packages/filterbank-1.0.0-py2.7.egg/filterbank/filterbank.pyc in __init__(self, filename, f_start, f_stop, t_start, t_stop, load_data, header_dict, data_array)
475 self.read_hdf5(filename, f_start, f_stop, t_start, t_stop, load_data)
476 else:
--> 477 self.read_filterbank(filename, f_start, f_stop, t_start, t_stop, load_data)
478 else:
479 self.read_filterbank(filename, f_start, f_stop, t_start, t_stop, load_data)

/usr/local/lib/python2.7/dist-packages/filterbank-1.0.0-py2.7.egg/filterbank/filterbank.pyc in read_filterbank(self, filename, f_start, f_stop, t_start, t_stop, load_data)
549 ## Setup frequency axis
550 f0 = self.header['fch1']
--> 551 f_delt = self.header['foff']
552
553 # keep this seperate!

KeyError: 'foff'
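The traceback shows that fch1 is present but foff is not. One way to keep read_filterbank working for time series is to fall back to a single-channel axis. A sketch with a hypothetical helper, assuming `data_type == 2` marks time series as the issue describes:

```python
import numpy as np


def frequency_axis(header):
    """Hypothetical helper: channel-frequency array from a sigproc-style header dict.

    Time-series ('tim', data_type=2) files have no 'foff' and may lack 'nchans',
    so they are treated as a single channel at 'fch1' instead of raising KeyError.
    """
    f0 = header["fch1"]
    if header.get("data_type", 1) == 2 or "foff" not in header:
        return np.array([f0], dtype=np.float64)
    n_chans = int(header["nchans"])
    return f0 + header["foff"] * np.arange(n_chans)


print(frequency_axis({"fch1": 1400.0, "data_type": 2}).tolist())  # [1400.0]
```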

Fix $DISPLAY errors when operating over SSH without X-forwarding

Add optional argument that inserts

import matplotlib
matplotlib.use('Agg')

before importing pylab
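A sketch of that optional argument, assuming it is exposed as a function that runs before pyplot is imported. The DISPLAY check is an X11 heuristic and both function names are hypothetical:

```python
import os


def choose_backend(environ=None, force_agg=False):
    """Hypothetical helper: return 'Agg' when plotting must be headless, else None.

    The DISPLAY check suits Linux nodes reached over SSH; on macOS or Windows,
    DISPLAY is usually unset even when a GUI is available.
    """
    environ = os.environ if environ is None else environ
    if force_agg or not environ.get("DISPLAY"):
        return "Agg"
    return None


def configure_matplotlib(force_agg=False):
    """Select the backend; must run before matplotlib.pyplot (or pylab) is imported."""
    import matplotlib  # imported lazily so merely importing this module has no side effects

    backend = choose_backend(force_agg=force_agg)
    if backend is not None:
        matplotlib.use(backend)


print(choose_backend({}))                 # Agg
print(choose_backend({"DISPLAY": ":0"}))  # None
```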

Allow use of software without installation.

Can we leave the try except option?
The except was there for the cases where people want to use blimpy without installing it, by just adding it to their PYTHONPATH.

So, can we change this latest change:

- try:
-    from .filterbank import Filterbank
-    from . import file_wrapper as fw
-    from .sigproc import *
- except:
-    from filterbank import Filterbank
-    import file_wrapper as fw
-    from sigproc import *
+ from blimpy.filterbank import Filterbank
+ from blimpy import file_wrapper as fw
+ from blimpy.sigproc import *

for this?

try:
    from blimpy.filterbank import Filterbank
    from blimpy import file_wrapper as fw
    from blimpy.sigproc import *
except ImportError:
    # Fallback for running from a checkout on PYTHONPATH without installing
    from filterbank import Filterbank
    import file_wrapper as fw
    from sigproc import *

add fine cuts to bldice

Is your feature request related to a problem? Please describe.
Not a problem, but a limitation.
Currently bldice cuts to the nearest coarse channel end. This makes sense for us, but maybe not for the rest of the world.

Describe the solution you'd like
Having an option to cut at any frequency may be useful for some.
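Cutting at an arbitrary frequency mostly reduces to converting frequencies into channel indices from the header. A sketch assuming sigproc-style fch1 (first channel frequency, MHz) and a possibly negative foff; both helpers are hypothetical, not bldice options:

```python
def freq_to_chan(freq, fch1, foff, nchans):
    """Hypothetical helper: nearest channel index for `freq`, clipped to [0, nchans - 1]."""
    idx = int(round((freq - fch1) / foff))
    return min(max(idx, 0), nchans - 1)


def fine_cut(fch1, foff, nchans, f_start, f_stop):
    """Return (first, last + 1) channel bounds covering f_start..f_stop, in file order."""
    a = freq_to_chan(f_start, fch1, foff, nchans)
    b = freq_to_chan(f_stop, fch1, foff, nchans)
    return min(a, b), max(a, b) + 1


# 8 channels: 1500.0, 1499.5, ..., 1496.5 MHz (descending, foff < 0)
print(fine_cut(1500.0, -0.5, 8, 1498.0, 1499.0))  # (2, 5)
```

The returned bounds can be used directly as a slice on the channel axis, e.g. `data[..., first:last]`.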

need test script for command line options.

Need a test script to check the command line tools. I could create a bash file for this, but how would I add it to run_tests.sh?
Suggestions, @telegraphic @gijzelaerr?

Bug with h52fil

Trying to convert an HDF5 file to filterbank at GB using blimpy 1.2.1 (I also tried 1.2.0 and 1.1.8), I get this error:

h52fil /mnt_bls4/datax3/holding/spliced_blc0001020304050607_guppi_57708_17466_HIP19335_0003.gpuspec.0002.h5 -o /datax3/users/sci -n filfromf5.fil
Traceback (most recent call last):
  File "/opt/pyve/eepy/bin/h52fil", line 11, in <module>
    load_entry_point('blimpy==1.2.1', 'console_scripts', 'h52fil')()
  File "/opt/pyve/eepy/local/lib/python2.7/site-packages/blimpy-1.2.1-py2.7.egg/blimpy/h52fil.py", line 60, in cmd_tool
    make_fil_file(filename, out_dir = opts.out_dir, new_filename=opts.new_filename)
  File "/opt/pyve/eepy/local/lib/python2.7/site-packages/blimpy-1.2.1-py2.7.egg/blimpy/h52fil.py", line 41, in make_fil_file
    fil_file.write_to_fil(new_filename)
  File "/opt/pyve/eepy/local/lib/python2.7/site-packages/blimpy-1.2.1-py2.7.egg/blimpy/waterfall.py", line 227, in write_to_fil
    self.__write_to_fil_light(filename_out)
  File "/opt/pyve/eepy/local/lib/python2.7/site-packages/blimpy-1.2.1-py2.7.egg/blimpy/waterfall.py", line 274, in __write_to_fil_light
    fileh.write(generate_sigproc_header(self)) #generate_sigproc_header comes from sigproc.py
  File "/opt/pyve/eepy/local/lib/python2.7/site-packages/blimpy-1.2.1-py2.7.egg/blimpy/sigproc.py", line 375, in generate_sigproc_header
    header_string += to_sigproc_keyword(keyword, f.header[keyword])
  File "/opt/pyve/eepy/local/lib/python2.7/site-packages/blimpy-1.2.1-py2.7.egg/blimpy/sigproc.py", line 350, in to_sigproc_keyword
    return np.int32(len(keyword)).tostring() + keyword + value_dtype(value).tostring()
UnicodeDecodeError: 'ascii' codec can't decode byte 0xaa in position 4: ordinal not in range(128)

Not sure how to proceed... I was told to create this issue...
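On Python 2, concatenating a unicode keyword with a byte string that contains a non-ASCII byte (0xaa here) forces an implicit ASCII decode, which matches this UnicodeDecodeError. A sketch of building each entry entirely from bytes, assuming a little-endian sigproc header and numeric values only; the helper name is hypothetical:

```python
import numpy as np


def sigproc_keyword_bytes(keyword, value=None, dtype=None):
    """Hypothetical helper: serialize a sigproc keyword (and optional numeric value) as bytes."""
    if not isinstance(keyword, bytes):
        keyword = keyword.encode("ascii")  # explicit encode: never mix text and bytes
    out = np.array(len(keyword), dtype="<i4").tobytes() + keyword
    if value is not None:
        # np.array(..., dtype='<f8') etc. honours the explicit little-endian byte order
        out += np.array(value, dtype=dtype).tobytes()
    return out


entry = sigproc_keyword_bytes(u"nchans", 64, "<i4")
print(entry)  # b'\x06\x00\x00\x00nchans@\x00\x00\x00'
```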

int32 vs uint32

Need to check whether reading string lengths as uint32 instead of int32 would make sense (it does for LOFAR data):

         if dtype == 'str':
           str_len = np.fromstring(fh.read(4), dtype='int32')[0]   # current
           str_len = np.fromstring(fh.read(4), dtype='uint32')[0]  # proposed
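The two interpretations only disagree when the top bit of the 4-byte length is set, so for well-formed headers the change is harmless; a corrupt length just shows up as a huge positive number instead of a negative one. A quick check with numpy.frombuffer (the non-deprecated replacement for the binary mode of numpy.fromstring), assuming little-endian data:

```python
import numpy as np

raw = bytes([0xFF, 0xFF, 0xFF, 0xFF])
print(np.frombuffer(raw, dtype="<i4")[0])  # -1
print(np.frombuffer(raw, dtype="<u4")[0])  # 4294967295

# A typical keyword length (12 for 'HEADER_START') reads the same either way
small = np.array(12, dtype="<i4").tobytes()
print(np.frombuffer(small, dtype="<i4")[0], np.frombuffer(small, dtype="<u4")[0])  # 12 12
```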

github releases missing?

Or, how can I find out which released version on pypi corresponds with which commit in the github repo?

Imshow on 8-bit data

Describe the bug
8-bit data will not plot via imshow (matplotlib doesn't like it).

To Reproduce
Steps to reproduce the behavior:

Load an 8-bit file
try plot_all or plot_waterfall

Expected behavior
Not to crash, and to plot

read_header not working with .h5 files

Hi folks,

The older version of blimpy allowed .fil files to have their headers read with the read_header function. Are the .h5 files supposed to be compatible with this function as well? It's not clear from read_header's docstring or from my perusing of the source code for the past several minutes.

I ask because I am getting errors when attempting to use this function on .h5 files. For example, attempting to call read_header on any of the files in bls0:/mnt_bls5/datax3/holding.Lband.692.0001020304050607/ gives me the following error:

/opt/pyve/sci/local/lib/python2.7/site-packages/blimpy/sigproc.pyc in read_header(filename, return_idxs)
    230 
    231         # Check this is a blimpy file
--> 232         keyword, value, idx = read_next_header_keyword(fh)
    233 
    234         try:

/opt/pyve/sci/local/lib/python2.7/site-packages/blimpy/sigproc.pyc in read_next_header_keyword(fh)
    193         return keyword, 0, fh.tell()
    194     else:
--> 195         dtype = header_keyword_types[keyword]
    196         #print dtype
    197         idx = fh.tell()

KeyError: '\r\n\x1a\n\x00\x00\x00\x00\x00\x08\x08\x00\x04\x00\x10\x00'
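The bytes in that KeyError (`\r\n\x1a\n`...) are part of the fixed 8-byte HDF5 file signature `\x89HDF\r\n\x1a\n`, which suggests read_header is parsing an HDF5 file as a sigproc header. A sketch of sniffing the format from the leading bytes before choosing a parser; it only checks offset 0 (HDF5 also allows the signature at 512, 1024, ... when a user block is present), and the helper name is hypothetical:

```python
import io
import struct

HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"  # fixed 8-byte HDF5 file signature
# First sigproc keyword: int32 length 12 (little-endian assumed) + b"HEADER_START"
SIGPROC_START = struct.pack("<i", 12) + b"HEADER_START"


def sniff_file_type(fh):
    """Return 'hdf5', 'sigproc' or None from a binary file's leading bytes.

    Reads 16 bytes and seeks back, so the caller can continue parsing from the start.
    """
    pos = fh.tell()
    head = fh.read(16)
    fh.seek(pos)
    if head.startswith(HDF5_SIGNATURE):
        return "hdf5"
    if head.startswith(SIGPROC_START):
        return "sigproc"
    return None


print(sniff_file_type(io.BytesIO(HDF5_SIGNATURE + b"\x00" * 8)))  # hdf5
print(sniff_file_type(io.BytesIO(SIGPROC_START)))                 # sigproc
```

Checking content rather than the extension would also cover filterbank or HDF5 files whose names don't end in .fil or .h5.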

.fil files imported with Waterfall() unable to produce sliced spectrum plots

Using blimpy cloned from the Git repository from May 15th, 2018 (1.19?):

The plot_spectrum() method used on .fil files imported with the Waterfall() method in blimpy is only able to produce spectrum plots that use the full, original frequency range of the imported filterbank file. This can be seen in cells 4 and 5 of a recreated voyager tutorial notebook and cells 6 and 7 of a recreated filterbank tutorial notebook where the full spectrum is replotted, instead of the requested slice.

ValueError: Unable to create dataset (error during user callback) for python3

test_fil2h5.py crashes when using the latest version of h5py, as shown in this pip freeze:

astropy==2.0.4
attrs==17.4.0
backports.functools-lru-cache==1.5
bitshuffle==0.3.4
blimpy==1.3.4
cycler==0.10.0
Cython==0.28.1
funcsigs==1.0.2
h5py==2.8.0
matplotlib==2.1.2
numpy==1.14.2
pandas==0.22.0
pluggy==0.6.0
py==1.5.2
pyparsing==2.2.0
pytest==3.4.1
python-dateutil==2.6.1
pytz==2018.3
scipy==1.0.0
six==1.11.0
subprocess32==3.2.7
turbo-seti==0.7.3

this is the error message:

(eepy) eenriquez@blh0:~/software/bl-soft/blimpy/tests$ python test_fil2h5.py 
HDF5-DIAG: Error detected in HDF5 (1.8.16) thread 139787481073408:
  #000: ../../../src/H5Pocpl.c line 1102 in H5Pget_filter_by_id2(): can't find object for ID
    major: Object atom
    minor: Unable to find atom information (already closed?)
  #001: ../../../src/H5Pint.c line 3381 in H5P_object_verify(): property list is not a member of the class
    major: Property lists
    minor: Unable to register new atom
  #002: ../../../src/H5Pint.c line 3331 in H5P_isa_class(): not a property list
    major: Invalid arguments to routine
    minor: Inappropriate type
Traceback (most recent call last):
  File "test_fil2h5.py", line 26, in <module>
    test_fil2h5_conversion()
  File "test_fil2h5.py", line 16, in test_fil2h5_conversion
    bl.fil2h5.make_h5_file('Voyager_data/Voyager1.single_coarse.fine_res.fil', new_filename = 'test.h5')
  File "/opt/pyve/eepy/local/lib/python2.7/site-packages/blimpy-1.3.4-py2.7.egg/blimpy/fil2h5.py", line 44, in make_h5_file
    fil_file.write_to_hdf5(new_filename)
  File "/opt/pyve/eepy/local/lib/python2.7/site-packages/blimpy-1.3.4-py2.7.egg/blimpy/waterfall.py", line 302, in write_to_hdf5
    self.__write_to_hdf5_light(filename_out)
  File "/opt/pyve/eepy/local/lib/python2.7/site-packages/blimpy-1.3.4-py2.7.egg/blimpy/waterfall.py", line 431, in __write_to_hdf5_light
    compression_opts=bs_compression_opts)
  File "/opt/pyve/eepy/local/lib/python2.7/site-packages/h5py/_hl/group.py", line 116, in create_dataset
    dsid = dataset.make_new_dset(self, shape, dtype, data, **kwds)
  File "/opt/pyve/eepy/local/lib/python2.7/site-packages/h5py/_hl/dataset.py", line 140, in make_new_dset
    dset_id = h5d.create(parent.id, None, tid, sid, dcpl=dcpl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5d.pyx", line 79, in h5py.h5d.create
ValueError: Unable to create dataset (error during user callback)

I got test_fil2h5.py to work by using h5py==2.5.0 instead.

I also had to fix the error below by directly editing /opt/pyve/eepy/local/lib/python2.7/site-packages/h5py/_hl/dataset.py, as the error message suggests. Oddly, I only hit this hackable issue on the GBT cluster, not on my laptop.

(eepy) eenriquez@blh0:~/software/bl-soft/blimpy/tests$ python test_fil2h5.py 
blimpy.waterfall INFO     Conversion time: 0.18sec
blimpy.file_wrapper WARNING  Selection size of 0.06 GB, exceeding our size limit 0.00 GB. Instance created, header loaded, but data not loaded, please try another (t,v) selection.
blimpy.waterfall INFO     Detecting high frequency resolution data.
Traceback (most recent call last):
  File "test_fil2h5.py", line 26, in <module>
    test_fil2h5_conversion()
  File "test_fil2h5.py", line 19, in test_fil2h5_conversion
    bl.fil2h5.make_h5_file('Voyager_data/Voyager1.single_coarse.fine_res.fil', new_filename = 'test_large.h5', max_load = 0.001)
  File "/opt/pyve/eepy/local/lib/python2.7/site-packages/blimpy-1.3.4-py2.7.egg/blimpy/fil2h5.py", line 44, in make_h5_file
    fil_file.write_to_hdf5(new_filename)
  File "/opt/pyve/eepy/local/lib/python2.7/site-packages/blimpy-1.3.4-py2.7.egg/blimpy/waterfall.py", line 300, in write_to_hdf5
    self.__write_to_hdf5_heavy(filename_out)
  File "/opt/pyve/eepy/local/lib/python2.7/site-packages/blimpy-1.3.4-py2.7.egg/blimpy/waterfall.py", line 339, in __write_to_hdf5_heavy
    dtype=self.data.dtype)
  File "/opt/pyve/eepy/local/lib/python2.7/site-packages/h5py/_hl/group.py", line 103, in create_dataset
    dsid = dataset.make_new_dset(self, shape, dtype, data, **kwds)
  File "/opt/pyve/eepy/local/lib/python2.7/site-packages/h5py/_hl/dataset.py", line 70, in make_new_dset
    if isinstance(chunks, tuple) and (-numpy.array([ i>=j for i,j in zip(tmp_shape,chunks) if i is not None])).any():
TypeError: The numpy boolean negative, the `-` operator, is not supported, use the `~` operator or the logical_not function instead.

This is related to #27

Read header without loading entire file?

I'm noticing that using Filterbank to extract header parameters takes significantly longer on larger .fil files than on smaller ones. I'm using the Filterbank class to read header information from spliced 0002-resolution .fil files (all 8 files spliced together) and I've noticed that, for example, extracting fch1 takes between 0.5 and 0.6 seconds per file. In contrast, extracting fch1 from a smaller file, i.e. a non-spliced file of the same frequency resolution, takes about 0.1 seconds.

Am I correct in suspecting that if one desires to extract header information without interacting with the data, the speed should be independent of the size of the data part of the file? If so, the difference in speed suggests the entire file is being read before the header information is returned. Is there any way of getting around this?
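In principle, yes. A SIGPROC filterbank header is a short run of keyword/value pairs ending in `HEADER_END`, so it can be parsed without touching the data block. The sketch below assumes the usual little-endian SIGPROC layout: each string is a 4-byte int length followed by its bytes, and each keyword is followed by an int32, a float64 or another string. It only knows a subset of keywords, and the type table is an assumption. `read_fil_header` is a hypothetical helper, not a blimpy function. Separately, the tracebacks in this thread show `Waterfall` forwarding a `load_data` argument to `open_file`, so `Waterfall(filename, load_data=False)` may already skip reading the data; check this against the installed version.

```python
import struct

# Assumed value types for a subset of SIGPROC header keywords.
INT_KEYS = {"telescope_id", "machine_id", "data_type", "nbits", "nchans",
            "nifs", "nbeams", "ibeam", "nsamples", "barycentric", "pulsarcentric"}
DOUBLE_KEYS = {"fch1", "foff", "tstart", "tsamp", "src_raj", "src_dej",
               "az_start", "za_start", "refdm", "period"}
STRING_KEYS = {"source_name", "rawdatafile"}


def _read_string(f):
    """Read a length-prefixed string: int32 length, then that many bytes."""
    (length,) = struct.unpack("<i", f.read(4))
    return f.read(length).decode("ascii")


def read_fil_header(filename):
    """Parse only the header; return (header_dict, header_length_in_bytes)."""
    header = {}
    with open(filename, "rb") as f:  # binary mode, required on Python 3
        if _read_string(f) != "HEADER_START":
            raise ValueError("not a SIGPROC filterbank file: " + filename)
        while True:
            key = _read_string(f)
            if key == "HEADER_END":
                # f.tell() is now the byte offset where the data block starts.
                return header, f.tell()
            if key in INT_KEYS:
                (header[key],) = struct.unpack("<i", f.read(4))
            elif key in DOUBLE_KEYS:
                (header[key],) = struct.unpack("<d", f.read(8))
            elif key in STRING_KEYS:
                header[key] = _read_string(f)
            else:
                raise KeyError("unknown header keyword: " + key)
```

The loop stops at `HEADER_END`, so the time taken no longer depends on how large the data section is.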

Add support for filterbank/hdf5 files with names that don't end in .fil or .h5

Currently blimpy errors out (in a confusing manner) if the filename doesn't end in .fil or .h5, even if it's a filterbank or hdf5 file. Maybe have a filetype command switch?
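One way to stop depending on the extension is to look at the first bytes of the file. An HDF5 file begins with the 8-byte signature `\x89HDF\r\n\x1a\n` when there is no user block, which is the common case. A SIGPROC filterbank begins with a length-prefixed `HEADER_START` string. The sketch below combines detection with an explicit override, as suggested above. The `sniff_filetype` helper and its `filetype` argument are hypothetical.

```python
import struct

HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"
# int32 length (12) followed by the literal keyword: 16 bytes in total.
FIL_MAGIC = struct.pack("<i", len(b"HEADER_START")) + b"HEADER_START"


def sniff_filetype(filename, filetype=None):
    """Return 'hdf5' or 'fil'. An explicit `filetype` overrides detection."""
    if filetype is not None:
        if filetype not in ("hdf5", "fil"):
            raise ValueError("filetype must be 'hdf5' or 'fil', got %r" % filetype)
        return filetype
    with open(filename, "rb") as f:
        head = f.read(16)
    if head.startswith(HDF5_SIGNATURE):
        return "hdf5"
    if head.startswith(FIL_MAGIC):
        return "fil"
    raise IOError("could not determine the file type of %s; pass filetype='fil' "
                  "or filetype='hdf5'" % filename)
```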

dependencies

The README should clearly list the dependencies needed to install and use blimpy.

Need option to forego the real-time displaying of plots

Add an optional argument to disable the plt.show() call at the end of the plotting functions.
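A common pattern is a `show=True` keyword that callers can switch off. The function returns the figure so it can be saved or modified instead of displayed. `plot_power` is a hypothetical function and does not follow the signature of blimpy's own plotting methods. The sketch selects the non-interactive Agg backend so it runs on headless machines. Library code would normally leave that choice to the user, for example via the `MPLBACKEND` environment variable.

```python
import matplotlib
matplotlib.use("Agg")  # script-level choice: render off-screen, never open a window
import matplotlib.pyplot as plt
import numpy as np


def plot_power(freqs_mhz, power, show=True, filename=None):
    """Plot power vs. frequency; plt.show() is only called when show=True."""
    fig, ax = plt.subplots()
    ax.plot(freqs_mhz, power)
    ax.set_xlabel("Frequency [MHz]")
    ax.set_ylabel("Power [counts]")
    if filename is not None:
        fig.savefig(filename)
    if show:
        plt.show()
    return fig
```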

unable to run test scripts directly: they need the Voyager data

This new issue is related to the new tests/data.py. Do we need to add the Voyager_data folder to setup.py somehow?

$ python test_h52fil.py 

Bad key "ckend" on line 1 in
/Users/jeenriquez/.matplotlib/matplotlibrc.
You probably need to get an updated matplotlibrc file from
http://github.com/matplotlib/matplotlib/blob/master/matplotlibrc.template
or from the matplotlib source distribution
/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/blimpy-1.3.4-py2.7.egg/tests/Voyager_data/Voyager1.single_coarse.fine_res.h5
Traceback (most recent call last):
  File "test_h52fil.py", line 29, in <module>
    test_h52fil_conversion()
  File "test_h52fil.py", line 16, in test_h52fil_conversion
    bl.h52fil.make_fil_file(voyager_h5, new_filename='test.fil')
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/blimpy-1.3.4-py2.7.egg/blimpy/h52fil.py", line 41, in make_fil_file
    fil_file = Waterfall(filename, max_load = max_load)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/blimpy-1.3.4-py2.7.egg/blimpy/waterfall.py", line 125, in __init__
    load_data=load_data, max_load=max_load)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/blimpy-1.3.4-py2.7.egg/blimpy/file_wrapper.py", line 753, in open_file
    raise IOError("No such file or directory: " + filename)
IOError: No such file or directory: /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/blimpy-1.3.4-py2.7.egg/tests/Voyager_data/Voyager1.single_coarse.fine_res.h5
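The path in the traceback points inside the installed egg. This suggests the tests build the data path relative to the installed `tests/` directory, where `Voyager_data` was never copied. One fix is to list the files under `package_data` in setup.py. Another is to have the tests look for data in a configurable location and fail with an actionable message when it is missing. The sketch below shows the second option. The `BLIMPY_TEST_DATA` environment variable and the `voyager_file` helper are hypothetical.

```python
import os


def voyager_file(name, tests_dir, env_var="BLIMPY_TEST_DATA"):
    """Locate a Voyager test file.

    Looks in $BLIMPY_TEST_DATA if set, otherwise in <tests_dir>/Voyager_data.
    """
    data_dir = os.environ.get(env_var, os.path.join(tests_dir, "Voyager_data"))
    path = os.path.join(data_dir, name)
    if not os.path.isfile(path):
        raise IOError("Test data %s not found in %s; download the Voyager files "
                      "there or set %s" % (name, data_dir, env_var))
    return path
```

A test module would call it as `voyager_file("Voyager1.single_coarse.fine_res.h5", os.path.dirname(os.path.abspath(__file__)))`. Under pytest, it could turn the `IOError` into `pytest.skip(...)` rather than an error.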

plot in velocity relative to LSR

Add an option to plot the frequency axis in velocity units relative to the local standard of rest (LSR).
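The first step is converting each channel frequency to a Doppler velocity relative to a rest frequency. The radio convention is v = c (f0 - f) / f0. Moving to the LSR also requires the observer's line-of-sight velocity relative to the LSR at the observation time. That value depends on site, time and pointing, and astropy can compute it. The sketch takes the correction as an input rather than computing it. `radio_velocity_kms` and `v_obs_lsr_kms` are hypothetical names, and the sign convention for the correction should be checked against whichever tool supplies it.

```python
import numpy as np

C_KMS = 299792.458  # speed of light in km/s


def radio_velocity_kms(freqs_mhz, rest_freq_mhz, v_obs_lsr_kms=0.0):
    """Radio-convention velocity v = c (f0 - f) / f0 in km/s.

    An externally computed LSR correction is added to the topocentric result.
    """
    freqs = np.asarray(freqs_mhz, dtype=float)
    v_topo = C_KMS * (rest_freq_mhz - freqs) / rest_freq_mhz
    return v_topo + v_obs_lsr_kms
```

The returned array can replace the frequency array on the x-axis of a spectrum or waterfall plot.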

Universal read_header function

As discussed in #60, it may be useful to have a universal read_header() function that works on all file types. Starting this issue for discussion, this certainly relates to the roadmap discussion #38 but could be implemented pretty easily.
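One minimal shape for such a function is a small registry. Each file format registers a test on the file's first bytes plus a reader that returns a plain dict, and `read_header` picks the first match. New formats (FITS, .ar, …) then plug in without changing the entry point. Everything here is hypothetical: in blimpy the registered readers would wrap the existing filterbank and HDF5 header code.

```python
_READERS = []  # (name, sniff, read) tuples, tried in registration order


def register_reader(name, sniff):
    """Decorator: register `read(filename) -> dict` for files where `sniff(first_bytes)` is true."""
    def decorator(read):
        _READERS.append((name, sniff, read))
        return read
    return decorator


def read_header(filename, nbytes=16):
    """Universal entry point: read the header of any registered file type."""
    with open(filename, "rb") as f:
        head = f.read(nbytes)
    for name, sniff, read in _READERS:
        if sniff(head):
            header = dict(read(filename))
            header.setdefault("file_format", name)  # record which reader was used
            return header
    raise IOError("no registered reader recognises %s" % filename)
```

A reader is registered by decorating it, e.g. `@register_reader("hdf5", lambda head: head.startswith(b"\x89HDF\r\n\x1a\n"))` above a function that opens the file with h5py and returns its attributes.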

filterbank.py broken

When running filterbank.py I get the following error; previous versions of the script ran fine on the same file.

Traceback (most recent call last):
  File "/home/obs/sw/filterbank/filterbank.py", line 893, in <module>
    fil.info()
  File "/home/obs/sw/filterbank/filterbank.py", line 634, in info
    for key, val in self.header.items():
AttributeError: 'Filterbank' object has no attribute 'header'

Create frequently asked questions page on readthedocs

Once we have readthedocs (#64), we should have an FAQ for common installation problems (see e.g. #49).

Unable to plot using filterbank

I'm running blimpy from the command line trying to plot a filterbank file using the following command:
python /path/to/filterbank.py

I get the error "ValueError: Attempted relative import in non-package". I have had some success by going into my cloned copy of blimpy and removing the leading dots (".") from many of the relative imports. I believe this is an issue with the latest version of blimpy.
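Python 2 reports this error when a module that uses relative imports (`from .something import ...`) is executed as a script by path. The file then runs as `__main__` with no parent package, so the leading dots cannot be resolved. Python 3 raises `ImportError: attempted relative import with no known parent package` instead. Running the file as a module from its package, e.g. `python -m blimpy.filterbank`, or through an installed console-script entry point, should keep the relative imports working without editing them. The sketch below builds a throwaway package (`demo_pkg`, hypothetical) and tries both invocations.

```python
import os
import subprocess
import sys
import tempfile


def run_both_ways():
    """Return (exit code when run by path, exit code when run with -m)."""
    root = tempfile.mkdtemp()
    pkg = os.path.join(root, "demo_pkg")
    os.makedirs(pkg)
    with open(os.path.join(pkg, "__init__.py"), "w") as f:
        f.write("")
    with open(os.path.join(pkg, "helpers.py"), "w") as f:
        f.write("GREETING = 'hello'\n")
    with open(os.path.join(pkg, "tool.py"), "w") as f:
        # A relative import, like the ones inside blimpy's modules.
        f.write("from .helpers import GREETING\nprint(GREETING)\n")

    # 1) Run by path: the module has no parent package, so the import fails.
    by_path = subprocess.call([sys.executable, os.path.join(pkg, "tool.py")],
                              stderr=subprocess.DEVNULL)
    # 2) Run as a module from the directory that contains the package: works.
    as_module = subprocess.call([sys.executable, "-m", "demo_pkg.tool"],
                                cwd=root, stdout=subprocess.DEVNULL)
    return by_path, as_module
```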

Blimpy roadmap discussion

Hi all, I wanted to start a thread on future direction of blimpy. I have been refactoring the code in the last week, but this does not add any functionality or change the user-level API. Given that this is a heavily used piece of code now, we should discuss together how we continue development. We should make sure that bugs like the ones @christiangil found don’t arise again!

Here are some open questions about the future path:

1. Do we want stability, features, or both?

We could decide to ‘freeze’ blimpy and move other functionality into separate packages. Or we could prioritize features, though adding features may break things. Having both requires more diligence, code review and overall collaboration.

2. Should we stop spending time improving blimpy?

Apart from bugfixes, is blimpy ‘good enough’, and should we put our effort into different goals?

3. Design philosophy

As packages grow, they inevitably get messier. We should decide on what design patterns we want to adopt, and approaches to clean development. I personally don’t like giant files with thousands of lines, and like to split things up by functionality. But this isn’t necessarily the way to go.

For example, we could write our code like this:

class ThingDoer(object):
    def load_data(self, filename):
        pass  # [code to load data here]
    def dedisperse(self, dm):
        pass  # [code to dedisperse here]
    def find_et(self):
        pass  # [code to find ET here]

d = ThingDoer()
d.dedisperse(dm=521)
d.find_et()

I quite like being able to do things like d.dedisperse()! But we could equally have three files:

/thing_doer_pkg/
  |- thing_doer.py
  |- dedisperse.py
  |- find_et.py

and split the code into three files:

# thing_doer.py
class ThingDoer(object):
    def load_data(self, filename):
        pass  # [code to load data here]

# dedisperse.py
def dedisperse(thing_doer_object, dm):
    pass  # [code to dedisperse here]

# find_et.py
def find_et(thing_doer_object):
    pass  # [code to find ET here]

I think the ‘separation of concerns’ approach to software engineering would favor the second, but there is a lot to like about the first. I note with numpy you can do:

import numpy as np

a = np.array([1, 2, 3])
m0 = a.mean()
m1 = np.mean(a)
# m0 and m1 are identical (both 2.0)

I probably favor the numpy approach, assuming that a.mean() and np.mean() actually call the same code.
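The numpy-style compromise keeps the logic in a module-level function and makes the method a thin wrapper, so `d.dedisperse(...)` and `dedisperse(d, ...)` cannot drift apart. The `ThingDoer` class and its toy algorithm are hypothetical placeholders, not blimpy code. For brevity, the function takes per-channel delays in samples rather than a DM and simply rolls each channel back by its delay.

```python
import numpy as np


def dedisperse(thing_doer_object, delays):
    """Module-level implementation: roll each channel back by its integer sample delay."""
    data = thing_doer_object.data  # shape (n_time, n_chan)
    out = np.empty_like(data)
    for chan, delay in enumerate(delays):
        out[:, chan] = np.roll(data[:, chan], -delay)
    return out


class ThingDoer(object):
    def __init__(self, data):
        self.data = np.asarray(data)

    def dedisperse(self, delays):
        # Thin wrapper: the method and the function share one implementation.
        return dedisperse(self, delays)
```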

4. Do we still need Filterbank()?

blimpy started off as just one file, a filterbank reader. The Waterfall class improves file handling, and has all the same features (it currently subclasses Filterbank). Do we need both still, or do we move toward Waterfall only?

5. Tests and coverage

I would like to keep Travis CI for basic unit tests and add code coverage tests with coveralls.io. But we need to write good unit tests! And I don’t think Travis is good for testing really large files, so maybe we need to set something up like Jenkins. Or is this wasted development effort?

6. Should we redesign to support other formats?

Currently we haven’t really abstracted away the concept of a filterbank: we still rely on headers like foff and fch1. What if we wanted to support FITS files, .ar files, etc? Do we have a separate class for each, or make Waterfall handle this?
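One possible shape for that abstraction is a base class with a format-neutral public interface: a frequency array, a time array and a data array. Each file format translates its own header into that interface. Below, only the filterbank-style subclass knows about `fch1` and `foff`, using the SIGPROC convention that channel i sits at fch1 + i * foff. All class names are hypothetical, and the subclass holds in-memory data to keep the sketch short.

```python
from abc import ABC, abstractmethod

import numpy as np


class SpectralData(ABC):
    """Format-neutral interface: callers never see fch1/foff."""

    @abstractmethod
    def freqs(self):
        """Channel frequencies in MHz."""

    @abstractmethod
    def times(self):
        """Integration times in seconds from the start of the file."""

    @abstractmethod
    def data(self):
        """Array of shape (n_time, n_chan)."""


class FilterbankLike(SpectralData):
    """Filterbank-backed implementation built from a SIGPROC-style header dict."""

    def __init__(self, header, data):
        self._header = header
        self._data = np.asarray(data)

    def freqs(self):
        h = self._header
        return h["fch1"] + h["foff"] * np.arange(h["nchans"])

    def times(self):
        return self._header["tsamp"] * np.arange(self._data.shape[0])

    def data(self):
        return self._data
```

A FITS or .ar reader would be another subclass. Plotting and search code written against `SpectralData` would not need to change.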

7. Py2 and Py3 support

This is painful, but I think for now we should try and support both. There are new features like data classes that would completely break Py2 support but might be nice longer term. Is this (or other Py3 features) compelling enough to dump Py2?
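For reference, this is the kind of code data classes allow: a typed container for header values with `__init__`, `__repr__` and `__eq__` generated automatically. It needs Python 3.7+, and the annotated-field syntax alone is a `SyntaxError` on Python 2. `FilterbankHeader` and its fields are illustrative, not an existing blimpy type.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class FilterbankHeader:
    source_name: str
    fch1: float   # MHz, first channel
    foff: float   # MHz per channel (negative for descending frequency)
    nchans: int
    tsamp: float  # seconds per integration
    nbits: int = 32

    @property
    def bandwidth(self):
        """Total bandwidth in MHz."""
        return abs(self.foff) * self.nchans
```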

8. API and speed vs sigpyproc

Gerry and I did some speed tests and found that sigpyproc was about 2x faster loading filterbank data than blimpy. But this has a very different API, loads integration by integration (it can’t slice), and can’t read HDF5. It was also harder to install – although I’ve tried to fix this somewhat. What is there to learn from sigpyproc, and do we want to merge some functionality? For example, the dedisperser code? Also, do we want to use a similar design where we offload some parts of the code to C, and wrap it in Python?
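On the slicing point: the data after a filterbank header is one flat array. A memory map can therefore pull out an arbitrary time/frequency block without reading integration by integration. The sketch below assumes 32-bit float samples, a header length already known in bytes, and a time-major layout of shape (n_time, n_ifs, n_chans). `read_block` is hypothetical and not how blimpy or sigpyproc currently read data.

```python
import numpy as np


def read_block(filename, header_len, nifs, nchans, t0, t1, c0, c1, dtype=np.float32):
    """Return data[t0:t1, :, c0:c1] from a filterbank-style file via np.memmap."""
    raw = np.memmap(filename, dtype=dtype, mode="r", offset=header_len)
    nsamps = raw.size // (nifs * nchans)
    cube = raw[: nsamps * nifs * nchans].reshape(nsamps, nifs, nchans)
    # Copying the slice touches mostly the pages that back the selected block.
    return np.array(cube[t0:t1, :, c0:c1])
```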



