dasdae / dascore Goto Github PK

View Code? Open in Web Editor NEW

69.0 9.0 16.0 2.5 MB

A python library for distributed fiber optic sensing

License: Other

Python 99.46% HTML 0.07% Jupyter Notebook 0.44% Dockerfile 0.03%

distributed-acoustic-sensing geophysics python

dascore's Introduction

DASCore

A python library for distributed fiber optic sensing.

Code

Documentation [stable, development]

dascore's People

Contributors

Stargazers

Watchers

Forkers

danesnick spri902 alan204567 seunghookim shawnboltz andresp-wave prateekvyas1996 code-cullison ahmadtourei sonyida zhang-cugb mdbarin camillatrodrigues yuxingxing421 jvernierfr

dascore's Issues

Description

Currently the unit display is a bit overly verbose. For example, "meter" rather than "m" is displayed in plots and such. This SO post shows how to configure pint to be more concise.

support for datetime and timedelta

Description

DASCore currently doesn't support python datetime or timedelta objects. It should.

Example

from datetime import datetime

import numpy as np

import dascore as dc

py_dt = datetime.now()
py_td = datetime.now() - py_dt

# these both raise an error
dc.to_datetime64(py_dt)
dc.to_timedelta64(py_td)

# they should also work with lists
dc.to_datetime64([py_dt] * 10)
dc.to_timedelta64([py_td] * 10)

# and arrays
dc.to_datetime64(np.array([py_dt] * 10))
dc.to_timedelta64(np.array([py_td] * 10))

Description

We need to fix a few things in the Patch documentation

type hints which use numpy's ArrayLike are inscrutable
The parameter table should support indentation.

Need to improve docs on adding examples in docstrings

Description
It isn't as clear as it should be how to add docstring examples.

to_timedelta64 doesn't handle negative numbers correctly

Description
I think to_timedelta64 should return a negative timedelta64 when provided with a negative input, but currently it doesn't.

import dascore as dc

assert dc.to_timedelta64(-0.1) == -dc.to_timedelta64(0.1)  # currently fails

Broken cross references in docs

Description
The cross references currently don't work (e.g., spool's doc page).

I think this is because the back ticks are replaced with "%60" but never replaced again with "`", should be an easy fix but may need to modify the regex in the build docs script.

dropna

We need a Patch.dropna function, which should simply be based on pandas.dropna or, easier yet, xarray's dropna

Wiggle plot Index Error

Description

When plotting a Patch with one trace using the wiggle plot an IndexError is raised.

To Reproduce

import dascore as dc

patch = dc.examples.sin_wave_patch(
    sample_rate=1000, 
    frequency=10, 
    channel_count=1,
)

patch.viz.wiggle(show=True)

Expected behavior

The wiggle plot should still plot.

Patch history elongated from spool.select

Description

When using spool.select to trim a dimension, the select... string which should be added to the history attribute is spaced out such that each letter is an entry.

Example

spool = (
    dc.get_example_spool("diverse_das")
    .select(distance=(100, 200))
)
print(spool[0].attrs.history)
# 's','e','l', ....

Expected behavior

The select string should be a single entry.

Versions

OS [e.g. Ubuntu 20.04]: Ubuntu 20.04
DasCore Version [e.g. 0.0.5]: 0.0.10
Python Version [e.g. 3.10]: 3.11

spool works in pip install but not in developer install

This code worked well with pip install.

dacore.read with specified file format still calls get_format

Description

While working on implementing IO support we found that a FiberIO that didn't have a get_format method implemented didn't work with dc.read(path, format_name) because the get_format logic was still being called. We need to look into why this is and make sure when the file_format is specified get_format is not needed.

Example

Expected behavior

Versions

OS [e.g. Ubuntu 20.04]:
DasCore Version [e.g. 0.0.5]:
Python Version [e.g. 3.10]:

Consistency with units of dimensional selection

There is some inconsistency in units when specifying a range with a dimension name. For example,

import dascore as dc
patch = dc.get_example_patch()

filtered = patch.pass_filter(time=(t1, t2))
filtered - patch.pass_filter(distance=(d1, d2))

Here, should t1 be in Hz or seconds? Should d1 be in m (wavelength) or 1/m (wave number)? Is this consistent with select?

e.g.,

sub_patch = patch.select(time=(t1, t2))

I propose the following rules for using dimension name to specify inputs to functions:

Values always have the same units as the specified dimension.
Units can be inversed with an underscore.

so,

filtered_1 = patch.pass_filter(time=(1, 10))  # filters from 1 to 10 seconds
filtered_2 = patch.pass_filter(time_=(1, 10)) # filters from 1 to 10 Hz

Unfortunately this will break some existing codes but I think having the consistency is worth it. We are still in the 0.0.x version range do still have a big warning that things are rapidly changing after all ;)

interpolate function only takes datetime64[ns]

any other datetime64 precision will create an error due to the to_number method.

Error is creating Time Series from DAS data

Description
The goal is to create time series from the recorded data during multiphase testing. The code is set to run and read each second of the DAS data and calculate some parameters. At some point in the calculation, it stops and doesn't complete going through the whole data.

To Reproduce

bgtime = np.datetime64('2022-05-12 16:04:45')
dt = np.timedelta64(1,'s')

freq_bands = [[1,10],[10,100],[100,500],[500,1000],[1000,5000]]
FBEs = [[] for i in freq_bands] #to create a list of a list
timestamps = []
current_time = bgtime
for i in range(14400):
    gjsignal.print_progress(i)
    try:
        data1 = spool.select(time=(current_time,current_time+dt))
        dataa = dascore.utils.patch.merge_patches(data1)[0]
        DASdata1 = Data2D_XT.Patch_to_Data2D(dataa)
        DASdata1.apply_gauge_length(3)  # apply a gauge length 3x channel spacing
        DASdata1.select_depth(50,100)
        f,amp = spectrum_analysis(DASdata1)
        for ifreq in range(len(freq_bands)):
            FBEs[ifreq].append(get_FBE(f,amp,freq_bands[ifreq][0],freq_bands[ifreq][1]))
        timestamps.append(current_time)
    except:
        print('there is an error')
    current_time += dt

Expected behavior
It is expected to go through the whole data set and create the time series over the four hours of recorded data.

Versions (please complete the following information):

OS: [Windows 10 Pro 19045.2130]
DasCore Version [Not sure]
Python Version [3.9]

spool get_content output should be sorted by time

Currently the get_content() output is random in time:

Expected behavior
spool get_content output should be sorted by time

Restrict distribution contents

Description

We need to restrict the distribution contents to only the necessary files/folders. For example, we shouldn't include the docs/.github directories. This can be done with a simple manifest.ini file. Here is an example.

Non-monotonic time array from terra15 files

Discussed in #65

^{Originally posted by jinwar November 1, 2022}
Current error messages for fetching a time section without data are not very clear.

velocity_to_strain_rate doesn't update units

Description

@jinwar found out today, during a DASCore presentation, that velocity_to_strain_rate updates the data_type, but does not update the associated units. It will be best to fix this in thepatch_refactor branch which implements more support for units.

Path doesn't exist message

import dascore as dc

sp = dc.spool("path/that/doesnt/exist.hdf5")

raises an error that isn't helpful. Something like "couldn't get spool from path/that/doesnt/exist". A more informative error could be raised, such as "path doesn't exist", or, to not assume the input to spool is always a path (it could be a URL in the future), perhaps we just append "if it is a path, it doesn't exist" or something along those lines.

Terra15 file without GPS_TIME array

We have a terra15 strain-rate file which only has the posix_time rather than the gps_time array. It shouldn't be hard, but we need to add logic to the parser to use posix_time if gps_time doesn't exist so this file can be read.

Silixa iDAS format - IO

Description

I tested the ProdML file format in the tests/test_io/test_prodml directory using pytest. The test_prodml_v2_0 and test_prodml_v2_1 passed, but the test_prod_ml did not pass.
Below I added the errors regarding testing this on the example data and my Silixa iDAS dataset.

1. The example data (idas_h5_example_path)

E dascore.exceptions.UnknownFiberFormat: Could not determine file format of /home/ahmadtourei/.cache/dascore/0.0.0/iDAS005_hdf5_example.626.h5

dascore/io/core.py:504: UnknownFiberFormat
================================================ short test summary info =================================================
ERROR tests/test_io/test_prodml/test_prod_ml.py::TestSilixaFile::test_read_silixa - dascore.exceptions.UnknownFiberFormat: Could not determine file format of /home/ahmadtourei/.cache/dascore/0.0.0/iDAS...

2. My Silixa iDAS dataset

E IndexError: index of [0] is out of bounds for spool.

dascore/core/spool.py:158: IndexError
==================================================== short test summary info ====================================================
ERROR tests/test_io/test_prodml/test_prod_ml.py::TestSilixaFile::test_read_silixa - IndexError: index of [0] is out of bounds for spool.

Example

Expected behavior

Versions

OS [e.g. Ubuntu 20.04]: Ubuntu 22.04.2 LTS
DasCore Version [e.g. 0.0.5]: 0.0.11.dev8+gd2d0107
Python Version [e.g. 3.10]: 3.11.3

dc.spool(path)[0] try to read deleted file

When using dc.spool(path)[0], it returns
FileNotFoundError: [Errno 2] No such file or directory: '/Users/rosie/Documents/aceffl/DAS/01_Raw/DAS_P10/SM_BriscoeC3339H_CF_P10_UTC_20211224_000909.913.tdms'

which is a file I deleted... the full error message is below:

`---------------------------------------------------------------------------
FileNotFoundError Traceback (most recent call last)
Input In [10], in <cell line: 5>()
1 # path = fileloc
2
3 # pa = dc.read(path)[0]
4 # or
----> 5 pa = dc.spool(fileloc)[0]
6 pa

File /opt/anaconda3/lib/python3.9/site-packages/dascore/core/spool.py:123, in DataFrameSpool.getitem(self, item)
122 def getitem(self, item):
--> 123 out = self._get_patches_from_index(item)
124 # a single index was used, should return a single patch
125 if not isinstance(item, slice):

File /opt/anaconda3/lib/python3.9/site-packages/dascore/core/spool.py:158, in DataFrameSpool._get_patches_from_index(self, df_ind)
156 raise IndexError(msg)
157 joined = df1.join(source.drop(columns=df1.columns, errors="ignore"))
--> 158 return self._patch_from_instruction_df(joined)

File /opt/anaconda3/lib/python3.9/site-packages/dascore/core/spool.py:168, in DataFrameSpool._patch_from_instruction_df(self, joined)
165 for patch_kwargs in df_dict_list:
166 # convert kwargs to format understood by parser/patch.select
167 kwargs = _convert_min_max_in_kwargs(patch_kwargs, joined)
--> 168 patch = self._load_patch(kwargs)
169 select_kwargs = {
170 i: v for i, v in kwargs.items() if i in patch.dims or i in patch.coords
171 }
172 out_list.append(patch.select(**select_kwargs))

File /opt/anaconda3/lib/python3.9/site-packages/dascore/clients/dirspool.py:124, in DirectorySpool._load_patch(self, kwargs)
122 def _load_patch(self, kwargs) -> Self:
123 """Given a row from the managed dataframe, return a patch."""
--> 124 patch = dc.read(**kwargs)[0]
125 return patch

File /opt/anaconda3/lib/python3.9/site-packages/dascore/io/core.py:342, in read(path, file_format, file_version, time, distance, **kwargs)
336 file_format, file_version = get_format(
337 path,
338 file_format=file_format,
339 file_version=file_version,
340 )
341 formatter = FiberIO.manager.get_fiberio(file_format, file_version)
--> 342 return formatter.read(
343 path, file_version=file_version, time=time, distance=distance, **kwargs
344 )

File /opt/anaconda3/lib/python3.9/site-packages/dascore/io/tdms/core.py:79, in TDMSFormatterV4713.read(self, path, time, distance, **kwargs)
67 def read(
68 self,
69 path: Union[str, Path],
(...)
72 **kwargs
73 ) -> dc.BaseSpool:
74 """
75 Read a silixa tdms file, return a DataArray.
76
77 """
---> 79 with open(path, "rb") as tdms_file:
80 # get time array. If an input isn't provided for time we return everything
81 if time is None:
82 time = (None, None)

FileNotFoundError: [Errno 2] No such file or directory: '/Users/rosie/Documents/aceffl/DAS/01_Raw/DAS_P10/SM_BriscoeC3339H_CF_P10_UTC_20211224_000909.913.tdms'`

coordinate dependent dimension order

Description

Currently the order of the coordinate dictionary can affect the expected dimensions. This means when passing the combination of data, coord to Patch.new it can work if the order of the coord dict is right, otherwise it will raise.

This is due to this line in Patch.new.

Example

import dascore as dc

patch = dc.get_example_patch()

axis = patch.dims.index("time")
data = np.std(patch.data, axis=axis, keepdims=True)

new_time = patch.coords["time"][0:1]
new_dist = patch.coords["distance"]
coords_1 = {"time": new_time, "distance": new_dist}
coords_2 = {"distance": new_dist, "time": new_time}

# One of these works, the other doesnt
out_1 = patch.new(data=data, coords=coords_1)
out_2 = patch.new(data=data, coords=coords_2)

Expected behavior

The coord dict order shouldn't affect the patch construction.

Versions

OS [e.g. Ubuntu 20.04]:
DasCore Version [e.g. 0.0.5]:
Python Version [e.g. 3.10]:

Incorrect usage of keyword order in velocity_to_strain_rate

Description

Currently, velocity_to_strain_rate uses the order parameter of findiff incorrectly. Although the docs describe that parameter as the order of the stencil for calculaing the first derivative, it is actually the order of the derivative (e.g, 2 means the second derivative, not a stencil with accuracy of 2 cells).

Example

# patch from terra15 data
ok = patch.tran.velocity_to_strain_rate()  # works fine since default order is 1
wrong = patch.tran.velocity_to_strain_rate(order=4)  # actually 4th derivative

Issue with merging patches with single trace

I'm trying to merge a spool of single trace patches, with 1 second interval for 1 minute of data. It seems something is wrong with the chunk merge.
Here is the spool before merging:

here is the spool after merging:

Time axis label in waterfall plots

Description

The time axis label in the waterfall plot should just read "time" not "time(s)". This should also work for other dimensions not named "time" but are datetime64 types.

Patch.new dim mix up

Currently the following raises an error:

import dascore as dc

patch = dc.load_example_patch()
amp = patch.tran.rfft().abs()

because the Patch.new method can mix up the dimensions when the coordinate dict is out of order.

Saving index file in another path

Hello,

I wonder if I can save the index file to a specified location, instead of in the folder. I'm working on a cluster and writing to the data folder needs more permission.

Thank you!
Rosie

cleanup CI

I saw a great blogpost about how to improve github actions for python projects.

We should adopt these practices, particularly implement caching for downloaded test files.

developer installation does not work

the procedure does not work anymore
https://dasdae.github.io/dascore/markdown/contributing/dev_install.html

Error: File "setup.py" not found.

GH-action deployment

GH now supports deploying to GH-pages with a simple GH action. This is a reminder to switch over to it to (hopefully) simplify doc builds.

Example is here: https://github.com/actions/starter-workflows/blob/main/pages/static.yml

Documentation build_api_docs.py

After installing quarto, I had an error when trying to build the api docs.

OS: Mac 11.6.5

(dascore) eileenmartin@csm-wl-dhcp-197-90 dascore % where quarto
/usr/local/bin/quarto
(dascore) eileenmartin@csm-wl-dhcp-197-90 dascore % ls
dascore environment.yml scripts
dascore.egg-info pyproject.toml setup.cfg
docs readme.md tests
(dascore) eileenmartin@csm-wl-dhcp-197-90 dascore % cd scripts
(dascore) eileenmartin@csm-wl-dhcp-197-90 scripts % ls
_index_api.py _render_api.py _templates build_api_docs.py
(dascore) eileenmartin@csm-wl-dhcp-197-90 scripts % python build_api_docs.py
Traceback (most recent call last):
File "/Users/eileenmartin/dascore/scripts/build_api_docs.py", line 10, in
data_dict = parse_project(dascore)
File "/Users/eileenmartin/dascore/scripts/_index_api.py", line 196, in parse_project
traverse(obj, data_dict, base_path)
File "/Users/eileenmartin/dascore/scripts/_index_api.py", line 171, in traverse
traverse(mod, data_dict, base_path)
File "/Users/eileenmartin/dascore/scripts/_index_api.py", line 171, in traverse
traverse(mod, data_dict, base_path)
File "/Users/eileenmartin/dascore/scripts/_index_api.py", line 169, in traverse
data_dict[obj_id] = get_data(obj, key, base_path, parent_is_class)
File "/Users/eileenmartin/dascore/scripts/_index_api.py", line 136, in get_data
data = extract_data(obj, parent_is_class)
File "/Users/eileenmartin/dascore/scripts/_index_api.py", line 111, in extract_data
data["short_description"] = docstr.split("\n")[0]
AttributeError: 'NoneType' object has no attribute 'split'

Inconsistencies in decimate behavior

Description

Slack message from Jin:

There seems to be some bugs with decimate
patch1:
["select(copy=False,time=(numpy.datetime64('2015-11-09T21:17:00'), numpy.datetime64('2015-11-09T21:18:10')))",
"decimate(copy=True,dim='time',factor=4,lowpass=True)",
"decimate(copy=True,dim='time',factor=5,lowpass=True)",
"decimate(copy=True,dim='time',factor=10,lowpass=True)",
"decimate(copy=True,dim='time',factor=10,lowpass=True)"]

patch2:
["select(copy=False,time=(numpy.datetime64('2015-11-09T21:17:00'), numpy.datetime64('2015-11-09T21:18:10')))",
'pass_filter(corners=4,time=(None, 0.5),zerophase=True)',
"decimate(copy=True,dim='time',factor=2000,lowpass=False)"]

patch3:
data = p[0].data.copy()
lpdata = gjsignal.lpfilter(data,0.0005,0.5,axis=1)[:,::2000]
plt.figure()
plt.plot(dsp.data[200,:],label='patch.decimate')
plt.plot(dsp2.data[200,:],label='patch.pass_filter.decimate(lowpass=False)')
plt.plot(lpdata[200,:],label='gjsignal.lpfilter')
plt.legend()

Conda version mismatch

Description

After installing dascore with conda dascore.__version__ prints "0.0.0". However, a conda list does show the correct version (0.0.9). We need to figure out why.

select vs sel

Description
slack message from Jin:

It seems that the select function cannot take a list of sample locations.

upgrade and import issue

I upgraded dascore to the most recent version and now have importing issue:

ImportError Traceback (most recent call last)
/tmp/ipykernel_312/2507837442.py in
----> 1 import dascore as dc

~/anaconda3/lib/python3.9/site-packages/dascore/init.py in
3 from xarray import set_options
4
----> 5 from dascore.core.patch import Patch
6 from dascore.core.schema import PatchAttrs
7 from dascore.core.spool import BaseSpool, spool

~/anaconda3/lib/python3.9/site-packages/dascore/core/init.py in
2 Core routines and functionality for processing distributed fiber data.
3 """
----> 4 from .patch import Patch # noqa

~/anaconda3/lib/python3.9/site-packages/dascore/core/patch.py in
14 from dascore.constants import PatchType
15 from dascore.core.schema import PatchAttrs
---> 16 from dascore.io import PatchIO
17 from dascore.transform import TransformPatchNameSpace
18 from dascore.utils.coords import Coords, assign_coords

~/anaconda3/lib/python3.9/site-packages/dascore/io/init.py in
2 Modules for reading and writing fiber data.
3 """
----> 4 from dascore.io.core import write
5 from dascore.utils.misc import MethodNameSpace
6

~/anaconda3/lib/python3.9/site-packages/dascore/io/core.py in
25 )
26 from dascore.utils.docs import compose_docstring
---> 27 from dascore.utils.hdf5 import HDF5ExtError
28 from dascore.utils.misc import suppress_warnings
29 from dascore.utils.patch import scan_patches

~/anaconda3/lib/python3.9/site-packages/dascore/utils/hdf5.py in
14 import numpy as np
15 import pandas as pd
---> 16 import tables
17 from packaging.version import parse as get_version
18 from tables import ClosedNodeError

~/anaconda3/lib/python3.9/site-packages/tables/init.py in
22
23 # Necessary imports to get versions stored on the cython extension
---> 24 from .utilsextension import (
25 get_pytables_version, get_hdf5_version, blosc_compressor_list,
26 blosc_compcode_to_compname_ as blosc_compcode_to_compname,

tables/utilsextension.pyx in init tables.utilsextension()

ImportError: cannot import name typeDict

Can't get gauge length from attributes - PRODML v. 2.0 format

Description

I can see "GaugeLength" in the attributes list using HDFView software. However, I can't get the value using: patch_0.attrs['gauge_length']
Please note that I can get some other attributes such as sampling interval or channel spacing.

Data format: ONYX - PRODML v. 2.0

Example

import dascore as dc
sp = dc.spool(data_path)
patch_0 = sp[0]
gauge_length = patch_0.attrs['gauge_length']

Error:

AttributeError                            Traceback (most recent call last)
Cell In[3], line 12
      8 print(patch_0.attrs)
     11 # get sampling rate, channel spacing, and gauge length
---> 12 gauge_length = patch_0.attrs['gauge_length']
     13 print("Gauge length = ", gauge_length)
     14 channel_spacing = patch_0.attrs['d_distance']

File [~/anaconda3/envs/py10/lib/python3.10/site-packages/dascore/core/schema.py:94](https://vscode-remote+wsl-002bubuntu.vscode-resource.vscode-cdn.net/home/ahmadtourei/coding/fervo/iDAS_stimulation_stage8/LF_DAS/~/anaconda3/envs/py10/lib/python3.10/site-packages/dascore/core/schema.py:94), in PatchAttrs.__getitem__(self, item)
     93 def __getitem__(self, item):
---> 94     return getattr(self, item)

AttributeError: 'PatchAttrs' object has no attribute 'gauge_length'

Expected behavior

Versions

OS [e.g. Ubuntu 20.04]: Ubuntu 20.04
DasCore Version [e.g. 0.0.5]: 0.0.12
Python Version [e.g. 3.10]: 3.10

DASCore index to support missing time or distance values

Description

Currently the indexing mechanism for the DirectorySpool requires all values in "time_max", "time_min", "d_time", "distance_max", "distance_min", "d_distance" to be non-null values. However, this doesn't need to be the case, and its conceivable some patches may not have even sampling in both time/distance, or may have a completely different second dimension name "eg channel_number". We need to make sure the indexing mechanism is flexible enough to handle these cases.

Colorbar position for figures with subplots

It seems that the colorbar position is not handled appropriately.
Probably good to allow user to turn off colorbar in the waterfall plot.

Spool indices and data from the future

Description

When working with really fresh data set collected by the Colorado School of Mines Terra15 interrogator, we noticed calling spool.update will create duplicates in the spool's index.

I tracked this down to dascore.utils.misc.iter_files. The mtime in the recorded das files is greater than the timestamp returned by time.time, which is odd since, AFAIK, both should be Unix timestamps (seconds from UTC 1970-01-01).

For example:

import time
from pathlib import Path

import dascore as dc

path = Path("DASRCN_whale_velocity_UTC-YMD20230531-HMS172443.512_seq_00000000000.hdf5")

print(path.stat().st_mtime)  # 1685607120.0
print(time.time())  # 1685587714.1883616

So, the indexer is working because it is supposed to re-index any files with m_times after the last time the indexer was run. Since this files m_time is several hours in the future, it will continue to be added to the index each time update is called.

A few things we need to figure out:

Is m_time really based on the system time and not UTC? That doesn't really make sense to me and is contrary to the answer in this SO post.
Is there something wrong with the mtimes created by the interrogator? The difference between atime, ctime, and mtime is suspicious since the files should be created and finalized within a few minutes.

st_atime=1685586990
st_mtime=1685607120
st_ctime=1685586057

Note that st_mtime - st_atime ~ 6 hours, the difference between UTC and Colorado's time zone.

Expected behavior

Perhaps DASCore should check that m_time is <= current time and if not set m_time to current time? This could slow down and complicated indexing though.

Incomplete signatures generated by docs

Signatures in documentation doesn't show default values. For example, for the attached decimate function doc page, the parameter copy should show copy=True.

history stores the select function as a list of single character

Sub Directory Missed on Spool

Description

Spool index's subdirectory's without maintaining links, causing missing files error on trying to access that data.

Example

Versions

OS [Windows 10]
DasCore Version [ 0.0.10]:
Python Version [3.10]:

Got UnknownFiberFormat

Discussed in #120

^{Originally posted by xunen63 March 30, 2023}
When I try to read a hdf5 file with dascore like this
"
import dascore as dc
file_path = 'E:\try_PubDAS\FORESEE\FORESEE_UTC_20190404_194804.hdf5'
spool= dc.spool(file_path)
"

I got:

Traceback (most recent call last):
File "E:\try_PubDAS\DAS_tools.py", line 9, in
spool= dc.spool(file_path)
File "D:\softwares\anaconda\envs\pytorch19\lib\functools.py", line 888, in wrapper
return dispatch(args[0].class)(*args, **kw)
File "D:\softwares\anaconda\envs\pytorch19\lib\site-packages\dascore\core\spool.py", line 327, in spool_from_str
_format, _version = dc.get_format(path)
File "D:\softwares\anaconda\envs\pytorch19\lib\site-packages\dascore\io\core.py", line 506, in get_format
raise UnknownFiberFormat(msg)
dascore.exceptions.UnknownFiberFormat: Could not determine file format of E:\try_PubDAS\FORESEE\FORESEE_UTC_20190404_194804.hdf5

spool negative indexing

Description

Currently, negative indexing doesn't work on spools, but it should.

Example

import dascore as dc

spool = dc.get_example_spool("random_das")
last_patch = spool[-1]  # raises an Error

Expected behavior

Negative indexes should work exactly how the do for other python sequences.

Versions

OS [e.g. Ubuntu 20.04]:
DasCore Version [e.g. 0.0.5]:
Python Version [e.g. 3.10]:

chunk doesn't work on FileSpool

Description

Currently, chunk doesn't work with a file spool because FileSpool doesn't accept an instance of itself as an input argument for its __init__ method, which is expected by the DataFrameSpool.chunk which FileSpool inherits from.

Example

file_path = fetch('terra15_das_1_trimmed.hdf5')

spool = dc.spool(file_path)

# this raises
spool.chunk(time=.01)

Expected behavior

It should "just work".

Versions

OS [e.g. Ubuntu 20.04]: ubuntu
DasCore Version [e.g. 0.0.5]: master
Python Version [e.g. 3.10]: 3.10

Installation on M1 Macs

Description
While discussing quarto for generating API docs, it was reported here that dascore doesn't install on M1 macs. Potential some problem with pytables/HDF5.

Does anyone out there have an M1 mac we can run tests on? I would like to get this figured out soon as I think many of our users will be using M1s.

use spool().chunk() for a 'MemorySpool' created by the spool_from_patch_list

Description

I have a list of patches, and I want to merge them together to create one patch using the chunk() method. I am using spool_from_patch_list to create a spool from the list of patches and then use chunk() to merge them:

new_spool = spool_from_patch_list(plist)
p = spool(new_spool).chunk(time=None)

However, since the new_spool's type is 'dascore.core.spool.MemorySpool' (instead of 'dascore.clients.dirspool.DirectorySpool',) I'm getting the following error:

ValueError                                Traceback (most recent call last)
Cell In[5], line 3
      1 import imp
      2 imp.reload(lfproc)
----> 3 plist = lfproc.gather_results(output_folder)

File [~/coding/lfproc.py:156](https://vscode-remote+wsl-002bubuntu.vscode-resource.vscode-cdn.net/home/ahmadtourei/coding/~/coding/lfproc.py:156), in gather_results(folder)
--> 156 return spool(new_spool).chunk(time=None)

File [~/coding/dascore/dascore/core/spool.py:234](https://vscode-remote+wsl-002bubuntu.vscode-resource.vscode-cdn.net/home/ahmadtourei/coding/~/coding/dascore/dascore/core/spool.py:234), in DataFrameSpool.chunk(self, overlap, keep_partial, snap_coords, tolerance, **kwargs)
    226 df = self._df.drop(columns=list(self._drop_columns), errors="ignore")
    227 chunker = ChunkManager(
    228     overlap=overlap,
    229     keep_partial=keep_partial,
   (...)
    232     **kwargs,
    233 )
--> 234 in_df, out_df = chunker.chunk(df)
    235 if df.empty:
    236     instructions = None

File [~/coding/dascore/dascore/utils/chunk.py:309](https://vscode-remote+wsl-002bubuntu.vscode-resource.vscode-cdn.net/home/ahmadtourei/coding/~/coding/dascore/dascore/utils/chunk.py:309), in ChunkManager.chunk(self, df)
    300     new_start_stop = get_intervals(
    301         start,
    302         stop,
   (...)
    306         keep_partials=self._keep_partials,
    307     )
    308     # create the newly chunked dataframe
--> 309     sub_new_df = self._create_df(current_df, self._name, new_start_stop, gnum)
    310     out.append(sub_new_df)
    312 out = pd.concat(out, axis=0).reset_index(drop=True)

File [~/coding/dascore/dascore/utils/chunk.py:199](https://vscode-remote+wsl-002bubuntu.vscode-resource.vscode-cdn.net/home/ahmadtourei/coding/~/coding/dascore/dascore/utils/chunk.py:199), in ChunkManager._create_df(self, df, name, start_stop, gnum)
    197     vals = merger[col].unique()
    198     assert len(vals) == 1, "Haven't yet implemented non-homogenous merging"
--> 199     out[col] = vals[0]
    200 # add the group number for getting instruction df later
    201 out["_group"] = gnum

File [~/anaconda3/envs/dascore/lib/python3.11/site-packages/pandas/core/frame.py:3950](https://vscode-remote+wsl-002bubuntu.vscode-resource.vscode-cdn.net/home/ahmadtourei/coding/~/anaconda3/envs/dascore/lib/python3.11/site-packages/pandas/core/frame.py:3950), in DataFrame.__setitem__(self, key, value)
   3947     self._setitem_array([key], value)
   3948 else:
   3949     # set column
-> 3950     self._set_item(key, value)

File [~/anaconda3/envs/dascore/lib/python3.11/site-packages/pandas/core/frame.py:4143](https://vscode-remote+wsl-002bubuntu.vscode-resource.vscode-cdn.net/home/ahmadtourei/coding/~/anaconda3/envs/dascore/lib/python3.11/site-packages/pandas/core/frame.py:4143), in DataFrame._set_item(self, key, value)
   4133 def _set_item(self, key, value) -> None:
   4134     """
   4135     Add series to DataFrame in specified column.
   4136 
   (...)
   4141     ensure homogeneity.
   4142     """
-> 4143     value = self._sanitize_column(value)
   4145     if (
   4146         key in self.columns
   4147         and value.ndim == 1
   4148         and not is_extension_array_dtype(value)
   4149     ):
   4150         # broadcast across multiple columns if necessary
   4151         if not self.columns.is_unique or isinstance(self.columns, MultiIndex):

File [~/anaconda3/envs/dascore/lib/python3.11/site-packages/pandas/core/frame.py:4870](https://vscode-remote+wsl-002bubuntu.vscode-resource.vscode-cdn.net/home/ahmadtourei/coding/~/anaconda3/envs/dascore/lib/python3.11/site-packages/pandas/core/frame.py:4870), in DataFrame._sanitize_column(self, value)
   4867     return _reindex_for_setitem(Series(value), self.index)
   4869 if is_list_like(value):
-> 4870     com.require_length_match(value, self.index)
   4871 return sanitize_array(value, self.index, copy=True, allow_2d=True)

File [~/anaconda3/envs/dascore/lib/python3.11/site-packages/pandas/core/common.py:576](https://vscode-remote+wsl-002bubuntu.vscode-resource.vscode-cdn.net/home/ahmadtourei/coding/~/anaconda3/envs/dascore/lib/python3.11/site-packages/pandas/core/common.py:576), in require_length_match(data, index)
    572 """
    573 Check the length of data matches the length of the index.
    574 """
    575 if len(data) != len(index):
--> 576     raise ValueError(
    577         "Length of values "
    578         f"({len(data)}) "
    579         "does not match length of index "
    580         f"({len(index)})"
    581     )

ValueError: Length of values (2) does not match length of index (1)

Also, is there any other way to merge patches from a list of patches?
Thanks in advance!

Example

Expected behavior

Versions

OS [e.g. Ubuntu 20.04]:
DasCore Version [e.g. 0.0.5]:
Python Version [e.g. 3.10]:

migrate to new discussion style

Discussed in https://github.com/DASDAE/dascore/discussions/77

^{Originally posted by d-chambers November 11, 2022}
I am checking out the new version of github projects. It looks nice. It supports both board and spreadsheet style views, custom fields, etc.

https://github.com/orgs/DASDAE/projects/2

I am in favor of switching over and archiving the old project board. Thoughts? ( I think @eileenrmartin is the only other person adding things to the old project board)

update docs with correct link

Enable sitemap

Description

We need to enable the generation of a quarto site map so that Google will index our doc pages. This will help, for example, when one googles dascore patch detrend so the correct page will be suggested.

See the here for how to do this.

dasdae / dascore Goto Github PK

dascore's Introduction

DASCore

dascore's People

Contributors

Stargazers

Watchers

Forkers

dascore's Issues

Description

Description

Example

Description

Description

Example

Expected behavior

Versions

Description

Example

Expected behavior

Versions

Description

Discussed in #65

Description

Description

1. The example data (idas_h5_example_path)

2. My Silixa iDAS dataset

Example

Expected behavior

Versions

Description

Example

Expected behavior

Versions

Description

Example

Description

Description

Description

Example

Error:

Expected behavior

Versions

Description

Description

Expected behavior

Description

Example

Versions

Discussed in #120

Description

Example

Expected behavior

Versions

Description

Example

Expected behavior

Versions

Description

Example

Expected behavior

Versions

Discussed in https://github.com/DASDAE/dascore/discussions/77

Description

Recommend Projects

Recommend Topics

Recommend Org

Jobs