
dgbowl / dgpost


dgpost: datagram post-processing toolkit

Home Page: https://dgbowl.github.io/tomato

License: GNU General Public License v3.0

Python 100.00%

dgpost's People

Contributors

ileu, peterkraus


dgpost's Issues

Drop `yadg` dependency and move to `pint` registry

In dgpost~2.0, we are using the unit registry included in yadg~4.2. This is, however, restrictive and should be relaxed, so that compatibility with the base pint unit registry is restored.

This is likely a breaking change and therefore would need thorough testing.

Front page documentation

Create a front page for the documentation, linking to the following items:

  • recipe schema description
  • features in recipes: load, extract, transform, save
  • user documentation for the transform module
  • developer documentation for the transform module

Minimal starting plot

Plot the data as a function of time in three vertically aligned windows. See the sketch attached!

Window 1:
Voltage of the WE vs time

Window 2:
FE of the main products vs time.
Main products means H2, CO, CH4, C2H4, HCOOH, CH3CH2OH.

Window 3:
Corollary info: gas flow and electrolyzer temperature vs time

[attached sketch: signal-2022-03-25-164159_001]
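
A minimal matplotlib sketch of the requested layout (not dgpost's own plot feature): three vertically stacked panels sharing the time axis. The DataFrame and its column names below are hypothetical stand-ins for the extracted data.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

products = ["H2", "CO", "CH4", "C2H4", "HCOOH", "CH3CH2OH"]

# dummy data standing in for an extracted dgpost table
rng = np.random.default_rng(0)
t = np.linspace(0, 3600, 200)
df = pd.DataFrame({"time": t,
                   "Ewe": -1.8 + 0.02 * rng.standard_normal(t.size),
                   "flow": 20 + rng.standard_normal(t.size),
                   "T": 25 + 2 * t / 3600})
for p in products:
    df[f"fe->{p}"] = np.abs(rng.standard_normal(t.size))

fig, (ax1, ax2, ax3) = plt.subplots(3, 1, sharex=True, figsize=(6, 8))

# Window 1: working-electrode voltage vs time
ax1.plot(df["time"], df["Ewe"])
ax1.set_ylabel("E(WE) / V")

# Window 2: FE of the main products vs time
for p in products:
    ax2.plot(df["time"], df[f"fe->{p}"], label=p)
ax2.set_ylabel("FE / %")
ax2.legend(ncol=3, fontsize="small")

# Window 3: corollary info, gas flow and electrolyzer temperature vs time
ax3.plot(df["time"], df["flow"], color="C0")
ax3r = ax3.twinx()
ax3r.plot(df["time"], df["T"], color="C1")
ax3.set_ylabel("flow / (ml/min)")
ax3r.set_ylabel("T / degC")
ax3.set_xlabel("time / s")

fig.tight_layout()
plt.show()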

Possible issue with Pint library version.

When running the following command:
dgpost ./Recipe/dgpost/dgpost.recipe.01_electro.yaml --patch ./Output/20230602-15p-Cu-PsA1-I-01/datagram_20230602-15p-Cu-PsA1-I-01 -v

I received the following error:

Traceback (most recent call last):
  File "C:\Users\plnu\AppData\Local\miniconda3\envs\test_yadg\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\plnu\AppData\Local\miniconda3\envs\test_yadg\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\plnu\AppData\Local\miniconda3\envs\test_yadg\Scripts\dgpost.exe\__main__.py", line 4, in <module>
  File "C:\Users\plnu\AppData\Local\miniconda3\envs\test_yadg\lib\site-packages\dgpost\__init__.py", line 5, in <module>
    from .main import run_with_arguments, run
  File "C:\Users\plnu\AppData\Local\miniconda3\envs\test_yadg\lib\site-packages\dgpost\main.py", line 16, in <module>
    from dgpost.utils import parse, load, extract, transform, save, plot, pivot
  File "C:\Users\plnu\AppData\Local\miniconda3\envs\test_yadg\lib\site-packages\dgpost\utils\__init__.py", line 1, in <module>
    from .extract import extract
  File "C:\Users\plnu\AppData\Local\miniconda3\envs\test_yadg\lib\site-packages\dgpost\utils\extract.py", line 74, in <module>
    from dgpost.utils.helpers import (
  File "C:\Users\plnu\AppData\Local\miniconda3\envs\test_yadg\lib\site-packages\dgpost\utils\helpers.py", line 27, in <module>
    ureg.define("refractive_index_units = [] = RIU")
  File "C:\Users\plnu\AppData\Local\miniconda3\envs\test_yadg\lib\site-packages\pint\facets\plain\registry.py", line 469, in define
    self._helper_dispatch_adder(definition)
  File "C:\Users\plnu\AppData\Local\miniconda3\envs\test_yadg\lib\site-packages\pint\facets\plain\registry.py", line 492, in _helper_dispatch_adder
    adder_func(definition)
  File "C:\Users\plnu\AppData\Local\miniconda3\envs\test_yadg\lib\site-packages\pint\facets\group\registry.py", line 85, in _add_unit
    super()._add_unit(definition)
  File "C:\Users\plnu\AppData\Local\miniconda3\envs\test_yadg\lib\site-packages\pint\facets\nonmultiplicative\registry.py", line 72, in _add_unit
    super()._add_unit(definition)
  File "C:\Users\plnu\AppData\Local\miniconda3\envs\test_yadg\lib\site-packages\pint\facets\plain\registry.py", line 571, in _add_unit
    self._helper_adder(definition, self._units, self._units_casei)
  File "C:\Users\plnu\AppData\Local\miniconda3\envs\test_yadg\lib\site-packages\pint\facets\plain\registry.py", line 509, in _helper_adder
    self._helper_single_adder(
  File "C:\Users\plnu\AppData\Local\miniconda3\envs\test_yadg\lib\site-packages\pint\facets\plain\registry.py", line 532, in _helper_single_adder
    raise RedefinitionError(key, type(value))
pint.errors.RedefinitionError: Cannot redefine 'RIU' (<class 'pint.delegates.txt_defparser.plain.UnitDefinition'>)

You can find the .nc file I used and the recipe here. This file was created using yadg 5.1. When using yadg 5.04 to process this experiment, the script ran just fine.

I dug a bit deeper into the source of the error. The last part of the traceback that is still inside dgpost is at line 27 in `dgpost\utils\helpers.py`.

The lines that are potentially causing the issue are:

ureg = pint.get_application_registry()

ureg.define("refractive_index_units = [] = RIU")

I isolated this code snippet into a separate Jupyter notebook (adding the correct import for Pint, of course). I found that if I run it with Pint==0.23, the snippet runs just fine. But if I use Pint==0.24.1 (which is required by yadg 5.1), I receive the same error as above. Please let me know if you need any additional information on this. Thank you.
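
As a possible workaround (a sketch only, not an official fix), the `define` call in helpers.py could be guarded so that a unit which is already present in the application registry is not redefined:

import pint

ureg = pint.get_application_registry()

# Guard the definition: with newer pint versions, 'RIU' may already be present
# in the application registry (e.g. defined elsewhere), and redefining it raises.
try:
    ureg.define("refractive_index_units = [] = RIU")
except pint.errors.RedefinitionError:
    pass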

In addition, in order to test yadg 5.1, I created a new conda environment. I found that simply pip-installing yadg and dgpost directly does not work, as the latest numpy 2 and pandas 2 releases have many incompatibilities with the underlying libraries. After some experimentation, I found that pandas==2.0.0 and numpy==1.23.5 work for the time being. You may want to consider updating the requirements for yadg and dgpost accordingly.

`pivot`: Feature request

Currently, a big missing feature in dgpost is the ability to "pivot" tables using an arbitrary data column.

For instance, in electrochemistry data, we often have a running index denoting the current charge/discharge cycle, and the cell capacity (charge) should be integrated between changes in that index. This is currently not possible in dgpost, as all functions from the transform library are applied on the whole table.
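
As an illustration of the desired behaviour, here is a minimal pandas sketch (the column names cycle, time and I are hypothetical) that integrates the charge within each charge/discharge cycle rather than over the whole table:

import numpy as np
import pandas as pd

# hypothetical table: a running cycle index, timestamps, and current
df = pd.DataFrame({
    "cycle": [0, 0, 0, 1, 1, 1],
    "time":  [0.0, 1.0, 2.0, 3.0, 4.0, 5.0],
    "I":     [0.1, 0.1, 0.1, -0.1, -0.1, -0.1],
})

def charge(group: pd.DataFrame) -> float:
    t = group["time"].to_numpy()
    i = group["I"].to_numpy()
    # trapezoidal integration of I over t within this cycle
    return float(np.sum(0.5 * (i[1:] + i[:-1]) * np.diff(t)))

# "pivot" by the cycle column and integrate within each group
q_per_cycle = df.groupby("cycle")[["time", "I"]].apply(charge)
print(q_per_cycle)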

`plot`: Allow per-subfigure selection of tables.

Instead of restricting the plotting to use a single table for the whole figure, we should allow selecting the source table on a per-subfigure basis. This would allow plotting data from multiple tables in multiple subfigures in one figure, using a shared x-axis, but at their native frequency.

Feedback from Francesco

  • add a way to specify datagram on command line
  • double-check units of GC -> FE (implemented in yadg-4.1)
  • conversion compatibility with [mol/s] without internal standard (implemented in #32)

Minor issues

Bugs:

  • pivot does not actually process timedelta (fixed in #79)

Documentation:

  • documentation index does not link to dgbowl_schemas.Recipe properly
  • the usage of the name dgpost in the documentation should be formatted consistently, not as a mixture of styles
  • include a link to the extended dgpost usage example in the usage page
  • the namespacing in dgbowl_schemas should be reworked to always point to the latest version of schema: dgbowl_schemas.Recipe works, but the components (load, extract, etc.) don't

Extend electrochemistry library.

To allow for better interpolation of electrochemical data as a function of timesteps, the following two functions should be implemented:

  • total charge: q = ∫[0->t] I dt
  • average current: <I> = dq / dt

where dt is the difference of two consecutive timestamps.
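
A minimal numpy sketch of the two quantities, assuming arrays t (timestamps in s) and I (current in A):

import numpy as np

t = np.array([0.0, 10.0, 20.0, 30.0])   # timestamps / s
I = np.array([0.10, 0.12, 0.11, 0.09])  # current / A

# total charge q(t) = ∫[0->t] I dt, via cumulative trapezoidal integration
dt = np.diff(t)
q = np.concatenate(([0.0], np.cumsum(0.5 * (I[1:] + I[:-1]) * dt)))

# average current <I> = dq / dt between consecutive timestamps
I_avg = np.diff(q) / dt

print(q)      # cumulative charge / C
print(I_avg)  # average current / A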

Move recipe schema to dgbowl-schemas

The validation of recipe schemas should be handed off to the dgbowl-schemas package:

  • move validation from strictyaml to pydantic
  • fix v1.0 of recipe schema
  • implement versioning of schema
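
A rough sketch of what a versioned, pydantic-based recipe model could look like; the field names here are illustrative only and do not reflect the actual dgbowl-schemas definitions:

from typing import Literal, Sequence
from pydantic import BaseModel, Field

class Load(BaseModel):
    as_: str = Field(alias="as")   # name under which the loaded table is stored
    path: str
    type: str = "datagram"

class Recipe(BaseModel):
    version: Literal["1.0"] = "1.0"  # explicit schema version for future migrations
    load: Sequence[Load] = ()

# validation of a parsed yaml dict would then be e.g. Recipe(**parsed_yaml)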

Extend rates function library.

The rates library in the transform module should be extended with functions that calculate formation rates from a batch experiment. Given a set of timestamps and corresponding concentrations and volumes, the rates can be calculated in a fairly straightforward way as ndot = dc * V / dt.
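
A minimal numpy sketch of that calculation, assuming arrays t (timestamps in s), c (concentrations in mol/l) and a constant batch volume V (in l):

import numpy as np

t = np.array([0.0, 600.0, 1200.0, 1800.0])  # timestamps / s
c = np.array([0.00, 0.02, 0.035, 0.045])    # concentration / (mol/l)
V = 0.05                                    # batch (liquid) volume / l

# formation rate between consecutive samples: ndot = dc * V / dt
ndot = np.diff(c) * V / np.diff(t)          # / (mol/s)
print(ndot)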

`unpivot`: the opposite of `pivot`

unpivot, the accompanying inverse operation to pivot, should be included, so that e.g. timeseries data can be re-expanded back onto the whole table.
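
A pandas sketch of the intended round trip (column names are hypothetical): a pivoted table holding per-cycle timeseries as lists can be re-expanded back to one row per timestamp using explode.

import pandas as pd

# pivoted table: one row per cycle, timeseries stored as lists
pivoted = pd.DataFrame({
    "cycle": [0, 1],
    "time":  [[0.0, 1.0], [2.0, 3.0]],
    "I":     [[0.1, 0.1], [-0.1, -0.1]],
})

# unpivot: explode the list-valued columns back to one row per timestamp
unpivoted = pivoted.explode(["time", "I"], ignore_index=True)
print(unpivoted)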

Interpolate impedance values.

Given a set of freq, Re(Z) and -Im(Z) data, it would be great to find all values of Re(Z) (and freq) where -Im(Z) = 0.
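
A numpy sketch of one way to do this: locate the sign changes of -Im(Z) between consecutive points and linearly interpolate Re(Z) and freq at each crossing. The input arrays below are made up for illustration.

import numpy as np

freq = np.array([1e5, 1e4, 1e3, 1e2, 1e1])     # frequency / Hz
re_z = np.array([5.0, 7.0, 12.0, 20.0, 25.0])  # Re(Z) / Ohm
nim_z = np.array([2.0, 4.0, 1.0, -1.5, -3.0])  # -Im(Z) / Ohm

# indices i where -Im(Z) changes sign between points i and i+1
sign = np.sign(nim_z)
idx = np.nonzero(np.diff(sign) != 0)[0]

for i in idx:
    # fractional position of the zero crossing between points i and i+1
    f = nim_z[i] / (nim_z[i] - nim_z[i + 1])
    re0 = re_z[i] + f * (re_z[i + 1] - re_z[i])
    f0 = freq[i] + f * (freq[i + 1] - freq[i])
    print(f"-Im(Z) = 0 at Re(Z) = {re0:.2f} Ohm (freq ~ {f0:.1f} Hz)")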

Implement multiindex tables.

Explore multiindex pandas functionality to "nest" column headers instead of separating them using "->".

Extracting variables:

  • for the key (source) entry, keep the parent_key->first_key->* and parent_key->first_key->second_key syntax
  • for the as (target) entry, -> syntax is translated into a multiindex column
  • subkeys are possible (currently xout->species)
  • sub-subkeys are also possible (currently traces->detectors->axes)

Units:

  • instead of nesting variable names in df.attrs["units"] using ->, use a dictionary structure.

Transforming columns:

  • column specification should be reworked to accept an arrow-separated string (current behaviour) as well as a tuple of str indices (multiindex behaviour)
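
A small pandas sketch of the idea: translating the current "->"-separated column names into a MultiIndex, so that e.g. xout->CO becomes the tuple ("xout", "CO"). The example columns are hypothetical.

import pandas as pd

# current flat representation with "->"-separated column names
df = pd.DataFrame({"uts": [1.0, 2.0], "xout->CO": [0.1, 0.2], "xout->H2": [0.9, 0.8]})

# translate "a->b" strings into tuples and build a MultiIndex
tuples = [tuple(col.split("->")) if "->" in col else (col, "") for col in df.columns]
df.columns = pd.MultiIndex.from_tuples(tuples)

print(df["xout"])          # select the whole nested sub-table
print(df[("xout", "CO")])  # select a single nested column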

Documentation for v1.0

The documentation of dgpost needs tidying up for v1.0:

  • developer/contributor documentation
  • release notes & zenodo archive
  • example usage
  • tidy up autodocs for transform

Issue with dgpost 2.1

I am now working on implementing the yadg 5 upgrade in our pipeline for CO2 reduction electrochemistry data analysis. I forked yadg 5.0 and used this version to create the datagram.nc file. While running yadg, I also encountered some errors, so I had to make some minor adjustments to the script, but I eventually obtained a datagram.nc file. Now that I use this datagram.nc to run dgpost, I encounter an error. I am not sure whether this is an issue with dgpost itself or something with the .nc file.

Please find the yaml and datagram (.nc) files used in the run here: https://drive.switch.ch/index.php/s/fb1oqUuXDz9HRXW

I ran dgpost using the following command:

dgpost "C:\Users\plnu\Dropbox\PythonStuff\Empa\Notebooks\Empa_30 Testing yadg5\Test auto DB\AutoDB\Recipe\dgpost\dgpost.recipe.01_electro_v5.yaml" --patch "C:\Users\plnu\Dropbox\PythonStuff\Empa\Notebooks\Empa_30 Testing yadg5\Test auto DB\AutoDB\Output\test_fran_file\datagram_test_fran_file" -v

But I got the following error:

INFO:dgpost.main:Processing 'load'.
Traceback (most recent call last):
  File "C:\Users\plnu\Anaconda3\envs\test_yadg5\lib\site-packages\xarray\backends\file_manager.py", line 211, in _acquire_with_cache_info
    file = self._cache[self._key]
  File "C:\Users\plnu\Anaconda3\envs\test_yadg5\lib\site-packages\xarray\backends\lru_cache.py", line 56, in __getitem__
    value = self._cache[key]
KeyError: [<class 'h5netcdf.core.File'>, ('C:\Users\plnu\Dropbox\PythonStuff\Empa\Notebooks\Empa_30 Testing yadg5\Test auto DB\AutoDB\Output\test_fran_file\datagram_test_fran_file.nc',), 'r', (('decode_vlen_strings', True), ('invalid_netcdf', None)), '4f541ab0-0650-487e-9c15-0232df7eec9c']

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\plnu\Anaconda3\envs\test_yadg5\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\plnu\Anaconda3\envs\test_yadg5\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\plnu\Anaconda3\envs\test_yadg5\Scripts\dgpost.exe\__main__.py", line 7, in <module>
  File "C:\Users\plnu\Anaconda3\envs\test_yadg5\lib\site-packages\dgpost\main.py", line 229, in run_with_arguments
    run(args.infile, patch=args.patch)
  File "C:\Users\plnu\Anaconda3\envs\test_yadg5\lib\site-packages\dgpost\main.py", line 95, in run
    tables[el["as"]] = load(fp, None, el["type"])
  File "C:\Users\plnu\Anaconda3\envs\test_yadg5\lib\site-packages\dgpost\utils\load.py", line 51, in load
    return datatree.open_datatree(path, engine="h5netcdf")
  File "C:\Users\plnu\Anaconda3\envs\test_yadg5\lib\site-packages\datatree\io.py", line 58, in open_datatree
    return open_datatree_netcdf(filename_or_obj, engine=engine, **kwargs)
  File "C:\Users\plnu\Anaconda3\envs\test_yadg5\lib\site-packages\datatree\io.py", line 66, in open_datatree_netcdf
    ds = open_dataset(filename, **kwargs)
  File "C:\Users\plnu\Anaconda3\envs\test_yadg5\lib\site-packages\xarray\backends\api.py", line 573, in open_dataset
    backend_ds = backend.open_dataset(
  File "C:\Users\plnu\Anaconda3\envs\test_yadg5\lib\site-packages\xarray\backends\h5netcdf_.py", line 402, in open_dataset
    store = H5NetCDFStore.open(
  File "C:\Users\plnu\Anaconda3\envs\test_yadg5\lib\site-packages\xarray\backends\h5netcdf_.py", line 175, in open
    return cls(manager, group=group, mode=mode, lock=lock, autoclose=autoclose)
  File "C:\Users\plnu\Anaconda3\envs\test_yadg5\lib\site-packages\xarray\backends\h5netcdf_.py", line 126, in __init__
    self.filename = find_root_and_group(self.ds)[0].filename
  File "C:\Users\plnu\Anaconda3\envs\test_yadg5\lib\site-packages\xarray\backends\h5netcdf_.py", line 186, in ds
    return self._acquire()
  File "C:\Users\plnu\Anaconda3\envs\test_yadg5\lib\site-packages\xarray\backends\h5netcdf_.py", line 178, in _acquire
    with self._manager.acquire_context(needs_lock) as root:
  File "C:\Users\plnu\Anaconda3\envs\test_yadg5\lib\contextlib.py", line 135, in __enter__
    return next(self.gen)
  File "C:\Users\plnu\Anaconda3\envs\test_yadg5\lib\site-packages\xarray\backends\file_manager.py", line 199, in acquire_context
    file, cached = self._acquire_with_cache_info(needs_lock)
  File "C:\Users\plnu\Anaconda3\envs\test_yadg5\lib\site-packages\xarray\backends\file_manager.py", line 217, in _acquire_with_cache_info
    file = self._opener(*self._args, **kwargs)
  File "C:\Users\plnu\Anaconda3\envs\test_yadg5\lib\site-packages\h5netcdf\core.py", line 1051, in __init__
    self._h5file = self._h5py.File(
  File "C:\Users\plnu\Anaconda3\envs\test_yadg5\lib\site-packages\h5py\_hl\files.py", line 562, in __init__
    fid = make_fid(name, mode, userblock_size, fapl, fcpl, swmr=swmr)
  File "C:\Users\plnu\Anaconda3\envs\test_yadg5\lib\site-packages\h5py\_hl\files.py", line 235, in make_fid
    fid = h5f.open(name, flags, fapl=fapl)
  File "h5py\_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py\_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py\h5f.pyx", line 102, in h5py.h5f.open
OSError: Unable to synchronously open file (file signature not found)
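
The "file signature not found" OSError comes from h5py and usually means the file is not a valid HDF5/netCDF4 container (e.g. truncated, or written in a different netCDF flavour). A quick, hedged way to check the file itself before suspecting dgpost (the path is a placeholder):

import h5py

path = r"...\datagram_test_fran_file.nc"  # path to the suspect file
if h5py.is_hdf5(path):
    print("file has a valid HDF5 signature; the problem is likely elsewhere")
else:
    print("not an HDF5 file; it cannot be opened by the h5netcdf engine that dgpost's load uses")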

`load`: support for NeXus files

dgpost should be able to read NeXus files, in particular ones containing NXlog entries (i.e. timeseries data). Support for other NeXus files (e.g. NXdata entries) is perhaps less important at this stage.
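
For reference, a minimal h5py sketch of what reading an NXlog entry could look like; NXlog groups conventionally store the timeseries in time and value datasets (the file name and group path here are hypothetical):

import h5py
import pandas as pd

# hypothetical NeXus file and NXlog group path
with h5py.File("experiment.nxs", "r") as f:
    log = f["/entry/sample/temperature_log"]   # a group with NX_class == "NXlog"
    df = pd.DataFrame({
        "time": log["time"][()],    # time offsets, typically relative to a "start" attribute
        "value": log["value"][()],  # the logged quantity (assumed 1-D here)
    })

print(df.head())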

Implement table versioning

The save function should document the version of dgpost used to create the saved table. The load function should be version-aware. This is especially important for round-tripping saved pkl/json files, as a move to multiindex tables is planned for v2.0 (#2).
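
A rough sketch of the idea, with made-up metadata keys rather than dgpost's actual format: record the dgpost version alongside the table on save, and read it back on load so that a version-aware loader can branch on it.

import json
import pandas as pd
from importlib import metadata

DGPOST_VERSION = metadata.version("dgpost")

def save_table(df: pd.DataFrame, path: str) -> None:
    # store the table together with the version of dgpost that created it
    payload = {
        "meta": {"dgpost_version": DGPOST_VERSION},
        "table": df.to_dict(orient="list"),
    }
    with open(path, "w") as f:
        json.dump(payload, f)

def load_table(path: str) -> pd.DataFrame:
    with open(path) as f:
        payload = json.load(f)
    created_with = payload.get("meta", {}).get("dgpost_version", "unknown")
    # a version-aware loader could branch here, e.g. to convert pre-multiindex tables
    print(f"table created with dgpost {created_with}")
    return pd.DataFrame(payload["table"])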

Implement loading of other files than datagrams.

dgpost should be able to load files other than datagrams: especially pkl files containing a pandas.DataFrame (for chaining and round-tripping data), but also non-datagram json files, and perhaps csv and xlsx.
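
A hedged sketch of how such a loader might dispatch on the file extension; the function and parameter names are illustrative, not dgpost's actual load API:

import json
from pathlib import Path
import pandas as pd

def load_table(path: str) -> pd.DataFrame:
    # dispatch on the file suffix; the datagram branch is left out of this sketch
    suffix = Path(path).suffix.lower()
    if suffix == ".pkl":
        return pd.read_pickle(path)   # round-tripping a previously saved DataFrame
    elif suffix == ".json":
        with open(path) as f:
            return pd.DataFrame(json.load(f))
    elif suffix == ".csv":
        return pd.read_csv(path)
    elif suffix in {".xls", ".xlsx"}:
        return pd.read_excel(path)
    else:
        raise ValueError(f"unsupported file type: '{suffix}'")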
