
pycroscopy / SciFiReaders

Tools for extracting data and metadata from scientific data files

Home Page: https://pycroscopy.github.io/SciFiReaders/about.html

License: MIT License

Python 19.80% Jupyter Notebook 78.75% PowerShell 0.36% Shell 0.12% JavaScript 0.39% CSS 0.01% Roff 0.58%

SciFiReaders' Introduction

pycroscopy


pycroscopy is a Python package for generic (domain-agnostic) microscopy data analysis. More specialized or domain-specific analysis routines are contained in other packages within the pycroscopy ecosystem.

Please visit our homepage for more information and installation instructions.

If you use pycroscopy for research, we would appreciate it if you could cite our arXiv paper titled "USID and Pycroscopy - Open frameworks for storing and analyzing spectroscopic and imaging data".

SciFiReaders' People

Contributors

ahoust17, gduscher, marmdixit, nsulmol, ondrejdyck, rajgiriuw, ramav87, saimani5, slautin, smisra87, ssomnath, sumner-harris, utkarshp1161, ziatdinovmax


SciFiReaders' Issues

Add "Microscopy" as the outermost folder

ScopeReaders shouldn't necessarily be limited to microscopy (though the name partly suggests as much). We should add a directory called "Microscopy" above all the SPM, EM, etc. that we have. This will make room for other communities like mass spectrometry, etc.

Importantly though, should we consider renaming this package to something more general / accommodating?
How about ScienceReader, ExperimentReaders (potentially, this could be used for extracting information from simulations too), SciFiReaders (scientific file readers), or something better?

HyperSpy object translation

Functions to translate to and from HyperSpy's Signal objects to go into SciFiReaders (rather than sidpy). I realize now that we have two separate objectives in SciFiReaders:

  1. Readers for reading from files
  2. "Object Translators" (or give it another name) for going back and forth between in-memory objects.

How best do we organize the package then? Should we have a fork at the very top that splits between Readers and "Object Translators" or should we keep the existing modality-based organization and mix Readers with "Object Translators"? This is up for debate. I think it should be the former but could also work with the latter. I do not think it makes much sense to have a dedicated package just for "Object Translators"
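A rough sketch of what one of these "Object Translators" could look like (the function name and details below are hypothetical, assuming the standard HyperSpy and sidpy APIs):

import sidpy

def signal_to_sidpy(signal):
    # Hypothetical sketch: convert an in-memory HyperSpy Signal into a sidpy.Dataset
    dataset = sidpy.Dataset.from_array(signal.data, title=str(signal))
    # Carry the HyperSpy metadata tree across as a plain dictionary
    dataset.original_metadata = signal.metadata.as_dictionary()
    # A full translator would also copy the axis calibrations into sidpy.Dimension objects
    return dataset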

example data for arhdf5

@ramav87 can you / someone from CNMS please add an example data file for ARHDF5 in the data folder for testing and example usage?

Launch new version

I think we should launch a new version of the package soon with all the latest bug fixes and additions. We need to merge the half a dozen or so branches into master as appropriate before releasing the new version.

ARDFh5Reader does not recognize Fast Force Mapping datasets

Fast force mapping (FFM) datasets are not recognized by the ARDFh5Reader and therefore not read. This is because, unlike standard force mapping datasets, which have a list-of-lists structure, FFM datasets are saved as simple arrays. I suggest creating a new reader for FFM datasets.

Checks at the __init__ stage

It looks like many readers (aka translators) check the file extension at the __init__ stage and throw an error if the extension is not "the right one" (e.g., a DM3Reader will throw an error if the extension is not .dm3, etc.). This seems to defeat the purpose of having the ingestor (and in fact, will cause an error in the ingestor). My suggestion is to remove all the checks from the __init__ method in favor of the can_read method.
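A minimal sketch of the suggested change, using a hypothetical reader and assuming the sidpy.Reader base class can perform the extension check inside can_read:

import sidpy

class HypotheticalDM3Reader(sidpy.Reader):
    def __init__(self, file_path):
        # No extension check here, so a generic ingestor can instantiate every
        # Reader without triggering spurious errors
        super().__init__(file_path)

    def can_read(self):
        # Assumed pattern: defer the extension check to the base class
        return super().can_read(extension='dm3')

    def read(self):
        # Parse the file and return sidpy.Dataset object(s)
        raise NotImplementedError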

Why are metadata written as "original_metadata" and not "metadata"?

Is there a specific reason the metadata are not just saved as "metadata" attributes instead of "original_metadata"? I found that unintuitive and at odds with the sidpy documentation; here, it's only buried on an example page.

I'm not sure if there's a specific SidPy Dataset conflict that is the source of this decision, but I wanted to bring it up. It took some tracking down when using the Igor reader to find the metadata.

Otherwise, a suggestion might be to default the .metadata attribute in any given SciFiReaders reader to a string along the lines of: "metadata are saved as 'original_metadata'; this metadata attribute will be overwritten via pyNSID or pycroscopy utilities during processing."
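For anyone else tracking this down, here is a minimal sketch of where the metadata currently end up (assuming the Igor reader returns a dict of sidpy.Dataset objects):

import SciFiReaders as sr

datasets = sr.IgorIBWReader('/path_to_file/file.ibw').read()
for name, dset in datasets.items():
    print(name, dset.original_metadata)  # vendor metadata captured by the reader
    print(dset.metadata)                 # typically empty until populated during processing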

USID reader

We need to handle the case of reading USID data that does not contain N-dimensional representations. One option would be to read it in as sparse arrays, but this will need some work.

Is there any data and/or metadata standardization available?

One thing I have noticed while viewing datasets from different file formats in sidpy is that there does not appear to be any explicit standardization of the data or metadata. Particularly:

  1. Data follow the coordinate system definitions of their file formats, and are not standardized to a common origin and/or format.
  2. Metadata are stored as saved for their file format, with no standardization performed.

I wanted to (a) confirm this is the intention, and (b) ask whether, if standardized, there would be any interest in including this in SciFiReaders.

I have been reviewing metadata differences in topographical AFM images, and may end up creating a dictionary or 'translator' to allow analyzing metadata in a common format. I cannot guarantee I will finish it (and it would be limited only to topographical data for now), but I am still wondering whether there is explicit value in this.

My reasoning for doing so is simply that it would allow a common set of methods/modules for analyzing data from different devices (the laboratory where I am working has multiple AFM/SPM devices that save in different proprietary formats). I certainly see the value in using pycroscopy to read from these devices, and ideally I could completely abstract away the device used to save the data.

If I were to do such a thing, would it make sense to take advantage of the metadata attribute (rather than original_metadata) for this? I cannot seem to find the current purpose of this attribute right now (though I have not used these tools significantly yet).
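To make the idea concrete, here is a sketch of the kind of mapping I have in mind (all key and vendor names below are purely illustrative):

# Map vendor-specific original_metadata keys onto one common vocabulary so that
# downstream analysis code never needs to know which instrument wrote the file
COMMON_KEYS = {
    'scan_size_m':  {'vendor_a': 'ScanSize', 'vendor_b': 'Scan Size'},
    'scan_rate_hz': {'vendor_a': 'ScanRate', 'vendor_b': 'Scan Rate'},
}

def standardize_metadata(original_metadata, vendor):
    # Return a flat dict using the common key names defined above
    standardized = {}
    for common_key, vendor_keys in COMMON_KEYS.items():
        vendor_key = vendor_keys.get(vendor)
        if vendor_key is not None and vendor_key in original_metadata:
            standardized[common_key] = original_metadata[vendor_key]
    return standardized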

Example dataset for dm3 and nion

We need each reader to have at least one example dataset for use in documentation examples and tests. I don't see any examples for the DM3 and Nion readers in the data folder. Could you please add them?

Fix Omicron Asc Reader

The Omicron Asc Reader does not appear to read .asc files correctly. This needs further investigation.

.3ds reader and .asc file

There's a .3ds file in the data/ folder, but no reader associated with it. We need to make a reader for those files and, correspondingly, add an .asc file for the existing Asc reader.

Unable to read *.ibw files with IgorIBWReader

Hi there,

I am trying to read my *.ibw files with the SciFiReaders library.
This is my code:

import SciFiReaders as sr
reader = sr.IgorIBWReader('/path_to_file/file.ibw')
reader.read()

And this is the output:

----> 4 reader.read()

~/opt/anaconda3/envs/pycroscopy/lib/python3.7/site-packages/SciFiReaders/readers/microscopy/spm/afm/igor_ibw.py in read(self, verbose, parm_encoding)
45
46 # Load the ibw file first
---> 47 ibw_obj = bw.load(file_path)
48 ibw_wave = ibw_obj.get('wave')
49 parm_dict = self._read_parms(ibw_wave, parm_encoding)

AttributeError: 'NoneType' object has no attribute 'load'
file.ibw.zip

Thanks a lot in advance!

Port pyUSID.ImageTranslator to ScopeReaders.ImageReader

One question would be where this reader would live. Perhaps we need to make another sub-package called other or generic. I would prefer generic, since it prevents people from putting scientific Readers into something vague.

Changing Igor to Igor2

I recently found out (when trying to do the same thing) that Igor2 has supplanted the very old and no-longer-updated Igor, which throws errors with modern versions of NumPy.

I will fix this as it's trivial, but wanted to write this issue in part so that people Googling about Igor2 can find it.

converters as Reader?

I don't think converters (e.g., HyperSpy) can be treated as Readers, since Readers expect a file path. We should either relax the requirement that a Reader only accept a valid file path or move the converters out of the readers directory. Thoughts?

Basic documentation

We need a few pages of basic documentation talking about the scope of this repository, pointing people to the two main examples, etc.

Make some more tests

Code coverage is limited, so more tests are needed for the AFM side of things.

No popup for SciFiReaders

Hi

I used the following code, but it could not pop up an interactive window in Spyder. Do you know how to fix this?

%pylab notebook
%gui qt
import os
import sys
from sidpy.io.interface_utils import openfile_dialog_QT
sys.path.append('../')
from SciFiReaders import IgorIBWReader
import SciFiReaders
print('SciFiReaders version: ', SciFiReaders.__version__)
import sidpy
print('sidpy version: ', sidpy.__version__)

Migrate Translators in Pycroscopy to ScopeReaders

Once issue #1 is complete, we will have an example showing users how to put together sidpy.Reader classes.
Contributors could use this example as a basis for converting Translator classes in pycroscopy.io.translators to sidpy.Reader classes. The Reader class is almost identical to the Translator class.

The majority of the effort would be in converting the file-writing portions of the code to populate sidpy.Dataset objects and return these objects instead.

Note - Do not migrate CNMS-specific Band Excitation or General Mode Translators to ScopeReaders. These have already been copied over to BGlib and are likely better off remaining as Translators due to their complexity.

Connecting Readers to Data Files & Examples

There are currently around 5 data files in data/ and more than 5 readers. It is currently not obvious which reader goes with which file. For each reader, there should be some example script (probably in examples/) and some example data file (probably in data/).

Add gwyddion reader

We need to support Gwyddion files; there is an old reader on a legacy branch in px that can be ported for use here.

remove can_read from Reader

The can_read() function in sidpy.Reader seems superfluous, tedious, and unnecessary. Whether or not a Reader can read a file can easily be determined by whether __init__ throws an exception: if __init__ did not throw an exception, the Reader is capable of reading that file. This will mean that we may need to restructure the code a tiny bit in SciFiReaders first, then mark can_read() in sidpy as deprecated with a FutureWarning.

Thoughts?
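A minimal sketch of the idea, treating a clean __init__ as "this Reader can read the file" (the helper name is hypothetical):

def reader_can_handle(reader_class, file_path):
    try:
        reader_class(file_path)
    except Exception:
        return False
    return True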

Example on how to put together a Reader class

We need to show a quick and simple example on how to develop a Reader class to extract data and metadata from proprietary instrument data files.

We already have a tutorial for developing Translator classes in the pyUSID documentation. I suggest using the same example, but stopping before the second component (writing to disk). Instead, show users how to populate a sidpy.Dataset object with raw data and metadata.
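Until the tutorial exists, here is a rough sketch of the shape such an example could take (the file format, attribute values, and class name are hypothetical, and the base class is assumed to store the path as self._input_file_path):

import numpy as np
import sidpy

class MyInstrumentReader(sidpy.Reader):
    # Hypothetical Reader for a plain-text instrument format

    def read(self):
        # 1. Extract the raw data from the proprietary file
        raw = np.loadtxt(self._input_file_path)

        # 2. Wrap it in a sidpy.Dataset and describe it
        dataset = sidpy.Dataset.from_array(raw, title='Raw_Data')
        dataset.data_type = 'image'
        dataset.quantity = 'intensity'
        dataset.units = 'counts'

        # 3. Attach the vendor metadata
        dataset.original_metadata = {'instrument': 'hypothetical', 'source': self._input_file_path}
        return dataset

    def can_read(self):
        # Assumed pattern: report whether this Reader handles the file extension
        return super().can_read(extension='txt')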

NSID HDF5 reader

We need to move the existing Reader class in pyNSID into ScopeReaders. The one challenge is that we are not releasing new versions of ScopeReaders very frequently and are still in "collection" mode. Perhaps this can be done at a later time / whenever we release the next version of ScopeReaders.

Example on how one would use a Reader class

This example should show how a user would go about using a Reader class (a rough sketch follows the list below):

  1. First, pick the appropriate Reader class. We could replace this manual step with a read() function that would automatically find which Reader to use
  2. Then, pass the file to instantiate the Reader object
  3. Call the my_reader.read() function to get back one or more sidpy.Dataset objects
  4. Show the user that this sidpy.Dataset object has the following:
    1. Raw data - by visualizing the data
    2. Metadata
  5. Show how the user could optionally use pyNSID or pyUSID to write the sidpy.Dataset object(s) to HDF5 if they wanted to exchange information with other researchers.
    1. Show the tree structure within the HDF5 file, indicating where the data have been written. No need to visualize again.
    2. Show that the metadata, too, have been safely captured
    3. Show that any ancillary dataset(s) have also been written to HDF5
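A minimal sketch of the flow above (the file name is hypothetical, and the sketch assumes the DM3 reader returns a single sidpy.Dataset):

import SciFiReaders as sr

# Steps 1-3: pick the Reader, hand it the file, and call read()
dataset = sr.DM3Reader('/path/to/example.dm3').read()

# Step 4: the sidpy.Dataset carries both the raw data and the metadata
dataset.plot()                    # visualize the raw data
print(dataset.original_metadata)  # metadata captured by the Reader

# Step 5 would then hand the Dataset to pyNSID or pyUSID for writing to HDF5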

Problem reading multiple files using NSIDReader

Hello, I am able to successfully read one hdf5 file using NSIDReader and store it in a variable.
However, when I read another file and store it in another variable, somehow the first variable's data is also replaced by data from the new file. Could you please help me with this?

example:
dset1 = NSIDReader("data1.h5").read()
--> dset1 consists of data from data1.h5
dset2 = NSIDReader("data2.h5").read()
--> now both dset1 and dset2 consist of data from data2.h5.

How do I prevent dset1 from being overwritten?

I have tried the NSIDReader from both SciFiReaders and pyNSID modules, with the same result.

Central / automatic read function

We need a simple read() function that can figure out which Reader to use when the user does not specify a specific Reader object. For this to be a reality, we need every Reader class to implement the can_read() function. In most cases this will be trivial: all one would need to do is call super() and provide the file extension that this Reader handles.

I have already written a version of this in pycroscopy here. We just need to copy the contents and update some links.
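For reference, the core of such a function could be as simple as the sketch below (the registry of Reader classes is hypothetical):

import SciFiReaders as sr

def read(file_path):
    # Hypothetical registry of every Reader class shipped with SciFiReaders
    for reader_class in sr.all_readers:
        reader = reader_class(file_path)
        if reader.can_read():
            return reader.read()
    raise NotImplementedError('No Reader available for {}'.format(file_path))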

Swap pywget for wget

pywget should work on Windows, Mac, and Linux, unlike wget (Linux and Mac only). We also need to move pywget out of the requirements and into the test requirements.
