
davidt3 / daxa

10 stars · 1 watcher · 0 forks · 6.62 MB

Democratising Archival X-ray Astronomy (DAXA) is an easy-to-use Python module for downloading multi-mission X-ray telescope data and processing it into usable archives. Users can acquire entire archives, or filter observations based on ID/positions/time. Full support for XMM; partial support for eROSITA, Chandra, NuSTAR, Swift, Suzaku, ASCA, ROSAT, and INTEGRAL.

License: BSD 3-Clause "New" or "Revised" License

Python 98.70% TeX 1.30%
astronomy astrophysics python x-ray-astronomy xmm chandra erosita xga archival-astronomy nustar

daxa's People

Contributors

davidt3, dependabot[bot], guptaagr, jessicapilling, tobywallage


daxa's Issues

Should design a DAXA-specific cleaning process at some point

This would be mission-agnostic, and ideally support any of the telescopes which DAXA ends up being able to reduce data for. This would be an alternative to the mission-specific methods I am implementing first (i.e. the SAS cleaning methods for XMM).

Process logging storage keys

Currently the logs, errors, processed errors, and warnings are stored in archives under either an ObsID or an ObsID+instrument+sub-exposure ID combo.

This is somewhat at odds with what the docstrings in the Archive class say, as they state either an ObsID or an ObsID+instrument key combo.

I should consider having lower-level instrument and then sub-exposure dictionaries to store the results/logs in, rather than ObsID+instrument+exposure ID keys. I intend to implement some sort of lookup method that can grab all results for an ObsID, or a specific ObsID+instrument combo, and that would probably be easier with more distinct layers of dictionaries.
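As a rough sketch (the key names and helper below are hypothetical, not DAXA's actual implementation), the nested layout might look like:

```python
# Hypothetical sketch of the nested layout proposed above, with distinct
# dictionary layers rather than compound ObsID+instrument+exposure keys.
process_logs = {
    '0001730401': {                      # ObsID layer
        'PN': {                          # instrument layer
            'S003': 'epchain log text'   # sub-exposure layer
        }
    }
}

def get_logs(logs: dict, obs_id: str, inst: str = None, exp_id: str = None):
    """Grab all logs for an ObsID, or narrow to an instrument/sub-exposure."""
    res = logs[obs_id]
    if inst is not None:
        res = res[inst]
        if exp_id is not None:
            res = res[exp_id]
    return res
```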

Add the basic structure to the documentation

Set up the general structure, with an installation guide, intro section, contact section, etc.

Don't need to make it perfect for this issue, or write any tutorials, but sketch out the framework.

Support the acquisition and reduction of proprietary data

What it says on the tin really. For XMM, for instance, you need to provide a login and password, and I'll also have to make sure that proprietary data belonging to a particular user are marked as usable in the fetch_obs_info method, as currently all proprietary observations are marked as unusable.

PN non-imaging mode sub-exposures

As mentioned in issue #34, DAXA cannot currently parse XMM ODF summary files. As such, it is difficult to efficiently identify which exposures are in which observing mode, and other details about them.

Currently only PN imaging-mode data will be processed by DAXA (though hopefully that will change at some point). However, as issue #34 is not implemented, and I don't want to be reading headers for thousands of FITS files if I can avoid it, epchain will currently attempt to process every sub-exposure as an imaging-mode observation.

This will cause an error for other data modes (timing, for instance).

At the moment I am just going to let them fail, and then catch them further down the line, rather than identifying them a priori and never running the commands in the first place.
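If header checks ever do become worthwhile for a subset of files, a minimal sketch (not DAXA code) might look like the following, assuming the standard SUBMODE header keyword found in processed EPIC event lists:

```python
# A minimal sketch of checking the observing mode of an EPIC event list via
# its SUBMODE header keyword, rather than letting epchain fail. PN imaging
# submodes all start with 'Prime' (e.g. 'PrimeFullWindow'); timing/burst
# modes are 'FastTiming'/'FastBurst'.
from astropy.io import fits

def is_imaging_mode(evt_path: str) -> bool:
    with fits.open(evt_path) as hdul:
        submode = hdul[0].header.get('SUBMODE', '')
    return submode.startswith('Prime')
```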

XMM scheduled and unscheduled observations

I want to ensure that any unscheduled (with U in their exposure identifier rather than S) PN observations are processed by epchain, but it's not clear to me whether that is true by default.

You can set the 'schedule' flag in epchain to S or U, but it only triggers if odfaccess=odf rather than oal, which is not explained...

To be honest, exactly what an 'unscheduled' observation is isn't really explained either.
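For reference, a sketch of how both invocations would be shelled out from Python; the schedule and odfaccess parameters are the ones named above, and the behaviour is untested, so verify against the epchain documentation:

```python
# Sketch of running epchain for scheduled and unscheduled PN exposures.
# Per the note above, the schedule flag reportedly only takes effect when
# odfaccess=odf. Requires a configured SAS environment to actually run.
import subprocess

for sched_flag in ('S', 'U'):
    subprocess.run(f"epchain odfaccess=odf schedule={sched_flag}",
                   shell=True, check=True)
```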

Setup convenience functions to easily set up Archives in particular circumstances

For example, the simplest could be 'process all available observations from XMM', or 'process all available observations from XMM, Chandra, and eROSITA' (once support for other telescopes is added).

That would provide an archive instance which could be passed into processing functions, both the telescope-specific processing which is generally provided by a particular telescope's software suite, and the planned mission-agnostic processing that I will eventually add to DAXA (issue #17).

Other examples of convenience functions like this could be ones that would assemble an archive from multiple telescopes for observations relevant to some particular sources.
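A hypothetical sketch of the simplest such convenience function (the function name is illustrative, and the Archive constructor call is assumed from the descriptions in these issues, not a confirmed signature):

```python
# Hypothetical convenience function: build an Archive containing every
# available observation from a mission. Names here are illustrative only.
from daxa.mission import XMMPointed
from daxa.archive import Archive

def full_xmm_archive(archive_name: str) -> Archive:
    xm = XMMPointed()   # no filtering applied, so all observations remain
    xm.download()       # acquire the raw data for every observation
    return Archive(archive_name, [xm])  # constructor signature assumed
```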

Add previous-process awareness to XMM processing tasks

Need to ensure that things are run in the right order - for instance cif_build must be run before everything, odf_ingest must be run before basically everything else etc.

Currently I just rely on the user doing that, but that won't be a permanent state of affairs - I'll make use of the process_success property of Archive to a) check whether dependencies have been run, and b) whether they were successful.
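A rough sketch of that check, assuming process_success maps a process name to per-ObsID success flags (the actual structure may well differ):

```python
# Illustrative dependency check built on the process_success property
# mentioned above; the assumed structure is {process name: {ObsID: bool}}.
def check_dependencies(archive, required=('cif_build', 'odf_ingest')):
    for proc in required:
        success_map = archive.process_success.get(proc)
        if success_map is None:
            raise RuntimeError(f"{proc} has not been run for this archive.")
        failed = [obs for obs, ok in success_map.items() if not ok]
        if failed:
            raise RuntimeError(f"{proc} failed for ObsIDs: {failed}")
```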

Add combined sky-coverage calculation capabilities

This should both be able to assess how much of the sky is covered by a particular set of data, and produce coverage maps which can be stored alongside the processed datasets to allow for the identification of data relevant to a particular source.
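One way to sketch the coverage-fraction half of this (not DAXA code; it assumes simple circular fields of view and uses healpy to rasterise them):

```python
# Minimal sky-coverage sketch: rasterise circular FoVs onto a HEALPix grid
# and report the fraction of the sky covered.
import numpy as np
import healpy as hp

def coverage_fraction(ras_deg, decs_deg, fov_radius_deg, nside=1024):
    covered = np.zeros(hp.nside2npix(nside), dtype=bool)
    for ra, dec in zip(ras_deg, decs_deg):
        vec = hp.ang2vec(ra, dec, lonlat=True)
        covered[hp.query_disc(nside, vec, np.radians(fov_radius_deg))] = True
    return covered.mean()

# e.g. two overlapping XMM-like (~15 arcmin radius) pointings
print(coverage_fraction([150.0, 150.1], [2.2, 2.2], 0.25))
```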

Downloading specific instruments for XMM currently downloads everything, then deletes irrelevant data

I intended the downloading of specific instruments to minimise disk/bandwidth usage by not downloading data that a user considers irrelevant to their use case, or that can't (yet) be processed by DAXA. Unfortunately for XMM, downloading ODFs (observation data files) for specific instruments using the AIO URLs (and thus the AstroQuery interface) is currently impossible: regardless of the specified instrument, all instrument ODFs are downloaded.

This is happening on the XSA end, and I've sent in a ticket asking whether this is intended behaviour; however, whatever the answer ends up being, I have to deal with it for the time being. As such, the XMMPointed class download behaviour will acquire all instrument data for a given observation.

Then (assuming that this doesn't break any pre-built data processing tasks downstream) it will delete those ODF files which relate to instruments that have NOT been selected by the user.
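A sketch of that cleanup step, assuming the standard XMM ODF naming convention (RRRR_OBSID_II..., with the two-character instrument code after the revolution and ObsID fields); this is illustrative, not DAXA's actual code:

```python
# Delete ODF files for instruments the user did not select. Assumes the
# standard ODF filename pattern RRRR_OOOOOOOOOO_IIUEEE..., in which the
# instrument code (PN, M1, M2, R1, R2, OM) sits at characters 16-17.
import os

def prune_odfs(odf_dir: str, keep=('PN', 'M1', 'M2')):
    inst_codes = {'PN', 'M1', 'M2', 'R1', 'R2', 'OM'}
    for fname in os.listdir(odf_dir):
        code = fname[16:18]
        if code in inst_codes and code not in keep:
            os.remove(os.path.join(odf_dir, fname))
```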

The documentation is not building on RTD

I will add more information as I explore the issue, but every build of the DAXA documentation on read the docs has failed thus far.

I think it's a dependency versioning problem.

Add an Archive class

Instances of the Archive class will be capable of storing and accessing multiple missions, and will probably be the most user-facing class of this module. They will contain a bunch of convenience methods, and probably the planned sky coverage generation capabilities (issue #18).
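The intended usage pattern might look something like this (a sketch based on the description above; the class names and constructor signature are assumptions, not a confirmed API):

```python
# Assumed usage of the planned Archive class with multiple missions; the
# eROSITACalPV mission class and the Archive signature are illustrative.
from daxa.mission import XMMPointed, eROSITACalPV
from daxa.archive import Archive

xm = XMMPointed()
er = eROSITACalPV()
# (each mission's data would be downloaded before archiving)
arch = Archive('my_archive', [xm, er])
```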

NOTE ON ExceptionGroup IN JUPYTER NOTEBOOK

Very limited parts of DAXA use a new Python feature (introduced in 3.11, backported by the exceptiongroup module) that allows me to raise a set of exceptions together.

Specifically, this is used when Python errors occur during the parallel tasks that run command-line SAS tools (and possibly other telescope-specific command-line tools in the future). To be clear, Python errors shouldn't happen in those parallelised tasks, but if they do, an ExceptionGroup is raised.

It seems that at the moment (this is true on my setup on the date this issue was created) Jupyter notebooks do not show the tracebacks properly for ExceptionGroup. For instance, in the notebook a test-raised ExceptionGroup gives this traceback:

ExceptionGroup: pythony errors (3 sub-exceptions)

Whereas in a script run from terminal this is what you get (and should get):

  + Exception Group Traceback (most recent call last):
    | File "/Users/dt237/code/test_daxa/testo.py", line 12, in
    | success, errors, outs = cif_build(arch)
    | ^^^^^^^^^^^^^^^
    | File "/Users/dt237/code/DAXA/daxa/process/xmm/_common.py", line 209, in wrapper
    | raise ExceptionGroup("pythony errors", python_errors)
    | ExceptionGroup: pythony errors (3 sub-exceptions)
    +-+---------------- 1 ----------------
    | Traceback (most recent call last):
    | File "/opt/anaconda3/envs/daxa_dev/lib/python3.11/multiprocessing/pool.py", line 125, in worker
    | result = (True, func(*args, **kwds))
    | ^^^^^^^^^^^^^^^^^^^
    | File "/Users/dt237/code/DAXA/daxa/process/xmm/_common.py", line 89, in execute_cmd
    | print(boi)
    | ^^^
    | NameError: name 'boi' is not defined
    +---------------- 2 ----------------
    | Traceback (most recent call last):
    | File "/opt/anaconda3/envs/daxa_dev/lib/python3.11/multiprocessing/pool.py", line 125, in worker
    | result = (True, func(*args, **kwds))
    | ^^^^^^^^^^^^^^^^^^^
    | File "/Users/dt237/code/DAXA/daxa/process/xmm/_common.py", line 89, in execute_cmd
    | print(boi)
    | ^^^
    | NameError: name 'boi' is not defined
    +---------------- 3 ----------------
    | Traceback (most recent call last):
    | File "/opt/anaconda3/envs/daxa_dev/lib/python3.11/multiprocessing/pool.py", line 125, in worker
    | result = (True, func(*args, **kwds))
    | ^^^^^^^^^^^^^^^^^^^
    | File "/Users/dt237/code/DAXA/daxa/process/xmm/_common.py", line 89, in execute_cmd
    | print(boi)
    | ^^^
    | NameError: name 'boi' is not defined
    +------------------------------------

So just be aware of that!
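For anyone wanting to reproduce this, a minimal example of the raising pattern (with the backport import for older Pythons):

```python
# Minimal reproduction of the pattern: several worker errors are collected
# and raised together as one ExceptionGroup.
import sys

if sys.version_info < (3, 11):
    from exceptiongroup import ExceptionGroup  # pip install exceptiongroup

python_errors = [NameError("name 'boi' is not defined") for _ in range(3)]
raise ExceptionGroup("pythony errors", python_errors)
```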

Start work on the DAXA paper for JOSS

It won't need to be very long, and some liberties can be taken in terms of writing about features that don't exist yet, because this won't be going on arXiv until they do exist.

Add anomalous CCD state checking for MOS

I am basically following the eSAS guide at this point, but checking for CCDs in anomalous states is going to be a good idea.

This should enable filtering based on what states the user considers acceptable as well.

The choice of acceptable states should of course be recorded for the archive.

SAS v21's (upcoming, not yet released) eSAS implementation is quite different from previous versions

I am currently implementing an eSAS-based XMM processing method, and have accidentally found the eSAS v21 manual indexed on Google. It indicates that many of the eSAS tools have had their inputs changed considerably to better resemble normal SAS functions (i.e. they'll take arguments to point to specific event lists, etc.).

This is obviously great (more control is better), but it does mean that there will be a significant difference in behaviour. As I do not want to lock people into one specific version of SAS if it can be avoided (especially considering that version isn't even out yet), I will have to build two different approaches (though within the same Python function): one for SAS v21 and one for any lower SAS version (though I don't think I will allow any SAS version below v14).

Hopefully it won't be too difficult, considering I already identify the installed SAS version in the find_sas function; it'll just be some extra work.
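A sketch of the branching logic, using the version detection mentioned above (find_sas is the existing DAXA function; the comparison code here is illustrative, not DAXA's):

```python
# Illustrative version branching: pick an eSAS calling convention based on
# the detected SAS version, refusing anything below v14 as stated above.
from packaging.version import Version

def esas_interface(sas_version: str) -> str:
    if Version(sas_version) < Version("14.0"):
        raise ValueError("SAS versions below v14 are not supported.")
    if Version(sas_version) >= Version("21.0"):
        return "v21 argument style"
    return "legacy script style"

print(esas_interface("20.0"))  # -> legacy script style
```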

Small-window mode PN processing errors

When running epchain on small-window mode data without some extra configuration, it will throw errors when it finds that most of the CCD IME files are missing (small-window mode just uses one CCD). These errors aren't fatal to the epchain process, but they do contaminate the stderr output, which DAXA parses to try to find any truly fatal errors.

As such, we should identify which CCDs are available a priori and pass that list to the ccds parameter of epchain. Ideally this will eventually be done by parsing the SAS summary file (issue #34), but for now I think I can just search through files in the ODF directory.
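A sketch of that file search, assuming the ODF naming convention in which PN imaging event files carry the CCD number just before the 'IME' type code (e.g. ..._PNS00304IME.FIT); the space-separated format for the ccds parameter is also an assumption to verify:

```python
# Find which PN CCDs have imaging-mode (IME) event files in an ODF directory.
import os
import re

def available_pn_ccds(odf_dir: str, exp_id: str = 'S003') -> str:
    patt = re.compile(r'PN' + exp_id + r'(\d{2})IME')
    ccds = set()
    for fname in os.listdir(odf_dir):
        match = patt.search(fname)
        if match:
            ccds.add(int(match.group(1)))
    # Assumed space-separated format for epchain's ccds parameter
    return ' '.join(str(c) for c in sorted(ccds))
```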

XMM data weirdness

This isn't really a question to me, but just XMM in general.

On XSA the observation 0001730401 shows only RGS data being available - the quality report just indicates RGS as well, no EPIC data.

However, when I acquire the ODFs I find unscheduled PN exposures and scheduled MOS exposures - so what gives?

This is being left here mostly as a reminder to myself to try and solve this mystery.

Ensure that a new CCF is created if a different analysis date is used

In the case where CCFs already exist, but cif_build is run again with a different analysis date set, make sure they are overwritten. The date information should be stored somewhere as well.

This will be integrated into the backend database, I suspect, in some way that I have yet to figure out.

If a CCF is re-created, then presumably the reduction should be re-run to be completely valid?
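A simple sketch of the check (the marker file name 'ccf_date.txt' is a hypothetical stand-in for wherever the date ends up being recorded):

```python
# Decide whether cif_build needs re-running: rebuild if no CCF exists, or if
# the recorded analysis date differs from the requested one.
import os

def needs_new_ccf(ccf_dir: str, analysis_date: str) -> bool:
    marker = os.path.join(ccf_dir, 'ccf_date.txt')
    if not os.path.exists(os.path.join(ccf_dir, 'ccf.cif')):
        return True   # no CCF has been built yet
    if not os.path.exists(marker):
        return True   # no record of which date was used
    with open(marker) as stored:
        return stored.read().strip() != analysis_date  # date changed
```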

Implement a wrapper for the eSAS espfilter soft-proton filter function

Again following the example of the XMM eSAS manual, I will be using espfilter to find bad time intervals with high levels of soft proton flaring courtesy of the Sun.

In the currently released version of eSAS there is a script called PN-FILTER (and an equivalent MOS implementation) that calls espfilter, but the upcoming version of eSAS (per the unreleased manual I found) has removed it, so that eSAS adds functions rather than processing scripts to SAS.

I will be attempting to make DAXA compatible with as many versions of SAS/eSAS as possible (whilst remaining consistent) by not using PN-FILTER, and instead making an espfilter function for DAXA that supports both PN and MOS.

I should normalise how DAXA calls emchain and epchain as much as possible

Currently emchain will loop through all available sub-exposures, including unscheduled observations, without any extra intervention. As such the processing of an entire ObsID-MOSX set of data happens as one process.

As epchain has to have the sub-exposures manually specified, each sub-exposure of each observation is processed separately. As such it gets its own success/log/error entry in the Archive records - considerably more granular.

I think I should change emchain's behaviour in DAXA so it is more comparable to how epchain behaves. I can address separate sub-exposures by themselves in emchain (using the exposure argument) - this will also make it easier to check that a particular process for a particular sub-exposure worked when it comes to looking for anomalous CCD states in MOS observations. A sketch of the proposed loop is below.
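The exact emchain parameter names should be taken from the emchain documentation; 'instruments' and 'exposures' below are assumptions:

```python
# Sketch of the proposed per-sub-exposure emchain loop, mirroring how DAXA
# already drives epchain. Requires a configured SAS environment to run, and
# the parameter names are assumed, not verified.
import subprocess

for exp_id in ('S001', 'S002', 'U002'):  # sub-exposures found for this ObsID
    subprocess.run(f"emchain instruments=M1 exposures={exp_id}",
                   shell=True, check=True)
```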

cleaned_evt_lists fails for 0099280101 because of emanom and calclosed

Currently no checks are performed at any stage to identify what the filter value of a particular sub-exposure of an observation is, and as such everything is blindly thrown into emanom (if the user chooses to run it). This method will fail for any CalClosed filter data, which then carries through to cleaned_evt_lists, because DAXA tries to create cleaned versions of those event lists as well and expects there to be an emanom log file, even though CalClosed data are not useful observation data for us.
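A sketch of the missing check (the FILTER keyword is standard in XMM event list headers, with 'CalClosed' indicating calibration data; this is illustrative, not DAXA's code):

```python
# Skip CalClosed sub-exposures before they reach emanom / cleaned_evt_lists,
# by reading the standard FILTER keyword from an EPIC event list header.
from astropy.io import fits

def science_filter(evt_path: str) -> bool:
    with fits.open(evt_path) as hdul:
        filt = str(hdul[0].header.get('FILTER', ''))
    return filt.lower() != 'calclosed'
```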

Failed to find or open the following file: (ffopen) toto.in.mos[1]

This is happening during emchain runs, and at another point in the stderr output there is 'sh: lcurve: command not found'.

I suspect the two might be connected.

lcurve is part of the Xronos section of HEASoft, which I may not have selected for my laptop install of HEASoft. This could help me learn which parts of HEASoft are actually required for SAS to work in its entirety.

My ICER install of HEASoft is the whole thing, so I can test running emchain there to see whether the same problem pops up.
