
desidatamodel's Introduction

desidatamodel


Introduction

This product defines the Data Model for DESI in the doc/ subdirectory.

Adding a new file

When you add a new file, you also need to add it to a "toctree" (Table of Contents Tree) of one of the index.rst files at that level or higher.
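
For example, if a hypothetical doc/DESI_SPECTRO_REDUX/newfile.rst were added, the index.rst in that directory would need a toctree entry along these lines:

.. toctree::
   :maxdepth: 1

   newfile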

Full Documentation

Please visit desidatamodel on Read the Docs

License

desidatamodel is free software licensed under a 3-clause BSD-style license. For details see the LICENSE.rst file.

desidatamodel's People

Contributors

akremin, araichoor, ashleyjross, aureliocarnero, crockosi, dkirkby, duanyutong, dylanagreen, forero, geordie666, julienguy, moustakas, parfa30, rkehoe, schlafly, srheft, weaverba137, zkdtc


desidatamodel's Issues

review cross file datamodel consistency

Review the end-to-end dataflow for data model consistency. The same concept should have the same name if it appears in multiple files, e.g. EXTNAME=WAVE vs. WAVELENGTH, or SN vs SNR vs S_N. Identify mismatches and propose necessary changes.
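
A crude starting point, assuming the EXTNAME and column names have already been collected per file (names_by_file is a hypothetical dict), would be to flag near-duplicates automatically; borderline cases like WAVE vs. WAVELENGTH would still need manual review:

import difflib

def flag_similar_names(names_by_file):
    """Report names that differ only slightly between files.

    names_by_file is a hypothetical dict mapping a file path to the set of
    EXTNAME / column names found in that file.
    """
    all_names = sorted({n for names in names_by_file.values() for n in names})
    for name in all_names:
        close = difflib.get_close_matches(name, all_names, n=5, cutoff=0.8)
        close = [c for c in close if c != name]
        if close:
            print(f"{name} is suspiciously similar to {', '.join(close)}")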

Milky Way Survey outputs datamodel

Hi,

We currently have a working version of the datamodel for some parts of the MWS output: https://github.com/segasai/desidatamodel/tree/mwbranch
Currently I put our output files in DESI_SPECTRO_REDUX/SPECPROD/spectra-NSIDE/PIXGROUP/PIXNUM/, but presumably we want to place them in some other folder like DESI_MWS_REDUX.
What would be the steps necessary to start merging our branch with the main desidatamodel? At least all the Travis tests are passing. (I also assume that we do want our datamodel merged.)

Sergey

Also CC to @callendeprieto

collapsible sections for header keyword tables

It would be helpful if the datamodel could include optional collapsible sections. E.g. we inherit hundreds of keywords from raw data + fiberassign, and it would be nice to show those only upon request, so that the webpage can focus on the data portions of the files without scrolling through pages and pages of keywords first.

Add surveysim files to datamodel

The files owned by surveysim that probably need documented data models are:

  • Survey per-tile and per-night statistics written by surveysim.stats.SurveyStatistics.save
  • Simulated exposure metadata written by surveysim.exposures.ExposureList.save

(copied from desihub/surveysim#60)

Can we remove "brick" models?

There are still references to brick-based reduction files, especially in the DESI_SPECTRO_REDUX section. Can these be removed? In general, what brick-based models need to be retained, and what can be removed?

Review datamodel for 17.12 consistency

Update datamodel as needed to match the results of software release 17.12 in /project/projectdirs/desi/datachallenge/reference_runs/17.12.

In some cases where they differ, the datamodel may reflect what we really want. Make sure that the datamodel is clear that this isn't yet implemented, and make sure that there is a ticket for getting the code to implement the desired datamodel.

update datamodel for software release 18.11

Update the data model for software release 18.11 based on files in /project/projectdirs/desi/datachallenge/reference_runs/18.11/

Lots of formats changed, in particular:

  • target catalogs
  • fiber assign output
  • FIBERMAP HDU of fibermap, frame, cframe, and spectra files

Add desisurvey files to datamodel

The files owned by desisurvey that probably need documented data models are:

  • Ephemerides written by desisurvey.ephem.Ephemerides
  • Design HAs written by desisurvey.scripts.surveyinit.calculate_initial_plan
  • Afternoon planning state written by desisurvey.plan.Planner.save
  • Tile scheduler state written by desisurvey.scheduler.Scheduler.save

(copied from desihub/desisurvey#91)

Update Travis configuration and other infrastructure.

  • Merge #23 and clean up old branches.
  • Update the Travis configuration to roughly match that of desiutil.
  • Make sure that documentation builds with --warning-is-error.
  • Update desiutil version (in requirements.txt).
  • Check dependencies needed to build on ReadTheDocs.

Data model for qso afterburner files

Please review the data model for the qso_mgii and qso_qn files in both healpix and tiles directories. The structural description of these files should be complete, but descriptions of the HDUs, header keywords and columns need work. If necessary it is possible for files to refer to other files, or for HDUs to refer to other HDUs, to reduce duplication of effort.

Quicklook to Nightwatch

How similar are Nightwatch outputs to Quicklook? That is, could we get away with renaming the current Quicklook directory and tweaking some of the files? Also, does this transition make #49 moot?

Data model for top-level exposures and tiles files

Please review the data model for the top-level exposures-SPECPROD and tiles-SPECPROD files. The structural description of these files should be complete, but descriptions of the HDUs, header keywords and columns need work.

Compare by key, not by list.

  • Compare HDUs by EXTNAME, not by order in the FITS or model file.
  • Compare HDU keywords by keyword, not by order.
  • Compare column names by column name? Can we at least specify the order of columns in a table?
  • This may also make it easier to support required and optional keywords. (A sketch of the by-key comparison follows this list.)
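
A minimal sketch of the by-key comparison, assuming the data model and the FITS file have already been reduced to dicts mapping EXTNAME to header keywords (that extraction step is not shown):

def compare_hdus(model_hdus, file_hdus):
    """Compare HDUs by EXTNAME and keywords by name, ignoring order.

    Both arguments are hypothetical dicts mapping EXTNAME to a dict of
    header keyword -> value.
    """
    for extname in sorted(set(model_hdus) | set(file_hdus)):
        if extname not in file_hdus:
            print(f"HDU {extname} in model but not in file")
            continue
        if extname not in model_hdus:
            print(f"HDU {extname} in file but not in model")
            continue
        model_keys = set(model_hdus[extname])
        file_keys = set(file_hdus[extname])
        for key in sorted(model_keys - file_keys):
            print(f"{extname}: keyword {key} in model but not in file")
        for key in sorted(file_keys - model_keys):
            print(f"{extname}: keyword {key} in file but not in model")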

Optional HDUs and table columns

Some files may have entire optional HDUs, and, in some situations, data tables may have missing/optional columns. We can add human-readable documentation describing situations and files where HDUs or columns may be absent. However, there also needs to be a machine-readable mechanism for marking optional items.

One suggestion: create an RST comment that is not rendered into the final document, but would nevertheless be machine readable. For example:

.. Optional Columns - COLUMNA, COLUMNB, COLUMNC
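
A minimal sketch of how such a comment could be parsed, assuming exactly the syntax shown above (the function name is hypothetical):

import re

OPTIONAL_RE = re.compile(r'^\.\.\s+Optional Columns\s*-\s*(?P<columns>.+)$')

def optional_columns(rst_path):
    """Return the set of column names flagged as optional in an RST model file."""
    optional = set()
    with open(rst_path) as f:
        for line in f:
            m = OPTIONAL_RE.match(line.strip())
            if m:
                optional.update(c.strip() for c in m.group('columns').split(','))
    return optional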

tool to check if data model is current

We originally dreamed of having the desidatamodel formats be both human and machine readable such that code could verify if the datamodel files were correct. We found it hard to make something that was human-friendly while also being strict and complete enough for computer parsing. And in the meantime our datamodel has gotten out of sync with our data files. I suggest a pragmatic middle ground:

Write a script to run on a production directory and verify that:

  • every fits file has a corresponding datamodel file
    • in the future this could be expanded to hdf5, yaml, etc. too
  • every HDU in the fits file has a description in the datamodel file (based on EXTNAME)
  • for binary tables, confirm that the columns in the file match the columns in the datamodel

Print a brief report for discrepancies, e.g.

blat/foo/bar/quat.fits missing from datamodel
blat/foo/bar/quiz.fits datamodel missing description of HDU EXTNAME=BIZBAT

Once a file of type X has been checked, all other files of that type could be skipped, i.e. we don't need to verify 300000 frame files all have the same format. I think all of our files have the form directory/prefix-*.*, i.e. you can parse and key off of "prefix" since we don't reuse the same prefix for different kinds of files in different locations.

Even keeping the files and datamodel in sync at that level would be very useful. Checking header keywords, float32 vs. float64, etc. could come later.

The interface might look something like

checkmodel /project/projectdirs/desi/spectro/redux/dc3 --root DESI_SPECTRO_REDUX/PRODNAME
checkmodel /project/projectdirs/desi/target --root DESI_TARGET

I hope this is viable. Actually trying to implement it would reveal any gotchas.
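
A rough sketch of the proposed scan, keying off the file-name prefix as described above (find_model_file and model_extnames are hypothetical helpers that map a prefix to its .rst model and extract the documented EXTNAMEs; real path handling would need more care):

import os
from astropy.io import fits

def check_production(prod_dir, model_dir):
    """Walk a production directory and compare each new file type to its data model."""
    seen_prefixes = set()
    for dirpath, _, filenames in os.walk(prod_dir):
        for filename in filenames:
            if not filename.endswith('.fits'):
                continue
            prefix = filename.split('-')[0]
            if prefix in seen_prefixes:
                # One check per file type: no need to verify 300000 frame files.
                continue
            seen_prefixes.add(prefix)
            filepath = os.path.join(dirpath, filename)
            model = find_model_file(model_dir, prefix)      # hypothetical lookup
            if model is None:
                print(f"{filepath} missing from datamodel")
                continue
            with fits.open(filepath) as hdus:
                extnames = {hdu.header.get('EXTNAME') for hdu in hdus}
            for extname in sorted(extnames - model_extnames(model), key=str):  # hypothetical parse
                print(f"{filepath} datamodel missing description of HDU EXTNAME={extname}")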

PROTODESI_DATA files

I noticed a few problems with the PROTODESI_DATA files that should be addressed before that branch is merged.

  1. The naming conventions for the various files have a three-digit exposure ID and a five-digit MJD. Are we absolutely certain that protoDESI will never produce more than 999 exposures, or is the exposure ID reset every day (which seems dangerous)? Also, the rest of the pipeline uses an eight-digit YYYYMMDD instead of MJD. Is it possible to use YYYYMMDD for consistency?
  2. The main DESI pipeline has largely abandoned mixed-case filenames in favor of all lowercase.
  3. The files describe FITS files that have binary tables in HDU0, which is not allowed. It appears there is some confusion about the difference between Header Keywords and Column Names.
  4. The index.rst file has incorrect links in the toctree section. The links should simply be e.g. pdFVC not pdFVC/index.

Human-readable directories and filenames from everest

We will likely need to expand the set of human-readable directory and file names. For reference, these are things like NIGHT and EXPID, which stand in for harder-to-read regular expressions. The existing set is here.

It looks like we no longer use NSIDE as of everest, but I'm more interested in what to call:

  1. (sv1|sv2|sv3|main) = SURVEY?
  2. (backup|bright|dark|other) = CONDITIONS?
  3. (cumulative|perexp|pernight) = TILETYPE?
  4. [14]x_depth = TILETYPE or does that have such a different data model that it requires a separate section?

To think about this another way, do we need this structure in the data model:

  • DESI_SPECTRO_REDUX
    • SPECPROD
      • tiles
        • 1x_depth
          • TILEID
            • SPECTROGRAPH
              • various files...
        • 4x_depth
          • TILEID
            • SPECTROGRAPH...
        • cumulative...
        • perexp...
        • pernight....

or can we get away with this:

  • DESI_SPECTRO_REDUX
    • SPECPROD
      • tiles
        • TILETYPE
          • TILEID
            • NIGHT
              • various files....

And similar question for the healpix directory.
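
Whatever names are chosen, the mapping could be as simple as a small table of regular expressions; the names below are just the candidates from this issue, not a decision:

import re

# Candidate human-readable names -> regular expressions (names are tentative).
PLACEHOLDERS = {
    'SURVEY': re.compile(r'(sv1|sv2|sv3|main)'),
    'CONDITIONS': re.compile(r'(backup|bright|dark|other)'),
    'TILETYPE': re.compile(r'(cumulative|perexp|pernight)'),
    # [14]x_depth may need its own name or a separate section; see point 4 above.
}

def name_for(directory):
    """Return the placeholder name matching a directory component, if any."""
    for name, pattern in PLACEHOLDERS.items():
        if pattern.fullmatch(directory):
            return name
    return None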

different visual style for header keyword tables vs. data tables

Currently both header keyword tables and data tables use the same CSS styling. If Sphinx can support it, it would be helpful for the UX to make them visually distinguishable, e.g. with different background colors, so that when scrolling you can quickly tell whether you are looking at a data table or a header keyword table.

HDU 0 EXTNAME policy

DESI FITS file EXTNAME requirements:

  • FITS extensions shall have an EXTNAME set
  • Pipeline code shall access FITS extensions by name not by number
  • User code should access FITS extensions by name not by number

i.e. we reserve the right to re-order or insert HDUs; accessing them by name won't break but accessing them by number might.

For discussion: a potential special case is blank HDU 0, where the "real" data are in HDU 1 (with an EXTNAME!) because they are a binary table that the FITS standard doesn't allow in HDU 0, e.g. the targets catalog. In formal FITS nomenclature I don't even think that HDU 0 is an "extension" but the standard doesn't forbid giving it an EXTNAME anyway (thankfully). I suggest our HDU 0 EXTNAME policy should be:

  1. if HDU 0 has blank data and no non-standard header keywords, it should not have an EXTNAME
  2. if HDU 0 has blank data but meaningful header keywords, HDU 0 should have EXTNAME=PRIMARY (note: "should", not "shall")
  3. if HDU 0 has meaningful data, it shall have an EXTNAME set that should be something more meaningful than "PRIMARY"

I suggest that check_model and unit tests enforce (3) (the only "shall") and make a best-effort check of (1) and (2), but if that gets too irritating or there are messy corner cases, don't worry about it.

Comments in PR #68 have a list of known files with HDU 0 that have no EXTNAME.
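
A best-effort sketch of what such a check might look like, assuming astropy is used to read the headers (the keyword ignore list below is illustrative only, not a definitive rule):

from astropy.io import fits

# Keywords that do not by themselves make HDU 0 "meaningful" (incomplete, illustrative).
IGNORED_KEYWORDS = {'SIMPLE', 'BITPIX', 'NAXIS', 'EXTEND', 'EXTNAME',
                    'COMMENT', 'HISTORY', 'CHECKSUM', 'DATASUM', ''}

def check_hdu0(filename):
    """Best-effort check of the proposed HDU 0 EXTNAME policy."""
    with fits.open(filename) as hdus:
        hdu0 = hdus[0]
        extname = hdu0.header.get('EXTNAME')
        has_data = hdu0.data is not None
        extra = set(hdu0.header.keys()) - IGNORED_KEYWORDS
        if has_data and extname is None:
            print(f"{filename}: HDU 0 has data but no EXTNAME (rule 3, shall)")
        elif not has_data and extra and extname != 'PRIMARY':
            print(f"{filename}: HDU 0 has keywords but EXTNAME != PRIMARY (rule 2, should)")
        elif not has_data and not extra and extname is not None:
            print(f"{filename}: blank HDU 0 does not need an EXTNAME (rule 1, should not)")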

targets.dat should really be a YAML file.

The data file data/targets/targets.dat should really be a YAML file. As far as I can tell it is already perfectly valid YAML.

This is especially important because the format of targets.dat is significantly different from all other *.dat files in that directory.
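
That is easy to confirm with a quick check, assuming PyYAML is installed and the path is relative to the package root:

import yaml

with open('data/targets/targets.dat') as f:
    targets = yaml.safe_load(f)  # raises yaml.YAMLError if the file is not valid YAML
print(type(targets))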

Update data model for tiles product

The data model path DESI_TARGET/fiberassign is outdated and should be updated:

  • The ultimate source for fiberassign files is DESI_SPECTRO_DATA/NIGHT/EXPID/fiberassign-EXPID.rst. After those files appear in the raw data, they are checked into the tiles product, so the fiberassign data model in that area can refer to the raw data version.
  • The equivalent path to the actual data tree is DESI_TARGET/tiles/TILES_VERSION/TILEXX/fiberassign-TILEID.rst. DESI_TARGET/tiles is simply an svn checkout of the full tiles product.
  • There may be other files in the tiles product to document.

Ensure EXTNAME is always set

It is always a good thing to set EXTNAME in FITS file HDUs, so that they can be referred to by name. However, we haven't been very careful about enforcing this. Therefore I propose:

  1. All data models should have an EXTNAME recorded for every HDU. Stern warning if not.
  2. When generating data models from FITS files, complain if EXTNAME is not set.
  3. When validating FITS files against the data model, complain if EXTNAME is not set in one or the other or both.

'electron' is not a valid unit in the FITS standard

I'm trying to make sure that all units defined in FITS files (BUNIT and TUNIT keywords) conform to the FITS standard. Unfortunately 'electron' is not valid.

See doc/DESI_SPECTRO_REDUX/SPECPROD/exposures/NIGHT/EXPID/calib-CAMERA-EXPID.rst for an example. Would adu be an acceptable substitute?
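
One way to check any BUNIT/TUNIT value against the FITS standard is astropy's unit parser; a minimal sketch:

from astropy import units as u

def is_fits_unit(value):
    """Return True if value is a legal unit string under the FITS standard."""
    try:
        u.Unit(value, format='fits')
    except ValueError:
        return False
    return True

print(is_fits_unit('electron'))  # False: not in the FITS standard
print(is_fits_unit('adu'))       # True with astropy's FITS parser (worth confirming against the standard)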

Remove imaging directory

The top level index.rst file explicitly says that the imaging data model is not included in this package, so there is no point in keeping the doc/imaging directory. Sphinx is complaining about not being able to find a link to this directory.

Allow data model to express sets of very similar HDUs

The DESI raw data files will have a set of 30 HDUs that are more-or-less identical except for the EXTNAME (B0, B1, B2, ..., R1, R2, ..., Z8, Z9). Allow all of the data model metadata to be expressed by a single HDU description, rather than repeating the same HDU description 30 times.

Note that this has nothing to do with the INHERIT keyword in raw data HDUs. Treat that keyword as an ordinary, though non-trivial, required HDU keyword.
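
One possible mechanism (a sketch, not a proposal for the actual implementation) is to let a single model entry carry an EXTNAME pattern that the checker expands:

import re

# Hypothetical: one model entry declares an EXTNAME pattern for the per-camera HDUs.
CAMERA_EXTNAME = re.compile(r'^[BRZ][0-9]$')

def matches_camera_hdu(extname):
    """True for the 30 per-camera HDUs B0..B9, R0..R9, Z0..Z9."""
    return bool(CAMERA_EXTNAME.match(extname))

print([c for c in ('B0', 'R5', 'Z9', 'PRIMARY') if matches_camera_hdu(c)])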

Deduplicate fiberassign/fibermap tables

Many data files contain fiberassign/fibermap table HDUs. To the extent possible, and leveraging the optional table and HDU cross-reference capability implemented in #102, reduce the number of separate descriptions of these tables.

list of scalars being tested

In desidatamodel/doc/QUICKLOOK/*.rst, we need information about which scalar is being tested against *_NORMAL_RANGE and *_WARN_RANGE when *_STATUS is created. This info was available in older versions of the datamodel.

data model issues to check

This is a meta ticket to capture a list of questions leftover from PR #78.

DESISURVEY_OUTPUT

  • No model files for exposures.fits or test-tiles.fits.

DESI_SPECTRO_DATA

  • The datachallenge files are not a useful comparison to the model.
  • See also comments on fibermap files below.

DESI_SPECTRO_REDUX

  • Many file types are not generated by the datachallenge.
  • I'm very worried that passing around huge fibermap/target tables is going to
    result in lots of inconsistencies between files as columns are added or
    removed.

DESI_SPECTRO_SIM

  • No comparison possible for pix and simpix files.
  • Only routine header keyword and column name cleanup are needed.
  • Fibermap files no longer have HDU2, TARGETS.
  • Simspec TRUTH table no longer has CONTAM_TARGET.
  • Why are NUMOBS_INIT and PRIORITY_INIT int64, when NUMOBS_MORE is int32?
    And in any case, why are we planning for 2**32 or 2**64 exposures?
  • There are a number of type changes in simspec, and some columns
    that used to have units no longer do. See the OBSCONDITIONS table especially.

DESI_TARGET

  • Is there any way to more accurately represent the "official" pipeline layout?
    For example, is the position of fiberassign under DESI_TARGET reflective of
    actual operations?
  • The fiberassign data model file is named tile-TILEID-FIELDNUM.rst but the files
    are named tile-TILEID.fits.
  • What happened to the standards files?
  • Why did skies.rst get named that? It's called sky.fits.
    Are these really meant to be the same? There are a lot of changes.
  • mtl files have no units on any column. Some of those columns have to have units.
    In fact, units are missing from basically every DESI_TARGET file.
  • mtl files have NUMOBS_INIT as int64, but it is int32 in other files.
  • targets files do not contain PRIORITY?
  • There are several minor differences among truth, sky, mtl that should be
    checked in detail.
  • Why are ALLMASK_[GRZ] columns float32? The description is pretty clear
    that they should be integers.
  • SEED keyword in truth table looks dubious. Why not an integer with a comment?
  • Layout of tile file FIBERASSIGN table is radically different from other
    files.
  • Does every HDU in tile have to have the same keywords?

String types not being written out with size.

Recent runs of generate_model are printing char stream instead of e.g. char[8] when describing string-valued columns in binary tables. This may be an artifact of the Numpy version used.
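
For reference, the width is recoverable from either the FITS TFORM or the numpy dtype; a sketch of the intended output (not the actual generate_model code):

import numpy as np

def char_type_from_tform(tform):
    """Return e.g. 'char[8]' for a FITS binary-table TFORM like '8A'."""
    return f"char[{int(tform.rstrip('A') or 1)}]"

def char_type_from_dtype(dtype):
    """Return e.g. 'char[8]' for a numpy string dtype like 'S8' or '<U8'."""
    width = dtype.itemsize // (4 if dtype.kind == 'U' else 1)
    return f"char[{width}]"

print(char_type_from_tform('8A'))             # char[8]
print(char_type_from_dtype(np.dtype('U8')))   # char[8]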

tilemeasure.rst file needs to be fixed

The file doc/DESI_TARGET/fiberassign/tilemeasure.rst was added on June 13 to master by @forero (I think...), but it causes Travis tests to fail with a couple of warnings/errors (see below). Please fix or move to a branch while in development!

% sphinx-build doc junk
[snip]
building [html]: targets for 109 source files that are out of date
updating environment: 109 added, 0 changed, 0 removed
reading sources... [100%] index                                                                               
/Users/ioannis/repos/desihub/desidatamodel/doc/DESI_TARGET/fiberassign/tilemeasure.rst:27: ERROR: Unknown target name: "hdu2".
looking for now-outdated files... none found
pickling environment... done
checking consistency... /Users/ioannis/repos/desihub/desidatamodel/doc/DESI_TARGET/fiberassign/tilemeasure.rst:: WARNING: document isn't included in any toctree
done
preparing documents... done
writing output... [100%] index                                                                                
generating indices... genindex py-modindex
highlighting module code... [100%] desidatamodel.check                                                        
writing additional pages... search
copying static files... done
copying extra files... done
dumping search index in English (code: en) ... done
dumping object inventory... done
build succeeded, 2 warnings.

Data model for calibration files in exposures and calibnight directories

Please review the descriptions in the data model for these files:

  • In DESI_SPECTRO_REDUX/SPECPROD/exposures/NIGHT/EXPID
    • cframe
    • exposure-qa
    • fiberflat
    • fiberflatexp
    • fit-psf
    • fit-psf-before-listed-fix
    • fit-psf-fixed-listed
    • fluxcalib
    • frame
    • psf
    • sframe
    • shifted-input-psf
    • sky
    • stdstars
  • In DESI_SPECTRO_REDUX/SPECPROD/preproc/NIGHT/EXPID
    • preproc
  • In DESI_SPECTRO_REDUX/SPECPROD/calibnight/NIGHT
    • biasnight
    • biasnighttest
    • fiberflatnight
    • psfnight

Note that Issue #106 concerns how to support the somewhat complex structure of the psfnight files in the desidatamodel automation, but is not related to the descriptions of HDUs, header keywords or columns within those files.

add desimodel files to datamodel

The desimodel files are lightly documented in the desimodel/doc directory and in DESI-0847, but we should add more detailed data model documentation to desidatamodel. This would be a new top-level DESIMODEL directory, at the same level as DESI_SPECTRO_REDUX, etc.
