desispec's Issues

spectro pipeline 2016a software release

For the spectro pipeline software release:

  • make fresh tags of all packages needed
  • install on both Edison and Cori
  • generate arc + flat + >2 exposures of dark data and >2 exposures of bright data
  • run pipeline
  • document how all that was done
  • update desiModules to have a version representing this set of tags

get docs + docstrings working for sphinx

Related to issue #17 -- get sphinx documentation working for desispec. Add boilerplate stuff as needed to get "python setup.py build_sphinx" to work, or some equivalent solution. When a local build of the documentation works, close this ticket and start a new one for streamlining documentation generation onto github pages, readthedocs, or similar.
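
A minimal sketch of one such approach, wiring Sphinx's own setup_command hook into setup.py (the doc/ directory layout here is an assumption, not the actual desispec layout):

from setuptools import setup
from sphinx.setup_command import BuildDoc

setup(
    name='desispec',
    cmdclass={'build_sphinx': BuildDoc},
    command_options={'build_sphinx': {
        'source_dir': ('setup.py', 'doc'),
        'build_dir': ('setup.py', 'doc/build'),
    }},
)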

Write a fast raw to redshifts functional test

Write a fast raw to redshifts functional test, e.g. like the dogwood production but on only 3-5 fibers.

Wrap that in a cronjob script that updates a "master" branch installation of all the code, fetches input test data, runs the test, and reports if anything is broken. This should run every night.

Note: this could be integrated into a Travis test, but a basic version running as a cronjob at NERSC will be simpler to start with. Travis might be more appropriate for our lower level unit tests.

streamline io.read_* outputs

Most of the desispec.io.read_* functions return an inconveniently long tuple of outputs. Using them interactively almost always requires looking up the documentation to figure out which outputs are which, e.g.

flux, ivar, wave, resolution_data, header = io.read_frame(filename)

Update these to return lightweight classes instead that contain the equivalent member variables, e.g.

spectra = io.read_frame(filename)
plot(spectra.wave, spectra.flux[0])

My style preference is for these to be super-lightweight classes that are basically syntactic sugar compared to just returning a dictionary with keys of "wave", "flux", "ivar", etc. They could provide some minimal convenience functions, but the intention is that these are not containers for core algorithms. They should expose the raw numpy arrays just like the current io.read_* routines do.
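
A minimal sketch of the kind of container intended here (the class and attribute names are illustrative, not a final API):

class Spectra(object):
    """Lightweight bundle of read_frame outputs; syntactic sugar, no algorithms."""
    def __init__(self, wave, flux, ivar, resolution_data, header=None):
        self.wave = wave                        # 1D wavelength grid
        self.flux = flux                        # 2D [nspec, nwave] flux array
        self.ivar = ivar                        # inverse variance, same shape as flux
        self.resolution_data = resolution_data  # sparse resolution matrix data
        self.header = header                    # FITS header, if available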

This change will break a lot of code, so it might be best to wait for basic functional unit tests to be in place first. After making the change, if it breaks code that wasn't caught by a test, that should be added as a test too.

Implement classification & redshift fitting

Experimenting with opening an issue ahead of time for comment before I start coding it.

Implement classification and redshift fitting by writing a desispec wrapper for redmonster. The first version of this will only use the redmonster fast loglambda mode. Since coadds aren't in loglambda, I'll have to rebin the spectra on the fly. The current coadd code is a bit slow and memory hungry, so I plan to start with desispec.interpolation.resample_flux() even though that introduces covariance. After the redshift fitting itself is worked out, we can come back to coadded loglambda inputs.
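
A rough sketch of that on-the-fly rebinning, following the resample_flux (output grid, input grid, input flux) calling convention used elsewhere in these issues; the grid bounds and spacing are placeholders:

import numpy as np
from desispec.interpolation import resample_flux

wave = np.linspace(3600., 9800., 5000)  # toy coadd wavelength grid
flux = np.ones(wave.shape)              # toy coadd spectrum
loglam = np.arange(np.log10(3600.), np.log10(9800.), 1e-4)
zflux = resample_flux(10**loglam, wave, flux)  # loglambda input for redmonster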

document conda installation for spectro pipeline 2016a

Document conda installation procedure for all code needed for spectro pipeline 2016a, starting from something like

conda create -n desi2016a python=2.7 astropy=1.1.1 scipy=0.17.0

Include documentation about what environment variables need to be set and how to run the code on a laptop.

HARP+specex for PSF estimation may be the trickiest part of this. Note that only a subset of HARP needs to be installed for this, so make that clear.

cosmic ray rejection

Hi,
I am starting a branch for cosmic ray rejection.
This will contain py/desispec/cosmics.py and a compiled executable.
I am implementing the SDSS routine and testing new algorithms.

resample_flux edge effects

test_resample.py now includes a unit test flagged as expectedFailure because it identifies an edge-effect bug in resample_flux. Note how the last resampled bin is far lower than it apparently should be.

import numpy as np
import matplotlib.pyplot as plt
from desispec.interpolation import resample_flux

x = np.arange(0.0, 30)
y = np.sin(x/4) + 1
xx = np.linspace(1, 29, 13)
yy = resample_flux(xx, x, y)

plt.clf()
plt.plot(x, y, 'b.-')
plt.plot(xx, yy, 'r.-')

(Screenshot: the resampled points track the input curve except for the last bin, which is biased low.)

robustness of downloading a file that doesn't exist

If you try to desispec.io.download() a file that doesn't exist on the remote server, you get a KeyError: 'last-modified' exception, which doesn't make it obvious that the cause was a missing file on the server. Additionally, it leaves a local file behind containing HTML instead of data (the HTML says the file wasn't found, but at first glance it looks like the file was downloaded and an exception was raised anyway).

Suggested change:

  • attempting to download a file that doesn't exist should raise an IOError("File not found on remote server")
  • failing to download a file should clean up after itself and not leave local HTML files around with .fits extensions, etc.
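
A minimal sketch of both suggested behaviors, using only the standard library (the real desispec.io.download internals may differ):

import os
import shutil
import urllib.request
from urllib.error import HTTPError

def _download_one(url, localpath):
    """Download url to localpath; raise IOError on 404, leave no bogus files."""
    tmppath = localpath + '.tmp'
    try:
        with urllib.request.urlopen(url) as response:
            with open(tmppath, 'wb') as fx:
                shutil.copyfileobj(response, fx)
        os.rename(tmppath, localpath)  # only move into place on success
    except HTTPError as err:
        if os.path.exists(tmppath):
            os.remove(tmppath)  # clean up partial downloads
        if err.code == 404:
            raise IOError("File not found on remote server: " + url)
        raise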

desi_fit_stdstars.py should be flexible about what filters it needs

Issue #89, with flux calibration crashing, was due to desi_fit_stdstars.py not finding SDSS_R as a magnitude in the input standard stars. PR #90 fixes this for now, but it raises several to-do items:

  • if desi_fit_stdstars.py can't find a filter that it needs, it should raise an error and stop instead of writing a bogus file
  • we need unit tests on the underlying code
  • eventually the real data will have a mix of DECam, Bok, and Mosaic fluxes, and desi_fit_stdstars should work flexibly with what it can find. Exactly how those filters will be named will be determined by the Legacy Survey team, but this week at Argonne is a good chance to talk to them about their future plans for including these filters.

extraction bias due to estimated inverse variance being correlated with flux

With the current implementation of pixel level simulation and extraction, output fluxes are biased because of a correlation between inverse variance (ivar) and fluxes in CCD images due to Poisson noise.

  • new field in image datamodel?
    One way to fix this is to weight pixels according to readout noise only, but this means tracking the readout noise scaled by the pixel-level flat field along with the inverse variance. This would be a change in the data model of images (we cannot drop the ivar field entirely because we still need it for chi2 evaluation).
    • evaluate loss of optimality
      Some work is needed to evaluate whether we can afford the loss of optimality with this simple readout-noise weighting. It would be interesting to run two extractions of the same simulated images to test this, even if an analytical formula exists.

This issue is also posted in specter where the actual extraction is performed.

io.findfile should raise exception for missing inputs

io.findfile() should raise an exception if one of the inputs required for creating the file path is missing. e.g. desispec.io.findfile('frame', night='20150211', expid=2) is missing the camera input, but it still returns a path to frame-None-00000002.fits (note the bogus "None").

Unfortunately simply changing {camera} to {camera:s} isn't sufficient since that will still parse a None input. Also note that some file types need camera and others don't, so requiring that camera be set also won't work.

Options that occur to me:

  • Change {camera} to {camera:s} and change default input camera=None to camera=0 so that for the filetypes where camera is required, it must be set or it otherwise won't parse. Ditto for specprod, night, and the misnamed brickid (should be brickname, string not int). The downside is that the default value is of a different type than what you need to supply if you are filling in that variable.
    • Variant: Leave defaults as None, convert to 0 prior to parsing. Too fancy?
  • Add another dictionary defining what inputs are required for each filetype and ensure that they aren't None; seems fragile to maintain.
  • Parse the location template strings and check for None inputs; messy (a sketch of this option follows the list).
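
A minimal sketch of that third option, using string.Formatter to discover which fields a given template actually uses (the template and inputs below are illustrative):

import string

def check_inputs(location_template, inputs):
    """Raise ValueError if any field the template needs is None or missing."""
    for _, field, _, _ in string.Formatter().parse(location_template):
        if field is not None and inputs.get(field) is None:
            raise ValueError("Missing required input: {}".format(field))

check_inputs('{night}/frame-{camera}-{expid:08d}.fits',
             dict(night='20150211', camera=None, expid=2))  # raises ValueError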

automate spectro pipeline

Automate spectro pipeline:

  • write code to determine what commands should be run in what order to process a night's worth of data
  • generate slurm scripts to process those commands in parallel
  • OK to have this as fairly NERSC / slurm specific for now

This will use mpi4py for parallelism, but that should be an optional dependency; it is only needed for the parallel running itself, not for the planning. If mpi4py isn't available, a serial processing of the commands should still be available.
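
A sketch of the optional-dependency pattern, assuming the planned commands are shell strings (run_commands is a hypothetical name, not part of the automate branch):

import subprocess

try:
    from mpi4py import MPI
    comm = MPI.COMM_WORLD
except ImportError:
    comm = None  # mpi4py not installed; fall back to serial processing

def run_commands(commands, comm=None):
    """Run every command, round-robin over MPI ranks when available."""
    if comm is None:
        mycommands = commands
    else:
        mycommands = commands[comm.rank::comm.size]
    for cmd in mycommands:
        subprocess.call(cmd, shell=True)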

@tskisner is working on this in the automate branch; this issue is part of collecting tasks to do for the spectro pipeline software release.

simplify resolution matrix input

io.read_frame() currently returns the sparse resolution matrix data, requiring the user to convert this into an actual matrix for use (via resolution.Resolution(), which replaces the deprecated io.frame.resolution_data_to_sparse_matrix()). Some codes even do this multiple times in different places for the same fiber.

Should io.read_frame() just directly convert the 3D resolution matrix data[fiber, diag, wave] into a list of Resolution objects since that is what the user needs in the end? Is there a reason to keep it as the 3D data?
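
For reference, this is the conversion every caller currently repeats and that read_frame() could do once per fiber (the toy Rdata array stands in for the 3D data from the file):

import numpy as np
from desispec.resolution import Resolution

Rdata = np.zeros((2, 11, 50))  # toy [nspec, ndiag, nwave] diagonals data
Rdata[:, 5, :] = 1.0           # delta-function resolution on the central diagonal
R = [Resolution(Rdata[i]) for i in range(Rdata.shape[0])]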

test_fiberflat needs more tests

py/desispec/test/test_fiberflat.py includes two tests flagged as expected failures since they haven't been written yet. Fill these in. The "test_resolution" test that does exist can serve as a template for a non-trivial test (that found a non-trivial bug that has since been fixed).

#- Tests to implement.  Remove the "expectedFailure" line when ready.
@unittest.expectedFailure
def test_throughput(self):
    """
    Test that spectra with different throughputs but the same resolution
    result in the expected variations in fiberflat.
    """
    raise NotImplementedError

@unittest.expectedFailure
def test_throughput_resolution(self):
    """
    Test that spectra with different throughputs and different resolutions
    result in fiberflat variations that are only due to throughput.
    """
    raise NotImplementedError

Create `io.download()` function.

We need an io.download() function to handle downloads of DESI data from the main repository. It should:

  • Be aware of the structure of the DESI repository.
  • Be aware of files that have already been downloaded (i.e. have a cache).
  • Accept a single file or list of files.

Example function signature:

def download(filenames, baseurl='http://portal.nersc.gov/project/desi'):
    """Download files from DESI repository.
    """
    if isinstance(filenames, str):
        filenames = [filenames]
    # ...

update fibermap data model

Review fibermap datamodel compared to upstream steps. For any variables that have the same meaning, make them also have the same name and datatype. e.g. TARGETMASK0 -> DESI_TARGET, and add BGS_TARGET and MWS_TARGET.

include fibermap in frame and cframe files

The original fibermap is included in the raw data directory $DESI_SPECTRO_DATA/{night}/fibermap-{expid}.fits. Spectro pipeline outputs are written to a different directory structure under $DESI_SPECTRO_REDUX, making it somewhat of a pain to cross directory hierarchies to get back to the fibermap file.

To simplify this, include the fibermap as a table in the extracted frame files, and also propagate it into the cframe files. This replicates data, but it is similar to what BOSS does and to what we already do for the brick files for convenience. It makes the c/frame files more useful stand-alone, and makes the redux directory useful stand-alone after the spectral extraction step.

Probably include only the fibermap entries for the fibers in each frame, e.g. frame-b0* would have fibers 0-499, while frame-b1* would have 500-999, etc.

Include the fibermap in b*, r*, z* so that each can be handled independently, and to provide datamodel robustness for the case when one of the channels is unavailable.

Currently we use the desi-agnostic 'exspec' script in specter for the extractions. This should be wrapped by desispec to provide the desi-specific functionality of propagating the fibermap. This will require some script -> module refactoring of exspec.

Load file metadata into database

Note: the branch 'metadata-db' has been created to address this issue.

We want a metadata database to track the relationships among nights, exposures, bricks and files. The schema of such a database is described in the file etc/file_db.sql. We still need code to:

  • Register files at file creation time.
  • Load a set of files after creation (i.e. do the initial population of the database).
  • Encapsulate commonly-used queries.

Based on the schema in the file mentioned above, we know how to fill in everything except these columns:

  • exposure.telra, exposure.teldec. These are supposed to be in the pix*.fits files but appear to be missing.
  • exposure.tileid. This should also be in a header somewhere.

resolution module shouldn't hardcode the number of diagonals

The desispec.resolution module shouldn't hardcode the number of diagonals stored for the resolution matrix when it is constructed from the sparse diagonals data. The FITS file format keeps the same number of upper and lower diagonals, so the total number of diagonals is odd.

When constructing from a dense matrix, it does need to pick something for the number of diagonals to keep. Consider making this an optional argument of the constructor.
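
A rough sketch of how the constructor might decide, written as a hypothetical helper (infer_ndiag is not a real desispec function):

def infer_ndiag(data, ndiag=None):
    """Choose the number of diagonals to keep (hypothetical helper).

    Non-square [ndiag, nwave] diagonals data defines the count itself;
    a dense square matrix needs the caller to supply an odd ndiag.
    """
    if data.shape[0] != data.shape[1]:
        return data.shape[0]
    if ndiag is None or ndiag % 2 == 0:
        raise ValueError("dense input requires an odd ndiag")
    return ndiag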

spectra as 32-bit instead of 64-bit

Update io.write_frame() to write 32-bit floats, and update io.read_frame() to cast back to 64-bit double for internal calculations. This is a very simple and easily reproducible form of lossy data compression.
Related: We had a long discussion on the desi-data mailing list about the pros and cons of this vs. FPACK tile compression. FPACK can do somewhat better in terms of compression vs. precision lost, but the basic float vs. double compression seems "good enough" for now.

This is similar to what is already done for io.write_image() and io.read_image().
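
The cast is one line each way; a sketch of the intended round trip (array names are illustrative):

import numpy as np

flux64 = np.random.uniform(1.0, 2.0, size=100)  # internal 64-bit values
flux32 = flux64.astype(np.float32)              # what write_frame would put on disk
flux = flux32.astype(np.float64)                # what read_frame would hand back
assert np.allclose(flux, flux64, rtol=1e-6)     # ~7 significant digits survive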

fiberflat problems when both resolution and throughput vary

When the input to compute_fiberflat() varies by both throughput and resolution, then there are edge effects which cause the output fiberflat to not be ~1 * throughput. Attached is an example plot, and below is the code demonstrating this. If you only vary the resolution but not the throughput, it does not have this problem.

(Screenshot: fiberflat outputs fail to scale back to ~1 when both throughput and resolution vary.)

import numpy as np
import matplotlib.pyplot as plt

from desispec.resolution import Resolution
from desispec.frame import Frame
from desispec.fiberflat import FiberFlat
from desispec.fiberflat import compute_fiberflat, apply_fiberflat

#- Setup sine wave shaped fiber flats with differing throughput
nspec = 10
nwave = 100
wave = np.linspace(0, np.pi, nwave)
y = np.sin(wave)
flux = np.tile(y, nspec).reshape(nspec, nwave)
ivar = np.ones(flux.shape)
mask = np.zeros(flux.shape, dtype=int)

flux[1] *= 1.1
flux[2] *= 1.2
flux[3] /= 1.1
flux[4] /= 1.2

#- Setup resolution matrix data with a varying sigma
sigma = np.linspace(2, 10, nwave*nspec)
ndiag = 21
xx = np.linspace(-ndiag/2.0, +ndiag/2.0, ndiag)
Rdata = np.zeros( (nspec, len(xx), nwave) )
for i in range(nspec):
    for j in range(nwave):
        kernel = np.exp(-xx**2/(2*sigma[i*nwave+j]**2))
        kernel /= sum(kernel)
        Rdata[i,:,j] = kernel

#- Convolve with the resolution matrix
convflux = np.empty_like(flux)
for i in range(nspec):
    convflux[i] = Resolution(Rdata[i]).dot(flux[i])

#- Run the fiberflat code
frame = Frame(wave, convflux, ivar, mask, Rdata, spectrograph=0)
ff = compute_fiberflat(frame)

plt.plot(wave, ff.fiberflat[0])
plt.plot(wave, ff.fiberflat[1] / 1.1)
plt.plot(wave, ff.fiberflat[2] / 1.2)
plt.plot(wave, ff.fiberflat[3] * 1.1)
plt.plot(wave, ff.fiberflat[4] * 1.2)

plt.subplot(211)
for i in range(5):
    plt.plot(wave, convflux[i])

plt.subplot(212)
for i in range(5):
    plt.plot(wave, ff.fiberflat[i])
plt.ylabel('fiber flat')

for y in (1/1.2, 1/1.1, 1.0, 1.1, 1.2):
    plt.axhline(y, ls=':', color='0.5')

add checksums and code versions to output files

All io.write* functions should use the checksum=True option when writing FITS files to automatically add DATASUM and CHECKSUM keywords to the headers.

Add code versions to the headers. This may require some creative naming to keep within 8 characters.
Make this semi-optional: e.g., if desimodel is set up, include a VDSIMODEL keyword with its version; if it isn't set up, don't crash (writing a frame file doesn't actually require desimodel). Codes that might be considered for recording their version number:

  • desispec
  • desimodel
  • desisim
  • redmonster
  • specex
  • harp

Note: some (all?) of those may not actually provide a version. Fix that too if needed.
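
A sketch of the semi-optional lookup (get_version is a hypothetical helper; the header dict stands in for the FITS header being written):

def get_version(module_name):
    """Return a package's version string, None if not set up, 'unknown' if unversioned."""
    try:
        module = __import__(module_name)
    except ImportError:
        return None  # not set up; skip the keyword rather than crash
    return getattr(module, '__version__', 'unknown')

header = dict()  # stands in for the output FITS header
version = get_version('desimodel')
if version is not None:
    header['VDSIMODEL'] = version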

Implement co-addition

There are two stages:

  1. Co-add observations of the same object within a per-band [brz] brick file to write a coadd file with the same structure as the input brick file. This step preserves the input wavelength grid. How should we record the (NIGHT,EXPID) provenance of each co-add in the coadd file? Would we ever not use all the available exposures in the input brick files? What is the maximum number of possible exposures? (needed to set a FITS table array size).
  2. Combine co-adds of each band [brz] into a single spectrum, rebinned to an arbitrary (nominally log-lambda) wavelength grid. Does this need to repeat the HDU4 table info of the previous step?

Unless there are objections, the co-add outputs will go into:

  1. $DESI_SPECTRO_REDUX/$PRODNAME/bricks/{BRICKID}/coadd-{BAND}-{BRICKID}.fits
  2. $DESI_SPECTRO_REDUX/$PRODNAME/bricks/{BRICKID}/coadd-{BRICKID}.fits

flux calib tests failing on master

I just pushed a change to etc/cron_dailytest.sh to master, and the Travis test reported a failure of the flux calibration tests:

https://travis-ci.org/desihub/desispec/jobs/111545817

======================================================================
FAIL: test_match_templates (desispec.test.test_flux_calibration.TestFluxCalibration)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/build/desihub/desispec/py/desispec/test/test_flux_calibration.py", line 102, in test_match_templates
    self.assertTrue(np.all(bestid>-0.1)) # test if fitting is done, otherwise bestid=-1
AssertionError: False is not true

The tests pass on my laptop and are completely unrelated to the cronjob script update. The tests use numpy.random without fixing the seed, so it is possible that this is a rare random case, but we should still fix it.

High priority: debug and fix this.
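
A sketch of the likely fix, seeding the random state in the test setup so any failure is reproducible (the seed value is arbitrary):

import unittest
import numpy as np

class TestFluxCalibration(unittest.TestCase):
    def setUp(self):
        np.random.seed(1234)  # make random test inputs reproducible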

bootcalib : lamp lines for r,z camera, add configuration file for flexibility

In the current version of py/desispec/bootcalib.py, the lines used for the wavelength solution are hardcoded,
and they are implemented only for the 'b' camera.

  • Need to extend this to other cameras
  • A configuration file (YAML or JSON) with all the parameters stored would be useful, with a default configuration file in the repository.

I have found it useful to have a default config file automatically found within the package, without the use of environment variables. We can get the absolute path of the loaded module with

import desispec
import os.path
libdir = os.path.dirname(os.path.abspath(desispec.__file__))

and then find the data directory from there.

desi_zfind.py option to limit what classes are considered

Currently desi_zfind.py wraps redmonster and runs all templates on all objects. That can be overkill for testing, and it doesn't make sense to run QSO templates (slow!) on BGS and MWS targets that are super bright and almost certainly not QSOs. Maybe something like:

desi_zfind.py ... --templates galaxy,qso,star

brick files missing EXTNAME for fibermap HDU

This is a simple fix, but I'm opening an issue as a reminder to do it:

desi_make_bricks.py outputs the brick files using desispec.io.brick.Brick. These include a copy of the fibermap entries for these targets, but that HDU is missing the EXTNAME=FIBERMAP keyword.
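
The fix is likely a one-liner where the HDU is created (the toy fibermap array below stands in for the real table):

import numpy as np
from astropy.io import fits

fibermap = np.zeros(10, dtype=[('FIBER', 'i4')])   # toy fibermap table
hdu = fits.BinTableHDU(fibermap, name='FIBERMAP')  # sets EXTNAME=FIBERMAP
# or, for an already-built HDU:
hdu.header['EXTNAME'] = 'FIBERMAP'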

failing coadds

@dkirkby: The test-coadds branch introduces some basic unit tests of the coadditions using the coaddition.Spectrum class. These fail, choking on the self.Cinv += other.Cinv accumulation: apparently csc_matrix supports x + y but not x += y.

It is possible that I'm just misunderstanding how Spectrum should be used since I thought I previously had successfully run coadds, though I don't see how those could have worked given the current problem.

fix flux calibration crash

desi_compute_fluxcalibration.py is crashing; it appears that it is having trouble finding the input standard star templates. How templates are generated and read recently changed on the desisim side; this may be related.

This would be the top-priority ticket, except that it probably isn't needed for the Argonne workshop this week. Workshop needs get top priority, then this.

[edison ~] python -m pdb $DESISPEC/bin/desi_compute_fluxcalibration.py \
    --infile /scratch1/scratchdirs/sjbailey/desi/spectro/redux/dailytest/exposures/20151101/00000002/frame-b0-00000002.fits \
    --fibermap /scratch1/scratchdirs/sjbailey/desi/spectro/sim/dailytest/20151101/fibermap-00000002.fits \
    --fiberflat /scratch1/scratchdirs/sjbailey/desi/spectro/redux/dailytest/calib2d/20151101/fiberflat-b0-00000000.fits \
    --sky /scratch1/scratchdirs/sjbailey/desi/spectro/redux/dailytest/exposures/20151101/00000002/sky-b0-00000002.fits \
    --models /scratch1/scratchdirs/sjbailey/desi/spectro/redux/dailytest/exposures/20151101/00000002/stdstars-sp0-00000002.fits \
    --outfile /scratch1/scratchdirs/sjbailey/desi/spectro/redux/dailytest/exposures/20151101/00000002/fluxcalib-b0-00000002.fits
> /project/projectdirs/desi/software/edison/desispec/master/bin/desi_compute_fluxcalibration.py(9)<module>()
-> """
(Pdb) c
INFO:desi_compute_fluxcalibration.py:49:main: read frame
INFO:desi_compute_fluxcalibration.py:53:main: apply fiberflat
INFO:fiberflat.py:259:apply_fiberflat: starting
INFO:fiberflat.py:288:apply_fiberflat: done
INFO:desi_compute_fluxcalibration.py:60:main: subtract sky
INFO:sky.py:202:subtract_sky: starting
INFO:sky.py:214:subtract_sky: done
INFO:desi_compute_fluxcalibration.py:67:main: compute flux calibration
INFO:desi_compute_fluxcalibration.py:80:main: star fibers= [2]
INFO:fluxcalibration.py:212:compute_flux_calibration: starting
Traceback (most recent call last):
  File "/scratch1/scratchdirs/kisner/software/hpcports_shared_gnu/python-2.7.9_d371bbbe-7.1/lib/python2.7/pdb.py", line 1314, in main
    pdb._runscript(mainpyfile)
  File "/scratch1/scratchdirs/kisner/software/hpcports_shared_gnu/python-2.7.9_d371bbbe-7.1/lib/python2.7/pdb.py", line 1233, in _runscript
    self.run(statement)
  File "/scratch1/scratchdirs/kisner/software/hpcports_shared_gnu/python-2.7.9_d371bbbe-7.1/lib/python2.7/bdb.py", line 400, in run
    exec cmd in globals, locals
  File "<string>", line 1, in <module>
  File "/project/projectdirs/desi/software/edison/desispec/master/bin/desi_compute_fluxcalibration.py", line 9, in <module>
    """
  File "/project/projectdirs/desi/software/edison/desispec/master/bin/desi_compute_fluxcalibration.py", line 89, in main
    fluxcalib = compute_flux_calibration(frame, fibers, model_wave, model_flux)
  File "/project/projectdirs/desi/software/edison/desispec/master/py/desispec/fluxcalibration.py", line 225, in compute_flux_calibration
    model_flux[fiber]=resample_flux(stdstars.wave,input_model_wave,input_model_flux[fiber])
  File "/project/projectdirs/desi/software/edison/desispec/master/py/desispec/interpolation.py", line 130, in resample_flux
    return _unweighted_resample(xout, x, flux)
  File "/project/projectdirs/desi/software/edison/desispec/master/py/desispec/interpolation.py", line 195, in _unweighted_resample
    ty=np.interp(tx,ix,iy)
  File "/scratch1/scratchdirs/kisner/software/hpcports_shared_gnu/numpy-1.9.2_a439d69e-7.1/lib/python2.7/site-packages/numpy/lib/function_base.py", line 1191, in interp
    return compiled_interp(x, xp, fp, left, right)
ValueError: object of too small depth for desired array
Uncaught exception. Entering post mortem debugging
Running 'cont' or 'step' will restart the program
> /scratch1/scratchdirs/kisner/software/hpcports_shared_gnu/numpy-1.9.2_a439d69e-7.1/lib/python2.7/site-packages/numpy/lib/function_base.py(1191)interp()
-> return compiled_interp(x, xp, fp, left, right)
(Pdb) u
> /project/projectdirs/desi/software/edison/desispec/master/py/desispec/interpolation.py(195)_unweighted_resample()
-> ty=np.interp(tx,ix,iy)
(Pdb) u
> /project/projectdirs/desi/software/edison/desispec/master/py/desispec/interpolation.py(130)resample_flux()
-> return _unweighted_resample(xout, x, flux)
(Pdb) u
> /project/projectdirs/desi/software/edison/desispec/master/py/desispec/fluxcalibration.py(225)compute_flux_calibration()
-> model_flux[fiber]=resample_flux(stdstars.wave,input_model_wave,input_model_flux[fiber])
(Pdb) print input_model_flux.shape
(1,)
(Pdb) print input_model_flux
[30787]

Bitmask set/get/has?

@sbailey, will you add these functions to Bitmask?

Otherwise, do you have example code segments setting masks that I can reference?

I am thinking of the scenario of more than 64 bits -- is there a preferred way of dealing with that?
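
While waiting on the real API, here is a sketch of the bit operations such set/get/has helpers would wrap (function names are placeholders). Plain Python ints are arbitrary precision, which is one possible answer to the more-than-64-bits question in memory, though on-disk columns would still need splitting:

def set_bit(mask, bit):
    """Return mask with the given bit set."""
    return mask | (1 << bit)

def has_bit(mask, bit):
    """Return True if the given bit is set in mask."""
    return (mask & (1 << bit)) != 0

mask = set_bit(0, 70)     # works beyond 64 bits with plain Python ints
assert has_bit(mask, 70)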

sky test is failing in cholesky wrapper, but wrapper is missing docs and tests

In the test_sky branch I added a simple compute_sky() test. It is failing with an error inside desispec.linalg.cholesky_solve_and_invert():

Traceback (most recent call last):
  File "/Users/sbailey/desi/git/desispec/py/desispec/test/test_sky.py", line 38, in test_uniform_resolution
    skyflux, skyivar, skymask = compute_sky(self.wave, flux, self.ivar, Rdata)
  File "/Users/sbailey/desi/git/desispec/py/desispec/sky.py", line 126, in compute_sky
  File "/Users/sbailey/desi/git/desispec/py/desispec/linalg.py", line 35, in cholesky_solve_and_invert
    raise Exception("covariance has NaN")
  Exception: covariance has NaN

The inputs do not have NaN, but that code has no docstring to indicate what the inputs are supposed to be. It is quite possible that the test itself has problematic input.

Please:

  • Add docstrings to the desispec.linalg functions so that we know what the inputs and outputs are supposed to be
  • Add some unit tests to confirm that they work
  • Get sky unit test working or punt it back to me if you think I'm doing something wrong

Note that cholesky_solve_and_invert has the line:

inv,status=potri(L,lower=(not lower)) # 'not lower' is not a mistake, there is a BUG in scipy!!!!   

At some point (already?) that bug will get fixed and then this code will break; we need a test to alert us when that happens...

write unit tests

Most desispec code does not have unit tests. Write some stub tests for most code, perhaps spinning off additional pull requests for others to fill in particular tests. Even super basic tests to confirm that functions don't raise exceptions when given valid inputs would be useful -- the dogwood spectral production run found several such problems by running the full pipeline; tests could catch those before bothering with batch jobs.
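
A stub of the sort of minimal "does not raise" test intended here, reusing the resample_flux call pattern shown elsewhere in these issues:

import unittest
import numpy as np
from desispec.interpolation import resample_flux

class TestSmoke(unittest.TestCase):
    def test_resample_runs(self):
        """Valid inputs should not raise; no accuracy check here."""
        x = np.arange(0.0, 30)
        xx = np.linspace(1, 29, 13)
        yy = resample_flux(xx, x, np.sin(x/4) + 1)
        self.assertEqual(len(yy), len(xx))

if __name__ == '__main__':
    unittest.main()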

rename env var $PRODNAME to $SPECPROD

rename $PRODNAME to $SPECPROD to have better symmetry with $PIXPROD.

Pixel simulation output to $DESI_SPECTRO_SIM/$PIXPROD = $DESI_SPECTRO_DATA.

Spectro pipeline outputs to $DESI_SPECTRO_REDUX/$SPECPROD.

The idea is that $DESI_SPECTRO_REDUX is set once by the DESI environment configuration to a canonical location, and then different production runs (with different pipeline versions, different simulation inputs, different parameters, etc.) set $SPECPROD to keep them separated.

Backward-compatible Sphinx configuration

In PR #31, the conf.py file has been changed to assume that the Sphinx Napoleon extension is installed. This is a safe assumption for Sphinx >= 1.3. Can we assume that everyone in DESI who wants to build documentation is already using 1.3? For example, MacPorts has Sphinx 1.3.1.

The alternative is to include a copy of the napoleon code in desispec itself.

bootcalib : add options for flexibility in desi_bootcalib.py

Add options to only extract traces using a continuum frame, or to only do the wavelength solution using an arc lamp frame (with a PSF file as input for the trace locations).

We need the code to be able to run with a single exposure for quick analysis.

make brick file -> input traceability easier

Improve the traceability from final brick/zbest/zcatalog files back to the original pix/simspec files.

It is currently a pain to go from brick/zbest files back to the original simulated data at the beginning. This is a combination of factors:

  • It spans the $DESI_SPECTRO_REDUX vs. $DESI_SPECTRO_SIM boundary and we don't currently track that in the redux file metadata. That will be unambiguous for the real data since there is only one real set of raw data, but it is ambiguous for simulated data where the same night/expid combination could appear multiple times in different simulation runs.
  • The input data are organized by night and exposure id, not by brick and targetid. A given targetid in a zbest file could come from multiple nights and expids, and it is a pain to look that up. The info is available in the FIBERMAP HDUs of the brick files, but it isn't obvious or easy for casual users to put the pieces together.

Develop tools analogous to findfile that would make it easy to get from a targetid back to the contributing input files of a given step.

combine_ivar doesn't work as advertised

The test in desispec.test.test_util.TestNight.test_combine_ivar() performs this test:

        ivar = util.combine_ivar(1, 2)

desispec.util.combine_ivar tries to convert the scalar values to an array, using

    iv1 = np.asarray(ivar1)  #- handle list, tuple, ndarray, and scalar input
    iv2 = np.asarray(ivar2)

However, for scalar input, np.asarray() returns a zero-dimensional array, which cannot be indexed. When the combined inverse variance is computed,

    ivar[ii] = 1.0 / (1.0/iv1[ii] + 1.0/iv2[ii])

the indexing of the scalar values throws an exception.

It is very strange that this test ever passed.
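
A sketch of one possible fix, swapping np.asarray for np.atleast_1d so scalars become indexable one-element arrays (this is a suggestion, not the merged change):

import numpy as np

def combine_ivar(ivar1, ivar2):
    """Combine two inverse variances; any zero stays zero."""
    iv1 = np.atleast_1d(ivar1)  # handles list, tuple, ndarray, and scalar input
    iv2 = np.atleast_1d(ivar2)
    ivar = np.zeros(iv1.shape)
    ii = (iv1 > 0) & (iv2 > 0)
    ivar[ii] = 1.0 / (1.0/iv1[ii] + 1.0/iv2[ii])
    return ivar

print(combine_ivar(1, 2))  # -> [ 0.66666667]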

review code/datamodel consistency

Check all fits files in NERSC:/project/projectdirs/desi/spectro/redux/sjb/dogwood/

  • Is there a corresponding desiDataModel entry? (that's in svn, not github)
  • Is the file consistent with the data model? Focus on HDUs first, then keywords.
  • If not, either update the data model to match the file, or update the code to write a file matching the data model. Both might change. Make sure that desispec.io.read/write* and test/test_io.py remain in sync with any changes.

desiDataModel/bin/generate_model is a useful script for creating a new data model for files that don't have one yet.

resample_flux doesn't conserve flux

desispec.interpolation.resample_flux() doesn't conserve flux, while the docstring says that it does. py/test/test_resample.py contains a failing unit test to show this.

python py/desispec/test/test_resample.py

logging usage

Add the python logging.DEBUG, .INFO, .WARN, .WARNING, .ERROR, .CRITICAL, and .FATAL constants to the desispec.log module, to make it easier to call log.get_logger(log.INFO) without having to import both desispec.log and logging just to get the log level to pass in.

Add support for an optional $DESI_LOGLEVEL environment variable in log.get_logger() to override the log level. This will let scripts temporarily turn on debugging by setting an environment variable without having to change any code.

grep for "print" in all code and consider whether each would be better as a log call instead.
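
A sketch of both pieces (module-level constants plus the $DESI_LOGLEVEL override); the logger name and fallback behavior are assumptions:

import os
import logging

# re-export the standard levels so callers only need desispec.log
DEBUG, INFO, WARNING, ERROR, CRITICAL = (
    logging.DEBUG, logging.INFO, logging.WARNING,
    logging.ERROR, logging.CRITICAL)

def get_logger(level=INFO):
    """Return the desispec logger; $DESI_LOGLEVEL overrides level if set."""
    override = os.environ.get('DESI_LOGLEVEL')
    if override is not None:
        level = getattr(logging, override.upper(), level)
    log = logging.getLogger('desispec')
    log.setLevel(level)
    return log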

Package dependencies in desispec

I'm analyzing the dependencies of desispec in order to determine what packages need to be installed during Travis tests.

  1. Is matplotlib strictly necessary for the functioning of the package?
  2. How do we install redmonster in order to run tests requiring that package?

Create `io.findfile()` function

We need an io.findfile() function that will:

  • Be aware of the DESI file hierarchy.
  • Work at NERSC & on someone's laptop.
  • Download files as needed, using io.download (see #48).
  • Optionally, accept a file type, with parameters, rather than a full file name.

scripts shouldn't end in .py

This proposed change is embarrassingly minor, but it could create confusion for anyone trying to use pipeline code, so I'm advertising it via an issue prior to implementation:

Our coding guidelines say that command line scripts should not end with .py (code you import ends in .py; scripts you run do not). I'd like to fix this for all our scripts prior to documenting the next major pipeline production and prior to writing the pipeline wrapper code that will be calling all these scripts.
