desihub / desitarget

DESI Targeting

License: BSD 3-Clause "New" or "Revised" License

Python 99.27% Shell 0.04% Fortran 0.69%

desitarget

Introduction

This package contains scripts and packages for selecting DESI targets from photometric catalogs.

Installation

You can install these tools in a variety of ways. Here are several that may be of interest:

Manually running from the git checkout. Add the "bin" directory to your $PATH environment variable and add the "py" directory to your $PYTHONPATH environment variable.

Install (and uninstall) a symlink to your live git checkout:

$>  python setup.py develop --prefix=/path/to/somewhere
$>  python setup.py develop --prefix=/path/to/somewhere --uninstall

Install a fixed version of the tools:

$>  python setup.py install --prefix=/path/to/somewhere

Versioning

If you have tagged a version and wish to set the package version based on your current git location:

$>  python setup.py version

Then install as usual.

Full Documentation

Please visit desitarget on Read the Docs

License

desitarget is free software licensed under a 3-clause BSD-style license. For details see the LICENSE.rst file.

desitarget's Issues

Issues with DR2 tractor files

Hi all!

I am not sure if this is a right place to mention this issue, but because I had previously talked to @moustakas and @dstndstn about the DR2 Tractor files and desitarget, I thought it would be good to mention it here (we can delete it later).

When I tried to use the desitarget package to combine the Tractor files of the DECaLS data (namely DR2), I got the error TypeError: invalid type promotion when the script ran
targets = np.concatenate(targets)
in the select_targets function that is defined in desitarget/py/cuts.py.

This error occurs when the arrays being concatenated have different dtypes, which made me think that different Tractor files might have different dtypes. To check, I wrote a script (see below) that reports any file whose dtype differs from the user-defined one. It turned out that these two bricks have a different 'BRICKNAME' dtype:

247/tractor-2474p302.fits : [('BRICKNAME', '>f8')]

180/tractor-1806p150.fits : [('BRICKNAME', '>f8')]

This makes the code fail during target selection. Finally, after looking into these two files, I realized they contain no data (are they empty bricks?).

from __future__ import division
import os, sys
import numpy as np
from time import time

sys.path.append('/global/homes/m/mehdi/github/desitarget/py')
from desitarget import io
from desitarget.internal import sharedmem

# check different files in parallel
def test_targets(infiles, numproc=4, verbose=False, psfmag=False):
    #- Convert single file to list of files
    if isinstance(infiles,str):
        infiles = [infiles,]

    #- Sanity check that files exist before going further
    for filename in infiles:
        if not os.path.exists(filename):
            raise ValueError("{} doesn't exist".format(filename))

    #- function to run on every brick/sweep file
    def _select_targets_file(filename):
        '''check if filename has the same dtype as the user-defined one'''
        from desitarget import io
        # user-defined columns
        tscolsn = dtlist = ['BRICKID','BRICKNAME', 'OBJID',
        'BRICK_PRIMARY', 'TYPE','RA','DEC','DCHISQ', 
        'DECAM_FLUX', 'DECAM_FLUX_IVAR', 'DECAM_MW_TRANSMISSION',
        'DECAM_FRACFLUX','DECAM_FRACMASKED','DECAM_PSFSIZE',
        'WISE_FLUX', 'WISE_FLUX_IVAR',
        'WISE_MW_TRANSMISSION','DECAM_DEPTH']
        #
        #
        dtlist = np.array([('BRICKID', '>i4'), ('BRICKNAME', '|S8'), ('OBJID', '>i4'), 
        ('BRICK_PRIMARY', '|b1'), ('TYPE', '|S4'), ('RA', '>f8'), ('DEC', '>f8'),
        ('DCHISQ', '>f4', (5,)), ('DECAM_FLUX', '>f4', (6,)), ('DECAM_FLUX_IVAR', '>f4', (6,)),
        ('DECAM_MW_TRANSMISSION', '>f4', (6,)), ('DECAM_FRACFLUX', '>f4', (6,)),
        ('DECAM_FRACMASKED', '>f4', (6,)), ('DECAM_PSFSIZE', '>f4', (6,)),
        ('WISE_FLUX', '>f4', (4,)), ('WISE_FLUX_IVAR', '>f4', (4,)), 
        ('WISE_MW_TRANSMISSION', '>f4', (4,)), ('DECAM_DEPTH', '>f4', (6,))])
        #
        fn = io.read_tractor(filename, columns=tscolsn)
        ndtypes = fn.dtype.descr
        diffrnc = np.setdiff1d(np.array(ndtypes), dtlist).flatten().tolist()
        if len(diffrnc) != 0:
            print("%s : %s" % (filename.rsplit('tractor/')[1], str(diffrnc)))
        return 0

    #- Parallel process input files
    if numproc > 1:
        pool = sharedmem.MapReduce(np=numproc)
        with pool:
            targets = pool.map(_select_targets_file, infiles)
    else:
        targets = list()
        for x in infiles:
            targets.append(_select_targets_file(x))

# program starts
src = sys.argv[1]
infiles = io.list_sweepfiles(src)
if len(infiles) == 0:
    infiles = io.list_tractorfiles(src)
if len(infiles) == 0:
    print('FATAL: no sweep or tractor files found')
    sys.exit(1)

import multiprocessing
nproc = multiprocessing.cpu_count() // 2

test_targets(infiles, numproc=nproc)
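
One way to guard the concatenation step against empty or mismatched bricks is sketched below. This is an illustration in plain NumPy, not the desitarget implementation: the helper name concat_matching and the two-column dtype are hypothetical stand-ins for the full Tractor layout.

```python
import numpy as np


def concat_matching(tables, reference_dtype):
    """Concatenate structured arrays that match reference_dtype.

    Returns the concatenated array and the indices of the inputs that
    were skipped because they were empty or had a different dtype.
    """
    reference_dtype = np.dtype(reference_dtype)
    kept, skipped = [], []
    for i, table in enumerate(tables):
        if len(table) == 0 or table.dtype != reference_dtype:
            skipped.append(i)
        else:
            kept.append(table)
    if not kept:
        return np.zeros(0, dtype=reference_dtype), skipped
    return np.concatenate(kept), skipped


# Hypothetical two-column layout standing in for the full Tractor dtype.
good_dtype = [('BRICKNAME', 'S8'), ('RA', 'f8')]
bad_dtype = [('BRICKNAME', 'f8'), ('RA', 'f8')]  # float BRICKNAME, as reported for the empty bricks

bricks = [
    np.array([(b'2474p302', 247.4), (b'2474p302', 247.5)], dtype=good_dtype),
    np.zeros(0, dtype=bad_dtype),
    np.array([(b'1806p150', 180.6)], dtype=good_dtype),
]
targets, skipped = concat_matching(bricks, good_dtype)
print(len(targets), skipped)
```

Reporting the skipped indices rather than silently dropping them keeps the empty bricks visible, which matters if they turn out not to be expected.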

targetselection.py doesn't work with astropy 1.0.5

With astropy 1.0.5, targetselection.py dies with:

$ python $DESITARGET/bin/targetselection.py /data/legacysurvey/dr1/tractor/000/tractor-0009p187.fits blat.fits
Traceback (most recent call last):
  File "/Users/sbailey/desi/git/desitarget/bin/targetselection.py", line 44, in <module>
    main()
  File "/Users/sbailey/desi/git/desitarget/bin/targetselection.py", line 35, in main
    mask = cut.apply(candidates)
  File "/Users/sbailey/desi/git/desitarget/py/desitarget/internal/npyquery.py", line 145, in apply
    result[s] = v.visit(self)
  File "/Users/sbailey/desi/git/desitarget/py/desitarget/internal/npyquery.py", line 369, in visit
    return self.m[t](node)
  File "/Users/sbailey/desi/git/desitarget/py/desitarget/internal/npyquery.py", line 415, in visit_expr
    ops = [self.visit(a) for a in node.operands]
  File "/Users/sbailey/desi/git/desitarget/py/desitarget/internal/npyquery.py", line 369, in visit
    return self.m[t](node)
  File "/Users/sbailey/desi/git/desitarget/py/desitarget/internal/npyquery.py", line 415, in visit_expr
    ops = [self.visit(a) for a in node.operands]
  File "/Users/sbailey/desi/git/desitarget/py/desitarget/internal/npyquery.py", line 369, in visit
    return self.m[t](node)
  File "/Users/sbailey/desi/git/desitarget/py/desitarget/internal/npyquery.py", line 409, in visit_column
    return node.apply(self.array, self.s)
  File "/Users/sbailey/desi/git/desitarget/py/desitarget/internal/npyquery.py", line 243, in apply
    return array[self.name][s]
  File "/Users/sbailey/anaconda/envs/numpy1.9/lib/python2.7/site-packages/astropy/io/fits/fitsrec.py", line 480, in __getitem__
    return self.field(key)
  File "/Users/sbailey/anaconda/envs/numpy1.9/lib/python2.7/site-packages/astropy/io/fits/fitsrec.py", line 620, in field
    field = _get_recarray_field(base, name)
  File "/Users/sbailey/anaconda/envs/numpy1.9/lib/python2.7/site-packages/astropy/io/fits/fitsrec.py", line 1138, in _get_recarray_field
    field = np.recarray.field(array, key)
  File "/Users/sbailey/anaconda/envs/numpy1.9/lib/python2.7/site-packages/numpy/core/records.py", line 542, in field
    res = fielddict[attr][:2]
KeyError: 'brick_primary'

This works with astropy 1.0.1. I'll dig deeper and submit an astropy bug report and/or see if there is a workaround for desitarget.
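
The KeyError names a lower-case column ('brick_primary'), while the Tractor columns quoted elsewhere on this page are upper-case (BRICK_PRIMARY). If the failure is a case mismatch (an assumption, not something the traceback confirms), a case-insensitive lookup is one possible workaround on the desitarget side. The helper get_column below is hypothetical and uses a plain NumPy structured array rather than an astropy FITS_rec.

```python
import numpy as np


def get_column(array, name):
    """Return array[name], falling back to a case-insensitive name match."""
    names = array.dtype.names
    if name in names:
        return array[name]
    matches = [n for n in names if n.lower() == name.lower()]
    if len(matches) != 1:
        # Either no such column, or the name is ambiguous when case is ignored.
        raise KeyError(name)
    return array[matches[0]]


candidates = np.zeros(3, dtype=[('BRICK_PRIMARY', '?'), ('RA', 'f8')])
candidates['BRICK_PRIMARY'] = [True, False, True]
primary = get_column(candidates, 'brick_primary')
```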

What is DESI_TARGET == 1 supposed to mean?

@forero?

In relation to these lines in targets.py.

Is DESI_TARGET == 1 meant to mean that the target is in the dark time survey or that it is in any of the surveys? From its use everywhere else I was expecting the former but that line in calc_nobs made me wonder if I might be missing something.

Asked another way, should MWS and BGS targets have DESI_TARGET == 1 or 0?

fix incorrect QSO selection test

Summary: desitarget.test.test_cuts function test_single_cuts is incorrectly testing that QSO target selection does not depend upon deltaChi2 nor wise_snr. Fix that.

Background: originally cuts.isQSO allowed primary and objtype (psftype) to be optional parameters as a minor convenience for template generation so that it could focus on the fluxes and not have to also pass in arrays of primary=True and objtype='PSF'. A test confirmed that the selection would work with or without them.

Current problem: Since then, cuts.isQSO gained two more optional parameters, deltaChi2 and wise_snr. The tests were expanded to include those as optional, but they incorrectly test that these parameters make no difference:

        qso1 = cuts.isQSO(gflux=gflux, rflux=rflux, zflux=zflux, w1flux=w1flux, w2flux=w2flux,
                          objtype=psftype, primary=primary, deltaChi2=deltaChi2, wise_snr=wise_snr)
        qso2 = cuts.isQSO(gflux=gflux, rflux=rflux, zflux=zflux, w1flux=w1flux, w2flux=w2flux,
                          objtype=None, primary=None, deltaChi2=None, wise_snr=None)
        self.assertTrue(np.all(qso1==qso2))

This accidentally passes for some but not all of the test sweep files in test/t. It turns out that our tests use os.walk under the hood to grab the first sweep file it finds, and which file gets selected depends upon where the code is installed. I accidentally tripped over this yesterday when I found that the identical code installed on edison $SCRATCH failed tests while it worked when installed under $HOME... :(
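
Because os.walk makes no promise about the order in which it returns directory entries, "the first file found" can differ between filesystems. A minimal sketch of a deterministic selection follows; the function name first_sweep_file and the file-name pattern are assumptions for illustration, not the code used by the desitarget tests.

```python
import os
import tempfile


def first_sweep_file(root, prefix='sweep', ext='.fits'):
    """Return the alphabetically first matching file under root, or None."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        for fn in filenames:
            if fn.startswith(prefix) and fn.endswith(ext):
                found.append(os.path.join(dirpath, fn))
    # min() over the full list does not depend on the traversal order.
    return min(found) if found else None


# Small demonstration on a throwaway directory tree.
root = tempfile.mkdtemp()
for sub, name in [('b', 'sweep-001.fits'), ('a', 'sweep-002.fits'), ('a', 'notes.txt')]:
    os.makedirs(os.path.join(root, sub), exist_ok=True)
    open(os.path.join(root, sub, name), 'w').close()

chosen = first_sweep_file(root)
```

With a fixed choice of file, a test that passes in one install location should behave the same way in another.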

Also note that cuts.isQSO uses wise_snr but cuts.isQSO_randomforest does not.

To do:

Decide if these really should be optional parameters, or whether they should be required since they actually are used in the cuts, even if that is a minor inconvenience for template generation.
- Stephen's opinion: primary feels like a deprecated leftover of an earlier imaging data release; I'm ok with that being optional. The others probably should become genuinely required arguments.
Fix the tests

Target class mask datatype in targets.py

In processing the mws_galaxia mocks, the call to mask.names() here in desitarget.targets raises the following exception:

    239             bitnum = 0
    240             while bitnum**2 <= mask:
--> 241                 if (2**bitnum & mask):
    242                     if bitnum in self._bits.keys():
    243                         names.append(self._bits[bitnum].name)

TypeError: ufunc 'bitwise_and' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

seemingly because the values that are being passed as tc are scalar np.(u)int64s. This only occurs for larger bitmask values, in this specific case when tc == 65536 (the 'very faint' selection in the mws_galaxia mocks): The relevant lines are:

    target_class_names = np.zeros(len(target_class),dtype=np.object)
    unique_target_classes = np.unique(target_class)
    for tc in unique_target_classes:
        # tc is the encoded integer value of the target bitmask
        has_this_target_class = np.where(target_class == tc)[0]

        tc_name = '+'.join(mask.names(tc))

This can be fixed by using tc_name = '+'.join(mask.names(int(tc))) instead, but that doesn't seem very intuitive and I didn't see this flagged elsewhere. Should it be fixed in desiutil instead?

Some tests:

from desiutil.bitmask import BitMask
import numpy as np
import yaml

_bitdefyaml = """\
ccdmask:
  - [BAD,              0, "Pre-determined bad pixel (any reason)"]
  - [HOT,              1, "Hot pixel", {'blat': 'foo'}]
  - [TEST,            16, "A higher bit for this test"]"""

_bitdefs = yaml.safe_load(_bitdefyaml)
mask = BitMask('ccdmask' ,_bitdefs)

# Ok passing a python int directly
mask.names(65536)

# OK passing an unsigned array
a = np.array([65536],dtype=np.uint64)
mask.names(a)

# Fails if the array is signed
a = np.array([65536],dtype=np.int64)
mask.names(a)

# Fails with a numpy scalar, signed or unsigned
mask.names(np.int(65536))
mask.names(np.uint64(65536))

Numpy 1.11.1.
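
If the fix were made inside the name lookup rather than at every call site, it could look roughly like the sketch below. This is a standalone illustration, not the desiutil code: bit_names and the bits dictionary are hypothetical, and the only point is that coercing the mask with int() up front makes Python ints and NumPy integer scalars behave the same.

```python
import numpy as np


def bit_names(mask, bits):
    """Return the names of the bits set in mask.

    bits maps bit number -> name. mask may be a Python int or a NumPy
    integer scalar; it is converted to a Python int before any bit tests.
    """
    mask = int(mask)
    names = []
    bitnum = 0
    while 2**bitnum <= mask:
        if (2**bitnum) & mask and bitnum in bits:
            names.append(bits[bitnum])
        bitnum += 1
    return names


# Same bit layout as the ccdmask example above.
ccd_bits = {0: 'BAD', 1: 'HOT', 16: 'TEST'}
from_signed = bit_names(np.int64(65536), ccd_bits)
from_unsigned = bit_names(np.uint64(65536), ccd_bits)
```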

Package dependencies in desitarget

I'm trying to determine exactly which packages need to be installed in order for Travis tests of desitarget to run successfully, so here are some questions about the packages:

desitarget appears to use both astropy.io.fits and fitsio for reading FITS files. It should only be astropy.io.fits, unless there is an overriding, very good reason to use fitsio. They are even both used in the same file!
I see that there is a dependency on h5py. That's not necessarily a problem, but at this stage it is worth asking whether the Durham mocks will always use this format.

Python 2->3 Mock Upgrade

Upgrade the mock submodule to Python 3.5.

Why does desitarget have its own version of maskbits.py?

desitarget contains an old duplicate of maskbits.py from desiutil.

desitarget.cuts.apply_cuts borks if an input catalog has just one object / row

from astropy.io import fits
from desitarget.cuts import apply_cuts
cat = fits.getdata('tractor-3409p277.fits',1)
desi_target, bgs_target, mws_target = apply_cuts(cat[0])
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-5-3a3aa394b215> in <module>()
----> 1 desi_target, bgs_target, mws_target = apply_cuts(cat[0])

/Users/ioannis/repos/git/desihub/desitarget/py/desitarget/cuts.py in apply_cuts(objects)
     70 
     71     #- undo Milky Way extinction
---> 72     flux = unextinct_fluxes(objects)
     73     gflux = flux['GFLUX']
     74     rflux = flux['RFLUX']

/Users/ioannis/repos/git/desihub/desitarget/py/desitarget/cuts.py in unextinct_fluxes(objects)
     38 
     39     dered_decam_flux = objects['DECAM_FLUX'] / objects['DECAM_MW_TRANSMISSION']
---> 40     result['GFLUX'] = dered_decam_flux[:, 1]
     41     result['RFLUX'] = dered_decam_flux[:, 2]
     42     result['ZFLUX'] = dered_decam_flux[:, 4]

IndexError: too many indices for array
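
The failure comes from indexing a single row: objects['DECAM_FLUX'] then has shape (6,) instead of (N, 6), so [:, 1] has one index too many. A minimal sketch of a guard follows, using plain NumPy structured arrays. The function name unextinct_grz is hypothetical, and astropy FITS_record rows may need different handling; passing a one-row slice such as cat[0:1] instead of cat[0] is another way to keep the input two-dimensional.

```python
import numpy as np


def unextinct_grz(objects):
    """Return dereddened g, r, z fluxes for one row or many rows."""
    # Promote a single record to a length-1 array so 2-D indexing works.
    objects = np.atleast_1d(objects)
    dered = objects['DECAM_FLUX'] / objects['DECAM_MW_TRANSMISSION']
    result = np.zeros(len(objects),
                      dtype=[('GFLUX', 'f4'), ('RFLUX', 'f4'), ('ZFLUX', 'f4')])
    # Band order follows the indices used in the traceback above: 1=g, 2=r, 4=z.
    result['GFLUX'] = dered[:, 1]
    result['RFLUX'] = dered[:, 2]
    result['ZFLUX'] = dered[:, 4]
    return result


cat = np.zeros(2, dtype=[('DECAM_FLUX', 'f4', (6,)),
                         ('DECAM_MW_TRANSMISSION', 'f4', (6,))])
cat['DECAM_FLUX'] = 1.0
cat['DECAM_MW_TRANSMISSION'] = 0.5

one_row = unextinct_grz(cat[0])
all_rows = unextinct_grz(cat)
```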

incorporate some kind of mask to account for bad/unusable regions for target selection

Pasting from the thread [desi-targets 1068]:

@djschlegel suggests:

We may want to start a list someplace of regions of potential problems to be
looking at with each imaging data release.

We’ll likely need an object-based formalism to deal with some objects,
and pixel-based for others.  For example, for the large NGC galaxies
we may end up with a mask (circular or elliptical?) down to R25 for each.
Within that, we’ll want fiber locations chosen by some algorithm, perhaps
along the major + minor axes.  Anything within the SDSS footprint has
had its coordinates + other parameters fixed up by Mike Blanton for the
NASA-Sloan Atlas.  Or.. someone may argue that we should keep QSO selection
in NGC galaxies to use as backlights.

Could we start putting together a representative list of flavors of problem
areas and a specific example for each (hopefully already in our observed footprint).
— Bright stars
— Really, really bright stars like Vega
— L/T dwarfs that are bright in WISE and not in the optical images
— Andromeda — it’s in our footprint!
— NGC/UGC elliptical galaxies
— NGC/UGC spiral galaxies
— NGC/UGC irregular galaxies
— Abell clusters — for ex, what happens to Abell 370 and Abell 665
— Globular clusters (often in NGC/UGC)
— High stellar density near the Galactic plane; we get as close as |b| ~ 14 deg
— Cirrus emission, optical and/or infrared — Eddie or Aaron would have some suggestions
— Planetary nebulae — depending on our sky modeling, these can be particularly problematic
     (These are *also* a problem for the spectroscopy, as we’ve discovered in SDSS where
     we’ll get a bunch of best-fit redshifts at z=0 from the PN emission lines)
— Jupiter, Saturn, Uranus — we may want to just avoid this in the observing strategy, as we do for DECaLS

What other fun + wonderful regions am I missing?

Python 2->3 Upgrade

Need to rewrite for python 3.5 upgrade. ADM will do this except for the mocks.

flux and shape distributions in DR3

Characterize the flux and object shape distributions for objects that pass the target selection cuts applied to DR3
provide utility functions that can randomly sample those distributions
update select_mock_targets to randomly assign fluxes and shapes to LRG, ELG, QSO targets that come from mocks inputs that don't provide them

Do this in a way that can be trivially re-applied to DR4 and/or with future updates to the target selection cuts.

KeyError: 'BGS_TARGET'

Travis is failing with a KeyError: 'BGS_TARGET' from the tests on mtl. Full report here: https://travis-ci.org/desihub/desitarget/jobs/161457969

Any help to understand that error would be appreciated.

include Lyman-alpha mocks in select_mock_targets output

Include Lyman-alpha mocks in select_mock_targets output.

Even just having the (ra,dec,z) would be useful for combining with other target classes for fiberassign+quickcat based simulation loops.

Second step is including the spectra to simulate for later studies.

Possibly stale branches in desitarget

You have been working on two branches in desitarget, "truth" and "dr2". At present, both are very out-of-date relative to master, and dr2 has been inactive for four months. Can we delete the inactive dr2 branch? Remember to merge the truth branch with master before continuing to work on it.

reading MWS mocks v.0.0.2 (with multiprocessing) crashes

The following code

import desitarget.io
import desitarget.mock.io
mock_dir='/project/projectdirs/desi/mocks/mws/galaxia/alpha/0.0.2/bricks'
iter_mock_files = desitarget.io.iter_files(mock_dir, '', ext="fits")
import multiprocessing
file_list = list(iter_mock_files)
nfiles = len(file_list)
ncpu = max(1, multiprocessing.cpu_count() // 2)
print('using {} parallel readers'.format(ncpu))
p = multiprocessing.Pool(ncpu)
target_list = p.map(desitarget.mock.io._load_mock_mws_file, file_list)

gives a long error message starting with

Traceback (most recent call last):
  File "multiproc_bug.py", line 13, in <module>
    target_list = p.map(desitarget.mock.io._load_mock_mws_file, file_list)        
  File "/global/common/edison/contrib/desi/conda/conda_3.5-20160829/lib/python3.5/multiprocessing/pool.py", line 260, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/global/common/edison/contrib/desi/conda/conda_3.5-20160829/lib/python3.5/multiprocessing/pool.py", line 608, in get
    raise self._value
multiprocessing.pool.MaybeEncodingError: Error sending result:

ending with

 Reason: 'error("'i' format requires -2147483648 <= number <= 2147483647",)'

This seems to be caused by too large arrays (present in v.0.0.2 of the mocks, not in version v.0.0.1).

Can you reproduce the error?
Any idea for a workaround?
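
The "'i' format requires ..." message is consistent with a single result being too large to send back from a worker as one message, since the limit quoted is that of a signed 32-bit integer. One possible workaround, sketched below under that assumption, is to have each worker write its array to disk and return only the path. The names load_to_disk and fake_loader are hypothetical; fake_loader stands in for the real mock reader.

```python
import os
import tempfile

import numpy as np


def load_to_disk(job):
    """Load one input file, save the result as .npy, and return only its path."""
    loader, filename, outdir = job
    data = loader(filename)
    outpath = os.path.join(outdir, os.path.basename(filename) + '.npy')
    np.save(outpath, data)
    return outpath


def fake_loader(filename):
    """Hypothetical stand-in for the real mock reader."""
    return np.arange(5, dtype='f8')


outdir = tempfile.mkdtemp()
jobs = [(fake_loader, name, outdir) for name in ('brick_a.fits', 'brick_b.fits')]

# Serial stand-in for pool.map(load_to_disk, jobs); with a pool, only the
# short path strings travel back to the parent process.
paths = [load_to_disk(job) for job in jobs]
arrays = [np.load(p) for p in paths]
```

For use with multiprocessing.Pool the loader has to be a module-level function so that it can be pickled.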

Stale branches in desitarget

There are several stale branches in desitarget. What is the plan for merging or deleting these branches:

npyquery_crash
npyquery_crash_fix
mocks
dr2

MTL set priority=0 for targets with IN_BRIGHT_OBJECT set

Related to #102 (define the bright object bits) and #103 (bright object mask for mocks):

Update the MTL code to set priority=0 for any object with IN_BRIGHT_OBJECT set. This is a special case, since normally the bit with the highest priority always wins. This is completely independent of the mocks themselves — the same MTL code applies to both mocks and real targets.
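
A minimal sketch of that special case, assuming the priorities have already been computed by the usual highest-priority-wins rule. The bit value chosen for IN_BRIGHT_OBJECT and the function name are placeholders; the real bit is whatever gets defined in targetmask.yaml.

```python
import numpy as np

# Placeholder bit value for illustration only.
IN_BRIGHT_OBJECT = 2**35


def zero_priority_in_bright_objects(desi_target, priority):
    """Return a copy of priority with 0 wherever IN_BRIGHT_OBJECT is set."""
    desi_target = np.asarray(desi_target, dtype=np.int64)
    blocked = (desi_target & IN_BRIGHT_OBJECT) != 0
    return np.where(blocked, 0, priority)


desi_target = np.array([1, 1 | IN_BRIGHT_OBJECT, 4, IN_BRIGHT_OBJECT], dtype=np.int64)
priority = np.array([3200, 3200, 3400, 0])
new_priority = zero_priority_in_bright_objects(desi_target, priority)
```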

Travis-CI ?

Shall we add Travis CI, at least for a top-level integration test of targetselection.py?

@sbailey

I can take care of it; I have played with Travis CI quite a bit recently.

basic bright object mask for mocks

Implement a basic bright object mask for mocks as part of select_mock_targets. See #102 for the bits to set. Suggestion for v0:

Use MWS and BGS catalogs to identify bright objects
Set these bits
- IN_BRIGHT_OBJECT: just use circular radius around anything brighter than X in any band
- NEAR_BRIGHT_OBJECT: just larger circular radius around anything brighter than Y in any band
Maybe
- BRIGHT_OBJECT: only if convenient for dataflow while creating the mask
Don't set
- SAFE: don't bother generating SAFE locations for mocks for now

Slightly better would be circular radii that scale with object brightness. The main work for this ticket will be picking something reasonable for the flux thresholds and radii, and implementing that in a manner that runs reasonably efficiently for processing O(50M) objects.

Note: we will want a much more complete bright object mask for real data that considers object shapes, surface brightnesses, safe zones, diffraction spikes, etc. This issue is just for a super-simple version to include in the first rounds of mocks.
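
The v0 suggestion (a circular radius around each bright object) can be sketched as follows. This is a brute-force illustration with hypothetical function names; the radii and flux thresholds are left to the caller because the issue does not fix them, and a spatial index would be needed to handle O(50M) objects efficiently.

```python
import numpy as np


def angular_separation(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = (np.radians(x) for x in (ra1, dec1, ra2, dec2))
    sin_ddec = np.sin((dec2 - dec1) / 2.0)
    sin_dra = np.sin((ra2 - ra1) / 2.0)
    a = sin_ddec**2 + np.cos(dec1) * np.cos(dec2) * sin_dra**2
    return np.degrees(2.0 * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0))))


def in_circular_mask(ra, dec, bright_ra, bright_dec, radius_deg):
    """True for each target within radius_deg of any bright object.

    radius_deg may be a scalar or one radius per bright object, which
    allows radii that scale with brightness.
    """
    ra = np.atleast_1d(np.asarray(ra, dtype=float))
    dec = np.atleast_1d(np.asarray(dec, dtype=float))
    bright_ra = np.atleast_1d(np.asarray(bright_ra, dtype=float))
    bright_dec = np.atleast_1d(np.asarray(bright_dec, dtype=float))
    radius = np.broadcast_to(np.asarray(radius_deg, dtype=float), bright_ra.shape)
    masked = np.zeros(len(ra), dtype=bool)
    for bra, bdec, rad in zip(bright_ra, bright_dec, radius):
        masked |= angular_separation(ra, dec, bra, bdec) < rad
    return masked


target_ra = [10.0, 10.5, 200.0]
target_dec = [0.0, 0.0, 30.0]
in_bright = in_circular_mask(target_ra, target_dec, [10.0], [0.0], 0.1)
near_bright = in_circular_mask(target_ra, target_dec, [10.0], [0.0], 1.0)
```

Running the same function twice with a smaller and a larger radius gives the two flags, IN_BRIGHT_OBJECT and NEAR_BRIGHT_OBJECT.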

improve select_targets traceability

Improve the target selection traceability by tracking the complete list of input files used and their checksums.

The output of select_targets includes the desitarget version, the git revision (if running from a git checkout for testing; otherwise "unknown", though perhaps that should be the tag), and the input directory of files used, e.g.

DEPNAM00= 'desitarget'                                                          
DEPVER00= '0.2.0   '           / desitarget.__version__                         
DEPNAM01= 'desitarget-git'                                                      
DEPVER01= 'unknown '           / git revision                                   
DEPNAM02= 'tractor-files'                                                       
DEPVER02= '/project/projectdirs/desiproc/dr2/sweep/'

but it doesn't actually trace what files it found under /project/projectdirs/desiproc/dr2/sweep/ or what their checksums were (to be able to check whether someone mucked with the files later, e.g. since this isn't the real final locked down publicly released DR2 directory).

Add another bintable HDU that tracks the actual files used as input and their checksums. In the case of tractor files as inputs instead of sweeps this could be a long list so it requires a table and not just DEPNAM/DEPVER header keywords. Don't include the common path prefix that is already kept as DEPVER02.
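
A sketch of how the rows of such a table might be assembled. The column names FILENAME and SHA256, the choice of SHA-256, and the helper names are assumptions; writing the array out as an extra HDU is left to whichever FITS library the script already uses.

```python
import hashlib
import os
import tempfile

import numpy as np


def file_checksum(path, blocksize=1 << 20):
    """Return the SHA-256 hex digest of a file, read in blocks."""
    digest = hashlib.sha256()
    with open(path, 'rb') as fx:
        for block in iter(lambda: fx.read(blocksize), b''):
            digest.update(block)
    return digest.hexdigest()


def input_file_table(root, filenames):
    """Build a table of (path relative to root, checksum) for the input files."""
    rows = [(os.path.relpath(fn, root), file_checksum(fn)) for fn in sorted(filenames)]
    return np.array(rows, dtype=[('FILENAME', 'U256'), ('SHA256', 'U64')])


# Demonstration on two small throwaway files.
root = tempfile.mkdtemp()
inputs = []
for name, payload in [('sweep-b.fits', b'abc'), ('sweep-a.fits', b'')]:
    path = os.path.join(root, name)
    with open(path, 'wb') as fx:
        fx.write(payload)
    inputs.append(path)

table = input_file_table(root, inputs)
```

Storing paths relative to the input directory avoids repeating the prefix that is already recorded in DEPVER02.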

desitarget not desiInstall compatible

desiInstall at NERSC produces a borked installation of desitarget/0.2.0. This is likely due to desitarget being non-compliant rather than a problem with desiInstall per-se. Figure out what is wrong and fix it so that "desiInstall desitarget 0.2.1" will work.

The error from the current desitarget/0.2.0 is:

[cori target] time select_targets /project/projectdirs/desiproc/dr2/sweep/ targets-dr2-0.2.0.fits -v
Traceback (most recent call last):
  File "/project/projectdirs/desi/software/cori/desitarget/0.2.0/bin/select_targets", line 8, in <module>
    from desitarget import io
  File "/project/projectdirs/desi/software/cori/desitarget/0.2.0/lib/python2.7/site-packages/desitarget-0.2.0-py2.7.egg/desitarget/__init__.py", line 41, in <module>
    from .targetmask import priorities, obsconditions, obsstate
  File "/project/projectdirs/desi/software/cori/desitarget/0.2.0/lib/python2.7/site-packages/desitarget-0.2.0-py2.7.egg/desitarget/targetmask.py", line 13, in <module>
    with open(_filepath) as fx:
IOError: [Errno 2] No such file or directory: '/project/projectdirs/desi/software/cori/desitarget/0.2.0/lib/python2.7/site-packages/desitarget-0.2.0-py2.7.egg/desitarget/targetmask.yaml'

targetmask.yaml is not copied to the installation directory. I will fix this by hand for the current 0.2.0 installation so that we can make progress with it in the meantime.

add BGS, MWS, SKY, FSTD, WDSTD target selection flags

Creating a ticket so we don't forget about it. We need to reserve target bits for the Bright Galaxy Survey (BGS), the Milky Way Survey (MWS), sky fibers (SKY), F-type standard stars (FSTD), and White Dwarf standard stars (WDSTD).

From Jeremy Tinker:

For the BGS, so far we only need two bits. Bright and faint BGS targets. It seems unlikely that we’ll do much more than that, but it’s possible that we might put collided SDSS and BOSS targets as a separate bit in order to give them maximal priority. As with eBOSS QSOs, we might want to have a catalog that includes KNOWN objects, just so we don’t have to go back to some random location to put together the full LSS catalog. Thus, we’d also want to make sure that the tiled bits are an input to the tiling code as well. So the input target catalog has KNOWN in it, but are not assigned fibers. (That may be easy enough by saying that zero observations are required).

So…

BGS_FAINT
BGS_BRIGHT
BGS_KNOWN_COLLIDED
BGS_KNOWN_SDSS
BGS_KNOWN_BOSS

Connie Rockosi indicated that the MWS may want to group targets by their proper motion in order to use different priorities. Let's reserve 5 bits, and maybe in a different range so that we can expand them a bit more if needed.

Also: we need to resolve how targeting bits will be handled for South (DECam-based) vs. North (Bok+Mosaic based) vs. overlap regions. One option is to define separate LRG_SOUTH and LRG_NORTH bits, perhaps with a 3rd convenience LRG bit that is just the logical OR of LRG_SOUTH and LRG_NORTH. Ditto for ELG, etc.
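
The "convenience bit" option can be illustrated as below. The bit positions are made up for the example (the issue does not assign them), and set_lrg_bits is a hypothetical helper.

```python
import numpy as np

# Made-up bit positions, for illustration only.
LRG = 2**0
LRG_NORTH = 2**8
LRG_SOUTH = 2**16


def set_lrg_bits(is_lrg_north, is_lrg_south):
    """Set LRG_NORTH and LRG_SOUTH, plus LRG wherever either one is set."""
    north = np.asarray(is_lrg_north, dtype=bool)
    south = np.asarray(is_lrg_south, dtype=bool)
    desi_target = np.zeros(north.shape, dtype=np.int64)
    desi_target |= north * np.int64(LRG_NORTH)
    desi_target |= south * np.int64(LRG_SOUTH)
    desi_target |= (north | south) * np.int64(LRG)
    return desi_target


desi_target = set_lrg_bits([True, False, True, False],
                           [False, True, True, False])
is_any_lrg = (desi_target & LRG) != 0
```

Downstream code that does not care about the hemisphere can then test the single LRG bit.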

Assigning to @moustakas since I definitely want his input on this, though others are also welcome to contribute.

place to create unique TARGETID for mocks

Where in desitarget should we do the unique TARGETID computation? What logic should we follow to do that if we have hundreds of input mock files?

spectroscopic truth region wish-list for DECaLS/DR3

Row-matched DECaLS/DR2 catalogs of objects in the spectroscopic truth regions were compiled and documented in this TechNote.

For DR3 we need to update these catalogs but also add:

add target density fluctuations to select_mock_targets

Related to #90 (include target contamination) and #101 (generate target density fluctuations)...

todo: Implement the target density fluctuations in select_mock_targets itself.

Generate target density fluctuations

Generate brick-by-brick depth fluctuations and density fluctuations that are correlated with those depths and E(B-V). Define file format for how those are passed off from @geordie666 to select_mock_targets. 0th order would be a FITS binary table with columns:

BRICKNAME
EBV (median E(B-V) of the brick; or do we want to do it at the per-target level from the start?)
DEPTH_G, DEPTH_R, DEPTH_Z, GALDEPTH_G, GALDEPTH_R, GALDEPTH_Z (use DR3 definition of point-source and galaxy depth; I don't see WISE depths included in DR3 sweeps...)
DENSITY_ELG, DENSITY_LRG, DENSITY_QSO, DENSITY_LYA, DENSITY_BGS, DENSITY_MWS
DENSITY_BAD_ELG, DENSITY_BAD_LRG, DENSITY_BAD_QSO, DENSITY_BAD_LYA, DENSITY_BAD_BGS, DENSITY_BAD_MWS
- some of these may be all zeros, e.g. DENSITY_BAD_MWS
- different name than "BAD"? FAKE? CONTAMINANT seems too long...

Note: This isn't completely flexible for being able to specify densities for different flavors of bright vs. faint BGS, MWS FSTAR vs. WD, etc. nor what type of contaminants to use, but I think it is a good start for the purposes of the December meeting.
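
For concreteness, the 0th-order table could be laid out as a NumPy structured array like the one below before being written to FITS. The column names are the ones listed above; the data types (8-character brick names, 32-bit floats) and the helper name density_table_dtype are assumptions.

```python
import numpy as np

TARGET_CLASSES = ['ELG', 'LRG', 'QSO', 'LYA', 'BGS', 'MWS']


def density_table_dtype():
    """Return the dtype of the proposed per-brick depth and density table."""
    columns = [('BRICKNAME', 'S8'), ('EBV', 'f4')]
    columns += [('DEPTH_' + band, 'f4') for band in 'GRZ']
    columns += [('GALDEPTH_' + band, 'f4') for band in 'GRZ']
    columns += [('DENSITY_' + cls, 'f4') for cls in TARGET_CLASSES]
    columns += [('DENSITY_BAD_' + cls, 'f4') for cls in TARGET_CLASSES]
    return np.dtype(columns)


# Three placeholder bricks; every depth and density starts at zero.
table = np.zeros(3, dtype=density_table_dtype())
table['BRICKNAME'] = [b'0001p000', b'0003p000', b'0006p000']
```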

Definition of desi_target bits

I have implemented a BDT selection of QSOs in cuts.py. So far it takes a hack to use it. I checked that both the target density and the quasar completeness make sense on the EDR region and on the whole of DR2.

The increase in the time needed to run the whole of DR2 is still reasonable. I will check it again once the DR3 sweeps are available. However, I still have an open issue. To activate it I have two options:
- either I create a new bit for the BDT selection,
- or I add an argument to the select_targets function, i.e. useBT, to activate it.

I prefer the latter option, because otherwise we could end up with a few hundred bits for target selection, as is now the case in BOSS and eBOSS. However, I recognize that the SDSS strategy has the advantage of keeping the history of the target selection within the collaboration.

Christophe.

Encode mock provenance into TARGETID.

Can these merged branches be deleted?

These branches have already been merged, and should be deleted if no one is using them:

cosmeticChangesQSOCut
ChYMorphology
obsconditions

make target selection QA plots

Make QA plots for target selection as a standard part of the output.

Version 0.2.0 produced target densities considerably different than expected; the first sanity check for bugs is to make the basic color-color plots and see what the distributions look like pre- and post-cuts. The underlying distributions may be much different than expected, or there may be a simple sign error that would be obvious from a plot but easy to miss when reading code.

Define target bits for a bright object mask

Define DESI_TARGET bits for a bright object mask and add them to py/desitarget/data/targetmask.yaml. See DocDB-2346 for full version of what we want to implement. This issue is just for defining the bits so that we have something to fill in as we implement those algorithms.

IN_BRIGHT_OBJECT (i.e. do not observe; crosstalk or saturation would be bad)
NEAR_BRIGHT_OBJECT (informational only; does not impact priority)
SAFE (fallback locations around bright objects to position fibers in case no science targets are available)
BRIGHT_OBJECT (we may not use this, but it might be handy to pass forward all bright objects from an initial selection so that bright object masking could be based only upon that subset and not requiring going all the way back to the sweeps every time. This would also be convenient for working with mocks).

color cut for MWS_STAR simulated data

Random selection from the stellar templates gives far too many stars with g-r<0 for an old stellar population, which is mostly what we expect for the MWS. A function like isFSTD_colors to cut objects with g-r < 0 would be a quick way to make this better. There would be a corresponding change to the definition of MWS_STAR in desisim's templates.py

Need code which will apply target selection cuts on grzW1W2 fluxes

The function cuts.select_target() takes as input grz magnitudes, but we need something which can be applied to the DECaLS catalogs, which report grz fluxes in nanomaggies (which can be negative).

The flux version of the current set of cuts are here:
https://desi.lbl.gov/trac/wiki/TargetSelection
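
Magnitude and color cuts can be rewritten directly in terms of fluxes, which avoids taking the logarithm of a negative number. The sketch below assumes the usual nanomaggie zero point, m = 22.5 - 2.5 log10(flux). The function names and the numbers in the example are illustrative and are not the DESI cuts from the wiki page.

```python
import numpy as np


def flux_limit(maglim):
    """Flux in nanomaggies corresponding to magnitude maglim."""
    return 10 ** ((22.5 - maglim) / 2.5)


def brighter_than(flux, maglim):
    """True where the magnitude is brighter than maglim; negative fluxes fail."""
    return np.asarray(flux, dtype=float) > flux_limit(maglim)


def color_redder_than(flux_blue, flux_red, color):
    """True where (m_blue - m_red) > color, evaluated without logarithms.

    For positive fluxes, m_blue - m_red > color is equivalent to
    flux_red > flux_blue * 10**(color / 2.5).
    """
    flux_blue = np.asarray(flux_blue, dtype=float)
    flux_red = np.asarray(flux_red, dtype=float)
    return flux_red > flux_blue * 10 ** (color / 2.5)


rflux = np.array([2.0, 0.5, -0.1])
zflux = np.array([20.0, 1.0, 0.3])
bright = brighter_than(rflux, 22.5)          # equivalent to r < 22.5
red = color_redder_than(rflux, zflux, 1.5)   # equivalent to r - z > 1.5
```

How objects with non-positive flux in the bluer band should be treated is a choice the real cuts have to make explicitly; here the inequality is simply evaluated as written.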

Refactor desitarget.cuts.apply_cuts to deal with individual spectral classes

Refactoring desitarget.cuts.apply_cuts so that the nominal target-selection cuts can be applied to individual spectral classes would be very helpful for developing / testing target selection and for the templates work (in desisim).

Include target contamination

select_mock_targets needs to output contaminants in addition to real targets.

See #101 for defining input format for contaminant densities
Make fake file with random variations to incorporate with select_mock_targets
Update select_mock_targets to output contaminants for each target class with these densities.

Let's start with all contaminants being stars, and update with more realistic mixes of stars vs. galaxies in a later iteration.

Unique TARGETID across Targets, StandardStars and SkyPositions [mocks]

This concerns mocks. We need unique TARGETIDs across targets, stdstars and skypositions.

mocks for bright time (BGS + MWS)

@crockosi @apcooper @moustakas I've just started in the mocks branch the submodule mocks.brighttime to create target files consistent with quicksurvey.
Any help is appreciated to understand how you have been generating mocks so far. I believe you have been doing this separately, that is, running fiberassign either on BGS or MWS targets, without having a joint set of targets mixing these two classes. Right?
We also need your input to decide how to set the number of observations for those targets.

The NERSC batch system

I am opening this issue to see if anyone has already used desitarget on DECAM DR3. My objective is to do target selection on the DR3 sweep files on NERSC. The installed desitarget on Nersc is not up-to-date, so it fails somewhere during galactic extinction correction ( I think it is because some sweep files have problematic decam_mw_transmission s eg. please take a look at @@ -271,12 +382,18 @@ def unextinct_fluxes(objects): in #92 ).

Anyway, I am using an up-to-date version cloned under my cscratch1 directory, which is somewhat slower than the installed one (I think). Running the select_targets script on a single sweep file takes roughly 12-15 minutes.

This is the bash script I use to submit the job:

#!/bin/bash
#SBATCH --partition=debug
#SBATCH --nodes=1
#SBATCH --job-name=my_job
#SBATCH --time=00:30:00
##SBATCH --qos=normal
#SBATCH --mail-type=END
#SBATCH [email protected]
#SBATCH --workdir=.
source /project/projectdirs/desi/software/modules/desi_environment.sh
#module load desitarget # wait, it is not updated
export PYTHONPATH=/global/cscratch1/sd/mehdi/github/desitarget/py:$PYTHONPATH
export PATH=/global/cscratch1/sd/mehdi/github/desitarget/bin:$PATH
export indir=/global/project/projectdirs/cosmo/data/legacysurvey/dr3/sweep/3.0
export oudir=/global/cscratch1/sd/mehdi/dr3
export id=0
srun -n 1 python ./select_targets $indir/sweep-170p000-180p005.fits $oudir/decamdr3-section$id.fits --section $id --qsoselection colorcuts
wait

--section is something I added to select_targets so that it processes only one chunk of the sweep files (I was not sure whether it could handle all the sweep files at once):

from __future__ import print_function, division

import sys  # needed for sys.exit() below

import numpy as np

from desitarget import io
from desitarget.cuts import select_targets, qso_selection_options

import warnings
warnings.simplefilter('error')

import multiprocessing
nproc = multiprocessing.cpu_count() // 2

def chunkit(lst,n):
    # round-robin split of lst into n chunks; range works on Python 2 and 3
    return [lst[i::n] for i in range(n)]


from argparse import ArgumentParser
ap = ArgumentParser()
ap.add_argument("src", help="Tractor file or root directory with tractor files")
ap.add_argument("dest", help="Output target selection file")
ap.add_argument('-v', "--verbose", action='store_true')
ap.add_argument('--qsoselection',choices=qso_selection_options,default='randomforest',
                help="QSO target selection method")
### ap.add_argument('-b', "--bricklist", help='filename with list of bricknames to include')
ap.add_argument("--numproc", type=int,
    help='number of concurrent processes to use [{}]'.format(nproc),
    default=nproc)
ap.add_argument("--section", type=int, default=0,
    help="index (0-3) of the chunk of input files to process [0]")
ns = ap.parse_args()
infiles = io.list_sweepfiles(ns.src)
if len(infiles) == 0:
    infiles = io.list_tractorfiles(ns.src)
if len(infiles) == 0:
    print('FATAL: no sweep or tractor files found')
    sys.exit(1)

infiles = chunkit(infiles, 4)[int(ns.section)]
print('size {}'.format(len(infiles)))


targets = select_targets(infiles, numproc=ns.numproc, verbose=ns.verbose, 
                         qso_selection=ns.qsoselection)

io.write_targets(ns.dest, targets, indir=ns.src, qso_selection=ns.qsoselection)

print('{} targets written to {}'.format(len(targets), ns.dest))
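To make the effect of --section concrete, here is a small standalone sketch of the same round-robin chunking applied to a made-up file list. The file names are hypothetical; only the slicing logic mirrors the script.

```python
def chunkit(lst, n):
    """Split lst into n interleaved chunks: chunk i gets items i, i+n, i+2n, ..."""
    return [lst[i::n] for i in range(n)]

# Ten hypothetical sweep file names.
files = ['sweep-{:02d}.fits'.format(i) for i in range(10)]

chunks = chunkit(files, 4)
sizes = [len(c) for c in chunks]
print(sizes)          # chunk sizes differ by at most one
print(chunks[0])      # what "--section 0" would process
```

Every file lands in exactly one chunk, so running sections 0 through 3 as separate jobs covers the full input once.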

I am starting to think that I may not be using desitarget in the most time-efficient way.
I would sincerely appreciate any suggestions or comments.

cheers
Mehdi

FiberAssign scripts in desitarget

@sbailey @apcooper

We have this script to match fiberassign outputs to the input mocks to build the Target File.

The script uses the mocks.io.fiberassign submodule from desitarget. Should we instead move that module into fiberassign/bin?
The matching relies on many ordering assumptions (to avoid search/find routines). Do we want to keep these implicit assumptions, which are easy to break, or do we want to switch to explicit search/matching, which is more robust?
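For comparison, explicit matching need not be expensive. The sketch below looks up each fiberassign output row in the input catalog by ID using a sort plus binary search, and fails loudly if an ID is missing instead of silently misaligning rows. The function name `match_by_id` and the example IDs are hypothetical; this is not the existing mocks.io.fiberassign code.

```python
import numpy as np

def match_by_id(input_ids, output_ids):
    """Return, for each output ID, the row index of that ID in input_ids."""
    input_ids = np.asarray(input_ids)
    output_ids = np.asarray(output_ids)

    order = np.argsort(input_ids)
    sorted_ids = input_ids[order]
    if np.any(sorted_ids[1:] == sorted_ids[:-1]):
        raise ValueError("input IDs are not unique")

    # Binary search for each output ID in the sorted input IDs.
    pos = np.searchsorted(sorted_ids, output_ids)
    pos = np.clip(pos, 0, len(sorted_ids) - 1)
    if np.any(sorted_ids[pos] != output_ids):
        raise ValueError("some output IDs are missing from the input")

    # Map positions in the sorted array back to original row indices.
    return order[pos]

input_ids = np.array([30, 10, 20])   # input mock, arbitrary order
output_ids = np.array([20, 30])      # IDs reported by fiberassign
rows = match_by_id(input_ids, output_ids)
print(rows, input_ids[rows])
```

The cost is O(N log N) for the sort, and the result does not depend on either file preserving row order.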

We also have this script to create directories and prepare a fiberassign run.

Shouldn't we also have that in fiberassign/bin?

What tag/release are we on?

I (think I) need to create a new release of desitarget so the Travis tests for desihub/desisim#132 can complete. However, https://github.com/desihub/desitarget/releases shows that even though 0.3.3 (which is what .travis.yml in desisim currently depends on) is the current release, @kaylanb tagged a version 0.4.0 on July 12.

Shouldn't we be on 0.3.4 or were the changes introduced by @kaylanb significant enough to bump us to 0.4.0? And if so, shouldn't this become the "current release" of desitarget?

@sbailey Can you advise on what we should do (specifically so that the desihub/desisim#132 tests can complete)?

What is the origin of desitarget.internal?

desitarget has a subpackage called internal, which makes no sense because it actually contains copies of code from external packages. The standard Python package convention in these cases is to call such subpackages 'external' or 'extern'.

Formatting functions for Stanford mocks

I'm writing functions to create Tractor and Truth catalogs from Stanford mock galaxy catalogs.

enable subsetting for select_mock_targets

For development/debugging/testing, it would be convenient to be able to run select_mock_targets on a subset of the sky, e.g. by specifying ra_min,ra_max,dec_min,dec_max, or by giving a list of tiles and only including targets that are covered by those tiles.
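A sketch of what the ra_min,ra_max,dec_min,dec_max option could look like as a boolean mask, assuming the bounds are given in degrees with RA in [0, 360) and that ra_min > ra_max means the box wraps through RA = 0. The function `in_radec_box` is hypothetical and not an existing select_mock_targets option.

```python
import numpy as np

def in_radec_box(ra, dec, ra_min, ra_max, dec_min, dec_max):
    """Boolean mask of positions inside an RA/Dec box (degrees)."""
    ra = np.asarray(ra) % 360.0
    dec = np.asarray(dec)
    if ra_min <= ra_max:
        in_ra = (ra >= ra_min) & (ra <= ra_max)
    else:
        # Box wraps through RA = 0, e.g. ra_min=340, ra_max=20.
        in_ra = (ra >= ra_min) | (ra <= ra_max)
    return in_ra & (dec >= dec_min) & (dec <= dec_max)

ra = np.array([10.0, 350.0, 180.0])
dec = np.array([0.0, 5.0, -10.0])

wrapped = in_radec_box(ra, dec, 340.0, 20.0, -2.0, 6.0)
plain = in_radec_box(ra, dec, 0.0, 200.0, -20.0, 1.0)
print(wrapped, plain)
```

The mask can be applied to any catalog array before the selection runs, which keeps the subsetting independent of the target classes.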

Problem in desitarget.mtl with table ztargets

problem.txt

Move db directory inside the Python package structure

In order for desitarget to work with Anaconda environments, the db/ directory needs to be moved inside the Python package itself, i.e. to py/desitarget/db/.

npyquery crashes and performance issues

Branch sjb_multifile_ts refactors targetselection.py to write one target catalog output for many tractor file inputs. This uncovered two problems with the npyquery cuts:

1. It crashes after ~350 input bricks with "Fatal Python error: deallocating None". This is not a catchable exception; it is a hard fatal crash. It's likely that the actual bug is in one of the underlying compiled libraries, but the complexity of the npyquery tree code makes it difficult to isolate exactly where and why.

To see this, use branch npyquery_crash, which is a branch off of sjb_multifile_ts that uses the npyquery cuts instead of the workaround vanilla numpy cuts in sjb_multifile_ts:

[edison sjbailey] targetselection.py /project/projectdirs/cosmo/data/legacysurvey/dr1/tractor blat.fits --verbose
50 bricks; 22.0 bricks/sec
100 bricks; 22.7 bricks/sec
150 bricks; 21.1 bricks/sec
200 bricks; 24.4 bricks/sec
250 bricks; 23.0 bricks/sec
300 bricks; 24.6 bricks/sec
350 bricks; 27.0 bricks/sec
Fatal Python error: deallocating None
Aborted

2. It is ~2.5x - 9x slower than just naively making cuts on the input numpy structured arrays.

from desitarget.cuts import select_targets as select_oldstyle
from desitarget.cuts_npyquery import select_targets as select_npyquery
import fitsio
from astropy.io import fits

d1 = fitsio.read('tractor-0001m002.fits', 1, upper=True)
d2 = fits.getdata('tractor-0001m002.fits')

#- Using numpy arrays from fitsio, npyquery is 2.5x slower
%timeit x = select_oldstyle(d1)   #- 2.77 ms
%timeit x = select_npyquery(d1)   #- 7.07 ms

#- And using FITS_rec objects from astropy.io.fits, npyquery is 9x slower
#- (and both are much slower than the real numpy arrays from fitsio... grr astropy...)
%timeit x = select_oldstyle(d2)  #- 13.4 ms
%timeit x = select_npyquery(d2)  #- 126 ms

The ability to introspect the npyquery cuts to find out what variables are needed, and the potential future ability of having them output SQL is cool, but it seems that we should favor obvious, straightforward (and faster and non-crashing) code over clever code. I'm leaving the npyquery code available in the repo for now, but the upcoming sjb_multifile_ts pull request will offer alternate numpy based cuts instead while this gets sorted out.
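For readers unfamiliar with what "numpy based cuts" means here, the sketch below applies a cut as plain boolean masks on a structured array. The column names, thresholds, and the function `select_bright_red` are invented for illustration and are not the actual DESI selection in desitarget.cuts.

```python
import numpy as np

# A tiny structured array standing in for a tractor catalog (hypothetical columns).
objects = np.zeros(5, dtype=[('FLUX_R', 'f4'), ('FLUX_Z', 'f4')])
objects['FLUX_R'] = [1.0, 5.0, 20.0, 0.5, 8.0]
objects['FLUX_Z'] = [2.0, 4.0, 50.0, 0.1, 30.0]

def select_bright_red(objects, rmin=2.0, ratio=2.0):
    """Illustrative cut: r flux above rmin and z flux above ratio * r flux."""
    r = objects['FLUX_R']
    z = objects['FLUX_Z']
    return (r > rmin) & (z > ratio * r)

mask = select_bright_red(objects)
selected = objects[mask]
print(mask, len(selected))
```

Each condition is a single vectorized comparison, which is why this style is easy to read and to profile.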

I am fairly new to this topic and would like to see some examples, if any are available. I have started reading this document on the wiki (http://desi.lbl.gov/wp-content/uploads/2014/04/tdr-science-biblatex2.pdf). Could someone point me in the right direction? Also, how can I find these directories on NERSC?
Thank you,
--mehdi