zwickytransientfacility / scope-ml

SCoPe: ZTF source classification project

Home Page: https://zwickytransientfacility.github.io/scope-docs/

License: MIT License

Python 22.59% Jupyter Notebook 73.67% Shell 2.20% Fortran 1.54%

scope-ml's Introduction

SCoPe: ZTF Source Classification Project

scope-ml uses machine learning to classify light curves from the Zwicky Transient Facility (ZTF). The documentation is hosted at https://zwickytransientfacility.github.io/scope-docs/. To generate HTML files of the documentation locally, clone the repository and run scope-doc after installing.

Funding

We gratefully acknowledge previous and current support from the U.S. National Science Foundation (NSF) Harnessing the Data Revolution (HDR) Institute for Accelerated AI Algorithms for Data-Driven Discovery (A3D3) under Cooperative Agreement No. PHY-2117997.

scope-ml's People

Contributors

bfhealy, dmitryduev, mcoughlin, ashishmahabal, guiga004, jasonmsetiadi, marc-shen, jillhughes141, janvanroestel, leighannaz, saagar-parikh, cjol245, przemekmroz, reed0824

Stargazers

Theophile du Laz, Hibiki Seki, Simon Goode, Song Wang, and three others

Watchers

James Cloos and four others

scope-ml's Issues

Add an introduction to the guide

Feature Summary
An introduction to the guide is needed. Classification can be done in many different ways: classes overlap, some classes need more than just lightcurves, etc. The introduction should also point to other resources.

fritz.yaml unnecessary?

I only see one place in the scope code where fritz.yaml is opened (lines 68-70 of fritz.py), and I don't think anything is being used from that file now that the Fritz URL is hardcoded.

Field guide section on 'bogus'

Show examples of all the types of bogus lightcurves that appear in the PSF lightcurves.

Feature Summary
list the different types of 'bogus' lightcurves, and how to recognise them. Discuss the (likely) origin of the problem and show lightcurve examples. Describe (and show) how these bogus lightcurves can be confused with real classes, and how to disambiguate the two possibilities

Field guide section on 'pulsating'

describe how we define pulsating

Feature Summary
This section should briefly introduce pulsating stars, mostly summarize the different subtypes, and point towards the relevant sections in the fieldguide.

Some example LCs

zsh: illegal hardware instruction

(scope-env) user@user's MacBook-Pro scope-main % ./scope.py
zsh: illegal hardware instruction  ./scope.py

After completing the environment configuration according to the tutorial, I found that the environment cannot run normally under the arm64 architecture; running ./scope.py fails with the error shown above.

(scope-env) user@user's MacBook-Pro % uname -m
arm64

The computer is a 2021 MacBook Pro with an M1 Pro CPU.

Use annotations to post ML algorithm scores

I think the best way to post DNN/XGB scores to Fritz is using annotations. The scope_manage_annotation.py script can be used to upload these annotations in bulk, but at the moment it only works with one type of annotation at a time. We should update this to allow multiple origins/keys/values for each source to be uploaded with one call of the script.
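
The sketch below illustrates the proposed behavior, assuming the SkyPortal/Fritz annotations endpoint (POST /api/sources/<obj_id>/annotations). It is not the current scope_manage_annotation.py implementation; the token handling and the structure of annotations_per_source are placeholders.

import requests

FRITZ_URL = "https://fritz.science"
TOKEN = "your-fritz-token"  # placeholder; in practice read from a config/secrets file

def post_annotations(obj_id, annotations):
    """Post several {origin, data} annotations for a single source in one pass."""
    headers = {"Authorization": f"token {TOKEN}"}
    for ann in annotations:
        resp = requests.post(
            f"{FRITZ_URL}/api/sources/{obj_id}/annotations",
            json={"origin": ann["origin"], "data": ann["data"]},
            headers=headers,
        )
        resp.raise_for_status()

# Hypothetical input: multiple origins/keys/values per source, uploaded in one call
annotations_per_source = {
    "ZTF18aaaaaaa": [
        {"origin": "scope_dnn", "data": {"vnv_dnn": 0.92}},
        {"origin": "scope_xgb", "data": {"vnv_xgb": 0.88}},
    ],
}

for obj_id, anns in annotations_per_source.items():
    post_annotations(obj_id, anns)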

Enable photometry downloads

Feature Summary
We currently have code to make and upload photometry (within fritz.py, called by scope_upload_classification.py). It would also be helpful to have the ability to programmatically download photometry.

Usage / behavior
This functionality could be implemented within scope_download_classification.py, toggled by an optional keyword. It should save the user's requested photometry locally.

Implementation details
This may need to be implemented in the section of the code that loops over each individual source. I am not aware of a current way to download photometry in bulk for multiple sources.
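
A minimal sketch of the proposed download path, assuming the SkyPortal/Fritz GET /api/sources/<obj_id>/photometry endpoint; the token, output location, and source IDs are placeholders, and this is not existing scope code.

import requests
import pandas as pd

FRITZ_URL = "https://fritz.science"
TOKEN = "your-fritz-token"  # placeholder

def download_photometry(obj_id, outdir="."):
    """Fetch photometry for one source and save it locally as CSV."""
    resp = requests.get(
        f"{FRITZ_URL}/api/sources/{obj_id}/photometry",
        headers={"Authorization": f"token {TOKEN}"},
    )
    resp.raise_for_status()
    phot = pd.DataFrame(resp.json()["data"])
    phot.to_csv(f"{outdir}/{obj_id}_photometry.csv", index=False)
    return phot

# Looping over individual sources, since no bulk photometry endpoint is assumed
for obj_id in ["ZTF18aaaaaaa", "ZTF19abbbbbb"]:
    download_photometry(obj_id)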

add folded lightcurves to the 'group' page

We need to be able to see the folded lightcurve on the 'group' page, instead of clicking on the objects and having to open a new window. This needs to be done in skyportal.

Compatibility bug

Environment

Linux (Ubuntu 20.04 LTS) on AMD64 architecture

Problem

When I used the Python distribution from the oneAPI HPC Toolkit for Linux, there were conflicts between some packages.

Solution

Avoid using the Python distribution from the oneAPI HPC Toolkit for Linux.

Improve speed of source downloads

Complementary to #88, but separate because the download script runs differently. Iterating over the pages of existing sources takes a non-negligible amount of time before the download loop begins. Once it does, it is possible to download a few thousand sources per hour.

Feature Summary
To prepare for the need to download large source lists, we should streamline scope_download_classification.py to run as quickly as possible.

Implementation details
The page-by-page loop before downloads begin, along with the download loop that starts afterwards, are both good areas to focus on for this enhancement.
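
One possible direction, sketched under the assumption that the SkyPortal/Fritz GET /api/sources endpoint accepts group_ids, numPerPage, and pageNumber parameters: request larger pages to reduce the number of round trips in the page-by-page loop. The group ID, token, and page size are illustrative values, not scope defaults.

import requests

FRITZ_URL = "https://fritz.science"
TOKEN = "your-fritz-token"  # placeholder
GROUP_ID = 371              # example group

def iter_group_sources(group_id, num_per_page=500):
    """Yield sources from a group, one page of results at a time."""
    headers = {"Authorization": f"token {TOKEN}"}
    page = 1
    while True:
        resp = requests.get(
            f"{FRITZ_URL}/api/sources",
            params={"group_ids": group_id, "numPerPage": num_per_page, "pageNumber": page},
            headers=headers,
        )
        resp.raise_for_status()
        sources = resp.json()["data"]["sources"]
        if not sources:
            break
        yield from sources
        page += 1

for src in iter_group_sources(GROUP_ID):
    pass  # process or save each source here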

Field guide section on 'irregular'

Describe the irregular variable type

Feature Summary
give a description of irregular variables. List examples and point towards the relevant sections in the fieldguide.

Field guide section on 'periodic'

Describe how we define 'periodic'.

Feature Summary
write a precise description of how we define 'periodic' and link to typical types of variable stars.

Also discuss how we handle intermediate cases (semi-regular variables)

HR Diagram Unit Test failing

/opt/hostedtoolcache/Python/3.9.13/x64/lib/python3.9/site-packages/pandas/core/arraylike.py:397: RuntimeWarning: invalid value encountered in log10
result = getattr(ufunc, method)(*inputs, **kwargs)
/opt/hostedtoolcache/Python/3.9.13/x64/lib/python3.9/site-packages/pandas/core/arraylike.py:397: RuntimeWarning: invalid value encountered in log10
result = getattr(ufunc, method)(*inputs, **kwargs)
Traceback (most recent call last):
[·] Generating HR diagrams for Golden sets
File "/home/runner/work/scope/scope/./scope.py", line 755, in <module>
fire.Fire(Scope)
File "/opt/hostedtoolcache/Python/3.9.13/x64/lib/python3.9/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/opt/hostedtoolcache/Python/3.9.13/x64/lib/python3.9/site-packages/fire/core.py", line 466, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/opt/hostedtoolcache/Python/3.9.13/x64/lib/python3.9/site-packages/fire/core.py", line 681, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/home/runner/work/scope/scope/./scope.py", line 424, in doc
plot_gaia_hr(
File "/home/runner/work/scope/scope/scope/utils.py", line 243, in plot_gaia_hr
ax.errorbar(
File "/opt/hostedtoolcache/Python/3.9.13/x64/lib/python3.9/site-packages/matplotlib/__init__.py", line 1423, in inner
return func(ax, *map(sanitize_sequence, args), **kwargs)
File "/opt/hostedtoolcache/Python/3.9.13/x64/lib/python3.9/site-packages/matplotlib/axes/_axes.py", line 3587, in errorbar
raise ValueError(
ValueError: 'yerr' must not contain negative values
[✗] Generating HR diagrams for Golden sets

Centralize fritz/kowalski authentication

Feature Summary
Any script/import that involves fritz.py will run the code at lines 68-70 of that file, right before api(). After changing the file opened by those lines from fritz.yaml to config.yaml (see #91), we could read in the user's fritz token within fritz.py and eliminate the need to explicitly supply it in a secrets.json file for scripts like scope_download_classification, scope_upload_classification, and scope_manage_annotation. This would save the user time and streamline the scripts' inputs.
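
A minimal sketch of the idea, assuming a fritz: token: entry in config.yaml (the key names are placeholders, not the current scope config schema):

import yaml

# Read the user's Fritz token once, from config.yaml, instead of a separate secrets.json
with open("config.yaml") as f:
    config = yaml.safe_load(f)

fritz_token = config["fritz"]["token"]  # assumed key path
headers = {"Authorization": f"token {fritz_token}"}
# Scripts such as scope_download_classification, scope_upload_classification, and
# scope_manage_annotation could then import these headers rather than building their own.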

Re-run updated DNN

#53 removed a dropout that was limiting DNN performance. I don't think the DNN was re-run on a large scale since then, so we should do that to gauge its current performance.

Missing sample CVs

Sample objects for the CV subclasses are missing in config.defaults.yaml, which results in empty plots here.

Improve setup documentation

The body, headers, and order of the "Setting up your environment" documentation should be updated to describe the installation process in a more linear way. For example, there are some sections under the current Windows/Linux/MacOS (x86-64) header that a MacOS ARM64 user should perform as well.

Field guide section on 'dipping'

Write a fieldguide section on dipping

Feature Summary
Write a concise description of dipping. Make clear how this differs from eclipsing, and give example types of 'dipping'

Make scope_download_classification.py robust to server errors

The scope_upload_classification.py and fritz.py code was recently updated to handle certain errors that pertain to brief interruptions in the connection to Fritz. We should do the same for the download script, which currently fails and does not save any output in the case of a single interruption to the connection.

Feature Summary
Added robustness to scope_download_classification.py

Usage / behavior

  • Upon a connection error, retry the API call a specified number of times.
  • Establish checkpoints to occasionally save downloaded data so all progress is not lost if the download loop is broken.
  • Allow the user to continue an interrupted download from a specified source

Implementation details

  • The API calls in scope_upload_classification.py and fritz.py have examples of handling server connection errors; a rough sketch of the retry-and-checkpoint pattern follows below.
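
This sketch is not the actual scope code; the endpoint template, backoff schedule, checkpoint interval, and file names are assumptions for illustration.

import time
import requests
import pandas as pd

MAX_RETRIES = 5
CHECKPOINT_EVERY = 500  # sources between checkpoint saves (assumed value)

def api_get_with_retries(url, headers, max_retries=MAX_RETRIES):
    """Retry a GET call a fixed number of times with exponential backoff."""
    for attempt in range(max_retries):
        try:
            resp = requests.get(url, headers=headers, timeout=30)
            resp.raise_for_status()
            return resp.json()
        except requests.exceptions.RequestException:
            if attempt == max_retries - 1:
                raise
            time.sleep(2 ** attempt)

def download_sources(obj_ids, url_template, headers, start_index=0):
    """Resume from start_index and periodically save partial results."""
    rows = []
    for i, obj_id in enumerate(obj_ids[start_index:], start=start_index):
        rows.append(api_get_with_retries(url_template.format(obj_id=obj_id), headers))
        if (i + 1) % CHECKPOINT_EVERY == 0:
            pd.DataFrame(rows).to_csv(f"checkpoint_{i + 1}.csv", index=False)
    return pd.DataFrame(rows)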

Query across multiple machines

There are catalogs of interest spread across the kowalski, gloria, and melman machines. It would be optimal to be able to use one query to get data without needing to specify the machine on which it resides.

A good longer-term solution could be sharding (e.g. here). For the short term, see this PR to penquins that introduces a new KowalskiInstances class that authenticates each machine and queries the one containing the catalog of interest. If/when this PR gets approved, we should revise the scope code to include this functionality, preempting confusion amid the growing number of machines.
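
As an interim illustration of the idea (this is not the KowalskiInstances API from the linked PR): keep one authenticated penquins connection per machine and route each query to the instance that hosts the requested catalog. The host names, tokens, and catalog-to-machine mapping below are assumptions.

from penquins import Kowalski

# One authenticated connection per machine (tokens/hosts are placeholders)
instances = {
    "kowalski": Kowalski(token="token-1", protocol="https", host="kowalski.caltech.edu", port=443),
    "gloria": Kowalski(token="token-2", protocol="https", host="gloria.caltech.edu", port=443),
    "melman": Kowalski(token="token-3", protocol="https", host="melman.caltech.edu", port=443),
}

# Which machine hosts which catalog (illustrative mapping)
catalog_to_machine = {
    "ZTF_sources_20210401": "kowalski",
    "ZTF_source_features_DR5": "gloria",
}

def query_catalog(catalog, filter_spec, projection=None):
    """Send a find query to whichever instance hosts the catalog."""
    k = instances[catalog_to_machine[catalog]]
    q = {
        "query_type": "find",
        "query": {"catalog": catalog, "filter": filter_spec, "projection": projection or {}},
    }
    return k.query(query=q)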

python/jupyternotebook code to upload examples including tags

We need an easy-to-use and flexible Python function to upload a set of examples. This includes uploading the set, but also labelling the uploaded set appropriately. Most of this functionality is already there, but we need to make sure that it works under all circumstances, e.g. what happens if a source already exists.
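
A hedged sketch of such a helper (not existing scope code): check whether the source already exists on Fritz, save it to the group only if needed, then post the classification label. The endpoints follow the SkyPortal/Fritz API; the token, group ID, and taxonomy ID are placeholders.

import requests

FRITZ_URL = "https://fritz.science"
HEADERS = {"Authorization": "token your-fritz-token"}  # placeholder

def upload_example(obj_id, ra, dec, label, group_id, taxonomy_id, probability=1.0):
    """Upload one example source and tag it with a classification."""
    # Only create the source if it does not already exist
    exists = requests.get(f"{FRITZ_URL}/api/sources/{obj_id}", headers=HEADERS).status_code == 200
    if not exists:
        resp = requests.post(
            f"{FRITZ_URL}/api/sources",
            json={"id": obj_id, "ra": ra, "dec": dec, "group_ids": [group_id]},
            headers=HEADERS,
        )
        resp.raise_for_status()
    # Label the uploaded example
    resp = requests.post(
        f"{FRITZ_URL}/api/classification",
        json={
            "obj_id": obj_id,
            "classification": label,
            "taxonomy_id": taxonomy_id,
            "probability": probability,
            "group_ids": [group_id],
        },
        headers=HEADERS,
    )
    resp.raise_for_status()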

top-labels for figures

Feature Summary
Figures like the HR diagram and sky coverage should have top labels (the variable type) so that they can stand on their own.

Usage / behavior
Replace current figures.

Field guide section on 'eclipsing'

Describe the phenomenological class 'eclipsing'

Feature Summary
Give a description of eclipses. Also present and discuss the EA, EB, and EW types. Show lightcurves as illustration

tensorflow error using GPU for training on M1 Mac

When running ./scope.py train and including --gpu 0 to specify use of the GPU, I get an error even though the GPU is recognized and available. I think this may happen because the ResourceApplyAdamWithAmsgrad operation is not currently supported by tensorflow-metal (see e.g. this discussion and its similarity to the error messages below). I've tried upgrading to the latest version of tensorflow-metal (0.6.0) but still get the error. Fortunately training still runs reasonably fast on the CPU.

tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device for operation model/conv_conv_1/separable_conv2d/ReadVariableOp: Could not satisfy explicit device specification '' because the node {{colocation_node model/conv_conv_1/separable_conv2d/ReadVariableOp}} was colocated with a group of nodes that required incompatible device '/job:localhost/replica:0/task:0/device:GPU:0'. All available devices [/job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:GPU:0].

Colocation Debug Info: Colocation group had the following types and supported devices: Root Member(assigned_device_name_index_=2 requested_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' assigned_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' resource_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[] ResourceApplyAdamWithAmsgrad: CPU ReadVariableOp: GPU CPU _Arg: GPU CPU
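
In the meantime, one possible workaround (an assumption, not a confirmed fix) is to hide the GPU from TensorFlow before training so everything runs on the CPU, where the optimizer op is supported:

import tensorflow as tf

# Hide the GPU so ResourceApplyAdamWithAmsgrad is placed on the CPU
tf.config.set_visible_devices([], "GPU")
print(tf.config.get_visible_devices())  # should now list only CPU devices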

Document/video with detailed example of how to classify an object

A document/video is needed that explains in detail how to classify a lightcurve

Feature Summary
A document with multiple examples of how to classify objects. This includes a concise description with all the actions and information used (PS1 cutout, folded lightcurve etc).

Implementation details
google doc with multiple examples; and possibly a video

Lightcurve y-range is reversed for WUma

The lightcurve figure y-limits are reversed for the WUma figure

Describe the bug
For the example lightcurve figure, the y-limits are reversed. A better method to determine the y-limits is needed.

To Reproduce
see the WUma fieldguide.

Expected behavior
The y-limits of the lightcurve should show the correct range, with high magnitudes at the bottom and low magnitudes at the top.
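
A minimal sketch of one way to set the limits (not the current scope plotting code): compute the range from the data and pass it to set_ylim in reversed order so brighter, lower magnitudes sit at the top. The phase and mag arrays are synthetic placeholders.

import numpy as np
import matplotlib.pyplot as plt

# Synthetic folded lightcurve for illustration
phase = np.random.uniform(0, 1, 200)
mag = 15.0 + 0.3 * np.sin(2 * np.pi * phase) + np.random.normal(0, 0.02, 200)

fig, ax = plt.subplots()
ax.scatter(phase, mag, s=5)
pad = 0.05 * (mag.max() - mag.min())
ax.set_ylim(mag.max() + pad, mag.min() - pad)  # reversed: faint at bottom, bright at top
ax.set_xlabel("Phase")
ax.set_ylabel("Magnitude")
plt.show()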


Combine get_features and get_field_features

It appears that the main difference between these two functions in get_features.py is the outfile variable containing the path to the directory for saved results. I think these functions can be combined into one that uses the existing whole_field keyword to assign the correct path.
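
An illustrative sketch of the proposed merge, not the actual get_features.py code: a single function chooses the output directory from the existing whole_field keyword. The paths and the _query_features helper are hypothetical.

import pathlib

def get_features(source_ids, field, whole_field=False, output_root="features"):
    """Query features for the given sources and save them under one output scheme."""
    subdir = f"field_{field}" if whole_field else f"field_{field}_subset"
    outdir = pathlib.Path(output_root) / subdir
    outdir.mkdir(parents=True, exist_ok=True)
    outfile = outdir / "features.parquet"
    features = _query_features(source_ids)  # hypothetical shared feature-query helper
    features.to_parquet(outfile)
    return outfile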

Improve speed of source uploads

Currently scope_upload_classification.py can upload ~1000 sources/classifications/light curves/annotations to Fritz per hour if two instances are run simultaneously on independent parts of source lists (using the -start -stop arguments). The server starts to complain if three or more instances are run together. See also #89.

Feature Summary
With the potential for millions of sources to be uploaded going forward, we should determine how much our code can be streamlined to make uploads complete faster.

Implementation details
Anything within our loop over each source is a good target for speed enhancements.

Generate a .csv trainingset from labels in Fritz

code is needed to take the classifications from Fritz, clean them, put them in the correct format and add the features

Feature Summary
The inputs are labels from Fritz, and the main (single) output should be a .csv file in the same format as the D15 trainingset.

The code should

  • download objects and labels from Fritz
  • download and combine labels from multiple groups on Fritz
  • handle duplicate objects and labels
  • verify the upstream labels (e.g. a delta scuti should be variable and periodic)
  • get the features per lightcurve and join them with the labels
  • save everything as a .csv file (exactly the same format as D15). Add a header that specifies the code version and the time of download from Fritz. A rough sketch of this pipeline follows below.
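
This sketch assumes the downloaded labels and the features are already available as DataFrames sharing an obj_id column; the column names, the example verification rule, and the header format are illustrative, not the D15 specification.

import datetime
import pandas as pd

def build_training_set(labels, features, outfile="trainingset.csv", code_version="unknown"):
    """Clean Fritz labels, join them with features, and save a D15-style CSV."""
    # Handle duplicate objects/labels: keep the highest-probability label per (obj_id, class)
    labels = (labels.sort_values("probability", ascending=False)
                    .drop_duplicates(subset=["obj_id", "classification"]))
    # Example upstream check: a 'delta scuti' should also carry 'variable' and 'periodic'
    dsct = set(labels.loc[labels["classification"] == "delta scuti", "obj_id"])
    for parent in ("variable", "periodic"):
        missing = dsct - set(labels.loc[labels["classification"] == parent, "obj_id"])
        if missing:
            print(f"Warning: {len(missing)} delta scuti sources lack the '{parent}' label")
    # Join the per-lightcurve features with the labels
    training = features.merge(labels, on="obj_id", how="inner")
    # Save with a header recording the code version and download time
    with open(outfile, "w") as f:
        f.write(f"# scope version: {code_version}, downloaded from Fritz: "
                f"{datetime.datetime.utcnow().isoformat()}Z\n")
        training.to_csv(f, index=False)
    return training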

Short feature description


Feature Summary
In the page where the light curve can be folded, show the folded light curve ("period" tag on top) by default. There are cases where the metadata does not include a period; it should default to the magnitude-vs-time view in that case (currently the period tag does not appear in such a case).

Additional context
See the following for an example: https://fritz.science/source/ZTFJ025839.18+542006.7


Field guide section on 'variables'

Write a section on what we define as 'variable' and what is a 'non-variable'.

Feature Summary
The main point of this section is to have a clear definition of what we define as a 'variable' and a 'non-variable'.

Usage / behavior
same as other pages, with a few example lightcurves, but an HR diagram or sky location might not be relevant here

Implementation details
similar to other pages


Duplicate rows in group source downloads

While this issue is connected to and raised on SkyPortal, it is worth noting here as well that group source downloads from Fritz can yield files with duplicate rows. For the ~80,000 sources in the golden dataset group, ~5-10% of rows may be duplicated when using scope_download_classification.py. See the linked issue for more discussion about potential solutions to the problem.
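
As a client-side stopgap (the underlying fix belongs in SkyPortal), duplicate rows can be dropped from an affected download; the file name below is a placeholder.

import pandas as pd

sources = pd.read_csv("golden_dataset_download.csv")
n_before = len(sources)
sources = sources.drop_duplicates()
print(f"Removed {n_before - len(sources)} duplicate rows out of {n_before}")
sources.to_csv("golden_dataset_download_dedup.csv", index=False)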

PS1 cutout and photometry "alternately" missing


Describe the bug
When object details are loaded by expanding the '>' sign on the left, various cutouts are loaded among other things, but the PS1 cutout does not appear. Once you click on the object id, it opens in a new page, and the cutout is now also seen in the original page, but the photometry has vanished. See the before and after screenshots below.

To Reproduce
Go to https://fritz.science/group_sources/371 and do as suggested above

Expected behavior
PS1 cutout should load at the start. Photometry should not vanish.

Screenshots
(before and after screenshots, 2022-01-26)

Platform information:
Chrome on Mac

Additional context
Program 371: https://fritz.science/group_sources/371
