zwickytransientfacility / scope-ml

SCoPe: ZTF source classification project

Home Page: https://zwickytransientfacility.github.io/scope-docs/

License: MIT License

Python 22.59% Jupyter Notebook 73.67% Shell 2.20% Fortran 1.54%

scope-ml's Introduction

SCoPe: ZTF Source Classification Project

scope-ml uses machine learning to classify light curves from the Zwicky Transient Facility (ZTF). The documentation is hosted at https://zwickytransientfacility.github.io/scope-docs/. To generate HTML files of the documentation locally, clone the repository and run scope-doc after installing.

Funding

We gratefully acknowledge previous and current support from the U.S. National Science Foundation (NSF) Harnessing the Data Revolution (HDR) Institute for Accelerated AI Algorithms for Data-Driven Discovery (A3D3) under Cooperative Agreement No. PHY-2117997.

scope-ml's People

Contributors

bfhealy, dmitryduev, mcoughlin, ashishmahabal, guiga004, jasonmsetiadi, marc-shen, jillhughes141, janvanroestel, leighannaz, saagar-parikh, cjol245, przemekmroz, reed0824

Stargazers

Theophile du Laz, Hibiki Seki, Simon Goode, Song Wang, and three others

Watchers

James Cloos and four others

scope-ml's Issues

Add an introduction to the guide

Feature Summary
An introduction to the guide is needed. Classification can be done in many different ways: classes overlap, some classes need more than just lightcurves, etc. The introduction should also point to other resources.

fritz.yaml unnecessary?

I only see one place in the scope code where fritz.yaml is opened (lines 68-70 of fritz.py), and I don't think anything is being used from that file now that the Fritz URL is hardcoded.

Field guide section on 'bogus'

Show examples of all the types of bogus lightcurves that appear in the PSF lightcurves.

Feature Summary
list the different types of 'bogus' lightcurves, and how to recognise them. Discuss the (likely) origin of the problem and show lightcurve examples. Describe (and show) how these bogus lightcurves can be confused with real classes, and how to disambiguate the two possibilities

Field guide section on 'pulsating'

describe how we define pulsating

Feature Summary
This section should briefly introduce pulsating stars, mostly summarize the different subtypes, and point towards the relevant sections in the fieldguide.

Some example LCs

zsh: illegal hardware instruction

(scope-env) user@user's MacBook-Pro scope-main % ./scope.py
zsh: illegal hardware instruction  ./scope.py

After completing the environment configuration according to the tutorial, I found that the environment cannot run normally under the arm64 architecture; running ./scope.py fails with the error shown above.

(scope-env) user@user's MacBook-Pro % uname -m
arm64

The computer is a 2021 MacBook Pro with an M1 Pro CPU.

Use annotations to post ML algorithm scores

I think the best way to post DNN/XGB scores to Fritz is using annotations. The scope_manage_annotation.py script can be used to upload these annotations in bulk, but at the moment it only works with one type of annotation at a time. We should update this to allow multiple origins/keys/values for each source to be uploaded with one call of the script.
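
The sketch below illustrates the proposed behavior, assuming the SkyPortal/Fritz annotations endpoint (POST /api/sources/<obj_id>/annotations). It is not the current scope_manage_annotation.py implementation; the token handling and the structure of annotations_per_source are placeholders.

import requests

FRITZ_URL = "https://fritz.science"
TOKEN = "your-fritz-token"  # placeholder; in practice read from a config/secrets file

def post_annotations(obj_id, annotations):
    """Post several {origin, data} annotations for a single source in one pass."""
    headers = {"Authorization": f"token {TOKEN}"}
    for ann in annotations:
        resp = requests.post(
            f"{FRITZ_URL}/api/sources/{obj_id}/annotations",
            json={"origin": ann["origin"], "data": ann["data"]},
            headers=headers,
        )
        resp.raise_for_status()

# Hypothetical input: multiple origins/keys/values per source, uploaded in one call
annotations_per_source = {
    "ZTF18aaaaaaa": [
        {"origin": "scope_dnn", "data": {"vnv_dnn": 0.92}},
        {"origin": "scope_xgb", "data": {"vnv_xgb": 0.88}},
    ],
}

for obj_id, anns in annotations_per_source.items():
    post_annotations(obj_id, anns)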

Enable photometry downloads

Feature Summary
We currently have code to make and upload photometry (within fritz.py, called by scope_upload_classification.py). It would also be helpful to have the ability to programmatically download photometry.

Usage / behavior
This functionality could be implemented within scope_download_classification.py, toggled by an optional keyword. It should save the user's requested photometry locally.

Implementation details
This may need to be implemented in the section of the code that loops over each individual source. I am not aware of a current way to download photometry in bulk for multiple sources.
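
A minimal sketch of the proposed download path, assuming the SkyPortal/Fritz GET /api/sources/<obj_id>/photometry endpoint; the token, output location, and source IDs are placeholders, and this is not existing scope code.

import requests
import pandas as pd

FRITZ_URL = "https://fritz.science"
TOKEN = "your-fritz-token"  # placeholder

def download_photometry(obj_id, outdir="."):
    """Fetch photometry for one source and save it locally as CSV."""
    resp = requests.get(
        f"{FRITZ_URL}/api/sources/{obj_id}/photometry",
        headers={"Authorization": f"token {TOKEN}"},
    )
    resp.raise_for_status()
    phot = pd.DataFrame(resp.json()["data"])
    phot.to_csv(f"{outdir}/{obj_id}_photometry.csv", index=False)
    return phot

# Looping over individual sources, since no bulk photometry endpoint is assumed
for obj_id in ["ZTF18aaaaaaa", "ZTF19abbbbbb"]:
    download_photometry(obj_id)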

add folded lightcurves to the 'group' page

We need to be able to see the folded lightcurve on the 'group' page, instead of clicking on the objects and having to open a new window. This needs to be done in skyportal.

Compatibility bug

Environment

Linux (Ubuntu 20.04 LTS) on AMD64 architecture

Problem

When I used the Python distribution from the oneAPI HPC Toolkit for Linux, there were conflicts between some packages.

Solution

Avoid using the Python distribution from the oneAPI HPC Toolkit for Linux.

Improve speed of source downloads

Complementary to #88, but separate because the download script runs differently. Iterating over the pages of existing sources takes a non-negligible amount of time before the download loop begins. Once it does, it is possible to download a few thousand sources per hour.

Feature Summary
To prepare for the need to download large source lists, we should streamline scope_download_classification.py to run as quickly as possible.

Implementation details
The page-by-page loop before downloads begin, along with the download loop that starts afterwards, are both good areas to focus on for this enhancement.
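
One possible direction, sketched under the assumption that the SkyPortal/Fritz GET /api/sources endpoint accepts group_ids, numPerPage, and pageNumber parameters: request larger pages to reduce the number of round trips in the page-by-page loop. The group ID, token, and page size are illustrative values, not scope defaults.

import requests

FRITZ_URL = "https://fritz.science"
TOKEN = "your-fritz-token"  # placeholder
GROUP_ID = 371              # example group

def iter_group_sources(group_id, num_per_page=500):
    """Yield sources from a group, one page of results at a time."""
    headers = {"Authorization": f"token {TOKEN}"}
    page = 1
    while True:
        resp = requests.get(
            f"{FRITZ_URL}/api/sources",
            params={"group_ids": group_id, "numPerPage": num_per_page, "pageNumber": page},
            headers=headers,
        )
        resp.raise_for_status()
        sources = resp.json()["data"]["sources"]
        if not sources:
            break
        yield from sources
        page += 1

for src in iter_group_sources(GROUP_ID):
    pass  # process or save each source here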

Field guide section on 'irregular'

Describe the irregular variable type

Feature Summary
give a description of irregular variables. List examples and point towards the relevant sections in the fieldguide.

Field guide section on 'periodic'

Describe how we define 'periodic'.

Feature Summary
write a precise description of how we define 'periodic' and link to typical types of variable stars.

Also discuss how we handle intermediate cases (semi-regular variables)

HR Diagram Unit Test failing

/opt/hostedtoolcache/Python/3.9.13/x64/lib/python3.9/site-packages/pandas/core/arraylike.py:397: RuntimeWarning: invalid value encountered in log10
result = getattr(ufunc, method)(*inputs, **kwargs)
/opt/hostedtoolcache/Python/3.9.13/x64/lib/python3.9/site-packages/pandas/core/arraylike.py:397: RuntimeWarning: invalid value encountered in log10
result = getattr(ufunc, method)(*inputs, **kwargs)
Traceback (most recent call last):
[·] Generating HR diagrams for Golden sets
File "/home/runner/work/scope/scope/./scope.py", line 755, in <module>
fire.Fire(Scope)
File "/opt/hostedtoolcache/Python/3.9.13/x64/lib/python3.9/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/opt/hostedtoolcache/Python/3.9.13/x64/lib/python3.9/site-packages/fire/core.py", line 466, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/opt/hostedtoolcache/Python/3.9.13/x64/lib/python3.9/site-packages/fire/core.py", line 681, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/home/runner/work/scope/scope/./scope.py", line 424, in doc
plot_gaia_hr(
File "/home/runner/work/scope/scope/scope/utils.py", line 243, in plot_gaia_hr
ax.errorbar(
File "/opt/hostedtoolcache/Python/3.9.13/x64/lib/python3.9/site-packages/matplotlib/__init__.py", line 1423, in inner
return func(ax, *map(sanitize_sequence, args), **kwargs)
File "/opt/hostedtoolcache/Python/3.9.13/x64/lib/python3.9/site-packages/matplotlib/axes/_axes.py", line 3587, in errorbar
raise ValueError(
ValueError: 'yerr' must not contain negative values
[✗] Generating HR diagrams for Golden sets

Centralize fritz/kowalski authentication

Feature Summary
Any script/import that involves fritz.py will run the code at lines 68-70 of that file, right before api(). After changing the file opened by those lines from fritz.yaml to config.yaml (see #91), we could read in the user's fritz token within fritz.py and eliminate the need to explicitly supply it in a secrets.json file for scripts like scope_download_classification, scope_upload_classification, and scope_manage_annotation. This would save the user time and streamline the scripts' inputs.
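
A minimal sketch of the idea, assuming a fritz: token: entry in config.yaml (the key names are placeholders, not the current scope config schema):

import yaml

# Read the user's Fritz token once, from config.yaml, instead of a separate secrets.json
with open("config.yaml") as f:
    config = yaml.safe_load(f)

fritz_token = config["fritz"]["token"]  # assumed key path
headers = {"Authorization": f"token {fritz_token}"}
# Scripts such as scope_download_classification, scope_upload_classification, and
# scope_manage_annotation could then import these headers rather than building their own.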

Re-run updated DNN

#53 removed a dropout that was limiting DNN performance. I don't think the DNN was re-run on a large scale since then, so we should do that to gauge its current performance.

Missing sample CVs

Sample objects for the CV subclasses are missing in config.defaults.yaml, which results in empty plots here.

Improve setup documentation

The body, headers, and order of the "Setting up your environment" documentation should be updated to describe the installation process in a more linear way. For example, there are some sections under the current Windows/Linux/MacOS (x86-64) header that a MacOS ARM64 user should perform as well.

Field guide section on 'dipping'

Write a fieldguide section on dipping

Feature Summary
Write a concise description of dipping. Make clear how this differs from eclipsing, and give example types of 'dipping'

Make scope_download_classification.py robust to server errors

The scope_upload_classification.py and fritz.py code was recently updated to handle certain errors that pertain to brief interruptions in the connection to Fritz. We should do the same for the download script, which currently fails and does not save any output in the case of a single interruption to the connection.

Feature Summary
Added robustness to scope_download_classification.py

Usage / behavior

  • Upon a connection error, retry the API call a specified number of times.
  • Establish checkpoints to occasionally save downloaded data so all progress is not lost if the download loop is broken.
  • Allow the user to continue an interrupted download from a specified source

Implementation details

  • The API calls in scope_upload_classification.py and fritz.py have examples of handling server connection errors; a rough sketch of the retry-and-checkpoint pattern follows below.
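
This sketch is not the actual scope code; the endpoint template, backoff schedule, checkpoint interval, and file names are assumptions for illustration.

import time
import requests
import pandas as pd

MAX_RETRIES = 5
CHECKPOINT_EVERY = 500  # sources between checkpoint saves (assumed value)

def api_get_with_retries(url, headers, max_retries=MAX_RETRIES):
    """Retry a GET call a fixed number of times with exponential backoff."""
    for attempt in range(max_retries):
        try:
            resp = requests.get(url, headers=headers, timeout=30)
            resp.raise_for_status()
            return resp.json()
        except requests.exceptions.RequestException:
            if attempt == max_retries - 1:
                raise
            time.sleep(2 ** attempt)

def download_sources(obj_ids, url_template, headers, start_index=0):
    """Resume from start_index and periodically save partial results."""
    rows = []
    for i, obj_id in enumerate(obj_ids[start_index:], start=start_index):
        rows.append(api_get_with_retries(url_template.format(obj_id=obj_id), headers))
        if (i + 1) % CHECKPOINT_EVERY == 0:
            pd.DataFrame(rows).to_csv(f"checkpoint_{i + 1}.csv", index=False)
    return pd.DataFrame(rows)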

Query across multiple machines

There are catalogs of interest spread across the kowalski, gloria, and melman machines. It would be optimal to be able to use one query to get data without needing to specify the machine on which it resides.

A good longer-term solution could be sharding (e.g. here). For the short term, see this PR to penquins that introduces a new KowalskiInstances class that authenticates each machine and queries the one containing the catalog of interest. If/when this PR gets approved, we should revise the scope code to include this functionality, preempting confusion amid the growing number of machines.
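
As an interim illustration of the idea (this is not the KowalskiInstances API from the linked PR): keep one authenticated penquins connection per machine and route each query to the instance that hosts the requested catalog. The host names, tokens, and catalog-to-machine mapping below are assumptions.

from penquins import Kowalski

# One authenticated connection per machine (tokens/hosts are placeholders)
instances = {
    "kowalski": Kowalski(token="token-1", protocol="https", host="kowalski.caltech.edu", port=443),
    "gloria": Kowalski(token="token-2", protocol="https", host="gloria.caltech.edu", port=443),
    "melman": Kowalski(token="token-3", protocol="https", host="melman.caltech.edu", port=443),
}

# Which machine hosts which catalog (illustrative mapping)
catalog_to_machine = {
    "ZTF_sources_20210401": "kowalski",
    "ZTF_source_features_DR5": "gloria",
}

def query_catalog(catalog, filter_spec, projection=None):
    """Send a find query to whichever instance hosts the catalog."""
    k = instances[catalog_to_machine[catalog]]
    q = {
        "query_type": "find",
        "query": {"catalog": catalog, "filter": filter_spec, "projection": projection or {}},
    }
    return k.query(query=q)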

python/jupyternotebook code to upload examples including tags

We need an easy-to-use and flexible Python function to upload a set of examples. This includes uploading the set, but also labelling the uploaded set appropriately. Most of this functionality is already there, but we need to make sure that it works under all circumstances, e.g. what happens if a source already exists.
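
A hedged sketch of such a helper (not existing scope code): check whether the source already exists on Fritz, save it to the group only if needed, then post the classification label. The endpoints follow the SkyPortal/Fritz API; the token, group ID, and taxonomy ID are placeholders.

import requests

FRITZ_URL = "https://fritz.science"
HEADERS = {"Authorization": "token your-fritz-token"}  # placeholder

def upload_example(obj_id, ra, dec, label, group_id, taxonomy_id, probability=1.0):
    """Upload one example source and tag it with a classification."""
    # Only create the source if it does not already exist
    exists = requests.get(f"{FRITZ_URL}/api/sources/{obj_id}", headers=HEADERS).status_code == 200
    if not exists:
        resp = requests.post(
            f"{FRITZ_URL}/api/sources",
            json={"id": obj_id, "ra": ra, "dec": dec, "group_ids": [group_id]},
            headers=HEADERS,
        )
        resp.raise_for_status()
    # Label the uploaded example
    resp = requests.post(
        f"{FRITZ_URL}/api/classification",
        json={
            "obj_id": obj_id,
            "classification": label,
            "taxonomy_id": taxonomy_id,
            "probability": probability,
            "group_ids": [group_id],
        },
        headers=HEADERS,
    )
    resp.raise_for_status()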

top-labels for figures

Feature Summary
Figures like the HR diagram and sky coverage should have top labels (the variable type) so that they can stand on their own.

Usage / behavior
Replace current figures.

Field guide section on 'eclipsing'

Describe the phenomenological class 'eclipsing'

Feature Summary
Give a description of eclipses. Also present and discuss the EA, EB, and EW types. Show lightcurves as illustration

tensorflow error using GPU for training on M1 Mac

When running ./scope.py train and including --gpu 0 to specify use of the GPU, I get an error even though the GPU is recognized and available. I think this may happen because the ResourceApplyAdamWithAmsgrad operation is not currently supported by tensorflow-metal (see e.g. this discussion and its similarity to the error messages below). I've tried upgrading to the latest version of tensorflow-metal (0.6.0) but still get the error. Fortunately training still runs reasonably fast on the CPU.

tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device for operation model/conv_conv_1/separable_conv2d/ReadVariableOp: Could not satisfy explicit device specification '' because the node {{colocation_node model/conv_conv_1/separable_conv2d/ReadVariableOp}} was colocated with a group of nodes that required incompatible device '/job:localhost/replica:0/task:0/device:GPU:0'. All available devices [/job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:GPU:0].

Colocation Debug Info: Colocation group had the following types and supported devices: Root Member(assigned_device_name_index_=2 requested_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' assigned_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' resource_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[] ResourceApplyAdamWithAmsgrad: CPU ReadVariableOp: GPU CPU _Arg: GPU CPU
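
In the meantime, one possible workaround (an assumption, not a confirmed fix) is to hide the GPU from TensorFlow before training so everything runs on the CPU, where the optimizer op is supported:

import tensorflow as tf

# Hide the GPU so ResourceApplyAdamWithAmsgrad is placed on the CPU
tf.config.set_visible_devices([], "GPU")
print(tf.config.get_visible_devices())  # should now list only CPU devices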

Document/video with detailed example of how to classify an object

A document/video is needed that explains in detail how to classify a lightcurve

Feature Summary
A document with multiple examples of how to classify objects. This includes a concise description with all the actions and information used (PS1 cutout, folded lightcurve etc).

Implementation details
google doc with multiple examples; and possibly a video

Lightcurve y-range is reversed for WUma

The lightcurve figure y-limits are reversed for the WUma figure

Describe the bug
For the example lightcurve figure, the y-limits are reversed. A better method to determine the y-limits is needed.

To Reproduce
see the WUma fieldguide.

Expected behavior
The y-limits of the lightcurve should show the correct range, with high magnitudes at the bottom and low magnitudes at the top.
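
A minimal sketch of one way to set the limits (not the current scope plotting code): compute the range from the data and pass it to set_ylim in reversed order so brighter, lower magnitudes sit at the top. The phase and mag arrays are synthetic placeholders.

import numpy as np
import matplotlib.pyplot as plt

# Synthetic folded lightcurve for illustration
phase = np.random.uniform(0, 1, 200)
mag = 15.0 + 0.3 * np.sin(2 * np.pi * phase) + np.random.normal(0, 0.02, 200)

fig, ax = plt.subplots()
ax.scatter(phase, mag, s=5)
pad = 0.05 * (mag.max() - mag.min())
ax.set_ylim(mag.max() + pad, mag.min() - pad)  # reversed: faint at bottom, bright at top
ax.set_xlabel("Phase")
ax.set_ylabel("Magnitude")
plt.show()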


Combine get_features and get_field_features

It appears that the main difference between these two functions in get_features.py is the outfile variable containing the path to the directory for saved results. I think these functions can be combined into one that uses the existing whole_field keyword to assign the correct path.
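
An illustrative sketch of the proposed merge, not the actual get_features.py code: a single function chooses the output directory from the existing whole_field keyword. The paths and the _query_features helper are hypothetical.

import pathlib

def get_features(source_ids, field, whole_field=False, output_root="features"):
    """Query features for the given sources and save them under one output scheme."""
    subdir = f"field_{field}" if whole_field else f"field_{field}_subset"
    outdir = pathlib.Path(output_root) / subdir
    outdir.mkdir(parents=True, exist_ok=True)
    outfile = outdir / "features.parquet"
    features = _query_features(source_ids)  # hypothetical shared feature-query helper
    features.to_parquet(outfile)
    return outfile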

Improve speed of source uploads

Currently scope_upload_classification.py can upload ~1000 sources/classifications/light curves/annotations to Fritz per hour if two instances are run simultaneously on independent parts of source lists (using the -start -stop arguments). The server starts to complain if three or more instances are run together. See also #89.

Feature Summary
With the potential for millions of sources to be uploaded going forward, we should determine how much our code can be streamlined to make uploads complete faster.

Implementation details
Anything within our loop over each source is a good target for speed enhancements.

Generate a .csv trainingset from labels in Fritz

code is needed to take the classifications from Fritz, clean them, put them in the correct format and add the features

Feature Summary
The inputs are labels from Fritz, and the main (single) output should be a .csv file in the same format as the D15 trainingset.

The code should

  • download objects and labels from Fritz
  • download and combine labels from multiple groups on Fritz
  • handle duplicate objects and labels
  • verify the upstream labels (e.g. a delta scuti should be variable and periodic)
  • get the features per lightcurve and join them with the labels
  • save everything as a .csv file (exactly the same format as D15). Add a header that specifies the code version and the time of download from Fritz. A rough sketch of this pipeline follows below.
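
This sketch assumes the downloaded labels and the features are already available as DataFrames sharing an obj_id column; the column names, the example verification rule, and the header format are illustrative, not the D15 specification.

import datetime
import pandas as pd

def build_training_set(labels, features, outfile="trainingset.csv", code_version="unknown"):
    """Clean Fritz labels, join them with features, and save a D15-style CSV."""
    # Handle duplicate objects/labels: keep the highest-probability label per (obj_id, class)
    labels = (labels.sort_values("probability", ascending=False)
                    .drop_duplicates(subset=["obj_id", "classification"]))
    # Example upstream check: a 'delta scuti' should also carry 'variable' and 'periodic'
    dsct = set(labels.loc[labels["classification"] == "delta scuti", "obj_id"])
    for parent in ("variable", "periodic"):
        missing = dsct - set(labels.loc[labels["classification"] == parent, "obj_id"])
        if missing:
            print(f"Warning: {len(missing)} delta scuti sources lack the '{parent}' label")
    # Join the per-lightcurve features with the labels
    training = features.merge(labels, on="obj_id", how="inner")
    # Save with a header recording the code version and download time
    with open(outfile, "w") as f:
        f.write(f"# scope version: {code_version}, downloaded from Fritz: "
                f"{datetime.datetime.utcnow().isoformat()}Z\n")
        training.to_csv(f, index=False)
    return training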

Short feature description


Feature Summary
In the page where the light curve can be folded, show the folded light curve ("period" tag on top) by default. There are cases where the metadata does not include a period; it should default to the magnitude-vs-time view in that case (currently the period tag does not appear in such a case).

Additional context
See the following for an example: https://fritz.science/source/ZTFJ025839.18+542006.7


Field guide section on 'variables'

Write a section on what we define as 'variable' and what is a 'non-variable'.

Feature Summary
The main point of this section is to have a clear definition of what we define as a 'variable' and a 'non-variable'.

Usage / behavior
same as other pages, with a few example lightcurves, but an HR diagram or sky location might not be relevant here

Implementation details
similar to other pages


Duplicate rows in group source downloads

While this issue is connected to and raised on SkyPortal, it is worth noting here as well that group source downloads from Fritz can yield files with duplicate rows. For the ~80,000 sources in the golden dataset group, ~5-10% of rows may be duplicated when using scope_download_classification.py. See the linked issue for more discussion about potential solutions to the problem.
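
As a client-side stopgap (the underlying fix belongs in SkyPortal), duplicate rows can be dropped from an affected download; the file name below is a placeholder.

import pandas as pd

sources = pd.read_csv("golden_dataset_download.csv")
n_before = len(sources)
sources = sources.drop_duplicates()
print(f"Removed {n_before - len(sources)} duplicate rows out of {n_before}")
sources.to_csv("golden_dataset_download_dedup.csv", index=False)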

PS1 cutout and photometry "alternately" missing


Describe the bug
When object details are loaded by expanding the '>' sign on the left, various cutouts are loaded among other things, but the PS1 cutout does not appear. Once you click on the object id, it opens in a new page, and the cutout is now also seen in the original page, but the photometry has vanished. See the before and after screenshots below.

To Reproduce
Go to https://fritz.science/group_sources/371 and do as suggested above

Expected behavior
PS1 cutout should load at the start. Photometry should not vanish.

Screenshots
(before and after screenshots, 2022-01-26)

Platform information:
Chrome on Mac

Additional context
Program 371: https://fritz.science/group_sources/371
