
nmma_fitter's People

Contributors

breed137, tahumada, tylerbarna

nmma_fitter's Issues

Replace built-in plotting in nmma_fit.py with the new nmma functionality

nmma has been updated since the pipeline was created: light_curve_analysis now has a --plot flag that plots the best fit of the lightcurve, making the plotting code in nmma_fit.py redundant. Rework nmma_fit.py and any related scripts so they use the new nmma plotting feature.
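
A minimal sketch of the change, assuming nmma_fit.py already builds the light_curve_analysis command as a list; only the --plot flag itself is confirmed above, the remaining arguments are placeholders:

    import subprocess

    cmd = [
        "light_curve_analysis",
        "--plot",  # new nmma flag that plots the best-fit lightcurve
        # ... the arguments nmma_fit.py already passes (model, data, output dir, ...)
    ]
    subprocess.run(cmd, check=True)  # the in-script plotting code can then be dropped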

Reduce wait times for overall job

The structure of make_jobs.py means that it won't send fits of targets until all models have been run for every target or one of them times out. This is a problem because occasionally one model gets stuck on an object (or one object gets stuck on all the models). It's not clear exactly how to revise this code, but it would be good to have some way of identifying the jobs that are holding the overall script back, and possibly to post the other fits to slack ahead of time with a message noting that one or more jobs are taking too long.
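
One possible approach, sketched under the assumption that each (object, model) job drops a .fin file on completion, as the pipeline already does: poll with a deadline and report whichever fits are done instead of blocking on the stragglers.

    import time
    from pathlib import Path

    def wait_with_deadline(fin_paths, deadline_s=3600, poll_s=60):
        """Return (finished, stalled) .fin paths after waiting at most deadline_s."""
        start = time.time()
        pending = [Path(p) for p in fin_paths]
        while pending and time.time() - start < deadline_s:
            pending = [p for p in pending if not p.exists()]
            if pending:
                time.sleep(poll_s)
        finished = [p for p in fin_paths if Path(p).exists()]
        return finished, pending  # pending fits can be flagged in the slack message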

Add flag for days that were not fit in real-time

Add some sort of flag, likely a small .txt file, a line in the log file, or a marker inside the daily directory, indicating that the day wasn't run by the standard daily fitting pipeline but via the manual trigger or catch-up scripts.
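
A minimal sketch of the .txt-file option; the marker name and contents here are illustrative, not an existing convention:

    from datetime import datetime, timezone
    from pathlib import Path

    def flag_non_realtime_run(daily_dir, trigger="manual"):
        # drop a small marker so later tooling can tell this day apart
        marker = Path(daily_dir) / "not_realtime.txt"
        marker.write_text(f"run via {trigger} at {datetime.now(timezone.utc).isoformat()}\n")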

MSI job submission

Currently it seems that MSI has nmma_make_jobs held in the queue, with the reason given as "launch failed requeued held." Checking the ztfrest email, it hasn't run since August 2nd, though going by file change times one job was running until late on the 3rd. The scrontab job is still active, but it hasn't run. Will check with MSI to see what the issue is.

Remove plotting from nmma_fit.py

Removing the latter part of nmma_fit.py and making it a separate script would make it easier to plot data that was run outside of the script.

Correct mag values for paper_candidates

When creating formatted files for the pipeline to automatically process the paper candidates, I was unsure whether "magpsf" or "magzpsci" corresponded to the correct data to use as the "mag" variable.

alter check for candidate_fits directory existence

Currently, the make_jobs script checks whether there is a directory in candidate_fits corresponding to the new daily candidates folder, and make_jobs.py exits if it exists. This behavior prevents make_jobs from overwriting fits that are still being run, but we need a more robust method so the script can be run on existing objects for new models.

One option would be to create a .fitting file for each model that is deleted upon completion or timeout and replaced with the current .fin file. We could then check whether a .fitting or .fin file exists and exit if either is present. We could also make the check specific to each object and each model; see the sketch below. Essentially, right now, as long as the script has been run on one object for one model, it won't execute again, which is probably not the best behavior.

Issues with Piro2021 come to mind.
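
A rough sketch of the per-object, per-model check described above; the .fitting extension and the directory layout are assumptions for illustration:

    from pathlib import Path

    def fit_already_handled(candidate_fits, day, candname, model):
        base = Path(candidate_fits) / day / candname
        fitting = base / f"{model}.fitting"  # created at submission, removed on exit
        fin = base / f"{model}.fin"          # created on completion or timeout
        return fitting.exists() or fin.exists()

make_jobs.py would then skip only the (object, model) pairs for which this returns True, rather than exiting because the daily directory exists at all.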

create dry run functionality

Add a way to do a "dry run" of the pipeline that doesn't actually submit jobs to slurm on MSI but just checks that everything works correctly.
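
One way this could look, assuming jobs are submitted from Python via sbatch; with --dry-run the pipeline prints each submission command instead of running it:

    import argparse
    import subprocess

    parser = argparse.ArgumentParser()
    parser.add_argument("--dry-run", action="store_true",
                        help="print sbatch commands without submitting to slurm")
    args = parser.parse_args()

    def submit(cmd):
        if args.dry_run:
            print("DRY RUN:", " ".join(cmd))  # everything upstream still runs
        else:
            subprocess.run(cmd, check=True)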

More descriptive file names

It would be nice to alter the behavior of the plot and log filenames so that both the name of the candidate and the model being used are included in the file names.
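
For example, something along these lines (the exact pattern, and the example candidate name, are illustrative only):

    def output_names(candname, model):
        # e.g. ("ZTF21abcdefg_Piro2021_lightcurve.png", "ZTF21abcdefg_Piro2021.log")
        return f"{candname}_{model}_lightcurve.png", f"{candname}_{model}.log"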

bash script file name inconsistency

In the flow of scripts executed every day, there's a mix of .sh and .txt files being executed, with no particularly clear reason for a file being one or the other. One could either define the distinction and alter the files to reflect it, or just make them all .sh files. Because some of the files are referenced explicitly in other files, it would be good to do this on a branch and ensure compatibility before merging back into the main branch.

Piro2021 Job Failing

It seems that the Piro2021 model is consistently failing to execute after about 30 seconds; this will need further investigation. Some information should be available in the daily Piro2021.log files located on MSI and schoty in the subdirectories generated in candidate_fits/

Add Force Fit

Might be good to add a flag to force a fit of an object even if there aren't 2 detections. It probably wouldn't be used for the daily runs, so as not to take up too many resources on MSI, but it could be useful for illustrative purposes. It would probably be added to make_jobs.py and would alter the behavior around Line 138 of this commit.
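
A hedged sketch of the flag; the detection-count condition is reconstructed from the description above rather than copied from make_jobs.py:

    import argparse

    parser = argparse.ArgumentParser()
    parser.add_argument("--force", action="store_true",
                        help="fit an object even if it has fewer than 2 detections")
    args = parser.parse_args()

    def should_fit(detections, force=False):
        # make_jobs.py currently skips sparse objects; --force would override that
        return len(detections) >= 2 or force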

histplot for bar plot issue

Inspecting the data a bit more closely, I think there's an issue with the way I use the histplot argument that results in double counting or something similar; this is noticeable when comparing the lineplot and histplot implementations of the numDailyCands plot. Need to investigate further, especially for the sampling-times plots, since those are planned to be added to the paper.

[Model]job.txt layer redundant?

Reviewing the code, I'm unsure whether the step that creates jobs using the [Model]job.txt scripts is really necessary. The only significant differences between them are the hard-coded number of live points and cpus and the specific cluster the job runs on. I realize each object/model fit is submitted as a separate job to create a pseudo-parallelized process, but perhaps we could streamline this by creating one generic script through which the jobs are created. It's currently manageable, but if we start adding more models or create a system for dynamically adding models, it would quickly become excessively unwieldy.
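
A sketch of collapsing the per-model scripts into a single template; the slurm directives, flag names, and default values here are placeholders rather than the contents of the real [Model]job.txt files:

    JOB_TEMPLATE = """#!/bin/bash
    #SBATCH --job-name={model}_{candname}
    #SBATCH --cpus-per-task={cpus}
    light_curve_analysis --model {model} --nlive {nlive} ...  # plus per-fit arguments
    """

    def write_job(candname, model, nlive=512, cpus=2):
        # only the (model, candname, nlive, cpus) tuple varies between jobs
        script = JOB_TEMPLATE.format(candname=candname, model=model,
                                     nlive=nlive, cpus=cpus)
        with open(f"{model}_{candname}_job.sh", "w") as f:
            f.write(script)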

Method for plotting/counting the number of days in stats.py may cause issues

Many of the stats.py functions work by counting the directories corresponding to a day of fits, but this may cause issues on days where not all models were fit (think trying to plot or sum arrays/lists of different lengths).

A solution might be to build a pandas dataframe from the candidate_fits directories/files, indexed by the day directory name, with a column for the object in question and a column for each model. We could then search the object directory for each expected model and insert a null flag or 0 value if the fit (probably the result.json) is not present.

This would give an easy way of comparing the number of models that were successfully fit to candidates against those that were not. It would require changing the behavior at the start of stats.py and some of the ways the functions process data for plotting, but it would also cut down on the redundant work done to compile file statistics, since that would all be completed at the start.
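
A sketch of the day-indexed table, assuming a candidate_fits/<day>/<object>/ layout with a result.json per completed fit; the exact file names are assumptions:

    from pathlib import Path
    import pandas as pd

    def build_fit_table(candidate_fits, models):
        rows = []
        for day_dir in sorted(Path(candidate_fits).iterdir()):
            if not day_dir.is_dir():
                continue
            for obj_dir in (d for d in day_dir.iterdir() if d.is_dir()):
                row = {"day": day_dir.name, "object": obj_dir.name}
                for model in models:
                    # 0 marks a fit that never produced a result.json
                    row[model] = int(any(obj_dir.glob(f"*{model}*result.json")))
                rows.append(row)
        return pd.DataFrame(rows).set_index("day")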

Address Issues with stalled jobs

Would be good to add something that handles stalled fitting jobs beyond just having MSI kill the job at the time limit, because that kill means no .fin file is created for the job, which in turn messes with the overall job.
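
One possible fix, sketched: run the fit with an explicit timeout shorter than the slurm limit and write the .fin file in a finally block, so even a stalled or failed fit leaves the marker the downstream logic expects. The timeout value is a placeholder:

    import subprocess
    from pathlib import Path

    def run_fit(cmd, fin_path, timeout_s=6 * 3600):
        try:
            subprocess.run(cmd, check=True, timeout=timeout_s)
        except (subprocess.TimeoutExpired, subprocess.CalledProcessError):
            pass  # the fit stalled or failed; still record that it ended
        finally:
            Path(fin_path).touch()  # always leave the marker make_jobs.py waits on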

Directory Bloat and explicit file paths

Need to reduce bloat in the main directory, particularly the priors being kept in the root directory. Also need to make clear which scripts do what and in what order they're called.

The current state of the repo is geared explicitly towards using nmma_fitter on the MSI system; at various points, the pipeline makes explicit assumptions about absolute file paths on MSI. Work needs to be done to make these paths either relative or arguments provided when executing the scripts.

One of my ongoing projects is getting the pipeline working on local systems, which should hopefully motivate more platform-agnostic changes to the codebase

Schoty connection is broken

As of August 4th, it seems that the pipeline is unable to connect to schoty to check for new data or sync new fits. When attempting to connect to schoty independently, I am told the previously used password is incorrect. Will need to follow up on schoty's status.

Account for potentially unreadable/corrupt json files in stats.py

As it stands, the current implementation doesn't account for instances where a fit has occurred but, for whatever reason, the json file can't be read. While this is an edge case, it's something the slack bot already accounts for, so it would be good to handle it in the stats code as well.
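
A minimal sketch: treat an unreadable result.json the same as a missing fit, mirroring what the slack bot does:

    import json

    def load_result(path):
        try:
            with open(path) as f:
                return json.load(f)
        except (OSError, json.JSONDecodeError):
            return None  # stats.py can then count this as a failed/absent fit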

Edge-Case Issue with Candidate Name Processing

When attempting to run the pipeline on the paper candidates, I noticed that the current behavior prevents processing two files on the same day that represent the forced and not-forced lightcurves of one object. This is because make_jobs.py generates the candname variable by splitting the file name on underscores and taking the second element of the resulting list (see line 113 of make_jobs.py). This is a fairly edge-case issue, especially for daily automated runs, but it shouldn't be too hard to add something to prevent it.
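
One hedged fix: keep a forced/not-forced marker as part of the working name so the two files no longer collide. The filename layout assumed here (a "forced" token somewhere after the object name) is for illustration only:

    from pathlib import Path

    def parse_candname(filename):
        parts = Path(filename).stem.split("_")
        candname = parts[1]  # current behavior: second underscore-separated field
        if "forced" in parts[2:]:
            candname += "_forced"  # disambiguate the forced-photometry lightcurve
        return candname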

rework pipeline to use settings.json

Rework make_jobs.py and nmma_fit.py so that any variables that assume a specific computer (e.g. MSI) are moved out of the scripts and into settings.json, which is then read in so nothing is hard-coded into the pipeline. This would make it a lot easier to deploy the pipeline on other slurm-based systems.
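
A minimal sketch of the read-in side; the keys shown are examples of the kind of machine-specific values that would move out of the scripts:

    import json
    from pathlib import Path

    settings = json.loads(Path("settings.json").read_text())
    candidate_fits = settings["candidate_fits_dir"]  # today an absolute MSI path
    cluster = settings["cluster"]                    # today hard-coded per job script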

add wrapper script to stats.py

Would probably be good to pull the argument handling out of stats.py itself and have stats.py contain only function definitions.

This would require some changes to default arguments, as some of the functions assume the system arguments exist.
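
A sketch of the split, with a thin entry point doing the parsing that the stats.py functions currently pull from sys.argv; the argument names and the stats call are hypothetical:

    import argparse
    # import stats  # hypothetical: stats.py reduced to pure function definitions

    def main():
        parser = argparse.ArgumentParser()
        parser.add_argument("--candidate-fits", default="candidate_fits")
        args = parser.parse_args()
        # stats.some_plot_function(args.candidate_fits)  # functions take explicit args

    if __name__ == "__main__":
        main()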

Packaging of repo

There's a lot of functionality that could probably be abstracted into an importable python package, but this would have to happen after #2 is addressed or it won't be particularly useful. It would also make it easier to eventually merge nmma_fitter into nmma main.

stats.py get_dataframe reads in csv files incorrectly

The file read-in behavior results in a dataframe that differs from a freshly generated one: indices with np.nan are read in as blank, causing issues later. Need to account for this when reading in the file.
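
A hedged sketch of the read-in fix, forcing blank cells back to NaN so the loaded dataframe matches a freshly generated one; "stats_cache.csv" is a placeholder for whatever file get_dataframe actually reads:

    import pandas as pd

    # blank cells (written from np.nan) come back as NaN instead of empty strings
    df = pd.read_csv("stats_cache.csv", index_col=0,
                     keep_default_na=True, na_values=[""])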
