tylerbarna / nmma_fitter

Realtime node-based lightcurve fitting using NMMA.
nmma has been updated since the creation of the pipeline so that light_curve_analysis now has the --plot flag; this plots the best fit of the lightcurve, making the plotting part of nmma_fit.py redundant. Rework nmma_fit.py and any relevant scripts so they make use of the new nmma feature for plotting
The structure of make_jobs.py means that it won't send fits of targets until all models have been run for every target or one of them times out. This is a problem because occasionally one model gets stuck on an object (or one object gets stuck on all the models). Not sure exactly how to revise this code, but it would be good to have some way to identify jobs that are holding the overall script back, and possibly to post the other fits to Slack ahead of time with a message noting that another job or jobs are taking too long.
Add some sort of flag, likely a small .txt file, a line in the log file, or something inside the daily directory, that indicates the run wasn't part of the standard daily fitting pipeline but was instead triggered by the manual or catch-up scripts.
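A minimal sketch of the marker-file idea; the helper names, the `run_origin.txt` filename, and the origin labels are placeholders, not anything the pipeline currently defines:

```python
import os

def mark_run_origin(daily_dir, origin="manual"):
    """Drop a small marker file in the daily directory recording how the
    run was started ("manual", "catchup", or "scheduled")."""
    marker = os.path.join(daily_dir, "run_origin.txt")
    with open(marker, "w") as f:
        f.write(origin + "\n")
    return marker

def run_origin(daily_dir):
    """Return the recorded origin, defaulting to "scheduled" when the
    marker is absent (i.e. a normal daily pipeline run)."""
    marker = os.path.join(daily_dir, "run_origin.txt")
    if not os.path.exists(marker):
        return "scheduled"
    with open(marker) as f:
        return f.read().strip()
```

The manual-trigger and catch-up scripts would call `mark_run_origin` right after creating the daily directory; downstream scripts (or a person inspecting the directory) can then tell the runs apart without parsing logs.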
Currently seems that MSI has nmma_make_jobs held up in the queue with the reason being "launch failed requeued held." Checking the ztfrest email, it hasn't run since August 2nd, though it seems one job was running until late on the 3rd going by file change times. The scrontab job is still active, but it hasn't run. Will check with MSI to see what the issue is.
Removing the latter part of nmma_fit.py and making it a separate script would make it easier to plot data that's been run outside of the script
When creating formatted files for the pipeline to automatically process the paper candidates, I was unsure whether "magpsf" or "magzpsci" corresponded to the correct data to use as the "mag" variable. (In the ZTF alert schema, magpsf is the PSF-fit magnitude and magzpsci is the science-image zero point, so magpsf is likely the right choice.)
Currently, the make_jobs script will check if there is a directory in candidate_fits that corresponds to the new candidates daily folder. If it exists, the make_jobs.py script will exit. This behaviour prevents make_jobs from overwriting fits that are still being run, but I think we need to have a more robust method so we can run the script on existing objects for new models.
One option would be to create a .fitting file for each model that would be deleted upon completion or timeout and replaced with the current .fin file. Then we could check whether a .fitting or .fin file exists and exit if either is present. We could also make the check for candidate fits specific to each object and each model. Essentially, right now, as long as the script has been run on one object for one model, the script won't execute again, which is probably not the best option.
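A sketch of the per-object, per-model sentinel-file idea, assuming hypothetical `<candname>_<model>.fitting` / `.fin` filenames (the real pipeline's .fin naming may differ):

```python
import os

def fit_status(fit_dir, candname, model):
    """Report the state of one object/model fit using sentinel files:
    a .fin file means finished, a .fitting file means still running."""
    base = os.path.join(fit_dir, f"{candname}_{model}")
    if os.path.exists(base + ".fin"):
        return "finished"
    if os.path.exists(base + ".fitting"):
        return "running"
    return "pending"

def claim_fit(fit_dir, candname, model):
    """Create the .fitting sentinel before submitting a job; returns
    False when the fit is already running or done, so make_jobs.py can
    skip just that object/model pair instead of exiting entirely."""
    if fit_status(fit_dir, candname, model) != "pending":
        return False
    open(os.path.join(fit_dir, f"{candname}_{model}.fitting"), "w").close()
    return True
```

On completion or timeout, the job itself would remove the .fitting file and write the .fin file, making the check granular enough to rerun existing objects with new models.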
Issues with Piro2021 come to mind
A function for this exists in stats.py, just needs to be implemented
Add a way to do a "dry run" of the pipeline that doesn't actually submit jobs to Slurm on MSI but just checks that everything works correctly.
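One way this could look, as a hedged sketch: a `--dry-run` flag added to the make_jobs.py parser, plus a small submission wrapper that prints the command it would run. The parser slice and the `submit` helper are illustrative, not the script's actual structure:

```python
import argparse
import shlex
import subprocess

def build_parser():
    """Hypothetical slice of the make_jobs.py argument parser."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--dry-run", action="store_true",
                        help="build and print job commands without submitting to Slurm")
    return parser

def submit(cmd, dry_run=False):
    """Echo the command that would be submitted; only execute it for
    real when the dry-run flag is not set."""
    print(" ".join(shlex.quote(part) for part in cmd))
    if dry_run:
        return None
    return subprocess.run(cmd, check=True)
```

With every `sbatch` call routed through `submit`, a dry run exercises all the path construction and job-script generation while leaving the queue untouched.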
It would be nice to alter the behavior of the plot and log filenames such that both the name of the candidate and the model being used are included in the file names
The 24db955 commit shows that there's a much more straightforward method for grouping data without all these nested dictionary and list comprehensions. This will make the code much more readable
Would have to convert the day column to a datetime with astropy or something similar
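For what it's worth, pandas can do this conversion directly; the sketch below assumes day directory names look like "YYYYMMDD", which is a guess about the actual naming scheme:

```python
import pandas as pd

# Hypothetical stats dataframe keyed on day-directory names.
df = pd.DataFrame({"day": ["20220802", "20220803", "20220804"],
                   "numCands": [3, 5, 2]})

# Parse the string day labels into proper datetimes so plots sort and
# scale on a real time axis instead of lexicographic strings.
df["day"] = pd.to_datetime(df["day"], format="%Y%m%d")
```

If the directory names use a different pattern, only the `format` string needs to change; astropy's `Time` would work equally well if the rest of stats.py already depends on it.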
In the flow of scripts executed every day, there's a mix of .sh and .txt files that are executed, but there isn't a particularly clear reason for a file being one or the other. One could either define the distinction and alter the files to reflect it, or just make them all .sh files. Because some of the files are referenced explicitly in other files, it would probably be good to branch and ensure compatibility before merging back into the main branch.
It seems that the Piro2021 model is consistently failing to execute after about 30 seconds; will need to investigate further. Some information should be available in the daily Piro2021.log files located on MSI and schoty in the subdirectories generated in candidate_fits/
Might be good to add a flag to force a fit of an object, even if there aren't 2 detections. Probably wouldn't use it for the daily runs so as to not take too many resources on MSI, but might be good just for illustrative purposes. Would probably be added to make_jobs.py and would alter behavior around Line 138 of this commit
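A minimal sketch of what the flag could look like; the parser slice and the `should_fit` gate are hypothetical stand-ins for the detection-count check in make_jobs.py:

```python
import argparse

def build_parser():
    """Hypothetical slice of the make_jobs.py argument parser."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--force-fit", action="store_true",
                        help="submit a fit even when fewer than 2 detections exist")
    return parser

def should_fit(n_detections, force=False):
    """Stand-in for the detection-count gate: normally require at least
    2 detections, but let --force-fit override for illustrative runs."""
    return force or n_detections >= 2
```

The daily cron invocation would simply never pass `--force-fit`, keeping MSI usage unchanged.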
So, inspecting the data a bit more closely, I think there's an issue with the way I use the histplot argument which results in double counting or something like that; this is noticeable when comparing the lineplot and histplot implementations of the numDailyCands plot. Need to investigate further, especially for the sampling times plots, since those are planned to be added to the paper.
Max should be 37, but as of the current commit, it seems to be somewhere north of 50 according to numDailyCandHist. Probably a mistake in the dataframe filter, but I'll have to investigate further
Reviewing the code, I'm unsure if the step that creates jobs using the [Model]job.txt scripts is really necessary. The only significant difference between them is the hard-coded number of live points and cpus as well as the specific cluster to run the job on. I realize we have each object/model fit submitted as a separate job to create a pseudo-parallelized process, but perhaps we could streamline this by creating one generic script through which the jobs are created? It's currently manageable, but if we start to add more models or create a system for dynamically adding models, it would quickly become excessively unwieldy
The way a lot of stats.py functions work is finding the number of directories corresponding to a day of fits, but this might cause issues if there are days where not all models are fit (think trying to plot or sum arrays/lists of different lengths).
A solution might be to use the day directory name as the index or a column of a pandas dataframe, with additional columns for the candidate_fits directory/file, the object in question, and each model. Could then search for each expected model in the object directory and place a Null flag or 0 value if the fit (probably the result.json) is not present.
This would potentially allow for an easy way of comparing the number of models that were fit to candidates against the ones that were not successfully fit. This would require a change in the behavior at the start of stats.py and some of the ways different functions process the data for plotting, but this would also offload the amount of redundant work being done to compile file statistics, as this would all be completed at the start
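The day/object/model table described above could be built roughly like this. The directory layout `candidate_fits/<day>/<object>/<model>/result.json` and the model list are assumptions taken from the surrounding notes, not a confirmed description of the repo:

```python
import os
import pandas as pd

# Hypothetical model set; the real pipeline's list may differ.
MODELS = ["Bu2019lm", "Piro2021", "nugent-hyper", "TrPi2018"]

def fit_table(candidate_fits):
    """Build one row per (day, object) with a 0/1 column per model
    flagging whether that model's result.json exists, so missing fits
    show up as zeros instead of breaking array lengths downstream."""
    rows = []
    for day in sorted(os.listdir(candidate_fits)):
        day_dir = os.path.join(candidate_fits, day)
        if not os.path.isdir(day_dir):
            continue
        for obj in sorted(os.listdir(day_dir)):
            obj_dir = os.path.join(day_dir, obj)
            if not os.path.isdir(obj_dir):
                continue
            row = {"day": day, "object": obj}
            for model in MODELS:
                result = os.path.join(obj_dir, model, "result.json")
                row[model] = 1 if os.path.exists(result) else 0
            rows.append(row)
    return pd.DataFrame(rows)
```

Summing the model columns then gives fitted-vs-missing counts per day for free, e.g. `fit_table(path).groupby("day")[MODELS].sum()`.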
Would be good to add something to handle stalled fitting jobs beyond just having MSI kill the job at the time limit, because that kill behaviour results in no .fin file being created for the job, which messes with the overall pipeline as a result.
Looking at the cumFitTimeStack vs cumFitTime, it looks like there might be an issue with the way seaborn calculates the cumulative fit time for the histplot that's ordered by hue vs the overlapped version.
Need to reduce bloat in main directory, particularly with respect to priors being kept in the root directory. Also need to make clear what scripts do what and what order they're called.
The current state of the repo is geared explicitly towards using nmma_fitter on the MSI system; at various points, the pipeline makes explicit assumptions about absolute file paths located on MSI. Work needs to be done to make these file paths either relative or arguments provided when executing the scripts.
One of my ongoing projects is getting the pipeline working on local systems, which should hopefully motivate more platform-agnostic changes to the codebase
As of August 4th, it seems that the pipeline is unable to connect with schoty in order to check for new data or sync new fits. When attempting to connect to schoty independently, I am told the previously-used password is incorrect. Will need to follow up about schoty status.
As it stands, the current implementation doesn't account for instances where fits have occurred but, for whatever reason, the json file can't be read. While this is an edge case, it's something the slack bot accounts for, so it would be good to find a way to account for it in the stats area
For breaking down stats on job lengths, it would be nice to have a consistent way of checking how long jobs take
When attempting to run the pipeline on paper candidates, I noticed that it currently can't process two files on the same day that represent the forced and non-forced lightcurves of one object. This is because make_jobs.py generates the candname variable by splitting the file name at underscores and taking the second element of the resulting list (see line 113 of make_jobs.py). This is a fairly edge-case issue, especially for daily automated runs, but it shouldn't be too hard to add something to prevent it from occurring.
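A sketch of one fix: keep any trailing tag after the candidate name as part of the key so forced and unforced files stay distinct. The filename pattern `lc_<candname>[_tag].dat` is an assumption inferred from the underscore-split behavior described above, not confirmed from the repo:

```python
import os

def parse_candname(filename):
    """Derive a unique candidate key from a data filename. Assumes names
    like 'lc_ZTF22abcdefg.dat' or 'lc_ZTF22abcdefg_forced.dat'; keeping
    any trailing tag in the key stops the forced and unforced photometry
    files of one object from colliding on the same day."""
    stem = os.path.splitext(os.path.basename(filename))[0]
    parts = stem.split("_")
    candname = parts[1]            # mirrors the current second-element behavior
    if len(parts) > 2:             # e.g. a 'forced' tag after the name
        candname += "_" + "_".join(parts[2:])
    return candname
```

Plain daily files keep their current names under this scheme, so only the edge case changes.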
Rework make_jobs.py and nmma_fit.py so any variables that assume a specific computer (e.g. MSI) are moved out of the scripts and into settings.json, which is then read into the scripts so those values aren't hard-coded into the pipeline. This will make it a lot easier to deploy the pipeline on other slurm-based systems
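The read side of this is small; a minimal sketch, where the key names shown are illustrative rather than the pipeline's actual settings.json schema:

```python
import json

def load_settings(path="settings.json"):
    """Read machine-specific paths and options from settings.json so
    make_jobs.py and nmma_fit.py stay free of hard-coded MSI paths."""
    with open(path) as f:
        return json.load(f)

# Example usage with hypothetical keys:
# settings = load_settings()
# fit_dir = settings["candidate_fits_dir"]
# partition = settings["slurm_partition"]
```

Scripts would then fail fast with a clear KeyError on a new system instead of silently using an MSI-only absolute path.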
Would probably be good to pull the arguments and whatnot out of the stats.py function itself and have stats.py contain only function definitions.
Would require some changes to default arguments, as some of the functions assume the system arguments to exist
stats.py csv suggests there's an instance of manual fitting in candidate_fits, with the directory '000000-000001' or something like that
There's a lot of functionality that could probably be abstracted into an importable python package, but this would have to occur after #2 is addressed or else it won't be particularly useful. This would also make it easier to merge nmma_fitter into nmma main eventually
The file read-in behavior results in a dataframe that differs from one that was freshly built: indices with np.nan are read in as blank, causing issues later. Need to account for this when reading in the file
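A small round-trip sketch of the fix, assuming the file is a csv read with pandas (the column names below are made up): being explicit that empty fields map back to NaN keeps a reloaded frame comparable to a freshly built one.

```python
import io
import numpy as np
import pandas as pd

# Freshly built frame with a genuine NaN entry.
fresh = pd.DataFrame({"object": ["ZTF22aaa", "ZTF22bbb"],
                      "note": ["ok", np.nan]})

buf = io.StringIO()
fresh.to_csv(buf, index=False)
buf.seek(0)

# Explicitly treat empty fields as NaN on read-in, so the NaN survives
# the round trip instead of coming back as a blank value.
restored = pd.read_csv(buf, na_values=[""], keep_default_na=True)
```

If some code path reads the file with `keep_default_na=False` (which does turn empty fields into blank strings), a follow-up `restored.replace("", np.nan)` restores the fresh-frame behavior.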
It seems like most of the sampling_times plots either fail or don't work as expected in stats.py as of right now