tylerbarna / nmma_fitter

Realtime node-based lightcurve fitting using NMMA.
nmma has been updated since the creation of the pipeline so that light_curve_analysis now has the --plot flag; this plots the best fit of the lightcurve, making the plotting part of nmma_fit.py redundant. Rework nmma_fit.py and any relevant scripts so they make use of the new nmma feature for plotting
The structure of make_jobs.py means that it won't send fits of targets until all models have been run for every target or one of them times out. This is a problem because occasionally one model gets stuck on an object (or one object gets stuck on all the models). Not sure exactly how to revise this code, but it would be good to have some way to identify jobs that are holding the overall script back, and possibly to post the other fits to Slack ahead of time with a message noting that another job or jobs are taking too long.
Add some sort of flag, likely a small .txt file, a line in the log file, or something inside the daily directory, that indicates the run wasn't part of the standard daily fitting pipeline but was instead triggered by the manual or catch-up scripts.
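A minimal sketch of the marker-file idea; the helper names, the `run_origin.txt` filename, and the origin labels are placeholders, not anything the pipeline currently defines:

```python
import os

def mark_run_origin(daily_dir, origin="manual"):
    """Drop a small marker file in the daily directory recording how the
    run was started ("manual", "catchup", or "scheduled")."""
    marker = os.path.join(daily_dir, "run_origin.txt")
    with open(marker, "w") as f:
        f.write(origin + "\n")
    return marker

def run_origin(daily_dir):
    """Return the recorded origin, defaulting to "scheduled" when the
    marker is absent (i.e. a normal daily pipeline run)."""
    marker = os.path.join(daily_dir, "run_origin.txt")
    if not os.path.exists(marker):
        return "scheduled"
    with open(marker) as f:
        return f.read().strip()
```

The manual-trigger and catch-up scripts would call `mark_run_origin` right after creating the daily directory; downstream scripts (or a person inspecting the directory) can then tell the runs apart without parsing logs.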
Currently seems that MSI has nmma_make_jobs held up in the queue with the reason being "launch failed requeued held." Checking the ztfrest email, it hasn't run since August 2nd, though it seems one job was running until late on the 3rd going by file change times. The scrontab job is still active, but it hasn't run. Will check with MSI to see what the issue is.
Removing the latter part of nmma_fit.py and making it a separate script would make it easier to plot data that's been run outside of the script
When creating formatted files for the pipeline to automatically process the paper candidates, I was unsure whether "magpsf" or "magzpsci" corresponded to the correct data to use as the "mag" variable. (In the ZTF alert schema, magpsf is the PSF-fit magnitude and magzpsci is the science-image zero point, so magpsf is likely the right choice.)
Currently, the make_jobs script will check if there is a directory in candidate_fits that corresponds to the new candidates daily folder. If it exists, the make_jobs.py script will exit. This behaviour prevents make_jobs from overwriting fits that are still being run, but I think we need to have a more robust method so we can run the script on existing objects for new models.
One option would be to create a .fitting file for each model that would be deleted upon completion or timeout and replaced with the current .fin file. Then we could check whether a .fitting or .fin file exists and exit if either is present. We could also make the check for candidate fits specific to each object and each model. Essentially, right now, as long as the script has been run on one object for one model, the script won't execute again, which is probably not the best option.
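A sketch of the per-object, per-model sentinel-file idea, assuming hypothetical `<candname>_<model>.fitting` / `.fin` filenames (the real pipeline's .fin naming may differ):

```python
import os

def fit_status(fit_dir, candname, model):
    """Report the state of one object/model fit using sentinel files:
    a .fin file means finished, a .fitting file means still running."""
    base = os.path.join(fit_dir, f"{candname}_{model}")
    if os.path.exists(base + ".fin"):
        return "finished"
    if os.path.exists(base + ".fitting"):
        return "running"
    return "pending"

def claim_fit(fit_dir, candname, model):
    """Create the .fitting sentinel before submitting a job; returns
    False when the fit is already running or done, so make_jobs.py can
    skip just that object/model pair instead of exiting entirely."""
    if fit_status(fit_dir, candname, model) != "pending":
        return False
    open(os.path.join(fit_dir, f"{candname}_{model}.fitting"), "w").close()
    return True
```

On completion or timeout, the job itself would remove the .fitting file and write the .fin file, making the check granular enough to rerun existing objects with new models.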
Issues with Piro2021 come to mind
A function for this exists in stats.py, just needs to be implemented
Add a way to do a "dry run" of the pipeline that doesn't actually submit jobs to Slurm on MSI but just checks that everything works correctly.
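One way this could look, as a hedged sketch: a `--dry-run` flag added to the make_jobs.py parser, plus a small submission wrapper that prints the command it would run. The parser slice and the `submit` helper are illustrative, not the script's actual structure:

```python
import argparse
import shlex
import subprocess

def build_parser():
    """Hypothetical slice of the make_jobs.py argument parser."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--dry-run", action="store_true",
                        help="build and print job commands without submitting to Slurm")
    return parser

def submit(cmd, dry_run=False):
    """Echo the command that would be submitted; only execute it for
    real when the dry-run flag is not set."""
    print(" ".join(shlex.quote(part) for part in cmd))
    if dry_run:
        return None
    return subprocess.run(cmd, check=True)
```

With every `sbatch` call routed through `submit`, a dry run exercises all the path construction and job-script generation while leaving the queue untouched.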
It would be nice to alter the behavior of the plot and log filenames such that both the name of the candidate and the model being used are included in the file names
The 24db955 commit shows that there's a much more straightforward method for grouping data without all these nested dictionary and list comprehensions. This will make the code much more readable
Would have to convert the day column to a datetime with astropy or something similar
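For what it's worth, pandas can do this conversion directly; the sketch below assumes day directory names look like "YYYYMMDD", which is a guess about the actual naming scheme:

```python
import pandas as pd

# Hypothetical stats dataframe keyed on day-directory names.
df = pd.DataFrame({"day": ["20220802", "20220803", "20220804"],
                   "numCands": [3, 5, 2]})

# Parse the string day labels into proper datetimes so plots sort and
# scale on a real time axis instead of lexicographic strings.
df["day"] = pd.to_datetime(df["day"], format="%Y%m%d")
```

If the directory names use a different pattern, only the `format` string needs to change; astropy's `Time` would work equally well if the rest of stats.py already depends on it.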
In the flow of scripts executed every day, there's a mix of .sh and .txt files that are executed, but there isn't a particularly clear reason for a file being one or the other. One could either define the distinction and alter the files to reflect it, or just make them all .sh files. Because some of the files are referenced explicitly in other files, it would probably be good to branch and ensure compatibility before merging back into the main branch.
It seems that the Piro2021 model is consistently failing to execute after about 30 seconds; will need to investigate further. Some information should be available in the daily Piro2021.log files located on MSI and schoty in the subdirectories generated in candidate_fits/
Might be good to add a flag to force a fit of an object, even if there aren't 2 detections. Probably wouldn't use it for the daily runs so as to not take too many resources on MSI, but might be good just for illustrative purposes. Would probably be added to make_jobs.py and would alter behavior around Line 138 of this commit
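A minimal sketch of what the flag could look like; the parser slice and the `should_fit` gate are hypothetical stand-ins for the detection-count check in make_jobs.py:

```python
import argparse

def build_parser():
    """Hypothetical slice of the make_jobs.py argument parser."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--force-fit", action="store_true",
                        help="submit a fit even when fewer than 2 detections exist")
    return parser

def should_fit(n_detections, force=False):
    """Stand-in for the detection-count gate: normally require at least
    2 detections, but let --force-fit override for illustrative runs."""
    return force or n_detections >= 2
```

The daily cron invocation would simply never pass `--force-fit`, keeping MSI usage unchanged.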
So, inspecting the data a bit more closely, I think there's an issue with the way I use the histplot argument which results in double counting or something like that; this is noticeable when comparing the lineplot and histplot implementations of the numDailyCands plot. Need to investigate further, especially for the sampling times plots, since those are planned to be added to the paper.
Max should be 37, but as of the current commit, it seems to be somewhere north of 50 according to numDailyCandHist. Probably a mistake in the dataframe filter, but I'll have to investigate further
Reviewing the code, I'm unsure if the step that creates jobs using the [Model]job.txt scripts is really necessary. The only significant difference between them is the hard-coded number of live points and cpus as well as the specific cluster to run the job on. I realize we have each object/model fit submitted as a separate job to create a pseudo-parallelized process, but perhaps we could streamline this by creating one generic script through which the jobs are created? It's currently manageable, but if we start to add more models or create a system for dynamically adding models, it would quickly become excessively unwieldy
The way a lot of stats.py functions work is finding the number of directories corresponding to a day of fits, but this might cause issues if there are days where not all models are fit (think trying to plot or sum arrays/lists of different lengths).
A solution might be to use the day directory name as the index or a column of a pandas dataframe, with additional columns for the candidate_fits directory/file, the object in question, and each model. Could then search for each expected model in the object directory and place a Null flag or 0 value if the fit (probably the result.json) is not present.
This would potentially allow for an easy way of comparing the number of models that were fit to candidates against the ones that were not successfully fit. This would require a change in the behavior at the start of stats.py and some of the ways different functions process the data for plotting, but this would also offload the amount of redundant work being done to compile file statistics, as this would all be completed at the start
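The day/object/model table described above could be built roughly like this. The directory layout `candidate_fits/<day>/<object>/<model>/result.json` and the model list are assumptions taken from the surrounding notes, not a confirmed description of the repo:

```python
import os
import pandas as pd

# Hypothetical model set; the real pipeline's list may differ.
MODELS = ["Bu2019lm", "Piro2021", "nugent-hyper", "TrPi2018"]

def fit_table(candidate_fits):
    """Build one row per (day, object) with a 0/1 column per model
    flagging whether that model's result.json exists, so missing fits
    show up as zeros instead of breaking array lengths downstream."""
    rows = []
    for day in sorted(os.listdir(candidate_fits)):
        day_dir = os.path.join(candidate_fits, day)
        if not os.path.isdir(day_dir):
            continue
        for obj in sorted(os.listdir(day_dir)):
            obj_dir = os.path.join(day_dir, obj)
            if not os.path.isdir(obj_dir):
                continue
            row = {"day": day, "object": obj}
            for model in MODELS:
                result = os.path.join(obj_dir, model, "result.json")
                row[model] = 1 if os.path.exists(result) else 0
            rows.append(row)
    return pd.DataFrame(rows)
```

Summing the model columns then gives fitted-vs-missing counts per day for free, e.g. `fit_table(path).groupby("day")[MODELS].sum()`.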
Would be good to add something to handle stalled fitting jobs beyond just having MSI kill the job at the time limit, because that kill behaviour results in no .fin file being created for the job, which messes with the overall pipeline as a result.
Looking at the cumFitTimeStack vs cumFitTime, it looks like there might be an issue with the way seaborn calculates the cumulative fit time for the histplot that's ordered by hue vs the overlapped version.
Need to reduce bloat in main directory, particularly with respect to priors being kept in the root directory. Also need to make clear what scripts do what and what order they're called.
The current state of the repo is geared explicitly towards using nmma_fitter on the MSI system; at various points, the pipeline makes explicit assumptions about absolute file paths located on MSI. Work needs to be done to make these file paths either relative or arguments provided when executing the scripts.
One of my ongoing projects is getting the pipeline working on local systems, which should hopefully motivate more platform-agnostic changes to the codebase
As of August 4th, it seems that the pipeline is unable to connect with schoty in order to check for new data or sync new fits. When attempting to connect to schoty independently, I am told the previously-used password is incorrect. Will need to follow up about schoty status.
As it stands, the current implementation doesn't account for instances where fits have occurred but, for whatever reason, the json file can't be read. While this is an edge case, it's something the slack bot accounts for, so it would be good to find a way to account for it in the stats area
For breaking down stats on job lengths, it would be nice to have a consistent way of checking how long jobs take
When attempting to run the pipeline on paper candidates, I noticed that it currently can't process two files on the same day that represent the forced and non-forced lightcurves of one object. This is because make_jobs.py generates the candname variable by splitting the file name at underscores and taking the second element of the resulting list (see line 113 of make_jobs.py). This is a fairly edge-case issue, especially for daily automated runs, but it shouldn't be too hard to add something to prevent it from occurring.
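A sketch of one fix: keep any trailing tag after the candidate name as part of the key so forced and unforced files stay distinct. The filename pattern `lc_<candname>[_tag].dat` is an assumption inferred from the underscore-split behavior described above, not confirmed from the repo:

```python
import os

def parse_candname(filename):
    """Derive a unique candidate key from a data filename. Assumes names
    like 'lc_ZTF22abcdefg.dat' or 'lc_ZTF22abcdefg_forced.dat'; keeping
    any trailing tag in the key stops the forced and unforced photometry
    files of one object from colliding on the same day."""
    stem = os.path.splitext(os.path.basename(filename))[0]
    parts = stem.split("_")
    candname = parts[1]            # mirrors the current second-element behavior
    if len(parts) > 2:             # e.g. a 'forced' tag after the name
        candname += "_" + "_".join(parts[2:])
    return candname
```

Plain daily files keep their current names under this scheme, so only the edge case changes.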
Rework make_jobs.py and nmma_fit.py so any variables that assume a specific computer (e.g. MSI) are moved out of the scripts and into settings.json, which is then read into the scripts so those values aren't hard-coded into the pipeline. This will make it a lot easier to deploy the pipeline on other slurm-based systems
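The read side of this is small; a minimal sketch, where the key names shown are illustrative rather than the pipeline's actual settings.json schema:

```python
import json

def load_settings(path="settings.json"):
    """Read machine-specific paths and options from settings.json so
    make_jobs.py and nmma_fit.py stay free of hard-coded MSI paths."""
    with open(path) as f:
        return json.load(f)

# Example usage with hypothetical keys:
# settings = load_settings()
# fit_dir = settings["candidate_fits_dir"]
# partition = settings["slurm_partition"]
```

Scripts would then fail fast with a clear KeyError on a new system instead of silently using an MSI-only absolute path.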
Would probably be good to pull the arguments and whatnot out of the stats.py function itself and have stats.py contain only function definitions.
Would require some changes to default arguments, as some of the functions assume the system arguments to exist
stats.py csv suggests there's an instance of manual fitting in candidate_fits, with the directory '000000-000001' or something like that
There's a lot of functionality that could probably be abstracted into an importable python package, but this would have to occur after #2 is addressed or else it won't be particularly useful. This would also make it easier to merge nmma_fitter into nmma main eventually
The file read-in behavior results in a dataframe that differs from one that was freshly built: indices with np.nan are read in as blank, causing issues later. Need to account for this when reading in the file
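A small round-trip sketch of the fix, assuming the file is a csv read with pandas (the column names below are made up): being explicit that empty fields map back to NaN keeps a reloaded frame comparable to a freshly built one.

```python
import io
import numpy as np
import pandas as pd

# Freshly built frame with a genuine NaN entry.
fresh = pd.DataFrame({"object": ["ZTF22aaa", "ZTF22bbb"],
                      "note": ["ok", np.nan]})

buf = io.StringIO()
fresh.to_csv(buf, index=False)
buf.seek(0)

# Explicitly treat empty fields as NaN on read-in, so the NaN survives
# the round trip instead of coming back as a blank value.
restored = pd.read_csv(buf, na_values=[""], keep_default_na=True)
```

If some code path reads the file with `keep_default_na=False` (which does turn empty fields into blank strings), a follow-up `restored.replace("", np.nan)` restores the fresh-frame behavior.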
It seems like most of the sampling_times plots either fail or don't work as expected in stats.py as of right now