benw1 / wings

WFIRST Infrared Nearby Galaxies Survey helpful code for making and processing WFIRST simulations

Python 1.84% Perl 0.06% Jupyter Notebook 64.21% Dockerfile 0.01% Roff 33.89% Shell 0.01%

wings's People

Contributors

athob, benw1, rubab1, tristan3214

wings's Issues

Versioning tables to control attempted stale updates

With concurrent sessions, we are likely to read data that later becomes stale. When sessions are highly independent and isolated, we can't easily notify them that their data has gone stale. One way to solve this is to add a version to the database tables. When you go to update a value, you check the version, and if there is a mismatch you read the new value and update again. Each update then produces a new version. This only needs to be done for the tables you expect to read and update concurrently, which avoids putting the extra work on tables that are never updated concurrently.

Here is an example of how to do this in SQLAlchemy.
https://docs.sqlalchemy.org/en/13/orm/versioning.html
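
The check-and-retry idea described above can be sketched without any ORM. This is only an illustration using the standard-library sqlite3 module as a stand-in for the real database; the `options` table, its columns and the `update_with_version` helper are hypothetical names, not part of wpipe. SQLAlchemy's `version_id_col` mapper option, documented at the link above, performs the same version comparison automatically.

```python
import sqlite3


def update_with_version(conn, row_id, new_value, max_retries=3):
    """Optimistically update one row, retrying when the version we read is stale."""
    for _ in range(max_retries):
        (version,) = conn.execute(
            "SELECT version FROM options WHERE id = ?", (row_id,)
        ).fetchone()
        # The WHERE clause only matches if nobody bumped the version since our read.
        cur = conn.execute(
            "UPDATE options SET value = ?, version = version + 1 "
            "WHERE id = ? AND version = ?",
            (new_value, row_id, version),
        )
        conn.commit()
        if cur.rowcount == 1:
            return version + 1
        # Stale read: loop to re-read the fresh version and try again.
    raise RuntimeError("row %r kept changing; giving up" % (row_id,))


# Hypothetical table used only for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE options (id INTEGER PRIMARY KEY, value TEXT, version INTEGER NOT NULL)"
)
conn.execute("INSERT INTO options VALUES (1, 'old', 1)")
conn.commit()
new_version = update_with_version(conn, 1, "new")
```

An update issued with an outdated version matches zero rows, which is the signal to re-read and retry.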

"Lock wait timeout exceeded" on import

When I attempt to import wpipe in 2 distinct python instances, the second attempt hangs, repeatedly hitting Lock wait timeout exceeded, until I either close the first python instance or call wpipe.si.commit() or wpipe.si.rollback() in it. The second attempt appears to be blocked when it tries to create the DefaultUser, specifically while it repeatedly retries a SELECT FOR UPDATE on the users table. Somehow it fails to obtain the lock requested by the FOR UPDATE clause, and this seems to be caused by the import in the first python instance. As far as my understanding of MySQL goes, this should mean that the session in the first python instance is keeping a lock on that table. However, running the query SHOW OPEN TABLES IN wpipe; from a MySQL interactive console between the 2 imports does not report the users table as being in use or locked.

I have sqlalchemy 1.3.19 installed and tried with 1.3.17, but the error is still there. I also have MySQL 5.7.31, although I haven't investigated whether or not the version of MySQL is impacting this. I have tried adding an explicit session.execute("UNLOCK TABLES") at the end of the wpipe.si.commit function, without any success.

Job feature: handling "out-of-python" sub-processes

The current implementation of wpipe keeps track of a job workload as long as it is part of the python process that corresponds to that job. It remains ignorant of any workload handled by a sub-process that this python process may submit. It would be handy to equip the Job object with methods that could help to keep track of these sub-processes.
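
As a rough sketch of what such methods might look like, the helper below records every sub-process it launches so their state can be queried later. `SubprocessTracker` and its methods are hypothetical names for illustration, not existing wpipe API; only the standard-library subprocess module is assumed.

```python
import subprocess
import sys


class SubprocessTracker:
    """Hypothetical helper: remembers sub-processes launched on behalf of a job."""

    def __init__(self):
        self._procs = []

    def submit(self, args):
        # Launch the command and keep a handle on it for later inspection.
        proc = subprocess.Popen(args)
        self._procs.append(proc)
        return proc

    def running(self):
        # poll() returns None while the process is still alive.
        return [p for p in self._procs if p.poll() is None]

    def wait_all(self):
        # Block until every tracked process has exited; return their exit codes.
        return [p.wait() for p in self._procs]


tracker = SubprocessTracker()
tracker.submit([sys.executable, "-c", "pass"])
exit_codes = tracker.wait_all()
```

A Job could hold one such tracker and report the sub-process states alongside its own.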

Job feature: kill child events jobs if parent job terminates with an error

The instance method _ending_todo of a Job object handles the exit procedure, whether the Job completed or exited with an error.

WINGS/src/wpipe/Job.py

Lines 512 to 519 in e3d7519

def _ending_todo(self):
    if hasattr(sys, "last_value"):
        self.state = repr(sys.last_value)
    else:
        self.state = JOBCOMPSTATE
    self._job.endtime = datetime.datetime.utcnow()
    self._job.timestamp = datetime.datetime.utcnow()
    si.commit()

In the latter case, it only updates the state column in the database to reflect the type of error encountered. It should additionally force-terminate the jobs of any child event it generated and fired.
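
The propagation could look roughly like the following toy model. `ToyJob`, `ToyEvent`, `terminate` and `end_with_error` are hypothetical stand-ins rather than the real wpipe classes; only the attribute names `child_events` and `fired_jobs` are borrowed from the source. A real implementation would also have to stop the underlying processes, which this sketch does not attempt.

```python
class ToyEvent:
    """Hypothetical stand-in for an event: it only lists the jobs it fired."""

    def __init__(self):
        self.fired_jobs = []


class ToyJob:
    """Hypothetical stand-in for a job holding a state and its child events."""

    def __init__(self, name):
        self.name = name
        self.state = "Running"
        self.child_events = []

    def terminate(self, reason):
        # Mark this job as killed, then do the same for all of its descendants.
        self.state = "Killed: %s" % reason
        for event in self.child_events:
            for job in event.fired_jobs:
                job.terminate(reason)

    def end_with_error(self, error):
        # Record the error as _ending_todo does, then stop everything downstream.
        self.state = repr(error)
        for event in self.child_events:
            for job in event.fired_jobs:
                job.terminate("parent %s failed" % self.name)


# Build a parent -> child -> grandchild chain and fail the parent.
parent, child, grandchild = ToyJob("parent"), ToyJob("child"), ToyJob("grandchild")
for upper, lower in ((parent, child), (child, grandchild)):
    event = ToyEvent()
    event.fired_jobs.append(lower)
    upper.child_events.append(event)
parent.end_with_error(ValueError("boom"))
```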

Testing: improve testing features

At the moment there is a functional set of data to test the functionalities of wingspipe in the corresponding directory. This can already be used to add a small test of wingspipe init, wingspipe run and wingspipe delete to the github workflow.

Ideally, the test suite should be more thorough. Notably, it should check:

  • whether the database has been updated accordingly
  • whether the wingspipe run processed correctly

To be fully complete, the test suite could also test each object and method, with each source file having its own test module.

Pipeline feature: diagnose method

WINGS/src/wpipe/Pipeline.py

Lines 507 to 511 in e3d7519

def diagnose(self):
    """
    Diagnose current state of the pipeline. TODO
    """
    pass

This method is meant to perform a thorough diagnosis of the pipeline's overall state, in particular whether it is currently active or inactive, i.e. whether or not it has running jobs. It is mostly meant to be used via the wingspipe diagnose command, which calls the corresponding case in the wpipe.wingspipe function:

WINGS/src/wpipe/__init__.py

Lines 296 to 297 in e3d7519

elif args.which == 'diagnose':
    my_pipe.diagnose()

As such, its implementation should print a report of the current status of the pipeline's tree of jobs. Such a report should give a clear overview of whether the pipeline is currently running and, if not, whether it has completed or terminated prematurely due to an error.
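
One possible shape for such a report is sketched below. The `diagnose_report` function and its input format (a flat list of name, is_active, has_completed triples instead of the real job tree) are assumptions made for illustration; the flag names mirror the `is_active` and `has_completed` attributes used by Event.fire.

```python
def diagnose_report(jobs):
    """Build a text report from (name, is_active, has_completed) triples."""
    lines = []
    statuses = []
    for name, is_active, has_completed in jobs:
        if is_active:
            status = "running"
        elif has_completed:
            status = "completed"
        else:
            status = "terminated with error"
        statuses.append(status)
        lines.append("  %s: %s" % (name, status))
    # The pipeline is active as soon as one job is still running.
    if "running" in statuses:
        overall = "active"
    elif all(status == "completed" for status in statuses):
        overall = "inactive (completed)"
    else:
        overall = "inactive (terminated with error)"
    return "\n".join(["Pipeline is %s" % overall] + lines)


# Hypothetical job names, used only to exercise the sketch.
report = diagnose_report([("prep", False, True), ("sim", False, False)])
print(report)
```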

Pipeline run: implement more re-run cases

The current implementation of wingspipe run walks through all child events and fired jobs, skipping or re-running jobs based on the case-by-case assessment described in the fire method of each child event:

WINGS/src/wpipe/Event.py

Lines 311 to 346 in e3d7519

def fire(self):
    """
    Fire the task associated to this event.
    Notes
    -----
    If this task was already previously fired, the method check if its
    job has completed. In the case it did, it calls the fire method of
    each child event of that jobs. In the case, it did not complete, it
    either fires the task again if the job did not complete due to an
    error, or it does nothing if the job is just still running.
    """
    if len(self.fired_jobs):
        fired_job = self.fired_jobs[-1]
        if fired_job.has_completed:
            if len(fired_job.child_events):
                for child_event in fired_job.child_events:
                    child_event.fire()
            else:
                print()  # that branch has completed
        else:
            if fired_job.is_active:
                print()  # fired_job keep going
            else:
                if fired_job.task_changed:
                    self.__fire(fired_job.task)
                else:
                    print()  # task will produce same error
    else:
        for task in self.pipeline.tasks:
            for mask in task.masks:
                if (self.name == mask.name) & ((self.value == mask.value) | (mask.value == '*')):
                    self.__fire(task)
                    return
        raise ValueError(
            "No mask corresponding to event signature {name='%s',value='%s'}" % (self.name, self.value))

It should be implemented in such a way that it re-runs jobs:

  • if the parent task was changed, even if the job completed (currently it simply skips completed jobs, exploring their child events if they generated some)
  • if the parent config file was modified (to take into account cases where jobs exited with an error due to an incomplete or wrong config)
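
The extended decision could be expressed as a small pure function, sketched here under assumptions. `rerun_decision` and the `config_changed` flag are hypothetical; `has_completed`, `is_active` and `task_changed` mirror the attributes already used by the fire method.

```python
def rerun_decision(has_completed, is_active, task_changed, config_changed):
    """Return what to do with a previously fired job (hypothetical helper)."""
    if is_active:
        return "wait"      # the job is still running, leave it alone
    if task_changed or config_changed:
        return "rerun"     # inputs changed, so re-run even a completed job
    if has_completed:
        return "descend"   # nothing changed: move on to the child events
    return "skip"          # failed and nothing changed: same error expected


# A completed job whose task was edited should be re-run.
decision = rerun_decision(
    has_completed=True, is_active=False, task_changed=True, config_changed=False
)
```

Keeping the decision separate from the firing logic would make each re-run case easy to test on its own.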

wingspipe: add verbose flag

All wingspipe subcommands could have a -v/--verbose flag to describe in detail what they do. It matters most for wingspipe run, which should indicate when it meets a completed job and routes through its child events, and when it meets a job that didn't complete and attempts to re-run it because it exited with an error.
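
A shared flag of this kind can be declared once with argparse and inherited by every subcommand. The sketch below is an assumption about how the parser could be laid out, not the actual wpipe parser; `build_parser` and `log` are hypothetical names, and `dest="which"` is chosen only to match the `args.which` attribute seen in wpipe/__init__.py.

```python
import argparse


def build_parser():
    """Hypothetical parser layout: every subcommand inherits -v/--verbose."""
    common = argparse.ArgumentParser(add_help=False)
    common.add_argument(
        "-v", "--verbose", action="store_true",
        help="describe in detail what the subcommand does",
    )
    parser = argparse.ArgumentParser(prog="wingspipe")
    subparsers = parser.add_subparsers(dest="which")
    for name in ("init", "run", "delete", "diagnose"):
        subparsers.add_parser(name, parents=[common])
    return parser


def log(args, message):
    # Print progress messages only when the user asked for them.
    if args.verbose:
        print(message)


args = build_parser().parse_args(["run", "-v"])
log(args, "job has completed, routing through its child events")
```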
