radiasoft / devops
sysadmin
License: Apache License 2.0
Trying to run Warp in an IPython notebook on the Jupyter server on the blade by running the standard Warp import command:
from warp import *
Results in the error:
usage: main.py [-h] [-p DECOMP DECOMP DECOMP] [-l LOCALFLAGS]
[--pnumb PNUMB]
main.py: error: unrecognized arguments: -f /home/vagrant/.local/share/jupyter/runtime/kernel-5336e0f5-6a55-47c3-ba4f-1a37654a5290.json
Trying to import Warp through warp_init_tools gives a similar error.
Calling a Python Warp command file from the command line will run Warp successfully.
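The usage message suggests that Warp parses sys.argv at import time and rejects the "-f <connection file>" argument that Jupyter passes to the kernel. If that diagnosis is right, one possible workaround is to hide the extra arguments while the import runs. This is a minimal sketch, not a confirmed fix; clean_argv is a hypothetical helper and is not part of Warp.

```python
import contextlib
import sys


@contextlib.contextmanager
def clean_argv(keep=1):
    """Temporarily truncate sys.argv to its first `keep` entries."""
    saved = sys.argv
    sys.argv = saved[:keep]
    try:
        yield
    finally:
        # Restore the original list so the kernel is unaffected afterwards.
        sys.argv = saved


# Assumed usage inside a notebook cell (requires Warp to be installed):
# with clean_argv():
#     from warp import *
```

If the import still fails with the arguments hidden, the cause is something other than argument parsing.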
The postgres db isn't crucial, as the user data are all in NFS. However, we should back up the db on a regular basis. The backup has to run on the JupyterHub machine, because the db is written locally.
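A regular backup could be sketched as below. The database name and backup directory are placeholders chosen for illustration; only pg_dump and its --format and --file options are standard. The script would be scheduled (for example from cron) on the JupyterHub machine itself.

```python
import datetime
import os
import subprocess


def pg_dump_command(dbname, backup_dir, today=None):
    """Build a pg_dump command that writes a dated, custom-format archive."""
    today = today or datetime.date.today()
    target = os.path.join(backup_dir, "{}-{}.dump".format(dbname, today.isoformat()))
    return ["pg_dump", "--format=custom", "--file", target, dbname]


def run_backup(dbname, backup_dir):
    """Run the dump locally; raises CalledProcessError if pg_dump fails."""
    subprocess.check_call(pg_dump_command(dbname, backup_dir))
```

Writing the dump to a directory on NFS would keep the backup alongside the user data, off the local disk.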
@cchall writes:
So I discovered today while using an ipython notebook on the blade that the notebook will fail to save if it becomes too large. I'm not sure exactly what the size limit is or where the failure point occurs, but I was in the middle of analyzing 1000, 1MB files when I discovered the failure.
I can restart the kernel, clear all output, and then save, but this is a little unsatisfactory. It would be preferable to save just the notebook cells without their output, so that I don't have to stop work just to save.
I'm not sure if this is a restriction that is being imposed by ipython or by the fact that I am running on the blade but I wanted to throw the problem out there.
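Until the size limit is understood, one way to get a small copy of a notebook is to strip the outputs from the saved file. The sketch below assumes the version 4 notebook layout (a top-level "cells" list in which code cells carry "outputs" and "execution_count"); strip_outputs and strip_file are illustrative names, not part of Jupyter.

```python
import json


def strip_outputs(notebook):
    """Clear the outputs of every code cell in a notebook dict, in place."""
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return notebook


def strip_file(src, dst):
    """Read a version 4 .ipynb file and write a copy without outputs."""
    with open(src) as f:
        notebook = json.load(f)
    with open(dst, "w") as f:
        json.dump(strip_outputs(notebook), f, indent=1)
```

This only helps once a copy of the notebook exists on disk; it does not change what the running notebook tries to save.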
In order to help deter attacks, it made sense to me to give host names secret names. However, I'm beginning to rethink this strategy.
Let's say we have salt.radiasoft.net, which is our salt master. That's a very important host. Would it matter that the host name is secret? What if its IP address were known? The host has to be secured so that external probes of any port find nothing vulnerable. Therefore, simply knowing the name is not really knowing anything useful.
Once inside our network, the name would be obvious: just look in /etc/salt/* or for a file containing "master:". That's where the real attack might come from, because the salt minions call into the master. If there are any bugs in salt, that's where the breach would occur.
So I think we should drop the secret hostname policy.
Debian uses aufs as the docker storage device by default. Need to use devicemapper, because user ids do not match on shared volumes and chowns have to happen sometimes. Need to reinstall docker with devicemapper using loopback (usually vagrant or test machines) unless already installed with devicemapper with real block devices (production).
Need to configure ipython on apa19 for multiple users.
BNL uses Debian Jessie (8.1). Need installer for full service
Not sure what's going on with apa1, but the builds are failing with rpmdb errors. Need to create a Fedora 2X build system. For now will build on my VM.
Robots are finding us, which may be what is causing user accounts to be created.
At the very least protect against force pushing. See:
The SSL cert is expiring today, and we aren't renewing. Updaters will get confused so need to kill them.
We need to decide which server or VM.
License info will be communicated privately.
I need to run Synergia via Jupyter hub on the blades.
Recent fixes made by the Synergia development team are required.
The last commit that must be included:
commit 8d8761f3597a668b7d2ecc6bd770640006412af6
Author: Eric G. Stern
Date: Fri Mar 25 17:17:11 2016 -0500
We don't get that much spam to radiasoft-misc (maybe one a day). I did a very quick check and half of it would not be flagged even by a loose spam filter.
I'll leave this ticket here, but it is not a high priority.
Sirepo image builds happen nightly. Ideally, push to alpha automatically. Eventually, run test suite.
Would like to have the build machine use Salt to manage the builds and pushes. This may add too much complexity right now.
Don't want automatic checkin builds, because state may be inconsistent on build (e.g. SRW not updated simultaneously). Nightly seems a good compromise.
Needed to move off some personal stuff that was clogging the backups so we have space for owncloud
@mrakitin please verify alpha works to your satisfaction. We'll push to beta once you give ok.
Services are starting, but systemctl is reporting that they aren't running.
A cluster at BNL fails to use all cores when run in Docker. Running outside Docker runs normally.
We have isolated it to running N instances of SRW inside a single container. If you run 2 instances (i=2) with 4 slaves each (n=4, mpiexec -n 4), 8 cores are used. However, if you run, say, i=4 and n=4, it still uses only 8 cores instead of the expected 16.
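The expected number of busy cores is i times n, so i=4 with n=4 should occupy 16. One thing that may be worth checking from inside the container is how many CPUs the processes are allowed to run on. This is only a diagnostic sketch, not a confirmed cause; os.sched_getaffinity is available on Linux only, and the function names are illustrative.

```python
import os


def expected_busy_cores(instances, ranks_per_instance):
    """Each SRW instance runs `mpiexec -n ranks_per_instance`."""
    return instances * ranks_per_instance


def allowed_cores():
    """Number of CPUs this process may be scheduled on (Linux only)."""
    return len(os.sched_getaffinity(0))


def report(instances, ranks_per_instance):
    """Compare the cores the run needs with the cores the process may use."""
    want = expected_busy_cores(instances, ranks_per_instance)
    have = allowed_cores()
    return "need {} cores, process is allowed {}".format(want, have)
```

Running report(4, 4) both inside and outside the container would show whether the container limits the usable CPUs.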
Adding @mrakitin
https://github.com/radiasoft/rssynergia
We have users of the Jupyter server, who need this library for use of our IPython notebooks.
We need them to be able to do 'pip install' and 'pip install --upgrade'
@bruhwile writes:
I was able to login and run the 'bragg reflector' example.
However, after 30 seconds or so, my browser got a '504 Gateway Time-out' error.
Using latest version of Chrome on Win10
Files ending in .py and .txt can no longer be opened through the file browser on JupyterHub. The files can still be selected and renamed, and if the extension is changed, the file can be opened as normal.
@bruhwile wrote:
uploaded the following IPython Notebook:
https://github.com/radiasoft/rssynergia/blob/master/rssynergia/spacecharge_tests/sc_drift_expansion.ipynb
In the 2nd cell, I got the error that 'tables' could not be imported.
I opened a new terminal window in Jupyter and typed
pip install tables
This executed correctly and then the 2nd cell of the NB executed without error.
I then tried to execute the 3rd cell and got the following errors:
ImportError Traceback (most recent call last)
in ()
----> 1 from base_diagnostics import utils
2 from base_diagnostics import StandardBeam6D
3 from base_diagnostics import read_bunch
4 from base_diagnostics import workflow
5 from base_diagnostics import lfplot
ImportError: No module named base_diagnostics
So I did the following:
cd /home/vagrant/src
mkdir radiasoft
cd radiasoft
git clone https://github.com/radiasoft/rssynergia.git
cd rssynergia
pip install -e .
Trying again to execute the 3rd cell, I got a long string of error messages of the following form:
[autoreload of IPython.terminal.ipapp failed: Traceback (most recent call last):
File "/home/vagrant/.pyenv/versions/2.7.10/lib/python2.7/site-packages/IPython/extensions/autoreload.py", line 247, in check
superreload(m, reload, self.old_objects)
ImportError: cannot import name check_for_old_config
]
I decided to stop and restart the server.
Stopping it worked, but there was no way to restart it.
I logged out and logged back in.
Still no server and no way to start one.
Use the wiki associated with the 'containers' repo.
The official Synergia build system doesn't work inside our containers.
I know some sort of fix is being used to make this work for Synergia installation when the containers are built.
RadiaSoft personnel need to be able to clone and build the latest version of Synergia2.
Is the fix documented somewhere?
Even better, can we communicate with the Synergia development team to make the fix unnecessary?
Presumably the OS being used in our containers is sufficiently 'standard' that the Synergia team will want to support it?
Install mrakitin/SRW@b4023e1 on alpha
Be able to hide and show the list of participants etc.
Need to identify a machine.
The license requires node locking.
Interaction with the code involves a GUI, which implies tunneling X through ssh.
We also want to execute physics extensions, like Shadow3, which implies use of a Docker container.
We need to explore use of the Application Builder technology, which requires browser-based access.
We need a "one button" solution to the jupyterhub push, because it goes to three machines, and we would really like our internal users to do the pushes.
The problem is that updates often require changing version numbers/commits, e.g. radiasoft/containers@e412694, which I forgot to update before running the build. It would be cool if there was a Q&A (does warp need updating? Synergia? etc.) at the time of the push.
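The Q&A step could be sketched as a small pre-push check like the one below. The component names and the prompt wording are illustrative, and the actual push is left out, since the push procedure is not described here.

```python
def confirm_versions(components, ask=input):
    """Ask about each component; return the ones that still need updating."""
    stale = []
    for name in components:
        answer = ask("Does {} need updating? [y/N] ".format(name))
        if answer.strip().lower().startswith("y"):
            stale.append(name)
    return stale


def ready_to_push(components, ask=input):
    """Refuse the push while any component is reported as out of date."""
    stale = confirm_versions(components, ask)
    if stale:
        print("Update these before pushing: " + ", ".join(stale))
        return False
    return True
```

For example, ready_to_push(["warp", "synergia"]) would prompt once per component and return True only when every answer is "no".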
Fedora Atomic is too difficult to manage. With containers used for all deployment, there's little point in having a system with transactional OS updates. Andy S. pointed out that deploying with containers is itself what makes transactional updates possible. This doesn't work for the containers themselves anyway.
Need to set up an owncloud server and copy the files from copy.com.
https://github.com/radiasoft/rsbeams
We have users of the Jupyter server, who need this library for use of our IPython notebooks.
We need them to be able to do 'pip install' and 'pip install --upgrade'
Each repo directory might have a "template.ipynb" which would be loaded when new notebook was created. @ncook862
Two essential updates have been added to Synergia:
We need these on the Jupyter server for the IOPTICS project.
The 'chef' library has definitely been updated.
Here's a comment from the development team:
The NonLinearPropagators.cc file lives in the build/chef-libs hierarchy. If you do a brand new build of everything you will get the updated code. Alternatively, if you do a
git pull
make install
inside of the chef-libs directory, you will be updated with the fix.
Need to document the various procedures for pushing to various systems including manual system checks.
Prepare and give tutorial on git topics for the team. This includes expanding the Git wiki.
The beta server fails to start with:
You need to start Celery:
celery worker -A sirepo.celery_tasks -l info -c 1
This check needs to be more flexible and/or loop in production. Celery is running on one or more other servers. When the front end reboots, rabbitmq disconnects from all clients, including celery. Celery needs time to reconnect, and sirepo needs to wait for it.
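A looping version of the check could look like the sketch below. The check callable is a placeholder: in Sirepo it might wrap whatever test currently produces the "You need to start Celery" message, which is an assumption about how that check is implemented. The attempt count and delay are arbitrary defaults.

```python
import time


def wait_until_ready(check, attempts=30, delay=2.0, sleep=time.sleep):
    """Call check() until it returns True or the attempts run out."""
    for attempt in range(attempts):
        try:
            if check():
                return True
        except Exception:
            # Treat connection errors as "not ready yet" and keep waiting.
            pass
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

The server would fail to start only when wait_until_ready returns False, which gives Celery time to reconnect after a rabbitmq restart.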
Automate build, test, and deployment of srio/shadow3.
Installed comsol in multi-computer mode on apa19
It's inevitable that zfs fails to mount after a yum update.
I think the machine is just in a funny state. All the other servers work fine.
We want to move to Fedora so this would be one of the machines that moves, since it is really a production server now.
If you don't remove the container, you keep on getting the old version of the code.
Need an overview of all repos. There are many. :)
Also may want general documentation in radiasoft/index, too.
For example FFMPEG https://ffmpeg.org/
and/or other matplotlib-compatible encoders
This capability is already required for the Jupyter notebooks we are delivering to a customer.
We need a system for configuring MPI cluster from jupyter