devops's People

Contributors

robnagler avatar

devops's Issues

Warp Import Error on Jupyter Server

Trying to run Warp in an IPython notebook on the Jupyter server on the blade by running the standard Warp import command:
from warp import *

Results in the error:

usage: main.py [-h] [-p DECOMP DECOMP DECOMP] [-l LOCALFLAGS]
[--pnumb PNUMB]
main.py: error: unrecognized arguments: -f /home/vagrant/.local/share/jupyter/runtime/kernel-5336e0f5-6a55-47c3-ba4f-1a37654a5290.json

Trying to import Warp through warp_init_tools gives a similar error.
Calling a Python Warp command file from the command line will run Warp successfully.
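The usage message suggests that an argument parser runs against `sys.argv` when Warp is imported, and that it rejects the `-f …kernel-….json` argument the Jupyter kernel is started with. A minimal sketch of that failure mode, assuming an `argparse` parser shaped like the usage text above (the parser below is a hypothetical reconstruction, not Warp's actual code, and the kernel path is made up):

```python
import argparse

def build_parser():
    # Hypothetical parser mirroring the usage text in the error; not Warp's real code.
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("-p", "--decomp", nargs=3)
    parser.add_argument("-l", "--localflags")
    parser.add_argument("--pnumb")
    return parser

# Arguments in the form a Jupyter kernel receives them (path is a placeholder).
kernel_argv = ["-f", "/tmp/kernel-example.json"]

parser = build_parser()

# Strict parsing rejects the unknown -f option and exits, as in the report.
try:
    parser.parse_args(kernel_argv)
    strict_ok = True
except SystemExit:
    strict_ok = False

# Tolerant parsing keeps the known options and hands back the rest untouched.
known, unknown = parser.parse_known_args(kernel_argv)
print(strict_ok, unknown)
```

If this is indeed the cause, one possible workaround in a notebook is to set `sys.argv = sys.argv[:1]` before `from warp import *`. That has not been verified against Warp here.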

Backup JupyterHub Postgres

The postgres db isn't crucial, as the user data are all in NFS. However, we should back up the db on a regular basis. The backup has to run on the JupyterHub machine, because it writes locally.
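A regular backup could be as small as a script run from cron on the JupyterHub machine. The sketch below builds a `pg_dump` command that writes a dated custom-format archive; the database name `jupyterhub` and the directory `/var/backups/jupyterhub` are assumptions for illustration, not the real deployment values.

```python
import datetime
import subprocess

def pg_dump_command(db_name, backup_dir, today=None):
    """Build a pg_dump command that writes a dated, custom-format archive."""
    today = today or datetime.date.today()
    target = f"{backup_dir}/{db_name}-{today:%Y%m%d}.dump"
    return ["pg_dump", "--format=custom", f"--file={target}", db_name]

def run_backup(db_name="jupyterhub", backup_dir="/var/backups/jupyterhub"):
    # Placeholder names; must run on the host where the database lives.
    subprocess.run(pg_dump_command(db_name, backup_dir), check=True)

# Show the command that would be run for a fixed example date.
example = pg_dump_command(
    "jupyterhub", "/var/backups/jupyterhub", datetime.date(2016, 1, 31)
)
print(" ".join(example))
```

Calling `run_backup()` from a daily cron job would give one archive per day; pruning old archives is left out of this sketch.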

large ipython notebook saves fail

@cchall writes:
So I discovered today while using an ipython notebook on the blade that the notebook will fail to save if it becomes too large. I'm not sure exactly what the size limit is or where the failure point occurs, but I was in the middle of analyzing 1000 files of 1 MB each when I discovered the failure.

I can restart the kernel, clear all output, and then save, but this is a little unsatisfactory. It would be preferable to save just the notebook cells, without the notebook trying to save output, so that I don't have to stop work just to make a save.

I'm not sure if this is a restriction that is being imposed by ipython or by the fact that I am running on the blade but I wanted to throw the problem out there.

secret host names?

In order to help deter attacks, it made sense to me to give host names secret names. However, I'm beginning to rethink this strategy.

Let's say we have salt.radiasoft.net, which is our salt master. That's a very important host. Would it matter whether the host name is secret? What if its IP address were known? The host has to be secured so that external probes of any port find nothing vulnerable. Therefore, simply knowing the name is not really knowing anything useful.

Once inside our network, the name would be obvious (just look in /etc/salt/* or for a file with the contents "master:"). That's where the real attack might come from, because the salt minions call into the master. If there are any bugs in salt, that's where the breach would occur.

So I think we should drop the secret hostname policy.

chown fails inside docker container using AUFS

Debian uses aufs as the Docker storage driver by default. We need to use devicemapper, because user ids do not match on shared volumes and chowns sometimes have to happen. Docker needs to be reinstalled with devicemapper using loopback (usually vagrant or test machines) unless it is already installed with devicemapper on real block devices (production).

file download option on Jupyter server is broken

If you select the square to the left of a filename in the Jupyter server, a new UI pops up with options:
Duplicate Rename Move Download

See screenshot:
image

Selecting the 'Download' button should download the file, but instead you see the error message here:
image

kill radtrack updaters

The SSL cert is expiring today, and we aren't renewing. Updaters will get confused so need to kill them.

filter spam on radiasoft-misc

We don't get that much spam to radiasoft-misc (maybe one a day). I did a very quick check and half of it would not be flagged even by a loose spam filter.

I'll leave this ticket here, but it is not a high priority.

Automate sirepo alpha builds

Sirepo image builds happen nightly. Ideally, push to alpha automatically. Eventually, run test suite.

Would like to have the build machine use Salt to manage the builds and pushes. This may add too much complexity right now.

Don't want automatic checkin builds, because state may be inconsistent on build (e.g. SRW not updated simultaneously). Nightly seems a good compromise.

Docker not using all cores

A cluster at BNL fails to use all cores when run in Docker. Running outside Docker runs normally.

We have isolated it to running N instances of SRW inside a single container. If you run 2 instances (i=2) with 4 slaves each (n=4, mpiexec -n 4), 8 cores are used. However, if you run, say, i=4 and n=4, it still only uses 8 cores rather than the 16 you would expect.

Adding @mrakitin
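One thing worth ruling out is whether processes inside the container are allowed to run on every core. The sketch below compares the number of cores the instances would want (i times n) with the CPUs the current process may use. It checks only this one possible cause, which the report does not confirm, and the function names are made up for illustration.

```python
import os

def visible_cpus():
    """Return the number of CPUs this process is allowed to run on."""
    if hasattr(os, "sched_getaffinity"):
        # Linux: reflects affinity masks, e.g. a cpuset applied to a container.
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1

def expected_busy_cores(instances, ranks_per_instance):
    """Cores that would be busy if every MPI rank got its own core."""
    return instances * ranks_per_instance

def report(instances, ranks_per_instance):
    wanted = expected_busy_cores(instances, ranks_per_instance)
    available = visible_cpus()
    return {
        "wanted": wanted,
        "available": available,
        "oversubscribed": wanted > available,
    }

# The failing case from the report: 4 instances with 4 ranks each.
print(report(4, 4))
```

Running this both inside and outside the container would show whether the two environments expose a different number of CPUs.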

timeouts on comsol websockets

@bruhwile writes:

I was able to login and run the 'bragg reflector' example.
However, after 30 seconds or so, my browser got a '504 Gateway Time-out' error.
Using latest version of Chrome on Win10

jupyter server "tables not installed"

@bruhwile wrote:

uploaded the following IPython Notebook:
https://github.com/radiasoft/rssynergia/blob/master/rssynergia/spacecharge_tests/sc_drift_expansion.ipynb

In the 2nd cell, I got the error that 'tables' could not be imported.

I opened a new terminal window in Jupyter and typed

pip install tables

This executed correctly and then the 2nd cell of the NB executed without error.

I then tried to execute the 3rd cell and got the following errors:

ImportError                               Traceback (most recent call last)
in ()
----> 1 from base_diagnostics import utils
      2 from base_diagnostics import StandardBeam6D
      3 from base_diagnostics import read_bunch
      4 from base_diagnostics import workflow
      5 from base_diagnostics import lfplot

ImportError: No module named base_diagnostics

So I did the following:

cd /home/vagrant/src
mkdir radiasoft
cd radiasoft
git clone https://github.com/radiasoft/rssynergia.git
cd rssynergia
pip install -e .

Trying again to execute the 3rd cell, I got a long string of error messages of the following form:
[autoreload of IPython.terminal.ipapp failed: Traceback (most recent call last):
File "/home/vagrant/.pyenv/versions/2.7.10/lib/python2.7/site-packages/IPython/extensions/autoreload.py", line 247, in check
superreload(m, reload, self.old_objects)
ImportError: cannot import name check_for_old_config
]

I decided to stop and restart the server.
Stopping it worked, but there was no way to restart it.
I logged out and logged back in.
Still no server and no way to start one.

Synergia2 'devel' branch doesn't build inside RadiaSoft containers

The official Synergia build system doesn't work inside our containers.

I know some sort of fix is being used to make this work for Synergia installation when the containers are built.

RadiaSoft personnel need to be able to clone and build the latest version of Synergia2.

Is the fix documented somewhere?
Even better, can we communicate with the Synergia development team to make the fix unnecessary?
Presumably the OS being used in our containers is sufficiently 'standard' that the Synergia team will want to support it?

install COMSOL Multiphysics

Need to identify a machine.
The license requires node locking.
Interaction with the code involves a GUI, which implies tunneling X through ssh.
We also want to execute physics extensions, like Shadow3, which implies use of a Docker container.
We need to explore use of the Application Builder technology, which requires browser-based access.

Automate jupyterhub push

We need a "one button" solution to the jupyterhub push, because it goes to three machines, and we would really like our internal users to do the pushes.

The problem is that updates often require changing version numbers/commits, e.g. radiasoft/containers@e412694, which I forgot to update before running the build. It would be cool if there was a Q&A (does warp need updating? Synergia? etc.) at the time of the push.
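The Q&A idea could start as a small pre-push checklist that refuses to continue while any component is flagged as needing a version bump. This is only a sketch: the component names come from the examples above, the function names are hypothetical, and the actual push step is left out.

```python
def ask_versions(components, ask=input):
    """Ask, per component, whether its pinned version needs updating."""
    needs_update = []
    for name in components:
        answer = ask(f"Does {name} need updating? [y/N] ").strip().lower()
        if answer in ("y", "yes"):
            needs_update.append(name)
    return needs_update

def confirm_push(components, ask=input):
    """Return True only if nothing is flagged as needing a version bump."""
    pending = ask_versions(components, ask)
    if pending:
        print("Update pinned versions first:", ", ".join(pending))
        return False
    return True

# Scripted answers stand in for a person at the keyboard.
answers = iter(["n", "y"])
ok = confirm_push(["warp", "synergia"], ask=lambda prompt: next(answers))
print(ok)
```

Passing the prompt function in as `ask` keeps the checklist testable without a terminal; in interactive use the default `input` applies.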

Switch from Fedora Atomic to Fedora or CentOS

Fedora Atomic is too difficult to manage. With the use of containers for all deployment, there's little point to having a system that has transactional OS updates. Andy S. pointed out that the fact that we are deploying with containers makes it possible to have transactional updates. This doesn't work for the containers themselves anyway.

Add templates to JupyterHub

Each repo directory might have a "template.ipynb", which would be loaded when a new notebook is created. @ncook862

Synergia needs to be updated on the Jupyter server

Two essential updates have been added to Synergia:

  1. explicitly linear space charge forces (some weeks ago)
  2. corrected implementation of nonlinear 'elliptic' magnet for the IOTA ring (yesterday)

We need these on the Jupyter server for the IOPTICS project.

The 'chef' library has definitely been updated.
Here's a comment from the development team:

The NonLinearPropagators.cc file lives in the build/chef-libs hierarchy. If you do a brand new build of everything you will get the updated code. Alternatively, if you do a

git pull
make install

inside of the chef-libs directory, you will be updated with the fix.

Operating procedures

Need to document the various procedures for pushing to various systems including manual system checks.

git tutorial

Prepare and give tutorial on git topics for the team. This includes expanding the Git wiki.

  • GitHub comments/subjects (maybe)
  • reflog, what's that?
  • rebase/merge?
  • resets (commits, unstaged, etc.)
  • stash
  • what does the RadiaSoft workflow look like?

sirepo restart fails on beta

The beta server fails to start with:

You need to start Celery:
celery worker -A sirepo.celery_tasks -l info -c 1

This check needs to be more flexible and/or loop in production. Celery is running on another server(s). When the front end reboots, rabbitmq disconnects from all clients including celery. Celery has to have time to reconnect and sirepo needs to wait for it.
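A more flexible check could poll for a while instead of failing on the first attempt. The sketch below is a generic wait loop, not sirepo's code: `wait_until_ready` and the stand-in probe are hypothetical, and in a real deployment the probe would be whatever call confirms that workers respond (for example a ping through Celery's inspect API, which is an assumption about how this would be wired up).

```python
import time

def wait_until_ready(probe, timeout=60.0, interval=2.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Call probe() until it returns True or timeout seconds have passed."""
    deadline = clock() + timeout
    while True:
        try:
            if probe():
                return True
        except Exception:
            # Treat connection errors as "not ready yet" rather than fatal.
            pass
        if clock() >= deadline:
            return False
        sleep(interval)

# Stand-in probe: fails twice, then succeeds, imitating workers reconnecting.
attempts = {"count": 0}

def fake_probe():
    attempts["count"] += 1
    return attempts["count"] >= 3

ready = wait_until_ready(fake_probe, timeout=5.0, interval=0.01)
print(ready, attempts["count"])
```

The server would start only once the loop returns True, and could log and exit (or keep looping in production) when it returns False.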

apa11 yum update dkms fails

It's inevitable that zfs fails to mount after a yum update.

I think the machine is just in a funny state. All the other servers work fine.

We want to move to Fedora so this would be one of the machines that moves, since it is really a production server now.
