ozten / browserid-devops

A short lived repo to hack on Devops with 6a68 and others

Puppet 67.75% Ruby 13.60% Shell 18.65%

browserid-devops's People

jrgm hfeeki

browserid-devops's Issues

Put Squid in a separate VM

@gene1wood via email:

Indeed, it's a distinct host. I haven't found or shared the squid confs
yet which constrain outbound access.

Put swebhead and keysigner behind nginx

Daemons on swebhead and keysigner listen on 127.0.0.1.

For webhead to contact them, it needs to go through nginx.

Ensure heartbeat only returns 200 if everything is cool

The existing nagios monitors rely on the heartbeat endpoints to diagnose any errors across the whole system.

It's essential that the heartbeats tell the truth.

Make sure they tell the truth.
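Seen from the monitoring side, the requirement can be sketched as below. This is a minimal sketch, not the project's actual Nagios check: the function names are hypothetical, and it only assumes the usual Nagios plugin convention that exit code 0 means OK and 2 means CRITICAL. Anything other than a 200 is treated as a failure, which is only useful if the endpoint really returns a non-200 whenever a dependency is down.

```shell
# Hypothetical helper: map an HTTP status code to a Nagios-style result.
# Exit code 0 = OK, 2 = CRITICAL (standard Nagios plugin convention).
classify_heartbeat() {
  case "$1" in
    200) echo "OK: heartbeat returned 200"; return 0 ;;
    *)   echo "CRITICAL: heartbeat returned ${1:-no response}"; return 2 ;;
  esac
}

# Hypothetical helper: print only the HTTP status code for a URL.
# curl prints 000 when no HTTP response was received at all.
heartbeat_code() {
  curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$1"
}
```

A check script would then run something like `classify_heartbeat "$(heartbeat_code "$URL")"`, where `$URL` is whichever heartbeat endpoint is being monitored.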

Selenium VM: run.py lacks permissions to write out results html

Ozten's traceback from emailed bug report:

it looks good through a couple screens then when it goes to write to disk it dies.
I'm running as the user 'vagrant'.

This is the output from when it fails:
collecting ... collected 1 items
.bid_selenium)vagrant@selenium:~/mozilla-browserid-b1c9cca/automation-tests$ ./run.py 2>&1 > ~/run_py_output.txt
Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/home/vagrant/mozilla-browserid-b1c9cca/automation-tests/bid_selenium/lib/python2.7/site-packages/py/test.py", line 4, in <module>
    sys.exit(pytest.main())
  File "/home/vagrant/mozilla-browserid-b1c9cca/automation-tests/bid_selenium/local/lib/python2.7/site-packages/_pytest/core.py", line 469, in main
    exitstatus = config.hook.pytest_cmdline_main(config=config)
  File "/home/vagrant/mozilla-browserid-b1c9cca/automation-tests/bid_selenium/local/lib/python2.7/site-packages/_pytest/core.py", line 421, in __call__
    return self._docall(methods, kwargs)
  File "/home/vagrant/mozilla-browserid-b1c9cca/automation-tests/bid_selenium/local/lib/python2.7/site-packages/_pytest/core.py", line 432, in _docall
    res = mc.execute()
  File "/home/vagrant/mozilla-browserid-b1c9cca/automation-tests/bid_selenium/local/lib/python2.7/site-packages/_pytest/core.py", line 350, in execute
    res = method(**kwargs)
  File "/home/vagrant/mozilla-browserid-b1c9cca/automation-tests/bid_selenium/local/lib/python2.7/site-packages/_pytest/main.py", line 100, in pytest_cmdline_main
    return wrap_session(config, _main)
  File "/home/vagrant/mozilla-browserid-b1c9cca/automation-tests/bid_selenium/local/lib/python2.7/site-packages/_pytest/main.py", line 92, in wrap_session
    exitstatus=session.exitstatus or (session._testsfailed and 1))
  File "/home/vagrant/mozilla-browserid-b1c9cca/automation-tests/bid_selenium/local/lib/python2.7/site-packages/_pytest/core.py", line 421, in __call__
    return self._docall(methods, kwargs)
  File "/home/vagrant/mozilla-browserid-b1c9cca/automation-tests/bid_selenium/local/lib/python2.7/site-packages/_pytest/core.py", line 432, in _docall
    res = mc.execute()
  File "/home/vagrant/mozilla-browserid-b1c9cca/automation-tests/bid_selenium/local/lib/python2.7/site-packages/_pytest/core.py", line 350, in execute
    res = method(**kwargs)
  File "/home/vagrant/mozilla-browserid-b1c9cca/automation-tests/bid_selenium/local/lib/python2.7/site-packages/_pytest/terminal.py", line 314, in pytest_sessionfinish
    __multicall__.execute()
  File "/home/vagrant/mozilla-browserid-b1c9cca/automation-tests/bid_selenium/local/lib/python2.7/site-packages/_pytest/core.py", line 350, in execute
    res = method(**kwargs)
  File "/home/vagrant/mozilla-browserid-b1c9cca/automation-tests/bid_selenium/local/lib/python2.7/site-packages/pytest_mozwebqa/html_report.py", line 181, in pytest_sessionfinish
    logfile = py.std.codecs.open(self.logfile, 'w', encoding='utf-8')
  File "/home/vagrant/mozilla-browserid-b1c9cca/automation-tests/bid_selenium/lib/python2.7/codecs.py", line 881, in open
    file = __builtin__.open(filename, mode, buffering)
IOError: [Errno 13] Permission denied: 'results/index.html'
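
The traceback shows the run dying only at the very end, when the HTML report is opened for writing. A pre-flight check along these lines would surface the problem before the tests run. It is a sketch only; `can_write_report` is a hypothetical helper, not part of run.py or pytest.

```shell
# Hypothetical pre-flight check: succeed only if the report file could be
# written by the current user.
can_write_report() {
  report=$1
  if [ -e "$report" ]; then
    # The file already exists (for example, left behind by a run as root).
    [ -w "$report" ]
  else
    # The file does not exist yet, so its directory must exist and be writable.
    report_dir=$(dirname "$report")
    [ -d "$report_dir" ] && [ -w "$report_dir" ]
  fi
}
```

For example, `can_write_report results/index.html || echo "fix ownership of results/ first"` could be run as the 'vagrant' user before `./run.py`.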

keysigner issuing certs for localhost

tail -300f /home/browserid/browserid/var/log/keysigner.log
{"level":"info","message":"Certs will be issued from: localhost","timestamp":"2012-08-14T18:38:30.673Z"}

That won't go well ;)
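
A small check like the one below could catch this at deploy time. It is a sketch that assumes the log keeps the JSON line format shown above; the function names are hypothetical.

```shell
# Hypothetical helper: print the most recent issuer announced in a keysigner log.
issuer_from_log() {
  sed -n 's/.*"message":"Certs will be issued from: \([^"]*\)".*/\1/p' "$1" | tail -n 1
}

# Hypothetical helper: fail if the issuer is missing or is a loopback name.
issuer_is_sane() {
  case "$(issuer_from_log "$1")" in
    ""|localhost|127.0.0.1) return 1 ;;
    *) return 0 ;;
  esac
}
```

Running `issuer_is_sane /home/browserid/browserid/var/log/keysigner.log` against the log above would fail, because the reported issuer is localhost.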

Extract secrets from configs

Figure out one or more reusable patterns for reorganizing where secrets live, so all configs can be in a public git.
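
One possible pattern, offered only as a starting point: keep templates with placeholder tokens in the public repo and fill them in from a private `KEY=value` file at deploy time. Everything in this sketch is an assumption, including the `@@KEY@@` token syntax, the file layout, and the `render_config` name.

```shell
# Hypothetical renderer: $1 = public template, $2 = private file of KEY=value lines.
# Replaces each literal @@KEY@@ token with its value and prints the result.
# Note: the secrets file must be non-empty for the NR == FNR test to work.
render_config() {
  awk '
    NR == FNR {                       # first file: load the secrets
      eq = index($0, "=")
      if (eq > 1) secret[substr($0, 1, eq - 1)] = substr($0, eq + 1)
      next
    }
    {                                 # second file: substitute tokens literally
      line = $0
      for (key in secret) {
        token = "@@" key "@@"
        out = ""
        while ((pos = index(line, token)) > 0) {
          out = out substr(line, 1, pos - 1) secret[key]
          line = substr(line, pos + length(token))
        }
        line = out line
      }
      print line
    }
  ' "$2" "$1"
}
```

The substitution uses `index` and `substr` rather than a regex, so values containing characters such as `&` or `/` are inserted as-is.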

Saucelabs details for SMTP config

Move daemontools run files under /var/service

Currently these are installed directly under /service.

We should match how it's done in production, putting them under /var/service and symlinking.
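
A minimal sketch of the symlinking step, assuming the link lives in /service and points at the real directory under /var/service (the direction is an assumption; check it against production). `link_service` is a hypothetical helper, and the roots are parameters so it can be tried in a scratch directory first.

```shell
# Hypothetical helper: $1 = service name,
# $2 = directory holding the real run files (default /var/service),
# $3 = directory scanned by daemontools (default /service).
link_service() {
  svc_name=$1
  real_root=${2:-/var/service}
  scan_root=${3:-/service}
  if [ ! -d "$real_root/$svc_name" ]; then
    echo "missing $real_root/$svc_name" >&2
    return 1
  fi
  # Refuse to touch a real directory that has not been moved yet.
  if [ -e "$scan_root/$svc_name" ] && [ ! -L "$scan_root/$svc_name" ]; then
    echo "$scan_root/$svc_name is not a symlink; move it first" >&2
    return 1
  fi
  ln -sfn "$real_root/$svc_name" "$scan_root/$svc_name"
}
```

For example, `sudo sh -c '. ./link_service.sh; link_service browserid-static'` would link one service, assuming the function is saved in a file of that (hypothetical) name.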

Enhance watchmouse tests

Watchmouse is a gomez-like tool for testing that sites are up and testing performance/response time from a global network of clients.

Right now, watchmouse is only being used to hash include.js every so often, to ensure it hasn't gotten munged or corrupted.
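
For reference, the same kind of integrity check can be sketched locally. The hash algorithm watchmouse actually uses is not stated here, so SHA-256 via `sha256sum` is an assumption, and `file_matches_hash` is a hypothetical helper.

```shell
# Hypothetical helper: $1 = file to check, $2 = expected SHA-256 hex digest.
# Succeeds only if the file's current digest equals the expected one.
file_matches_hash() {
  actual=$(sha256sum "$1" | awk '{ print $1 }')
  [ "$actual" = "$2" ]
}
```

A cron job could fetch include.js to a temporary file and call `file_matches_hash` on it with the digest recorded at deploy time.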

But, presumably, watchmouse has more under the hood, and might even be able to execute selenium tests?

Investigate what watchmouse can do, hopefully integrating our Selenium tests with it.

Investigate pinging identity-staff or some other list when tests fail. Caution: address procedural concerns before signing devs up for pager duty.

Investigate complex event processing systems for monitoring

Goal: learn enough about available monitoring solutions to suggest several tools appropriate for our project needs. It's possible that existing tools are way more horsepower than we need, such that writing a few simple scripts would be a better approach--but it's a large field, and existing tools are worth some attention.

Based on an initial skim, this thesis seems to cover all the main approaches to complex event processing, as well as available implementations of the various approaches.

http://www.open.ch/tl_files/OpenSystems/_img/1-3_high_re_org/master_thesis_report_mueller.pdf

(Andreas Mueller, ETH masters thesis, 'Event Correlation Engine,' 2009.)

Note that ops already has a tool, cepmon, which analyzes statsd data, and alarms when the data deviates sharply from past trends. Compatibility with the existing statsd/pencil/nagios infrastructure is a significant factor.

move browserid from /home/browserid to /opt

Make the location of the code match production.

Slim down dbwriter VM's puppet configs, services, etc

dbwriter and mysql have puppet, daemontools, and other configs for all services. Delete these.

Move multi-vm to Puppet master/agent setup

We currently use puppet in stand-alone mode and have Vagrant map a /puppet directory for each VM.

Instead of that, each node should ask our puppet master VM for config.

Setup Nagios

Set up Nagios in the admin VM.

Simplify puppet manifests

We should match production puppet manifests and improve them where it makes sense.

puppet
  files
    # daemontools
    var
      services
        browserid-router
  manifests
    webhead.pp
    swebhead.pp
    keysigner.pp

monitoring dashboard - investigate

Look into existing graphing/charting tools used to visualize monitors. Document findings if not already documented.

Brainstorm some ideas for improvements, log them in issues, and prioritize based on discussions with the team.

Rename vagrant.json to intcluster.json

This config is machine independent... the point is that it is for intcluster.

It could run on bare metal, awsbox, or something else someday.

Tracking: Dev browserid drops some config values

The following are in production configs, but aren't needed for the dev branch of BrowserID:

locale_dir

Remove example and example-primary

I hooked up example and example-primary, but they have no place in stage and prod...

Remove daemontools and other config which launches them.

use blueprint to extract puppet configs from existing VMs

I read about a tool called blueprint, which can analyze an existing one-off server configuration and convert it into puppet scripts + move the config files into the puppet modules (e.g., nginx.conf).

blueprint: https://github.com/devstructure/blueprint

It looks abandoned (no commits in 4 months?), but hopefully is still worth a look.

Replace bin/proxy with Squid

In stage/production we proxy all our traffic through Squid.

1. Remove bin/proxy daemontools files, etc. from webhead.
2. Add the configuration and puppet actions needed to get Squid set up.
2a) Figure out if it can live in webhead or if it needs its own VM to be a useful simulation of stage/prod.

@benadida or @6a68 nice next bug, fairly self contained and non-trivial

Missing configs noted in Issue #7.

Current VM is missing guest additions

If you grab the current vm and do vagrant up you'll get ssh errors.

This is because the VM doesn't have Guest Additions installed.

The next time we respin the VM, they will be baked in.

Short term fix:

1. Put config.vm.boot_mode = :gui in your Vagrantfile.
2. vagrant up
3. In the VirtualBox menu choose Devices > Install Guest Additions
4. Log in to the box as vagrant / vagrant
5. sudo mkdir /media/cdrom
6. sudo mount -t iso9660 /dev/cdrom1 /media/cdrom
7. cd /media/cdrom && sudo ./VBoxLinuxAdditions.run
8. sudo shutdown -h now
9. vagrant up

This got ssh to accept connections, but still didn't fully work.

There is no /home/vagrant/.ssh

I had to create .ssh/authorized_keys and append my public ssh key.

I'm still prompted for my password... which is weird.
I don't remember if tweaking /etc/ssh/sshd_config would help...

For now, I can continue to type vagrant as a password... except that it means I have to run the VM in GUI mode :/

Tracking: Missing config files

We'll use this Issue to track all the config files which are not in the tarball that @gene1wood provided.

Squid configs

Install apps via browserid-server.rpm

Once the filesystem layout is compatible, install apps via browserid-server.rpm.

This is needed so we can use bid-push scripts on intcluster

get test running in jenkins VM

The jenkins VM is failing to connect to firefox on the image.

Run mozilla/browserid in production mode

Putting the webhead behind nginx and SSL requires mozilla/browserid services to be run in production mode.

After vagrant reload, site sometimes unavailable

Filing to share knowledge (and track issue).

If you do a vagrant reload and the site isn't working...

I've found I have to sometimes do

sudo /command/svstat /service/browserid-*

Note which services keep restarting. Say it's the static process...

$ ps aux | grep static
root      2267  0.0  0.0   3924   400 ?        S    13:42   0:00 supervise browserid-static
450       2268  0.1  4.4 900576 22156 ?        Sl   13:42   0:01 node bin/static
vagrant   2705  0.0  0.1 103228   864 pts/0    S+   13:59   0:00 grep static
sudo kill -9 2267

What should normally work is just a

sudo /command/svc -t /service/browserid-static

Not sure why this isn't working.

Once we have monitoring in place, these will be easier to spot, when the VMs seem broken.
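
Until then, spotting the restarting services can be scripted. svstat prints one line per service of the form `/service/name: up (pid N) S seconds`, so a service that keeps restarting shows a very small S every time it is sampled. The sketch below filters for those; `flapping_services` and the 5-second default are assumptions for illustration.

```shell
# Hypothetical filter: read svstat output on stdin and print the services
# that have been up for fewer than $1 seconds (default 5).
flapping_services() {
  awk -v limit="${1:-5}" '
    $2 == "up" && ($5 + 0) < limit {
      name = $1
      sub(/:$/, "", name)   # drop the trailing colon svstat adds to the path
      print name
    }
  '
}
```

Usage would be `sudo /command/svstat /service/browserid-* | flapping_services 5`, run two or three times a few seconds apart, since a single low uptime can also mean a legitimate recent restart.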

Setup statsd, pencil

We should add statsd to all the VMs and install Pencil in the "admin" VM.
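
Once statsd is running, scripts on any VM can report to it with a one-line UDP message. The sketch below only builds the message in the statsd counter format (`bucket:value|c`); the metric name is hypothetical, and how the line is sent (for example with `nc -u` to the statsd host on its default port 8125) depends on the tools available on the VM.

```shell
# Hypothetical helper: print a statsd counter line.
# $1 = bucket (metric) name, $2 = increment (default 1).
statsd_counter() {
  printf '%s:%s|c\n' "$1" "${2:-1}"
}
```

For example, `statsd_counter browserid.selenium.fail` prints `browserid.selenium.fail:1|c`, ready to be piped to whatever sends the UDP packet.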

Put webhead behind nginx and SSL

Port Zeus configs

https://bugzilla.mozilla.org/show_bug.cgi?id=781644

Integrate (very stable) selenium tests into monitoring system

If we had one test which was highly reliable, we could include it in our monitoring/alerting routine and allow for end-user level verification that the site really works.

It would probably be useful to have both login tests and create account tests, as the latter will fail if the master goes down. If we create users 100+ times each day, we'll probably want to also delete those users as part of the test, or cull them from the DB regularly.

We could trivially reduce the error rate by, say, re-running the test 5 or 10 times before triggering a late-night alert.
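
The re-run idea can be sketched as a small wrapper. If a single run fails spuriously with probability p and the runs are independent (an assumption that shared causes such as a slow network would break), all n runs fail spuriously with probability p^n; with p = 0.1 and n = 5 that is 0.00001. `retry` is a hypothetical helper, not an existing script in the repo.

```shell
# Hypothetical wrapper: $1 = maximum attempts, remaining arguments = command.
# Succeeds as soon as one attempt succeeds; fails only if every attempt fails.
retry() {
  max_attempts=$1
  shift
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    if "$@"; then
      return 0
    fi
    attempt=$((attempt + 1))
  done
  return 1
}
```

A monitor could then call `retry 5 ./run.py` and alert only when the wrapper itself fails.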
