mozilla-services / tokenserver
The Mozilla Token Server
Home Page: http://docs.services.mozilla.com/token/index.html
License: Mozilla Public License 2.0
For our background processing scripts like process_account_events, it would be nice for ops if we:
It would be handy to have a way to trigger the token server to re-read its configuration settings, or pick up changes in the DB, without restarting it.
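One common pattern for this (a minimal sketch, assuming a hypothetical `load_settings()` helper rather than the server's real config loader) is a SIGHUP handler that re-reads settings on demand:

```python
import signal

# In-memory settings store; in the real server this would be the
# parsed .ini configuration (hypothetical stand-in for illustration).
SETTINGS = {"interval": 30}

def load_settings():
    """Placeholder for re-reading the config file from disk."""
    return {"interval": 60}

def handle_sighup(signum, frame):
    # Re-read configuration without restarting the process.
    SETTINGS.update(load_settings())

# SIGHUP does not exist on Windows, hence the guard.
if hasattr(signal, "SIGHUP"):
    signal.signal(signal.SIGHUP, handle_sighup)
```

Ops could then `kill -HUP` the worker processes to pick up config changes without a restart.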
Here is the log dump. The result was that the SQS queue didn't get cleared; it grew until monitoring kicked in. It looks like:
Feb 26 19:25:15 docker[31898]: Processing account reset for u'<REDACTED>'
Feb 26 19:25:15 docker[31898]: Error while processing account deletion events
Feb 26 19:25:15 docker[31898]: Traceback (most recent call last):
Feb 26 19:25:15 docker[31898]: File "/app/tokenserver/scripts/process_account_events.py", line 100, in process_account_events
Feb 26 19:25:15 docker[31898]: backend.update_user(SERVICE, user, generation - 1)
Feb 26 19:25:15 docker[31898]: File "tokenserver/assignment/sqlnode/sql.py", line 321, in update_user
Feb 26 19:25:15 docker[31898]: 'email': user['email'],
Feb 26 19:25:15 docker[31898]: TypeError: 'NoneType' object has no attribute '__getitem__'
Feb 26 19:25:15 docker[31898]: Traceback (most recent call last):
Feb 26 19:25:15 docker[31898]: File "/usr/local/lib/python2.7/runpy.py", line 162, in _run_module_as_main
Feb 26 19:25:15 docker[31898]: "__main__", fname, loader, pkg_name)
Feb 26 19:25:15 docker[31898]: File "/usr/local/lib/python2.7/runpy.py", line 72, in _run_code
Feb 26 19:25:15 docker[31898]: exec code in run_globals
Feb 26 19:25:15 docker[31898]: File "/app/tokenserver/scripts/process_account_events.py", line 145, in <module>
Feb 26 19:25:15 docker[31898]: tokenserver.scripts.run_script(main)
Feb 26 19:25:15 docker[31898]: File "tokenserver/scripts/__init__.py", line 19, in run_script
Feb 26 19:25:15 docker[31898]: exitcode = main()
Feb 26 19:25:15 docker[31898]: File "/app/tokenserver/scripts/process_account_events.py", line 140, in main
Feb 26 19:25:15 docker[31898]: opts.aws_region, opts.queue_wait_time)
Feb 26 19:25:15 docker[31898]: File "/app/tokenserver/scripts/process_account_events.py", line 100, in process_account_events
Feb 26 19:25:15 docker[31898]: backend.update_user(SERVICE, user, generation - 1)
Feb 26 19:25:15 docker[31898]: File "tokenserver/assignment/sqlnode/sql.py", line 321, in update_user
Feb 26 19:25:15 docker[31898]: 'email': user['email'],
Feb 26 19:25:15 docker[31898]: TypeError: 'NoneType' object has no attribute '__getitem__'
Feb 26 19:25:16 systemd[1]: docker-tokenserver-account-events.service: main process exited, code=exited, status=1/FAILURE
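The `TypeError` means `user` was `None` when `update_user` built its `'email': user['email']` dict, i.e. the event referenced an account with no row in the tokenserver DB. A hedged sketch of the defensive check (the `get_user` lookup name here is an assumption, not the actual backend API):

```python
import logging

logger = logging.getLogger("tokenserver.scripts")

def process_reset_event(backend, service, email, generation):
    # Hypothetical lookup mirroring what the script does before
    # calling update_user.
    user = backend.get_user(service, email)
    if user is None:
        # No row for this account/service: nothing to reset. Skipping
        # (instead of raising) lets the SQS message be deleted, so the
        # queue keeps draining instead of growing until monitoring fires.
        logger.warning("reset event for unknown user %r; skipping", email)
        return
    backend.update_user(service, user, generation - 1)
```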
In testing, I am seeing lots of issues with token server timestamps. Specifically, my devices seem to be a second or two ahead of the token server. I anticipate we'll see lots of issues in production.
As a first step, I suggest we distinguish "real" permission denied from "trivial" assertion rejected due to timestamp errors. Then clients can selectively retry after adjusting timestamps. Clients can use the HTTP Date: header for local timestamp adjustments until we find that resolution insufficient.
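The client-side adjustment can be sketched like this (a minimal illustration using the stdlib, not the actual client code):

```python
import time
from email.utils import parsedate_to_datetime

def clock_skew_seconds(date_header, local_now=None):
    """Estimate server-minus-local clock skew, in seconds, from the
    HTTP Date: header of a rejected response."""
    server_ts = parsedate_to_datetime(date_header).timestamp()
    if local_now is None:
        local_now = time.time()
    return server_ts - local_now
```

A client that sees a timestamp rejection would add this skew to its local clock and retry once.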
@rfk is the code change within scope?
@ckarlof this could impact the desktop client, which needs to handle timestamp skew itself.
From a discussion today in IRC:
14:15 < mostlygeek> jbonacci: fyi: in prod i had to scale up the TS to 6xm3.medium
14:16 < mostlygeek> the database load has been pretty low but the CPU load has been pretty high...
we should update our load tests to match prod traffic :)
14:17 < jbonacci> mostlygeek we might already have a bug or two on that for rfkelly|away to look at
14:17 < mostlygeek> jbonacci: cool!
Related bugs in Bugzilla:
https://bugzilla.mozilla.org/show_bug.cgi?id=997344
https://bugzilla.mozilla.org/show_bug.cgi?id=1022721
Hi, I ran into an issue making the token server work behind nginx because Gentoo's default config file blocks headers larger than 2k.
The error was the following:
2014/07/23 19:35:25 [info] 3970#0: *163 client sent too long header line: "Authorization: BrowserID eyJhbGciOiJSUzI1NiJ9.......
After discussing it with ckarlof on IRC, he told me the limit should be increased to 8k, which is nginx's default. The instructions should mention this issue and suggest replacing Gentoo's default large_client_header_buffers with nginx's, i.e. large_client_header_buffers 4 8k;
Here is what I saw when I ran "make test" on the AWS Ubuntu instance:
https://jbonacci.pastebin.mozilla.org/4032532
See the following:
mozilla-services/loads#220
mozilla-services/loads#221
This requires a change to the Makefile, and maybe a change to the config/megabench.ini file...
@rfk to fill in the blanks.
We ran into issues with load testing TS Stage: a significant number of 503s, with not much info/data to go on for debugging.
Found this with @tarekziade while performing load test on a local install of tokenserver to the qa2 VM.
The entire discussion about this went on in #identity on 2012-04-13
But after making some changes to verifiers.py:
diff verifiers.py verifiers.py.BAK
7,10d6
< from browserid.tests.support import patched_key_fetching
< patched = patched_key_fetching()
< patched.enter()
<
Location on qa2: /opt/tokenserver/tokenserver
I am still seeing failures on my system, running the very simple load test as follows:
Terminal 1: ./bin/paster serve etc/tokenserver-dev.ini
Terminal 2:
cd /opt/tokenserver/tokenserver/loadtest
make build
make test
../bin/fl-run-test loadtest.py
Traceback (most recent call last):
File "/opt/tokenserver/tokenserver/lib/python2.6/site-packages/funkload-1.16.1-py2.6.egg/funkload/FunkLoadTestCase.py", line 946, in call
testMethod()
File "/opt/tokenserver/tokenserver/loadtest/loadtest.py", line 52, in test_bad_assertions
self._do_token_exchange(wrong_issuer, 401)
File "/opt/tokenserver/tokenserver/loadtest/loadtest.py", line 24, in _do_token_exchange
res = self.get(self.root + self.token_exchange, ok_codes=[status])
File "/opt/tokenserver/tokenserver/lib/python2.6/site-packages/funkload-1.16.1-py2.6.egg/funkload/FunkLoadTestCase.py", line 391, in get
method="get", load_auto_links=load_auto_links)
File "/opt/tokenserver/tokenserver/lib/python2.6/site-packages/funkload-1.16.1-py2.6.egg/funkload/FunkLoadTestCase.py", line 299, in _browse
response = self._connect(url, params, ok_codes, method, description)
File "/opt/tokenserver/tokenserver/lib/python2.6/site-packages/funkload-1.16.1-py2.6.egg/funkload/FunkLoadTestCase.py", line 216, in _connect
raise self.failureException, str(value.response)
AssertionError: /1.0/aitc/1.0
HTTP Response 200: OK
Ran 2 tests in 1.638s
FAILED (failures=1)
make: *** [test] Error 1
So that we can use the Load env if we want...
REF:
https://github.com/mozilla-services/server-syncstorage/blob/master/loadtest/Makefile
Running under uwsgi with gevent monkey-patching disabled causes the RemoteVerifier to hang. That's bad. Debug this.
Just a tracker issue to figure out the use/feasibility...
Running TokenServer only load tests or Combined load tests, we continue to see some amount of 503s in Stage. This is with default settings for the load test, various configurations of TS Stage, with 1 to 3 instances of various sizes.
We find this in /media/ephemeral0/logs/nginx/access.log:
54.245.44.231 2014-05-13T23:58:39+00:00 "GET /1.0/sync/1.5 HTTP/1.1" 503 1922 320 "python-requests/2.2.1 CPython/2.7.3 Linux/3.5.0-23-generic" 0.007
And here:
"name": "token.assertion.connection_error"
"name": "token.assertion.verify_failure"
Something to research and document going forward...
Building upon our discussions of how the tokenserver and the storage nodes handle pre-shared secrets, we will make the following changes:
Are we doing it now? If so what is it restricted to?
I keep getting some sort of "hang" condition running "make test" after building tokenserver on qa2 (and other locations).
So, after talking to both of you about upgrading to 0.8.2 to get what is in TS Stage, I did the following:
$ git clone git://github.com/mozilla-services/tokenserver
$ cd tokenserver
$ make build CHANNEL=prod TOKENSERVER=rpm-0.8.2
$ make test
This is what I see:
(note the ^C in the output - that is where I tried to break the apparent "hang" condition)
bin/nosetests --with-xunit tokenserver
...........F....^CE.......
...etc...
Ran 24 tests in 206.026s
FAILED (errors=1, failures=1)
make: *** [test] Error 1
On qa2, I tried each of these steps:
$ make build
vs.
$ make build CHANNEL=dev
vs.
$ make build CHANNEL=prod TOKENSERVER=rpm-0.8.4
vs.
$ make build CHANNEL=prod
In all cases, I get a "hang" condition on "make test".
Once the "hang" condition is remedied with a Ctrl-C, I get several errors.
The pastebin is here: http://jbonacci.pastebin.mozilla.org/1701461
Here is more info:
http://jbonacci.pastebin.mozilla.org/1705814
Looks like one of the most recent commits may have broken something here:
make test
bin/flake8 tokenserver
tokenserver/verifiers.py:33:5: E301 expected 1 blank line, found 0
tokenserver/verifiers.py:40:19: E225 missing whitespace around operator
tokenserver/verifiers.py:42:23: E712 comparison to False should be 'if cond is False:' or 'if not cond:'
tokenserver/verifiers.py:45:19: E225 missing whitespace around operator
tokenserver/verifiers.py:51:80: E501 line too long (80 > 79 characters)
tokenserver/verifiers.py:53:12: E127 continuation line over-indented for visual indent
tokenserver/verifiers.py:56:80: E501 line too long (91 > 79 characters)
tokenserver/verifiers.py:61:1: E303 too many blank lines (3)
tokenserver/verifiers.py:66:1: E302 expected 2 blank lines, found 3
make: *** [test] Error 1
Like this one:
https://github.com/mozilla/fxa-auth-server/blob/master/loadtest/Makefile
In my dev deployment of tokenserver, I have to downgrade to cornice=0.11 to make things work. Unfortunately I don't recall the specific error I was seeing; this is a note for me to reproduce and debug the issue.
Similar to what we have for FxA-auth
A decent set of integration tests that can be pointed to a remote server
Here is what I see on all platforms I tested:
make test
bin/flake8 --exclude=messages.py,test_remote_verifier.py tokenserver
bin/nosetests tokenserver/tests
..................E......S......
======================================================================
ERROR: test_purging_of_old_user_records (tokenserver.tests.test_purge_old_records.TestPurgeOldRecordsScript)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/James/tokenserver/tokenserver/tests/test_purge_old_records.py", line 81, in test_purging_of_old_user_records
user_records = list(self.backend.get_user_records(service, email))
AttributeError: 'SQLNodeAssignment' object has no attribute 'get_user_records'
-------------------- >> begin captured logging << --------------------
circus: INFO: Arbiter exiting
circus: DEBUG: stopping the broker watcher
circus: DEBUG: gracefully stopping processes [broker] for 30.0s
circus: DEBUG: broker: kill process 62437
circus: DEBUG: sending signal 15 to 62437
circus: DEBUG: stopping the workers watcher
circus: DEBUG: gracefully stopping processes [workers] for 30.0s
circus: DEBUG: workers: kill process 62438
circus: DEBUG: sending signal 15 to 62438
circus: DEBUG: manage_watchers is conflicting with another command
circus: INFO: broker stopped
circus: INFO: workers stopped
--------------------- >> end captured logging << ---------------------
----------------------------------------------------------------------
Ran 32 tests in 12.770s
FAILED (SKIP=1, errors=1)
make: *** [test] Error 1
From irc:
Although, hmm the loadtest passes, but I see the tokenserver with an
error 'missing p in data'. I seem to recall some mention of that at some
point.
jrgm: this is "normal" because the server gets a wrong assertion
and we want to make sure it does not work
tarek: ah, duh (me).
jrgm: but plz send that to alexis with me in cc. the TS should be more graceful here
cc me too
;-)
/cc @tarekziade, @jbonacci
ERROR [powerhose][worker 8] missing p in data - [u'e', u'algorithm', u'n']
File "/home/jrgm/tokenserver/deps/https:/github.com/mozilla-services/powerhose/powerhose/worker.py", line 60, in _handle_recv_back
res = self.target(Job.load_from_string(msg[0]))
File "/home/jrgm/tokenserver/tokenserver/crypto/pyworker.py", line 201, in call
res = getattr(self, function_id)(**data)
File "/home/jrgm/tokenserver/tokenserver/crypto/pyworker.py", line 214, in check_signature
cert = jwt.load_key(algorithm, data)
File "/home/jrgm/tokenserver/lib/python2.6/site-packages/browserid/jwt.py", line 76, in load_key
return key_class(key_data)
File "/home/jrgm/tokenserver/lib/python2.6/site-packages/browserid/jwt.py", line 160, in init
_check_keys(data, ('p', 'q', 'g', 'y'))
File "/home/jrgm/tokenserver/lib/python2.6/site-packages/browserid/jwt.py", line 242, in _check_keys
raise ValueError(msg)
File "/home/jrgm/tokenserver/deps/https:/github.com/mozilla-services/powerhose/powerhose/client.py", line 54, in execute
raise ExecutionError(res[len('ERROR:'):])
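What the error is saying: browserid's `_check_keys` requires the DSA parameters `p`, `q`, `g`, `y`, but the incoming key only had the RSA fields `e` and `n`. A rough re-creation of the check, for illustration only:

```python
def check_keys(data, required):
    """Raise ValueError naming the missing keys, roughly as
    browserid's _check_keys does (re-created for illustration)."""
    missing = [k for k in required if k not in data]
    if missing:
        present = sorted(k for k in data if k != "algorithm")
        raise ValueError("missing %s in data - %r"
                         % (", ".join(missing), present))

# An RSA public key carries 'e' and 'n', so a DSA-style check fails.
rsa_key = {"algorithm": "RS", "e": "65537", "n": "c0ffee"}
```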
Because it does not need to be there and it's confusing me...
;-)
HOST = http://localhost:5000
these lines:
https://github.com/mozilla-services/wimms/archive/rfk/update-deps.zip
https://argparse.googlecode.com/files/argparse-1.2.1.tar.gz
See this older issue for reference: #6
We should consider debugging this and getting doc building to work on the older platforms and Ops platforms running an OS with Python 2.6.6, etc.
Copied from https://bugzilla.mozilla.org/show_bug.cgi?id=757520
For an unknown reason, the crypto worker of the tokenserver stops working after running for some time (after about 24 hours).
The particular piece of code hanging out is:
The problem is either here or in the layers on top of that, which ensure that the main Python server is able to communicate with the workers (circus or powerhose).
This came up in today's meeting.
A bit more work to really fill out the TS load test.
Something similar to what we have for server-syncstorage: ./syncstorage/tests/functional/test_storage.py
and for fxa-auth-server: npm run test-remote
If I remember correctly the current unit tests (make test) are only designed to run locally.
Per issue #16, once an upstream pyzmq release is made we can stop depending on the custom fork.
I am using the very latest setup of TS and Sync and Verifier (all now in US East).
I am just running the basic test against TS Stage:
$ make test SERVER_URL=https://token.stage.mozaws.net
I see 401s in the nginx access logs.
[21/Feb/2014:14:03:12 -0500] "GET /1.0/sync/1.5 HTTP/1.1" 401 110...
[21/Feb/2014:14:03:37 -0500] "GET /1.0/sync/1.5 HTTP/1.1" 401 96...
I see errors in the tokenserver token.log file:
"name": "token.assertion.verify_failure"
"token.assertion.audience_mismatch_error"
(always paired)
These are consistent across all 3 instances of TS in Stage.
I am basically running this from a local host:
make bench SERVER_URL=https://token-stage3.stage.mozaws.net
What I see is a 50% failure rate, but only listings for 3 (types of) failures: 503s and 500s.
Here is the pastebin of the results:
https://jbonacci.pastebin.mozilla.org/4133736
We want to start using the Loads cluster with an associated Loads config file to run our load tests from now on.
Edits to Makefile and loadtest.py
Change the lines in requirements.txt to point to specific versions instead of master which could change and cause hard to find bugs.
On these lines:
https://github.com/mozilla-services/tokenserver/blob/master/requirements.txt#L41-L44
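For illustration (the URL and version here are made up, not the actual lines in question), the difference looks like:

```text
# fragile: tracks whatever is on master at install time
https://github.com/example-org/example-dep/archive/master.zip
# reproducible: pinned to a released tag
https://github.com/example-org/example-dep/archive/0.6.2.zip
```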
This code currently expects only delete events, logging an error for anything else. Soon there will be other account-related events broadcast that tokenserver may or may not be interested in.
We could broadcast each event on a separate SNS topic, but I think I'm in favor of a single topic for all account events. Thoughts?
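With a single topic, the consumer can dispatch on the event type and silently skip types it does not care about; a hedged sketch (the field names and handlers are hypothetical):

```python
import json

def handle_delete(event):
    # Placeholder for the existing account-deletion handling.
    return ("delete", event["uid"])

# Only event types tokenserver cares about get handlers; anything
# else is skipped instead of logged as an error.
HANDLERS = {
    "delete": handle_delete,
}

def process_event(message_body):
    event = json.loads(message_body)
    handler = HANDLERS.get(event.get("event"))
    if handler is None:
        return None  # uninteresting event type; ack and move on
    return handler(event)
```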
Steps to reproduce:
I get
Traceback (most recent call last):
File "", line 1, in
ImportError: No module named interface
zope.interface is located here. Not sure why it does not pick that up.
./lib/python2.6/site-packages/zope.interface-3.8.0-py2.6-linux-x86_64.egg/zope/interface
Without it available, two of the tests in "make test" fail.
We've been having an email discussion about how to prevent a "slap fight" between two devices on the same account, when the FxA account has been reset (changing the encryption key), but the devices disagree about whether the old key or the new key is correct. The worst case here is that the two devices keep deleting each other's data, because they get an HMAC error when they see something encrypted by the other device, so they wipe the server and re-upload with their own key. This could keep happening until the old-key device's token finally expires, and it cannot get a new one without re-logging in (at which point it will get the new key).
We haven't yet decided how to address this, but we're converging on a couple of possible solutions. Most involve the fxa-auth-server including a "generation number" in its certificates, so the tokenserver can distinguish between the "new" device and the old ones (mozilla/fxa-auth-server#486). Some also involve the tokenserver getting a hash of the encryption key, and mapping different (uid, keyhash) pairs to different sync-id values (so old and new devices get different sets of ciphertext).
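The generation-number comparison could look roughly like this (a simplified illustration of the idea, not the eventual implementation):

```python
def accept_assertion(stored_generation, cert_generation):
    """Reject assertions carrying a generation number older than the
    newest one seen for this uid, so a pre-reset device cannot keep
    wiping data written with the new key."""
    if cert_generation < stored_generation:
        return False  # stale device; force a fresh login
    return True
```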
Opening this ticket so we'll have something to point at for the discussion.
CC @ckarlof @rfk
Looks like we are still on Funkload.
Need to get this ported over before we get our TS Stage env in AWS.
Sound familiar ;-)
mozilla-services/loads#195
mozilla/fxa-auth-server#268
So, this took a very simple install and build of TokenServer. Either I did not need the pre-reqs, or they were already on my VM.
Here is a pastebin of the complete error list:
https://jbonacci.pastebin.mozilla.org/4032487
The token server is leaking fds on stage2.
This is probably in powerhose, either in the client Pool, or in the workers restarting.
Will write a test that counts the number of fds before and after each request to find out where the problem happens.
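On Linux, such a test can count descriptors by listing /proc/self/fd; a minimal sketch of the idea (Linux-specific, and the request callable here is a stand-in):

```python
import os

def open_fd_count():
    """Number of file descriptors currently open in this process
    (relies on Linux's /proc/self/fd)."""
    return len(os.listdir("/proc/self/fd"))

def assert_no_fd_leak(request_fn):
    # Compare open-fd counts around a single request to localize
    # where descriptors start leaking.
    before = open_fd_count()
    request_fn()
    after = open_fd_count()
    assert after <= before, "leaked %d fds" % (after - before)
```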
/cc @fetep @ametaireau
Because we don't have one. Just getting this on the radar...
Slightly related:
https://bugzilla.mozilla.org/show_bug.cgi?id=981974#c6
https://bugzilla.mozilla.org/show_bug.cgi?id=982412
If I hit this site directly: https://token-stage3.stage.mozaws.net/
I get this back:
{"services": {"queuey": ["1.0"], "simple_storage": ["2.0", "2.1"], "durable_storage": ["1.0"]}, "auth": "https://token.services.mozilla.com"}
Which apparently is meaningless...
Same for Dev/Prod, generally...
We should just change this to "ok"
Was attempting the following:
08:45 < tarek> its in docs/ https://github.com/mozilla-services/tokenserver/tree/master/docs
08:45 < tarek> if you want to build it:
08:46 < tarek> cd docs; SPHINXBUILD=../bin/sphinx-build make html
08:46 < tarek> then you have it in docs/build/html
Was not able to do this from a local install of tokenserver (git clone)
SPHINXBUILD=../bin/sphinx-build make html
../bin/sphinx-build -b html -d build/doctrees source build/html
Running Sphinx v1.1.2
loading pickled environment... done
building [html]: targets for 8 source files that are out of date
updating environment: 0 added, 0 changed, 0 removed
looking for now-outdated files... none found
preparing documents... done
IOError: [Errno 2] No such file or directory: u'../../lib/python2.6/site-packages/docutils-0.8.1-py2.6.egg/docutils/writers/html4css1/html4css1.css'
Unable to open source file for reading ('../../lib/python2.6/site-packages/docutils-0.8.1-py2.6.egg/docutils/writers/html4css1/html4css1.css'). Exiting.
make: *** [html] Error 1
From the "docs" directory:
ls ../../lib/python2.6/site-packages/docutils-0.8.1-py2.6.egg/docutils/writers/html4css1/html4css1.css
ls: cannot access ../../lib/python2.6/site-packages/docutils-0.8.1-py2.6.egg/docutils/writers/html4css1/html4css1.css: No such file or directory
Instead I try this:
ls ../lib/python2.6/site-packages/docutils-0.8.1-py2.6.egg/docutils/writers/html4css1/html4css1.css
../lib/python2.6/site-packages/docutils-0.8.1-py2.6.egg/docutils/writers/html4css1/html4css1.css
That works, so it's looking in the wrong place for docutils?
Because awesome.
Similar to here:
https://github.com/mozilla/persona/issues
So QA can better triage open issues by star value (priority).
Also give QA rights to add and mark labels.
Our scripts look for a requirements.txt file to add the python dependencies into the virtualenv. When the code is stable enough turn dev-reqs.txt into requirements.txt
Well, this is new and frustrating (since this was working even on OS X 10.9 / Xcode 5.1.1).
git clone tokenserver
cd tokenserver
make build
make test
this all works
cd loadtest
make build
Now I see this:
make build
ARCHFLAGS=-Wno-error=unused-command-line-argument-hard-error-in-future ../bin/pip install pexpect
/bin/sh: ../bin/pip: No such file or directory
make: *** [build] Error 127
And, indeed, there is no "bin" directory in tokenserver
I've just been patching around this issue for a while, and should have put it in as an issue. Anyway, test_service.py and test_crypto_pyworker both use new features from unittest in 2.7 (assertIn and the context-manager form of assertRaises, respectively). Since the production boxes run Python 2.6.6, can we fix these tests to run there?
Logging into Docker is not available when PRs come from forks. Fix this.