pinterest / teletraan

Teletraan is Pinterest's deploy system.

License: Apache License 2.0

Java 49.31% Shell 0.64% Python 26.34% Makefile 0.01% CSS 1.97% JavaScript 9.86% HTML 11.80% Dockerfile 0.01% Starlark 0.07%

teletraan's Introduction

Teletraan Deploy Service

What is Teletraan?

Teletraan is Pinterest's deploy system. It deploys thousands of Pinterest internal services, supports tens of thousands of hosts, and has been running in production for many years. It empowers Pinterest engineers to deliver their code to Pinners quickly and safely. Check out the wiki or the blog post Under the hood: Teletraan Deploy System for more details.

The name Teletraan comes from a character in the Transformers TV series (see Wikipedia).

Why use Teletraan?

Teletraan is designed to do one thing and one thing only: deploy. It supports critical features such as zero-downtime deploy, rollback, staging, and continuous deploy, plus many convenient developer-facing features such as showing commit details, comparing different deploys, notifying deploy state changes through email or Slack, displaying metrics, and more. Teletraan does not support container-based deploys yet; for now, you can use Teletraan Deploy Scripts to call docker or docker-compose to run containers.

How to use Teletraan?

Teletraan is designed to be a flexible building block. You can plug Teletraan into your existing release workflow, provided the following requirements are met:

Run Deploy Agent on every host
Add Deploy Scripts to your application code
Publish Build Artifacts to Teletraan at the end of each build

Check out Integrate with Teletraan for more details.

Quick start

Quick start guide!

Documentation

Check out our wiki!

Help

If you have any questions or comments, you can reach us at [email protected]

teletraan's Issues

Error while running teletraan server

Error: Could not find or load main class com.pinterest.teletraan.TeletraanService

error while creating env

I am getting the following error while trying to add a new environment:

Teletraan failed to call backend server. Contact your friendly Teletraan owners for assistance. Hint: 500, Message: Cannot create PoolableConnectionFactory (Unknown database 'deploy')

Traceback (most recent call last):
  File "/root/teletraan-demo/venv/lib/python2.7/site-packages/django/core/handlers/base.py", line 114, in get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/root/teletraan-demo/deploy-board/deploy_board/webapp/env_views.py", line 566, in post_create_env
    environs_helper.create_env(request, data)
  File "/root/teletraan-demo/deploy-board/deploy_board/webapp/helpers/environs_helper.py", line 92, in create_env
    return deployclient.post("/envs", request.teletraan_user_id.token, data=data)
  File "/root/teletraan-demo/deploy-board/deploy_board/webapp/helpers/deployclient.py", line 78, in post
    return self.__call('post')(path, token, params=params, data=data)
  File "/root/teletraan-demo/deploy-board/deploy_board/webapp/helpers/decorators.py", line 67, in f_retry
    return f(*args, **kwargs)
  File "/root/teletraan-demo/deploy-board/deploy_board/webapp/helpers/deployclient.py", line 65, in api
    "Hint: %s, %s" % (response.status_code, response.content))
TeletraanException: Teletraan failed to call backend server. Contact your friendly Teletraan owners for assistance. Hint: 500, Message: Cannot create PoolableConnectionFactory (Unknown database 'deploy')

Feature Request: autoscaling group activities to display hostname instead of instance id

In the group page UI, the "auto scaling group activities" section lists AWS instance IDs, which are hard to match to a particular hostname in the group info section above.
Example:
2017-01-04 14:54:32 Terminating EC2 instance: i-051ac7c584ed7d009 Successful
2017-01-04 14:54:32 Terminating EC2 instance: i-0b94328f07944c082 Successful

It would be nice to show the hostname instead, since the hostname is used almost everywhere else in the UI.

deploy-board should require boto

When I install deploy-board, I get an error about a missing module "boto".
Adding "boto==2.42.0" to the requirements.txt fixes the problem.

check_version.sql- insert ignore

In check_version.sql, why not use a simple INSERT IGNORE instead of this?

Insert if not already

INSERT INTO schema_versions (version)
SELECT 0 FROM DUAL
WHERE NOT EXISTS (SELECT * FROM schema_versions);
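
The two forms are not equivalent, which may be why the script uses the longer one. INSERT IGNORE only suppresses errors such as duplicate-key violations, so whether it skips the row depends on the table's keys, whereas the NOT EXISTS guard inserts only when the table has no rows at all. A minimal sketch using Python's sqlite3 as a stand-in for MySQL (SQLite has no DUAL, and its closest analogue is INSERT OR IGNORE; the single-column schema_versions table without a unique key is an assumption, not the project's actual schema):

```python
import sqlite3


def seed_if_empty(conn):
    """Insert version 0 only when schema_versions has no rows at all."""
    conn.execute(
        "INSERT INTO schema_versions (version) "
        "SELECT 0 WHERE NOT EXISTS (SELECT * FROM schema_versions)"
    )


def seed_or_ignore(conn):
    """SQLite analogue of INSERT IGNORE: skips only on a constraint conflict."""
    conn.execute("INSERT OR IGNORE INTO schema_versions (version) VALUES (0)")


def versions(conn):
    """Return all stored versions in insertion order."""
    rows = conn.execute("SELECT version FROM schema_versions ORDER BY rowid")
    return [row[0] for row in rows]


def new_db(initial_versions=()):
    """Create an in-memory database with the assumed schema (no unique key)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE schema_versions (version INTEGER)")
    for version in initial_versions:
        conn.execute("INSERT INTO schema_versions (version) VALUES (?)", (version,))
    return conn


def demo():
    """Seed a database that already sits at version 3 with each approach."""
    guarded = new_db([3])
    seed_if_empty(guarded)
    ignored = new_db([3])
    seed_or_ignore(ignored)
    return versions(guarded), versions(ignored)
```

Under these assumptions the guarded statement leaves an already-migrated table untouched, while the ignore variant adds a second row because nothing conflicts.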

Unable to fully delete environments via API

With the API, it is not currently possible to completely delete an environment. It is possible to delete the non-primary stages within an environment, but deleting the primary stage results in a 500 error.

Call:

curl -X DELETE -H "Authorization: token <token>" localhost:8080/v1/envs/environmentname/prod

Response:

{"code":500,"message":"There was an error processing your request. It has been logged (ID 9a31b752c7e8ea27)."}

As far as I can find, there's not a corresponding UI element to delete the environment - the only option is to disable it.

No logout option within GUI

Currently there is no way to log out of a session within the GUI. This does cause some issues with other OAUTH providers which allow users to select multiple accounts during the login process.

In the event that a user selects a non-desired account during the login process, they are stuck in an orphaned state as they are unable to log out to change accounts.

Improve teletraan GC behavior

Currently, GC runs globally: it searches the build download dir and keeps num_builds_to_retain builds. However, if two or more build targets share the same build dir, GC could potentially remove all the builds of one build target.

Instead, it should do it separately for each build target. For each build target, search for the builds with the given build name, and try to keep those builds under num_builds_to_retain.
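
The per-target behavior proposed above can be sketched as follows. This is a minimal illustration, not the deploy agent's code: the function name builds_to_delete and the (build_name, build_id, mtime) tuple shape are assumptions made for the example.

```python
from collections import defaultdict


def builds_to_delete(builds, num_builds_to_retain):
    """Return build ids to remove, keeping the newest N builds per build name.

    `builds` is an iterable of (build_name, build_id, mtime) tuples; this
    shape is hypothetical and chosen only for illustration.
    """
    by_name = defaultdict(list)
    for name, build_id, mtime in builds:
        by_name[name].append((mtime, build_id))

    doomed = []
    for entries in by_name.values():
        entries.sort(reverse=True)  # newest first within one build name
        doomed.extend(build_id for _, build_id in entries[num_builds_to_retain:])
    return sorted(doomed)
```

Because retention is counted per build name, an old build that is the only one for its target is never removed just because another target in the same directory has newer builds.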

Hardcoded OAUTH2 OAUTH_ACCESS_TOKEN_URL variable causes OAUTH to fail with non-pinterest providers

Even when the OAUTH token URL is defined in manage.py, deploy-board uses a hard-coded Pinterest URL, which causes OAUTH to fail.

manage.py config:

    #
    # OAuth based authentication settings. By default, OAuth based authentication is disabled
    # See documentation for how to enable OAuth
    #
    # os.environ.setdefault("OAUTH_ENABLED", "OFF")
    os.environ.setdefault("OAUTH_ENABLED", "ON")
    os.environ.setdefault("OAUTH_CLIENT_ID", "<ID>")
    os.environ.setdefault("OAUTH_CALLBACK", "URL")
    os.environ.setdefault("OAUTH_DOMAIN", "URL")
    os.environ.setdefault("OAUTH_CLIENT_TYPE", "Public")
    os.environ.setdefault("OAUTH_USER_INFO_URI", "https://www.googleapis.com/oauth2/v3/userinfo")
    os.environ.setdefault("ACCESS_TOKEN_URL", "https://accounts.google.com/o/oauth2/token")
    os.environ.setdefault("OAUTH_AUTHORIZE_URL", "https://accounts.google.com/o/oauth2/auth")
    os.environ.setdefault("OAUTH_DEFAULT_SCOPE", "email")

Logs:

2016-06-01 21:52:21,926 [INFO] deploy_board.webapp.security: clientid = <ID>
2016-06-01 21:52:21,926 [INFO] deploy_board.webapp.security: Successfully created OAuth!
2016-06-01 21:52:21,928 [DEBUG] deploy_board.webapp.security: Redirect oauth for authentication!, url = https://accounts.google.com/o/oauth2/auth?response_type=code&client_id=<ID>
2016-06-01 21:52:26,679 [DEBUG] deploy_board.webapp.security: Bypass OAuth redirect request /auth/
2016-06-01 21:52:26,680 [DEBUG] deploy_board.webapp.security: Redirect back from oauth!
2016-06-01 21:52:26,681 [DEBUG] oauth: Request 'https://auth.pinadmin.com/oauth/token/' with 'POST' method
2016-06-01 21:53:09,710 [DEBUG] deploy_board.webapp.security: Bypass OAuth redirect request /auth/
2016-06-01 21:53:09,711 [DEBUG] deploy_board.webapp.security: Redirect back from oauth!
2016-06-01 21:53:09,712 [DEBUG] oauth: Request 'https://auth.pinadmin.com/oauth/token/' with 'POST' method
2016-06-01 21:53:30,865 [DEBUG] deploy_board.webapp.security: Redirect oauth for authentication!, url = https://accounts.google.com/o/oauth2/auth?response_type=code&client_id=<ID>
2016-06-01 21:53:45,249 [DEBUG] deploy_board.webapp.security: Bypass OAuth redirect request /auth/
2016-06-01 21:53:45,250 [DEBUG] deploy_board.webapp.security: Redirect back from oauth!
2016-06-01 21:53:45,251 [DEBUG] oauth: Request 'https://auth.pinadmin.com/oauth/token/' with 'POST' method

autoscaling

when ec2 host is charge

deploy-downloader fails to execute when called by cron.

We have a functioning teletraan stack which works wonderfully except for one crucial issue - the deploy agent completely fails when called by cron. It works fine when called directly on the shell, but during execution called by cron, it fails during the deploy-downloader command with the following error:

  File "build/bdist.linux-x86_64/egg/deployd/common/executor.py", line 80, in run_cmd
    preexec_fn=os.setsid, **kw)
  File "/usr/lib/python2.7/subprocess.py", line 710, in __init__
    errread, errwrite)
  File "/usr/lib/python2.7/subprocess.py", line 1335, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory

We patched common/executor.py to print the contents of cmd to stdout so we could debug the issue and found that the following was occurring during the cron executed command:

[u'/data/nodejs/artifactname/teletraan/PRE_DOWNLOAD']
['deploy-downloader', '-f', '/etc/deployagent.conf', '-v', u'UUID-OG-A_1bb1b62', '-u', u's3://s3url/20/artifact.tar.gz', '-e', u'Environment-Name']
['deploy-downloader', '-f', '/etc/deployagent.conf', '-v', u'UUID-OG-A_1bb1b62', '-u', u's3://s3url/20/artifact.tar.gz', '-e', u'Environment-Name']
['deploy-downloader', '-f', '/etc/deployagent.conf', '-v', u'UUID-OG-A_1bb1b62', '-u', u's3://s3url/20/artifact.tar.gz', '-e', u'Environment-Name']

The respective crontab entry is:

* * * * * /usr/local/bin/deploy-agent -f /etc/deployagent.conf >> /var/log/deployd.log 2>&1

All of the respective files in the config (log directory, deployd directory, builds directory etc.) exist and are writable by the user. Everything works like a charm when the command /usr/local/bin/deploy-agent -f /etc/deployagent.conf is called on the shell, it's just cron which fails.

Any ideas about this issue would be greatly appreciated!

This is on Debian 8 with python 2.7.9 and the latest version of master (as of this morning).
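
One common cause of this pattern (not confirmed for this report) is that cron starts jobs with a minimal PATH, often just /usr/bin:/bin, so a bare command name such as deploy-downloader cannot be resolved and subprocess raises OSError: [Errno 2]. A minimal sketch for checking how a command resolves under a given PATH; resolve_under_path and explain are hypothetical helpers, and the directories are assumptions:

```python
import os
import shutil


def resolve_under_path(command, path):
    """Return the absolute path `command` resolves to under `path`, or None."""
    return shutil.which(command, path=path)


def explain(command, cron_path="/usr/bin:/bin", extra_dir="/usr/local/bin"):
    """Compare resolution under a cron-like PATH and an extended PATH."""
    extended = cron_path + os.pathsep + extra_dir
    return {
        "cron": resolve_under_path(command, cron_path),
        "extended": resolve_under_path(command, extended),
    }
```

If the command resolves only with the extended PATH, setting PATH at the top of the crontab, or calling the agent through a wrapper script that exports PATH, would be the usual remedies.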

feature request: connect to MySQL over SSL with client certificate

I'm setting up and evaluating Teletraan as our primary deployment manager. Our infrastructure requirements oblige us to connect to MySQL over SSL with a client certificate and a self-signed server certificate. I looked for ways to do this for a while, but there seems to be no way.

So it would be great if there were a way to connect to MySQL over SSL.

(I joined [email protected] but it doesn't seem to be active. If GitHub Issues is not an appropriate place for feature requests, please close this and point me to an appropriate place.)

Request: custom deploy action on Staging of Host level deploy cycle.

First, this solution is very good. Thanks for the work.

I have an idea.

In my test, the deploy action in the Staging step of the host-level deploy cycle does not seem to be customizable (it is just an untar action).
Could this deploy action be made customizable, to make this deploy platform more generic?

thanks.

Publish using Swagger

I am trying to publish a build using the Swagger POST method (screenshot attached). The build is published, but I couldn't see it on the Teletraan UI board. I tried the POST method from the REST API to Teletraan as well.

Current builds on ngapp2 overview pages are incorrect

When viewing the overview page for an A/B deploy, the currently serving SHA is different than when viewing an environment directly:

In the A cluster:

In the B cluster:

`pip install deploy-agent` doesn't work

Based on the wiki documents I understood that deploy-agent would be available through pypi, but running pip install deploy-agent doesn't work. Is the intended usage to install manually via local install?

teletraan & docker

To use Teletraan with Docker containers, must Teletraan be installed in the container?

add ability to run teletraanservice/bin/run.sh in the foreground

Currently the script only allows running in the background. If I wrap this in something like systemd or supervisord then sometimes it can be helpful to just have it run in the foreground, or write out a PID file that a daemon watcher can watch.

I would suggest just creating a new command called 'run', similar to the way that catalina.sh does it, where 'start' runs in the background and 'run' runs in the foreground.

username and password for teletraan UI

Hi,
how can I create a username to access the web UI?

AutoDeploy doesn't respect cron schedule as expected

When I set a schedule for autodeploy, the expected behavior is that the deploy only happens within some reasonable amount of time at or after when the schedule is set.

For example, 0 0 11 * * * should only fire at 11am utc or within some reasonable amount of time after it, say up to 30 minutes after 11am. This buffer time should be set the same everywhere, or configurable.

As it is, the current algorithm does the following in

teletraan/deploy-service/teletraanservice/src/main/java/com/pinterest/teletraan/worker/AutoPromoter.java

Line 275 in a4201f2

boolean autoDeployDue(DeployBean deployBean, String cronExpressionString) {

In the autodeploydue function:

get last deploy time
calculate the next time (after the last deploy time) when the cron is satisfied
if the current time is after the next calculated time, it deploys

This has several unintended side effects:

On the first deploy, it deploys immediately
If new builds become available less often than the cron schedule fires, they are auto deployed as soon as they are ready, rather than within the expected time window

This functionality is important for systems which should only deploy at certain times of day. For example, we have a critical tool which uses lots of memory during the day to hold users' data and perform fast computation. Many long-running computations are performed during the day. Deploys may not happen every day, but when they do, we want them published at 3-4am, when users are generally not working.
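
The difference between the logic described in this issue and the requested windowed behavior can be sketched as follows. This is a simplified Python illustration, not the AutoPromoter code: it models only a daily schedule at a fixed hour instead of a full cron expression, uses naive datetimes assumed to be UTC, and all function names are hypothetical.

```python
from datetime import datetime, timedelta


def next_fire_after(moment, hour):
    """First daily fire time at `hour`:00 strictly after `moment`."""
    candidate = moment.replace(hour=hour, minute=0, second=0, microsecond=0)
    if candidate <= moment:
        candidate += timedelta(days=1)
    return candidate


def due_as_described(last_deploy, now, hour):
    """Described logic: due once `now` passes the first fire time after the last deploy."""
    return now > next_fire_after(last_deploy, hour)


def due_within_window(last_deploy, now, hour, window=timedelta(minutes=30)):
    """Requested logic: due only inside [fire, fire + window], once per fire time."""
    fire = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if fire > now:
        fire -= timedelta(days=1)  # most recent fire time at or before now
    return now - fire <= window and last_deploy < fire
```

With a last deploy several days old and a build arriving at 15:00, the first predicate is true immediately, while the windowed one stays false until the next 11:00 to 11:30 slot.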

Failed deploy-board UI

Request: deploy-service artefact available on maven.org

Can we get a deployable artefact for deploy-service published to maven.org?

The post-deploy webhook is currently only called on successful deploys

Right now the post-deploy webhook is only called on successful deploys and the deploy state is not updated before calling the webhook. This causes the deploy state to always be RUNNING when the webhook is being called, so the deploy state variable substitution is currently meaningless.

Request: Containerization/Kubernetes?

Has anyone looked at supporting containers/Kubernetes as a deployment object, or containerizing Teletraan for running on Kubernetes? We'd love to help :)

Full disclosure: I work at Google on Kubernetes/GKE.

Error In the Web UI

'NoneType' object is not iterable

After I installed Teletraan under my user and tried to create a new Env as per the documentation, I got the following error.

Webhook silently fails when header contains colon

Hi Pinterest,

We're seeing webhooks fail when headers contain colons - we presume it's the line here:

teletraan/deploy-service/common/src/main/java/com/pinterest/deployservice/handler/WebhookJob.java

Line 70 in a4201f2

 headers = Splitter.on(';').trimResults().withKeyValueSeparator(":").split(webhook.getHeaders()); 

Are the headers supposed to be kv pairs split with colons (as the code suggests), or equals signs (as the docs/tooltips suggest)?

An example header string that repros the behavior: Accept=application/json;Content-Type=application/json;Authorization=tok:747703

Log output:

cat service.log | grep "com.pinterest.deployservice.handler.WebhookJob"
INFO  [2016-11-15 11:29:48,397] com.pinterest.deployservice.handler.WebhookJob: Url after transform is https://api.webhook.com
INFO  [2016-11-15 11:29:48,397] com.pinterest.deployservice.handler.WebhookJob: Header string after transform is Accept=application/json;Content-Type=application/json;Authorization=tok:747703
INFO  [2016-11-15 11:29:48,397] com.pinterest.deployservice.handler.WebhookJob: Body string after transform is {"json":"data"}

There are no other entries in the log, so I presume that it is silently failing somewhere between L66 and L78.
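
The example header string uses = between name and value, while the quoted line splits each entry on a colon. A minimal Python sketch (not the project's Java code) of a parser that splits entries on ; and each entry on the first = only, so values may themselves contain colons; parse_headers is a hypothetical name:

```python
def parse_headers(header_string, entry_sep=";", kv_sep="="):
    """Parse 'Name=value;Name=value' into a dict, splitting each entry once."""
    headers = {}
    for entry in header_string.split(entry_sep):
        entry = entry.strip()
        if not entry:
            continue  # tolerate a trailing separator
        name, sep, value = entry.partition(kv_sep)
        if not sep or not name.strip():
            # Fail loudly so a malformed header string shows up in the logs.
            raise ValueError("not a valid header entry: %r" % entry)
        headers[name.strip()] = value.strip()
    return headers
```

Calling the same function with kv_sep=":" on the example string raises ValueError on the first entry, which illustrates why a separator mismatch should surface as a logged error rather than a silent failure.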

Hosts are duplicated in GUI - one appears to be non-functional and causes deploy to fail

When running a standard deploy, servers are sometimes duplicated in the GUI. This causes the deploy to fail, as one of the hosts in the GUI fails to deploy. The host IDs are the same for both hosts, and the ID was not changed during the deploy. Under the host's details, it shows two deploys rather than just one. I have also waited much longer than the SimpleAgentJanitor thresholds.

I'm not sure in this case what logs would be helpful, but here are screenshots of the behavior:

Deploy interface: http://ookla.d.pr/1HC4
Host Information: http://ookla.d.pr/TsSA

Is it possible to remove the erroneous host, and do you have any ideas what may have caused this/how to prevent it?

It's happened 3 times on different deploys, and the only way we have solved it is destroying the stage, which isn't a great solution at scale.

Add sample deployable app with scripts

Please,

Is it possible to add a sample demo application with sample deployment scripts?

Thanks

Wrong url in board's host link when ec2-ip-... is used

e.g.
http://teletraan/env/Bidder/Production/host/x.x.x.x
instead of
http://teletraan/env/Bidder/Production/host/ip-x-x-x-x

It doesn't load either, as the host becomes just the first x, i.e.:
Home / Environments / Env1 (Production) / host / x

Feature request: ability to stop/start environments

I'm evaluating Teletraan as a replacement for Capistrano. One important feature in our Capistrano deployment system is the ability to run our init scripts to stop/start all hosts in what Teletraan calls an "environment".

It looks like I can roughly emulate this with the following workflow, when hosts are enumerated explicitly in the "Capacity" page:

Go to the environment -> all hosts page.
Terminate the desired hosts.
Do whatever workflow tasks are required while the hosts are down.
Go to the environment -> all hosts page.
Perform the RESET action on the previously stopped hosts.

This is pretty clunky, however: it downloads a new build and goes through the complete lifecycle for a host (from PRE_DOWNLOAD to POST_RESTART). Really, all we need to run is RESTARTING and POST_RESTART to resume a host from the STOPPED state. It would be nice if there were a way to do this directly.

two pop up displays on progress bar

Hovering over the progress bar on the current deploy page displays two pop-ups with the same information.

Let me know, if I can help to fix the issue :)

Can not create environment

I've installed Python 2.7, virtualenv, MySQL, and Java 8 into a dockerized Ubuntu, configured server.yml to point at the local MySQL database, and started the service and deploy board, but after submitting 'create env' I got this error:

Teletraan failed to call backend server. 
Contact your friendly Teletraan owners for assistance. 
Hint: 500, {"code":500,"message":"There was an error processing your request. It has been logged

vagrant version works fine, but what's the problem with Manual Option?

here is a traceback from /tmp/deploy_board/service.log

Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/django/core/handlers/base.py", line 114, in get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/home/deploy-board/deploy_board/webapp/env_views.py", line 566, in post_create_env
    environs_helper.create_env(request, data)
  File "/home/deploy-board/deploy_board/webapp/helpers/environs_helper.py", line 92, in create_env
    return deployclient.post("/envs", request.teletraan_user_id.token, data=data)
  File "/home/deploy-board/deploy_board/webapp/helpers/deployclient.py", line 78, in post
    return self.__call('post')(path, token, params=params, data=data)
  File "/home/deploy-board/deploy_board/webapp/helpers/decorators.py", line 67, in f_retry
    return f(*args, **kwargs)
  File "/home/deploy-board/deploy_board/webapp/helpers/deployclient.py", line 65, in api
    "Hint: %s, %s" % (response.status_code, response.content))
TeletraanException: Teletraan failed to call backend server. Contact your friendly Teletraan owners for assistance. Hint: 500, {"code":500,"message":"There was an error processing your request. It has been logged (ID ee4fbdc96583d8f8)."}

Has it been open sourced?

Deploy Agent python dependencies requirements

I'm trying to make a Debian package for the deploy agent. The required versions of the Python dependencies are quite recent; is that really necessary? Debian provides packages for those deps, but their versions are a bit older, and repackaging all deps at the required versions is quite tedious operationally.

Undocumented things

So apologies if I missed something. The quick start guide (and vagrant image + demo script) seem to pull from the last release which is over two years old. It appears to me that the documentation + usability outside of pinterest is a little bit broken.

Specifically around these environment variables that don't seem to have any documentation:

teletraan/deploy-board/deploy_board/settings.py

Line 68 in 6fa1e25

CMDB_API_HOST = os.getenv("CMDB_API_HOST")

grep -r CMDB_API_HOST *
deploy-board/deploy_board/webapp/host_views.py:from deploy_board.settings import IS_PINTEREST, CMDB_API_HOST, CMDB_INSTANCE_URL, CMDB_UI_HOST, PHOBOS_URL
deploy-board/deploy_board/webapp/host_views.py:    host_url = CMDB_API_HOST + CMDB_INSTANCE_URL + host_id
deploy-board/deploy_board/settings.py:CMDB_API_HOST = os.getenv("CMDB_API_HOST")

I've got master running in our lab, but those env vars are not set to anything, which causes the deploy board to not be happy when navigating around:
Error snip:

 'wsgi.errors': <open file '<stderr>', mode 'w' at 0x7faee081e1e0>,
 'wsgi.file_wrapper': <class wsgiref.util.FileWrapper at 0x7faeddfc5050>,
 'wsgi.input': <socket._fileobject object at 0x7faedf2692d0>,
 'wsgi.multiprocess': False,
 'wsgi.multithread': True,
 'wsgi.run_once': False,
 'wsgi.url_scheme': 'http',
 'wsgi.version': (1, 0)}>
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/django/core/handlers/base.py", line 114, in get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/usr/local/lib/python2.7/dist-packages/django/views/generic/base.py", line 69, in view
    return self.dispatch(request, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/django/views/generic/base.py", line 87, in dispatch
    return handler(request, *args, **kwargs)
  File "/opt/deploy-board/deploy_board/webapp/host_views.py", line 167, in get
    host_details = get_host_details(host_id)
  File "/opt/deploy-board/deploy_board/webapp/host_views.py", line 86, in get_host_details
    host_url = CMDB_API_HOST + CMDB_INSTANCE_URL + host_id
TypeError: unsupported operand type(s) for +: 'NoneType' and 'NoneType'

I'd be happy to make a PR to add some defaults, but I have no idea what the defaults should be. Any chance someone who knows could drop a note here with some insight? (Or even better - update with some defaults?)
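
Independent of what the right defaults are, the TypeError in the traceback comes from concatenating None values returned for unset variables. A minimal sketch of a guard that skips the CMDB lookup when nothing is configured; get_host_url is a hypothetical helper, and reading CMDB_INSTANCE_URL from the environment is an assumption (the source only shows CMDB_API_HOST being read that way):

```python
import os


def get_host_url(host_id, environ=os.environ):
    """Build the CMDB host URL, or return None when CMDB is not configured."""
    api_host = environ.get("CMDB_API_HOST")
    instance_url = environ.get("CMDB_INSTANCE_URL")  # assumed to be an env var
    if not api_host or instance_url is None:
        return None  # caller can skip the CMDB lookup instead of crashing
    return api_host + instance_url + host_id
```

A view could then render the host page without CMDB details whenever the function returns None.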

Thank you very much!

d

How to separate developers' permissions in Teletraan?

Or should git audit be used to restrict permissions?

Authentication

Can you explain a bit about how to do authentication and authorization using OAuth for Teletraan? I encountered the following error after uncommenting sections of the server.yaml file:

Starting Teletraan server...
/home/archana/Desktop/workspace/teletraan/deploy-service/teletraanservice/bin/server.yaml has an error:

Failed to parse configuration at: db; Could not resolve type id 'role' into a subtype of [simple type, class com.pinterest.teletraan.config.DataSourceFactory]: known type ids = [DataSourceFactory, embedded, mysql, zkmysql]
at [Source: N/A; line: -1, column: -1] (through reference chain: com.pinterest.teletraan.TeletraanServiceConfiguration["db"])

When Script Tokens are enabled, POSTs to /v1/system/ping with correct Authorization header return 403

We recently enabled OAUTH for deploy-board and the deploy service so we can use role based permissions (dev deploy to staging, eng deploy to prod). Everything is working properly on the UI end except that we now cannot get any deploy-agents to communicate with deploy-service.

We discovered that due to this change (uncommenting the authorization and authentication blocks in server.yaml), deploy-service is now requiring the "Authorization: token " header. As a result, we generated new script tokens for the specific environment, and added it to the respective deploy-agent config.

We expected that this would work - however we are now receiving 403 errors on the deploy-agent ping. It appears that the token is somewhat accepted by the deploy-service, as it's not calling to the Oauth Token provider (when we place a fake/broken token in the request, it responds with a 401 from the Oauth provider).

Is there a way to either A) have the deploy-agents stay on anonymous auth, or B) determine why the script tokens are not being accepted (a specific debug point)?

Curl with correct token:

curl -v -k -H "Authorization: token <token>" -H "Content-Type: application/json" -X POST -d '{"hostName": "aw-mapi-consumer3", "hostId": "371008528", "reports": [{"failCount": 0, "envId": "xI9Nd0RgTnWel4VlSU52Ig", "errorMessage": null, "deployStage": "SERVING_BUILD", "errorCode": 0, "deployId": "BWLYzB70QOeVdtIjKRIQ-w", "deployAlias": null, "agentStatus": "SUCCEEDED"}], "groups": ["speedtest-mobile-reports-prod-consumer"], "hostIp": "127.0.0.1"}' http://deploy.internal.ookla.com:8080/v1/system/ping
* Hostname was NOT found in DNS cache
*   Trying 172.16.1.84...
* Connected to deploy.internal.ookla.com (172.16.1.84) port 8080 (#0)
> POST /v1/system/ping HTTP/1.1
> User-Agent: curl/7.38.0
> Host: deploy.internal.ookla.com:8080
> Accept: */*
> Authorization: token <token>
> Content-Type: application/json
> Content-Length: 356
>
* upload completely sent off: 356 out of 356 bytes
< HTTP/1.1 403 Forbidden
< Date: Thu, 02 Jun 2016 17:26:08 GMT
< Content-Type: application/json
< Content-Length: 40
<
* Connection #0 to host deploy.internal.ookla.com left intact

Curl with Incorrect token:

curl -v -k -H "Authorization: token <faketoken>" -H "Content-Type: application/json" -X POST -d '{"hostName": "aw-mapi-consumer3", "hostId": "371008528", "reports": [{"failCount": 0, "envId": "xI9Nd0RgTnWel4VlSU52Ig", "errorMessage": null, "deployStage": "SERVING_BUILD", "errorCode": 0, "deployId": "BWLYzB70QOeVdtIjKRIQ-w", "deployAlias": null, "agentStatus": "SUCCEEDED"}], "groups": ["speedtest-mobile-reports-prod-consumer"], "hostIp": "127.0.0.1"}' http://deploy.internal.ookla.com:8080/v1/system/ping
* Hostname was NOT found in DNS cache
*   Trying 172.16.1.84...
* Connected to deploy.internal.ookla.com (172.16.1.84) port 8080 (#0)
> POST /v1/system/ping HTTP/1.1
> User-Agent: curl/7.38.0
> Host: deploy.internal.ookla.com:8080
> Accept: */*
> Authorization: token <faketoken>
> Content-Type: application/json
> Content-Length: 356
>
* upload completely sent off: 356 out of 356 bytes
< HTTP/1.1 401 Unauthorized
< Date: Thu, 02 Jun 2016 17:29:28 GMT
< Content-Type: application/json
< Content-Length: 202
<
* Connection #0 to host deploy.internal.ookla.com left intact
{"code":401,"message":"Failed to authenticate user. java.io.IOException: Server returned HTTP response code: 401 for URL: https://www.googleapis.com/userinfo/v2/me?access_token=<faketoken>"}

The only respective log entry is in deploy-service/access.log and it states:
First curl:

172.16.1.192 - - [02/Jun/2016:17:30:42 +0000] "POST /v1/system/ping HTTP/1.1" 403 40 "-" "curl/7.38.0" 2

Second curl:

172.16.1.192 - - [02/Jun/2016:17:30:33 +0000] "POST /v1/system/ping HTTP/1.1" 401 202 "-" "curl/7.38.0" 5060

Feature Request: add a + sign next to the last stage tab for easily adding a new stage

Currently, to add a new stage, users have to click on a specific, potentially unrelated stage, then click on config, and then click add a stage. It is confusing and not intuitive.

It would be nice to have a "+" sign/button/tab on the stage tab bar that, when clicked, goes to the create a new stage UI.

Release Cadence? Last release was 2 years ago.

I was curious if there are plans on creating new releases? As it stands, https://github.com/pinterest/teletraan/releases/tag/v1.0.1 is 2 years old. Is it expected that master branch is stable?

Quick start documentation glitches

I've tried getting the demo working, and it wasn't quite a smooth ride. I hope this feedback helps you guys smoothen that.

after vagrant up: no service is running at http://127.0.0.1:8888/
Trying the demo failed on sed calls in Linux:

vagrant@vagrant-ubuntu-trusty-64:~$ curl -O https://raw.githubusercontent.com/pinterest/teletraan/master/deploy-sentinel/demo_run.sh
...
...
Teletraan server downloaded
+ sed -i '' 's/type: mysql/#type: mysql/' ./deploy-service/bin/server.yaml
sed: can't read s/type: mysql/#type: mysql/: No such file or directory

the fix is using sed -i'' instead of sed -i ''. I hope that doesn't conflict with OSX's sed.

"Create a new Environment" (https://github.com/pinterest/teletraan/wiki/Quickstart-Guide#run-demo )
I was a bit confused about pushing the Capacity button. It took me some time to realize the "Create" button had to be used first. Funny I know :)
Creating the deploy gives KeyError: 'provisioningHosts'

I like your error message though 😂
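
On the sed failure reported above: GNU and BSD sed treat the argument to -i differently, so a single sed invocation that works on both Linux and macOS is awkward to write. One way to sidestep the difference is to do the edit in Python instead of sed. This is a swapped-in technique, not what demo_run.sh does; comment_out_line is a hypothetical helper:

```python
import re


def comment_out_line(path, text_to_comment="type: mysql"):
    """Prefix lines starting with `text_to_comment` with '#', in place.

    Returns the number of lines changed; already-commented lines are skipped.
    """
    pattern = r"^([ \t]*)" + re.escape(text_to_comment)
    with open(path, encoding="utf-8") as handle:
        original = handle.read()
    updated, count = re.subn(
        pattern,
        lambda match: match.group(1) + "#" + text_to_comment,
        original,
        flags=re.MULTILINE,
    )
    if count:
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(updated)
    return count
```

Running it twice on the same file changes nothing the second time, since commented lines no longer match.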

Why did you choose a pull architecture?

Why does Teletraan use a Puppet-style pull architecture?
Why not an Ansible-style push design, without agents on the client side? Why is SSH not enough?

Reduce traffic between deploy-agent and deploy-service

For each deployment, there are about 8 deployment stages/steps (from PRE_DOWNLOAD to SERVING_BUILD), and for each step the deploy-agent talks to deploy-service to find out what's the next step to do. So, there are at least 8 rounds of traffic for each deployment, which also creates more database access.

Another way to handle each deployment is to let the deploy-agent handle all the deployment stages/steps together and then report back to the deploy-service. In this way, there will be only one or two rounds of communications for each deployment. This would give the deploy-service and also the database less load.

This might be helpful for adding new features, e.g. #491, where there are concerns about hitting a scale limit because of too many database accesses.

What do you think?

Pinging pinterest jira in env_landing

Not an error really, just spammy.
Link:

teletraan/deploy-board/deploy_board/templates/deploys/warning_no_deploy.tmpl

Line 11 in 4487779

 <script type="text/javascript" src="https://jira.pinadmin.com/s/be5518e4fe3d7d4da611de525c30ea8f-T/-5a7v9w/75008/d5ce662ce434066877551cbdd6cd6070/2.0.24/_/download/batch/com.atlassian.jira.collector.plugin.jira-issue-collector-plugin:issuecollector/com.atlassian.jira.collector.plugin.jira-issue-collector-plugin:issuecollector.js?locale=en-US&collectorId=81085280"></script> 

It does not seem to be behind an IS_PINTEREST guard.

Using Teletraan as a Puppet replacement

Can we use Teletraan to deploy files (configurations) the way Puppet does? Does Teletraan place any limitation on the format of what it deploys?

Teletraan deploy-agent is unable to find aws_access_key_id within configuration file.

When the deploy agent is run with an s3:// URL, it is unable to find the aws_access_key_id and fails to perform the deploy.

aws_access_key_id and aws_secret_access_key are correctly defined within the configuration file, and other directives within the configuration file are correctly parsed (not falling back to defaults).

Command being used:
/usr/local/bin/deploy-agent -f /etc/deployagent.conf

Copy of /etc/deployagent.conf (sensitive parts redacted):

[default_config]
deploy_agent_dir = /data/deployd/
target_default_dir = /tmp
builds_dir = /data/deployd/builds
log_directory = /data/deployd/logs
log_level = DEBUG
process_wait_interval = 2
process_timeout = 1800
min_running_time = 60
back_off_factor = 2
max_sleep_interval = 60
num_builds_to_retain = 2
package_format = tar.gz
max_retry = 3
max_tail_bytes = 10240
aws_access_key_id = AKAAAAAAAAAAAAAALQ
aws_secret_access_key =  x9aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaanP
teletraan_service_url = http://172.1.1.1:8080
teletraan_service_version = v1
# teletraan_service_token = test

Error log message received:

INFO:deployd.common.executor:Failed: deploy-downloader -f /etc/deployagent.conf -v yb9mGyjqQtuxsNvGiFhtOQ_b3fac1c -u s3://s3-us-west-2.amazonaws.com/ookla/artifact.tgz -e Speedtest-Intelligence, at 2 retry. Error:
INFO:deployd.download.downloader:Start to download the package.
Traceback (most recent call last):
  File "/usr/local/bin/deploy-downloader", line 9, in <module>
    load_entry_point('deploy-agent==1.2.5', 'console_scripts', 'deploy-downloader')()
  File "build/bdist.linux-x86_64/egg/deployd/download/downloader.py", line 113, in main
  File "build/bdist.linux-x86_64/egg/deployd/download/downloader.py", line 56, in download
  File "build/bdist.linux-x86_64/egg/deployd/download/download_helper_factory.py", line 33, in gen_downloader
  File "build/bdist.linux-x86_64/egg/deployd/common/config.py", line 187, in get_aws_access_key
  File "build/bdist.linux-x86_64/egg/deployd/common/config.py", line 131, in get_var
deployd.common.exceptions.DeployConfigException: aws_access_key_id cannot be found.
DEBUG:deployd.common.executor:start: 2016-04-20 21:58:20.442645, now: 2016-04-20 21:58:30.456953, process: 10
DEBUG:deployd.common.executor:start: 2016-04-20 21:58:20.442645, now: 2016-04-20 21:58:30.458551, process: 10
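
One way to narrow this down is to check whether the two keys are visible to a standard INI parser at all. The Python sketch below is a diagnostic only: it assumes the file is INI-style (as the [default_config] header suggests) and does not reproduce the agent's own lookup in deployd/common/config.py. The function name missing_keys is hypothetical.

```python
import configparser

REQUIRED_KEYS = ("aws_access_key_id", "aws_secret_access_key")


def missing_keys(config_text, section="default_config", keys=REQUIRED_KEYS):
    """Return the keys that are absent or empty in `section` of an INI-style text."""
    # Disable interpolation so a '%' inside a secret is not treated specially.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)

    if not parser.has_section(section):
        # Nothing under the expected header: every key counts as missing.
        return list(keys)

    # configparser strips surrounding whitespace from values, so the extra
    # space after '=' in the pasted config is harmless to this check.
    return [key for key in keys
            if not parser.get(section, key, fallback="").strip()]
```

Running missing_keys(open("/etc/deployagent.conf").read()) and getting an empty list back would suggest the file itself is fine, and that the problem lies in how deploy-downloader resolves the value rather than in the config syntax.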

Should the default Vagrantfile export 8080?

I've been toying with this. A few hurdles I ran into:

The Vagrantfile doesn't export 8080 (deploy-service) by default, so you need to add that before you can publish a build.
The docs don't mention that "v1/build" is a target on the deploy-service and not the deploy-board.
The docs don't mention that commit-date is an integer, not a string (it can be generated with git show -s --format="%ct000").
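
For anyone building the publish payload from a script, the third point can be sketched in Python. git's %ct placeholder prints the committer date as Unix epoch seconds, so appending 000 yields a millisecond value; the helper name commit_date_millis is hypothetical, and that the deploy-service wants milliseconds is inferred from the git command above rather than from the docs.

```python
def commit_date_millis(git_ct_output):
    """Turn the output of `git show -s --format=%ct` into integer milliseconds.

    `git_ct_output` is the raw string printed by git (epoch seconds,
    usually with a trailing newline).
    """
    seconds = int(git_ct_output.strip())
    # Same effect as the literal "000" suffix in --format="%ct000",
    # but the result is an int rather than a string.
    return seconds * 1000
```

The important part is that the value is serialized as a JSON number, not a quoted string, when it is sent to the deploy-service.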

Other than that, looking cool so far.

How to use Teletraan on AWS

I have a Spring Boot application running. Currently I am deploying JARs. Is there a how-to guide for deploying applications on AWS Elastic Beanstalk?

Review apps

Heroku can deploy review apps for projects on a per-PR basis. This is incredibly useful when reviewing code, as you can inspect visual changes via a deployment and run integration tests against it.

Is this possible with Teletraan?