pinterest / teletraan

Teletraan is Pinterest's deploy system.

License: Apache License 2.0

Java 49.31% Shell 0.64% Python 26.34% Makefile 0.01% CSS 1.97% JavaScript 9.86% HTML 11.80% Dockerfile 0.01% Starlark 0.07%

teletraan's Introduction

Teletraan Deploy Service

What is Teletraan?

Teletraan is Pinterest's deploy system. It deploys thousands of Pinterest internal services, supports tens of thousands of hosts, and has been running in production for many years. It empowers Pinterest engineers to deliver their code to Pinners quickly and safely. Check out the wiki or the blog post "Under the hood: Teletraan Deploy System" for more details.

The name Teletraan comes from a character in the Transformers TV series (see Wikipedia).

Why use Teletraan?

Teletraan is designed to do one thing and one thing only: deploy. It supports critical features such as zero-downtime deploys, rollback, staging, and continuous deploy, as well as many convenient developer-facing features such as showing commit details, comparing different deploys, notifying deploy state changes through email or Slack, displaying metrics, and more. Teletraan does not support container-based deploys yet. For now, you can still use Teletraan Deploy Scripts to call docker or docker-compose to run containers.

How to use Teletraan?

Teletraan is designed to be a flexible building block. You can plug Teletraan into your existing release workflow, provided the following requirements are met:

  • Run Deploy Agent on every host
  • Add Deploy Scripts to your application code
  • Publish Build Artifacts to Teletraan at the end of each build

Check out Integrate with Teletraan for more details.

Quick start

Quick start guide!

Documentation

Check out our wiki!

Help

If you have any questions or comments, you can reach us at [email protected]

teletraan's People

Contributors

ankilosaurus, cjpilbdev, disparter, euccas, eweizman, gzpcho, haom-pinterest, jinruh, knguyen100000010, kynanlalone, lilida, lixmgl, liyaqin1, mingzhaodotname, nickdechant, ntascii, osoriano, rashmi59, rfleur01, robbintt, ruthgrace, rwxzhu, sbaogang, sonia-y, thelat, tobicodes, tylerwowen, vitalii-honchar, yongwen, yujunglo

teletraan's Issues

error while creating env

I am getting the following error while trying to add a new environment:

Teletraan failed to call backend server. Contact your friendly Teletraan owners for assistance. Hint: 500, Message: Cannot create PoolableConnectionFactory (Unknown database 'deploy')

Traceback (most recent call last):
  File "/root/teletraan-demo/venv/lib/python2.7/site-packages/django/core/handlers/base.py", line 114, in get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/root/teletraan-demo/deploy-board/deploy_board/webapp/env_views.py", line 566, in post_create_env
    environs_helper.create_env(request, data)
  File "/root/teletraan-demo/deploy-board/deploy_board/webapp/helpers/environs_helper.py", line 92, in create_env
    return deployclient.post("/envs", request.teletraan_user_id.token, data=data)
  File "/root/teletraan-demo/deploy-board/deploy_board/webapp/helpers/deployclient.py", line 78, in post
    return self.__call('post')(path, token, params=params, data=data)
  File "/root/teletraan-demo/deploy-board/deploy_board/webapp/helpers/decorators.py", line 67, in f_retry
    return f(*args, **kwargs)
  File "/root/teletraan-demo/deploy-board/deploy_board/webapp/helpers/deployclient.py", line 65, in api
    "Hint: %s, %s" % (response.status_code, response.content))
TeletraanException: Teletraan failed to call backend server. Contact your friendly Teletraan owners for assistance. Hint: 500, Message: Cannot create PoolableConnectionFactory (Unknown database 'deploy')

Feature Request: autoscaling group activities to display hostname instead of instance id

In the group page UI, the "auto scaling group activities" section lists AWS instance IDs, which are hard to match to a particular hostname in the group info section above.
example:
2017-01-04 14:54:32 Terminating EC2 instance: i-051ac7c584ed7d009 Successful
2017-01-04 14:54:32 Terminating EC2 instance: i-0b94328f07944c082 Successful

It would be nice to show the hostname, since the hostname is used almost everywhere else in the UI.

deploy-board should require boto

When I install deploy-board, I get an error about a missing module "boto".
Adding "boto==2.42.0" to the requirements.txt fixes the problem.

check_version.sql- insert ignore

In check_version.sql, why not use a simple INSERT IGNORE instead of this?

Insert if not already

INSERT INTO schema_versions (version)
SELECT 0 FROM DUAL
WHERE NOT EXISTS (SELECT * FROM schema_versions);
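
One possible reason to keep the guarded form: the two statements are not equivalent once the table already holds a row. The sketch below uses SQLite as a stand-in for MySQL (INSERT OR IGNORE playing the role of INSERT IGNORE, and no DUAL table), and the one-column schema with a primary key is an assumption, not the project's actual schema.

```python
import sqlite3

# Guarded insert: only adds version 0 when the table is empty.
GUARDED = ("INSERT INTO schema_versions (version) "
           "SELECT 0 WHERE NOT EXISTS (SELECT * FROM schema_versions)")
# SQLite analogue of MySQL's INSERT IGNORE: only skips on a key conflict.
IGNORE = "INSERT OR IGNORE INTO schema_versions (version) VALUES (0)"


def versions_after(seed_sql, existing):
    """Run one seeding statement against a table that already holds `existing`."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema; the real schema_versions table may differ.
    conn.execute("CREATE TABLE schema_versions (version INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO schema_versions (version) VALUES (?)",
                     [(v,) for v in existing])
    conn.execute(seed_sql)
    rows = conn.execute("SELECT version FROM schema_versions ORDER BY version")
    result = [row[0] for row in rows]
    conn.close()
    return result
```

With a table that already records version 5, the guarded statement leaves it alone, while the IGNORE variant adds a second row for version 0 because no key conflict occurs.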

Unable to fully delete environments via API

With the API, it is not currently possible to completely delete an environment: the non-primary stages within an environment can be deleted, but deleting the primary stage results in a 500 error.

Call:

curl -X DELETE -H "Authorization: token <token>" localhost:8080/v1/envs/environmentname/prod

Response:

{"code":500,"message":"There was an error processing your request. It has been logged (ID 9a31b752c7e8ea27)."}

As far as I can find, there's not a corresponding UI element to delete the environment - the only option is to disable it.

No logout option within GUI

Currently there is no way to log out of a session within the GUI. This does cause some issues with other OAUTH providers which allow users to select multiple accounts during the login process.

In the event that a user selects a non-desired account during the login process, they are stuck in an orphaned state as they are unable to log out to change accounts.

Improve teletraan GC behavior

Currently, GC is done globally: it searches the build download directory and keeps num_builds_to_retain builds. However, if two or more build targets share the same build directory, GC could remove all the builds of one build target.

Instead, it should do it separately for each build target. For each build target, search for the builds with the given build name, and try to keep those builds under num_builds_to_retain.
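
A minimal sketch of the per-target policy described above. The (build_name, build_id, mtime) tuple layout and the function name are hypothetical and are not taken from the deploy agent's code.

```python
from collections import defaultdict


def builds_to_delete(builds, num_builds_to_retain):
    """Return build ids to remove, keeping the newest N builds per build name.

    `builds` is an iterable of (build_name, build_id, mtime) tuples; this
    layout is an assumption made for illustration only.
    """
    by_name = defaultdict(list)
    for build_name, build_id, mtime in builds:
        by_name[build_name].append((mtime, build_id))

    doomed = []
    for entries in by_name.values():
        entries.sort(reverse=True)  # newest first within one build target
        doomed.extend(build_id for _, build_id in entries[num_builds_to_retain:])
    return sorted(doomed)
```

For three "web" builds and one older "api" build with num_builds_to_retain = 2, this removes only the oldest "web" build, whereas a global cut to the two newest builds would also remove the only "api" build.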

Hardcoded OAUTH2 OAUTH_ACCESS_TOKEN_URL variable causes OAUTH to fail with non-pinterest providers

Even when the OAuth token URL is defined in manage.py, deploy-board uses a hard-coded Pinterest URL, which causes OAuth to fail.

manage.py config:

   #
    # OAuth based authentication settings. By default, OAuth based authentication is disabled
    # See documentation for how to enable OAuth
    #
    # os.environ.setdefault("OAUTH_ENABLED", "OFF")
    os.environ.setdefault("OAUTH_ENABLED", "ON")
    os.environ.setdefault("OAUTH_CLIENT_ID", "<ID>")
    os.environ.setdefault("OAUTH_CALLBACK", "URL")
    os.environ.setdefault("OAUTH_DOMAIN", "URL")
    os.environ.setdefault("OAUTH_CLIENT_TYPE", "Public")
    os.environ.setdefault("OAUTH_USER_INFO_URI", "https://www.googleapis.com/oauth2/v3/userinfo")
    os.environ.setdefault("ACCESS_TOKEN_URL", "https://accounts.google.com/o/oauth2/token")
    os.environ.setdefault("OAUTH_AUTHORIZE_URL", "https://accounts.google.com/o/oauth2/auth")
    os.environ.setdefault("OAUTH_DEFAULT_SCOPE", "email")

Logs:

2016-06-01 21:52:21,926 [INFO] deploy_board.webapp.security: clientid = <ID>
2016-06-01 21:52:21,926 [INFO] deploy_board.webapp.security: Successfully created OAuth!
2016-06-01 21:52:21,928 [DEBUG] deploy_board.webapp.security: Redirect oauth for authentication!, url = https://accounts.google.com/o/oauth2/auth?response_type=code&client_id=<ID>
2016-06-01 21:52:26,679 [DEBUG] deploy_board.webapp.security: Bypass OAuth redirect request /auth/
2016-06-01 21:52:26,680 [DEBUG] deploy_board.webapp.security: Redirect back from oauth!
2016-06-01 21:52:26,681 [DEBUG] oauth: Request 'https://auth.pinadmin.com/oauth/token/' with 'POST' method
2016-06-01 21:53:09,710 [DEBUG] deploy_board.webapp.security: Bypass OAuth redirect request /auth/
2016-06-01 21:53:09,711 [DEBUG] deploy_board.webapp.security: Redirect back from oauth!
2016-06-01 21:53:09,712 [DEBUG] oauth: Request 'https://auth.pinadmin.com/oauth/token/' with 'POST' method
2016-06-01 21:53:30,865 [DEBUG] deploy_board.webapp.security: Redirect oauth for authentication!, url = https://accounts.google.com/o/oauth2/auth?response_type=code&client_id=<ID>
2016-06-01 21:53:45,249 [DEBUG] deploy_board.webapp.security: Bypass OAuth redirect request /auth/
2016-06-01 21:53:45,250 [DEBUG] deploy_board.webapp.security: Redirect back from oauth!
2016-06-01 21:53:45,251 [DEBUG] oauth: Request 'https://auth.pinadmin.com/oauth/token/' with 'POST' method

deploy-downloader fails to execute when called by cron.

We have a functioning teletraan stack which works wonderfully except for one crucial issue - the deploy agent completely fails when called by cron. It works fine when called directly on the shell, but during execution called by cron, it fails during the deploy-downloader command with the following error:

  File "build/bdist.linux-x86_64/egg/deployd/common/executor.py", line 80, in run_cmd
    preexec_fn=os.setsid, **kw)
  File "/usr/lib/python2.7/subprocess.py", line 710, in __init__
    errread, errwrite)
  File "/usr/lib/python2.7/subprocess.py", line 1335, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory

We patched common/executor.py to print the contents of cmd to stdout so we could debug the issue and found that the following was occurring during the cron executed command:

[u'/data/nodejs/artifactname/teletraan/PRE_DOWNLOAD']
['deploy-downloader', '-f', '/etc/deployagent.conf', '-v', u'UUID-OG-A_1bb1b62', '-u', u's3://s3url/20/artifact.tar.gz', '-e', u'Environment-Name']
['deploy-downloader', '-f', '/etc/deployagent.conf', '-v', u'UUID-OG-A_1bb1b62', '-u', u's3://s3url/20/artifact.tar.gz', '-e', u'Environment-Name']
['deploy-downloader', '-f', '/etc/deployagent.conf', '-v', u'UUID-OG-A_1bb1b62', '-u', u's3://s3url/20/artifact.tar.gz', '-e', u'Environment-Name'] 

The respective crontab entry is:

* * * * * /usr/local/bin/deploy-agent -f /etc/deployagent.conf >> /var/log/deployd.log 2>&1

All of the respective files in the config (log directory, deployd directory, builds directory etc.) exist and are writable by the user. Everything works like a charm when the command /usr/local/bin/deploy-agent -f /etc/deployagent.conf is called on the shell, it's just cron which fails.

Any ideas about this issue would be greatly appreciated!

This is on Debian 8 with python 2.7.9 and the latest version of master (as of this morning).
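
A common cause of this kind of OSError under cron is cron's minimal PATH, which often omits /usr/local/bin, so a bare command name such as deploy-downloader cannot be resolved. Whether that is the cause here is not confirmed by the report. The sketch below shows one way to resolve a command against an explicit search path before executing it; resolve_command is a hypothetical helper, not part of the deploy agent.

```python
import shutil


def resolve_command(cmd, search_path):
    """Return `cmd` with its first element replaced by an absolute path.

    `search_path` is an os.pathsep-separated directory list, so the lookup
    does not depend on whatever PATH cron happens to provide.
    """
    executable = shutil.which(cmd[0], path=search_path)
    if executable is None:
        raise FileNotFoundError("%s not found in %s" % (cmd[0], search_path))
    return [executable] + list(cmd[1:])
```

An alternative that needs no code change is to set PATH explicitly at the top of the crontab so that it includes the directory where deploy-downloader is installed.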

feature request: connect to MySQL over SSL with client certificate

I'm setting up and evaluating Teletraan as our primary deployment manager. Our infrastructure requirements oblige us to connect to MySQL over SSL with a client certificate and a self-signed server certificate. I looked for a way to do this for a while, but there seems to be none.

So it would be great if there were a way to connect to MySQL over SSL.

(I joined [email protected] but it doesn't seem to be active. If GitHub Issues is not the appropriate place for feature requests, please close this and point me to an appropriate place.)

Request: custom deploy action on Staging of Host level deploy cycle.

First, this solution is very good. Thanks for the work.

I have an idea.

In my tests, the deploy action at the Staging step of the host-level deploy cycle does not seem to be customizable (it is just an untar action).
Could this deploy action be made customizable, to make the deploy platform more generic?

Thanks.

Publish using Swagger

I am trying to publish a build using the Swagger POST method (screenshot attached). The build is published, but I couldn't find it on the Teletraan UI board. I also tried the POST method from the REST API to Teletraan.
screenshot 463

teletraan & docker

To use Teletraan with Docker containers, must Teletraan be installed inside the container?

add ability to run teletraanservice/bin/run.sh in the foreground

Currently the script only allows running in the background. If I wrap this in something like systemd or supervisord then sometimes it can be helpful to just have it run in the foreground, or write out a PID file that a daemon watcher can watch.

I would suggest creating a new command called 'run', similar to the way catalina.sh does it: 'start' would run in the background and 'run' would run in the foreground.

AutoDeploy doesn't respect cron schedule as expected

When I set a schedule for autodeploy, the expected behavior is that the deploy only happens within some reasonable amount of time at or after when the schedule is set.

For example, 0 0 11 * * * should only fire at 11am utc or within some reasonable amount of time after it, say up to 30 minutes after 11am. This buffer time should be set the same everywhere, or configurable.

As it is, the current algorithm does the following in

boolean autoDeployDue(DeployBean deployBean, String cronExpressionString) {

In the autodeploydue function:

  • get last deploy time
  • calculate the next time (after the last deploy time) when the cron is satisfied
  • if the current time is after the next calculated time, it deploys

This has several unintended side effects:

  • On the first deploy, it deploys immediately
  • If deploys are made available less frequently than the cron schedule fires, they are auto deployed as soon as they are ready, rather than within the expected time window

This functionality is important for systems that should only deploy at certain times of day. For example, we have a critical tool that uses lots of memory during the day to hold users' data and perform fast computation. Many long-running computations are performed during the day. Deploys may not happen every day, but when they do, we want them published at 3-4am, when users are generally not working.
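
The windowed behavior requested here could look roughly like the following sketch. It is deliberately simplified: the schedule is "every day at fire_hour:00" rather than a full cron expression, and the function name, parameters, and 30-minute default are assumptions rather than Teletraan's Java implementation.

```python
from datetime import datetime, timedelta


def auto_deploy_due(now, last_deploy, fire_hour, window=timedelta(minutes=30)):
    """Return True only inside [scheduled, scheduled + window].

    Simplification: a daily schedule at fire_hour:00, not a cron expression.
    `last_deploy` may be None when nothing has been deployed yet.
    """
    scheduled = now.replace(hour=fire_hour, minute=0, second=0, microsecond=0)
    if scheduled > now:
        scheduled -= timedelta(days=1)  # most recent fire time at or before now
    in_window = now - scheduled <= window
    already_done = last_deploy is not None and last_deploy >= scheduled
    return in_window and not already_done
```

Anchoring the check to the most recent scheduled time, instead of the next time after the last deploy, means a first deploy or a late-arriving build waits for the next window rather than firing immediately.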

The post-deploy webhook is currently only called on successful deploys

Right now the post-deploy webhook is only called on successful deploys and the deploy state is not updated before calling the webhook. This causes the deploy state to always be RUNNING when the webhook is being called, so the deploy state variable substitution is currently meaningless.

Request: Containerization/Kubernetes?

Has anyone looked at supporting containers/Kubernetes as a deployment object, or containerizing Teletraan for running on Kubernetes? We'd love to help :)

Full disclosure: I work at Google on Kubernetes/GKE.

Error In the Web UI

'NoneType' object is not iterable

After I installed Teletraan under my user and tried to create a new Env as per the documentation, I got the following error.

image

Webhook silently fails when header contains colon

Hi Pinterest,

We're seeing webhooks fail when headers contain colons - we presume it's the line here:

headers = Splitter.on(';').trimResults().withKeyValueSeparator(":").split(webhook.getHeaders());

Are the headers supposed to be key-value pairs split with colons (as the code suggests) or equals signs (as the docs/tooltips suggest)?

An example header string that repros the behavior: Accept=application/json;Content-Type=application/json;Authorization=tok:747703

Log output:

cat service.log | grep "com.pinterest.deployservice.handler.WebhookJob"
INFO  [2016-11-15 11:29:48,397] com.pinterest.deployservice.handler.WebhookJob: Url after transform is https://api.webhook.com
INFO  [2016-11-15 11:29:48,397] com.pinterest.deployservice.handler.WebhookJob: Header string after transform is Accept=application/json;Content-Type=application/json;Authorization=tok:747703
INFO  [2016-11-15 11:29:48,397] com.pinterest.deployservice.handler.WebhookJob: Body string after transform is {"json":"data"}

There are no other entries in the log, so I presume that it is silently failing somewhere between L66 and L78.
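
If the intended format is the one the docs and tooltips describe (entries separated by ';', name and value separated by '='), a parser that splits each entry on the first '=' only would keep colons inside values intact. The following is a Python illustration of that idea, not the service's Java code; parse_headers is a hypothetical name.

```python
def parse_headers(header_string):
    """Parse 'Name=value;Name=value' into a dict, splitting on the first '='."""
    headers = {}
    for entry in header_string.split(";"):
        entry = entry.strip()
        if not entry:
            continue  # tolerate a trailing ';'
        name, sep, value = entry.partition("=")
        if not sep or not name.strip():
            # Fail loudly instead of silently dropping the webhook call.
            raise ValueError("malformed header entry: %r" % entry)
        headers[name.strip()] = value.strip()
    return headers
```

Applied to the repro string above, the Authorization entry keeps its full value tok:747703.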

Hosts are duplicated in GUI - one appears to be non-functional and causes deploy to fail

When running a standard deploy, servers sometimes are duplicated in the GUI. This causes the deploy to fail, as one of the hosts in the gui fails to deploy. The host ID's are the same for both hosts, and it was not changed during the deploy. Under the host's details, it shows two deploys rather than just one. I have also waited much longer than the SimpleAgentJanitor thresholds.

I'm not sure in this case what logs would be helpful, but here are screenshots of the behavior:

Deploy interface: http://ookla.d.pr/1HC4
Host Information: http://ookla.d.pr/TsSA

Is it possible to remove the erroneous host, and do you have any ideas what may have caused this/how to prevent it?

It's happened 3 times on different deploys, and the only way we have solved it is destroying the stage, which isn't a great solution at scale.

Feature request: ability to stop/start environments

I'm evaluating Teletraan as a replacement for Capistrano. One important feature in our Capistrano deployment system is the ability to run our init scripts to stop/start all hosts in what Teletraan calls an "environment".

It looks like I can roughly emulate this with the following workflow, when hosts are enumerated explicitly in the "Capacity" page:

  1. Go to the environment -> all hosts page.
  2. Terminate the desired hosts.
  3. Do whatever workflow tasks are required while the hosts are down.
  4. Go to the environment -> all hosts page.
  5. Perform the RESET action on the previously stopped hosts.

This is pretty clunky, however: it downloads a new build and goes through the complete lifecycle for a host (from PRE_DOWNLOAD to POST_RESTART). Really, all we need to run is RESTARTING and POST_RESTART to resume a host from the STOPPED state. It would be nice if there were a way to do this directly.

two pop up displays on progress bar

Hovering over the progress bar on the current deploy page displays two pop-ups with the same information.

image

Let me know, if I can help to fix the issue :)

Can not create environment

I've installed Python 2.7, virtualenv, MySQL, and Java 8 in a dockerized Ubuntu, configured server.yml to point at the local MySQL database, and started the service and deploy board, but after submitting 'create env' I got this error:

Teletraan failed to call backend server. 
Contact your friendly Teletraan owners for assistance. 
Hint: 500, {"code":500,"message":"There was an error processing your request. It has been logged

vagrant version works fine, but what's the problem with Manual Option?

here is a traceback from /tmp/deploy_board/service.log

Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/django/core/handlers/base.py", line 114, in get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/home/deploy-board/deploy_board/webapp/env_views.py", line 566, in post_create_env
    environs_helper.create_env(request, data)
  File "/home/deploy-board/deploy_board/webapp/helpers/environs_helper.py", line 92, in create_env
    return deployclient.post("/envs", request.teletraan_user_id.token, data=data)
  File "/home/deploy-board/deploy_board/webapp/helpers/deployclient.py", line 78, in post
    return self.__call('post')(path, token, params=params, data=data)
  File "/home/deploy-board/deploy_board/webapp/helpers/decorators.py", line 67, in f_retry
    return f(*args, **kwargs)
  File "/home/deploy-board/deploy_board/webapp/helpers/deployclient.py", line 65, in api
    "Hint: %s, %s" % (response.status_code, response.content))
TeletraanException: Teletraan failed to call backend server. Contact your friendly Teletraan owners for assistance. Hint: 500, {"code":500,"message":"There was an error processing your request. It has been logged (ID ee4fbdc96583d8f8)."}

Deploy Agent python dependencies requirements

Hi

I'm trying to make a Debian package for the deploy agent. The required versions of the Python dependencies are quite recent; is that really necessary? Debian provides packages for those dependencies, but in somewhat older versions, and repackaging all of them at the required versions is quite tedious operationally...

Undocumented things

So apologies if I missed something. The quick start guide (and the Vagrant image + demo script) seem to pull from the last release, which is over two years old. It appears to me that the documentation and usability outside of Pinterest are a little bit broken.

Specifically around these environment variables that don't seem to have any documentation:

CMDB_API_HOST = os.getenv("CMDB_API_HOST")

grep -r CMDB_API_HOST *
deploy-board/deploy_board/webapp/host_views.py:from deploy_board.settings import IS_PINTEREST, CMDB_API_HOST, CMDB_INSTANCE_URL, CMDB_UI_HOST, PHOBOS_URL
deploy-board/deploy_board/webapp/host_views.py:    host_url = CMDB_API_HOST + CMDB_INSTANCE_URL + host_id
deploy-board/deploy_board/settings.py:CMDB_API_HOST = os.getenv("CMDB_API_HOST")

I've got master running in our lab, but those env vars are not set to anything, which causes the deploy board to fail when navigating around:
Error snip:

 'wsgi.errors': <open file '<stderr>', mode 'w' at 0x7faee081e1e0>,
 'wsgi.file_wrapper': <class wsgiref.util.FileWrapper at 0x7faeddfc5050>,
 'wsgi.input': <socket._fileobject object at 0x7faedf2692d0>,
 'wsgi.multiprocess': False,
 'wsgi.multithread': True,
 'wsgi.run_once': False,
 'wsgi.url_scheme': 'http',
 'wsgi.version': (1, 0)}>
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/django/core/handlers/base.py", line 114, in get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/usr/local/lib/python2.7/dist-packages/django/views/generic/base.py", line 69, in view
    return self.dispatch(request, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/django/views/generic/base.py", line 87, in dispatch
    return handler(request, *args, **kwargs)
  File "/opt/deploy-board/deploy_board/webapp/host_views.py", line 167, in get
    host_details = get_host_details(host_id)
  File "/opt/deploy-board/deploy_board/webapp/host_views.py", line 86, in get_host_details
    host_url = CMDB_API_HOST + CMDB_INSTANCE_URL + host_id
TypeError: unsupported operand type(s) for +: 'NoneType' and 'NoneType'

I'd be happy to make a PR to add some defaults, but I have no idea what the defaults should be. Any chance someone who knows could drop a note here with some insight? (Or even better - update with some defaults?)

Thank you very much!
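
Until proper defaults are known, one defensive option is to treat an unset CMDB configuration as "CMDB disabled" rather than concatenating None values. The sketch below assumes that CMDB_INSTANCE_URL is read from the environment in the same way as CMDB_API_HOST (the settings line for it is not shown above); build_host_url is a hypothetical helper.

```python
import os


def build_host_url(host_id, environ=os.environ):
    """Return the CMDB URL for a host, or None when CMDB is not configured."""
    api_host = environ.get("CMDB_API_HOST")
    # Assumption: CMDB_INSTANCE_URL is also supplied via the environment.
    instance_url = environ.get("CMDB_INSTANCE_URL")
    if not api_host or not instance_url:
        return None  # caller can skip the CMDB lookup instead of crashing
    return api_host + instance_url + host_id
```

A caller such as get_host_details could then skip the CMDB request when the function returns None.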

d

d

Authentication

Can you explain a bit about how to set up authentication and authorization using OAuth for Teletraan? I encountered the following error after uncommenting sections of the server.yaml file:

Starting Teletraan server...
/home/archana/Desktop/workspace/teletraan/deploy-service/teletraanservice/bin/server.yaml has an error:

  • Failed to parse configuration at: db; Could not resolve type id 'role' into a subtype of [simple type, class com.pinterest.teletraan.config.DataSourceFactory]: known type ids = [DataSourceFactory, embedded, mysql, zkmysql]
    at [Source: N/A; line: -1, column: -1] (through reference chain: com.pinterest.teletraan.TeletraanServiceConfiguration["db"])

When Script Tokens are enabled, POSTs to /v1/system/ping with correct Authorization header return 403

We recently enabled OAUTH for deploy-board and the deploy service so we can use role based permissions (dev deploy to staging, eng deploy to prod). Everything is working properly on the UI end except that we now cannot get any deploy-agents to communicate with deploy-service.

We discovered that due to this change (uncommenting the authorization and authentication blocks in server.yaml), deploy-service is now requiring the "Authorization: token " header. As a result, we generated new script tokens for the specific environment, and added it to the respective deploy-agent config.

We expected that this would work - however we are now receiving 403 errors on the deploy-agent ping. It appears that the token is somewhat accepted by the deploy-service, as it's not calling to the Oauth Token provider (when we place a fake/broken token in the request, it responds with a 401 from the Oauth provider).

Is there a way to A) have the deploy-agents stay on anonymous auth, B) determine why the script tokens are not being accepted (specific debug point)

Curl with correct token:

curl -v -k -H "Authorization: token <token>" -H "Content-Type: application/json" -X POST -d '{"hostName": "aw-mapi-consumer3", "hostId": "371008528", "reports": [{"failCount": 0, "envId": "xI9Nd0RgTnWel4VlSU52Ig", "errorMessage": null, "deployStage": "SERVING_BUILD", "errorCode": 0, "deployId": "BWLYzB70QOeVdtIjKRIQ-w", "deployAlias": null, "agentStatus": "SUCCEEDED"}], "groups": ["speedtest-mobile-reports-prod-consumer"], "hostIp": "127.0.0.1"}' http://deploy.internal.ookla.com:8080/v1/system/ping
* Hostname was NOT found in DNS cache
*   Trying 172.16.1.84...
* Connected to deploy.internal.ookla.com (172.16.1.84) port 8080 (#0)
> POST /v1/system/ping HTTP/1.1
> User-Agent: curl/7.38.0
> Host: deploy.internal.ookla.com:8080
> Accept: */*
> Authorization: token <token>
> Content-Type: application/json
> Content-Length: 356
>
* upload completely sent off: 356 out of 356 bytes
< HTTP/1.1 403 Forbidden
< Date: Thu, 02 Jun 2016 17:26:08 GMT
< Content-Type: application/json
< Content-Length: 40
<
* Connection #0 to host deploy.internal.ookla.com left intact

Curl with Incorrect token:

curl -v -k -H "Authorization: token <faketoken>" -H "Content-Type: application/json" -X POST -d '{"hostName": "aw-mapi-consumer3", "hostId": "371008528", "reports": [{"failCount": 0, "envId": "xI9Nd0RgTnWel4VlSU52Ig", "errorMessage": null, "deployStage": "SERVING_BUILD", "errorCode": 0, "deployId": "BWLYzB70QOeVdtIjKRIQ-w", "deployAlias": null, "agentStatus": "SUCCEEDED"}], "groups": ["speedtest-mobile-reports-prod-consumer"], "hostIp": "127.0.0.1"}' http://deploy.internal.ookla.com:8080/v1/system/ping
* Hostname was NOT found in DNS cache
*   Trying 172.16.1.84...
* Connected to deploy.internal.ookla.com (172.16.1.84) port 8080 (#0)
> POST /v1/system/ping HTTP/1.1
> User-Agent: curl/7.38.0
> Host: deploy.internal.ookla.com:8080
> Accept: */*
> Authorization: token <faketoken>
> Content-Type: application/json
> Content-Length: 356
>
* upload completely sent off: 356 out of 356 bytes
< HTTP/1.1 401 Unauthorized
< Date: Thu, 02 Jun 2016 17:29:28 GMT
< Content-Type: application/json
< Content-Length: 202
<
* Connection #0 to host deploy.internal.ookla.com left intact
{"code":401,"message":"Failed to authenticate user. java.io.IOException: Server returned HTTP response code: 401 for URL: https://www.googleapis.com/userinfo/v2/me?access_token=<faketoken>"}

The only respective log entry is in deploy-service/access.log and it states:
First curl:

172.16.1.192 - - [02/Jun/2016:17:30:42 +0000] "POST /v1/system/ping HTTP/1.1" 403 40 "-" "curl/7.38.0" 2

Second curl:

172.16.1.192 - - [02/Jun/2016:17:30:33 +0000] "POST /v1/system/ping HTTP/1.1" 401 202 "-" "curl/7.38.0" 5060

Quick start documentation glitches

I've tried getting the demo working, and it wasn't quite a smooth ride. I hope this feedback helps you guys smoothen that.

  • after vagrant up: no service is running at http://127.0.0.1:8888/
  • Trying the demo failed on sed calls in Linux:
vagrant@vagrant-ubuntu-trusty-64:~$ curl -O https://raw.githubusercontent.com/pinterest/teletraan/master/deploy-sentinel/demo_run.sh
...
...
Teletraan server downloaded
+ sed -i '' 's/type: mysql/#type: mysql/' ./deploy-service/bin/server.yaml
sed: can't read s/type: mysql/#type: mysql/: No such file or directory

the fix is using sed -i'' instead of sed -i ''. I hope that doesn't conflict with OSX's sed.
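
A note on portability: after shell quoting, sed -i'' is passed to sed as plain -i, and BSD/macOS sed expects a backup suffix after -i, so that form would likely consume the script as the suffix there. One way to sidestep the difference is to do the substitution without sed. The sketch below is an assumption about how the demo script could be adapted, not part of the script; comment_out_mysql_type is a hypothetical name.

```python
from pathlib import Path


def comment_out_mysql_type(path):
    """Prefix 'type: mysql' with '#', like sed's s/type: mysql/#type: mysql/.

    Returns True when the file content changed.
    """
    config = Path(path)
    text = config.read_text()
    # sed without the g flag replaces only the first match on each line.
    updated = "\n".join(
        line.replace("type: mysql", "#type: mysql", 1)
        for line in text.split("\n"))
    config.write_text(updated)
    return updated != text
```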

I like your error message though 😂

Why did you choose a pull architecture?

Why does Teletraan use a Puppet-style pull architecture?
Why not an Ansible-style push design, without agents on the client side? Why isn't SSH enough?

Reduce traffic between deploy-agent and deploy-service

For each deployment, there are about 8 deployment stages/steps (from PRE_DOWNLOAD to SERVING_BUILD), and for each step the deploy-agent talks to deploy-service to find out what's the next step to do. So, there are at least 8 rounds of traffic for each deployment, which also creates more database access.

Another way to handle each deployment is to let the deploy-agent handle all the deployment stages/steps together and then report back to the deploy-service. In this way, there will be only one or two rounds of communications for each deployment. This would give the deploy-service and also the database less load.

This might be helpful for adding new features, e.g. #491, where there are concerns about hitting scale limits because of too many database accesses.

What do you think?

Pinging pinterest jira in env_landing

Not an error really, just spammy.
Link:

<script type="text/javascript" src="https://jira.pinadmin.com/s/be5518e4fe3d7d4da611de525c30ea8f-T/-5a7v9w/75008/d5ce662ce434066877551cbdd6cd6070/2.0.24/_/download/batch/com.atlassian.jira.collector.plugin.jira-issue-collector-plugin:issuecollector/com.atlassian.jira.collector.plugin.jira-issue-collector-plugin:issuecollector.js?locale=en-US&collectorId=81085280"></script>

It does not seem to be behind an IS_PINTEREST guard.

Teletraan deploy-agent is unable to find aws_access_key_id within configuration file.

When the deploy agent is run with a s3:// url, it is unable to find the aws_access_key_id and fails to perform the deploy.

aws_access_key_id and aws_secret_access_key are correctly defined within the configuration file, and other directives within the configuration file are correctly parsed (not falling back to defaults).

Command being used:
/usr/local/bin/deploy-agent -f /etc/deployagent.conf

Copy of /etc/deployagent.conf (sensitive parts redacted):

[default_config]
deploy_agent_dir = /data/deployd/
target_default_dir = /tmp
builds_dir = /data/deployd/builds
log_directory = /data/deployd/logs
log_level = DEBUG
process_wait_interval = 2
process_timeout = 1800
min_running_time = 60
back_off_factor = 2
max_sleep_interval = 60
num_builds_to_retain = 2
package_format = tar.gz
max_retry = 3
max_tail_bytes = 10240
aws_access_key_id = AKAAAAAAAAAAAAAALQ
aws_secret_access_key =  x9aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaanP
teletraan_service_url = http://172.1.1.1:8080
teletraan_service_version = v1
# teletraan_service_token = test

Error log message received:

INFO:deployd.common.executor:Failed: deploy-downloader -f /etc/deployagent.conf -v yb9mGyjqQtuxsNvGiFhtOQ_b3fac1c -u s3://s3-us-west-2.amazonaws.com/ookla/artifact.tgz -e Speedtest-Intelligence, at 2 retry. Error:
INFO:deployd.download.downloader:Start to download the package.
Traceback (most recent call last):
  File "/usr/local/bin/deploy-downloader", line 9, in <module>
    load_entry_point('deploy-agent==1.2.5', 'console_scripts', 'deploy-downloader')()
  File "build/bdist.linux-x86_64/egg/deployd/download/downloader.py", line 113, in main
  File "build/bdist.linux-x86_64/egg/deployd/download/downloader.py", line 56, in download
  File "build/bdist.linux-x86_64/egg/deployd/download/download_helper_factory.py", line 33, in gen_downloader
  File "build/bdist.linux-x86_64/egg/deployd/common/config.py", line 187, in get_aws_access_key
  File "build/bdist.linux-x86_64/egg/deployd/common/config.py", line 131, in get_var
deployd.common.exceptions.DeployConfigException: aws_access_key_id cannot be found.
DEBUG:deployd.common.executor:start: 2016-04-20 21:58:20.442645, now: 2016-04-20 21:58:30.456953, process: 10
DEBUG:deployd.common.executor:start: 2016-04-20 21:58:20.442645, now: 2016-04-20 21:58:30.458551, process: 10
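
One way to narrow this down is to confirm which keys an INI parser can actually see in the file that deploy-downloader is given. The sketch below is only a diagnostic aid and does not reproduce the agent's own config loading: the traceback shows the agent running on Python 2.7, while this uses Python 3's configparser, and missing_keys is a hypothetical helper.

```python
import configparser


def missing_keys(config_text, required, section="default_config"):
    """Return the required option names that the parser cannot see."""
    # interpolation=None keeps '%' characters in secrets from being
    # treated as interpolation syntax.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    if not parser.has_section(section):
        return list(required)
    return [key for key in required if not parser.has_option(section, key)]
```

Running it against the contents of /etc/deployagent.conf with ["aws_access_key_id", "aws_secret_access_key"] returns an empty list when both keys are visible in the default_config section.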

default vagrant file maybe should export 8080?

I've been toying with this, a few hurdles I ran into:

  • Vagrantfile doesn't export 8080 (deploy-service) by default, so you need to add that before you can publish a build.
  • Docs don't mention that "v1/build" is a target on the deploy-service and not the deploy-board.
  • That commit-date is an integer, and not a string (and can be generated with git show -s --format="%ct000")

Other than that, looking cool so far.
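
For the commit-date point, appending "000" to the %ct output is the same as multiplying the epoch seconds by 1000, and the value has to be serialized as a JSON number rather than a string. A small sketch follows; the commitDate field name is an assumption about the publish payload, and the timestamp is an arbitrary example.

```python
import json


def commit_date_millis(git_ct_output):
    """Convert `git show -s --format=%ct` output (epoch seconds) to integer ms."""
    return int(git_ct_output.strip()) * 1000


# Hypothetical payload fragment; only the integer type is the point here.
payload = {"commitDate": commit_date_millis("1483541672\n")}
body = json.dumps(payload)
```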

How to use Teletraan on AWS

I have a Spring Boot application running. Currently I am deploying JARs. Is there a how-to guide for deploying applications on AWS Elastic Beanstalk?

Review apps

Heroku can deploy review apps for projects on a per-PR basis. It's incredibly useful when reviewing code, as you can inspect visual changes via a deployment and run integration tests on it.

Is this possible with Teletraan?
