sujiar37 / awx-ha-instancegroup

Build AWX clustering on Docker Standalone Installation

License: MIT License

Shell 100.00%
awx ansible ansible-tower ansible-roles devops docker-compose docker automation high-availability cluster

awx-ha-instancegroup's Introduction

AWX V9.2.0 - Instance Group - HA

IMPORTANT NOTE: This project is deprecated and does not apply to AWX versions >= 10.0.0. For newer versions, consider exploring https://github.com/fitbeard/awx-ha-cluster instead.

AWX is the upstream project of Ansible Tower. I have followed it from version 1.x through the current 9.x. The diagram below illustrates how clustering works in Ansible Tower 3.x. Much the same functionality can be achieved in AWX by modifying a few files and settings. I put together this playbook to automate the clustering process, drawing on discussions in the AWX Google group and on the official Ansible Tower installation playbook.

[Diagram: AWX HA - Instance Group]

Points To Remember

  1. The PostgreSQL database must be centralized, because all nodes act as Primary-Primary. This playbook does not install PostgreSQL; you can build your own, or use the docker-compose file below to run it as a container on a separate node.
$ mkdir /pgdocker/
$ cat docker-compose.yml 

version: '2'
services:
  postgres:
    image: postgres:10
    restart: unless-stopped
    volumes:
      - /pgdocker:/var/lib/postgresql/data:Z
    environment:
      POSTGRES_USER: awx
      POSTGRES_PASSWORD: awxpass
      POSTGRES_DB: awx
      PGDATA: /var/lib/postgresql/data/pgdata
    ports:
      - "5432:5432"
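Because every AWX node talks to this one database, it can be worth confirming that port 5432 is reachable from each node before running the playbook. The following is a minimal sketch, assuming bash (for its /dev/tcp redirection) and the coreutils `timeout` command; the function name `port_open` is hypothetical and not part of this project.

```shell
#!/usr/bin/env bash
# port_open HOST PORT: succeed if a TCP connection to HOST:PORT
# can be opened within 3 seconds, fail otherwise.
# Hypothetical helper for illustration; it only tests reachability,
# not PostgreSQL authentication.
port_open() {
  local host="$1" port="$2"
  # Host and port are passed as positional arguments ($0 and $1 of the
  # inner shell) so they are never interpolated into the command string.
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}
```

For example, `port_open Database_Node_IP 5432 && echo reachable`, where Database_Node_IP stands for the address of the node running the container above.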
  2. Here is the inventory I use to add any number of hosts to an existing cluster. To add a task node, put the new machine's IP address under [awx_instance_group_task] and run the playbook awx_ha.yml. To enable the web front end on a node, add it under [awx_instance_group_web] instead and run the playbook. You can move hosts in and out of these two inventory groups (awx_instance_group_web and awx_instance_group_task) at any time. How many web nodes and task nodes you run is up to you, since HA does not require the AWX web container on every node. All nodes must be able to reach each other by hostname.
$ cat inventory/hosts

[all]

## Inventory Group where you need HA in AWX with web front end 
[awx_instance_group_web]
Primary_Node_A
Primary_Node_B

## Inventory Group where you need HA in AWX without web front end
[awx_instance_group_task]
Primary_Node_C
Primary_Node_D
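Since the nodes must reach each other by hostname, a quick resolution check on each machine can catch problems before the playbook runs. This is a sketch under assumptions: it treats the inventory as a simple INI file like the one above, relies on `getent` being available, and uses hypothetical function names.

```shell
#!/usr/bin/env bash
# list_inventory_hosts FILE: print the unique host names found in an
# INI-style inventory. Blank lines, comments, [group] headers, and the
# contents of [group:vars] / [group:children] sections are skipped.
list_inventory_hosts() {
  awk '
    /^[ \t]*$/    { next }                                   # blank line
    /^[ \t]*[#;]/ { next }                                   # comment
    /^[ \t]*\[/   { skip = ($0 ~ /:(vars|children)\]/); next }  # section header
    !skip         { print $1 }                               # first field is the host
  ' "$1" | sort -u
}

# check_hosts_resolve FILE: report every inventory host that does not
# resolve on the current machine; returns non-zero if any is missing.
check_hosts_resolve() {
  local host rc=0
  while read -r host; do
    if ! getent hosts "$host" >/dev/null; then
      echo "unresolvable: $host" >&2
      rc=1
    fi
  done < <(list_inventory_hosts "$1")
  return "$rc"
}
```

Running `check_hosts_resolve inventory/hosts` on every node should print nothing when name resolution is set up correctly.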
  3. The following default global variables are used across this playbook.
### AWX Default Settings
awx_unique_secret_key: awxsecret
awx_admin_default_pass: password

### PostgreSQL DB details
pg_db_host: "Database_Node_IP"
pg_db_pass: "awxpass"
pg_db_port: "5432"
pg_db_user: "awx"
pg_db_name: "awx"


###  RabbitMQ default settings
rabbitmq_cookie: "cookiemonster"
rabbitmq_username: "awx"
rabbitmq_password: "password"
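The defaults above are placeholders, so you may prefer to override them. One possible approach, sketched here, is to generate random values once, keep them in a private vars file, and pass that file with Ansible's `-e @file` option. The function names and the file name awx_secrets.yml are hypothetical. The same file should be reused on every run and for every node: AWX encrypts stored credentials with its secret key, and all RabbitMQ nodes need the same cookie. pg_db_pass is left out on purpose because it has to match the password already set on the database.

```shell
#!/usr/bin/env bash
# random_secret [BYTES]: print BYTES random bytes (default 32) as
# lowercase hex, followed by a newline.
random_secret() {
  local bytes="${1:-32}"
  head -c "$bytes" /dev/urandom | od -An -tx1 | tr -d ' \n'
  echo
}

# write_secrets_file FILE: write freshly generated overrides for the
# playbook variables to FILE, readable by the current user only.
# Hypothetical helper; run it once and keep the result.
write_secrets_file() {
  local file="$1"
  (
    umask 077
    {
      printf 'awx_unique_secret_key: "%s"\n' "$(random_secret)"
      printf 'awx_admin_default_pass: "%s"\n' "$(random_secret 16)"
      printf 'rabbitmq_cookie: "%s"\n' "$(random_secret)"
      printf 'rabbitmq_password: "%s"\n' "$(random_secret 16)"
    } > "$file"
  )
}
```

After `write_secrets_file awx_secrets.yml`, the playbook could be run as `ansible-playbook -i inventory/hosts awx_ha.yml -e @awx_secrets.yml`.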

Let's Run it

$ ansible-playbook -i inventory/hosts awx_ha.yml --verbose

# Run one of the commands below instead if you do not want the playbook to enable and configure firewalld rules (this step is optional).
$ ansible-playbook -i inventory/hosts awx_ha.yml --skip-tags fw_rules --verbose

                          [ OR ]

$ ansible-playbook -i inventory/hosts awx_ha.yml -e "fw_rules=false" --verbose

Running the above commands in --check mode may fail on new machines, because a few tasks run commands to check whether the RabbitMQ cluster is active. The failure does not occur on a machine that is already part of the cluster.

If you would like to try all of this in a sandbox environment before deploying to your actual servers, please check the instructions for Vagrant here.

Known Issues

On initial deployment, the AWX containers try to access the DB and run migrations if required. While that happens, the affected task containers can be locked out of the DB and fail to join the cluster with the error below:

$ docker container logs -f build_image_task_1 

    raise RuntimeError("No instance found with the current cluster host id")
RuntimeError: No instance found with the current cluster host id
2019-04-04 21:12:52,184 INFO exited: dispatcher (exit status 1; not expected)

This can be fixed by either of the methods below.

  1. Deploy a single web node under [awx_instance_group_web] on the initial run, wait for the migration to complete, and let the AWX GUI come up. After that, you can add any number of hosts to either inventory group, [awx_instance_group_web] or [awx_instance_group_task]; those nodes are added to the HA cluster automatically and become visible in the AWX portal.

  2. Alternatively, deploy to all nodes at once, then go to each node and restart the affected containers once the DB migration completes. The commands below show how:

# ls 
docker-compose.yml  Dockerfile  Dockerfile.task  launch_awx.sh  launch_awx_task.sh  settings.py  system_uuid.txt

# pwd
/var/lib/awx/build_image

# docker-compose restart
Restarting build_image_task_1      ... done
Restarting build_image_memcached_1 ... done
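The second method depends on knowing when the DB migration has finished. One way to script it, as a sketch only: poll Django's `showmigrations` output inside the task container and restart once nothing is pending. This assumes `awx-manage` is on the PATH inside the container and that the compose directory is /var/lib/awx/build_image as shown above; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# has_pending_migrations: read `showmigrations` output on stdin and
# succeed if any migration is still unapplied (Django marks those "[ ]").
has_pending_migrations() {
  grep -q '^[[:space:]]*\[ \]'
}

# wait_and_restart CONTAINER COMPOSE_DIR: wait until the container can
# list migrations and none is pending, then restart the compose services.
wait_and_restart() {
  local container="$1" compose_dir="$2" out
  until out=$(docker exec "$container" awx-manage showmigrations 2>/dev/null) \
        && ! has_pending_migrations <<<"$out"; do
    echo "migrations not finished yet, retrying in 30s"
    sleep 30
  done
  (cd "$compose_dir" && docker-compose restart)
}
```

On an affected node this might be called as `wait_and_restart build_image_task_1 /var/lib/awx/build_image`. A failed `docker exec` is treated the same as pending migrations, so the loop keeps waiting rather than restarting too early.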

Last but not least, if you like this piece of work, please consider giving this repo a star.

awx-ha-instancegroup's People

Contributors

ioggstream, loceee, mark-it-kracht, sujiar37

awx-ha-instancegroup's Issues

Error when using SCM type - Manual

Describe the bug
Not able to use SCM type "Manual" in projects. It throws an error stating the directory is empty or ......
It looks like the directory is not being created in the containers and mapped to the Docker host.

To Reproduce
Default install and try to configure SCM type manual in a demo project

Expected behavior
Should be able to select SCM type manual and map the local playbook directory


A few enhancements as requested

Requests received via the AWX Google group:

  1. You are assuming Red Hat for the install; change that to a dictionary covering centos, aws, and redhat. I just removed the when condition in the playbook that loads the roles.
  2. I am testing on Vagrant boxes loaded on my local laptop, with no DNS. I had to add the name <--> IP mapping to /etc/hosts on each machine. It would be cool if the playbook could detect that and add to /etc/hosts, or use IPs when name resolution doesn't work. (Not required, just cool.)
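Until the playbook handles this itself, the manual /etc/hosts mapping described in the second request could be scripted along these lines. This is only a sketch: the function name `add_host_entry` is hypothetical, and the addresses in the usage note are documentation placeholders.

```shell
#!/usr/bin/env bash
# add_host_entry IP NAME [FILE]: append "IP NAME" to FILE (default
# /etc/hosts) unless NAME already appears as a host name in it.
# Safe to run repeatedly; existing entries are never modified.
add_host_entry() {
  local ip="$1" name="$2" file="${3:-/etc/hosts}"
  if awk -v n="$name" '
       /^[ \t]*#/ { next }                                   # skip comment lines
       { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }  # names follow the IP
       END { exit !found }' "$file"; then
    return 0
  fi
  printf '%s %s\n' "$ip" "$name" >> "$file"
}
```

For example, `add_host_entry 192.0.2.10 Primary_Node_A` run as root on each machine would add the mapping only where it is missing.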

Enabling SSL functionality

Hi,

I would like to have SSL enabled on the web instances.
This does not seem to be working.

Can you implement this as well?

Clean up database

Hi, I found that we are about to run out of disk space on the instance that hosts the master and the PostgreSQL database. The largest consumer is the database.
Checking table sizes, the biggest one is main_jobevents at about 77G.
The management jobs Cleanup Activity Stream and Cleanup Job Detail ran successfully, but they did not reduce the table size.
Version 9.0.1
Can you please help?

can't delete inventory

Hi

When I try to delete an inventory, it stays in pending.

2020-01-23 12:29:12,273 ERROR awx.main.dispatch failed to write inbound message
Traceback (most recent call last):
File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/db/backends/base/base.py", line 235, in _cursor
return self._prepare_cursor(self.create_cursor(name))
File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/db/backends/postgresql/base.py", line 223, in create_cursor
cursor = self.connection.cursor()
psycopg2.InterfaceError: connection already closed
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/awx/main/dispatch/pool.py", line 392, in write
self.cleanup()
File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/awx/main/dispatch/pool.py", line 377, in cleanup
reaper.reap(excluded_uuids=running_uuids)
File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/awx/main/dispatch/reaper.py", line 38, in reap
(changed, me) = Instance.objects.get_or_register()
File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/awx/main/managers.py", line 134, in get_or_register
return (False, self.me())
File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/awx/main/managers.py", line 114, in me
if node.exists():
File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/db/models/query.py", line 766, in exists
return self.query.has_results(using=self.db)
File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/db/models/sql/query.py", line 522, in has_results
return compiler.has_results()
File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/db/models/sql/compiler.py", line 1070, in has_results
return bool(self.execute_sql(SINGLE))
File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/db/models/sql/compiler.py", line 1098, in execute_sql
cursor = self.connection.cursor()
File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/db/backends/base/base.py", line 256, in cursor
return self._cursor()
File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/db/backends/base/base.py", line 235, in _cursor
return self._prepare_cursor(self.create_cursor(name))
File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/db/utils.py", line 89, in __exit__
raise dj_exc_value.with_traceback(traceback) from exc_value
File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/db/backends/base/base.py", line 235, in _cursor
return self._prepare_cursor(self.create_cursor(name))
File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/db/backends/postgresql/base.py", line 223, in create_cursor
cursor = self.connection.cursor()
django.db.utils.InterfaceError: connection already closed

I have tried the fix from ansible/awx#3945,

but I think it does not work because PostgreSQL is external.

(InteractiveConsole)

Inventory.objects.filter(pending_deletion=True).update(pending_deletion=False)
Traceback (most recent call last):
File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/db/backends/base/base.py", line 217, in ensure_connection
self.connect()
File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/db/backends/base/base.py", line 195, in connect
self.connection = self.get_new_connection(conn_params)
File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/db/backends/postgresql/base.py", line 178, in get_new_connection
connection = Database.connect(**conn_params)
File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/psycopg2/__init__.py", line 126, in connect
conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
psycopg2.OperationalError: could not connect to server: No such file or directory
Is the server running locally and accepting
connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "", line 1, in
File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/db/models/query.py", line 741, in update
rows = query.get_compiler(self.db).execute_sql(CURSOR)
File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/db/models/sql/compiler.py", line 1429, in execute_sql
cursor = super().execute_sql(result_type)
File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/db/models/sql/compiler.py", line 1098, in execute_sql
cursor = self.connection.cursor()
File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/db/backends/base/base.py", line 256, in cursor
return self._cursor()
File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/db/backends/base/base.py", line 233, in _cursor
self.ensure_connection()
File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/db/backends/base/base.py", line 217, in ensure_connection
self.connect()
File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/db/utils.py", line 89, in __exit__
raise dj_exc_value.with_traceback(traceback) from exc_value
File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/db/backends/base/base.py", line 217, in ensure_connection
self.connect()
File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/db/backends/base/base.py", line 195, in connect
self.connection = self.get_new_connection(conn_params)
File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/db/backends/postgresql/base.py", line 178, in get_new_connection
connection = Database.connect(**conn_params)
File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/psycopg2/__init__.py", line 126, in connect
conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
django.db.utils.OperationalError: could not connect to server: No such file or directory
Is the server running locally and accepting
connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?

Support for more than one web host

Hey!
I've been looking for a decent clustering solution for AWX for a while now.
This playbook works like a charm!

Currently, I noticed that the awx-web-ha container is only deployed on the first host in
the awx_instance_group_master inventory group.

Is there any chance of adding support for additional hosts to run the awx-web-ha container?

Thanks for your work!

Rabbitmq cluster as a docker container

At the moment, RabbitMQ is deployed as an RPM installation. Could it be deployed through Docker containers instead?

That way, we would have a fully containerized AWX application that is easy to deploy in K8s or OpenShift.

Regards,
Vibin

Second node not running jobs

This project is awesome. I had an AWX cluster with 2 nodes up, backed by RDS, in no time.

I am not seeing any jobs run on the second node though, they time out and fail. Nodes show up in the instance group and seem to be responding. Both web interfaces are up -- but jobs only run on the first node.

Am I missing something? More of an AWX question I guess. Thanks for your great work!

Few generic queries...

Hi Sujith,

Thank you for sharing this project. I have been looking for a promising HA setup and it looks like I have found one. I am new to AWX, so I have a few questions before getting into the HA setup. I currently run a single isolated AWX node that connects to an external PostgreSQL RDS instance with a multi-AZ setup.

  1. Does this HA setup have any issues connecting to an external AWS RDS?
  2. The AWX project releases new major versions very often. How easy is it to upgrade the HA nodes to a new major version without losing the existing configuration? What steps or effort are needed? I would appreciate it if you could elaborate on what we need to do to upgrade to the latest versions.
  3. Does this HA setup support an Application LB to balance the web interface traffic?
  4. In my current scenario, the web interface of the isolated AWX node is often a little sluggish. Does this HA setup improve the web GUI response to some extent?

Sorry if these questions look very generic and simple; an FAQ like this might help beginners like me. Thank you very much.

Regards,
Sreenadh

Jobs never start, stuck in Pending state forever

Hi Sujith,

Below is my Ansible hosts file. I have 3 systems (web1, Agent Node1, and Agent Node2) plus an external AWS RDS. The HA setup deployed without any errors, the RabbitMQ cluster is working fine, and all nodes are part of the cluster. I am not sure what I am missing. Could someone please help?

Issues:

  1. Jobs never start.
  2. Agent Node2 does not show up when trying to add it to an instance group.

My Ansible hosts file:

[all]

[awx_instance_group_web]
web1

[awx_instance_group_web:vars]
ansible_connection= ssh
ansible_ssh_user= root
ansible_ssh_pass= xxxxxxxx
ansible_become= yes

[awx_instance_group_task]
node1
node2

[awx_instance_group_task:vars]
ansible_connection= ssh
ansible_ssh_user= root
ansible_ssh_pass= xxxxxx
ansible_become= yes

To Reproduce
Use the 3-node cluster described above and run the playbook again.

Expected behavior

  1. Jobs should run.
  2. Both Agent Nodes 1 and 2 should show up and share the job load.

AWX UI login page loads very slowly (takes 3 minutes to load)- Poor/sluggish performance of UI...

Hi Sujith,

Happy New Year 2020...

I am posting this issue to get some help or guidance, and to find out whether it affects only me or others too.

Issue: Poor/sluggish performance of the UI. It takes more than 3 minutes to load the login page, and responses remain slow afterwards.

We use an external PostgreSQL DB (RDS as the backend) and have clustered RabbitMQ across two systems with AWX installed. The AWX systems have 2 vCPUs and 8 GB RAM each and sit behind an application load balancer for HA. There are no resource issues, yet the UI responds poorly on both nodes. Strangely, the login page takes 3+ minutes to load even without the ALB; later it gets better thanks to the browser cache. I need guidance to fix this issue.

We currently sync around 1500+ hosts from AWS/Azure.

ENVIRONMENT
AWX version: 9.0.1
AWX install method: docker on linux,
Ansible version: 2.8.5
Operating System: CentOS
Web Browser: Google Chrome

EXPECTED RESULTS
Login page/UI should work with better response.

ACTUAL RESULTS
Login page takes more than 3 minutes to load itself.

Issues with pulling projects down SCM after running AWX-HA and putting hosts in instance group

Describe the bug
After installing AWX on individual hosts (that share an external PostgreSQL DB), everything works fine. I then ran the AWX-HA plays (I turned off the containers because it would complete without errors otherwise), and the AWX-HA play completed fine. It adds awx_web and awx_task containers to Docker (as well as memcached, I believe). I manually restarted RabbitMQ (not sure if that is needed). It was kind of working the first time, but projects were slow to update (my templates are flagged to download the latest code on every run). After some weirdness with NGINX (I also tried to set up load balancing; I think I was doing too much), I decided to reinstall everything, deleting all containers and rerunning the plays for AWX (on both hosts) and for AWX-HA.

Ever since then, I get a weird API error 400 when attempting to update the SCM project. It fails, and despite finding somewhat similar cases online, I have not been able to get it to work since. I have tried modifying the roles/requirement yaml for projectinventory and also messed with RabbitMQ a bit.

Have you seen this issue before? It seems like a good feature that I do not want to pass on.

Luckily, I still had the API window open when I got frustrated and decided to blow it all up again.

" "detail": "null value in column "job_tags" violates not-null constraint\nDETAIL: Failing row contains (6422, _68__azure_devops_git_repo, git, https://bjruyk5yei62e243ic6ku72oh*@dev..., , t, f, null, 68, 0, check, , , null).\n""

To Reproduce
After installing AWX on both nodes and running AWX-HA, SCM projects refuse to update and throw an error 400.

Expected behavior
SCM projects should update fine, as they did before AWX-HA was enabled.

Additional context
Error: "detail": "null value in column "job_tags" violates not-null constraint\nDETAIL: Failing row contains (6422, _68__azure_devops_git_repo, git, https://bjruyk5yei62e243ic6ku72oh*@dev..., , t, f, null, 68, 0, check, , , null).\n"

Moving to Centos8 as base OS with python3

Hello,

First of all, thanks for the work you have put into this. Starting with version 9.x.x, the base OS for AWX changed to CentOS 8 with Python 3 as the standard interpreter. Are there any plans to move this project to the new OS?

cheers

Using a single server for the initial installation, I ran the command "ansible-playbook -i inventory/hosts awx_ha.yml" and an error occurred near "TASK [awx_ha : Build Docker Images in Primary Node]". Please help.

TASK [awx_ha : Build Docker Images in Primary Node] *****************************************************************************************************************************************************************************************
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: TypeError: build() got an unexpected keyword argument 'stream'
failed: [server IP] (item={u'version': u'9.1.1', u'name': u'awx_web_ha', u'file': u'Dockerfile'}) => {"changed": false, "item": {"file": "Dockerfile", "name": "awx_web_ha", "version": "9.1.1"}, "module_stderr": "Traceback (most recent call last):\n File "/tmp/ansible_HZ1Q2e/ansible_module_docker_image.py", line 602, in \n main()\n File "/tmp/ansible_HZ1Q2e/ansible_module_docker_image.py", line 597, in main\n ImageManager(client, results)\n File "/tmp/ansible_HZ1Q2e/ansible_module_docker_image.py", line 295, in init\n self.present()\n File "/tmp/ansible_HZ1Q2e/ansible_module_docker_image.py", line 323, in present\n self.results['image'] = self.build_image()\n File "/tmp/ansible_HZ1Q2e/ansible_module_docker_image.py", line 520, in build_image\n for line in self.client.build(**params):\nTypeError: build() got an unexpected keyword argument 'stream'\n", "module_stdout": "", "msg": "MODULE FAILURE", "rc": 1}
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: TypeError: build() got an unexpected keyword argument 'stream'
failed: [server IP] (item={u'version': u'9.1.1', u'name': u'awx_task_ha', u'file': u'Dockerfile.task'}) => {"changed": false, "item": {"file": "Dockerfile.task", "name": "awx_task_ha", "version": "9.1.1"}, "module_stderr": "Traceback (most recent call last):\n File "/tmp/ansible_CX43Mq/ansible_module_docker_image.py", line 602, in \n main()\n File "/tmp/ansible_CX43Mq/ansible_module_docker_image.py", line 597, in main\n ImageManager(client, results)\n File "/tmp/ansible_CX43Mq/ansible_module_docker_image.py", line 295, in init\n self.present()\n File "/tmp/ansible_CX43Mq/ansible_module_docker_image.py", line 323, in present\n self.results['image'] = self.build_image()\n File "/tmp/ansible_CX43Mq/ansible_module_docker_image.py", line 520, in build_image\n for line in self.client.build(**params):\nTypeError: build() got an unexpected keyword argument 'stream'\n", "module_stdout": "", "msg": "MODULE FAILURE", "rc": 1}

can't abort job

Hello,
We are running an AWX 9.0.1 cluster and just discovered that we cannot abort a running job.
When I click 'Cancel the job', I see a pop-up error alert.

Here is the error from the AWX web container:

2020-01-21 20:17:23,556 ERROR django.request Internal Server Error: /api/v2/jobs/21/cancel/
Traceback (most recent call last):
File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/core/handlers/exception.py", line 34, in inner
response = get_response(request)
File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/core/handlers/base.py", line 115, in _get_response
response = self.process_exception_by_middleware(e, request)
File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/core/handlers/base.py", line 113, in _get_response
response = wrapped_callback(request, *callback_args, **callback_kwargs)
File "/usr/lib64/python3.6/contextlib.py", line 52, in inner
return func(*args, **kwds)
File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/views/decorators/csrf.py", line 54, in wrapped_view
return view_func(*args, **kwargs)
File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/views/generic/base.py", line 71, in view
return self.dispatch(request, *args, **kwargs)
File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/awx/api/generics.py", line 297, in dispatch
return super(APIView, self).dispatch(request, *args, **kwargs)
File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/rest_framework/views.py", line 495, in dispatch
response = self.handle_exception(exc)
File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/rest_framework/views.py", line 455, in handle_exception
self.raise_uncaught_exception(exc)
File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/rest_framework/views.py", line 492, in dispatch
response = handler(request, *args, **kwargs)
File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/awx/api/views/__init__.py", line 3601, in post
obj.cancel()
File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/awx/main/models/unified_jobs.py", line 1354, in cancel
if self.status == 'running' and not self.actually_running:
File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/awx/main/models/unified_jobs.py", line 1323, in actually_running
).running(timeout=timeout)
File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/awx/main/dispatch/control.py", line 37, in running
return self.control_with_reply('running', *args, **kwargs)
File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/awx/main/dispatch/control.py", line 44, in control_with_reply
with Consumer(conn, reply_queue, callbacks=[self.process_message], no_ack=True):
File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/kombu/messaging.py", line 386, in __init__
self.revive(self.channel)
File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/kombu/messaging.py", line 408, in revive
self.declare()
File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/kombu/messaging.py", line 421, in declare
queue.declare()
File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/kombu/entity.py", line 608, in declare
self._create_queue(nowait=nowait, channel=channel)
File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/kombu/entity.py", line 617, in _create_queue
self.queue_declare(nowait=nowait, passive=False, channel=channel)
File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/kombu/entity.py", line 652, in queue_declare
nowait=nowait,
File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/awx/main/dispatch/kombu.py", line 23, in queue_declare
return super(_Channel, self).queue_declare(queue, *args, **kwargs)
File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/amqp/channel.py", line 1154, in queue_declare
spec.Queue.DeclareOk, returns_tuple=True,
File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/amqp/abstract_channel.py", line 80, in wait
self.connection.drain_events(timeout=timeout)
File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/amqp/connection.py", line 500, in drain_events
while not self.blocking_read(timeout):
File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/amqp/connection.py", line 506, in blocking_read
return self.on_inbound_frame(frame)
File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/amqp/method_framing.py", line 55, in on_frame
callback(channel, method_sig, buf, None)
File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/amqp/connection.py", line 510, in on_inbound_method
method_sig, payload, content,
File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/amqp/abstract_channel.py", line 126, in dispatch_method
listener(args)
File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/amqp/channel.py", line 282, in _on_close
reply_code, reply_text, (class_id, method_id), ChannelError,
amqp.exceptions.AccessRefused: Queue.declare: (403) ACCESS_REFUSED - queue name 'amq.rabbitmq.reply-to' contains reserved prefix 'amq.*'

Trouble When add new instances

I made a fresh deployment of AWX-HA-InstanceGroup with just one instance (task and web). When I try to add two new instances, the new instances sometimes become unavailable, and I see hosts marked as lost in the output of "docker logs -f build_image_task_1":

2020-06-24 17:34:03,715 DEBUG awx.main.dispatch publish awx.main.tasks.awx_periodic_scheduler(793666d6-b0ce-4550-b3a5-4d763aff4dcd, queue=awx_private_queue)
2020-06-24 17:34:03,727 DEBUG awx.main.dispatch task 793666d6-b0ce-4550-b3a5-4d763aff4dcd starting awx.main.tasks.awx_periodic_scheduler([])
2020-06-24 17:34:03,738 DEBUG awx.main.tasks Starting periodic scheduler
2020-06-24 17:34:03,741 DEBUG awx.main.tasks Last scheduler run was: 2020-06-24 17:29:12.077878+00:00
2020-06-24 17:34:13,738 DEBUG awx.main.dispatch publish awx.main.scheduler.tasks.run_task_manager(7da3cd09-2c56-4ad5-ae03-bed7c82b6450, queue=awx_private_queue)
2020-06-24 17:34:13,752 DEBUG awx.main.dispatch task 7da3cd09-2c56-4ad5-ae03-bed7c82b6450 starting awx.main.scheduler.tasks.run_task_manager(*[])
2020-06-24 17:34:13,754 DEBUG awx.main.scheduler Running Tower task manager.
2020-06-24 17:34:13,771 DEBUG awx.main.scheduler Starting Scheduler
2020-06-24 17:34:33,768 DEBUG awx.main.dispatch publish awx.main.tasks.cluster_node_heartbeat(f6baef1a-fad8-4595-8fe8-ac276bb358d5, queue=cctdcapllx0828)
2020-06-24 17:34:33,842 DEBUG awx.main.dispatch publish awx.main.tasks.awx_k8s_reaper(93dbb6d2-c462-41fe-898c-5d5ea8b93774, queue=cctdcapllx0828)
2020-06-24 17:34:33,854 DEBUG awx.main.dispatch task f6baef1a-fad8-4595-8fe8-ac276bb358d5 starting awx.main.tasks.cluster_node_heartbeat([])
2020-06-24 17:34:33,855 DEBUG awx.main.tasks Cluster node heartbeat task.
2020-06-24 17:34:33,860 DEBUG awx.main.dispatch task 93dbb6d2-c462-41fe-898c-5d5ea8b93774 starting awx.main.tasks.awx_k8s_reaper(*[])
2020-06-24 17:34:33,862 DEBUG awx.main.dispatch publish awx.main.tasks.awx_periodic_scheduler(cfc8326a-2958-4b00-b3ff-80bb0bab269f, queue=awx_private_queue)
2020-06-24 17:34:33,875 DEBUG awx.main.dispatch task cfc8326a-2958-4b00-b3ff-80bb0bab269f starting awx.main.tasks.awx_periodic_scheduler([])
2020-06-24 17:34:33,875 DEBUG awx.main.dispatch publish awx.main.scheduler.tasks.run_task_manager(fac73454-c75f-4c77-bd45-32d2fe431eed, queue=awx_private_queue)
2020-06-24 17:34:33,884 DEBUG awx.main.tasks Starting periodic scheduler
2020-06-24 17:34:33,887 DEBUG awx.main.tasks Last scheduler run was: 2020-06-24 17:29:45.704402+00:00
2020-06-24 17:34:33,888 ERROR awx.main.tasks Host cctdcapllx0830 last checked in at 2020-06-24 17:29:15.621609+00:00, marked as lost.
2020-06-24 17:34:33,894 ERROR awx.main.tasks Host cctdcapllx0831 last checked in at 2020-06-24 17:29:11.107072+00:00, marked as lost.

2020-06-24 17:34:38,985 DEBUG awx.main.dispatch task 18e6473b-216b-4e33-9e01-4cb2f4ca8a49 starting awx.main.scheduler.tasks.run_task_manager(*[])
2020-06-24 17:34:38,987 DEBUG awx.main.scheduler Running Tower task manager.
2020-06-24 17:34:39,005 DEBUG awx.main.scheduler Starting Scheduler
2020-06-24 17:34:53,907 DEBUG awx.main.dispatch publish awx.main.scheduler.tasks.run_task_manager(4d8a5cda-988f-4c24-8a08-1fb52a62fd39, queue=awx_private_queue)
2020-06-24 17:34:53,920 DEBUG awx.main.dispatch task 4d8a5cda-988f-4c24-8a08-1fb52a62fd39 starting awx.main.scheduler.tasks.run_task_manager(*[])
2020-06-24 17:34:53,921 DEBUG awx.main.scheduler Running Tower task manager.
2020-06-24 17:34:53,936 DEBUG awx.main.scheduler Starting Scheduler
RESULT 2
OKREADY

Has anyone seen this before?
