shinesolutions / aem-stack-manager-messenger



Send messages to AEM Stack Manager via SNS Topic

License: Apache License 2.0

Makefile 26.54% Shell 40.73% Python 32.72%

aem aem-opencloud stack-management aws environment-manager

aem-stack-manager-messenger's People

Contributors


mbloch1986 melbit-nishantsharma nerdy-dav saurabhthareja90 shwinnbin pegr69 neko-design ovlords hoomaan-kh

aem-stack-manager-messenger's Issues

Add validation for variables

Is your feature request related to a problem? Please describe.
With many variables in a project, it is easy to hit pipeline failures caused by configuration errors, especially when AEM OpenCloud is upgraded and variables are modified.

Describe the solution you'd like
Before running pipelines, all required variables should be checked to make sure they meet their criteria (e.g. they are present and correctly formatted).
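A minimal pre-flight validator could look like the sketch below. The variable names (`aws.region`, `stack_prefix`, `timeout_seconds`) and their rules are hypothetical placeholders, not the project's actual configuration keys; the point is to collect every problem up front instead of failing midway through a pipeline.

```python
import re

# Hypothetical rules: each maps a dotted variable name to a predicate for valid values.
RULES = {
    "aws.region": lambda v: isinstance(v, str)
    and re.fullmatch(r"[a-z]{2}(-[a-z]+)+-\d", v) is not None,
    "stack_prefix": lambda v: isinstance(v, str) and len(v) > 0,
    "timeout_seconds": lambda v: isinstance(v, int) and not isinstance(v, bool) and v > 0,
}


def lookup(config, dotted_key):
    """Resolve a dotted key such as 'aws.region' in a nested dict; None if absent."""
    node = config
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def validate(config, rules=RULES):
    """Return a list of human-readable errors; an empty list means the config is valid."""
    errors = []
    for key, is_valid in rules.items():
        value = lookup(config, key)
        if value is None:
            errors.append(f"missing required variable: {key}")
        elif not is_valid(value):
            errors.append(f"invalid value for {key}: {value!r}")
    return errors
```

Reporting all errors at once, rather than stopping at the first, lets a user fix a broken configuration in a single pass.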

SNS topic ARNs should be described and not preconfigured

Currently SNS topic ARNs are specified in the config file.
This should be improved by describing the Stack Manager stack's resources and retrieving the SNS topic ARNs from them instead.

The problem with preconfiguring the ARNs is that users can't simply create Stack Manager stacks and use them; they first have to look up the ARNs and set them in a config file.
In strict environments where configuration files have to be signed off, users won't be able to modify the configuration on demand.
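One way to do this is sketched below, assuming the response shape of the CloudFormation DescribeStackResources API, where an `AWS::SNS::Topic` resource's `PhysicalResourceId` is its topic ARN. In practice the response would come from `boto3.client("cloudformation").describe_stack_resources(StackName=...)`; note that this call returns only the first 100 resources, so a larger stack would need the paginated ListStackResources API instead.

```python
def sns_topic_arns(describe_stack_resources_response):
    """Map each SNS topic's logical resource ID to its ARN.

    Only resources of type AWS::SNS::Topic are kept; for those, CloudFormation
    reports the topic ARN as the PhysicalResourceId.
    """
    return {
        res["LogicalResourceId"]: res["PhysicalResourceId"]
        for res in describe_stack_resources_response.get("StackResources", [])
        if res.get("ResourceType") == "AWS::SNS::Topic"
    }
```

Keeping the parsing separate from the AWS call makes it easy to test against a canned response.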

JSON template variables aren't getting resolved

The variables defined in the JSON template are no longer being resolved as of Ansible version 2.8.2.

Change log dumping default and make it configurable

Currently Stack Manager Messenger dumps the log file only when there is a failure.
However, many users prefer to always have the log file, which seems a reasonable default: users want to see the log output both to 1) identify what caused a failure, and 2) verify that a success did what was expected.

We need to do the following:

Change default behaviour to always dump the log
Introduce configuration show_log_on_failure_only, and set it to false by default

This configuration is handy for users who want to dump the log file only on failure.

Configurable stack manager event timeout setting

The current default timeout setting in Stack Manager Messenger covers most use cases. However, it's not enough when a user needs to deploy a large package (package installation might take, e.g., 20-30 minutes).

In order to support this edge case, we need to make this timeout setting configurable.
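Since the status polling is driven by an Ansible `until` loop (which uses the `retries` and `delay` keywords), a configurable timeout could be translated into those two values. The sketch below assumes a hypothetical `event_timeout_seconds` key and a hypothetical 600-second default; neither is taken from the project.

```python
import math

# Hypothetical default; the messenger's real default timeout may differ.
DEFAULT_EVENT_TIMEOUT_SECONDS = 600


def until_loop_settings(config, delay_seconds=10):
    """Derive Ansible 'until' loop retries/delay from a configurable timeout.

    config is the parsed user YAML; 'event_timeout_seconds' is a hypothetical key.
    retries is rounded up so that retries * delay_seconds is at least the timeout.
    """
    timeout = config.get("event_timeout_seconds", DEFAULT_EVENT_TIMEOUT_SECONDS)
    if timeout <= 0 or delay_seconds <= 0:
        raise ValueError("timeout and delay must be positive")
    return {"retries": math.ceil(timeout / delay_seconds), "delay": delay_seconds}
```

For example, a 30-minute package installation would need `event_timeout_seconds: 1800`, which yields 180 retries at a 10-second delay.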

NoAuthHandlerFound when sending SNS message on CodeBuild

AEM Stack Manager Messenger executes the tests by sending SNS messages. This works fine on most machines, except when it's run on CodeBuild.
It's worth investigating whether there's a replacement sns module that uses boto3 under the hood.

TASK [debug] *******************************************************************
ok: [127.0.0.1] => 
  msg: 'Send message: {''task'': ''test-readiness-consolidated'', ''externalId'': u''8cd6cbf0-f2aa-53d1-b848-fd687cb63fc4'', ''details'': {''comment'': ''test the AEM Consolidated stack to see if it is ready to be used'', ''component'': [''author-publish-dispatcher'']}, ''stack_prefix'': u''ci-rhel7-aem62-consolidated''}, with subject: ci-rhel7-aem62-consolidated - test-readiness-consolidated, to topic: arn:aws:sns:ap-southeast-2:918473058104:ci-rhel7-aem62-stack-manager-AemStackManager, in region: ap-southeast-2'

TASK [ci-rhel7-aem62-consolidated - test-readiness-consolidated: Send message to SNS Topic] ***
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: NoAuthHandlerFound: No handler was ready to authenticate. 1 handlers were checked. ['HmacAuthV4Handler'] Check your credentials
fatal: [127.0.0.1]: FAILED! => changed=false 
  msg: No handler was ready to authenticate. 1 handlers were checked. ['HmacAuthV4Handler'] Check your credentials
	to retry, use: --limit @/codebuild/output/src357326021/src/stage/aem-stack-manager-messenger-1.5.2/ansible/playbooks/send-message.retry
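`NoAuthHandlerFound` comes from the legacy boto (v2) library. A boto3-based replacement might look like the sketch below; boto3 resolves credentials through botocore's default provider chain, which includes the container credentials provider that CodeBuild is expected to rely on. The function name is hypothetical, and the client is passed in so the function can be exercised without AWS.

```python
import json


def publish_message(sns_client, topic_arn, subject, message):
    """Publish a Stack Manager message dict as JSON to an SNS topic.

    sns_client is expected to be boto3.client("sns", region_name=...); any object
    with a compatible publish(TopicArn=..., Subject=..., Message=...) method works.
    Returns the publish response (boto3 includes a MessageId in it).
    """
    return sns_client.publish(
        TopicArn=topic_arn,
        Subject=subject,
        Message=json.dumps(message),  # serialise the payload dict as JSON text
    )
```

Usage would be along the lines of `publish_message(boto3.client("sns", region_name="ap-southeast-2"), topic_arn, subject, payload)`, with the same topic, subject, and payload the debug task above prints.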

Don't fail if DynamoDB query/scan failed due to missing payload

If the DynamoDB query fails because it hits the AWS limit for the specified AWS account, the Stack Manager Messenger task execution fails.

To avoid this, we should improve the until loop so that it keeps looping as long as the registered variables are not defined.
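The retry-until-defined idea can be sketched in Python as follows. `run_query` stands in for a DynamoDB query call (a boto3 query response carries its results under `Items`); a throttled call or a response without a payload is treated as "not yet defined" and retried. The broad `except Exception` keeps the sketch self-contained; real code would narrow it to botocore's `ClientError` for errors such as `ProvisionedThroughputExceededException`.

```python
import time


def query_until_defined(run_query, retries=30, delay_seconds=10, sleep=time.sleep):
    """Call run_query until its response contains an 'Items' payload.

    A failed call (exception) or a response without 'Items' counts as
    "not yet defined" and is retried instead of failing the task.
    """
    for attempt in range(retries):
        try:
            response = run_query()
        except Exception:  # e.g. throttling; retry rather than abort
            response = None
        if response is not None and "Items" in response:
            return response["Items"]
        if attempt < retries - 1:
            sleep(delay_seconds)
    raise TimeoutError(f"no DynamoDB payload after {retries} attempts")
```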

Display action/event log message

Another step to be added to the playbook is one that downloads the log from S3 to the workspace and then optionally displays the log output (e.g. if the result is a failure, display the log output; if it's a success, it's up to the user to check the log file in the filesystem).

This will help the user with troubleshooting the result of the action.
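A sketch of the download-and-display step is shown below. It uses the boto3 S3 client's `download_file(Bucket, Key, Filename)` method; the bucket, key, and workspace layout are assumptions, and the `show_log_on_failure_only` flag borrows the name proposed in the log-dumping issue to decide when the log is printed.

```python
import os


def fetch_log(s3_client, bucket, key, workspace_dir, failed, show_log_on_failure_only=False):
    """Download the action log from S3 into the workspace and optionally print it.

    s3_client is expected to be boto3.client("s3"). The log is always saved
    locally; it is printed on failure, or always when show_log_on_failure_only
    is False.
    """
    os.makedirs(workspace_dir, exist_ok=True)
    local_path = os.path.join(workspace_dir, os.path.basename(key))
    s3_client.download_file(bucket, key, local_path)
    if failed or not show_log_on_failure_only:
        with open(local_path, encoding="utf-8") as f:
            print(f.read())
    return local_path
```

Saving the file even when it isn't printed keeps the log available for users who want to verify a successful run.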

Remove 'sns.py' as it is no longer needed

Is your feature request related to a problem? Please describe.
No, it's a cleanup of unnecessary code.

Describe the solution you'd like
The file "provisioners/ansible/library/sns.py" is no longer needed, as Pull Request github.com/ansible/ansible/pull/45634 has been merged and released. See aem-stack-manager-messenger/provisioners/ansible/library/sns.py, line 4 (commit e263137):

# THIS FILE SHOULD BE REMOVED WHEN github.com/ansible/ansible/pull/45634 IS MERGED AND RELEASED

Removing it requires an updated version of Ansible that includes this change.

Describe alternatives you've considered
None

Additional context
Contact @mbloch1986, who can provide more context.

Condition for status querying needs to change

At the moment, the DB query for the command status only checks whether the command is not in the Pending state. Since the offline snapshot also uses different statuses, the check should instead be whether a command is in the Success or Failed state.

Query for offline-snapshot command execution status

The steps for creating offline snapshots require a different way to query the status of the command execution, since the offline snapshot Lambda function executes more than one SSM send_command call.
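Both status issues can be sketched together: a command is finished only in Success or Failed (so intermediate states such as a hypothetical `InProgress` keep the loop waiting), and an action that sends several SSM commands is combined into one overall status. Failing as soon as any command fails is a design choice of this sketch, not something the issue prescribes.

```python
TERMINAL_STATUSES = {"Success", "Failed"}


def is_finished(status):
    """A command is done only when it has reached Success or Failed."""
    return status in TERMINAL_STATUSES


def overall_status(command_statuses):
    """Combine the statuses of all SSM commands sent for one action.

    Any Failed command fails the whole action straight away; the action
    succeeds only once every command has succeeded; otherwise it is Pending.
    """
    statuses = list(command_statuses)
    if "Failed" in statuses:
        return "Failed"
    if statuses and all(s == "Success" for s in statuses):
        return "Success"
    return "Pending"
```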

Syntax error in DynamoDB module

There is a small syntax error in the DynamoDB module that needs to be fixed.

Introduce SSM commands configuration

Most SSM command parameters are currently configured as defaults within aem-aws-stack-builder.
The problem with this is that execution timeouts and service checks rely on those defaults, which won't work for every user.

Those timeouts need to be exposed as aem-stack-manager-messenger user configuration properties.
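Exposing the timeouts could amount to overlaying user overrides on defaults per command, as in the sketch below. The setting names, default values, and the `ssm_commands` configuration key are all hypothetical; unknown keys are rejected so typos in user configuration surface early.

```python
# Hypothetical defaults (seconds); today the real values live in aem-aws-stack-builder.
SSM_COMMAND_DEFAULTS = {
    "execution_timeout": 3600,
    "service_check_timeout": 300,
}


def ssm_command_settings(command, user_config):
    """Return SSM settings for one command: defaults overlaid with user overrides.

    user_config is the parsed user YAML, assumed to hold per-command overrides
    under a hypothetical 'ssm_commands' key, e.g.
    {'ssm_commands': {'deploy-artifacts': {'execution_timeout': 7200}}}.
    """
    overrides = user_config.get("ssm_commands", {}).get(command, {})
    unknown = set(overrides) - set(SSM_COMMAND_DEFAULTS)
    if unknown:
        raise ValueError(f"unknown SSM settings for {command}: {sorted(unknown)}")
    return {**SSM_COMMAND_DEFAULTS, **overrides}
```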

Deploy Artifacts for Consolidated

We need to add an additional deploy-artifacts command for the consolidated architecture, since the current implementation of deploy-artifacts only works with the full-set architecture and not with the consolidated instance.

How to enable/disable crxde on dispatcher instances

Users are going to access the author and publish instances via author-dispatcher/publish-dispatcher. Therefore, enabling/disabling CRXDE needs an additional step to update the httpd configuration file dispatcher.farmers.any.

The question is how we want to enable the access.

One solution is to change enable-crxde.pp in aem-resources so that it modifies the file, e.g.:

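if $dispatcher_file_exist == 'true' {
  if $aem_id == 'author' {
    # Drop 'crx' from the 'system|crx|admin' pattern in the dispatcher file
    # (sed keeps a .bak backup of the original).
    exec { 'author-dispatcher-allow-crxde':
      command => "sed -i.bak 's/system|crx|admin/system|admin/g' ${dispatcher_file}",
      path    => ['/bin'],
    }
  }
}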

If we want it that way, we need to apply the enable-crxde and disable-crxde shell scripts on the dispatcher instances as well.

Or do we want to modify it via deploy-artifacts? The problem is that when users are using their own configuration, it's going to be overwritten.

Wrong config path in run-playbook.sh script

The path to the Ansible configuration is wrong in the run-playbook.sh script

https://github.com/shinesolutions/aem-stack-manager-messenger/blob/master/scripts/run-playbook.sh#L25

Consistency with original design of Stack Manager parameters

Currently Stack Manager expects stack_prefix to contain the stack prefix of the target AEM environment, whereas the stack prefix of the Stack Manager environment is configured in the YAML file.
This is inconsistent with the original design, and would break backward compatibility for existing users.

Stack Manager parameters should be reverted to the original design, where:

stack_prefix is the stack prefix value of the Stack Manager
target_aem_stack_prefix is the stack prefix value of the target AEM environment where the event would be executed against

Fix deploy-artifact message

deploy-artifact message is currently incomplete https://github.com/shinesolutions/aem-stack-manager-messenger/blob/master/files/deploy-artifact.txt

It's missing package group, name, and version, along with replicate, activate, and force values.

Message action status checking

Currently send-message playbook only sends the message in a fire-and-forget model.
This needs to be improved with a follow up step to poll for the status of the message, i.e. keep polling while status is pending, and stop when it's either a success or a failure, or otherwise it will eventually time out.
While polling, it would be good to display the status as a feedback to the user.
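The poll-until-done behaviour described above can be sketched as a loop that reports each status, stops on Success or Failed, and gives up after a deadline. `get_status` is a stand-in for whatever query returns the current status; the clock, sleep, and report hooks are injectable only so the loop can be tested without waiting.

```python
import time


def poll_status(get_status, timeout_seconds, interval_seconds=10,
                clock=time.monotonic, sleep=time.sleep, report=print):
    """Poll an action's status until it is Success or Failed, or the timeout passes.

    get_status is any zero-argument callable returning the current status string.
    Each observed status is passed to report as feedback for the user.
    """
    deadline = clock() + timeout_seconds
    while True:
        status = get_status()
        report(f"status: {status}")  # feedback to the user on every poll
        if status in ("Success", "Failed"):
            return status
        if clock() >= deadline:
            raise TimeoutError(f"action still {status} after {timeout_seconds}s")
        sleep(interval_seconds)
```

A caller can then fail the run when the returned status is Failed, which is what the "Ansible should fail if a command failed" issue asks for.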

Ansible should fail if a command failed

Similar to the offline-snapshot and offline-compaction-snapshot playbooks, the send_message playbook should fail if the command execution is in the Failed state.

Operational Task to update the SSL certificate via Granite

Is your feature request related to a problem? Please describe.
We should consider adding a new operational task to update the SSL certificate on AEM when SSL was enabled via Granite.

Describe the solution you'd like
Run a stack manager messenger command to update the SSL Certificate of AEM when it was enabled via Granite

Describe alternatives you've considered
Doing it manually

Ansible cloudformation_facts no longer returning any result

Describe the bug
Ansible cloudformation_facts no longer returns any result, causing an error in the subsequent Ansible steps that require the result.

To Reproduce
Run any stack manager messenger action.

Expected behavior
The action completes successfully. Instead, it ends up in an error:

TASK [cbus-aoc-fs-dev1 - test-readiness-full-set: Retrieve Main Stack CloudFormation resources facts] ***
[DEPRECATION WARNING]: The 'cloudformation_facts' module has been renamed to 
'cloudformation_info', and the renamed one no longer returns ansible_facts. 
This feature will be removed from amazon.aws in a release after 2021-12-01. 
Deprecation warnings can be disabled by setting deprecation_warnings=False in 
ansible.cfg.
ok: [127.0.0.1] => changed=false 
  ansible_facts:
    cloudformation: {}

TASK [set_fact] ****************************************************************
fatal: [127.0.0.1]: FAILED! => 
  msg: |-
    The task includes an option with an undefined variable. The error was: 'dict object' has no attribute 'aoc-stack-mgr-1-aem-stack-manager-main-stack'
  
    The error appears to be in '/tmp/shinesolutions/aem-opencloud-manager/aem-stack-manager-messenger/provisioners/ansible/playbooks/send-message.yaml': line 35, column 7, but may
    be elsewhere in the file depending on the exact syntax problem.
  
    The offending line appears to be:
  
  
        - set_fact:
          ^ here

PLAY RECAP *********************************************************************
127.0.0.1                  : ok=3    changed=0    unreachable=0    failed=1    skipped=1    rescued=0    ignored=0   

make: *** [check-readiness-full-set] Error 2

Screenshots
N/A

Environment (please complete the following information if relevant):
N/A

Additional context
N/A

Combine topic and message config args

Currently the make commands expect topic_config_file and message_config_file args.
This is inconsistent with packer-aem and aem-aws-stack-builder, and too long as well.

Replace those args with config_path arg that passes the directory where the config files exist instead.

Add check if stack exist

Since the Lambda function doesn't inform the messenger when the stack or the component doesn't exist, the messenger keeps waiting until it times out. We need to add a check for that, perhaps similar to the ones already implemented in the stack builder during stack creation/deletion.

If the stack exists, run the action; otherwise, fail with a message.
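A sketch of the existence check, assuming a CloudFormation DescribeStacks response as input. The set of "usable" statuses is a judgment call (all four are real CloudFormation stack statuses). Note that `describe_stacks` for a stack name that doesn't exist raises a `ClientError` rather than returning an empty list, so a caller would translate that error into the same "does not exist" failure.

```python
# Stack statuses treated as "exists and usable"; all are real CloudFormation statuses.
USABLE_STACK_STATUSES = {
    "CREATE_COMPLETE",
    "UPDATE_COMPLETE",
    "UPDATE_ROLLBACK_COMPLETE",
    "IMPORT_COMPLETE",
}


def require_stack(describe_stacks_response, stack_name):
    """Return the stack status, or raise with a clear message if it can't be used."""
    stacks = describe_stacks_response.get("Stacks", [])
    if not stacks:
        raise RuntimeError(f"stack {stack_name} does not exist")
    status = stacks[0]["StackStatus"]
    if status not in USABLE_STACK_STATUSES:
        raise RuntimeError(f"stack {stack_name} is not usable (status: {status})")
    return status
```

Failing fast with this message replaces the current behaviour of waiting until the messenger times out.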

Enable/disable debug mode

To reduce the output of Stack Manager Messenger, we can suppress the information output of the tasks by introducing the task option

no_log: True

The value of no_log can be configurable from the yaml configuration. This gives us the ability to introduce a debug option in the yaml configuration.

Improving offline-snapshot and offline-compaction-snapshot playbooks

The playbooks for offline-snapshot and offline-compaction-snapshot need to be improved. At the moment, when one of the jobs fails (e.g. stop publish), Ansible still checks whether the following jobs ran successfully. Instead, we need to skip the remaining scans/queries and go directly to downloading and showing the error log file.
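The short-circuit behaviour could be sketched as below: job statuses are checked in order, and checking stops at the first failure so the caller can jump straight to fetching the error log. The job names in the usage and `get_status` are hypothetical stand-ins for the playbooks' DynamoDB scans/queries.

```python
def check_jobs_in_order(job_names, get_status):
    """Check each job's status in order, stopping at the first failure.

    Returns (checked, failed_job): checked lists the (name, status) pairs
    actually queried, and failed_job is the first failed job's name or None.
    Jobs after a failure are never queried.
    """
    checked = []
    for name in job_names:
        status = get_status(name)
        checked.append((name, status))
        if status == "Failed":
            return checked, name
    return checked, None
```

For example, with jobs `["stop-publish", "take-snapshot"]`, a failed `stop-publish` means `take-snapshot` is never queried.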

Distinguish component unreadiness from check readiness failure

For checking environment readiness, we have to set a delay for a period of time (currently 1 minute) before executing readiness check. This is due to the fact that the component (currently orchestrator) that performs the readiness check might not be ready yet.

This 1-minute delay needs to be eliminated. Ideally, we would identify component unreadiness (i.e. the orchestrator not being in service) and translate it into a Pending event status.

This way the check readiness event naturally waits while it's pending the component readiness in order to be able to check the environment readiness.
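The mapping could be as small as the sketch below. It assumes the orchestrator's readiness is read from an Auto Scaling lifecycle state, where `InService` is the real in-service value; how that state is obtained is left open, and the function name is hypothetical.

```python
def readiness_event_status(orchestrator_lifecycle_state, readiness_check_passed):
    """Map orchestrator state plus readiness result to an event status.

    If the orchestrator (the component that runs the check) is not in service
    yet, report Pending rather than Failed, so the event keeps waiting instead
    of relying on a fixed 1-minute delay. The check result is only used once
    the orchestrator is in service.
    """
    if orchestrator_lifecycle_state != "InService":
        return "Pending"
    return "Success" if readiness_check_passed else "Failed"
```

Combined with a status-polling loop, a Pending result simply causes another poll.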

Update readiness check template components

In combination with ticket shinesolutions/aem-aws-stack-provisioner#178 we can update the template

test-readiness-full-set.json
https://github.com/shinesolutions/aem-stack-manager-messenger/blob/master/templates/sns/test-readiness-full-set.json

With the improvements to the readiness check, we only need to send the SSM document to the orchestrator component rather than to all components. This further allows us to check recovered components.

Furthermore, we can remove the following make target:

check-readiness-full-set-with-disabled-chaosmonkey

and the SSM template:

test-readiness-full-set-with-disabled-chaosmonkey.json

https://github.com/shinesolutions/aem-stack-manager-messenger/blob/master/templates/sns/test-readiness-full-set-with-disabled-chaosmonkey.json
