shinesolutions / aem-stack-manager-messenger

Send messages to AEM Stack Manager via SNS Topic

License: Apache License 2.0

Makefile 26.54% Shell 40.73% Python 32.72%
aem aem-opencloud stack-management aws environment-manager

aem-stack-manager-messenger's People

Contributors

cliffano, hoomaan-kh, mbloch1986, melbit-nishantsharma, michaeldiender-shinesolutions, ovlords, phillipi-shinesolutions, shineworks, siebes

aem-stack-manager-messenger's Issues

Add validation for variables

Is your feature request related to a problem? Please describe.
With so many variables in a project, it is easy to run into pipeline failures caused by configuration errors, especially when Opencloud is upgraded and variables are modified.

Describe the solution you'd like
Before running pipelines, all required variables should be checked to make sure they meet the criteria.
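
A minimal sketch of such a pre-flight check, assuming the configuration has already been loaded into a dictionary. The rule table, the variable names (stack_prefix, poll_timeout_seconds) and the validate_config helper are hypothetical and only illustrate the idea; they are not taken from the project.

```python
import re

# Hypothetical rules: variable name -> (required, validator function).
RULES = {
    "stack_prefix": (
        True,
        lambda v: isinstance(v, str) and re.fullmatch(r"[A-Za-z0-9-]+", v) is not None,
    ),
    "poll_timeout_seconds": (
        False,
        lambda v: isinstance(v, int) and not isinstance(v, bool) and v > 0,
    ),
}


def validate_config(config, rules=RULES):
    """Return a list of problems; an empty list means the config passed."""
    problems = []
    for name, (required, is_valid) in rules.items():
        if name not in config:
            if required:
                problems.append(f"missing required variable: {name}")
            continue
        if not is_valid(config[name]):
            problems.append(f"invalid value for {name}: {config[name]!r}")
    return problems
```

A pipeline could call validate_config first and abort with the returned messages before any message is sent.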

SNS topic ARNs should be described and not preconfigured

Currently SNS topic ARNs are specified in the config file.
This should be improved by describing the Stack Manager stack's resources and retrieving the SNS topic ARNs from them instead.

The problem with preconfiguring the ARNs is that users can't simply create Stack Manager stacks and use them, as they have to look up the ARNs and set them in a config file.
In strict environments where configuration files have to be signed off, they won't be able to modify configuration on demand.
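
As a rough sketch of the lookup, the function below picks the SNS topic ARNs out of a response shaped like CloudFormation's DescribeStackResources output (for example from boto3's describe_stack_resources). It assumes the physical resource ID of an AWS::SNS::Topic resource is the topic ARN; the function name and the sample identifiers in the comment are hypothetical.

```python
def sns_topic_arns(stack_resources_response):
    """Map logical resource IDs to SNS topic ARNs.

    Expects a dict shaped like the DescribeStackResources response,
    i.e. {"StackResources": [{"LogicalResourceId": ..., ...}, ...]}.
    """
    return {
        resource["LogicalResourceId"]: resource["PhysicalResourceId"]
        for resource in stack_resources_response.get("StackResources", [])
        if resource.get("ResourceType") == "AWS::SNS::Topic"
    }


# Possible usage with boto3 (not executed here; the stack name is a placeholder):
#   import boto3
#   cfn = boto3.client("cloudformation")
#   response = cfn.describe_stack_resources(StackName="example-stack-manager-stack")
#   topics = sns_topic_arns(response)
```

Keeping the extraction separate from the AWS call makes it testable without credentials.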

Change log dumping default and make it configurable

Currently Stack Manager Messenger dumps the log file only when there is a failure.
However, many users prefer to always have the log file, which seems a reasonable default: users want to see the log output to 1) identify what caused a failure, and 2) verify that a successful run did what was expected.

We need to do the following:

  1. Change default behaviour to always dump the log
  2. Introduce configuration show_log_on_failure_only, and set it to false by default

This configuration is handy for users who want to dump the log file only on failure.
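
The intended behaviour can be sketched as a small decision function. The status value "Failed" and the function name should_dump_log are assumptions made for illustration; only the show_log_on_failure_only setting comes from the issue.

```python
def should_dump_log(status, show_log_on_failure_only=False):
    """Decide whether the log file should be dumped for a finished action.

    With the default (False) the log is always dumped; when the setting is
    True the log is dumped only if the action failed.
    """
    if show_log_on_failure_only:
        return status == "Failed"
    return True
```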

Configurable stack manager event timeout setting

The current default timeout setting on Stack Manager Messenger covers most use cases. However, it is not enough when a user needs to deploy a large package (package installation might take 20-30 minutes, for example).

In order to support this edge case, we need to make this timeout setting configurable.

NoAuthHandlerFound when sending SNS message on CodeBuild

AEM Stack Manager Messenger executes the tests by sending an SNS message, which works fine on most machines, except when it is run on CodeBuild.
It's worth investigating if there's a replacement sns module that uses boto3 under the hood.

TASK [debug] *******************************************************************
ok: [127.0.0.1] => 
  msg: 'Send message: {''task'': ''test-readiness-consolidated'', ''externalId'': u''8cd6cbf0-f2aa-53d1-b848-fd687cb63fc4'', ''details'': {''comment'': ''test the AEM Consolidated stack to see if it is ready to be used'', ''component'': [''author-publish-dispatcher'']}, ''stack_prefix'': u''ci-rhel7-aem62-consolidated''}, with subject: ci-rhel7-aem62-consolidated - test-readiness-consolidated, to topic: arn:aws:sns:ap-southeast-2:918473058104:ci-rhel7-aem62-stack-manager-AemStackManager, in region: ap-southeast-2'

TASK [ci-rhel7-aem62-consolidated - test-readiness-consolidated: Send message to SNS Topic] ***
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: NoAuthHandlerFound: No handler was ready to authenticate. 1 handlers were checked. ['HmacAuthV4Handler'] Check your credentials
fatal: [127.0.0.1]: FAILED! => changed=false 
  msg: No handler was ready to authenticate. 1 handlers were checked. ['HmacAuthV4Handler'] Check your credentials
	to retry, use: --limit @/codebuild/output/src357326021/src/stage/aem-stack-manager-messenger-1.5.2/ansible/playbooks/send-message.retry

Display action/event log message

Another step to be added to the playbook is one that downloads the log from S3 to the workspace and then optionally displays the log output (e.g. if the result is a failure, display the log output; if it's a success, it is up to the user to check the log file in the filesystem).

This will help the user with troubleshooting the result of the action.

Remove 'sns.py' as it is no longer needed

Is your feature request related to a problem? Please describe.
No, it's a cleanup of unnecessary code.

Describe the solution you'd like
The file "provisioners/ansible/library/sns.py" is no longer needed, as the pull request github.com/ansible/ansible/pull/45634 has been merged and released. See the comment in that file:

# THIS FILE SHOULD BE REMOVED WHEN github.com/ansible/ansible/pull/45634 IS MERGED AND RELEASED

This needs an updated version of Ansible that includes the change.

Describe alternatives you've considered
None

Additional context
See @mbloch1986 who can provide more context.

Condition for status querying needs to change

At the moment the DB query for the command status only checks whether the command is not in the Pending state. Since the offline snapshot also uses different statuses, there should instead be a check for whether the command is in the Success or Failed state.

Query for offline-snapshot command execution status

The steps for creating offline snapshots require a different way to query the status of the command execution, since the offline snapshot lambda function executes more than one SSM send_command command.

Introduce SSM commands configuration

Most SSM command parameters are currently configured as defaults within aem-aws-stack-builder.
The problem with this is that execution timeout and service checks rely on the default, which is not going to work for every user.

Those timeouts need to be exposed as aem-stack-manager-messenger user configuration properties.

Deploy Artifacts for Consolidated

We need to add an additional deploy artifacts command for the consolidated instance, since the current implementation of deploy-artifacts only works with the full-set and not with the consolidated instance.

How to enable/disable crxde on dispatcher instances

Users are going to access the author and publish instances via author-dispatcher/publish-dispatcher. Therefore the command execution of enable/disable crxde needs an additional step to update the httpd configuration dispatcher.farmers.any.

The question is how we want to enable the access.

One solution is to change enable-crxde.pp in aem-resources so that it modifies the file, e.g.

if $dispatcher_file_exist == 'true' {
  if $aem_id == 'author' {
    exec { 'author-dispatcher-allow-crxde':
      command => "sed -i.bak 's/system|crx|admin/system|admin/g' ${$dispatcher_file}",
      path    => ["/bin"],
    }
  }
}

If we want to do it that way, we need to apply the shell scripts enable-crxde and disable-crxde on the dispatcher instances.

Or do we want to modify it via deploy artifacts? The problem is that when users are using their own configuration, it is going to be overwritten.

Consistency with original design of Stack Manager parameters

Currently Stack Manager expects stack_prefix to hold the stack prefix of the target AEM environment, whereas the stack prefix of the Stack Manager environment should be configured in the YAML file.
This is inconsistent with the original design, and would break backward compatibility for existing users.

Stack Manager parameters should be reverted back to the original design where:

  • stack_prefix is the stack prefix value of the Stack Manager
  • target_aem_stack_prefix is the stack prefix value of the target AEM environment where the event would be executed against

Message action status checking

Currently the send-message playbook only sends the message in a fire-and-forget model.
This needs to be improved with a follow-up step that polls for the status of the message, i.e. keeps polling while the status is pending, and stops when it is either a success or a failure, or when it eventually times out.
While polling, it would be good to display the status as a feedback to the user.
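
A minimal sketch of such a polling loop with a configurable timeout. The status values ("Pending", "Success", "Failed"), the default timeout and interval, and the poll_status helper are assumptions for illustration; how the status is actually fetched is left to the caller through the hypothetical get_status callable.

```python
import time

FINAL_STATES = {"Success", "Failed"}


def poll_status(get_status, timeout_seconds=900, interval_seconds=10,
                sleep=time.sleep, clock=time.monotonic, report=print):
    """Poll get_status() until it returns a final state or the timeout expires.

    Each observed status is passed to report() as feedback to the user.
    Raises TimeoutError if no final state is seen in time.
    """
    deadline = clock() + timeout_seconds
    while True:
        status = get_status()
        report(f"status: {status}")
        if status in FINAL_STATES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"no final status after {timeout_seconds} seconds")
        sleep(interval_seconds)
```

Checking for explicit final states, rather than "anything other than pending", keeps the loop running if an action reports intermediate statuses. The sleep and clock parameters exist only so the loop can be exercised without real waiting.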

Operational Task to update the SSL certificate via Granite

Is your feature request related to a problem? Please describe.
We should consider adding a new operational task to update the SSL certificate on AEM when SSL was enabled via Granite.

Describe the solution you'd like
Run a stack manager messenger command to update the SSL Certificate of AEM when it was enabled via Granite

Describe alternatives you've considered
Doing it manually

Ansible cloudformation_facts no longer returning any result

Describe the bug
Ansible cloudformation_facts no longer returns any result, causing an error in the subsequent Ansible steps that require the result.

To Reproduce
Run any stack manager messenger action.

Expected behavior
For the action to be completed successfully. However, it ended up in error:

TASK [cbus-aoc-fs-dev1 - test-readiness-full-set: Retrieve Main Stack CloudFormation resources facts] ***
[DEPRECATION WARNING]: The 'cloudformation_facts' module has been renamed to 
'cloudformation_info', and the renamed one no longer returns ansible_facts. 
This feature will be removed from amazon.aws in a release after 2021-12-01. 
Deprecation warnings can be disabled by setting deprecation_warnings=False in 
ansible.cfg.
ok: [127.0.0.1] => changed=false 
  ansible_facts:
    cloudformation: {}

TASK [set_fact] ****************************************************************
fatal: [127.0.0.1]: FAILED! => 
  msg: |-
    The task includes an option with an undefined variable. The error was: 'dict object' has no attribute 'aoc-stack-mgr-1-aem-stack-manager-main-stack'
  
    The error appears to be in '/tmp/shinesolutions/aem-opencloud-manager/aem-stack-manager-messenger/provisioners/ansible/playbooks/send-message.yaml': line 35, column 7, but may
    be elsewhere in the file depending on the exact syntax problem.
  
    The offending line appears to be:
  
  
        - set_fact:
          ^ here

PLAY RECAP *********************************************************************
127.0.0.1                  : ok=3    changed=0    unreachable=0    failed=1    skipped=1    rescued=0    ignored=0   

make: *** [check-readiness-full-set] Error 2

Screenshots
N/A

Environment (please complete the following information if relevant):
N/A

Additional context
N/A

Combine topic and message config args

Currently the make commands expect topic_config_file and message_config_file args.
This is inconsistent with packer-aem and aem-aws-stack-builder, and too long as well.

Replace those args with config_path arg that passes the directory where the config files exist instead.

Add check if stack exist

The lambda function doesn't inform the messenger when the stack or the component doesn't exist, so the messenger runs into a timeout. We need to add a check for that, maybe similar to the ones already implemented in the stack builder for stack creation/deletion.

When the stack exists, run the action; otherwise fail with a message.
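
One way to sketch the check, assuming the caller has already fetched stack summaries shaped like the StackSummaries entries of CloudFormation's ListStacks response. The stack_exists and require_stack helpers are hypothetical, and this is not the stack builder's implementation.

```python
def stack_exists(stack_summaries, stack_name):
    """Return True if a stack with this name is listed and not deleted.

    Deleted stacks can still appear in ListStacks results with the status
    DELETE_COMPLETE, so those entries are ignored.
    """
    return any(
        summary["StackName"] == stack_name
        and summary["StackStatus"] != "DELETE_COMPLETE"
        for summary in stack_summaries
    )


def require_stack(stack_summaries, stack_name):
    """Fail early with a clear message instead of waiting for a timeout."""
    if not stack_exists(stack_summaries, stack_name):
        raise RuntimeError(f"stack {stack_name} does not exist")
```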

Enable/disable debug mode

To reduce the output of the Stack Manager Messenger, we can suppress the informational output of the tasks by introducing the task option

no_log: True

The value of no_log can be made configurable from the YAML configuration. This gives us the ability to introduce a debug option in the YAML configuration.

Improving offline-snapshot and offline-compaction-snapshot playbooks

The playbooks for offline-snapshot and offline-compaction-snapshot need to be improved. At the moment, when one of the jobs fails (e.g. stop publish), Ansible still checks whether the following jobs ran successfully. We need to skip those scans/queries and go directly to downloading and showing the error log file.

Distinguish component unreadiness from check readiness failure

For checking environment readiness, we have to set a delay for a period of time (currently 1 minute) before executing readiness check. This is due to the fact that the component (currently orchestrator) that performs the readiness check might not be ready yet.

This 1 minute delay needs to be eliminated. It would be good to identify the component unreadiness (i.e. the orchestrator not being in service) and translate that to a pending state as the event status.

This way the check readiness event naturally waits while the component readiness is pending, and then checks the environment readiness once the component is ready.
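
The proposed translation could look roughly like the function below. The status strings and the two boolean inputs are assumptions made for illustration; how the orchestrator's in-service state is detected is not covered here.

```python
def readiness_event_status(orchestrator_in_service, readiness_check_passed):
    """Translate component state and check result into an event status.

    While the orchestrator is not in service the readiness check cannot run,
    so the event stays pending instead of being reported as a failure.
    """
    if not orchestrator_in_service:
        return "Pending"
    return "Success" if readiness_check_passed else "Failed"
```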

Update readiness check template components

In combination with ticket shinesolutions/aem-aws-stack-provisioner#178 we can update the template

test-readiness-full-set.json
https://github.com/shinesolutions/aem-stack-manager-messenger/blob/master/templates/sns/test-readiness-full-set.json

With the improvements to the readiness check, we only need to send the SSM document to the orchestrator component rather than to all components. This also allows us to check recovered components.

Furthermore we can remove the following make target

check-readiness-full-set-with-disabled-chaosmonkey

and the following SSM template

test-readiness-full-set-with-disabled-chaosmonkey.json

https://github.com/shinesolutions/aem-stack-manager-messenger/blob/master/templates/sns/test-readiness-full-set-with-disabled-chaosmonkey.json

Add unschedule/schedule tasks for all scheduled jobs

In order to avoid running integration tests while scheduled tasks are running, e.g. testing the snapshot task while a scheduled snapshot task is running, we need to introduce more unschedule/schedule tasks for all those scheduled jobs.

The idea is that the tests should unschedule all of those jobs before running, and then reschedule the jobs afterwards.
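
The unschedule-then-reschedule pattern can be sketched as a context manager, so the jobs are rescheduled even when a test fails. The jobs_unscheduled helper and its unschedule/schedule callables are hypothetical stand-ins for whatever sends the actual unschedule and schedule messages.

```python
from contextlib import contextmanager


@contextmanager
def jobs_unscheduled(job_names, unschedule, schedule):
    """Unschedule the given jobs for the duration of the with-block.

    Only jobs that were successfully unscheduled are rescheduled, in
    reverse order, and rescheduling happens even if the block raises.
    """
    unscheduled = []
    try:
        for name in job_names:
            unschedule(name)
            unscheduled.append(name)
        yield
    finally:
        for name in reversed(unscheduled):
            schedule(name)
```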

Rename test-readiness to check-readiness

To avoid confusion between testing and checking the readiness state of the AEM architecture, all leftover references to test-readiness must be replaced with check-readiness.

Note that the Makefile targets already have check- prefix, so userland doesn't have to change anything.
