shinesolutions / aem-stack-manager-messenger
Send messages to AEM Stack Manager via SNS Topic
License: Apache License 2.0
Is your feature request related to a problem? Please describe.
With plenty of variables in a project, it is easy to run into pipeline failures caused by configuration errors, especially when AEM OpenCloud is upgraded and variables are modified.
Describe the solution you'd like
Before running pipelines, all required variables should be checked to make sure they meet the criteria.
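Below is a minimal Python sketch of such a pre-flight check. The variable names and rules in RULES are hypothetical examples and do not reflect the project's actual configuration schema. The idea is to collect every problem at the start rather than failing partway through a pipeline.

```python
import re
from typing import Any, Callable, Dict, List

# Hypothetical rules: variable name -> predicate the value must satisfy.
RULES: Dict[str, Callable[[Any], bool]] = {
    "stack_prefix": lambda v: isinstance(v, str) and re.fullmatch(r"[a-z0-9-]+", v) is not None,
    "aws_region": lambda v: isinstance(v, str) and re.fullmatch(r"[a-z]{2}-[a-z]+-\d", v) is not None,
    "timeout_seconds": lambda v: isinstance(v, int) and v > 0,
}


def validate_config(config: Dict[str, Any],
                    rules: Dict[str, Callable[[Any], bool]] = RULES) -> List[str]:
    """Return every problem found; an empty list means all required variables are valid."""
    problems: List[str] = []
    for name, check in rules.items():
        if name not in config:
            problems.append(f"missing required variable: {name}")
        elif not check(config[name]):
            problems.append(f"invalid value for {name}: {config[name]!r}")
    return problems
```

A pipeline would call validate_config first and stop with the whole list of problems if the list is not empty.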
Currently SNS topic ARNs are specified in the config file.
This should be improved by describing the Stack Manager stack's resources and retrieving the SNS topic ARNs from them instead.
The problem with preconfiguring the ARNs is that users can't simply create Stack Manager stacks and use them, because they first have to look up the ARNs and set them in a config file.
In strict environments where configuration files have to be signed off, they won't be able to modify configuration on demand.
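Here is a sketch of the lookup. It assumes boto3 is available and that the SNS topics are direct resources of the named Stack Manager stack. Topics that live inside nested stacks would need an additional lookup for each nested stack. The sketch uses the paginated list_stack_resources call. For AWS::SNS::Topic resources, CloudFormation reports the topic ARN as the physical resource ID.

```python
from typing import Dict, List


def sns_topic_arns(stack_resources: List[Dict[str, str]]) -> Dict[str, str]:
    """Map logical resource IDs to topic ARNs for every SNS topic in the list.

    For AWS::SNS::Topic, the PhysicalResourceId reported by CloudFormation is the ARN.
    """
    return {
        r["LogicalResourceId"]: r["PhysicalResourceId"]
        for r in stack_resources
        if r.get("ResourceType") == "AWS::SNS::Topic"
    }


def lookup_sns_topic_arns(stack_name: str, region: str) -> Dict[str, str]:
    """List the Stack Manager stack's resources via boto3 and extract the SNS topic ARNs."""
    import boto3  # imported lazily so sns_topic_arns() can be used without boto3

    client = boto3.client("cloudformation", region_name=region)
    paginator = client.get_paginator("list_stack_resources")
    resources: List[Dict[str, str]] = []
    for page in paginator.paginate(StackName=stack_name):
        resources.extend(page["StackResourceSummaries"])
    return sns_topic_arns(resources)
```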
The variables defined in the JSON template are no longer resolved since Ansible version 2.8.2.
Currently Stack Manager Messenger dumps the log file only when there is a failure.
However, many users prefer to always have the log file, which seems a reasonable default: users want to see the log output to 1) identify what caused a failure, and 2) verify that a successful run did what was expected.
We need to do the following:
- Add a configuration property show_log_on_failure_only, and set it to false by default. This configuration is handy for users who want to dump the log file only on failure.
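The intended decision logic can be sketched as follows. This assumes the new property is read from the user configuration as a boolean.

```python
def should_show_log(failed: bool, show_log_on_failure_only: bool = False) -> bool:
    """Decide whether to dump the log file after a Stack Manager action.

    With the default (False) the log is always shown; when the option is
    enabled, the log is shown only for failed actions.
    """
    return failed or not show_log_on_failure_only
```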
The current default timeout setting of Stack Manager Messenger covers most use cases. However, it's not enough when a user needs to deploy a large package (package installation might take, e.g., 20-30 minutes).
In order to support this edge case, we need to make this timeout setting configurable.
AEM Stack Manager Messenger executes the tests by sending an SNS message. This works fine on most machines, except when it's run on CodeBuild.
It's worth investigating whether there's a replacement sns module that uses boto3 under the hood.
TASK [debug] *******************************************************************
ok: [127.0.0.1] =>
msg: 'Send message: {''task'': ''test-readiness-consolidated'', ''externalId'': u''8cd6cbf0-f2aa-53d1-b848-fd687cb63fc4'', ''details'': {''comment'': ''test the AEM Consolidated stack to see if it is ready to be used'', ''component'': [''author-publish-dispatcher'']}, ''stack_prefix'': u''ci-rhel7-aem62-consolidated''}, with subject: ci-rhel7-aem62-consolidated - test-readiness-consolidated, to topic: arn:aws:sns:ap-southeast-2:918473058104:ci-rhel7-aem62-stack-manager-AemStackManager, in region: ap-southeast-2'
TASK [ci-rhel7-aem62-consolidated - test-readiness-consolidated: Send message to SNS Topic] ***
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: NoAuthHandlerFound: No handler was ready to authenticate. 1 handlers were checked. ['HmacAuthV4Handler'] Check your credentials
fatal: [127.0.0.1]: FAILED! => changed=false
msg: No handler was ready to authenticate. 1 handlers were checked. ['HmacAuthV4Handler'] Check your credentials
to retry, use: --limit @/codebuild/output/src357326021/src/stage/aem-stack-manager-messenger-1.5.2/ansible/playbooks/send-message.retry
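One possible direction is sketched below in Python, under several assumptions. NoAuthHandlerFound is raised by the legacy boto library. boto3, by contrast, resolves credentials through its default provider chain, and that chain includes the container credentials CodeBuild provides, so a boto3-based publish would likely avoid this error. The sketch also assumes the SNS message body is JSON containing the fields shown in the debug output above (task, externalId, details, stack_prefix). Generating externalId with uuid4 is another assumption.

```python
import json
import uuid
from typing import Any, Dict, Optional


def build_message(task: str, stack_prefix: str, details: Dict[str, Any],
                  external_id: Optional[str] = None) -> Dict[str, Any]:
    """Build a message body with the fields seen in the debug output."""
    return {
        "task": task,
        "stack_prefix": stack_prefix,
        "details": details,
        "externalId": external_id or str(uuid.uuid4()),
    }


def build_subject(message: Dict[str, Any]) -> str:
    """'<stack_prefix> - <task>', truncated because SNS subjects must be under 100 characters."""
    return f"{message['stack_prefix']} - {message['task']}"[:99]


def send_message(topic_arn: str, region: str, message: Dict[str, Any]) -> str:
    """Publish the message with boto3 and return the SNS MessageId."""
    import boto3  # lazy import: only needed when actually publishing

    client = boto3.client("sns", region_name=region)
    resp = client.publish(
        TopicArn=topic_arn,
        Subject=build_subject(message),
        Message=json.dumps(message),
    )
    return resp["MessageId"]
```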
If the DynamoDB query fails because it hits the AWS limit for the specified AWS account, the stack manager messenger task execution fails.
To avoid this, we should improve the until loop so that it keeps looping as long as the registered variables are not defined.
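The following Python sketch shows the retry semantics and mirrors an Ansible until loop with retries and a delay. A throttled query is handled the same way as an undefined result: it is retried with exponential backoff. The query callable stands in for the DynamoDB status query, and catching every exception is a simplification.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def query_until_defined(query: Callable[[], Optional[T]], retries: int = 10,
                        base_delay: float = 1.0, max_delay: float = 30.0,
                        sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry `query` until it returns a non-None result.

    Exceptions (e.g. throttling when the account's DynamoDB limits are hit)
    are treated like an undefined result instead of failing immediately.
    """
    last_error: Optional[Exception] = None
    for attempt in range(retries):
        try:
            result = query()
            if result is not None:
                return result
        except Exception as err:  # a real implementation would catch only throttling errors
            last_error = err
        if attempt < retries - 1:
            sleep(min(max_delay, base_delay * 2 ** attempt))  # exponential backoff, capped
    raise TimeoutError(f"query returned no result after {retries} attempts") from last_error
```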
Another step to be added to the playbook is one that downloads the log from S3 to the workspace and then optionally displays the log output (e.g. if the result is a failure, display the log output; if it's a success, it's up to the user to check the log file in the filesystem).
This will help the user with troubleshooting the result of the action.
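The download step can be sketched as shown here. The sketch assumes a boto3 S3 client is passed in; its download_file(Bucket, Key, Filename) method is the only call used. It also assumes the bucket and key of the log are known from the action's result. The workspace layout is an assumption as well.

```python
import os
from typing import Any


def fetch_log(s3_client: Any, bucket: str, key: str, workspace: str, show: bool) -> str:
    """Download the action's log from S3 into the workspace and print it if requested.

    `s3_client` is expected to be boto3.client("s3").
    Returns the local path so the user can inspect the file later.
    """
    os.makedirs(workspace, exist_ok=True)
    local_path = os.path.join(workspace, os.path.basename(key))
    s3_client.download_file(bucket, key, local_path)
    if show:
        with open(local_path, encoding="utf-8") as fh:
            print(fh.read())
    return local_path
```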
Is your feature request related to a problem? Please describe.
No, it's a clean-up of unnecessary code.
Describe the solution you'd like
The file "provisioners/ansible/library/sns.py" is no longer needed, as the Pull Request github.com/ansible/ansible/pull/45634
has been merged and released. See
This requires an updated version of Ansible that includes this change.
Describe alternatives you've considered
None
Additional context
See @mbloch1986 who can provide more context.
At the moment, the DB query for the command status only checks whether the command is not in state Pending. Since the offline snapshot also uses other statuses, the query should instead check whether a command is in state Success or Failed.
Creating offline snapshots requires a different way to query the status of the command execution, because the offline snapshot Lambda function executes more than one SSM send_command call.
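Here is a minimal sketch of the aggregation. It assumes the states of all SSM commands issued for one action are available as a list. "InProgress" in the test is a hypothetical intermediate state.

```python
from typing import Iterable

TERMINAL_STATES = {"Success", "Failed"}


def overall_status(command_states: Iterable[str]) -> str:
    """Aggregate the states of all SSM commands issued for one action.

    Checking `state != "Pending"` is not enough when other intermediate states
    exist, so every command must reach an explicit terminal state.
    Any Failed command makes the whole action Failed right away.
    """
    states = list(command_states)
    if "Failed" in states:
        return "Failed"
    if states and all(s in TERMINAL_STATES for s in states):
        return "Success"
    return "Pending"
```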
There is a small syntax error in the dynamodb module which needs to be fixed.
Most SSM command parameters are currently configured as defaults within aem-aws-stack-builder.
The problem with this is that execution timeout and service checks rely on the default, which is not going to work for every user.
Those timeouts need to be exposed as aem-stack-manager-messenger user configuration properties.
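This sketch shows how user configuration could override those defaults. The key names (ssm, defaults, tasks, execution_timeout, service_check_retries, service_check_delay) and the default values are all hypothetical. A per-task override would also handle long-running cases such as deploying a large package.

```python
from typing import Any, Dict

# Hypothetical defaults standing in for values currently hardcoded in aem-aws-stack-builder.
SSM_DEFAULTS: Dict[str, Any] = {
    "execution_timeout": 3600,
    "service_check_retries": 60,
    "service_check_delay": 5,
}


def ssm_parameters(task: str, user_config: Dict[str, Any]) -> Dict[str, Any]:
    """Merge user-supplied SSM parameters over the defaults.

    Precedence (lowest to highest): built-in defaults, global user
    overrides under ssm.defaults, per-task overrides under ssm.tasks.<task>.
    """
    ssm = user_config.get("ssm", {})
    params = dict(SSM_DEFAULTS)
    params.update(ssm.get("defaults", {}))
    params.update(ssm.get("tasks", {}).get(task, {}))
    return params
```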
We need to add an additional deploy-artifacts command for the consolidated architecture, because the current implementation of deploy-artifacts only works with the full-set architecture and not with the consolidated instance.
Users are going to access the author and publish instances via author-dispatcher/publish-dispatcher. Therefore, executing enable/disable CRXDE needs an additional step to update the httpd configuration dispatcher.farmers.any.
The question is how we want to enable the access.
One solution is to change enable-crxde.pp in aem-resources so that it modifies the file, e.g.:
if $dispatcher_file_exist == 'true' {
  if $aem_id == 'author' {
    # Drop crx from the blocked path pattern so CRXDE is reachable through the dispatcher
    exec { 'author-dispatcher-allow-crxde':
      command => "sed -i.bak 's/system|crx|admin/system|admin/g' ${dispatcher_file}",
      path    => ['/bin'],
    }
  }
}
If we want it that way, we need to apply the shell scripts enable-crxde and disable-crxde on the dispatcher instances.
Or do we want to modify it via deploy-artifacts? The problem with that approach is that users' own configuration is going to be overwritten.
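For comparison, here is the same toggle as a Python sketch, which could for example run as a step on the dispatcher instances. It mirrors the sed substitution in the Puppet snippet, writes a .bak copy first (like sed -i.bak), and is idempotent. The function name is hypothetical. Disabling assumes the string system|admin only appears in the file because of a previous enable.

```python
import shutil

CRXDE_BLOCKED = "system|crx|admin"
CRXDE_ALLOWED = "system|admin"


def set_crxde_access(dispatcher_file: str, enable: bool) -> bool:
    """Toggle CRXDE access in a dispatcher config file.

    Returns True if the file changed, so the caller knows whether httpd
    needs to be reloaded; returns False if it was already in the requested state.
    """
    with open(dispatcher_file, encoding="utf-8") as fh:
        content = fh.read()
    old, new = (CRXDE_BLOCKED, CRXDE_ALLOWED) if enable else (CRXDE_ALLOWED, CRXDE_BLOCKED)
    updated = content.replace(old, new)  # replaces every occurrence, like sed's /g flag
    if updated == content:
        return False
    shutil.copyfile(dispatcher_file, dispatcher_file + ".bak")  # backup, like sed -i.bak
    with open(dispatcher_file, "w", encoding="utf-8") as fh:
        fh.write(updated)
    return True
```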
The path to the Ansible configuration is wrong in the run-playbook.sh script.
Currently Stack Manager expects stack_prefix to contain the stack prefix of the target AEM environment, whereas the stack prefix of the Stack Manager environment should be configured in the YAML file.
This is inconsistent with the original design, and would break backward compatibility for existing users.
Stack Manager parameters should be reverted to the original design, where:
- stack_prefix is the stack prefix value of the Stack Manager
- target_aem_stack_prefix is the stack prefix value of the target AEM environment that the event is executed against
The deploy-artifact message is currently incomplete: https://github.com/shinesolutions/aem-stack-manager-messenger/blob/master/files/deploy-artifact.txt
It's missing the package group, name, and version, along with the replicate, activate, and force values.
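The following sketch builds the missing package-related fields of a deploy-artifact message. All key names are illustrative assumptions; the real message template may use different ones. Rejecting empty identifiers catches an incomplete message before it is sent.

```python
from typing import Any, Dict


def deploy_artifact_details(group: str, name: str, version: str, *, replicate: bool = False,
                            activate: bool = False, force: bool = False) -> Dict[str, Any]:
    """Build the package part of a deploy-artifact message (hypothetical key names)."""
    for label, value in (("group", group), ("name", name), ("version", version)):
        if not value:
            raise ValueError(f"package {label} must not be empty")
    return {
        "package_group": group,
        "package_name": name,
        "package_version": version,
        "replicate": replicate,
        "activate": activate,
        "force": force,
    }
```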
Currently send-message playbook only sends the message in a fire-and-forget model.
This needs to be improved with a follow up step to poll for the status of the message, i.e. keep polling while status is pending, and stop when it's either a success or a failure, or otherwise it will eventually time out.
While polling, it would be good to display the status as a feedback to the user.
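Here is a sketch of the polling step in Python. It assumes a callable that returns the current status of the message (Pending, Success or Failed); in practice, this callable would be the status query. The clock, sleep and report parameters are injected only to make the loop testable.

```python
import time
from typing import Callable

TERMINAL = {"Success", "Failed"}


def poll_status(get_status: Callable[[], str], timeout: float, interval: float = 10.0,
                clock: Callable[[], float] = time.monotonic,
                sleep: Callable[[float], None] = time.sleep,
                report: Callable[[str], None] = print) -> str:
    """Poll until the status is Success or Failed, or raise after `timeout` seconds.

    Every observed status is reported so the user gets feedback while waiting.
    """
    deadline = clock() + timeout
    while True:
        status = get_status()
        report(f"status: {status}")
        if status in TERMINAL:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"still {status!r} after {timeout} seconds")
        sleep(interval)
```

A caller can exit with a non-zero code when the returned status is Failed, so that the playbook fails instead of reporting success.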
As with the offline-snapshot and offline-compaction-snapshot playbooks, the send_message playbook should fail if the command execution ends in the state Failed.
Is your feature request related to a problem? Please describe.
We should consider adding a new operational task to update the SSL certificate on AEM when SSL was enabled via Granite.
Describe the solution you'd like
Run a stack manager messenger command to update the SSL Certificate of AEM when it was enabled via Granite
Describe alternatives you've considered
Doing it manually
Describe the bug
Ansible cloudformation_facts no longer returns any result, causing an error in the subsequent Ansible steps that require the result.
To Reproduce
Run any stack manager messenger action.
Expected behavior
The action should complete successfully. Instead, it fails with the following error:
TASK [cbus-aoc-fs-dev1 - test-readiness-full-set: Retrieve Main Stack CloudFormation resources facts] ***
[DEPRECATION WARNING]: The 'cloudformation_facts' module has been renamed to
'cloudformation_info', and the renamed one no longer returns ansible_facts.
This feature will be removed from amazon.aws in a release after 2021-12-01.
Deprecation warnings can be disabled by setting deprecation_warnings=False in
ansible.cfg.
ok: [127.0.0.1] => changed=false
ansible_facts:
cloudformation: {}
TASK [set_fact] ****************************************************************
fatal: [127.0.0.1]: FAILED! =>
msg: |-
The task includes an option with an undefined variable. The error was: 'dict object' has no attribute 'aoc-stack-mgr-1-aem-stack-manager-main-stack'
The error appears to be in '/tmp/shinesolutions/aem-opencloud-manager/aem-stack-manager-messenger/provisioners/ansible/playbooks/send-message.yaml': line 35, column 7, but may
be elsewhere in the file depending on the exact syntax problem.
The offending line appears to be:
- set_fact:
^ here
PLAY RECAP *********************************************************************
127.0.0.1 : ok=3 changed=0 unreachable=0 failed=1 skipped=1 rescued=0 ignored=0
make: *** [check-readiness-full-set] Error 2
Screenshots
N/A
Environment (please complete the following information if relevant):
N/A
Additional context
N/A
Currently the make commands expect topic_config_file and message_config_file args.
This is inconsistent with packer-aem and aem-aws-stack-builder, and too long as well.
Replace those args with a config_path arg that passes the directory where the config files exist instead.
Since the Lambda function doesn't inform the messenger when the stack or the component doesn't exist, the messenger runs into a timeout. We need to add a check for that, perhaps similar to the checks already implemented in the stack builder during stack creation/deletion.
If the stack exists, run the action; otherwise, fail with a message.
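Here is a sketch of the stack existence check using a boto3 CloudFormation client. When no stack has the given name, describe_stacks raises a ValidationError whose message contains "does not exist". The sketch matches on that message instead of importing botocore, which is a simplification. Checking whether the component exists is not covered.

```python
from typing import Any


def stack_exists(cfn_client: Any, stack_name: str) -> bool:
    """Return True if `stack_name` exists; `cfn_client` is boto3.client("cloudformation")."""
    try:
        resp = cfn_client.describe_stacks(StackName=stack_name)
    except Exception as err:  # botocore raises ClientError; matched on its message here
        if "does not exist" in str(err):
            return False
        raise  # other errors (e.g. throttling) are not an answer to the question
    # DELETE_COMPLETE only shows up when looking up by stack ID; treat it as missing.
    return any(s.get("StackStatus") != "DELETE_COMPLETE" for s in resp.get("Stacks", []))


def require_stack(cfn_client: Any, stack_name: str) -> None:
    """Fail fast with a clear message instead of waiting for the messenger timeout."""
    if not stack_exists(cfn_client, stack_name):
        raise SystemExit(f"Stack {stack_name} does not exist; aborting.")
```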
To reduce the output of the stack manager messenger, we can suppress the informational output of the tasks by introducing the option parameter
no_log: True
The value of no_log can be made configurable from the YAML configuration. This also lets us introduce a debug option in the YAML configuration.
The playbooks for offline-snapshot and offline-compaction-snapshots need to be improved. At the moment, when one of the jobs fails (e.g. stop publish), Ansible still checks whether the following jobs ran successfully. Instead, we need to skip the remaining scans/queries and go directly to downloading and showing the error log file.
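Here is a sketch of the short-circuit. It assumes the jobs are checked in execution order. The job names in the test are hypothetical.

```python
from typing import Callable, Iterable, Optional


def first_failed_job(jobs: Iterable[str], get_status: Callable[[str], str]) -> Optional[str]:
    """Check jobs in order and stop at the first failure.

    Later jobs are not queried once one has failed, so the playbook can go
    straight to downloading and showing the error log. Returns None if no job failed.
    """
    for job in jobs:
        if get_status(job) == "Failed":
            return job
    return None
```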
For checking environment readiness, we have to wait for a period of time (currently 1 minute) before executing the readiness check. This is because the component that performs the readiness check (currently the orchestrator) might not be ready yet.
This 1 minute delay needs to be eliminated. Ideally, the component's unreadiness (i.e. the orchestrator not being in service) would be detected and translated into a Pending event status.
This way, the check readiness event naturally waits while the component is not yet ready, and then checks the environment readiness.
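The proposed status mapping can be sketched as follows. The function name and its inputs are hypothetical. A caller that keeps polling while the status is Pending then waits exactly as long as needed, with no fixed delay.

```python
def readiness_event_status(orchestrator_in_service: bool, environment_ready: bool) -> str:
    """Translate component and environment readiness into an event status."""
    if not orchestrator_in_service:
        return "Pending"  # the checking component isn't ready yet; the check should be retried
    return "Success" if environment_ready else "Failed"
```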
In combination with ticket shinesolutions/aem-aws-stack-provisioner#178, we can update the template test-readiness-full-set.json (https://github.com/shinesolutions/aem-stack-manager-messenger/blob/master/templates/sns/test-readiness-full-set.json) to send the SSM document only to the orchestrator component.
With the improvements to the readiness check, we don't need to send the SSM document to all components. It also allows us to check recovered components.
Furthermore, we can remove the make target check-readiness-full-set-with-disabled-chaosmonkey and the SSM template test-readiness-full-set-with-disabled-chaosmonkey.json.
As a general-purpose system, we need to make sure that some of the stack names that can be invoked by the messenger are configurable via variables.
To avoid running integration tests while scheduled tasks are running (e.g. testing the snapshot task while a scheduled snapshot task is running), we need to introduce more unschedule/schedule tasks for all those scheduled jobs.
The idea is that the tests should unschedule all of those jobs before running, and then reschedule the jobs afterwards.
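Here is a Python sketch of the unschedule/reschedule bracket, written as a context manager. The unschedule and schedule callables and the job names are hypothetical stand-ins for the actual messenger tasks. Rescheduling happens in a finally block, so the jobs are restored even when a test fails.

```python
from contextlib import contextmanager
from typing import Callable, Iterable, Iterator, List


@contextmanager
def jobs_unscheduled(jobs: Iterable[str], unschedule: Callable[[str], None],
                     schedule: Callable[[str], None]) -> Iterator[None]:
    """Unschedule the jobs for the duration of the block, then reschedule them.

    Only jobs that were actually unscheduled are rescheduled, in reverse order.
    """
    done: List[str] = []
    try:
        for job in jobs:
            unschedule(job)
            done.append(job)
        yield
    finally:
        for job in reversed(done):
            schedule(job)
```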
To avoid confusion between testing and checking the readiness state of AEM architecture, all leftover references to test-readiness must be replaced with check-readiness.
Note that the Makefile targets already have the check- prefix, so userland doesn't have to change anything.