elastic / ansible-elastic-cloud-enterprise
Ansible playbooks for Elastic Cloud Enterprise (ECE)
Home Page: https://www.elastic.co/products/ece
License: Other
The ECE default memory settings are pretty much useless.
We set up a test installation and used the settings documented for a 'Small baseline installation' on the primary node, but - accidentally - not on the secondary and tertiary nodes.
The end result was that the secondary and tertiary nodes started but could not be administered via the Cloud UI. Worse, they could not be /removed/, and we had to ditch the whole installation and start all over again.
On the plus side: installation is really fast!
We're currently using a dirty hack to solve this problem.
In 'bootstrap/main.yml' the task

- name: Memory settings for ECE
  set_fact:
    ece_memory_settings: ' {"runner":{"xms":"1G","xmx":"1G"}[...]}'

defines a variable, and we add

--memory-settings '{{ ece_memory_settings }}'

to the install script call in the respective 'install_stack.yml' (the whitespace before '{' is necessary, otherwise the line is not treated simply as text).
This is definitely the wrong way to do this - but it works for us so far!
A better approach might be to define variables for each (ECE) role and read the values from the 'inventory.yml' file?
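To illustrate, a sketch of that variable-driven approach (the variable names here are illustrative, not part of the role):

```yaml
# Hypothetical per-role heap sizes, overridable from inventory.yml
ece_runner_heap: "1G"
ece_proxy_heap: "8G"
ece_zookeeper_heap: "4G"

# Assemble the JSON passed to --memory-settings from those values;
# the leading space keeps Ansible from parsing the value as a dict
ece_memory_settings: >-
   {"runner":{"xms":"{{ ece_runner_heap }}","xmx":"{{ ece_runner_heap }}"},
   "proxy":{"xms":"{{ ece_proxy_heap }}","xmx":"{{ ece_proxy_heap }}"},
   "zookeeper":{"xms":"{{ ece_zookeeper_heap }}","xmx":"{{ ece_zookeeper_heap }}"}}
```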
I'm getting an error while executing the play below.
{{ ece_version }}: 2.6.0
Error:
fatal: [ece-host]: FAILED! => {
"ansible_job_id": "75300605991.103745",
"changed": true,
"cmd": "/mnt/data/elastic-cloud-enterprise.sh upgrade --cloud-enterprise-version 2.6.0",
"delta": "0:00:00.665977",
"end": "2020-09-04 13:30:33.803465",
"finished": 1,
"invocation": {
"module_args": {
"_raw_params": "/mnt/data/elastic-cloud-enterprise.sh upgrade --cloud-enterprise-version 2.6.0",
"_uses_shell": true,
"argv": null,
"chdir": null,
"creates": null,
"executable": null,
"removes": null,
"stdin": null,
"stdin_add_newline": true,
"strip_empty_ends": true,
"warn": false
}
},
"msg": "non-zero return code",
"rc": 1,
"start": "2020-09-04 13:30:33.137488",
"stderr": "",
"stderr_lines": [],
"stdout": "\u001b[0;31mContainer frc-runners-runner was not found -- is the environment running?\u001b[0m",
"stdout_lines": [
"\u001b[0;31mContainer frc-runners-runner was not found -- is the environment running?\u001b[0m"
]
}
ECE is supported on RHEL 7 when using Docker 1.13 [1]; however, the playbook uses a default of docker_version: "18.09" [2], which means users can end up with an unsupported installation if they don't adjust the docker_version variable. It'd be great if we could default to 1.13 (maybe even prohibit 18.09 [3]?) when installing on RHEL 7.
[1] https://www.elastic.co/guide/en/cloud-enterprise/current/ece-prereqs-software-linux.html
[2] https://github.com/elastic/ansible-elastic-cloud-enterprise/blob/master/defaults/main.yml#L11
[3] https://github.com/elastic/ansible-elastic-cloud-enterprise/blob/master/tasks/base/RedHat-7/install_docker.yml#L24-L36
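One way this could look (a sketch, assuming the role's existing per-OS vars files loaded via include_vars):

```yaml
# Hypothetical vars/os_RedHat_7.yml: make the supported version the
# default on RHEL 7 instead of the global "18.09"
docker_version: "1.13"
```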
We'd like to allow customers to configure the amount of swap defined on each ECE host, instead of always using a predetermined formula as seen here
The formula should act as a default, but there may be cases where 7% of disk space is too little or too much from a customer's perspective.
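A minimal sketch of that idea (ece_swap_size is a hypothetical variable; calculated_swap_size stands in for the existing 7% computation):

```yaml
# Use the 7% formula only when the user has not supplied a value
- name: Determine swap size
  set_fact:
    swap_size: "{{ ece_swap_size | default(calculated_swap_size) }}"
```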
Hi,
This issue concerns the process by which the required swap partition is created by the playbook.
The size of the swap partition is defined as 7% of the whole disk size, which is in my opinion way too much (7% should be the maximum).
ansible-elastic-cloud-enterprise/tasks/direct-install/setup_xfs.yml
Lines 13 to 15 in 479b8e6
Using the calculated size the logical volume for swap is created and initialized
ansible-elastic-cloud-enterprise/tasks/direct-install/setup_xfs.yml
Lines 31 to 32 in 479b8e6
Then the last step is to enable swapping to the newly created partition
ansible-elastic-cloud-enterprise/tasks/direct-install/setup_xfs.yml
Lines 50 to 51 in 479b8e6
I see the following issues:
What's your opinion on that?
Regards,
Alex
This is now hardcoded to /mnt/data, but we would like to take this role into use and our convention is to use /data. Currently I have to modify the role, which introduces merge conflicts when we update it from upstream. Can you make it a variable in the official repo?
Update: Made a PR for this since I already performed the necessary changes.
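The requested change could be as small as promoting the path to a default (a sketch; the name data_dir already appears elsewhere in the role's templates):

```yaml
# defaults/main.yml (sketch): override from inventory, e.g. data_dir: /data
data_dir: /mnt/data
```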
The ansible-playbook fails with the following error when run on CentOS 7 using this image (standard CentOS image in GCP):
image - projects/centos-cloud/global/images/centos-7-v20191210
Ece version: 2.4.3
TASK [ansible-elastic-cloud-enterprise : include_tasks] ************************
included: /Users/omerkushmaro/.ansible/roles/ansible-elastic-cloud-enterprise/tasks/bootstrap/primary/install_stack.yml for 35.184.83.18
TASK [ansible-elastic-cloud-enterprise : Execute the primary installation] *****
fatal: [35.184.83.18]: FAILED! => {"changed": true, "cmd": "/home/elastic/elastic-cloud-enterprise.sh install --availability-zone us-central1-a --cloud-enterprise-version 2.4.3 --docker-registry docker.elastic.co --ece-docker-repository cloud-enterprise --memory-settings ' {\"runner\":{\"xms\":\"1G\",\"xmx\":\"1G\"},\"proxy\":{\"xms\":\"8G\",\"xmx\":\"8G\"},\"zookeeper\":{\"xms\":\"4G\",\"xmx\":\"4G\"},\"director\":{\"xms\":\"1G\",\"xmx\":\"1G\"},\"constructor\":{\"xms\":\"4G\",\"xmx\":\"4G\"},\"admin-console\":{\"xms\":\"4G\",\"xmx\":\"4G\"}}'", "delta": "0:00:00.520743", "end": "2020-01-30 17:16:30.355823", "msg": "non-zero return code", "rc": 1, "start": "2020-01-30 17:16:29.835080", "stderr": "", "stderr_lines": [], "stdout": "\u001b[0;31mCan't determine a default HOST_IP ('ip' tool can't be found). Please supply '--host-ip' with the appropriate ip address.\u001b[0m", "stdout_lines": ["\u001b[0;31mCan't determine a default HOST_IP ('ip' tool can't be found). Please supply '--host-ip' with the appropriate ip address.\u001b[0m"]}
Use this Terraform example to install a 3-node ECE instance in GCP.
When running the ansible job multiple times, there is a failure on the "create swap volume" task. The following output is shown
TASK [elastic-cloud-enterprise : Create swap volume]
******************************************************************************************************
fatal: [192.168.1.10]: FAILED! => {"changed": false, "msg": "Sorry, no shrinking of swap without force=yes."}
The server hardware has not been changed during runs.
Also, if the proposed swap size is smaller than the current size, should this task run at all?
Thanks
[2022-04-21T13:23:56.118Z] TASK [. : Install common base dependencies] ************************************
[2022-04-21T13:24:05.248Z] failed: [ec2-3-82-203-154.compute-1.amazonaws.com] (item=cloud-init) => {"ansible_loop_var": "item", "changed": false, "item": "cloud-init", "msg": "Failed to download metadata for repo 'appstream': Cannot prepare internal mirrorlist: No URLs in mirrorlist", "rc": 1, "results": []}
Using the ansible-playbook to install ECE results in an invalid (revoked) certificate that modern browsers block, preventing you from using the UI.
Error is -
NET::ERR_CERT_REVOKED
(thus, the browser doesn't allow you to 'skip' and temporarily trust the certificate)
2.4.0 / 2.4.3
Install ece on a remote host, by running the ansible-playbook from your local machine (might be a timezone issue? just a thought)
We have been using --skip-tags setup_filesystem when running this playbook, but after pulling in a more recent version, the play tries to create a volume for xvdb. Before, it would skip this section because of the tag, but now it tries to create the volume anyway.
It appears that in #55 the included tasks file general/setup_xfs.yml was moved to a new place and the tag setup_filesystem was not moved with it.
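If the tag was indeed lost in the move, restoring it would look something like this (a sketch; the exact include location may differ):

```yaml
# Re-attach the tag to the relocated include so --skip-tags works again
- include_tasks: general/setup_xfs.yml
  tags: [setup_filesystem, destructive]
```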
The ansible-playbook should make sure docker server is running on the host before attempting to continue configuration
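A minimal sketch of such a guard, using the stock systemd and wait_for modules:

```yaml
# Start and enable the Docker daemon before any docker commands run
- name: Ensure docker service is running
  systemd:
    name: docker
    state: started
    enabled: yes

# Then wait until the daemon socket actually exists
- name: Wait for the Docker socket
  wait_for:
    path: /var/run/docker.sock
    timeout: 60
```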
Hi,
I tried an upgrade of a newly installed test environment and noticed that only providing '--skip-tags base' has a severe drawback:
It still triggers the tasks in 'direct-install', which results in a reboot of the node. In my case the Docker containers weren't up and running fast enough afterwards, and the upgrade terminated with an error along the lines of:
Container frc-directors-director was not found -- does the current host have a role 'director'?
I haven't yet tried but i think using '--tags bootstrap' instead should be the correct way to do this.
Regards,
Steffen Elste
P.S.: Using '--tags bootstrap' did indeed work without any problems.
In my efforts to reproduce a different issue, I have identified a problem with the playbook installing Docker. It appears we assert that Docker is installed before we ever attempt to install the desired version.
I believe the issue is that we attempt to validate the Docker version BEFORE we attempt to install Docker.
https://github.com/elastic/ansible-elastic-cloud-enterprise/blob/master/tasks/base/main.yml#L8-L11
However, ansible throws an error before we can do anything about it.
When installing ECE inside a more sandboxed environment we needed to add in some proxy settings.
I took a copy of the settings that were added, but I did not test this to know exactly which settings were definitely needed and which are duplicates etc.
---
- hosts: all
  vars:
    http_proxy: http://<proxy_ip>:8080
    https_proxy: https://<proxy_ip>:8080
    proxy_host: <proxy_ip>
    proxy_port: 8080
    proxy_env:
      http_proxy: "{{ http_proxy }}"
      https_proxy: "{{ https_proxy }}"
      no_proxy: <primary_hostname>
When I get a chance I will test them, but putting this here as a placeholder and in case someone already knows the answer.
Hi
We are looking to deploy this, but we are currently standardising on Ubuntu 18.04. Any chance of getting this supported in the ansible scripts?
Thanks
It would be pretty useful to have ece-support-diagnostics (https://github.com/elastic/ece-support-diagnostics) available in the default installation as well.
Include the support diagnostics script at preparation time in the same way as the ECE installation/management script.
- name: Download ece support diagnostics
  get_url:
    url: "{{ ece_supportdiagnostics_url }}"
    dest: /home/elastic/ece-support-diagnostics.sh
    mode: 0755
sysstat is not listed as a dependency anywhere in the playbooks:
https://github.com/elastic/ansible-elastic-cloud-enterprise/blob/master/tasks/diagnostics/main.yml
It should be deployed automatically as part of the ECE deployment; otherwise it has to be installed manually before the ECE Support Tool will run.
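Adding it as a dependency could be a one-task change (a sketch; the package is named sysstat on both RHEL and Debian families):

```yaml
- name: Install sysstat for the ECE support diagnostics
  package:
    name: sysstat
    state: present
```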
Using --tags alone now works properly, so we can recommend using that instead for e.g. upgrades (--tags "bootstrap").
The playbook always assumes that the network interface is eth0.
in main.yml
- name: ensure dhcp dns is set
  lineinfile:
    path: /etc/sysconfig/network-scripts/ifcfg-eth0
    line: "{{ item }}"
  with_items:
    - 'PeerDNS=yes'
    - 'NM_CONTROLLED=yes'
My suggestion is to allow this to be passed via variable.
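A sketch of that suggestion, assuming a hypothetical network_interface variable that preserves the current eth0 behaviour by default:

```yaml
# network_interface would default to "eth0" in defaults/main.yml
- name: ensure dhcp dns is set
  lineinfile:
    path: /etc/sysconfig/network-scripts/ifcfg-{{ network_interface | default('eth0') }}
    line: "{{ item }}"
  with_items:
    - 'PeerDNS=yes'
    - 'NM_CONTROLLED=yes'
```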
ip_conntrack is currently loaded via the Ansible modprobe module, which does not persist it anywhere. We need to create a file under /etc/modules-load.d/ in a separate task (e.g. ip_conntrack.conf).
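The persistence task could look like this (a sketch using the stock copy module):

```yaml
# modprobe alone only loads the module for the current boot; this file
# makes systemd-modules-load reload it on every boot
- name: Persist ip_conntrack across reboots
  copy:
    dest: /etc/modules-load.d/ip_conntrack.conf
    content: "ip_conntrack\n"
    mode: 0644
```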
Hi
This is an issue in the "Copy keys from default user to elastic user" task from the "tasks/system/general/make_user.yml" file.
The issue occurs when the "standard" user (the user running Ansible) does not have any authorized_keys.
Scenario is
Would it not be better to allow the "authorized_keys" content to be defined in the Ansible role? Or at least not fail if no keys are defined?
At the moment I am working around this with a separate task that essentially populates the "standard" user's authorized_keys file before make_user.yml runs.
Thanks
This is the error that I am getting from running ansible-playbook -i inventory.yml site.yml
. Ubuntu image - ubuntu-1804-bionic-v20210119a on GCP
TASK [ansible-elastic-cloud-enterprise : sysctl_scripts.yml || load ip_conntrack if needed] ******************************************
fatal: [35.209.97.34]: FAILED! => {"changed": false, "msg": "modprobe: FATAL: Module ip_conntrack not found in directory /lib/modules/5.4.0-1034-gcp\n", "name": "ip_conntrack", "params": "", "rc": 1, "state": "present", "stderr": "modprobe: FATAL: Module ip_conntrack not found in directory /lib/modules/5.4.0-1034-gcp\n", "stderr_lines": ["modprobe: FATAL: Module ip_conntrack not found in directory /lib/modules/5.4.0-1034-gcp"], "stdout": "", "stdout_lines": []}
Let me describe with a table because that should illustrate better what I mean.
Basically, the playbook sometimes uses different concrete software versions than the actual documentation mentions.
So to summarize, it seems that no playbook completely matches the requirements in the documentation at the moment.
TL;DR: We should update the containerd version for RHEL/CentOS (i.e. here) to 1.5.* to fix the runc issue described below and also match our own install docs, which use sudo yum install -y docker-ce-20.10* docker-ce-cli-20.10* containerd.io-1.5.* today (ECE 3.4).
This is a follow up to #125 and #121.
We worked with a user yesterday that had issues with ECE (or better docker), where newly created containers would be stuck in "Created" state, similar to the issue above.
In #125 we pinned the containerd version to 1.4.3 for RHEL/CentOS; back then this resolved to containerd.io-1.4.3-3.1.el8.x86_64, which used runc 1.0.0-rc92.
Today, installing 1.4.3 resolves to containerd.io-1.4.3-3.2.el8.x86_64, which ships with runc 1.0.0-rc93 and is affected by opencontainers/runc#2871.
I am pasting in a series of comments provided by a user which they encountered when "testing the latest version of the ansible playbook/install script". User requested I post these issues because they do not have access to github:
Install process:
Ubuntu 18.04/docker 19.03: docker_version 18.09 => assertion failure when running the playbook on Ubuntu 18.04 that requires docker 19.03.
Ubuntu 18.04/docker 19.03: docker19.03.conf is present in the "template" folder but there is no task to copy it on the remote system. I only see tasks related to docker 18.09 or docker 1.13:
~/ansible-elastic-cloud-enterprise/templates$ ls
docker1.13.conf docker18.09.conf docker19.03.conf elastic.cfg.j2 format-drives.j2
~/ansible-elastic-cloud-enterprise$ vi tasks/base/general/configure_docker.yml
- name: Ensures /etc/systemd/system/docker.service.d dir exists
  file:
    path: /etc/systemd/system/docker.service.d
    state: directory
  when: docker_version == '18.09'
- name: Create service.d docker.conf
  template:
    src: docker{{ docker_version }}.conf
    dest: /etc/systemd/system/docker.service.d/docker.conf
  when: docker_version == '18.09'
- name: set docker storage options
  lineinfile:
    path: /etc/sysconfig/docker
    regexp: "^OPTIONS='(.*)'"
    line: "OPTIONS='-g {{ data_dir }}/docker \\1'"
    backrefs: yes
    create: yes
  when: docker_version == '1.13'
- name: set docker network options
  lineinfile:
    path: /etc/sysconfig/docker-network
    regexp: '^DOCKER_NETWORK_OPTIONS='
    line: 'DOCKER_NETWORK_OPTIONS="--bip={{ docker_bridge_ip }}"'
    create: yes
  when: docker_version == '1.13'
- name: set docker storage driver
  lineinfile:
    path: /etc/sysconfig/docker-storage-setup
    regexp: '^DOCKER_NETWORK_OPTIONS='
    line: 'STORAGE_DRIVER={{ docker_storage_driver }}'
    create: yes
  when: docker_version == '1.13'
Upgrade process:
Problematic lines of code (I removed the /dev/null redirection to see the root cause of this error):
SOURCE_CONTAINER_NAME="frc-runners-runner"
HOST_STORAGE_PATH=$(docker -H "unix://${HOST_DOCKER_HOST}" exec -it $SOURCE_CONTAINER_NAME bash -c 'echo -n $HOST_STORAGE_PATH' | cut -d: -f 2)
if [[ -z "${HOST_STORAGE_PATH}" ]]; then
echo -e "${RED}Container $SOURCE_CONTAINER_NAME was not found -- is the environment running?${NC}"
exit $GENERAL_ERROR_EXIT_CODE
fi
SOURCE_CONTAINER_NAME="frc-directors-director"
ZK_ROOT_PASSWORD=$(docker -H "unix://${HOST_DOCKER_HOST}" exec -it $SOURCE_CONTAINER_NAME bash -c 'echo -n $FOUND_ZK_READWRITE' | cut -d: -f 2)
if [[ -z "${ZK_ROOT_PASSWORD}" ]]; then
echo -e "${RED}Container $SOURCE_CONTAINER_NAME was not found -- does the current host have a role 'director'?${NC}"
exit $GENERAL_ERROR_EXIT_CODE
fi
Error
"the input device is not a TTY"
TASK [elastic-cloud-enterprise : include_tasks]
included: /home/user/ansible/roles/elastic-cloud-enterprise/tasks/ece-bootstrap/upgrade.yml for <REDACTED>
TASK [elastic-cloud-enterprise : Execute upgrade]
fatal: [<REDACTED>]: FAILED! => {"changed": true, "cmd": "/home/elastic/elastic-cloud-enterprise.sh upgrade --cloud-enterprise-version 2.6.2 --docker-registry docker.elastic.co --ece-docker-repository cloud-enterprise", "delta": "0:00:00.434120", "end": "2020-10-12 10:29:11.156153", "msg": "non-zero return code", "rc": 1, "start": "2020-10-12 10:29:10.722033", "stderr": "+ SOURCE_CONTAINER_NAME=frc-runners-runner\n++ docker -H unix:///var/run/docker.sock exec -it frc-runners-runner bash -c 'echo -n $HOST_STORAGE_PATH'\n++ cut -d: -f 2\nthe input device is not a TTY\n+ HOST_STORAGE_PATH=\n+ [[ -z '' ]]\n+ echo -e '\\033[0;31mContainer frc-runners-runner was not found -- is the environment running?\\033[0m'\n+ exit 1", "stderr_lines": ["+ SOURCE_CONTAINER_NAME=frc-runners-runner", "++ docker -H unix:///var/run/docker.sock exec -it frc-runners-runner bash -c 'echo -n $HOST_STORAGE_PATH'", "++ cut -d: -f 2", "the input device is not a TTY", "+ HOST_STORAGE_PATH=", "+ [[ -z '' ]]", "+ echo -e '\\033[0;31mContainer frc-runners-runner was not found -- is the environment running?\\033[0m'", "+ exit 1"], "stdout": "\u001b[0;31mContainer frc-runners-runner was not found -- is the environment running?\u001b[0m", "stdout_lines": ["\u001b[0;31mContainer frc-runners-runner was not found -- is the environment running?\u001b[0m"]}
The task below in ~/elastic-cloud-enterprise/tasks/ece-bootstrap/main.yml fails (permission issue) when I run the playbook with my own admin user. This command has to run as root or elastic user. So a sudo instruction has to be added to that task.
- name: Check if an installation or upgrade should be performed
  shell: docker ps -a -f name=frc-runners-runner --format {%raw%}"{{.Image}}"{%endraw%}
  register: existing_runner
  tags: [dbg]
  become: yes
  become_method: sudo
  become_user: elastic
Error:
TASK [elastic-cloud-enterprise : Check if an installation or upgrade should be performed]
fatal: [<REDACTED>]: FAILED! => {"changed": true, "cmd": "docker ps -a -f name=frc-runners-runner --format \"{{.Image}}\"", "delta": "0:00:00.419693", "end": "2020-10-12 10:52:42.118514", "msg": "non-zero return code", "rc": 1, "start": "2020-10-12 10:52:41.698821", "stderr": "Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Get http://%2Fvar%2Frun%2Fdocker.sock/v1.40/containers/json?all=1&filters=%7B%22name%22%3A%7B%22frc-runners-runner%22%3Atrue%7D%7D: dial unix /var/run/docker.sock: connect: permission denied", "stderr_lines": ["Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Get http://%2Fvar%2Frun%2Fdocker.sock/v1.40/containers/json?all=1&filters=%7B%22name%22%3A%7B%22frc-runners-runner%22%3Atrue%7D%7D: dial unix /var/run/docker.sock: connect: permission denied"], "stdout": "", "stdout_lines": []}
Documentation update (README.md)
https://github.com/elastic/ansible-elastic-cloud-enterprise#performing-an-upgrade
The upgrade section of the documentation indicates to use the following command:
ansible-playbook -i inventory.yml site.yml --skip-tags base
By just skipping the base tag, the playbook still performs some destructive (volume/filesystem creation) or unwanted (system reboot) tasks from direct-install.
To perform an upgrade of ece only, I had to use the following command:
ansible-playbook -i inventory.yml site.yml --tags bootstrap
I would like to use an image-building tool like Packer with your Ansible role. However, there are a few issues with the role today that make that difficult. Specifically, the reboot in system/main.yml causes timeout issues, and certain tasks in this file cannot be done at image build time. A possible solution would be to add two more tags for system (i.e. reboot and setup) so that these tasks can be included during image builds but excluded during deployments.
---
- name: Include OS specific vars
  include_vars: "{{ item }}"
  with_first_found:
    - os_{{ ansible_distribution }}_{{ ansible_distribution_major_version }}.yml
    - unsupported.yml
- name: Check that OS is supported
  fail:
    msg: "ERROR: OS {{ ansible_distribution }} {{ ansible_distribution_major_version }} is not supported!"
  when: unsupported_version is defined and unsupported_version
- name: Assert docker version is supported
  assert:
    that: "docker_version in docker_version_map.keys()"
    msg: "Docker version must be one of {{ docker_version_map.keys() }}"
- name: execute os specific tasks
  include_tasks: "{{ ansible_distribution }}-{{ ansible_distribution_major_version }}/main.yml"
  tags: [setup]
- include_tasks: general/make_user.yml
  tags: [setup]
- include_tasks: general/set_limits.yml
  tags: [setup]
- include_tasks: general/setup_xfs.yml
  tags: [setup_filesystem, destructive]
  when: ansible_lvm['vgs']['lxc'] is not defined or force_xfc == true
- include_tasks: general/update_grub_docker.yml
  tags: [setup_filesystem, destructive]
- include_tasks: general/configure_docker.yml
  tags: [install_docker, destructive]
- include_tasks: general/sysctl_scripts.yml
  tags: [setup]
- include_tasks: general/kernel_modules.yml
  tags: [setup]
- name: Reboot the machine with all defaults
  shell: sleep 2 && shutdown -r now "Reboot for changes to take effect"
  async: 1
  poll: 0
  ignore_errors: true
  tags: [reboot]
- name: Wait for the reboot to complete
  wait_for_connection:
    connect_timeout: 20
    sleep: 5
    delay: 5
    timeout: 600
  tags: [reboot]
- include_tasks: general/setup_mount_permissions.yml
  tags: [setup_filesystem]
Starting 2.13 and above (including 3.0 and above), ECE does not bootstrap on SLES 12 and 15, with docker 19 or 20:
bootstrap logs:
- Starting local runner {}
- Started local runner {}
- Waiting for runner container node {}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Errors have caused Elastic Cloud Enterprise installation to fail - Please check logs
Node type - initial
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
in docker logs of runner:
ok: run: docker-socket-proxy: (pid 30) 2s
Traceback (most recent call last):
File "/elastic_cloud_apps/runner/write_config.py", line 10, in <module>
with open('runner.conf', 'w') as dest:
PermissionError: [Errno 13] Permission denied: 'runner.conf'
What I noticed is that the ece user is indeed in passwd and group, and elastic does belong to the ece group, so this failure should not happen.
elastic:x:1000:1000::/home/elastic:/bin/false
ece:x:199:199::/home/ece:/bin/bash
ece:x:199:elastic
elastic:x:1000:
Indeed, path to runner.conf
:
$ ls -lah /elastic_cloud_apps/runner
total 16K
drwxrwxr-x 1 199 199 65 Apr 28 14:36 .
On Ubuntu, user ece is correctly set as owner of /elastic_cloud_apps/runner, but on SLES the listing shows its uid 199 instead.
For the bootstrapper Docker container, it is correctly displayed as ece and not its uid.
Also, the following command does not work:
$ setuser ece whoami
setuser: user ece not found
This does not make sense, as the ece user is defined in /etc/passwd.
Again, it's all fine on Ubuntu, and on SLES from inside the bootstrapper container.
My guess is that Docker has issues mapping uid/gid between the host and the container. Indeed, the user/group ece does not exist on the host, and so elastic does not belong to the ece group on the host.
Workaround: on the host, create a user and group named ece with uid/gid both 199, and add user elastic to the ece group.
Then run the ECE installer, and it should work!
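Expressed as Ansible tasks, the workaround might look like this (a sketch; uid/gid 199 taken from the observations above):

```yaml
- name: Create ece group with gid 199
  group:
    name: ece
    gid: 199

- name: Create ece user with uid 199
  user:
    name: ece
    uid: 199
    group: ece
    create_home: no

- name: Add elastic to the ece group
  user:
    name: elastic
    groups: ece
    append: yes
```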
Related to #147
This syntax breaks on ansible 2.9. It works on ansible 2.10 (confirmed) and it seems it passed your tests on 2.8.
Minimal example to reproduce the behavior:
---
- name: "test"
  hosts: localhost
  connection: local
  tasks:
    - name: Set empty capacity if not defined
      set_fact:
        capacity: "{{ capacity | default('') }}"
    - name: "debug"
      debug:
        msg: /home/elastic/elastic-cloud-enterprise.sh {{ '--capacity ' + capacity if capacity }}
Results on Fedora 35 with Ansible 2.10.15 (python3)
ansible-playbook test.yml
[WARNING]: No inventory was parsed, only implicit localhost is available
[WARNING]: provided hosts list is empty, only localhost is available. Note that
the implicit localhost does not match 'all'
PLAY [add admin] ***************************************************************
TASK [Gathering Facts] *********************************************************
ok: [localhost]
TASK [Set empty capacity if not defined] ***************************************
ok: [localhost]
TASK [debug] *****************************************************************
ok: [localhost] =>
msg: '/home/elastic/elastic-cloud-enterprise.sh '
CentOS 7.9 with ansible 2.9.27 (python 2.7.5)
ansible-playbook test.yml
[WARNING]: provided hosts list is empty, only localhost is available. Note that the implicit localhost does not match 'all'
PLAY [add admin] *************************************************************************************************************
TASK [Gathering Facts] *******************************************************************************************************
ok: [localhost]
TASK [Set empty capacity if not defined] *************************************************************************************
ok: [localhost]
TASK [debug] ***************************************************************************************************************
fatal: [localhost]: FAILED! =>
msg: |-
The task includes an option with an undefined variable. The error was: the inline if-expression on line 1 evaluated to false and no else section was defined.
The error appears to be in '<my_secret_filepath>/test.yml': line 9, column 7, but may
be elsewhere in the file depending on the exact syntax problem.
The offending line appears to be:
capacity: "{{ capacity | default('') }}"
- name: "debug"
^ here
I would suggest using jinja2 syntax to avoid any problems with Ansible 2.9 which I believe is the most widely used version at the moment.
I will create a PR once I check the correct jinja2 syntax.
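For reference, spelling out the else branch makes the inline if valid on the older Jinja2 bundled with Ansible 2.9 (a sketch of the intended fix):

```yaml
- name: "debug"
  debug:
    # Explicit else branch; without it, older Jinja2 raises an error
    # when the condition evaluates to false
    msg: "/home/elastic/elastic-cloud-enterprise.sh {{ ('--capacity ' + capacity) if capacity else '' }}"
```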
With larger users of Elastic Cloud Enterprise (ECE) it is not uncommon that the set-up of a deployment is split up by a number of different teams who have different responsibilities. There are also a number of configuration items that are easy to miss with the branching nature of the documentation between the different Linux distribution requirements as well.
It would be helpful to have a "script" that can check that all of the pre-steps are complete before continuing. This will improve user experience as there are a number of items that require a re-install of the runners if they are missed.
My proposal would be to add two "checking" tasks. It would be helpful if we could run the checking tasks separately, in addition to the larger tasks/main.yml file. Some users prefer to manually install the software, so it would be helpful to have something to run that checks items are correct before continuing:
- After tasks/system/main.yml is executed, verify that its steps have been completed.
- In tasks/bootstrap/main.yml, add another sub-task that runs checks to verify all the prerequisites are complete before doing the install. There is an option to skip the system set-up task, but we don't verify that the system configuration was done successfully.
The checks could output info / warn / error messages depending on whether something is a hard requirement or a performance issue (e.g. memory / disk ratio).
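One such check could look like this (a sketch; the 262144 threshold is the documented vm.max_map_count minimum for Elasticsearch and is used here only as an example):

```yaml
- name: Read vm.max_map_count
  command: sysctl -n vm.max_map_count
  register: max_map_count
  changed_when: false

- name: Verify vm.max_map_count meets the minimum
  assert:
    that: max_map_count.stdout | int >= 262144
    msg: "vm.max_map_count is too low for ECE"
```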
This project is so great I'm already submitting a request to add support for RHEL :)
As ECE also supports RHEL 7.x with Red Hat's Docker 1.13, it would be great if we could add that as another OS to the vars folder.
This role defaults to ece_version 2.2, but the current version is 2.4.3. Is there a reason to keep this at 2.2 rather than keeping it in sync with ECE releases?
ECE 2.7
We've identified an issue where the default users and groups that Ansible provisions do not align with the user & group then used in other tasks.
The task in tasks/base/general/make_user.yml defines the groups "elastic" and "docker"; "elastic" then ends up being the primary group from what we can see.
But the download step under tasks/ece-bootstrap/main.yml tries to set the group ownership of the file to elastic_user_group, which does not exist and was never created.
ece_supportdiagnostics_url: The location of the diagnostics tool. Can be a local file for offline installation.
Default: https://github.com/elastic/ece-support-diagnostics/archive/v1.1.tar.gz
Should be kept in line with
https://github.com/elastic/ece-support-diagnostics
Current is v2.0.2, v1.1 is 3 years old
https://github.com/elastic/ece-support-diagnostics/releases/tag/v2.0.2
Installed fresh 7.9 RHEL DVD ISO using VBox locally. Script throws the error seen in #51 resolved by (a8df0d5).
The ip tool is installed. I discovered that the main.yml within tasks/base/RedHat-7 does not include the same configuration that allows the server to discover the IP address:
- name: ensure dhcp dns is set
  lineinfile:
    path: /etc/sysconfig/network-scripts/ifcfg-eth0
    line: "{{ item }}"
  with_items:
    - 'PeerDNS=yes'
    - 'NM_CONTROLLED=yes'
- name: set locale
  lineinfile:
    path: /etc/environment
    line: "{{ item }}"
  with_items:
    - 'LANG=en_US.utf8'
    - 'LC_CTYPE=en_US.utf8'
- name: set path
  lineinfile:
    path: /etc/profile.d/path.sh
    line: "export PATH=$PATH:/usr/sbin"
    create: yes
Also, by default, VirtualBox renames eth0 to enp0s3, and the ansible-elastic role does not appear to check the name of the interface. I followed this guide to rename enp0s3 back to eth0, and after updating the main.yml above, the installation completed successfully:
https://www.linuxtopic.com/2017/02/how-to-change-default-interface-name.html
ece-support-diagnostics-1.1 is hardcoded in the script,
so it will never run with later versions.
The latest version is 2.0.2, and the unpacked tar file is ece-support-diagnostics-v2.0.2/diagnostics.sh, not ece-support-diagnostics-1.1/diagnostics.sh.
ece_version: 2.4.3
ece_docker_registry: docker.elastic.co
ece_docker_repository: cloud-enterprise
docker_config: ""
ece_installer_url: "https://download.elastic.co/cloud/elastic-cloud-enterprise.sh"
ece_runner_id: "{{ ansible_default_ipv4.address }}"
docker_version: "18.09"
Should docker_version now be
docker_version: "19.03"
At the moment the installation script is owned by root in the elastic home directory, with execute permission for any user:
$ ls -l /home/elastic/
-rwxr-xr-x 1 root root 54962 Feb 7 22:55 elastic-cloud-enterprise.sh
This is not causing any issue, especially as the home folder of user elastic is not accessible to users without sudo, but for the sake of it, it might be good to add owner and group with value elastic, and possibly use something more restrictive than mode: 0755, in ece-bootstrap/main.yml.
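Something along these lines (a sketch, assuming the script is fetched with get_url in ece-bootstrap/main.yml):

```yaml
- name: Download the ECE installation script
  get_url:
    url: "{{ ece_installer_url }}"
    dest: /home/elastic/elastic-cloud-enterprise.sh
    owner: elastic
    group: elastic
    mode: 0700   # more restrictive than the current 0755
```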
RHEL 8.3 kernel 4.18.0-240.22.1.el8_3.x86_64
Ansible role adds fs.may_detach_mounts=1 to /etc/sysctl.conf
The following error is seen when executing sysctl -p:
sysctl: cannot stat /proc/sys/fs/may_detach_mounts: No such file or directory
Various Google searches indicate the version of runc included with the version of containerd may be related.
There are legitimate use-cases where a user might not want to rely on the authorized_keys files.
During #15 a change was implemented which allows looking at two different locations for a file, as well as providing a custom path. However, the Ansible task fails if neither of those values is available.
Would it be possible to make those parameters optional and not fail if they are unavailable, as requested in the above issue ("Or at least not fail if no keys are defined?")?
The current workaround is to provide mock key files, which is not ideal.
Thank you.
This is more of a question: why are you not setting allocator memory here?
You set a variable here:
Hi team,
The Ansible script is failing at the following task with this message:
fatal: [jon-ece-test1-1]: FAILED! => {
"changed": false,
"err": " Can't open /dev/sda2 exclusively. Mounted filesystem?\n Can't open /dev/sda2 exclusively. Mounted filesystem?\n",
"invocation": {
"module_args": {
"force": true,
"pesize": "4",
"pv_options": "",
"pvs": [
"/dev/sda2"
],
"state": "present",
"vg": "lxc",
"vg_options": ""
}
},
"msg": "Creating physical volume '/dev/sda2' failed",
"rc": 5
}
[root@jon-ece-test1-1 dev]# df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 15G 0 15G 0% /dev
tmpfs 15G 0 15G 0% /dev/shm
tmpfs 15G 8.5M 15G 1% /run
tmpfs 15G 0 15G 0% /sys/fs/cgroup
/dev/sda2 100G 2.4G 98G 3% /
/dev/sda1 200M 12M 189M 6% /boot/efi
tmpfs 3.0G 0 3.0G 0% /run/user/1019
tmpfs 3.0G 0 3.0G 0% /run/user/0
# Ansible playbook
- hosts: primary
gather_facts: true
roles:
- ansible-elastic-cloud-enterprise
vars:
ece_primary: true
device_name: sda2
availability_zone: asia-southeast1-c
As discussed with @vaubarth, this issue occurs because the Ansible script requires a separate mount path besides the root partition for the installation. I think the error message isn't clear on the steps to take next.
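A clearer failure could be produced with a pre-flight check before the volume group is created; a sketch (the assert task is a suggestion, not part of the role):

```yaml
# Sketch: fail early with a clear message when the target device is already
# mounted (e.g. as the root filesystem), before lvg tries to create the VG.
- name: Ensure the data device is not already mounted
  assert:
    that:
      - ansible_mounts | selectattr('device', 'equalto', '/dev/' + device_name) | list | length == 0
    fail_msg: >-
      /dev/{{ device_name }} is mounted. ECE needs a dedicated, unmounted
      device for its data volume; do not point device_name at the root
      filesystem.
```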
Best Regards,
Jonathan Lim
As of today the upgrade script has a limited, hard-coded set of parameters:
https://github.com/elastic/ansible-elastic-cloud-enterprise/blob/master/tasks/ece-bootstrap/upgrade.yml#L3
It would be nice if the parameters could be customized in order to fulfill additional parameters that are documented here:
https://www.elastic.co/guide/en/cloud-enterprise/current/ece-installation-script-upgrade.html
Possibly a parameter could be introduced which can later be used for customization.
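One possible shape for this, as a sketch: an optional variable for extra flags, appended to the existing upgrade command (the variable name ece_upgrade_extra_args is an assumption; the installer path matches the error log above):

```yaml
# Sketch: allow users to pass additional documented upgrade flags
# without changing the task itself.
- name: Upgrade ECE
  shell: >-
    /mnt/data/elastic-cloud-enterprise.sh upgrade
    --cloud-enterprise-version {{ ece_version }}
    {{ ece_upgrade_extra_args | default('') }}
```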
The runner ID, if not set during the installation process using --runner-id, defaults to host-ip. Currently this setting can't be used in the ECE playbook.
It would be very good if this setting could be specified by the user when installing ECE.
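A sketch of how the flag could be passed through (the install task shown here is illustrative; the role already defines an ece_runner_id default of the host IP in its variables):

```yaml
# Sketch: forward the runner ID to the installer only when the user set one.
- name: Install ECE
  shell: >-
    /home/elastic/elastic-cloud-enterprise.sh install
    {% if ece_runner_id is defined %}--runner-id {{ ece_runner_id }}{% endif %}
```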
We need to use allocator tags in our setup.
I have created a PR that adds that: #140
It's a very simplistic solution (adding a variable for allocator tags and using the existence of the variable to control the addition of a parameter for the install script) but it works for us.
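The approach in the PR could look roughly like this (the variable name and tag values are assumptions for illustration):

```yaml
# Sketch: pass --allocator-tags only when the user defines the variable.
- name: Install ECE with allocator tags
  shell: >-
    /home/elastic/elastic-cloud-enterprise.sh install
    {% if ece_allocator_tags is defined %}--allocator-tags '{{ ece_allocator_tags }}'{% endif %}
```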
We've recently seen several occurrences of ECE users who had issues with containers getting stuck in the "Created" state, and traced it down to this upstream issue: opencontainers/runc#2871.
For now we should pin the containerd version to <=1.4.3-1 to avoid the affected runc being used.
As far as I can see, this would affect at least:
Not sure about the others.
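On yum-based hosts the pin could be sketched like this (the containerd.io package name and exact release suffix are assumptions that depend on the distro repo):

```yaml
# Sketch: install a pinned containerd.io to avoid the runc regression
# described in opencontainers/runc#2871.
- name: Install pinned containerd
  yum:
    name: containerd.io-1.4.3-1*
    state: present
    allow_downgrade: yes
```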
- name: Add user elastic
user:
name: elastic
uid: 1001
group: elastic
groups: docker
append: yes
state: present
generate_ssh_key: true
when: getent_passwd["elastic"] == none
Should uid be auto-generated if the number 1001 is already in use?
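If auto-generation is acceptable, simply omitting uid lets the system pick the next free one; a sketch of the same task without the fixed uid:

```yaml
# Sketch: without an explicit uid, the user module lets the system
# allocate the next available one, avoiding clashes with 1001.
- name: Add user elastic
  user:
    name: elastic
    group: elastic
    groups: docker
    append: yes
    state: present
    generate_ssh_key: true
  when: getent_passwd["elastic"] == none
```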
I believe this should be "2.5.1". Please advise.
Error: /etc/cloud/cloud.cfg.d/: no such file or directory
Files affected:
tasks/base/general/make_user.yml
tasks/base/main.yml
Potential solution: add a task to ensure the /etc/cloud/cloud.cfg.d/ directory is present. I'm not certain which file to add it to, though; my best guess is tasks/base/main.yml.
- name: ansible create directory
file:
path: /etc/cloud/cloud.cfg.d/
state: directory
My shorter temporary workaround is to change /etc/cloud/cloud.cfg.d/ to /tmp/ in both of the files mentioned above.