princeton-cdh / cdh-ansible
CDH Ansible playbook repository
License: Apache License 2.0
Describe the solution you'd like
Additional context
Molecule requires current versions of ansible (>=2.8), so we will need to update. Currently, ansible is pinned <=2.7 to deal with a setuptools issue; maybe we can check whether this bug still exists. Note that ansible's most recent stable release is 2.10, but it was split into two packages (ansible and ansible-base) after 2.9, so that will require some changes as well. The current plan is to wait until we have molecule tests in place before upgrading ansible to 2.10.
Checklist
2.8 applies to local settings templates in django role; check for others
Is your feature request related to a problem? Please describe.
The build_project_repo role currently does a full git checkout of the project's repository each time. We don't need the full git history on the target machine; only the most recent state is necessary.
Describe the solution you'd like
A shallow clone will allow us to get the files from github without their git history. It should be sufficient to pass depth: 1 to the git module in ansible.
The relevant lines are here:
cdh-ansible/roles/build_project_repo/tasks/main.yml
Lines 29 to 37 in 80891c6
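A minimal sketch of what the change could look like; the variable names (repo_url, repo_branch, deploy_dir) are illustrative, not the role's actual variables:

```yaml
# Sketch of a shallow clone with ansible's git module.
# repo_url, repo_branch, and deploy_dir are placeholder variables.
- name: Check out project repository (shallow clone, no history)
  git:
    repo: "{{ repo_url }}"
    dest: "{{ deploy_dir }}"
    version: "{{ repo_branch }}"
    depth: 1
```

The git module's depth parameter maps directly to git clone --depth, so only the most recent commit is fetched.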
We need a system to document major decisions/changes to the way we use ansible. One option is to use the ADR (architectural decision record) spec, like PUL does in princeton-ansible.
Discussion:
Would you add a note about dropping these in a changelog? (Or do we even have a changelog? Maybe we don't, because no releases...) Would it make sense to start?
Originally posted by @rlskoeser in #49 (comment)
Describe the bug
When deploying apps on PUL infrastructure, the beginning of the build_project_repo role takes significantly longer than other tasks, and reports changed: each time it is run. The slow part is this step, where we recursively set permissions on the deploy directory to prevent permissions errors when cloning:
cdh-ansible/roles/build_project_repo/tasks/main.yml
Lines 18 to 27 in 9a3839c
this is a large recursive operation, so it could potentially take a long time, but it appears to be running (and changing permissions) every time, which doesn't seem right.
Expected behavior
The permissions should be set correctly once, and then ansible should detect that there's no need to make a change and report skipped:, making the step quicker.
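One possible fix, sketched under the assumption that the churn comes from the recursive file task: only run the expensive recursive pass when the deploy directory was just created. All variable names here are placeholders, not the role's actual ones.

```yaml
# Sketch: skip the recursive permissions pass on subsequent runs.
- name: Ensure deploy directory exists
  file:
    path: "{{ deploy_dir }}"
    state: directory
    owner: "{{ deploy_user }}"
    group: "{{ deploy_group }}"
  register: deploy_dir_result

- name: Recursively set permissions on first creation only
  file:
    path: "{{ deploy_dir }}"
    state: directory
    recurse: yes
    owner: "{{ deploy_user }}"
    group: "{{ deploy_group }}"
  when: deploy_dir_result is changed
```

This trades strict enforcement on every run for idempotent reporting; if something else changes permissions between deploys, the recursive pass would need to be re-triggered manually.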
apache2 clean up can be done in cdh-ansible once this task is done:
there's a restart handler in the supervisor role, but it doesn't seem like it's being triggered
Advice from @acozine about options for investigation:
I can think of 3 things to check right away.
1. The handler has a condition on it of when: supervisor_started - maybe add a debug task to check that? If the service hasn't been started, the handler won't run.
2. Most of the tasks that call it do something like "make sure X is present / exists", so if the file already exists, Ansible doesn't change anything and the handler does not get called. Try manually removing one of those files and see if the handler runs then.
3. IIRC handler names must be globally unique. Grep through your roles to see if there's another handler somewhere else with the name restart supervisor.
Refer to https://docs.ansible.com/ansible/latest/playbook_guide/playbooks_handlers.html
Other possibilities:
... if you have other tasks/roles later that are failing, it could be that Ansible has the handler loaded up to run but something else fails before it gets around to actually running it.
you could try adding meta: flush_handlers (as a task) and see if that fixes your issue
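For reference, flushing handlers is a one-line task that forces any queued handlers to run immediately, instead of waiting until the end of the play (where a later failure could prevent them from ever running):

```yaml
# Run any notified handlers (e.g. restart supervisor) right now,
# rather than at the end of the play.
- name: Flush queued handlers
  meta: flush_handlers
```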
Everyone occasionally forgets to log in to access test/vpn sites, and it would be helpful to have a different visual reminder to distinguish those sites from library sites.
Instructions from Francis:
at the bottom of every config generally is
server {
listen 443 ssl;
server_name cdh-geniza.princeton.edu;
...
include /etc/nginx/conf.d/templates/prod-maintenance.conf;
}
and the contents of prod-maintenance
error_page 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 421 422 423 424 425 426 428 429 431 451 500 501 502 503 504 505 506 507 508 510 511 = @maintenance;
location @maintenance {
root /var/local/www/default;
try_files $uri /index.html =502;
}
And the html file is the error page displayed.
checks at the end of create_deployment and close_deployment are failing when staging is not defined:
https://github.com/Princeton-CDH/CDH_ansible/blob/8b971961a366bf9a24a54bd01417e734e3305665/roles/create_deployment/tasks/main.yml#L39
ansible docs on the when statement
see slack link below for more
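A sketch of one way to guard these checks so they are skipped rather than failing when the variable is absent; the task body and staging_url variable are illustrative, not the roles' actual content:

```yaml
# Sketch: guard a check so it only runs when staging is defined.
# "is defined" is evaluated safely even when the variable is missing.
- name: Verify staging deployment responds
  uri:
    url: "{{ staging_url }}"
    status_code: 200
  when: staging is defined and staging
```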
flagged by Max on PUL slack for cdh_ppa but may be relevant to other applications using Solr
I'm not sure where your Solr configurations live, but a bunch of ours live here - https://github.com/pulibrary/pul_solr/tree/main/solr_configs. The Catalog filterCache setting lives here - https://github.com/pulibrary/pul_solr/blob/05d87860a7837c62135dedc949ed608644666cd3/solr_configs/catalog-production-v2/conf/solrconfig.xml#LL1[…]C38
There's more documentation here - https://solr.apache.org/guide/8_3/query-settings-in-solrconfig.html
My very basic understanding is that you want to dial in your cache settings so that you don't get a ton of evictions, and you get a fairly high hit ratio, without over-taxing your machine's memory. Right now, what's in Datadog as cdh_ppa has very very high filter cache evictions, and very low filter cache hit ratio, so you might want to give a higher value to your filterCache in your conf/solrconfig.xml
To reproduce
Steps to reproduce the behavior:
[DEPRECATION WARNING]: The TRANSFORM_INVALID_GROUP_CHARS settings is set to allow bad characters in group names by default, this will change, but still be user configurable on deprecation. This feature will be removed in version 2.10. Deprecation warnings can be disabled by setting
deprecation_warnings=False in ansible.cfg.
[WARNING]: Invalid characters were found in group names but not replaced, use -vvvv to see details
Additional context
Output from running with the -vvvv option:
Not replacing invalid character(s) "{'-'}" in group name (cdh-web_qa)
Not replacing invalid character(s) "{'-'}" in group name (cdh-web_prod)
Not replacing invalid character(s) "{'-'}" in group name (cdh-web_prod)
Not replacing invalid character(s) "{'-'}" in group name (cdh-web_staging)
Not replacing invalid character(s) "{'-'}" in group name (cdh-web_staging)
Not replacing invalid character(s) "{'-'}" in group name (cdh-web)
Not replacing invalid character(s) "{'-'}" in group name (cdh-web)
once we start storing stuff like github action templates in here we might want to rename the repo to be more all-encompassing.
We occasionally still get errors with CAS versions. Update CAS settings to stop this.
see for example https://github.com/pulibrary/princeton_ansible/blob/540e2136fdec4becd1336dd1c68ffbf41296de95/playbooks/approvals_production.yml#L12-L21
unfortunately it looks like there are no modules published on ansible galaxy to handle this (we might consider publishing one, actually, if we get it working well). they do have ones for creating/updating github releases, but not deployments.
some digging reveals that pre_tasks and post_tasks are not special at all - they just run arbitrary tasks at the specified time, and will not be run if the playbooks fail. i think the way to approach this is to:
- move the deployment tasks into the github role (along with cloning)
- use include to pull the tasks from this role into each playbook as pre_tasks and post_tasks
- add an always tag so they run regardless of playbook failure
- add a github_deploy tag (or similar) to the tasks so that we can easily turn off github deploys when running the playbook with --skip-tag=github_deploy or similar. see this article for more info on this approach.

Is your feature request related to a problem? Please describe.
Our typical deploy user deploy is currently being used by pulibrary. If we would like to migrate to PUL infrastructure, we need to begin using the deploy user conan. Ideally, this process would be automated.
Describe the solution you'd like
We should create an ansible role that automatically creates the deploy user conan for all our machines.
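A minimal sketch of such a role's main task; the shell, and the conan_public_key variable for authorizing a deploy key, are assumptions rather than the actual PUL configuration:

```yaml
# Sketch: ensure the conan deploy user exists on every machine.
- name: Create conan deploy user
  user:
    name: conan
    shell: /bin/bash
    state: present

# Hypothetical: authorize the deploy key so playbooks can connect as conan.
- name: Authorize deploy key for conan
  authorized_key:
    user: conan
    key: "{{ conan_public_key }}"
```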
we're creating lockfiles for most of our apps, so we should use them to install predictable versions of dependencies in production.
media_root: '/srv/www/media/' setting as needed

Is your feature request related to a problem? Please describe.
Compare #63; this situation is similar: functionality related to javascript/nodejs is spread across multiple roles.
build_npm
build_semantic
run_webpack
Describe the solution you'd like
Implementation could be fairly similar to the django and python roles, with tasks to:
- install dependencies based on the package.json or package-lock.json file
- run npm scripts, such as those that compile static files

Additional context
This issue could resolve existing issue #9 related to ansible's npm module; perhaps we now have a newer version or can find a smarter way to use the module.
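A sketch of what the proposed role's tasks could look like; the deploy_dir variable and the build script name are illustrative assumptions:

```yaml
# Sketch of the proposed npm role tasks.
- name: Install javascript dependencies
  npm:
    path: "{{ deploy_dir }}"
    ci: yes   # install from package-lock.json for reproducible builds

# Hypothetical build script; projects would configure the actual name.
- name: Compile static files via npm script
  command: npm run build
  args:
    chdir: "{{ deploy_dir }}"
```

Using the command module for the build step (rather than the npm module) also sidesteps the failure-detection problem noted in issue #9, since a nonzero exit code fails the task directly.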
Is your feature request related to a problem? Please describe.
Currently, functionality related to deploying Django apps is spread across several roles:
configure_logging
configure_media
django_collectstatic
django_compressor
django_migrate
install_local_settings
Describe the solution you'd like
Ideally, these could all be tasks that are part of a single Django-specific role, which could have shared defaults and dependencies on other roles. New Django-related tasks (for example, Princeton-CDH/geniza#117) can also become part of this role.
Additional context
- django_manage module, from community.general
- ansible-role-django on github

For python apps, we get the python_app_version from the current checkout of the code and use it to determine the path for the deployed version. When we run playbooks without running the full sequence, e.g. using the --start-at-task option, this variable doesn't get set, and then any task that needs to know the path to the deployed version of the project fails.
The task is currently part of the build_project_repo role: https://github.com/Princeton-CDH/cdh-ansible/blob/main/roles/build_project_repo/tasks/main.yml#L43-L54
On a new deploy, it can't be run until the new version of the code has been checked out; but on an existing deploy it could be run anytime.
Create a new Solr role that works similarly to the Solr update logic PUL uses in capistrano, and write molecule tests for the new role.
conan user on lib-solr-staging1
Is your feature request related to a problem? Please describe.
Currently, the deploy scripts leave all past deploys sitting in the directory.
Describe the solution you'd like
Old deploys should automatically be cleaned up when the deploy finishes. Maybe keep the last 3 based on date (most recent)? Should always preserve current + previous.
See for one possible solution https://www.future500.nl/articles/2014/07/thoughts-on-deploying-with-ansible/
Describe alternatives you've considered
Could be cleaned up manually or by a cron job, but seems better to make it part of the deploy.
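One possible in-playbook approach, sketched under the assumption that deploys live in timestamped directories under a single releases root (deploy_root is a placeholder variable):

```yaml
# Sketch: keep only the three most recent release directories.
- name: Find release directories
  find:
    paths: "{{ deploy_root }}"
    file_type: directory
  register: releases

- name: Remove all but the three most recent releases
  file:
    path: "{{ item.path }}"
    state: absent
  loop: "{{ (releases.files | sort(attribute='mtime', reverse=true))[3:] }}"
```

Sorting by mtime keeps current + previous safe as long as at least three releases are retained; a symlink check could be added as extra insurance for the currently active release.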
looks like the config error is a misuse/misunderstanding of the root configuration:
# Tell Nginx and Passenger where your app's 'public' directory is
root /path-to-app/public;
We're configuring it to use the app root, but we should point it at static root instead to avoid this behavior.
sudo systemctl stop nginx
pg_dump --format custom --clean --no-owner --no-privileges cdh_geniza > 2023_10_25_cdh_geniza.dump
sudo mv /var/lib/postgresql/2023_10_25_cdh_geniza.dump .
scp 2023_10_25_cdh_geniza.dump lib-postgres-prod1:~
sudo mv 2023_10_25_cdh_geniza.dump /var/lib/postgresql
pg_restore -h 127.0.0.1 -U cdh_geniza --dbname cdh_geniza --no-owner --no-privileges 2023_10_25_cdh_geniza.dump
sudo systemctl start nginx
details from @kayiwa :
Ubuntu bionic VM goes out of support in June
The process has been: Set up new Operating System with Jammy Jellyfish.
Try to deploy. See what breaks, fix with a PR that accommodates the existence of both Jammy and Bionic
We'll want to test the upgrade in staging first. Francis says he'll do the dependencies work on our PRs. Once everything is working in staging we can schedule an upgrade for production.
projects needing upgrades:
We're getting this error:
pulsys@cdh-test-prosody1:~$ sudo tail /var/log/apache2/ppa_error.log
[Tue Nov 29 18:00:55.640359 2022] [wsgi:error] [pid 4599:tid 140169124407040] [remote 128.112.203.144:36628] Traceback (most recent call last):
[Tue Nov 29 18:00:55.640404 2022] [wsgi:error] [pid 4599:tid 140169124407040] [remote 128.112.203.144:36628] File "/var/www/ppa/ppa/wsgi.py", line 12, in <module>
[Tue Nov 29 18:00:55.640434 2022] [wsgi:error] [pid 4599:tid 140169124407040] [remote 128.112.203.144:36628] from django.core.wsgi import get_wsgi_application
[Tue Nov 29 18:00:55.640466 2022] [wsgi:error] [pid 4599:tid 140169124407040] [remote 128.112.203.144:36628] ModuleNotFoundError: No module named 'django'
[Tue Nov 29 18:00:56.046126 2022] [wsgi:error] [pid 4599:tid 140169132799744] [remote 128.112.203.145:38478] mod_wsgi (pid=4599): Target WSGI script '/var/www/ppa/ppa/wsgi.py' cannot be loaded as Python module.
[Tue Nov 29 18:00:56.046239 2022] [wsgi:error] [pid 4599:tid 140169132799744] [remote 128.112.203.145:38478] mod_wsgi (pid=4599): Exception occurred processing WSGI script '/var/www/ppa/ppa/wsgi.py'.
[Tue Nov 29 18:00:56.046376 2022] [wsgi:error] [pid 4599:tid 140169132799744] [remote 128.112.203.145:38478] Traceback (most recent call last):
[Tue Nov 29 18:00:56.046424 2022] [wsgi:error] [pid 4599:tid 140169132799744] [remote 128.112.203.145:38478] File "/var/www/ppa/ppa/wsgi.py", line 12, in <module>
[Tue Nov 29 18:00:56.046434 2022] [wsgi:error] [pid 4599:tid 140169132799744] [remote 128.112.203.145:38478] from django.core.wsgi import get_wsgi_application
[Tue Nov 29 18:00:56.046465 2022] [wsgi:error] [pid 4599:tid 140169132799744] [remote 128.112.203.145:38478] ModuleNotFoundError: No module named 'django'
It's possible that the python version is harder to switch when we're on apache vs nginx: we have to have the correct version of mod_wsgi installed. @rlskoeser will investigate.
PR: #129
PPA builds will continue to look weird until we do this, since they aren't building semantic ui now by default. The task just needs to run npm run build:semantic, in the case of PPA.
per @bwhicks we might want to integrate this into a general 'build' task, or something that aggregates the relevant npm... commands.
this task wouldn't run in the main sequence for the role, but it should be runnable on demand for apps that use it (only relevant one is cdhweb).
setup
Should theoretically be as simple as registering the output of the create_db task to check if it was indeed created, and then conditionally running the backup_db task.
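A sketch of that register-and-condition pattern; the module choice and variable names are assumptions, and here the backup runs only when the database already existed (a freshly created database has nothing worth backing up):

```yaml
# Sketch: register whether create_db actually created the database,
# then only back up a pre-existing database.
- name: Create database
  postgresql_db:
    name: "{{ db_name }}"
  register: create_db_result

- name: Back up existing database before deploy
  include_tasks: backup_db.yml
  when: create_db_result is not changed
```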
Ansible's snap builtin is supposed to run refresh instead of install when a package is already installed, but it failed to do the upgrade.
Recommendation from @kayiwa is to use PUL's nodejs role that builds it from source based on a specified version; see the nodejs download to see the syntax needed for version numbers.
after migration to PUL infrastructure is complete
Is your feature request related to a problem? Please describe.
Stuck on python 3.6 because of ubuntu's package manager.
Describe the solution you'd like
The build_dependencies role should enable access to an external apt repo (deadsnakes maybe) and then use it to install a version of python specified by python_version. This should become the default python.
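A sketch of the repo setup and install, assuming the deadsnakes PPA; exact package names (and whether a -venv package is needed) may vary by python version:

```yaml
# Sketch: install a newer python from the deadsnakes PPA.
- name: Add deadsnakes apt repository
  apt_repository:
    repo: ppa:deadsnakes/ppa

- name: Install requested python version
  apt:
    name:
      - "python{{ python_version }}"
      - "python{{ python_version }}-venv"
    state: present
    update_cache: yes
```

Making it the system default (e.g. via update-alternatives) would be a separate, more invasive step worth deciding on explicitly.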
Error when executing ansible-playbook pemm.yml; need to instead execute ansible-playbook pemm.yml -e '{"deploy_contexts": []}'
All the messy logs:
TASK [create_deployment : Create a deployment] *********************************
fatal: [cdh-pemm1.princeton.edu]: FAILED! => changed=false
access_control_allow_origin: '*'
access_control_expose_headers: ETag, Link, Location, Retry-After, X-GitHub-OTP, X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset, X-OAuth-Scopes, X-Accepted-OAuth-Scopes, X-Poll-Interval, X-GitHub-Media-Type, Deprecation, Sunset
connection: close
content: '{"message":"Conflict: Commit status checks failed for master.","errors":[{"contexts":[{"context":"Travis CI - Branch","state":"success"}],"resource":"Deployment","field":"required_contexts","code":"invalid"}],"documentation_url":"https://developer.github.com/v3/repos/deployments/#create-a-deployment"}'
content_length: '302'
content_security_policy: default-src 'none'
content_type: application/json; charset=utf-8
date: Fri, 05 Jun 2020 14:50:23 GMT
json:
documentation_url: https://developer.github.com/v3/repos/deployments/#create-a-deployment
errors:
- code: invalid
contexts:
- context: Travis CI - Branch
state: success
field: required_contexts
resource: Deployment
message: 'Conflict: Commit status checks failed for master.'
msg: 'Status code was 409 and not [201]: HTTP Error 409: Conflict'
redirected: false
referrer_policy: origin-when-cross-origin, strict-origin-when-cross-origin
server: GitHub.com
status: 409
strict_transport_security: max-age=31536000; includeSubdomains; preload
url: https://api.github.com/repos/Princeton-CDH/pemm-scripts/deployments
vary: Accept-Encoding, Accept, X-Requested-With
x_accepted_oauth_scopes: ''
x_content_type_options: nosniff
x_frame_options: deny
x_github_media_type: github.v3; format=json
x_github_request_id: BC8E:560E:668A2:ACFF7:5EDA5BAF
x_oauth_scopes: repo_deployment
x_ratelimit_limit: '5000'
x_ratelimit_remaining: '4991'
x_ratelimit_reset: '1591368768'
x_xss_protection: 1; mode=block
to retry, use: --limit @/Users/kmcelwee/cdh/CDH_ansible/pemm.retry
PLAY RECAP *********************************************************************
cdh-pemm1.princeton.edu : ok=1 changed=0 unreachable=0 failed=1
The npm module for Ansible does not detect build failures and bubble them up to Ansible. It may need to be replaced with a call to the shell module.
Is your feature request related to a problem? Please describe.
When we want to test against recent production data, we have to manually download sql dumps and media files, copy them, load them into the database or file system, set permissions, and update the django/wagtail site in the database. Because it's error prone and because we tend to be lazy, we don't always bother with refreshing data when we probably should.
Describe the solution you'd like
A script or playbook I can run that does all of this for me. For QA at least, but for local dev would be super.
general steps needed:
I think we should be able to replace the get_ver.py script with a one-line python command that can be run directly from the ansible script. I think what we want to generate should look roughly like this:
python -c 'import ppa; print(ppa.__version__)'
It will need configuration for the python package name to import and the working directory.
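A sketch of how that could look as ansible tasks; python_package and clone_dir are the hypothetical configuration variables mentioned above:

```yaml
# Sketch: run the version one-liner from the checked-out code
# and store the result for later tasks.
- name: Get python app version
  command: "python -c 'import {{ python_package }}; print({{ python_package }}.__version__)'"
  args:
    chdir: "{{ clone_dir }}"
  register: app_version_result
  changed_when: false   # read-only; never report changed

- name: Set python_app_version fact
  set_fact:
    python_app_version: "{{ app_version_result.stdout }}"
```

Because this only needs the checkout, it could also run on demand for existing deploys (e.g. with --start-at-task), addressing the issue above.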
Please try updating ansible to this python version and running the QA deploy to confirm that everything works there. The python version is specified in group_vars/prosody/vars.yml — changing that should be all that's needed, since the ansible playbook installs and configures the version we have set.
Originally posted by @rlskoeser in Princeton-CDH/ppa-django#495 (comment)
we need to use the approach francis recommended of installing postgres local to the target container, rather than testing it between two containers.
The following work needs to happen to make rotating logs happen on all projects:
- cdh and derridas-margins
- cdh and derridas-margins playbooks

Is your feature request related to a problem? Please describe.
current pre-commit hook script is custom and will be hard to manage if we add more encrypted files with different names
Describe the solution you'd like
suggest implementing with pre-commit, either using repository local hooks or with an existing pre-commit script like this one: https://github.com/IamTheFij/ansible-pre-commit
Then the files that need to be checked would be configured in the pre-commit yaml instead being hard-coded in the script.
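A sketch of a repository-local hook along those lines; the hook id, the grep pattern, and the files glob are all assumptions to be adapted to the actual vault file naming:

```yaml
# Sketch of .pre-commit-config.yaml with a repo-local hook that
# rejects commits of unencrypted vault files.
repos:
  - repo: local
    hooks:
      - id: check-vault-encrypted
        name: Ensure vault files are encrypted
        entry: >-
          sh -c 'for f in "$@"; do head -n1 "$f" |
          grep -q "^\$ANSIBLE_VAULT" || exit 1; done' --
        language: system
        files: vault.*\.yml$
```

With this approach, adding a new encrypted file only requires that its name match the files pattern, not a change to the hook script itself.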
since potentially leaked in lastpass breach