estevaobk / 3scaledump
Unofficial tool for dumping a Red Hat 3scale On-premises project
The Red Hat 3scale logs should be split into different text files per container on each pod (e.g. 'system-master', 'system-provider' and 'system-developer') instead of all being included in a single log file for the whole pod. This feature is especially useful for the system-app pod.
Many thanks to Anna for providing this suggestion.
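A minimal sketch of the per-container split, assuming the system-app pod and its three containers (the output paths are illustrative):
for CONTAINER in system-master system-provider system-developer; do
    oc logs ${SYSTEM_APP_POD} -c ${CONTAINER} > ${DUMP_DIR}/logs/${SYSTEM_APP_POD}-${CONTAINER}.txt
done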
It would be useful if the command allowed for a case number to be specified and the output file was prefixed with this case number.
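A minimal sketch, treating the case number as an optional argument (the variable names are hypothetical):
CASE_NUMBER="$1"
# Prefix the archive with the case number when one is given:
DUMP_FILE="${CASE_NUMBER:+${CASE_NUMBER}-}3scale-dump.tar.gz"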
The docs should be updated to reflect this change as the default.
Find any information related to the project quotas and output it to /status/quotas.yaml.
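A minimal sketch, assuming the ${PROJECT} variable already used elsewhere in the script:
oc get quota -n ${PROJECT} -o yaml > ${DUMP_DIR}/status/quotas.yaml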
Below is a description of why these previous logs could be useful:
# oc logs --help | grep -i "\-p,"
-p, --previous=false: If true, print the logs for the previous instance of the container in a pod if it exists.
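For instance (the pod and container names are illustrative):
# oc logs -p ${POD_NAME} -c ${CONTAINER_NAME}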
Allow us to fetch the environment variables more easily. One option would be to store them under dc/env.
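A minimal sketch, listing the environment of every dc in the project (the output path is illustrative):
for DC in $(oc get dc -o name); do
    oc set env ${DC} --list >> ${DUMP_DIR}/status/dc-env.txt
done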
This is a follow-up from #19. Besides fetching them as yaml, also describe the project quotas.
This will allow us to further investigate whether the dump had any impact on the customer's environment.
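A minimal sketch of the describe part (the output path is illustrative):
oc describe quota -n ${PROJECT} > ${DUMP_DIR}/status/quotas.txt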
Sample command:
# oc get pod --all-namespaces
The following documentation mentions the commands oc adm top nodes and oc adm top node --selector='':
https://docs.openshift.com/container-platform/3.11/admin_guide/manage_nodes.html#viewing-nodes
However, these don't seem to work (tested with the latest 3.11 release).
We could optionally do the following: execute oc get node and then, for every node returned by this command, oc describe node <NODE>, as in the sketch below.
This is similar to what is already being done in node.txt, but applied to all nodes instead of only the currently running one.
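A minimal sketch of that loop (the output path is illustrative):
for NODE in $(oc get nodes -o name); do
    oc describe ${NODE} >> ${DUMP_DIR}/status/nodes.txt
done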
This command could be helpful to further troubleshoot issues:
# oc status --help
Show a high level overview of the current project
This command will show services, deployment configs, build configurations, and active deployments. If you have any
misconfigured components information about them will be shown. For more information about individual items, use the
describe command (e.g. oc describe buildConfig, oc describe deploymentConfig, oc describe service).
You can specify an output format of "-o dot" to have this command output the generated status graph in DOT format that
is suitable for use by the "dot" command.
Usage:
oc status [-o dot | --suggest ] [flags]
Examples:
# See an overview of the current project.
oc status
# Export the overview of the current project in an svg file.
oc status -o dot | dot -T svg -o project.svg
# See an overview of the current project including details for any identified issues.
oc status --suggest
Options:
--all-namespaces=false: If true, display status for all namespaces (must have cluster admin)
-o, --output='': Output format. One of: dot.
--suggest=false: See details for resolving issues.
Use "oc options" for a list of global command-line options (applies to all commands).
Sample output:
# oc status --suggest
In project 3scale-26 on server https://master.ocp3-11-26.cluster:8443
https://api-3scale-apicast-production.3scale-26.apps.ocp3-11-26.cluster (redirects) to pod port gateway (svc/apicast-production)
https://oidc-3scale-apicast-production.3scale-26.apps.ocp3-11-26.cluster (redirects) to pod port gateway
https://api-using-port-8443-3scale-apicast-production.3scale-26.apps.ocp3-11-26.cluster (redirects) to pod port gateway
dc/apicast-production deploys istag/amp-apicast:latest
deployment #2 deployed 13 days ago - 1 pod
deployment #1 deployed 3 weeks ago
https://api-3scale-apicast-staging.3scale-26.apps.ocp3-11-26.cluster (redirects) to pod port gateway (svc/apicast-staging)
https://oidc-3scale-apicast-staging.3scale-26.apps.ocp3-11-26.cluster (redirects) to pod port gateway
https://api-using-port-8443-3scale-apicast-staging.3scale-26.apps.ocp3-11-26.cluster (redirects) to pod port gateway
dc/apicast-staging deploys istag/amp-apicast:latest
deployment #3 deployed 4 days ago - 1 pod
deployment #2 deployed 13 days ago
deployment #1 deployed 3 weeks ago
https://backend-3scale.3scale-26.apps.ocp3-11-26.cluster (and http) to pod port http (svc/backend-listener)
dc/backend-listener deploys istag/amp-backend:latest
deployment #1 deployed 3 weeks ago - 1 pod
svc/backend-redis - 172.30.146.199:6379
dc/backend-redis deploys istag/backend-redis:latest
deployment #3 deployed 3 weeks ago - 1 pod
deployment #2 deployed 3 weeks ago
deployment #1 failed 3 weeks ago: newer deployment was found running
https://master.3scale-26.apps.ocp3-11-26.cluster (redirects) to pod port http (svc/system-master)
https://3scale-admin.3scale-26.apps.ocp3-11-26.cluster (redirects) to pod port http (svc/system-provider)
https://3scale.3scale-26.apps.ocp3-11-26.cluster (redirects) to pod port http (svc/system-developer)
dc/system-app deploys istag/amp-system:latest
deployment #4 deployed 11 days ago - 1 pod
deployment #3 deployed 11 days ago
deployment #2 deployed 3 weeks ago
svc/system-memcache - 172.30.199.113:11211
dc/system-memcache deploys istag/system-memcached:latest
deployment #1 deployed 3 weeks ago - 1 pod
svc/system-mysql - 172.30.224.83:3306
dc/system-mysql deploys istag/system-mysql:latest
deployment #4 deployed 3 weeks ago - 0 pods
deployment #3 deployed 3 weeks ago
deployment #2 deployed 3 weeks ago
svc/system-redis - 172.30.43.97:6379
dc/system-redis deploys istag/system-redis:latest
deployment #3 deployed 3 weeks ago - 1 pod
deployment #2 deployed 3 weeks ago
deployment #1 failed 3 weeks ago: newer deployment was found running
svc/system-sphinx - 172.30.249.37:9306
dc/system-sphinx deploys istag/amp-system:latest
deployment #1 deployed 3 weeks ago - 1 pod
svc/zync - 172.30.240.127:8080
dc/zync deploys istag/amp-zync:latest
deployment #1 deployed 3 weeks ago - 1 pod
svc/zync-database - 172.30.208.182:5432
dc/zync-database deploys istag/zync-database-postgresql:latest
deployment #1 deployed 3 weeks ago - 1 pod
dc/backend-cron deploys istag/amp-backend:latest
deployment #1 deployed 3 weeks ago - 1 pod
dc/backend-worker deploys istag/amp-backend:latest
deployment #1 deployed 3 weeks ago - 1 pod
dc/system-sidekiq deploys istag/amp-system:latest
deployment #1 deployed 3 weeks ago - 1 pod
dc/zync-que deploys istag/amp-zync:latest
deployment #5 deployed 3 weeks ago - 1 pod
deployment #4 deployed 3 weeks ago
deployment #3 deployed 3 weeks ago
Warnings:
* pod/apicast-production-2-wfbqn has restarted 9 times
* pod/backend-cron-1-g5q2z has restarted 9 times
* pod/backend-listener-1-f6zvk has restarted 12 times
* pod/backend-redis-3-v2ctm has restarted 12 times
* pod/backend-worker-1-lzpx9 has restarted 12 times
* container "system-developer" in pod/system-app-4-trg29 has restarted 9 times
* container "system-master" in pod/system-app-4-trg29 has restarted 9 times
* container "system-provider" in pod/system-app-4-trg29 has restarted 9 times
* pod/system-memcache-1-f4t6t has restarted 12 times
* pod/system-redis-3-hcqkn has restarted 12 times
* pod/system-sidekiq-1-m67nq has restarted 9 times
* pod/system-sphinx-1-wnc2q has restarted 9 times
* pod/zync-1-p46cc has restarted 9 times
* pod/zync-database-1-7cr6k has restarted 12 times
* pod/zync-que-5-8pfqd has restarted 15 times
Info:
* dc/backend-cron has no readiness probe to verify pods are ready to accept traffic or ensure deployment is successful.
try: oc set probe dc/backend-cron --readiness ...
* dc/backend-cron has no liveness probe to verify pods are still running.
try: oc set probe dc/backend-cron --liveness ...
* dc/backend-worker has no readiness probe to verify pods are ready to accept traffic or ensure deployment is successful.
try: oc set probe dc/backend-worker --readiness ...
* dc/backend-worker has no liveness probe to verify pods are still running.
try: oc set probe dc/backend-worker --liveness ...
* dc/system-sidekiq has no readiness probe to verify pods are ready to accept traffic or ensure deployment is successful.
try: oc set probe dc/system-sidekiq --readiness ...
* dc/system-sidekiq has no liveness probe to verify pods are still running.
try: oc set probe dc/system-sidekiq --liveness ...
* dc/system-sphinx has no readiness probe to verify pods are ready to accept traffic or ensure deployment is successful.
try: oc set probe dc/system-sphinx --readiness ...
* dc/zync-que has no readiness probe to verify pods are ready to accept traffic or ensure deployment is successful.
try: oc set probe dc/zync-que --readiness ...
View details with 'oc describe <resource>/<name>' or list everything with 'oc get all'.
Use y before the commands. Thanks to Shannon for providing this suggestion.
The commits from Sep 24, 2019 were just a quick hack to avoid making many changes to the current stable release. This should be implemented properly.
Error: unknown flag: --all-containers
The certificate validation doesn't work starting from 2.6 On-premises, because the apicast-wildcard-router pod has been deprecated and it was the single pod to include the openssl utility (no other pods include it now, not even zync or zync-que).
Executing this check from OpenShift itself might produce a false positive, since the self-signed certificates might be included in OpenShift's default keystore.
Hence, a suggestion is to use something similar to the example below in order to allow a regular OpenShift command to be performed from inside a pod:
# nsenter -n -t $(docker inspect --format '{{.State.Pid}}' $(docker ps -f label=io.kubernetes.pod.name=${POD_NAME} -f label=io.kubernetes.pod.namespace=${PROJECT_NAME} -f label=io.kubernetes.docker.type=podsandbox -q)) <command-to-run>
This query has been used while troubleshooting some issues.
The following line of code:
echo -e "\n\tAPICAST_POD_PRD: ${APICAST_POD_PRD}\n\tAPICAST_POD_STG: ${APICAST_POD_STG}\n\tMGMT_API_PRD: ${MGMT_API_PRD}\n\tMGMT_API_STG: ${MGMT_API_STG}\n\tAPICAST_ROUTE_PRD: ${APICAST_ROUTE_PRD}\n\tAPICAST_ROUTE_STG: ${APICAST_ROUTE_STG}\n\tWILDCARD POD: ${WILDCARD_POD}\n\tTHREESCALE_PORTAL_ENDPOINT: ${THREESCALE_PORTAL_ENDPOINT}\n\tSYSTEM_APP_POD: ${SYSTEM_APP_POD}"
would be easier to review later if its output were also written to a file.
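A minimal sketch: pipe the same echo through tee so it also lands in a file (the path is illustrative; the remaining variables follow the same pattern as the original line):
echo -e "\n\tAPICAST_POD_PRD: ${APICAST_POD_PRD}\n\tAPICAST_POD_STG: ${APICAST_POD_STG}" | tee ${DUMP_DIR}/status/variables.txt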
Provide the possibility to pass a parameter to decide which of the components (Logs, Secrets, Routes, PVs, ...) should be included. Here's an idea, chmod style:
Let's say we have 7 components: A, B, C, D, E, F, G. If we want to fetch all of them we could represent it this way:
A B C D E F G
1 1 1 1 1 1 1
which, interpreted as a binary number and converted to decimal, would be 127.
We could then use the parameter 127 to say we want all the components. Every component except for B would become:
A B C D E F G
1 0 1 1 1 1 1 --> 95, etc.
The parameter should default to 127 (all).
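A minimal sketch of decoding such a parameter in bash, assuming seven components in the fixed order above (the component names are illustrative):
MASK=${1:-127}    # defaults to 127 (all components)
COMPONENTS=(A B C D E F G)
for i in "${!COMPONENTS[@]}"; do
    BIT=$(( 1 << (6 - i) ))    # component A maps to the most significant bit
    if (( MASK & BIT )); then
        echo "Fetching component ${COMPONENTS[$i]}"
    fi
done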
For example, if the user XXXX does not have the right permissions to access the requested details, status/node.txt may contain the following value:
Error from server (Forbidden): nodes is forbidden: User "XXXX" cannot list nodes at the cluster scope: no RBAC policy matched
It would probably be enough to mention in the README that you should be logged in as system:admin prior to executing the script.
Optionally we could also log something to the command line as a reminder.
Currently it's not added. It would be useful to print it along with the errors.
It seems safer to just fail in case OCP 4.X is not being used.
Shannon's request:
Can we change the dump to use different names for the production/staging json configurations? apicast-production.json/apicast-staging.json is conflating the terminology and confusing. They should just be called production.json/staging.json
It seems like the issue #9 was initially right - the 'deploy' pods are being ruled out and not the ones in an 'error' state.
This issue needs a minor follow-up fix.
This could be performed by mid Oct, 2019.
After the above is performed, it will be possible to work on the other issues.
This should be useful for OCP 4.X.
When executing the script on 3scale 2.6 projects, the zync-que dc is missing.
Sample:
# oc get scc
NAME PRIV CAPS SELINUX RUNASUSER FSGROUP SUPGROUP PRIORITY READONLYROOTFS VOLUMES
anyuid false [] MustRunAs RunAsAny RunAsAny RunAsAny 10 false [configMap downwardAPI emptyDir persistentVolumeClaim projected secret]
hostaccess false [] MustRunAs MustRunAsRange MustRunAs RunAsAny <none> false [configMap downwardAPI emptyDir hostPath persistentVolumeClaim projected secret]
hostmount-anyuid false [] MustRunAs RunAsAny RunAsAny RunAsAny <none> false [configMap downwardAPI emptyDir hostPath nfs persistentVolumeClaim projected secret]
hostnetwork false [] MustRunAs MustRunAsRange MustRunAs MustRunAs <none> false [configMap downwardAPI emptyDir persistentVolumeClaim projected secret]
kube-state-metrics false [] RunAsAny RunAsAny RunAsAny RunAsAny <none> false [*]
node-exporter false [] RunAsAny RunAsAny RunAsAny RunAsAny <none> false [*]
nonroot false [] MustRunAs MustRunAsNonRoot RunAsAny RunAsAny <none> false [configMap downwardAPI emptyDir persistentVolumeClaim projected secret]
privileged true [*] RunAsAny RunAsAny RunAsAny RunAsAny <none> false [*]
restricted false [] MustRunAs MustRunAsRange MustRunAs RunAsAny <none> false [configMap downwardAPI emptyDir persistentVolumeClaim projected secret]
This is a small workaround while an official log file isn't generated yet (next stable release).
Currently it's only officially tested and supported under OCP 3.X.
oc get events --all-namespaces
bash-3.2$ date +"%Y-%m-%d_%H-%M" -u
date: illegal time format
usage: date [-jnRu] [-d dst] [-r seconds] [-t west] [-v[+|-]val[ymwdHMS]] ...
[-f fmt date | [[[mm]dd]HH]MM[[cc]yy][.ss]] [+format]
Please flip the option order:
bash-3.2$ date -u +"%Y-%m-%d_%H-%M"
2020-03-04_04-37
Shannon complained, hence it needs to be addressed.
# oc get hostsubnet -o yaml
apiVersion: v1
items:
- apiVersion: network.openshift.io/v1
  host: master.ocp3-11-26.cluster
  hostIP: 10.0.1.6
  kind: HostSubnet
  metadata:
    annotations:
      pod.network.openshift.io/node-uid: c0687f17-d67e-11e9-a0fe-5254001aeab5
    creationTimestamp: 2019-09-13T23:32:33Z
    name: master.ocp3-11-26.cluster
    namespace: ""
    resourceVersion: "476"
    selfLink: /apis/network.openshift.io/v1/hostsubnets/master.ocp3-11-26.cluster
    uid: c2feff4a-d67e-11e9-a0fe-5254001aeab5
  subnet: 10.128.0.0/23
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
---
# oc describe hostsubnet
Name: master.ocp3-11-26.cluster
Created: 4 weeks ago
Labels: <none>
Annotations: pod.network.openshift.io/node-uid=c0687f17-d67e-11e9-a0fe-5254001aeab5
Node: master.ocp3-11-26.cluster
Node IP: 10.0.1.6
Pod Subnet: 10.128.0.0/23
Egress CIDRs: <none>
Egress IPs: <none>
The file 'errors.txt' could be used when there are any errors while executing a given command. This would be useful to troubleshoot e.g. permission-level errors.
Include them in a "logs/error" directory.
These were initially ruled out for some specific pods, e.g. 'deploy' or 'pre', but it has since been realized that they might actually be very useful.
These limits are already configured on the DeploymentConfig objects; however, it would be much easier and quicker to view them in a single file that already contains all the information, in order to troubleshoot "out of resource" issues.
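A minimal sketch, collecting the limits/requests of every dc into one file (the jsonpath fields are the standard pod template ones; the output path is illustrative):
oc get dc -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.template.spec.containers[*].resources}{"\n"}{end}' > ${DUMP_DIR}/status/limits.txt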
Some customers have a large list of PVs that correspond to other projects, and the script takes a long time to execute. The request is to limit the PV information the script fetches to only the PVs corresponding to 3scale.
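A minimal sketch, keeping the header plus only the PVs whose claim belongs to the 3scale project (the CLAIM column has the namespace/claim-name format):
oc get pv | grep -E "^NAME|${PROJECT}/" > ${DUMP_DIR}/status/pv.txt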
Instead of 3scale-dump, it would be e.g. 3scale-dump-YYYY-MM-DD_HH-MM.
This would allow a Support Engineer to have a directory for troubleshooting a Case and, inside this directory, several dumps for different dates.
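A minimal sketch, reusing the portable option order from the date issue above:
DUMP_DIR="3scale-dump-$(date -u +"%Y-%m-%d_%H-%M")"
mkdir -p ${DUMP_DIR}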
Right now everything is being sent to a file manually, e.g.:
PROJECT=<3scale Project> ; curl -s https://access.redhat.com/sites/default/files/attachments/3scale-dump-<VERSION>.sh | bash -s "$PROJECT" "auto" 2>&1 | tee 3scale-dump-logs.txt
In the next stable release, an official log file needs to be generated. Using tee would allow printing the output of the whole dump to both stdout/stderr and this file.
This is a follow-up from #25. Some previous containers just output a message similar to the following:
Error from server (BadRequest): previous terminated container "backend-cron" in pod "backend-cron-1-mdgzf" not found
It should be simple enough to perform a string match and not include those files in the logs/previous directory.
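A minimal sketch of that string match (the pod/container variables and the target path are illustrative):
OUTPUT=$(oc logs --previous ${POD_NAME} -c ${CONTAINER_NAME} 2>&1)
if ! echo "${OUTPUT}" | grep -q "not found"; then
    echo "${OUTPUT}" > ${DUMP_DIR}/logs/previous/${POD_NAME}-${CONTAINER_NAME}.txt
fi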
Sometimes I need the event timestamps. These aren't included in the events.txt file. It seems the only way to get them is via -o yaml.
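A minimal sketch (the output path is illustrative):
oc get events -o yaml > ${DUMP_DIR}/status/events.yaml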
Capture more information regarding each pod.
Example:
$ curl -X GET -v -k <THREESCALE_PORTAL_ENDPOINT>/staging.json
$ curl -X GET -v -k <THREESCALE_PORTAL_ENDPOINT>/production.json
This could help in further troubleshooting particular issues (e.g. when the reply is empty for some reason).
Add the ability to test against an RH SSO Endpoint if an optional argument is provided, e.g.:
rhsso=https://my-sso-server
Default checks to be performed include the 'well-known' Endpoint and the infamous certificate validation.
Add this feature, since it looks like some TSEs have been asking for it on a few Cases.
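A minimal sketch of the 'well-known' check (the realm name is illustrative; RH SSO exposes the OpenID metadata under /auth/realms/<realm>):
$ curl -v -k ${RHSSO}/auth/realms/${REALM}/.well-known/openid-configuration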
Sample command:
$ oc get replicationcontroller -o yaml
This has been suggested today by Shannon.
Something similar to the following should work:
timeout 180 oc rsh -c system-master ${SYSTEM_APP_POD} /bin/bash -c "echo -e 'stats = Sidekiq::Stats.new\nAccessToken.all' | bundle exec rails console" > ${DUMP_DIR}/status/rails.txt 2>&1 < /dev/null
Sample query:
# oc get nodes -o wide --show-kind --show-labels
This is a follow-up from #13. Instead of adding it at the end, use it at the beginning, with a yes/no confirmation message.
Suggestion from Samuele:
"WARNING: you are running the script without the required privileges to obtain a full project dump, you should run the script as administrator, would you like to proceed anyway? y/N"
When OCP is 4.X.
This seems to be the last data missing compared to the default OpenShift dump.