estevaobk / 3scaledump
Unofficial tool for dumping a Red Hat 3scale On-premises project
The Red Hat 3scale logs should be split into different text files per container on each pod (e.g. 'system-master', 'system-provider' and 'system-developer') instead of all being included in a single log file for the whole pod. This feature is especially useful for the system-app pod.
Many thanks to Anna for providing this suggestion.
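A minimal sketch of the per-container split, assuming the system-app pod and its three containers (the output paths are illustrative):
for CONTAINER in system-master system-provider system-developer; do
    oc logs ${SYSTEM_APP_POD} -c ${CONTAINER} > ${DUMP_DIR}/logs/${SYSTEM_APP_POD}-${CONTAINER}.txt
done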
It would be useful if the command allowed for a case number to be specified and the output file was prefixed with this case number.
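A minimal sketch, treating the case number as an optional argument (the variable names are hypothetical):
CASE_NUMBER="$1"
# Prefix the archive with the case number when one is given:
DUMP_FILE="${CASE_NUMBER:+${CASE_NUMBER}-}3scale-dump.tar.gz"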
The docs should be updated to reflect this change as the default.
Find any information related to the project quotas and output it to /status/quotas.yaml.
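A minimal sketch, assuming the ${PROJECT} variable already used elsewhere in the script:
oc get quota -n ${PROJECT} -o yaml > ${DUMP_DIR}/status/quotas.yaml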
Below is a description of why these previous logs could be useful:
# oc logs --help | grep -i "\-p,"
-p, --previous=false: If true, print the logs for the previous instance of the container in a pod if it exists.
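For instance (the pod and container names are illustrative):
# oc logs -p ${POD_NAME} -c ${CONTAINER_NAME}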
Allow us to fetch the environment variables more easily. One option would be to store them under dc/env.
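A minimal sketch, listing the environment of every dc in the project (the output path is illustrative):
for DC in $(oc get dc -o name); do
    oc set env ${DC} --list >> ${DUMP_DIR}/status/dc-env.txt
done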
This is a follow-up from #19. Besides fetching them as yaml, also describe the project quotas.
This will allow us to further investigate whether the dump had any impact on the customer's environment.
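A minimal sketch of the describe part (the output path is illustrative):
oc describe quota -n ${PROJECT} > ${DUMP_DIR}/status/quotas.txt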
Sample command:
# oc get pod --all-namespaces
The following documentation mentions the commands oc adm top nodes and oc adm top node --selector='':
https://docs.openshift.com/container-platform/3.11/admin_guide/manage_nodes.html#viewing-nodes
However, these don't seem to work (tested with the latest 3.11 release).
We could optionally do the following: execute oc get node and then, for every node returned by this command, oc describe node <NODE>, as in the sketch below.
This is similar to what is already being done in node.txt, but applied to all nodes instead of only the currently running one.
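A minimal sketch of that loop (the output path is illustrative):
for NODE in $(oc get nodes -o name); do
    oc describe ${NODE} >> ${DUMP_DIR}/status/nodes.txt
done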
This command could be helpful to further troubleshoot issues:
# oc status --help
Show a high level overview of the current project
This command will show services, deployment configs, build configurations, and active deployments. If you have any
misconfigured components information about them will be shown. For more information about individual items, use the
describe command (e.g. oc describe buildConfig, oc describe deploymentConfig, oc describe service).
You can specify an output format of "-o dot" to have this command output the generated status graph in DOT format that
is suitable for use by the "dot" command.
Usage:
oc status [-o dot | --suggest ] [flags]
Examples:
# See an overview of the current project.
oc status
# Export the overview of the current project in an svg file.
oc status -o dot | dot -T svg -o project.svg
# See an overview of the current project including details for any identified issues.
oc status --suggest
Options:
--all-namespaces=false: If true, display status for all namespaces (must have cluster admin)
-o, --output='': Output format. One of: dot.
--suggest=false: See details for resolving issues.
Use "oc options" for a list of global command-line options (applies to all commands).
Sample output:
# oc status --suggest
In project 3scale-26 on server https://master.ocp3-11-26.cluster:8443
https://api-3scale-apicast-production.3scale-26.apps.ocp3-11-26.cluster (redirects) to pod port gateway (svc/apicast-production)
https://oidc-3scale-apicast-production.3scale-26.apps.ocp3-11-26.cluster (redirects) to pod port gateway
https://api-using-port-8443-3scale-apicast-production.3scale-26.apps.ocp3-11-26.cluster (redirects) to pod port gateway
dc/apicast-production deploys istag/amp-apicast:latest
deployment #2 deployed 13 days ago - 1 pod
deployment #1 deployed 3 weeks ago
https://api-3scale-apicast-staging.3scale-26.apps.ocp3-11-26.cluster (redirects) to pod port gateway (svc/apicast-staging)
https://oidc-3scale-apicast-staging.3scale-26.apps.ocp3-11-26.cluster (redirects) to pod port gateway
https://api-using-port-8443-3scale-apicast-staging.3scale-26.apps.ocp3-11-26.cluster (redirects) to pod port gateway
dc/apicast-staging deploys istag/amp-apicast:latest
deployment #3 deployed 4 days ago - 1 pod
deployment #2 deployed 13 days ago
deployment #1 deployed 3 weeks ago
https://backend-3scale.3scale-26.apps.ocp3-11-26.cluster (and http) to pod port http (svc/backend-listener)
dc/backend-listener deploys istag/amp-backend:latest
deployment #1 deployed 3 weeks ago - 1 pod
svc/backend-redis - 172.30.146.199:6379
dc/backend-redis deploys istag/backend-redis:latest
deployment #3 deployed 3 weeks ago - 1 pod
deployment #2 deployed 3 weeks ago
deployment #1 failed 3 weeks ago: newer deployment was found running
https://master.3scale-26.apps.ocp3-11-26.cluster (redirects) to pod port http (svc/system-master)
https://3scale-admin.3scale-26.apps.ocp3-11-26.cluster (redirects) to pod port http (svc/system-provider)
https://3scale.3scale-26.apps.ocp3-11-26.cluster (redirects) to pod port http (svc/system-developer)
dc/system-app deploys istag/amp-system:latest
deployment #4 deployed 11 days ago - 1 pod
deployment #3 deployed 11 days ago
deployment #2 deployed 3 weeks ago
svc/system-memcache - 172.30.199.113:11211
dc/system-memcache deploys istag/system-memcached:latest
deployment #1 deployed 3 weeks ago - 1 pod
svc/system-mysql - 172.30.224.83:3306
dc/system-mysql deploys istag/system-mysql:latest
deployment #4 deployed 3 weeks ago - 0 pods
deployment #3 deployed 3 weeks ago
deployment #2 deployed 3 weeks ago
svc/system-redis - 172.30.43.97:6379
dc/system-redis deploys istag/system-redis:latest
deployment #3 deployed 3 weeks ago - 1 pod
deployment #2 deployed 3 weeks ago
deployment #1 failed 3 weeks ago: newer deployment was found running
svc/system-sphinx - 172.30.249.37:9306
dc/system-sphinx deploys istag/amp-system:latest
deployment #1 deployed 3 weeks ago - 1 pod
svc/zync - 172.30.240.127:8080
dc/zync deploys istag/amp-zync:latest
deployment #1 deployed 3 weeks ago - 1 pod
svc/zync-database - 172.30.208.182:5432
dc/zync-database deploys istag/zync-database-postgresql:latest
deployment #1 deployed 3 weeks ago - 1 pod
dc/backend-cron deploys istag/amp-backend:latest
deployment #1 deployed 3 weeks ago - 1 pod
dc/backend-worker deploys istag/amp-backend:latest
deployment #1 deployed 3 weeks ago - 1 pod
dc/system-sidekiq deploys istag/amp-system:latest
deployment #1 deployed 3 weeks ago - 1 pod
dc/zync-que deploys istag/amp-zync:latest
deployment #5 deployed 3 weeks ago - 1 pod
deployment #4 deployed 3 weeks ago
deployment #3 deployed 3 weeks ago
Warnings:
* pod/apicast-production-2-wfbqn has restarted 9 times
* pod/backend-cron-1-g5q2z has restarted 9 times
* pod/backend-listener-1-f6zvk has restarted 12 times
* pod/backend-redis-3-v2ctm has restarted 12 times
* pod/backend-worker-1-lzpx9 has restarted 12 times
* container "system-developer" in pod/system-app-4-trg29 has restarted 9 times
* container "system-master" in pod/system-app-4-trg29 has restarted 9 times
* container "system-provider" in pod/system-app-4-trg29 has restarted 9 times
* pod/system-memcache-1-f4t6t has restarted 12 times
* pod/system-redis-3-hcqkn has restarted 12 times
* pod/system-sidekiq-1-m67nq has restarted 9 times
* pod/system-sphinx-1-wnc2q has restarted 9 times
* pod/zync-1-p46cc has restarted 9 times
* pod/zync-database-1-7cr6k has restarted 12 times
* pod/zync-que-5-8pfqd has restarted 15 times
Info:
* dc/backend-cron has no readiness probe to verify pods are ready to accept traffic or ensure deployment is successful.
try: oc set probe dc/backend-cron --readiness ...
* dc/backend-cron has no liveness probe to verify pods are still running.
try: oc set probe dc/backend-cron --liveness ...
* dc/backend-worker has no readiness probe to verify pods are ready to accept traffic or ensure deployment is successful.
try: oc set probe dc/backend-worker --readiness ...
* dc/backend-worker has no liveness probe to verify pods are still running.
try: oc set probe dc/backend-worker --liveness ...
* dc/system-sidekiq has no readiness probe to verify pods are ready to accept traffic or ensure deployment is successful.
try: oc set probe dc/system-sidekiq --readiness ...
* dc/system-sidekiq has no liveness probe to verify pods are still running.
try: oc set probe dc/system-sidekiq --liveness ...
* dc/system-sphinx has no readiness probe to verify pods are ready to accept traffic or ensure deployment is successful.
try: oc set probe dc/system-sphinx --readiness ...
* dc/zync-que has no readiness probe to verify pods are ready to accept traffic or ensure deployment is successful.
try: oc set probe dc/zync-que --readiness ...
View details with 'oc describe <resource>/<name>' or list everything with 'oc get all'.
Use y before the commands. Thanks to Shannon for providing this suggestion.
The commits from Sep 24, 2019 were just a quick hack to avoid making many changes to the current stable release. This should be implemented properly.
Error: unknown flag: --all-containers
The certificate validation doesn't work starting from 2.6 On-premises, because the apicast-wildcard-router pod has been deprecated and it was the single pod to include the openssl utility (no other pods include it now, not even zync or zync-que).
Executing this check from OpenShift itself might produce a false positive, since the self-signed certificates might be included in OpenShift's default keystore.
Hence, a suggestion is to use something similar to the example below in order to allow a regular OpenShift command to be performed from inside a pod:
# nsenter -n -t $(docker inspect --format '{{.State.Pid}}' $(docker ps -f label=io.kubernetes.pod.name=${POD_NAME} -f label=io.kubernetes.pod.namespace=${PROJECT_NAME} -f label=io.kubernetes.docker.type=podsandbox -q)) <command-to-run>
This query has been used while troubleshooting some issues.
The following line of code:
echo -e "\n\tAPICAST_POD_PRD: ${APICAST_POD_PRD}\n\tAPICAST_POD_STG: ${APICAST_POD_STG}\n\tMGMT_API_PRD: ${MGMT_API_PRD}\n\tMGMT_API_STG: ${MGMT_API_STG}\n\tAPICAST_ROUTE_PRD: ${APICAST_ROUTE_PRD}\n\tAPICAST_ROUTE_STG: ${APICAST_ROUTE_STG}\n\tWILDCARD POD: ${WILDCARD_POD}\n\tTHREESCALE_PORTAL_ENDPOINT: ${THREESCALE_PORTAL_ENDPOINT}\n\tSYSTEM_APP_POD: ${SYSTEM_APP_POD}"
would be easier to review later if its output were also written to a file.
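A minimal sketch: pipe the same echo through tee so it also lands in a file (the path is illustrative; the remaining variables follow the same pattern as the original line):
echo -e "\n\tAPICAST_POD_PRD: ${APICAST_POD_PRD}\n\tAPICAST_POD_STG: ${APICAST_POD_STG}" | tee ${DUMP_DIR}/status/variables.txt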
Provide the possibility to pass a parameter to decide which of the components (Logs, Secrets, Routes, PVs, ...) should be included. Here's an idea, chmod style:
Let's say we have 7 components: A, B, C, D, E, F, G. If we want to fetch all of them we could represent it this way:
A B C D E F G
1 1 1 1 1 1 1
which, interpreted as a binary number and converted to decimal, would be 127.
We could then use the parameter 127 to say we want all the components. Every component except for B would become:
A B C D E F G
1 0 1 1 1 1 1 --> 95, etc.
The parameter should default to 127 (all).
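A minimal sketch of decoding such a parameter in bash, assuming seven components in the fixed order above (the component names are illustrative):
MASK=${1:-127}    # defaults to 127 (all components)
COMPONENTS=(A B C D E F G)
for i in "${!COMPONENTS[@]}"; do
    BIT=$(( 1 << (6 - i) ))    # component A maps to the most significant bit
    if (( MASK & BIT )); then
        echo "Fetching component ${COMPONENTS[$i]}"
    fi
done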
For example, if the user XXXX does not have the right permissions to access the requested details, status/node.txt may contain the following value:
Error from server (Forbidden): nodes is forbidden: User "XXXX" cannot list nodes at the cluster scope: no RBAC policy matched
It would probably be enough to mention in the README that you should be logged in as system:admin prior to executing the script.
Optionally we could also log something to the command line as a reminder.
Currently it's not added. It would be useful to print it along with the errors.
It seems safer to just fail in case OCP 4.X is not being used.
Shannon's request:
Can we change the dump to use different names for the production/staging json configurations? apicast-production.json/apicast-staging.json is conflating the terminology and confusing. They should just be called production.json/staging.json
It seems like the issue #9 was initially right - the 'deploy' pods are being ruled out and not the ones in an 'error' state.
This issue needs a minor follow-up fix.
This could be performed by mid Oct, 2019.
After the above is performed, it will be possible to work on the other issues.
This should be useful for OCP 4.X.
When executing the script on 3scale 2.6 projects, the zync-que dc is missing.
Sample:
# oc get scc
NAME PRIV CAPS SELINUX RUNASUSER FSGROUP SUPGROUP PRIORITY READONLYROOTFS VOLUMES
anyuid false [] MustRunAs RunAsAny RunAsAny RunAsAny 10 false [configMap downwardAPI emptyDir persistentVolumeClaim projected secret]
hostaccess false [] MustRunAs MustRunAsRange MustRunAs RunAsAny <none> false [configMap downwardAPI emptyDir hostPath persistentVolumeClaim projected secret]
hostmount-anyuid false [] MustRunAs RunAsAny RunAsAny RunAsAny <none> false [configMap downwardAPI emptyDir hostPath nfs persistentVolumeClaim projected secret]
hostnetwork false [] MustRunAs MustRunAsRange MustRunAs MustRunAs <none> false [configMap downwardAPI emptyDir persistentVolumeClaim projected secret]
kube-state-metrics false [] RunAsAny RunAsAny RunAsAny RunAsAny <none> false [*]
node-exporter false [] RunAsAny RunAsAny RunAsAny RunAsAny <none> false [*]
nonroot false [] MustRunAs MustRunAsNonRoot RunAsAny RunAsAny <none> false [configMap downwardAPI emptyDir persistentVolumeClaim projected secret]
privileged true [*] RunAsAny RunAsAny RunAsAny RunAsAny <none> false [*]
restricted false [] MustRunAs MustRunAsRange MustRunAs RunAsAny <none> false [configMap downwardAPI emptyDir persistentVolumeClaim projected secret]
This is a small workaround while an official log file isn't generated yet (next stable release).
Currently it's only officially tested and supported under OCP 3.X.
oc get events --all-namespaces
bash-3.2$ date +"%Y-%m-%d_%H-%M" -u
date: illegal time format
usage: date [-jnRu] [-d dst] [-r seconds] [-t west] [-v[+|-]val[ymwdHMS]] ...
[-f fmt date | [[[mm]dd]HH]MM[[cc]yy][.ss]] [+format]
Please flip the option order:
bash-3.2$ date -u +"%Y-%m-%d_%H-%M"
2020-03-04_04-37
Shannon complained, hence it needs to be addressed.
# oc get hostsubnet -o yaml
apiVersion: v1
items:
- apiVersion: network.openshift.io/v1
  host: master.ocp3-11-26.cluster
  hostIP: 10.0.1.6
  kind: HostSubnet
  metadata:
    annotations:
      pod.network.openshift.io/node-uid: c0687f17-d67e-11e9-a0fe-5254001aeab5
    creationTimestamp: 2019-09-13T23:32:33Z
    name: master.ocp3-11-26.cluster
    namespace: ""
    resourceVersion: "476"
    selfLink: /apis/network.openshift.io/v1/hostsubnets/master.ocp3-11-26.cluster
    uid: c2feff4a-d67e-11e9-a0fe-5254001aeab5
  subnet: 10.128.0.0/23
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
---
# oc describe hostsubnet
Name: master.ocp3-11-26.cluster
Created: 4 weeks ago
Labels: <none>
Annotations: pod.network.openshift.io/node-uid=c0687f17-d67e-11e9-a0fe-5254001aeab5
Node: master.ocp3-11-26.cluster
Node IP: 10.0.1.6
Pod Subnet: 10.128.0.0/23
Egress CIDRs: <none>
Egress IPs: <none>
The file 'errors.txt' could be used when there are any errors while executing a given command. This would be useful to troubleshoot e.g. permission-level errors.
Include them in a "logs/error" directory.
These were initially ruled out for some specific pods, e.g. 'deploy' or 'pre', but it has since been realized that they might actually be very useful.
These limits are already configured on the DeploymentConfig objects; however, it would be much easier and quicker to view them in a single file that already contains all the information, in order to troubleshoot "out of resource" issues.
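A minimal sketch, collecting the limits/requests of every dc into one file (the jsonpath fields are the standard pod template ones; the output path is illustrative):
oc get dc -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.template.spec.containers[*].resources}{"\n"}{end}' > ${DUMP_DIR}/status/limits.txt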
Some customers have a large list of PVs that correspond to other projects, and the script takes a long time to execute. The request is to limit the PV information the script fetches to only the PVs corresponding to 3scale.
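A minimal sketch, keeping the header plus only the PVs whose claim belongs to the 3scale project (the CLAIM column has the namespace/claim-name format):
oc get pv | grep -E "^NAME|${PROJECT}/" > ${DUMP_DIR}/status/pv.txt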
Instead of 3scale-dump, it would be e.g. 3scale-dump-YYYY-MM-DD_HH-MM.
This would allow a Support Engineer to have a directory for troubleshooting a Case and, inside this directory, several dumps for different dates.
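A minimal sketch, reusing the portable option order from the date issue above:
DUMP_DIR="3scale-dump-$(date -u +"%Y-%m-%d_%H-%M")"
mkdir -p ${DUMP_DIR}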
Right now everything is being sent to a file manually, e.g.:
PROJECT=<3scale Project> ; curl -s https://access.redhat.com/sites/default/files/attachments/3scale-dump-<VERSION>.sh | bash -s "$PROJECT" "auto" 2>&1 | tee 3scale-dump-logs.txt
In the next stable release, an official log file needs to be generated. Using tee would allow printing the output of the whole dump to both stdout/stderr and this file.
This is a follow-up from #25. Some previous containers just output a message similar to the following:
Error from server (BadRequest): previous terminated container "backend-cron" in pod "backend-cron-1-mdgzf" not found
It should be simple enough to perform a string match and not include those files in the logs/previous directory.
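A minimal sketch of that string match (the pod/container variables and the target path are illustrative):
OUTPUT=$(oc logs --previous ${POD_NAME} -c ${CONTAINER_NAME} 2>&1)
if ! echo "${OUTPUT}" | grep -q "not found"; then
    echo "${OUTPUT}" > ${DUMP_DIR}/logs/previous/${POD_NAME}-${CONTAINER_NAME}.txt
fi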
Sometimes I need the event timestamps. These aren't included in the events.txt file. It seems the only way to get them is via -o yaml.
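A minimal sketch (the output path is illustrative):
oc get events -o yaml > ${DUMP_DIR}/status/events.yaml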
Capture more information regarding each pod.
Example:
$ curl -X GET -v -k <THREESCALE_PORTAL_ENDPOINT>/staging.json
$ curl -X GET -v -k <THREESCALE_PORTAL_ENDPOINT>/production.json
This could help in further troubleshooting particular issues (e.g. when the reply is empty for some reason).
Add the ability to test against an RH SSO Endpoint if an optional argument is provided, e.g.:
rhsso=https://my-sso-server
Default checks to be performed include the 'well-known' Endpoint and the infamous certificate validation.
Add this feature, since it looks like some TSEs have been asking for it on a few Cases.
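A minimal sketch of the 'well-known' check (the realm name is illustrative; RH SSO exposes the OpenID metadata under /auth/realms/<realm>):
$ curl -v -k ${RHSSO}/auth/realms/${REALM}/.well-known/openid-configuration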
Sample command:
$ oc get replicationcontroller -o yaml
This has been suggested today by Shannon.
Something similar to the following should work:
timeout 180 oc rsh -c system-master ${SYSTEM_APP_POD} /bin/bash -c "echo -e 'stats = Sidekiq::Stats.new\nAccessToken.all' | bundle exec rails console" > ${DUMP_DIR}/status/rails.txt 2>&1 < /dev/null
Sample query:
# oc get nodes -o wide --show-kind --show-labels
This is a follow-up from #13. Instead of adding it at the end, use it at the beginning, with a yes/no confirmation message.
Suggestion from Samuele:
"WARNING: you are running the script without the required privileges to obtain a full project dump, you should run the script as administrator, would you like to proceed anyway? y/N"
When OCP is 4.X.
This seems to be the last data missing compared to the default OpenShift dump.