3scaledump's Issues

Fetch container specific logs

This feature is especially useful for the system-app pod.

The Red Hat 3scale logs should be split into separate text files per container on each pod (e.g. 'system-master', 'system-provider' and 'system-developer') instead of all being included in a single log file for the whole pod.
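
A minimal sketch of what this could look like, assuming the ${SYSTEM_APP_POD} and ${DUMP_DIR} variables already used by the script:

for CONTAINER in system-master system-provider system-developer; do
    oc logs ${SYSTEM_APP_POD} -c ${CONTAINER} > ${DUMP_DIR}/logs/system-app-${CONTAINER}.log 2>&1
done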

Many thanks to Anna for providing this suggestion.

Allow user to specify a case number

It would be useful if the command allowed a case number to be specified, with the output file prefixed by this case number.

The docs should be updated to reflect this change as the default.
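
As a rough sketch (the CASE_NUMBER argument and archive name below are hypothetical, only meant to illustrate the prefixing):

CASE_NUMBER="01234567"   # hypothetical optional argument
tar czf "${CASE_NUMBER:+${CASE_NUMBER}-}3scale-dump.tar.gz" "${DUMP_DIR}"

With an empty CASE_NUMBER the archive name would stay unchanged.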

Add 'previous' logs fetching

Below is a description of why they could be useful:

# oc logs --help | grep -i "\-p,"
  -p, --previous=false: If true, print the logs for the previous instance of the container in a pod if it exists.
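
A possible sketch, reusing the per-pod loop style the script already has (directory name is only a suggestion):

mkdir -p ${DUMP_DIR}/logs/previous
for POD in $(oc get pods -o name | cut -d '/' -f 2); do
    oc logs --previous ${POD} > ${DUMP_DIR}/logs/previous/${POD}.log 2>&1
done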

Fetch all the nodes information through 'oc describe node'

The following documentation mentions the commands oc adm top nodes and oc adm top node --selector='':

https://docs.openshift.com/container-platform/3.11/admin_guide/manage_nodes.html#viewing-nodes

However, these don't seem to work (tested with the latest 3.11 release).

We could optionally do the following:

Execute oc get node and then, for every node returned by that command, run oc describe node <NODE>.

This is similar to what is already being done in node.txt, but applied to all nodes instead of only the current one.
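
For example (output file name is only a suggestion):

for NODE in $(oc get nodes -o name | cut -d '/' -f 2); do
    oc describe node ${NODE} >> ${DUMP_DIR}/status/nodes.txt 2>&1
done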

Fetch the 'oc status'

This command could be helpful to further troubleshoot issues:

# oc status --help
Show a high level overview of the current project 

This command will show services, deployment configs, build configurations, and active deployments. If you have any
misconfigured components information about them will be shown. For more information about individual items, use the
describe command (e.g. oc describe buildConfig, oc describe deploymentConfig, oc describe service). 

You can specify an output format of "-o dot" to have this command output the generated status graph in DOT format that
is suitable for use by the "dot" command.

Usage:
  oc status [-o dot | --suggest ] [flags]

Examples:
  # See an overview of the current project.
  oc status
  
  # Export the overview of the current project in an svg file.
  oc status -o dot | dot -T svg -o project.svg
  
  # See an overview of the current project including details for any identified issues.
  oc status --suggest

Options:
      --all-namespaces=false: If true, display status for all namespaces (must have cluster admin)
  -o, --output='': Output format. One of: dot.
      --suggest=false: See details for resolving issues.

Use "oc options" for a list of global command-line options (applies to all commands).

Sample output:

# oc status --suggest
In project 3scale-26 on server https://master.ocp3-11-26.cluster:8443

https://api-3scale-apicast-production.3scale-26.apps.ocp3-11-26.cluster (redirects) to pod port gateway (svc/apicast-production)
https://oidc-3scale-apicast-production.3scale-26.apps.ocp3-11-26.cluster (redirects) to pod port gateway
https://api-using-port-8443-3scale-apicast-production.3scale-26.apps.ocp3-11-26.cluster (redirects) to pod port gateway
  dc/apicast-production deploys istag/amp-apicast:latest 
    deployment #2 deployed 13 days ago - 1 pod
    deployment #1 deployed 3 weeks ago

https://api-3scale-apicast-staging.3scale-26.apps.ocp3-11-26.cluster (redirects) to pod port gateway (svc/apicast-staging)
https://oidc-3scale-apicast-staging.3scale-26.apps.ocp3-11-26.cluster (redirects) to pod port gateway
https://api-using-port-8443-3scale-apicast-staging.3scale-26.apps.ocp3-11-26.cluster (redirects) to pod port gateway
  dc/apicast-staging deploys istag/amp-apicast:latest 
    deployment #3 deployed 4 days ago - 1 pod
    deployment #2 deployed 13 days ago
    deployment #1 deployed 3 weeks ago

https://backend-3scale.3scale-26.apps.ocp3-11-26.cluster (and http) to pod port http (svc/backend-listener)
  dc/backend-listener deploys istag/amp-backend:latest 
    deployment #1 deployed 3 weeks ago - 1 pod

svc/backend-redis - 172.30.146.199:6379
  dc/backend-redis deploys istag/backend-redis:latest 
    deployment #3 deployed 3 weeks ago - 1 pod
    deployment #2 deployed 3 weeks ago
    deployment #1 failed 3 weeks ago: newer deployment was found running

https://master.3scale-26.apps.ocp3-11-26.cluster (redirects) to pod port http (svc/system-master)
https://3scale-admin.3scale-26.apps.ocp3-11-26.cluster (redirects) to pod port http (svc/system-provider)
https://3scale.3scale-26.apps.ocp3-11-26.cluster (redirects) to pod port http (svc/system-developer)
  dc/system-app deploys istag/amp-system:latest 
    deployment #4 deployed 11 days ago - 1 pod
    deployment #3 deployed 11 days ago
    deployment #2 deployed 3 weeks ago

svc/system-memcache - 172.30.199.113:11211
  dc/system-memcache deploys istag/system-memcached:latest 
    deployment #1 deployed 3 weeks ago - 1 pod

svc/system-mysql - 172.30.224.83:3306
  dc/system-mysql deploys istag/system-mysql:latest 
    deployment #4 deployed 3 weeks ago - 0 pods
    deployment #3 deployed 3 weeks ago
    deployment #2 deployed 3 weeks ago

svc/system-redis - 172.30.43.97:6379
  dc/system-redis deploys istag/system-redis:latest 
    deployment #3 deployed 3 weeks ago - 1 pod
    deployment #2 deployed 3 weeks ago
    deployment #1 failed 3 weeks ago: newer deployment was found running

svc/system-sphinx - 172.30.249.37:9306
  dc/system-sphinx deploys istag/amp-system:latest 
    deployment #1 deployed 3 weeks ago - 1 pod

svc/zync - 172.30.240.127:8080
  dc/zync deploys istag/amp-zync:latest 
    deployment #1 deployed 3 weeks ago - 1 pod

svc/zync-database - 172.30.208.182:5432
  dc/zync-database deploys istag/zync-database-postgresql:latest 
    deployment #1 deployed 3 weeks ago - 1 pod

dc/backend-cron deploys istag/amp-backend:latest 
  deployment #1 deployed 3 weeks ago - 1 pod

dc/backend-worker deploys istag/amp-backend:latest 
  deployment #1 deployed 3 weeks ago - 1 pod

dc/system-sidekiq deploys istag/amp-system:latest 
  deployment #1 deployed 3 weeks ago - 1 pod

dc/zync-que deploys istag/amp-zync:latest 
  deployment #5 deployed 3 weeks ago - 1 pod
  deployment #4 deployed 3 weeks ago
  deployment #3 deployed 3 weeks ago

Warnings:
  * pod/apicast-production-2-wfbqn has restarted 9 times
  * pod/backend-cron-1-g5q2z has restarted 9 times
  * pod/backend-listener-1-f6zvk has restarted 12 times
  * pod/backend-redis-3-v2ctm has restarted 12 times
  * pod/backend-worker-1-lzpx9 has restarted 12 times
  * container "system-developer" in pod/system-app-4-trg29 has restarted 9 times
  * container "system-master" in pod/system-app-4-trg29 has restarted 9 times
  * container "system-provider" in pod/system-app-4-trg29 has restarted 9 times
  * pod/system-memcache-1-f4t6t has restarted 12 times
  * pod/system-redis-3-hcqkn has restarted 12 times
  * pod/system-sidekiq-1-m67nq has restarted 9 times
  * pod/system-sphinx-1-wnc2q has restarted 9 times
  * pod/zync-1-p46cc has restarted 9 times
  * pod/zync-database-1-7cr6k has restarted 12 times
  * pod/zync-que-5-8pfqd has restarted 15 times

Info:
  * dc/backend-cron has no readiness probe to verify pods are ready to accept traffic or ensure deployment is successful.
    try: oc set probe dc/backend-cron --readiness ...
  * dc/backend-cron has no liveness probe to verify pods are still running.
    try: oc set probe dc/backend-cron --liveness ...
  * dc/backend-worker has no readiness probe to verify pods are ready to accept traffic or ensure deployment is successful.
    try: oc set probe dc/backend-worker --readiness ...
  * dc/backend-worker has no liveness probe to verify pods are still running.
    try: oc set probe dc/backend-worker --liveness ...
  * dc/system-sidekiq has no readiness probe to verify pods are ready to accept traffic or ensure deployment is successful.
    try: oc set probe dc/system-sidekiq --readiness ...
  * dc/system-sidekiq has no liveness probe to verify pods are still running.
    try: oc set probe dc/system-sidekiq --liveness ...
  * dc/system-sphinx has no readiness probe to verify pods are ready to accept traffic or ensure deployment is successful.
    try: oc set probe dc/system-sphinx --readiness ...
  * dc/zync-que has no readiness probe to verify pods are ready to accept traffic or ensure deployment is successful.
    try: oc set probe dc/zync-que --readiness ...

View details with 'oc describe <resource>/<name>' or list everything with 'oc get all'.
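
A minimal way for the script to capture this (file name is only a suggestion):

timeout 60 oc status --suggest > ${DUMP_DIR}/status/oc-status.txt 2>&1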

Use OpenShift instead of 'apicast-wildcard-router' to check for the APIcast Certificates

The certificate validation no longer works starting from 2.6 On-premises because the apicast-wildcard-router pod has been deprecated, and it was the only pod that included the openssl utility (no other pod includes it now, not even zync or zync-que).

Executing this check from OpenShift itself might produce a false positive, since the self-signed certificates might already be included in OpenShift's default keystore.

Hence, the suggestion is to use something similar to the example below, which allows an arbitrary command to be executed from inside a pod's network namespace:

# nsenter -n -t $(docker inspect --format '{{.State.Pid}}' $(docker ps -f label=io.kubernetes.pod.name=${POD_NAME} -f label=io.kubernetes.pod.namespace=${PROJECT_NAME} -f label=io.kubernetes.docker.type=podsandbox -q)) <command-to-run>
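
For instance, <command-to-run> could be an openssl check performed from inside the APIcast pod's network namespace (sketch; ${APICAST_ROUTE_PRD} is the variable already used by the script):

nsenter -n -t $(docker inspect --format '{{.State.Pid}}' $(docker ps -f label=io.kubernetes.pod.name=${POD_NAME} -f label=io.kubernetes.pod.namespace=${PROJECT_NAME} -f label=io.kubernetes.docker.type=podsandbox -q)) \
    openssl s_client -connect ${APICAST_ROUTE_PRD}:443 -servername ${APICAST_ROUTE_PRD} < /dev/null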

Add the 3scale variables status into a file

The following line of code:

echo -e "\n\tAPICAST_POD_PRD: ${APICAST_POD_PRD}\n\tAPICAST_POD_STG: ${APICAST_POD_STG}\n\tMGMT_API_PRD: ${MGMT_API_PRD}\n\tMGMT_API_STG: ${MGMT_API_STG}\n\tAPICAST_ROUTE_PRD: ${APICAST_ROUTE_PRD}\n\tAPICAST_ROUTE_STG: ${APICAST_ROUTE_STG}\n\tWILDCARD POD: ${WILDCARD_POD}\n\tTHREESCALE_PORTAL_ENDPOINT: ${THREESCALE_PORTAL_ENDPOINT}\n\tSYSTEM_APP_POD: ${SYSTEM_APP_POD}"

Could be reviewed more easily later on if its output were also written to a file.
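
A minimal change would be to pipe the same echo through tee (file name is only a suggestion):

echo -e "\n\tAPICAST_POD_PRD: ${APICAST_POD_PRD}\n\t..." | tee -a ${DUMP_DIR}/status/variables.txt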

Parameter to decide which components should be included in the dump

Provide the possibility to pass a parameter in order to decide which of the components (Logs, Secrets, Routes, PVs, ...) should be included. Here's an idea:

chmod style:
Let's say we have 7 components: A, B, C, D, E, F, G. If we want to fetch all of them we could represent it this way:
A B C D E F G
1 1 1 1 1 1 1
which, interpreted as a binary number and converted to decimal, is 127.
We could then use the parameter 127 to say we want all the components. Every component except for B would become:
A B C D E F G
1 0 1 1 1 1 1 --> 95, etc.

The parameter should default to 127 (all).
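
A rough sketch of how the script could decode such a parameter (the component names and their bit order below are only an example):

COMPONENTS=(LOGS SECRETS ROUTES PVS EVENTS STATUS PODS)   # bit 6 down to bit 0
MASK=${1:-127}                                            # default: fetch everything (127)

for i in "${!COMPONENTS[@]}"; do
    BIT=$(( 1 << (${#COMPONENTS[@]} - 1 - i) ))
    if (( MASK & BIT )); then
        echo "Fetching: ${COMPONENTS[$i]}"
    fi
done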

Some logs require an administrator to be logged on OC in order to be fetched

For example, status/node.txt may contain the following output if the user 'XXXX' does not have the permissions required to access the requested details:

Error from server (Forbidden): nodes is forbidden: User "XXXX" cannot list nodes at the cluster scope: no RBAC policy matched

It would probably be enough to mention in the README that you should be logged in as system:admin prior to executing the script.
Optionally we could also log something to the command line as a reminder.
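
A lightweight check the script could perform up front (sketch):

if ! oc auth can-i list nodes > /dev/null 2>&1; then
    echo "WARNING: the current user cannot list nodes; cluster-scoped information will be incomplete." >&2
fi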

Fetch the logs from all 'deploy' pods

It seems like issue #9 was initially right: the 'deploy' pods are being ruled out, rather than the ones in an 'error' state.

This issue needs a minor follow-up fix.

Fetch the SCC

Sample:

# oc get scc
NAME                 PRIV      CAPS      SELINUX     RUNASUSER          FSGROUP     SUPGROUP    PRIORITY   READONLYROOTFS   VOLUMES
anyuid               false     []        MustRunAs   RunAsAny           RunAsAny    RunAsAny    10         false            [configMap downwardAPI emptyDir persistentVolumeClaim projected secret]
hostaccess           false     []        MustRunAs   MustRunAsRange     MustRunAs   RunAsAny    <none>     false            [configMap downwardAPI emptyDir hostPath persistentVolumeClaim projected secret]
hostmount-anyuid     false     []        MustRunAs   RunAsAny           RunAsAny    RunAsAny    <none>     false            [configMap downwardAPI emptyDir hostPath nfs persistentVolumeClaim projected secret]
hostnetwork          false     []        MustRunAs   MustRunAsRange     MustRunAs   MustRunAs   <none>     false            [configMap downwardAPI emptyDir persistentVolumeClaim projected secret]
kube-state-metrics   false     []        RunAsAny    RunAsAny           RunAsAny    RunAsAny    <none>     false            [*]
node-exporter        false     []        RunAsAny    RunAsAny           RunAsAny    RunAsAny    <none>     false            [*]
nonroot              false     []        MustRunAs   MustRunAsNonRoot   RunAsAny    RunAsAny    <none>     false            [configMap downwardAPI emptyDir persistentVolumeClaim projected secret]
privileged           true      [*]       RunAsAny    RunAsAny           RunAsAny    RunAsAny    <none>     false            [*]
restricted           false     []        MustRunAs   MustRunAsRange     MustRunAs   RunAsAny    <none>     false            [configMap downwardAPI emptyDir persistentVolumeClaim projected secret]

date +"%Y-%m-%d_%H-%M" -u is not compatible with macOS's bash

bash-3.2$ date +"%Y-%m-%d_%H-%M" -u
date: illegal time format
usage: date [-jnRu] [-d dst] [-r seconds] [-t west] [-v[+|-]val[ymwdHMS]] ... 
            [-f fmt date | [[[mm]dd]HH]MM[[cc]yy][.ss]] [+format]

Please flip the option order:

bash-3.2$ date -u +"%Y-%m-%d_%H-%M"
2020-03-04_04-37

Fetch the hostsubnet

# oc get hostsubnet -o yaml
apiVersion: v1
items:
- apiVersion: network.openshift.io/v1
  host: master.ocp3-11-26.cluster
  hostIP: 10.0.1.6
  kind: HostSubnet
  metadata:
    annotations:
      pod.network.openshift.io/node-uid: c0687f17-d67e-11e9-a0fe-5254001aeab5
    creationTimestamp: 2019-09-13T23:32:33Z
    name: master.ocp3-11-26.cluster
    namespace: ""
    resourceVersion: "476"
    selfLink: /apis/network.openshift.io/v1/hostsubnets/master.ocp3-11-26.cluster
    uid: c2feff4a-d67e-11e9-a0fe-5254001aeab5
  subnet: 10.128.0.0/23
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

---

# oc describe hostsubnet
Name:		master.ocp3-11-26.cluster
Created:	4 weeks ago
Labels:		<none>
Annotations:	pod.network.openshift.io/node-uid=c0687f17-d67e-11e9-a0fe-5254001aeab5
Node:		master.ocp3-11-26.cluster
Node IP:	10.0.1.6
Pod Subnet:	10.128.0.0/23
Egress CIDRs:	<none>
Egress IPs:	<none>

Fetch the logs from all 'error' pods

Include them in a "logs/error" directory

These were ruled out initially for some specific pods, e.g. 'deploy' or 'pre', but it has since been realized that they can actually be very useful.
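
One possible way to collect them (sketch; the phase filter may need adjusting depending on how those pods actually terminate):

mkdir -p ${DUMP_DIR}/logs/error
for POD in $(oc get pods --field-selector=status.phase=Failed -o name | cut -d '/' -f 2); do
    oc logs ${POD} > ${DUMP_DIR}/logs/error/${POD}.log 2>&1
done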

Fetch all pods Limits and Requests into a specific file

These limits are already configured on the DeploymentConfig objects; however, it would be much easier and quicker to view them in a single file that already contains all the information, in order to troubleshoot "out of resource" issues more easily.
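
One possibility is a custom-columns query (sketch; file name is only a suggestion):

oc get pods -o custom-columns='POD:.metadata.name,CONTAINERS:.spec.containers[*].name,CPU_REQ:.spec.containers[*].resources.requests.cpu,CPU_LIM:.spec.containers[*].resources.limits.cpu,MEM_REQ:.spec.containers[*].resources.requests.memory,MEM_LIM:.spec.containers[*].resources.limits.memory' > ${DUMP_DIR}/status/limits.txt 2>&1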

Fetch only the PV's that correspond to the 3scale PVC's

Some customers have a large list of PVs belonging to other projects, and the script takes a long time to execute. The request is to limit the PV information fetched by the script to only the PVs that correspond to 3scale.
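
A sketch of how the lookup could be restricted to the PVs bound to the project's own PVCs (file name is only a suggestion):

for PV in $(oc get pvc -o jsonpath='{.items[*].spec.volumeName}'); do
    oc describe pv ${PV} >> ${DUMP_DIR}/status/pv.txt 2>&1
done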

Generate an official log file

Right now everything is sent to a file manually, e.g.:

PROJECT=<3scale Project> ; curl -s https://access.redhat.com/sites/default/files/attachments/3scale-dump-<VERSION>.sh | bash -s "$PROJECT" "auto" 2>&1 | tee 3scale-dump-logs.txt

In the next stable release, an official log file needs to be generated. Using tee would allow the output of the whole dump to be printed to both stdout/stderr and this file.
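
Inside the script itself this could be achieved with a single redirection near the top, e.g. (log file name is only a suggestion):

exec > >(tee -a "${DUMP_DIR}/3scale-dump-logs.txt") 2>&1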

Don't include "not found" previous containers

This is a follow-up from #25. Some previous containers just output a message similar to the following:

Error from server (BadRequest): previous terminated container "backend-cron" in pod "backend-cron-1-mdgzf" not found

It should be simple enough to perform a string match and not include those files in the logs/previous directory.
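
For example (sketch, assuming the logs/previous directory from #25):

for FILE in ${DUMP_DIR}/logs/previous/*; do
    grep -q "previous terminated container .* not found" "${FILE}" && rm -f "${FILE}"
done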

oc get events -o yaml

Sometimes I need the event timestamps, which aren't included in the events.txt file. It seems the only way to get them is via -o yaml.
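
A simple addition would be (file name is only a suggestion):

oc get events -o yaml > ${DUMP_DIR}/status/events.yaml 2>&1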

Support RH SSO through an optional argument

Add the ability to test against an RH SSO endpoint if an optional argument is provided, e.g.:
rhsso=https://my-sso-server

Default checks to be performed include the 'well-known' endpoint and the infamous certificate validation.
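
A sketch of both checks, assuming the argument has been parsed into an RHSSO_ENDPOINT variable (the REALM placeholder below is hypothetical):

RHSSO_HOST=$(echo "${RHSSO_ENDPOINT}" | sed -e 's|https://||' -e 's|/.*||')

# 'well-known' endpoint (realm name is a placeholder)
curl -sk "${RHSSO_ENDPOINT}/auth/realms/${REALM}/.well-known/openid-configuration"

# certificate validation
echo | openssl s_client -connect "${RHSSO_HOST}:443" -servername "${RHSSO_HOST}" 2>/dev/null | openssl x509 -noout -dates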

'oc describe pod'

Add this feature, since it looks like some TSEs have been asking for it on a few cases.

Fetch 'AccessToken.all' from the rails console

This has been suggested today by Shannon.

Something similar to the following should work:

timeout 180 oc rsh -c system-master ${SYSTEM_APP_POD} /bin/bash -c "echo -e 'stats = Sidekiq::Stats.new\nAccessToken.all' | bundle exec rails console" > ${DUMP_DIR}/status/rails.txt 2>&1 < /dev/null

Add the 'non-admin' warning to the beginning of the dump project

This is a follow-up from #13. Instead of adding the warning at the end, show it at the beginning, together with a yes/no confirmation message.

Suggestion from Samuele:

"WARNING: you are running the script without the required privileges to obtain a full project dump, you should run the script as administrator, would you like to proceed anyway? y/N"
