
kubernetes-csi / csi-lib-utils


Common code for Kubernetes CSI sidecar containers (e.g. `external-attacher`, `external-provisioner`, etc.)

License: Apache License 2.0

Makefile 6.80% Go 54.11% Shell 34.38% Python 4.71%
k8s-sig-storage

csi-lib-utils's People

Contributors

aimuz, amolmote, andrewsykim, animeshk08, ayanamist, bells17, carlory, chrishenzie, cyb70289, ddebroy, fricounet, gnufied, humblec, jiawei0227, jsafrane, k8s-ci-robot, leiyiz, madhu-1, mauriciopoppe, msau42, mucahitkurt, namrata-ibm, pohly, saad-ali, sneha-at, spiffxp, sunnylovestiramisu, testwill, windayski, xing-yang


csi-lib-utils's Issues

Broken link to the `contributor cheat sheet` needs fixing

Bug Report

I have observed the same kind of issue in various kubernetes-csi projects.
It happens because, after the localization work, many modifications were made across various directories.
I have observed the same issue on this page as well.

It has one broken link to the contributor cheat sheet, which needs to be fixed.
I will look through the other CSI repos as well and try to fix them as soon as I can.

/kind bug
/assign

protosanitizer.StripSecrets overhead too big

protosanitizer.StripSecrets marshals and unmarshals every request to identify sensitive information and replace it. This operation seems to be too costly. Many CSI drivers (or other components) can print secrets in their logs when the secrets are configured, e.g. in the StorageClass. The impact of the issue is little to none, but it would still be good to have all logging sanitized.

I attempted to fix this in the GCP driver: kubernetes-sigs/gcp-compute-persistent-disk-csi-driver#747. However, the fix was reverted precisely because of the performance impact of the StripSecrets function.

Would it be possible, for example, to identify or replace the secrets without the expensive JSON operations? Any other ideas you can come up with are welcome.

Support structured logging

As a suggestion to align with the following structured logging guidelines, I've explored updating the klog functions in csi-lib-utils:
https://github.com/kubernetes/community/blob/master/contributors/devel/sig-instrumentation/migration-to-structured-logging.md

I noticed that several CSI Sidecars like external-snapshotter rely on csi-lib-utils. To potentially enhance structured logging support in these Sidecars, adopting structured logging in csi-lib-utils could be beneficial.

configure prow presubmit jobs

We should run make test or some equivalent of it before each PR merge. As this is a Kubernetes repo and we already use prow for merging, we should also use prow for testing.

leader election occasionally fails to reconnect to api server

The exact root cause is still uncertain, but when the apiserver is having problems, the CSI sidecars fail to get the leader election lease with this error:

"error retrieving resource lock kube-system/external-attacher-leader-my-driver: Get https://localhost:443/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/external-attacher-leader-my-driver: write tcp [::1]:53540->[::1]:443: write: broken pipe"

Even after apiserver comes back up, this error continues and never recovers. This is apparently intended behavior, and the fix is to enable watchdog so that kubelet can restart the container: https://github.com/kubernetes/client-go/blob/master/tools/leaderelection/healthzadaptor.go#L25

In-tree controllers like kube-controller-manager already set this.
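
To illustrate the watchdog idea described above, here is a minimal sketch using only the standard library. It is not the client-go implementation: the `leaseWatchdog` type, its fields, and the `checkAfterSeconds` helper are hypothetical names, and the sketch ignores whether the process currently holds the lease. It only shows the core rule, which is to report unhealthy once the lease has gone unrenewed for longer than the lease duration plus a tolerance, so that a liveness probe can trigger a restart.

```go
package main

import (
	"fmt"
	"net/http"
	"sync"
	"time"
)

// leaseWatchdog is a simplified, hypothetical stand-in for a leader election
// health check. It is unhealthy once the lease has gone unrenewed for longer
// than leaseDuration + tolerance.
type leaseWatchdog struct {
	mu            sync.Mutex
	lastRenew     time.Time
	leaseDuration time.Duration
	tolerance     time.Duration
}

// observeRenew records a successful lease renewal.
func (w *leaseWatchdog) observeRenew(at time.Time) {
	w.mu.Lock()
	defer w.mu.Unlock()
	w.lastRenew = at
}

// check returns an error when the last renewal is too old at time now.
func (w *leaseWatchdog) check(now time.Time) error {
	w.mu.Lock()
	defer w.mu.Unlock()
	if age := now.Sub(w.lastRenew); age > w.leaseDuration+w.tolerance {
		return fmt.Errorf("lease not renewed for %s", age)
	}
	return nil
}

// ServeHTTP exposes the check so a kubelet liveness probe can restart the
// container when renewals stop.
func (w *leaseWatchdog) ServeHTTP(rw http.ResponseWriter, _ *http.Request) {
	if err := w.check(time.Now()); err != nil {
		http.Error(rw, err.Error(), http.StatusInternalServerError)
		return
	}
	fmt.Fprintln(rw, "ok")
}

// checkAfterSeconds is a demo helper: renew once, then check secs later.
func checkAfterSeconds(secs int) error {
	w := &leaseWatchdog{leaseDuration: 15 * time.Second, tolerance: 10 * time.Second}
	start := time.Now()
	w.observeRenew(start)
	return w.check(start.Add(time.Duration(secs) * time.Second))
}

func main() {
	fmt.Println(checkAfterSeconds(20)) // within 15s + 10s, healthy
	fmt.Println(checkAfterSeconds(30)) // past 25s, unhealthy
}
```

In a sidecar, a handler like this would be mounted on the existing HTTP endpoint and referenced from the container's liveness probe.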

Add otel trace instrumentation on gRPC calls

I'm proposing to add some basic tracing instrumentation of the gRPC calls made by the CSI clients using the code in connection.go to connect to the CSI driver.
The feature simply consists of adding otelgrpc.UnaryClientInterceptor() to the existing ChainUnaryInterceptor. This is enough to create traces for all gRPC calls, and the client can easily export them.

I think it can be done by creating a new ConnectWithGrpcInterceptor function that will call the connect function with the added interceptor as a DialOption. This way the feature will be opt-in for the users.

I can contribute it (and the implementation in the CSI components) if it is something the community wants to see. I already implemented something similar on the azuredisk-csi-driver and plan to do the same for aws and gcp drivers.

Timeout limit in Connect() function leads to crashloopbackoff of the CSI driver controller pod

We were going through this PR and found that @ConnorJC3 and team introduced a timeout limit as part of bbcd132.

Can someone please provide insight into the rationale behind implementing this timeout?
Many CSI sidecars rely on this utility. Imagine a scenario with two controllers where only one is the leader. In that case, the non-leading controller is unable to connect. In the previous setup, with an infinite timeout, the controllers kept trying to establish a connection indefinitely. As a result, the controller pod remained in a running state and avoided crashloopbackoff.

Once the pod assumed the leader role, the session was established seamlessly. Because of this change, the pod is now in crashloopbackoff.
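
The difference between the two behaviors can be sketched as follows. This is an illustration under assumptions, not the code in csi-lib-utils: `connectWithRetry`, its parameters, and the `demo` helper are hypothetical, and the dial function is a fake. A zero timeout keeps retrying forever, as in the older behavior described above, while a positive timeout gives up with an error, which a sidecar would typically treat as fatal.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// connectWithRetry calls dial until it succeeds. A zero timeout retries
// forever; a positive timeout gives up once the deadline passes.
func connectWithRetry(dial func() error, timeout, interval time.Duration) error {
	ctx := context.Background()
	if timeout > 0 {
		var cancel context.CancelFunc
		ctx, cancel = context.WithTimeout(ctx, timeout)
		defer cancel()
	}
	for {
		if err := dial(); err == nil {
			return nil
		}
		select {
		case <-ctx.Done():
			return fmt.Errorf("connect: %w", ctx.Err())
		case <-time.After(interval):
			// Wait before the next attempt.
		}
	}
}

// demo dials a fake driver that fails `failures` times before succeeding.
// It returns the number of attempts and whether the timeout was hit.
func demo(failures, timeoutMs int) (attempts int, timedOut bool) {
	dial := func() error {
		attempts++
		if attempts <= failures {
			return errors.New("driver not ready")
		}
		return nil
	}
	timeout := time.Duration(timeoutMs) * time.Millisecond
	err := connectWithRetry(dial, timeout, time.Millisecond)
	return attempts, errors.Is(err, context.DeadlineExceeded)
}

func main() {
	fmt.Println(demo(3, 0))      // no timeout: keeps retrying until the driver is ready
	fmt.Println(demo(1<<30, 20)) // 20ms timeout: gives up while the driver is still unavailable
}
```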


duplicate functions in connection and rpc package

I see that the functions below are present in both the connection and rpc packages:

func GetDriverName(ctx context.Context, conn *grpc.ClientConn) (string, error)

func GetPluginCapabilities(ctx context.Context, conn *grpc.ClientConn)

func GetControllerCapabilities(ctx context.Context, conn *grpc.ClientConn) (ControllerCapabilitySet, error)

func ProbeForever(conn *grpc.ClientConn, singleProbeTimeout time.Duration) error 

func probeOnce(conn *grpc.ClientConn, timeout time.Duration) 

func Probe(ctx context.Context, conn *grpc.ClientConn) (ready bool, err error) 

Is this intentional, or do we need to remove the duplicate functions and use a single package?
If they need to be removed, I can do it.

CC @msau42 @pohly

Add support for pprof for CSI sidecars

Currently, some of the CSI sidecars support pprof profiling (like the azuredisk-csi-driver, the secrets-store-csi-driver or the node-driver-registrar), but not all. I believe it could be useful to add this option to all CSI components, and this repo seems like a good place to do it.

It could probably go in the metrics.go file in the CSIMetricsManager interface with a RegisterPprofToServer function that would register the pprof handlers.

What do you folks think about it? I would be happy to contribute this (and the implementation in the CSI sidecars) if that is something folks are willing to see 😄
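
A minimal sketch of what such a registration function could look like, using only the standard library's net/http/pprof handlers. The function name comes from the proposal above, but its signature (a plain function taking an *http.ServeMux instead of a method on CSIMetricsManager) is an assumption, and `statusWithPprof` is a hypothetical helper that only demonstrates the effect.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/http/pprof"
)

// RegisterPprofToServer attaches the standard pprof handlers to mux.
// The name follows the proposal; the signature is an assumption.
func RegisterPprofToServer(mux *http.ServeMux) {
	mux.HandleFunc("/debug/pprof/", pprof.Index)
	mux.HandleFunc("/debug/pprof/cmdline", pprof.Cmdline)
	mux.HandleFunc("/debug/pprof/profile", pprof.Profile)
	mux.HandleFunc("/debug/pprof/symbol", pprof.Symbol)
	mux.HandleFunc("/debug/pprof/trace", pprof.Trace)
}

// statusWithPprof reports the HTTP status a GET on path returns,
// with or without the pprof handlers registered.
func statusWithPprof(enabled bool, path string) int {
	mux := http.NewServeMux()
	if enabled {
		RegisterPprofToServer(mux)
	}
	rec := httptest.NewRecorder()
	mux.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	fmt.Println(statusWithPprof(true, "/debug/pprof/cmdline"))
	fmt.Println(statusWithPprof(false, "/debug/pprof/cmdline"))
}
```

Registering on an explicit mux rather than importing net/http/pprof for its side effect keeps the feature opt-in, since the handlers only exist when the sidecar calls the function.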

configurable lease election timers?

Hi,
It seems the CSI lease timers are all hard-coded. Could they be made configurable, as they are for other leases such as kube-scheduler and kube-controller-manager?
That would help when the sidecars are used in a slow environment.

defaultLeaseDuration = 15 * time.Second
defaultRenewDeadline = 10 * time.Second
defaultRetryPeriod = 5 * time.Second

thanks,
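
One way the request could be sketched is with command-line flags that default to the hard-coded values quoted above. Everything here is illustrative: the `leaseTimers` type, the `parseLeaseTimers` function, and the flag names are assumptions, and the ordering check is a simplified sanity check, not the exact validation performed by the leader election library.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"time"
)

// leaseTimers groups the three leader election timers.
type leaseTimers struct {
	LeaseDuration time.Duration
	RenewDeadline time.Duration
	RetryPeriod   time.Duration
}

// parseLeaseTimers reads the timers from args, falling back to the defaults
// quoted in the issue. The flag names are hypothetical.
func parseLeaseTimers(args []string) (leaseTimers, error) {
	var t leaseTimers
	fs := flag.NewFlagSet("sidecar", flag.ContinueOnError)
	fs.DurationVar(&t.LeaseDuration, "leader-election-lease-duration", 15*time.Second,
		"how long non-leaders wait before trying to acquire the lease")
	fs.DurationVar(&t.RenewDeadline, "leader-election-renew-deadline", 10*time.Second,
		"how long the leader keeps retrying a renewal before giving up")
	fs.DurationVar(&t.RetryPeriod, "leader-election-retry-period", 5*time.Second,
		"how long to wait between attempts")
	if err := fs.Parse(args); err != nil {
		return t, err
	}
	// Simplified sanity check: lease > renew deadline > retry period > 0.
	if t.RetryPeriod <= 0 {
		return t, errors.New("retry period must be positive")
	}
	if t.RenewDeadline <= t.RetryPeriod {
		return t, errors.New("renew deadline must exceed retry period")
	}
	if t.LeaseDuration <= t.RenewDeadline {
		return t, errors.New("lease duration must exceed renew deadline")
	}
	return t, nil
}

func main() {
	// A slow environment might stretch all three timers.
	t, err := parseLeaseTimers([]string{
		"-leader-election-lease-duration=60s",
		"-leader-election-renew-deadline=40s",
		"-leader-election-retry-period=10s",
	})
	fmt.Println(t.LeaseDuration, t.RenewDeadline, t.RetryPeriod, err)
}
```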
