
Comments (20)

DarthSim commented on July 23, 2024

Hey @Vincent-Vais!

Does GODEBUG=http2client=0 solve the issue?

from imgproxy.

Vincent-Vais commented on July 23, 2024

We tried passing the GODEBUG=http2client=0 env variable through deployment.yaml. We could see the value set in the pod, but it did not solve the issue. Is there another way we can pass it to imgproxy? Just FYI, we are using https://github.com/imgproxy/imgproxy-helm

DarthSim commented on July 23, 2024

You can use features.custom.GODEBUG: "http2client=0" for this.

If it doesn't fix the issue, you can set GODEBUG="http2debug=1" or GODEBUG="http2debug=2" to enable debug messages from HTTP2 clients.

Vincent-Vais commented on July 23, 2024

@DarthSim Thanks for the suggestions. I tried setting http2client=0 through features.custom.GODEBUG, but I am still seeing the same issue. Tomorrow I am going to enable debug messages via GODEBUG="http2debug=1" and hopefully get more information on why the client is seemingly getting "stuck".

Vincent-Vais commented on July 23, 2024

Hey @DarthSim, we have enabled DataDog APM tracing to help us debug the issue, and based on the spans it looks like some of the requests are getting "stuck" in the queue.

Here is a stack trace that we found in DataDog:

timeout_error: Request was timed out after 10.000171803s
gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer.(*span).setTagError
	/root/go/pkg/mod/gopkg.in/!data!dog/[email protected]/ddtrace/tracer/span.go:328
gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer.(*span).SetTag
	/root/go/pkg/mod/gopkg.in/!data!dog/[email protected]/ddtrace/tracer/span.go:123
github.com/imgproxy/imgproxy/v3/metrics/datadog.SendError
	/app/metrics/datadog/datadog.go:145
github.com/imgproxy/imgproxy/v3/metrics.SendError
	/app/metrics/metrics.go:132
main.sendErr
	/app/processing_handler.go:185
main.sendErrAndPanic
	/app/processing_handler.go:190
main.checkErr
	/app/processing_handler.go:198
main.handleProcessing.func1
	/app/processing_handler.go:296
main.handleProcessing
	/app/processing_handler.go:298
main.buildRouter.withCORS.func1
	/app/server.go:113
main.buildRouter.withPanicHandler.func2
	/app/server.go:170
main.buildRouter.withMetrics.func3
	/app/server.go:102
github.com/imgproxy/imgproxy/v3/router.(*Router).ServeHTTP
	/app/router/router.go:155
net/http.serverHandler.ServeHTTP
	/usr/local/go/src/net/http/server.go:3137
net/http.(*conn).serve
	/usr/local/go/src/net/http/server.go:2039
runtime.goexit
	/usr/local/go/src/runtime/asm_amd64.s:1695

We have tried increasing the available resources, but the issue is always reproducible when enough requests are sent. After the first request gets "stuck" in the queue, all subsequent requests also get "stuck", and the only way to fix it is to restart the pod. From the stack trace above it looks like the request cannot acquire the semaphore lock, so it just times out after 10s. We tried increasing the timeout to 100s, but the lock never gets acquired and all subsequent requests time out.

Do you by any chance know what the problem might be, or what next steps we could take to troubleshoot?

DarthSim commented on July 23, 2024

Did the requests before the stuck one complete? Maybe some of them got stuck somewhere else and did not release their semaphore tokens.

Vincent-Vais commented on July 23, 2024

I assume they did complete, but a couple of them are missing from the DataDog dashboard. We sent 25 requests to imgproxy and received 25 successful responses, but only 22 of those are displayed in the DataDog dashboard; 3 are missing. We then sent another request to imgproxy and it got "stuck".

DarthSim commented on July 23, 2024

How do you send those requests? I'm trying to reproduce the issue, but I can't, no matter how many requests I send.

Vincent-Vais commented on July 23, 2024

Try using this command: seq 1 {numRequests} | xargs -n1 -P10 curl {url}. It sends {numRequests} requests, up to 10 of them in parallel.

We are proxying requests through another service, but the resolved URL that the service uses looks something like {host}/insecure/gravity:fp:0.5:0.5/plain/gs:/{path_to_image}

Also, here is an image that we were testing with:
test-image

Let me know if that makes sense or you need more info :)

Vincent-Vais commented on July 23, 2024

Sorry, I forgot to mention an important thing: we are only seeing this behavior in the cloud. Locally everything works as expected. Maybe that's because of differences in available resources?

Edit: or differences in architectures?

DarthSim commented on July 23, 2024

Are you able to connect to the container shell (since you've mentioned pods, I believe you can)? If you send SIGQUIT to imgproxy, it should quit and print its current stack trace. Can you show me a stack trace of a stuck instance?

Vincent-Vais commented on July 23, 2024

Hm, I am not seeing any stack trace when killing the pod (I tried both during a request and after receiving the timeout). I used kill -3 1 to kill the pod, and the only output I see is:

imgproxy@host:/$ command terminated with exit code 137

Is there any other command that I could try?

DarthSim commented on July 23, 2024

You need to send SIGQUIT right to imgproxy's process. PID 1 is most probably not imgproxy.

Vincent-Vais commented on July 23, 2024

I used top to display all the running processes in the pod, and it looks like imgproxy is running as PID 1? Or am I missing something here? 🤔

Screenshot 2024-05-29 at 3 07 32 PM

DarthSim commented on July 23, 2024

I was somehow sure that sh would be PID 1... But OK.

Any Go app prints the stack traces of its goroutines to STDERR when it receives SIGQUIT. I'm not very familiar with k8s, but I believe it should store that output somewhere.
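Because SIGQUIT makes the process exit, in Kubernetes the dump usually ends up in the previous container's log, which kubectl logs --previous shows. A Go program can also capture the same kind of all-goroutine dump while it keeps running, using runtime.Stack(buf, true). The sketch below is a hedged illustration only: the helper name allStacks is hypothetical, imgproxy does not necessarily expose anything like it, and an app would still need to wire it to a debug endpoint or signal of its own.

```go
package main

import (
	"fmt"
	"os"
	"runtime"
)

// allStacks (hypothetical) returns the stack traces of every goroutine,
// similar to what the Go runtime prints to stderr on SIGQUIT, but without
// terminating the process.
func allStacks() string {
	buf := make([]byte, 64<<10)
	for {
		n := runtime.Stack(buf, true)
		if n < len(buf) {
			return string(buf[:n])
		}
		buf = make([]byte, 2*len(buf)) // dump was truncated; retry with a bigger buffer
	}
}

func main() {
	// A goroutine blocked forever, standing in for a stuck request handler.
	block := make(chan struct{})
	go func() { <-block }()
	runtime.Gosched()
	fmt.Fprint(os.Stderr, allStacks())
}
```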

Vincent-Vais commented on July 23, 2024

Cool, I will try to find the stack traces and get back to you. Thanks!

Vincent-Vais commented on July 23, 2024

Found the logs from SIGQUIT

output.log

DarthSim commented on July 23, 2024

Ok, this is interesting. I see 4 goroutines writing HTTP responses. Let's make a very clean experiment:

  1. Run imgproxy and bring it to a stuck condition. To remove any obstacles, send all requests right from imgproxy's pod.
  2. Stop sending any requests to imgproxy.
  3. Check the logs. Check if all the Started... records have corresponding Completed... records. The request_id field will help you with this.
  4. Send SIGQUIT and give me the stack trace.

Vincent-Vais commented on July 23, 2024

Thanks for taking a look at the logs! So far we have not been able to reproduce the issue when making requests from imgproxy's pod. I'm going to look into it more tomorrow, but our current thinking is that the "stuck" request may not be releasing its semaphore token because it is busy trying to write a response to a connection that we have already closed. So maybe we can try adjusting the timeout value on our reverse proxy 🤔

Vincent-Vais commented on July 23, 2024

@DarthSim Thanks for all your help on this. It looks like our issue was caused by the upstream proxy library incorrectly handling prematurely closed connections. This kept a socket open on the imgproxy side and prevented the worker from releasing its semaphore token. After applying the patch http-party/node-http-proxy#1586 to the upstream library, the issue was resolved. Thanks again for all your help and best wishes with the project!