Comments (20)
Hey @Vincent-Vais!
Does GODEBUG=http2client=0 solve the issue?
from imgproxy.
We have tried passing the GODEBUG=http2client=0 env through deployment.yaml. We saw the env value being set in the pod, but it did not solve the issue. Is there another way we can try passing it in to imgproxy? Just FYI, we are using https://github.com/imgproxy/imgproxy-helm
from imgproxy.
You can use features.custom.GODEBUG: "http2client=0" for this.
If it doesn't fix the issue, you can set GODEBUG="http2debug=1" or GODEBUG="http2debug=2" to enable debug messages from HTTP2 clients.
from imgproxy.
@DarthSim Thanks for the suggestions. I tried setting http2client=0 through features.custom.GODEBUG and am still seeing the same issue. I am going to enable debug messages tomorrow via GODEBUG="http2debug=1" and hopefully will get more information on why the client is seemingly getting "stuck".
from imgproxy.
Hey @DarthSim, we have enabled DataDog APM tracing to help us debug the issue, and based on the spans it looks like some of the requests are getting "stuck" in the queue.
Here is a stacktrace that we found in DataDog
timeout_error: Request was timed out after 10.000171803s
gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer.(*span).setTagError
/root/go/pkg/mod/gopkg.in/!data!dog/[email protected]/ddtrace/tracer/span.go:328
gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer.(*span).SetTag
/root/go/pkg/mod/gopkg.in/!data!dog/[email protected]/ddtrace/tracer/span.go:123
github.com/imgproxy/imgproxy/v3/metrics/datadog.SendError
/app/metrics/datadog/datadog.go:145
github.com/imgproxy/imgproxy/v3/metrics.SendError
/app/metrics/metrics.go:132
main.sendErr
/app/processing_handler.go:185
main.sendErrAndPanic
/app/processing_handler.go:190
main.checkErr
/app/processing_handler.go:198
main.handleProcessing.func1
/app/processing_handler.go:296
main.handleProcessing
/app/processing_handler.go:298
main.buildRouter.withCORS.func1
/app/server.go:113
main.buildRouter.withPanicHandler.func2
/app/server.go:170
main.buildRouter.withMetrics.func3
/app/server.go:102
github.com/imgproxy/imgproxy/v3/router.(*Router).ServeHTTP
/app/router/router.go:155
net/http.serverHandler.ServeHTTP
/usr/local/go/src/net/http/server.go:3137
net/http.(*conn).serve
/usr/local/go/src/net/http/server.go:2039
runtime.goexit
/usr/local/go/src/runtime/asm_amd64.s:1695
We have tried increasing the available resources, but the issue is always reproducible when enough requests are sent. After the first request gets "stuck" in the queue, all subsequent requests also become "stuck", and the only way to fix the issue is to restart the pod. From the stack trace above, it looks like we cannot acquire the semaphore lock, so the request just times out after 10s. We tried increasing the timeout to 100s, but the lock never gets acquired and all subsequent requests time out.
Do you know by any chance what the problem might be or what would be the next steps we can take to troubleshoot?
from imgproxy.
Did the requests before the stuck one complete? Maybe some of them got stuck somewhere else and did not release their semaphore tokens.
from imgproxy.
I assume they did complete, but we are missing a couple of them from the DataDog dashboard. We sent 25 requests to imgproxy and received back 25 successful responses, but only 22 of those are displayed in the DataDog dashboard; 3 are missing. We then sent another request to imgproxy and it got "stuck".
from imgproxy.
How do you send those requests? I'm trying to reproduce the issue, but I can't no matter how many requests I send
from imgproxy.
Try using this command? seq 1 {numRequests} | xargs -n1 -P10 curl {url} - this would send {numRequests} requests, up to 10 at a time in parallel.
We are proxying requests through another service, but the resolved URL that the service uses would be something like {host}/insecure/gravity:fp:0.5:0.5/plain/gs:/{path_to_image}
Also here is an image that we were testing with
Let me know if that makes sense or you need more info :)
from imgproxy.
Sorry, forgot to mention an important thing: we are only seeing this behavior in the cloud. Locally everything works as expected, maybe because of the differences in available resources?
Edit: or differences in architectures?
from imgproxy.
Are you able to connect to the container shell (since you've mentioned pods, I believe you can)? If you send SIGQUIT to imgproxy, it should quit and print its current stack trace. Can you show me a stack trace of a stuck instance?
from imgproxy.
Hm, I am not seeing any stack trace when killing the pod (tried during the request and after receiving the timeout). I used kill -3 1 to kill the pod, and the only output I am seeing is:
imgproxy@host:/$ command terminated with exit code 137
Is there any other command that I could try?
from imgproxy.
You need to send SIGQUIT right to imgproxy's process. PID 1 is most probably not imgproxy.
from imgproxy.
I used top to display all the running processes in the pod, and it looks like imgproxy is running under PID 1? Or am I missing something here? 🤔
from imgproxy.
I somehow was sure that sh should've been PID 1... But ok.
Any Go app flushes its stack trace to STDERR when receiving SIGQUIT. I'm not quite familiar with k8s but I believe it should store it somewhere.
from imgproxy.
Cool, I will try to find the stack traces and get back to you. Thanks!
from imgproxy.
Found the logs from SIGQUIT
from imgproxy.
Ok, this is interesting. I see 4 goroutines writing HTTP responses. Let's make a very clean experiment:
- Run imgproxy and bring it to a stuck condition. To remove any obstacles, send all requests right from imgproxy's pod.
- Stop sending any requests to imgproxy.
- Check the logs. Check if all the Started... records have corresponding Completed... records. The request_id field will help you with this.
- Send SIGQUIT and give me the stack trace.
from imgproxy.
Thanks for taking a look at the logs! So far we have not been able to reproduce the issue when making requests from imgproxy's pod. Going to look at it more tomorrow, but the thinking is that maybe the "stuck" request is not releasing the semaphore token because it is busy trying to write a response to a connection that we have already closed? So maybe we can try adjusting the timeout value on our reverse proxy 🤔
from imgproxy.
@DarthSim Thanks for all your help on this. It looks like our issue was due to the incorrect handling of prematurely closed connections by the upstream proxy library. This caused a socket on the imgproxy side to stay open and prevented the worker from releasing the semaphore token. After applying the patch from http-party/node-http-proxy#1586 to the upstream library, the issue was resolved. Thanks again for all your help and best wishes with the project!
from imgproxy.