GitHub outrageously refuses to accept attachments with the .vtc extension, so I'll have to include it verbatim here. Comments to follow.
```
varnishtest "Parent ESI fails despite setting ESI resp.status = 200"

server s1 {
	rxreq
	expect req.url == "/abort"
	txresp -hdr {surrogate-control: content="ESI/1.0"} \
	    -body {before <esi:include src="/fail"/> after}

	rxreq
	expect req.url == "/fail"
	txresp -hdr "content-length: 100" -nolen
	delay 0.1
} -start

varnish v1 -cliok "param.set feature +esi_disable_xml_check"

## Doesn't matter whether this is turned on.
# varnish v1 -cliok "param.set feature +esi_include_onerror"

varnish v1 -vcl+backend {
	sub vcl_backend_response {
		# Prevent cache hits in the next tests.
		set beresp.uncacheable = true;
		## WEIRD
		# set beresp.do_esi = true;
		set beresp.do_esi = beresp.http.surrogate-control ~ "ESI/1.0";
		unset beresp.http.surrogate-control;
	}
	sub vcl_deliver {
		if (req.esi_level > 0 && resp.status != 200 &&
		    resp.status != 204) {
			set resp.status = 200;
		}
	}
} -start

client c1 {
	txreq -url "/abort"
	non_fatal
	rxresp
	expect resp.body == "before "
} -run

server s1 -wait
server s1 -start

varnish v1 -vcl+backend {
	sub vcl_backend_response {
		# Prevent cache hits in the next tests.
		set beresp.uncacheable = true;
		set beresp.do_esi = true;
		set beresp.do_stream = false;
	}
	sub vcl_deliver {
		if (req.esi_level > 0 && resp.status != 200 &&
		    resp.status != 204) {
			set resp.status = 200;
		}
	}
}

client c1 {
	txreq -url "/abort"
	non_fatal
	rxresp
	expect resp.body ~ "^before "
	# Guru Meditation from the include in between
	expect resp.body ~ " after$"
} -run

server s1 -wait
server s1 -start

varnish v1 -vcl+backend {
	sub vcl_backend_response {
		## WEIRDNESS
		set beresp.do_esi = true;
	}
	sub vcl_deliver {
		if (req.esi_level > 0 && resp.status != 200 &&
		    resp.status != 204) {
			set resp.status = 200;
		}
	}
}

client c1 -run
```
from varnish-cache.
The VTC test passes, and the first `client -run` illustrates the problem.

The server response does not have `onerror="continue"` (so for this test, it doesn't matter whether `esi_include_onerror` is turned on). In all three `client -run`s, the snippet in `vcl_deliver` that was expected to prevent failure of the parent response is included in the VCL.

As in `e00035.vtc`, the server response is `before <esi:include src="/fail"/> after`, and the response for `/fail` fails because it sets `Content-Length` > 0 but sends an empty body.

In the first `client -run`, the response body seen by the client is `before ` -- the ESI include failed, and the rest of the body with `after` is left off. This is the unexpected behavior, because we thought the `vcl_deliver` snippet should have prevented that.

In the last two `client -run`s, the client sees `before <Guru Meditation from the ESI> after` in `resp.body`, as we expected. For the second `client -run` (the first of the two with the expected result), this is part of `vcl_backend_response`:

```
set beresp.do_esi = true;
set beresp.do_stream = false;
```

which made me think that the problem is related to streaming.

Now the weird part. In the first `client -run`, we have this in `vcl_backend_response`, exactly as in `e00035.vtc`:

```
set beresp.do_esi = beresp.http.surrogate-control ~ "ESI/1.0";
unset beresp.http.surrogate-control;
```

The server sets the Surrogate-Control header with matching contents, so in effect `beresp.do_esi` is set conditionally, but turns out to be `true`. This was the case for which the client response is truncated, which was the unexpected result.

In the third `client -run`, we have just this in `vcl_backend_response`:

```
sub vcl_backend_response {
	set beresp.do_esi = true;
}
```

That is, `beresp.do_esi` is set unconditionally to `true`, and `beresp.do_stream` is left at its default, which is `true`.

In that case, we don't get the unexpected result. With the snippet in `vcl_deliver` that sets `resp.status = 200` when the ESI level > 0, the client sees `before <Guru> after` in `resp.body`. Hence the comments in the VTC about weirdness.
---
What I think is happening here:

For the first client run, `do_esi` is `true` only for the request to `/abort` and, consequently, `do_stream` is `false` also only for the request to `/abort`. That is, `do_stream` is `true` for `/fail`.

VCL sees the status 200 and includes the streaming response of the request to `/fail`, but then streaming fails, and that failure also fails the top request.

For the other two client runs, `do_stream` is `false` in both cases: explicitly in the first, and implicitly in the second, because `set beresp.do_esi = true;` applies to both requests (and ESI processing disables streaming).

Possible solutions:

So, for now, I think the only workaround is to always set `do_stream = false` for any potentially ESI-included content which is not itself ESI.

From a Varnish-Cache perspective, two options come to my mind:

- ignore streaming errors for ESI includes
- add some option (parameter + VCL variable?) with the same effect as `onerror="continue"`
---
Could this be caused by 582ded6? Can you please try this commit and its parent? (pun intended)
---
Without having tried it, I think this is originally caused by this change from 26097fb:

```diff
 	req->doclose = SC_REM_CLOSE;
 	req->acct.resp_bodybytes += VDP_Close(req->vdc);
+
+	if (i && ecx->abrt) {
+		req->top->topreq->vdc->retval = -1;
+		req->top->topreq->doclose = req->doclose;
+	}
 }
```

This is where we originally started aborting the parent when an include's delivery fails.
---
@dridi cherry-picking that commit and its parent onto 7.4.2 results in conflicts. Instead of fiddling through that, I just ran the VTC in this issue against current master, and it passed.
Meaning that master also exhibits the unexpected result. Does that answer what you were getting at?
---
So... I must admit I didn't read the whole text initially.
I introduced onerror support guarded by a feature flag in 26097fb and the default behavior changed in 582ded6.
I still have not read everything, only the ticket description. I'd like to mention that ESI processing has to disable streaming, because we need to process the beresp body before we can pass the object to clients. So there are no latency savings on an ESI fetch.
I'll read the third comment (after the VTC) later, but from the look of it we lack VCL control over ESI onerror support. We could copy the feature flag to a req flag exposed in VCL upon entering vcl_deliver, and only use this flag to control the ESI delivery.
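For illustration only, a hedged sketch of how such a VCL-exposed flag might be used. The variable name follows the `resp.esi_include_onerror` proposal that comes up later in this thread; the exact semantics were still under discussion at this point, so treat this purely as a sketch of the idea, not a released API:

```
sub vcl_deliver {
	# Hypothetical: a per-response copy of the esi_include_onerror
	# feature flag, adjustable on the parent response before ESI
	# delivery of its includes starts.
	if (req.esi_level == 0) {
		set resp.esi_include_onerror = false;
	}
}
```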
---
> So, for now, I think the only workaround is to always set `do_stream = false` for any potentially ESI-included content which is not itself ESI.

@nigoroll sounds like it might work, but the problem is that in `vcl_backend_response`, there's no way to know "this beresp is an ESI leaf" (and the site at which the problem was noticed doesn't have a way to determine that, such as from the path or a header).

Come to think of it, that's what `resp.can_esi` and `obj.can_esi` do -- they're `true` if the `OA_ESIDATA` attribute exists. But I can't think of a way to get from there to something useful in `v_b_r` -- there's no way to know about `ESIDATA` until you've parsed the body.
---
@slimhazard what about something like this:

```
sub vcl_miss {
	if (req.esi_level > 0) {
		set req.http.no-stream = "1";
	} else {
		unset req.http.no-stream;
	}
}

sub vcl_backend_response {
	if (bereq.http.no-stream) {
		set beresp.do_stream = false;
	}
}
```
---
@nigoroll it looks like that works, this test passes against master:

```
varnishtest "VCL workaround for #4053"

server s1 {
	rxreq
	expect req.url == "/abort"
	txresp -body {before <esi:include src="/fail"/> after}

	rxreq
	expect req.url == "/fail"
	txresp -hdr "content-length: 100" -nolen
	delay 0.1
} -start

varnish v1 -cliok "param.set feature +esi_disable_xml_check"

varnish v1 -vcl+backend {
	sub vcl_miss {
		if (req.esi_level > 0) {
			set req.http.no-stream = "nope";
		} else {
			unset req.http.no-stream;
		}
	}
	sub vcl_backend_response {
		set beresp.do_esi = true;
		if (bereq.http.no-stream) {
			set beresp.do_stream = false;
		}
	}
	sub vcl_deliver {
		if (req.esi_level > 0 && resp.status != 200 &&
		    resp.status != 204) {
			set resp.status = 200;
		}
	}
} -start

client c1 {
	txreq -url "/abort"
	non_fatal
	rxresp
	expect resp.body ~ "^before "
	# Guru Meditation from the include in between
	expect resp.body ~ " after$"
} -run
```

```
$ bin/varnishtest/varnishtest -i bin/varnishtest/tests/e00101.vtc
# top TEST bin/varnishtest/tests/e00101.vtc passed (1.014)
```

Notice that there's no `## WEIRD` part here about setting `beresp.do_esi = true`.

I'll try the same test against 7.4.2.
---
Can confirm that the VCL workaround works in 7.4.2:

```
$ ./bin/varnishd/varnishd -V
varnishd (varnish-7.4.2 revision cd1d10ab53a6f6115b2b4f3b2a1da94c1f749f80)
Copyright (c) 2006 Verdens Gang AS
Copyright (c) 2006-2023 Varnish Software
Copyright 2010-2023 UPLEX - Nils Goroll Systemoptimierung

$ bin/varnishtest/varnishtest -i bin/varnishtest/tests/e00101.vtc
# top TEST bin/varnishtest/tests/e00101.vtc passed (1.015)
```

So we have a solution for now at the site, and presumably this helps to understand what's going on in Varnish.
---
Now I understand the problem better. You have an ESI top request, but sub-requests may be streamed when they aren't ESI requests themselves. And a streaming-but-failing 200 response will still be caught by `esi_include_onerror`.

> @nigoroll it looks like that works, this test passes against master:

Maybe, but this statement makes no difference in the test case:

```
set beresp.do_stream = false;
```

You can remove it or comment it out and the test case will still pass. The whole construct only makes sense if `do_esi` is conditional.

You could do this instead:

```
varnishtest "VCL workaround for #4053, take 3"

server s1 {
	rxreq
	expect req.url == "/abort"
	txresp -hdr {surrogate-control: content="ESI/1.0"} \
	    -body {before <esi:include src="/fail"/> after}

	rxreq
	expect req.url == "/fail"
	txresp -hdr "content-length: 100"
	delay 0.1
} -start

varnish v1 -cliok "param.set feature +esi_disable_xml_check"

varnish v1 -vcl+backend {
	sub vcl_recv {
		set req.http.esi-level = req.esi_level;
	}
	sub vcl_backend_response {
		set beresp.do_esi = beresp.http.surrogate-control ~ "ESI/1.0";
		set beresp.do_stream = bereq.http.esi-level == "0";
	}
	sub vcl_deliver {
		if (req.esi_level > 0 && resp.status != 200 &&
		    resp.status != 204) {
			set resp.status = 200;
		}
	}
} -start

client c1 {
	txreq -url "/abort"
	non_fatal
	rxresp
	expect resp.body ~ "^before "
	# Guru Meditation from the include in between
	expect resp.body ~ " after$"
} -run
```
---
> I'll read the third comment (after the VTC) later, but from the look of it we lack VCL control over ESI onerror support.

I was wrong, but it doesn't mean that it was a bad idea, see #4054.
---
@dridi thanks for the PR. I put a thumbs-up on it, so that will help it get merged, of course.

If the PR is merged: it appears to me that, to also have ESI contents included even if `onerror="continue"` is missing, then in addition to setting `resp.esi_include_onerror`, they'll also need the part about setting `resp.status` to 200. Is that correct?
---
The tricky part about #4054 is that `resp.esi_include_onerror` is set for the parent request, whereas the `resp.status` check is performed on direct sub-requests.

So, you could need both fine-grained control of the flag and a status workaround, or just disable the flag when you know the backend doesn't specify `onerror="continue"`. I'm not so sure actually, I should probably add coverage for that.

One thing I'm also wondering is whether to always initialize this from the parameter, or do that only for `req_top` (technically, `resp_top`, except that it does not exist) and then inherit the default value from `req_top` in sub-requests.
---
During bugwash, @dridi and myself had a discussion which was partly confusing to me. This comment is an attempt to clarify. Please correct me if you think that I am wrong:

The case we are looking at goes back to 26097fb, where we would fail the top-level ESI request if delivery of an ESI sub-request's object resulted in an error, `esi_include_onerror = true` and the `onerror=continue` attribute was not present.

In 582ded6 this was changed: we would now fail the top-level ESI request when `esi_include_onerror = false` **or** the `onerror=continue` attribute was not present.

The idea was that the new behavior would be closer to the standard and could still be avoided by overwriting the status, as mentioned in the changes.rst entry.

What we overlooked at the time is that, with streaming, we do not have the final status: it could be 200 as inspected by VCL, but 503 as seen by the ESI VDP.

While we never stream ESI objects, ESIs may include ordinary objects, for which streaming is supported. This is @slimhazard's initial test case.

So the workaround is to disable streaming for non-ESI includes of ESIs.

#4054 proposes a sensible thing, to give VCL control over `esi_include_onerror`, but that does not help with this issue. We would need a flag for "always assume `onerror=continue`".

(edit: bold)
---
Pretty good summary, thanks.

> So the workaround is to disable streaming for non-ESI includes of ESIs.

The VTC I submitted does just that, hopefully disambiguating the current behavior.

> #4054 proposes a sensible thing, to give VCL control over `esi_include_onerror`, but that does not help with this issue.

As it states in the description, it was "inspired by" this ticket and never claimed to solve the problem.

> We would need a flag for "always assume `onerror=continue`".

And I believe that would amount to a partial revert of 582ded6. I think it's a good idea to consider non-200/204 status codes as errors, but that should have been the extent of the change.

With 26097fb we already had this behavior of assuming `onerror=continue` unless the flag is raised. The notion of a failure should have been extended from `return(fail)`-or-equivalent-only to `return(fail)`-or-unfit-status, and nothing more.
---
How about `esi_include_onerror = attribute | abort | continue`?

- `attribute`: continue only if `onerror="continue"` is present
- `abort`: stop ESI processing on any error, irrespective of the attribute
- `continue`: continue ESI processing as before 26097fb

(and the same for the new VCL variable)
---
As we are at it: the ESI TR specifies:

> If an ESI Processor can fetch neither the src nor the alt, it returns a HTTP status code greater than 400 with an error message, unless the onerror attribute is present. If it is, and onerror="continue" is specified, ESI Processors will delete the include element silently.

The first aspect, returning a status > 400, would imply buffering the ESI response until all subrequests are done, which we certainly won't do, so I think we should decide to deliberately not follow the TR.

But regarding the second aspect: this would, IMHO, imply disabling streaming and replacing failed included objects with empty strings. So maybe we should document that, even with something like `esi_include_onerror = attribute`, one would need special VCL handling to be ESI compliant, to ensure that an errored include's body is discarded (e.g. by going to `synth()`).
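One possible shape of that special VCL handling, sketched here as an untested assumption (whether a synthetic response is honored at `req.esi_level > 0` may depend on the Varnish version; the status threshold is illustrative):

```
sub vcl_deliver {
	# Inside an include whose fetch failed: replace the body
	# with a synthetic one instead of delivering the error.
	if (req.esi_level > 0 && resp.status >= 400) {
		return (synth(200));
	}
}

sub vcl_synth {
	# Deliver an empty body in place of the failed include,
	# so the parent response continues without the fragment.
	if (req.esi_level > 0) {
		set resp.body = "";
		return (deliver);
	}
}
```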
---
I wrote a shorter test case:

```
varnishtest "esi kerfluffle"

server s1 {
	rxreq
	expect req.http.esi-level == 0
	txresp -body {before <esi:include src="/fail"/> after}

	rxreq
	expect req.http.esi-level == 1
	txresp -status 500 -hdr "transfer-encoding: chunked"
	delay 0.1
	chunked 500
} -start

varnish v1 -cliok "param.set feature +esi_disable_xml_check"

varnish v1 -vcl+backend {
	sub vcl_recv {
		set req.http.esi-level = req.esi_level;
	}
	sub vcl_backend_response {
		set beresp.do_esi = bereq.http.esi-level == "0";
	}
	sub vcl_deliver {
		if (req.esi_level > 0 && resp.status != 200) {
			set resp.status = 200;
		}
	}
} -start

client c1 {
	txreq
	rxresp
	expect resp.body == "before 500 after"
} -run
```

It fails since 582ded6, which is what really changed the interpretation of `onerror` (or the lack thereof). It should have only added "undesirable status codes" to the list of things considered an error and left the `onerror` handling as-is.

If there is one thing to fix, it's that.
---
FYI:

We (@slimhazard, myself and a Varnish-Cache user) have now established that the VCL code to restore the Varnish-Cache 7.2 behavior is a little more involved than originally assumed. The required workaround is something along these lines (no news in here, just a summary):

- in `vcl_miss {}` and `vcl_pass {}`:

  ```
  if (req.esi_level > 0) {
      set req.http.no-stream = "nope";
  } else {
      unset req.http.no-stream;
  }
  ```

- in `vcl_backend_response {}`:

  ```
  if (bereq.http.no-stream) {
      set beresp.do_stream = false;
  }
  ```

- in `vcl_synth {}` and `vcl_deliver {}`, before every `return (deliver)`:

  ```
  if (req.esi_level > 0 && resp.status != 200 && resp.status != 204) {
      set resp.status = 200;
  }
  ```

We are now working on a simplified solution for pESI, which we would like to give up once we have settled on a solution for Varnish-Cache. In my mind, this could be to

- either replace the `esi_include_onerror` feature flag with a tri-state `esi_include_onerror = attribute | abort | continue`,
- or add another flag `esi_include_onerror_override`,
- or, as Dridi suggested, a partial revert of 582ded6, but we would need to discuss how this would look exactly.

In addition, all solutions should come with VCL control.
---
I agree that this needs to always be under VCL control.
On the other hand, I'm not keen on this becoming more complicated than it already is :-/
---
I submitted my suggestion: #4065.