GitHub outrageously refuses to accept attachments with the .vtc extension, so I'll have to include it verbatim here. Comments to follow.
```
varnishtest "Parent ESI fails despite setting ESI resp.status = 200"

server s1 {
	rxreq
	expect req.url == "/abort"
	txresp -hdr {surrogate-control: content="ESI/1.0"} \
	    -body {before <esi:include src="/fail"/> after}

	rxreq
	expect req.url == "/fail"
	txresp -hdr "content-length: 100" -nolen
	delay 0.1
} -start

varnish v1 -cliok "param.set feature +esi_disable_xml_check"

## Doesn't matter whether this is turned on.
# varnish v1 -cliok "param.set feature +esi_include_onerror"

varnish v1 -vcl+backend {
	sub vcl_backend_response {
		# Prevent cache hits in the next tests.
		set beresp.uncacheable = true;
		## WEIRD
		# set beresp.do_esi = true;
		set beresp.do_esi = beresp.http.surrogate-control ~ "ESI/1.0";
		unset beresp.http.surrogate-control;
	}
	sub vcl_deliver {
		if (req.esi_level > 0 && resp.status != 200 &&
		    resp.status != 204) {
			set resp.status = 200;
		}
	}
} -start

client c1 {
	txreq -url "/abort"
	non_fatal
	rxresp
	expect resp.body == "before "
} -run

server s1 -wait
server s1 -start

varnish v1 -vcl+backend {
	sub vcl_backend_response {
		# Prevent cache hits in the next tests.
		set beresp.uncacheable = true;
		set beresp.do_esi = true;
		set beresp.do_stream = false;
	}
	sub vcl_deliver {
		if (req.esi_level > 0 && resp.status != 200 &&
		    resp.status != 204) {
			set resp.status = 200;
		}
	}
}

client c1 {
	txreq -url "/abort"
	non_fatal
	rxresp
	expect resp.body ~ "^before "
	# Guru Meditation from the include in between
	expect resp.body ~ " after$"
} -run

server s1 -wait
server s1 -start

varnish v1 -vcl+backend {
	sub vcl_backend_response {
		## WEIRDNESS
		set beresp.do_esi = true;
	}
	sub vcl_deliver {
		if (req.esi_level > 0 && resp.status != 200 &&
		    resp.status != 204) {
			set resp.status = 200;
		}
	}
}

client c1 -run
```
from varnish-cache.
The VTC test passes, and the first `client -run` illustrates the problem.

The server response does not have `onerror="continue"` (so for this test, it doesn't matter whether `esi_include_onerror` is turned on). In all three `client -run`s, the snippet in `vcl_deliver` that was expected to prevent failure of the parent response is included in the VCL.

As in `e00035.vtc`, the server response is `before <esi:include src="/fail"/> after`, and the response for `/fail` fails because it sets `Content-Length` > 0 but sends an empty body.

In the first `client -run`, the response body seen by the client is `before ` -- the ESI include failed, and the rest of the body with `after` is left off. This is the unexpected behavior, because we thought the `vcl_deliver` snippet should have prevented that.

In the last two `client -run`s, the client sees `before <Guru Meditation from the ESI> after` in `resp.body`, as we expected. For the second `client -run` (the first of the two with the expected result), this is part of `vcl_backend_response`:

```
set beresp.do_esi = true;
set beresp.do_stream = false;
```

which made me think that the problem is related to streaming.

Now the weird part. In the first `client -run`, we have this in `vcl_backend_response`, exactly as in `e00035.vtc`:

```
set beresp.do_esi = beresp.http.surrogate-control ~ "ESI/1.0";
unset beresp.http.surrogate-control;
```

The server sets the Surrogate-Control header with matching contents, so in effect `beresp.do_esi` is set conditionally, but turns out to be `true`. This was the case for which the client response is truncated, which was the unexpected result.

In the third `client -run`, we have just this in `vcl_backend_response`:

```
sub vcl_backend_response {
	set beresp.do_esi = true;
}
```

That is, `beresp.do_esi` is set unconditionally to `true`, and `beresp.do_stream` is left at its default, which is `true`.

In that case, we don't get the unexpected result. With the snippet in `vcl_deliver` that sets `resp.status = 200` when the ESI level > 0, the client sees `before <Guru> after` in `resp.body`. Hence the comments in the VTC about weirdness.
---
What I think is happening here:

For the first client run, `do_esi` is `true` only for the request to `/abort` and, consequently, `do_stream` is `false` also only for the request to `/abort`. That is, `do_stream` is `true` for `/fail`.

VCL sees the status 200 and includes the streaming response of the request to `/fail`, but then streaming fails, and that failure also fails the top request.

For the other two client runs, `do_stream` is `false` in both cases: explicitly in the first, and implicitly in the second, because `set beresp.do_esi = true;` applies to both requests (and ESI processing disables streaming).

Possible solutions:

So, for now, I think the only workaround is to always set `do_stream = false` for any potentially ESI-included content which is not itself ESI.

From a Varnish-Cache perspective, two options come to my mind:

- ignore streaming errors for ESI includes
- add some option (parameter + VCL variable?) with the same effect as `onerror="continue"`
---
Could this be caused by 582ded6? Can you please try this commit and its parent? (pun intended)
---
Without having tried it, I think this is originally caused by this change from 26097fb:

```diff
 	req->doclose = SC_REM_CLOSE;
 	req->acct.resp_bodybytes += VDP_Close(req->vdc);
+
+	if (i && ecx->abrt) {
+		req->top->topreq->vdc->retval = -1;
+		req->top->topreq->doclose = req->doclose;
+	}
 }
```

This is where we originally started aborting the parent when an include's delivery fails.
---
@dridi cherry-picking that commit and its parent onto 7.4.2 results in conflicts. Instead of fiddling through that, I just ran the VTC in this issue against current master, and it passed.
Meaning that master also exhibits the unexpected result. Does that answer what you were getting at?
---
So... I must admit I didn't read the whole text initially.
I introduced onerror support guarded by a feature flag in 26097fb and the default behavior changed in 582ded6.
I still have not read everything, only the ticket description. I'd like to mention that ESI processing has to disable streaming, because we need to process the beresp body before we can pass the object to clients. So there are no latency savings on an ESI fetch.
I'll read the third comment (after the VTC) later, but from the look of it we lack VCL control over ESI onerror support. We could copy the feature flag to a req flag exposed in VCL upon entering vcl_deliver, and only use this flag to control the ESI delivery.
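For illustration only, a hedged sketch of how such a VCL-exposed flag might be used. The variable name follows the `resp.esi_include_onerror` proposal that comes up later in this thread; the exact semantics were still under discussion at this point, so treat this purely as a sketch of the idea, not a released API:

```
sub vcl_deliver {
	# Hypothetical: a per-response copy of the esi_include_onerror
	# feature flag, adjustable on the parent response before ESI
	# delivery of its includes starts.
	if (req.esi_level == 0) {
		set resp.esi_include_onerror = false;
	}
}
```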
---
> So, for now, I think the only workaround is to always set `do_stream = false` for any potentially ESI-included content which is not itself ESI.

@nigoroll sounds like it might work, but the problem is that in `vcl_backend_response`, there's no way to know "this beresp is an ESI leaf" (and the site at which the problem was noticed doesn't have a way to determine that, such as from the path or a header).

Come to think of it, that's what `resp.can_esi` and `obj.can_esi` do -- they're `true` if the `OA_ESIDATA` attribute exists. But I can't think of a way to get from there to something useful in `v_b_r` -- there's no way to know about `ESIDATA` until you've parsed the body.
---
@slimhazard what about something like this:

```
sub vcl_miss {
	if (req.esi_level > 0) {
		set req.http.no-stream = "1";
	} else {
		unset req.http.no-stream;
	}
}

sub vcl_backend_response {
	if (bereq.http.no-stream) {
		set beresp.do_stream = false;
	}
}
```
---
@nigoroll it looks like that works, this test passes against master:

```
varnishtest "VCL workaround for #4053"

server s1 {
	rxreq
	expect req.url == "/abort"
	txresp -body {before <esi:include src="/fail"/> after}

	rxreq
	expect req.url == "/fail"
	txresp -hdr "content-length: 100" -nolen
	delay 0.1
} -start

varnish v1 -cliok "param.set feature +esi_disable_xml_check"

varnish v1 -vcl+backend {
	sub vcl_miss {
		if (req.esi_level > 0) {
			set req.http.no-stream = "nope";
		} else {
			unset req.http.no-stream;
		}
	}
	sub vcl_backend_response {
		set beresp.do_esi = true;
		if (bereq.http.no-stream) {
			set beresp.do_stream = false;
		}
	}
	sub vcl_deliver {
		if (req.esi_level > 0 && resp.status != 200 &&
		    resp.status != 204) {
			set resp.status = 200;
		}
	}
} -start

client c1 {
	txreq -url "/abort"
	non_fatal
	rxresp
	expect resp.body ~ "^before "
	# Guru Meditation from the include in between
	expect resp.body ~ " after$"
} -run
```

```
$ bin/varnishtest/varnishtest -i bin/varnishtest/tests/e00101.vtc
# top TEST bin/varnishtest/tests/e00101.vtc passed (1.014)
```

Notice that there's no `## WEIRD` part here about setting `beresp.do_esi = true`.

I'll try the same test against 7.4.2.
---
Can confirm that the VCL workaround works in 7.4.2:

```
$ ./bin/varnishd/varnishd -V
varnishd (varnish-7.4.2 revision cd1d10ab53a6f6115b2b4f3b2a1da94c1f749f80)
Copyright (c) 2006 Verdens Gang AS
Copyright (c) 2006-2023 Varnish Software
Copyright 2010-2023 UPLEX - Nils Goroll Systemoptimierung

$ bin/varnishtest/varnishtest -i bin/varnishtest/tests/e00101.vtc
# top TEST bin/varnishtest/tests/e00101.vtc passed (1.015)
```

So we have a solution for now at the site, and presumably this helps to understand what's going on in Varnish.
---
Now I understand the problem better. You have an ESI top request, but sub-requests may be streamed when they aren't ESI requests themselves. And a streaming-but-failing 200 response will still be caught by `esi_include_onerror`.

> @nigoroll it looks like that works, this test passes against master:

Maybe, but this statement makes no difference in the test case:

```
set beresp.do_stream = false;
```

You can remove it or comment it out and the test case will still pass. The whole construct only makes sense if `do_esi` is conditional.

You could do this instead:

```
varnishtest "VCL workaround for #4053, take 3"

server s1 {
	rxreq
	expect req.url == "/abort"
	txresp -hdr {surrogate-control: content="ESI/1.0"} \
	    -body {before <esi:include src="/fail"/> after}

	rxreq
	expect req.url == "/fail"
	txresp -hdr "content-length: 100"
	delay 0.1
} -start

varnish v1 -cliok "param.set feature +esi_disable_xml_check"

varnish v1 -vcl+backend {
	sub vcl_recv {
		set req.http.esi-level = req.esi_level;
	}
	sub vcl_backend_response {
		set beresp.do_esi = beresp.http.surrogate-control ~ "ESI/1.0";
		set beresp.do_stream = bereq.http.esi-level == "0";
	}
	sub vcl_deliver {
		if (req.esi_level > 0 && resp.status != 200 &&
		    resp.status != 204) {
			set resp.status = 200;
		}
	}
} -start

client c1 {
	txreq -url "/abort"
	non_fatal
	rxresp
	expect resp.body ~ "^before "
	# Guru Meditation from the include in between
	expect resp.body ~ " after$"
} -run
```
---
> I'll read the third comment (after the VTC) later, but from the look of it we lack VCL control over ESI onerror support.

I was wrong, but it doesn't mean that it was a bad idea, see #4054.
---
@dridi thanks for the PR. I put a thumbs-up on it, so that will help it get merged, of course.

If the PR is merged: it appears to me that, to also have ESI contents included even if `onerror="continue"` is missing, then in addition to setting `resp.esi_include_onerror`, they'll also need the part about setting `resp.status` to 200. Is that correct?
---
The tricky part about #4054 is that `resp.esi_include_onerror` is set for the parent request, whereas the `resp.status` check is performed on direct sub-requests.

So, you could need both fine-grained control of the flag and a status workaround, or just disable the flag when you know the backend doesn't specify `onerror="continue"`. I'm not so sure actually, I should probably add coverage for that.

One thing I'm also wondering is whether to always initialize this from the parameter, or do that only for `req_top` (technically, `resp_top`, except that it does not exist) and then inherit the default value from `req_top` in sub-requests.
---
During bugwash, @dridi and myself had a discussion which was partly confusing to me. This comment is an attempt to clarify. Please correct me if you think that I am wrong:

The case we are looking at goes back to 26097fb, where we would fail the top-level ESI request if delivery of an ESI sub-request's object resulted in an error, `esi_include_onerror = true` and the `onerror=continue` attribute was not present.

In 582ded6 this was changed: we would now fail the top-level ESI request when `esi_include_onerror = false` **or** the `onerror=continue` attribute was not present.

The idea was that the new behavior would be closer to the standard and could still be avoided by overwriting the status, as mentioned in the changes.rst entry.

What we overlooked at the time is that, with streaming, we do not have the final status: it could be 200 as inspected by VCL, but 503 as seen by the ESI VDP.

While we never stream ESI objects, ESIs may include ordinary objects, for which streaming is supported. This is @slimhazard's initial test case.

So the workaround is to disable streaming for non-ESI includes of ESIs.

#4054 proposes a sensible thing, to give VCL control over `esi_include_onerror`, but that does not help with this issue. We would need a flag for "always assume `onerror=continue`".

(edit: bold)
---
Pretty good summary, thanks.

> So the workaround is to disable streaming for non-ESI includes of ESIs.

The VTC I submitted does just that, hopefully disambiguating the current behavior.

> #4054 proposes a sensible thing, to give VCL control over `esi_include_onerror`, but that does not help with this issue.

As it states in the description, it was "inspired by" this ticket and never claimed to solve the problem.

> We would need a flag for "always assume `onerror=continue`".

And I believe that would amount to a partial revert of 582ded6. I think it's a good idea to consider non-200/204 status codes as errors, but that should have been the extent of the change.

With 26097fb we already had this behavior of assuming `onerror=continue` unless the flag is raised. The notion of a failure should have been extended from `return(fail)`-or-equivalent-only to `return(fail)`-or-unfit-status, and nothing more.
---
How about `esi_include_onerror = attribute | abort | continue`?

- `attribute`: continue only if `onerror="continue"` is present
- `abort`: stop ESI processing on any error, irrespective of the attribute
- `continue`: continue ESI processing as before 26097fb

(and the same for the new VCL variable)
---
As we are at it: the ESI TR specifies:

> If an ESI Processor can fetch neither the src nor the alt, it returns a HTTP status code greater than 400 with an error message, unless the onerror attribute is present. If it is, and onerror="continue" is specified, ESI Processors will delete the include element silently.

The first aspect, returning a status > 400, would imply buffering the ESI response until all subrequests are done, which we certainly won't do, so I think we should decide to deliberately not follow the TR.

But regarding the second aspect: this would, IMHO, imply disabling streaming and replacing failed included objects with empty strings. So maybe we should document that, even with something like `esi_include_onerror = attribute`, one would need special VCL handling to be ESI compliant, to ensure that an errored include's body is discarded (e.g. by going to `synth()`).
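One possible shape of that special VCL handling, sketched here as an untested assumption (whether a synthetic response is honored at `req.esi_level > 0` may depend on the Varnish version; the status threshold is illustrative):

```
sub vcl_deliver {
	# Inside an include whose fetch failed: replace the body
	# with a synthetic one instead of delivering the error.
	if (req.esi_level > 0 && resp.status >= 400) {
		return (synth(200));
	}
}

sub vcl_synth {
	# Deliver an empty body in place of the failed include,
	# so the parent response continues without the fragment.
	if (req.esi_level > 0) {
		set resp.body = "";
		return (deliver);
	}
}
```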
---
I wrote a shorter test case:

```
varnishtest "esi kerfluffle"

server s1 {
	rxreq
	expect req.http.esi-level == 0
	txresp -body {before <esi:include src="/fail"/> after}

	rxreq
	expect req.http.esi-level == 1
	txresp -status 500 -hdr "transfer-encoding: chunked"
	delay 0.1
	chunked 500
} -start

varnish v1 -cliok "param.set feature +esi_disable_xml_check"

varnish v1 -vcl+backend {
	sub vcl_recv {
		set req.http.esi-level = req.esi_level;
	}
	sub vcl_backend_response {
		set beresp.do_esi = bereq.http.esi-level == "0";
	}
	sub vcl_deliver {
		if (req.esi_level > 0 && resp.status != 200) {
			set resp.status = 200;
		}
	}
} -start

client c1 {
	txreq
	rxresp
	expect resp.body == "before 500 after"
} -run
```

It fails since 582ded6, which is what really changed the interpretation of `onerror` (or the lack thereof). It should have only added "undesirable status codes" to the list of things considered an error and left the `onerror` handling as-is.

If there is one thing to fix, it's that.
---
FYI:

We (@slimhazard, myself and a Varnish-Cache user) have now established that the VCL code to restore the Varnish-Cache 7.2 behavior is a little more involved than originally assumed. The required workaround is something along these lines (no news in here, just a summary):

- in `vcl_miss {}` and `vcl_pass {}`:

  ```
  if (req.esi_level > 0) {
      set req.http.no-stream = "nope";
  } else {
      unset req.http.no-stream;
  }
  ```

- in `vcl_backend_response {}`:

  ```
  if (bereq.http.no-stream) {
      set beresp.do_stream = false;
  }
  ```

- in `vcl_synth {}` and `vcl_deliver {}`, before every `return (deliver)`:

  ```
  if (req.esi_level > 0 && resp.status != 200 && resp.status != 204) {
      set resp.status = 200;
  }
  ```

We are now working on a simplified solution for pESI, which we would like to give up once we have settled on a solution for Varnish-Cache. In my mind, this could be to

- either replace the `esi_include_onerror` feature flag with a tri-state `esi_include_onerror = attribute | abort | continue`,
- or add another flag `esi_include_onerror_override`,
- or, as Dridi suggested, a partial revert of 582ded6, but we would need to discuss how this would look exactly.

In addition, all solutions should come with VCL control.
---
I agree that this needs to always be under VCL control.
On the other hand, I'm not keen on this becoming more complicated than it already is :-/
---
I submitted my suggestion: #4065.