
turtledove's Introduction

FLEDGE has been renamed to Protected Audience API. To learn more about the name change, see the blog post

TURTLEDOVE

Some online advertising has been based on showing an ad to a potentially-interested person who has previously interacted with the advertiser or ad network. Historically this has worked by the advertiser recognizing a specific person as they browse across web sites, a core privacy concern with today's web.

The TURTLEDOVE effort is about offering a new API to address this use case while offering some key privacy advances:

  • The browser, not the advertiser, holds the information about what the advertiser thinks a person is interested in.
  • Advertisers can serve ads based on an interest, but cannot combine that interest with other information about the person — in particular, with who they are or what page they are visiting.
  • Web sites the person visits, and the ad networks those sites use, cannot learn about their visitors' ad interests.

Chrome has been running a FLEDGE Origin Trial since milestone 101 (March 2022). For details of the current design, see the FLEDGE explainer or the in-progress FLEDGE specification.

The FLEDGE design draws on many discussions and proposals published during 2020.

Many additional contributions came from Issues opened in this repo, and from discussion in the W3C Web Advertising Business Group.

turtledove's People

Contributors

abrik0131, alexmturner, appascoe, blu25, brusshamilton, caraitto, dlaliberte, dmdabbs, domfarolino, gtanzer, jacobgo, jensenpaul, jurjendewal, jyasskin, kevinkiklee, kgraney, mattmenke2, michaelkleber, miketaylr, morlovich, orrb1, peiwenhu, qingxinwu, samdutton, shivanigithub, subhagam, thegreatfatzby, xiaochen-z, xtlsheep, yoavweiss


turtledove's Issues

Server side components in TURTLEDOVE

Hello Michael,

During the last IWA call, you alluded to the existence of a server-side component in all proposals.

In SPARROW, this takes the form of what we called and detailed as "the gatekeeper".

In TURTLEDOVE, we can guess at some of the server-side functions. But it would be very beneficial for all of us to see them clearly highlighted, and to see how their privacy-preserving properties would be established, controlled, and made transparent to users, publishers and advertisers.

Could you elaborate on what those server-side services would be (i.e. not in the browser)?

Lack of live feedback

TURTLEDOVE's current emphasis on two separate ad requests without feedback threatens some core value components of today's data-driven marketing environment. Most importantly, receiving live feedback that an ad was served, and the price of that impression, is imperative for an advertiser to be successful in the programmatic ad buying process. In programmatic, budgets and their allocation are managed in real time to ensure smooth and predictable delivery of advertising dollars. The supply curve fluctuates dynamically based on when and where inventory and users are available, and the decision of if and when to bid can change drastically in a matter of seconds. Without real-time responses to our bids, our advertisers are at risk of major over- or under-spending of their budgets. This uncertainty in spend will inevitably lead to lower spend levels from advertisers.

In order to precisely pace a campaign, budgeting algorithms account for changes in supply and fluctuations in win rates several times per second. The real-time volatility in both the volume available to an audience and the likelihood of winning an auction leaves margins of error for any predictive model that would be unacceptable to advertisers. In extreme cases, an advertiser looking to spend $10,000 in a day could spend upwards of $1,000,000 before a DSP is notified that the budget was spent.
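As a toy illustration of why live feedback matters to pacing, here is a minimal sketch (all names and numbers are invented, not from any proposal): the pacer can only throttle bidding if `recordWin` is actually called with win/price feedback, which is exactly what the two-request design withholds.

```javascript
// Toy budget pacer: throttling depends entirely on live win/price feedback.
function makePacer(dailyBudget) {
  let spent = 0;
  return {
    // Called when feedback reports a won impression and its clearing price.
    recordWin(price) { spent += price; },
    // Without feedback, `spent` never advances and bidding never throttles.
    shouldBid() { return spent < dailyBudget; },
  };
}

const pacer = makePacer(10000);
pacer.recordWin(2.5); // one reported win at a $2.50 CPM
```

If wins are only reported in aggregate, hours later, `spent` lags reality and the pacer keeps bidding well past the budget, which is the overspend risk described above.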

Lastly, interest-group bidding leaves very large questions around billing and auditing. With only aggregated reporting, settlement between buying and selling parties becomes obscured. The current audit and discrepancy process relies on impression-level reporting, as billing errors can often be traced back to a single impression that was processed incorrectly. Without knowledge of which impression caused the issue, all parties are left with too little information to come to a settlement. With several companies in the space being publicly traded entities, transparency into this process for auditors is critical.

Are the interest-group membership APIs expected to run on the main thread?

The auction functions are expected to be run within a new sandboxed environment different from the main thread.

  • Are the new interest-group membership APIs expected to run on the main thread, that is, not limited by the constraints of the auction execution sandbox?
  • Is the scope of the new sandbox generally limited to the execution of auction functions or are there other use cases being considered?

Selecting the interest group ad?

I'm giving the proposal a close reading and have a question about the mechanics of the in-browser auction (particularly around the selection of the interest-group-targeted ad which will compete against the contextually targeted ad).

I noticed the text says "The response requests an in-browser auction to pick between a contextually-targeted ad and the previously-fetched interest-group-targeted ad" (emphasis added). This language implies that the auction is a two-party affair where a single contextually targeted contender competes against a single interest-group-targeted ad.

My question is how the interest-group-targeted contender is chosen in the case where a browser is a member of multiple interest groups.

Say for example I visit shoe.com and read an article about basketball shoes (and am added to an interest group basketball-shoes), and then later I read an article about running shoes and am added to an interest group running-shoes. It's reasonable to imagine that Nike would want to advertise to me to sell me basketball shoes, while Saucony would want to advertise to me to sell me running shoes.

When first reading the proposal, I had originally assumed that the browser would (asynchronously) fetch a basketball-shoe ad (and bidding logic) from Nike and a running shoe ad (and bidding logic) from Saucony (or rather fetch one for each party from a given ad network). Then when I visit localnewspaper.com, the browser would either:

  1. Run an auction between Nike and Saucony to get a "winner" interest group ad
  2. Have a "run off" auction of that winner vs. the contextually targeted ad

or:

  1. Have the in-browser auction directly compare bids from Nike vs. Saucony vs Contextual.

But on closer re-reading, I don't see anything about how the Nike vs. Saucony decision would be made. I don't see any obvious problem with allowing an auction with n parties but wanted to make sure there isn't an alternative vision (or that I'm not missing something fundamental).

Am I overlooking some part of the proposal that would ensure that there would never be multiple competing parties who would want to try to do interest-group-targeted advertising in a single TURTLEDOVE slot? Or is there an existing vision for how that competition would be resolved?

(This is somewhat related to #34, but that issue seems to focus on multiple groups of interest to the same party, as opposed to multiple groups of interest to different parties.)
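A direct n-party comparison (the third option above) could be sketched as follows; the party names and bid values are invented for illustration and nothing here is from the proposal itself:

```javascript
// Sketch of an n-party in-browser auction: every interest group's bid and
// the contextual bid compete directly, with the highest value winning.
function runAuction(bids) {
  // bids: array of { party, value }; assumes at least one bid is present.
  return bids.reduce((best, b) => (b.value > best.value ? b : best));
}

const winner = runAuction([
  { party: 'nike-basketball-shoes', value: 1.2 },
  { party: 'saucony-running-shoes', value: 1.5 },
  { party: 'contextual', value: 0.9 },
]);
```

Under this sketch there is no separate Nike-vs-Saucony "run off"; the question in the issue is whether the proposal intends this flat comparison or a two-stage one.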

Should consider a mechanism to refresh the ads

Refreshing ads is an important aspect of advertising campaigns, and the interest-group request happens at a random time between joining the group and seeing an ad. But ads change over time and might need to be paused. It would be a good idea to have a refresh mechanism with some decently short interval (such as once or a couple of times a day).

Limit to number of API calls

The limit on how many API calls, such as joinAdInterestGroup, an owner can make should be large. When publishers work together to create large-scale interest groups, the efforts are typically coordinated by a single owner. As a result, the limit on the number of daily API calls allowed for a single owner across all browser instances should be at least in the tens of billions.
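For reference, a single registration call might look roughly like this. The shape follows the explainer's example (an owner/name descriptor plus a membership duration in seconds), but the exact field set should be treated as illustrative rather than a final API surface:

```javascript
// Interest-group descriptor modeled on the explainer's example;
// field names are illustrative and the API shape may change.
const kSecsPerDay = 24 * 60 * 60;
const myGroup = {
  owner: 'www.wereallylikeshoes.com',
  name: 'athletic-shoes',
  readers: ['first-ad-network.com'],
};

// In a browser implementing the proposal, a script coordinated by the
// group owner would register membership for 30 days:
if (typeof navigator !== 'undefined' && navigator.joinAdInterestGroup) {
  navigator.joinAdInterestGroup(myGroup, 30 * kSecsPerDay);
}
```

The rate-limit question above is about how many such calls a single `owner` may issue per day across all browsers.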

Limit to number of interest groups

The current TURTLEDOVE text states that interest group names may only be between 00-99. I propose that we allow a higher limit. It's common practice for groups of publishers to work together to create large-scale interest groups. These groups will be in taxonomies with tens of thousands of other groups. To support this use case, the limit on the number of interest groups should be at least 500,000.

Publisher controls and number of contextual bid responses

What controls will the publisher have around the contextual bids? And assuming that there are controls the publisher can use to filter out ads from advertisers they don't approve of, is the intent for the contextual bid to return multiple bids, or will there simply be a different mechanism to regulate the contextual bid that doesn't involve the browser?

This will also impact the work for exchanges, since they'll have multiple partners that buy from them and they'll need to select the best possible bid; if they are limited to one, this would result in a drop in yield for the publisher.

E.g.: nice-shoes-fan.com has relationships with exchange1.com and exchange2.com. Upon the contextual request, with only a single response possible, the publisher would end up with a contextual payload containing a bid of $2.50 from very-nice-shoes-retailer.com through exchange1 and $5.00 from never-nice-shoes-retailer.com through exchange2. However, the publisher doesn't want to work with never-nice-shoes-retailer.com and thus discards that bid, leaving the $2.50 bid to win. But exchange2 had a second bid at $4.00 for a different retailer that would have been approved. In this case the publisher left $1.50 CPM on the table.
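A worked version of this example; the domain names and bid values are taken from the scenario above, with one hypothetical fallback retailer added:

```javascript
// Publisher-side blocklist filtering, with one bid per exchange versus
// the exchange's full bid list.
const blocked = new Set(['never-nice-shoes-retailer.com']);

const singleBidResponses = [
  { exchange: 'exchange1.com', advertiser: 'very-nice-shoes-retailer.com', cpm: 2.5 },
  { exchange: 'exchange2.com', advertiser: 'never-nice-shoes-retailer.com', cpm: 5.0 },
];

// Hypothetical second bid exchange2 held back in the single-response case.
const allBids = singleBidResponses.concat([
  { exchange: 'exchange2.com', advertiser: 'other-retailer.example', cpm: 4.0 },
]);

const bestApproved = (bids) =>
  Math.max(...bids.filter((b) => !blocked.has(b.advertiser)).map((b) => b.cpm));
```

With one bid per exchange the best approved CPM is $2.50; with the full list it is $4.00, which is the $1.50 CPM gap described above.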

Ad Signals in the auction and group membership

One of the most important features of retargeting is how recently the user agent became a member of an interest group. More generally, there are obviously many other features similar to that one.

This proposal doesn't address, probably due to scope, how or whether those would be implemented in some form even post third-party cookies, and limits itself to adSignals, which look to be relatively simple signals.

Considering that the auction runs locally in the browser, would it be possible to introduce the notion of signals provided by the browser? These signals could be quantized or binned so that each bucket covers a minimum, non-identifiable number of members, and then passed into the auction function.

For example, membership of the group could be bucketed as UAs that joined the group less than 1, 4, 7, 14, 21, or 28 days ago. The browser could also provide the number of groups it is a member of for a given owner, and so on. Effectively, this would be a form of modeling of the UA within the browser, write-only from the outside and readable only locally, which would preserve privacy.

Lastly, with the understanding that it may be too risky, what would be the risk of providing global lists of memberships not tied to a specific domain but limited in number (for example, only 100 global groups per browser assigned to my ad network domain)? Those would only be usable in the auction if enough browsers are members of those groups.
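A sketch of the recency-binning idea, using the bucket edges suggested above (the function and bucket labels are illustrative, not part of any proposal):

```javascript
// Quantize join recency into coarse buckets so the signal is shared by
// many users rather than identifying one.
const BUCKET_EDGES_DAYS = [1, 4, 7, 14, 21, 28];

function recencyBucket(joinedMs, nowMs) {
  const days = (nowMs - joinedMs) / (24 * 60 * 60 * 1000);
  for (const edge of BUCKET_EDGES_DAYS) {
    if (days < edge) return `<${edge}d`;
  }
  return '>=28d'; // older than all edges: coarsest bucket
}

const now = Date.now();
const bucket = recencyBucket(now - 2 * 864e5, now); // joined 2 days ago
```

The browser would pass only the bucket label, never the exact join timestamp, into the auction function.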

Clarification on Turtledove goals

Hello,
Reading the turtledove proposal, there is one sentence I am not sure how to interpret:
"Web sites the person visits (...) cannot learn about their visitors' ad interests"

Does it mean:

  • simply that a web site should not be able to link any specific user with an interest group (but is still allowed to get the distribution of interest groups on its visitors) ?
  • or that it cannot even get the distribution of interest groups of its visitors ?

I believe it is the first, which seems to be the only part implied by the privacy model, but I find the formulation a bit ambiguous: it may suggest the second one.
Could you confirm which interpretation is correct?

Default capabilities of cross domain iframe

By default, cross-domain iframes should be able to call the TURTLEDOVE APIs. Today, groups of publishers often deploy iframes managed by a single owner to create large scale interest groups using third-party cookies. Updating these iframes can be a challenge for publishers. If the iframes are able to call the TURTLEDOVE APIs by default, the iframe owners can migrate to TURTLEDOVE, while the interest group creation will continue to work for the publishers, without the publisher having to make any changes.

Training ML models

Hi @michaelkleber,

I am trying to figure out how, in this framework, one can use ML models inside the JS function. In particular, I was thinking of all the ML models that use third-party cookies of users who already converted to predict which new users will be likely to convert based on their browsing behavior, and I can't see how these kinds of models could be trained or used in this framework.
Do you have any idea how this could be achieved?
Thanks in advance,
Best
Luca

Clarification on Entities

We were having a meeting the other day, and it dawned on us that we had some confusion between us on which entities are being called in the different requests on TURTLEDOVE. We'd appreciate some clarification. With the two different interpretations we have, we see different challenges.

Entities

The TURTLEDOVE doc refers to some entities as "ad networks," for example first-ad-network.com and second-ad-network.com. It's a little confusing what exactly these refer to. I see the breakdown as:

  • Advertiser: An entity that wishes to advertise its offerings.
  • Publisher: An entity that wishes to sell inventory for ads.
  • DSP: An entity that represents multiple advertisers. It submits bids to SSPs on behalf of its advertisers.
  • SSP: An entity that represents multiple publishers. It receives bids from DSPs and runs auctions, selling its publishers' inventory.

Some entities may be a mix of these responsibilities, but for the sake of argument, let's consider them separately.

On an advertiser's page, a DSP has a pixel that would add the browser to a set of interest groups. In this scenario, first-ad-network.com would be a DSP server. The interest group request needs to make a request to first-ad-network.com, and so the interest group response (containing partial bid data) would be derived from the DSP's servers as well.

Subsequently, there's the contextual request. According to the TURTLEDOVE docs:

An interest-group request: An additional ad request, of a new and different type, is constructed by the browser and sent to the same publisher ad network.

This implies that the interest group request and contextual request call out to the same entity, first-ad-network.com, already defined to be a DSP. (In addition, in #20 I see in the discussion, "I was imagining that each piece of in-browser JS would receive signals from one ad network — the same ad network that wrote the JS in the first place.") However, also as described in #20:

Here's what happens at the time of a page visit, calling out the things that I glossed over in the explainer:

  1. Person navigates to publisher page

  2. Publisher's ad network has a script on the page which issues the contextual/1p ad request to their ad server, like today. This includes all the normal information about what page the ad would appear on.

  3. Server-side, some exchange sends RTB call-outs to various DSPs, including contextual and 1p signals. In today's world, the responses are bids that go into an auction.
    In a TURTLEDOVE world: The DSP's response could include more stuff — some signals encoding that DSP's opinion about the topic of the publisher page.

This seems to indicate that the contextual request goes to an SSP instead of a DSP.

DSP/DSP Challenge

Assuming that both requests go to the same DSP, it seems SSPs would have no fundamental role in a TURTLEDOVE world, and would instead have to pivot to being DSPs themselves. It would be incumbent on DSPs to have their domains included on as many publisher ad-network lists as possible. Since adding a domain to a text file is relatively low-friction for a publisher, I can't really see how exchanges provide any significant value in this scenario.

Reading:

In the latter case, a URL like https://first-ad-network.com/.well-known/ad-partners.txt can list the domain names of other ad networks that first-ad-network buys space from, and a public key that the browser can use to encrypt the interest group information while it is passing through other ad networks. (Probably this should be a part of the IAB ads.txt spec, instead of a new .well-known file; it's similar to their "authorized sellers" — and they can come up with a better name than "ad-partners" for the relationship.)

Does this mean that the DSP dsp.com would be able to write interest groups under the SSP's name, ssp.com? Even so, it still feels like there is a strong incentive for the DSP to create relationships with publishers directly.

If it's supposed to function like this, there's another issue. The DSP dsp.com writes interest_group=www.wereallylikeshoes.com_athletic-shoes into the browser under the SSP's domain ssp.com. But then later, the browser calls:

GET https://ssp.com/.well-known/fetch-ads?interest_group=www.wereallylikeshoes.com_athletic-shoes

However, dsp.com is the entity generating the response. Are we expecting ssp.com to forward this request to dsp.com? Why? That seems like unnecessary additional traffic.

DSP/SSP Challenge

The issue here has to do with the contextual response. Given that the bidding.js function has signature function(adSignals, contextualSignals), it's unclear what the SSP would actually include in the contextualSignals object and how it gets passed around:

  1. Assuming that the SSP has coordinated a bunch of responses from DSPs, does the contextualSignals object contain data from all DSPs, or from some "winner(s)" that the SSP predetermines? If it contains all data, this would seem to imply that every DSP's bidding.js would receive contextual signals from all DSPs integrated with the SSP. If it only contains a subset, then not all DSP bidding.js functions can effectively execute; this is problematic because no interest group data was available during the SSP's selection, and valuable opportunities (for the DSP, SSP, advertiser, and publisher) are missed.

  2. If the contextualSignals object contains information that is solely derived by the SSP (without DSP input, that is), this would seem to hamper a DSP's ability to control its own bids on contextual opportunities, or really, even have control over its own bids when interest groups are involved. From a DSP's perspective, it's desirable to apply ML techniques to both the contextual and interest group requests, and have the browser combine them consistently.


The documentation seems ambiguous to us. Which of these scenarios is intended, or is it neither?
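To make scenario 2 concrete, here is a minimal sketch of a bidding function with the signature quoted above. The field names inside the signal objects (baseCpm, pageQuality) are hypothetical; only the two-argument signature comes from the proposal:

```javascript
// Minimal bidding.js shape: the browser combines the DSP's stored
// interest-group signals with whatever the contextual response carried.
function bid(adSignals, contextualSignals) {
  // If contextualSignals is SSP-derived only (scenario 2), the DSP's own
  // contextual model has no input here and can only scale its base bid.
  const base = adSignals.baseCpm || 0;
  const pageQuality = (contextualSignals && contextualSignals.pageQuality) || 1.0;
  return base * pageQuality;
}

const cpm = bid({ baseCpm: 2.0 }, { pageQuality: 1.5 });
```

The DSP/SSP question is exactly who gets to populate `contextualSignals` and whether the DSP's own contextual modeling can reach this function.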

TurtleDove for Search Ads

Search ads have a very heavy auction stack. Advertisers can bid on tens of thousands of keywords. Each auction consists of several stages, each of which runs their own complex algorithms. These stages include selection, relevance, click prediction, ranking, allocation and placement.

The models used in each algorithm consider signals coming from users’ entire sessions. Given this, there is often a large amount of data that contributes to the ads a user sees in the search engine results page. Based on our current evaluation of Turtledove, we do not see a way that a client-side auction will be able to scale to effectively meet the needs of search ads.

In the search context, the user issues a query and the search engine finds keywords closest to that query, and ads related to the query, based on the keyword-match algorithms advertisers have chosen. The auction uses the user's current and previous query and click history to decide the relevance and click probability of the ads. In addition, the auction includes remarketing-list membership, either to include/exclude sets of users or to modify the bid on the relevant keywords. Turtledove does not address search scenarios or how remarketing would work in that context. Scaling a complex auction with multiple inputs, including remarketing membership, to client-side JS will probably not work well.

Are there plans for a new API to address this scaling limitation? If not, how is Google planning to adopt Turtledove for search-based scenarios?

Thanks.

Dynamic Creative Use Case

Use Case

A common use case for online advertising is dynamic creative. While there are
many recommendation algorithms that would function well in the TURTLEDOVE
framework, one of the most basic applications is somewhat problematic:
recommending the set of products the user has already viewed, an "identity"
recommender.

Advertisers with a large number of products could face issues where certain products are
viewed so infrequently that just using an interest group would not be sufficiently
differentially private. Smaller advertisers may be completely locked out of this
functionality altogether. As such, a solution would be useful to a broad base of
clients.

Proposed Solution

Perhaps I am missing something, but I propose that, in addition to the interest
request, advertisers have an opportunity to write web bundles into the browser
while the user is on the advertiser's site, when a pixel fires. I see that:

  • Advertisers would be able to add very granular data into the web bundle that would
    enable individual products to be recommended, regardless of how many views they receive.
  • There appear to be no additional privacy concerns: since these data would be
    written while on the advertiser's site, first-party cookie tracking would be
    completely available anyway.
  • This capability has benefits beyond dynamic creative, as it could be applied to very
    granular interest groups in general to select much more targeted ads upfront. Indeed,
    the functionality could be fully duplicative of the interest group request in general,
    providing a bidding function and the interest group response package to later be
    combined with a contextual package in the browser when on a publisher site.

While perhaps this is an additional opportunity to write bundles beyond the interest
group request, I could see this as a replacement, or rather time-shift, of the interest
group request as a whole. For any advertisement, it provides an ability to use more
fine-grained machine learning models without revealing to advertisers any more
information than they already have.

If we wanted to avoid some inefficiency with advertisers sending web bundles back on
every pixel fired, we could provide some guarantee in the browser that if a set of
interest groups for an advertiser does not have an associated web bundle, the browser
would then at a random later time make an interest group request, complete with
differential privacy, as a last opportunity to provide a web bundle after the user has
left the advertiser's site. This gives advertisers a chance to hedge their bets without
inundating the client with loads of data, with some confidence that they won't
completely miss an opportunity for delivery.

One concern would be that an ad in the web bundle would have such a specific ID that
it would deanonymize the user, but the Aggregate Reporting API should handle this by
not reporting delivery data until it's met some differential privacy bar. Advertisers
would be incentivized to not provide IDs that are too granular so they could receive
reporting back in a timely manner.

Any additional thoughts, questions, and discussion are most welcome.

Attribution model

Hi @michaelkleber,
thanks for the interesting proposal.
I think that every ad network would be interested in measuring the performance associated with its campaigns (now done using attribution models, which rely heavily on third-party cookies), and I cannot find a way to achieve this in your proposal. Am I mistaken? Are you aware of any proposal which addresses attribution without third-party cookies?
Thanks in advance
Luca

Ad Pricing

Ad space pricing varies from publisher to publisher, and the value of a given ad displayed on a given publisher will vary from buyer to buyer. Completely restricting any knowledge of the publisher for interest-group ads at ad-request time will force complex pricing logic to shift into the browser.

This can greatly increase demands on the client system, possibly leading to degrading performance and user experience.

Buyers and sellers often hold their pricing rules close to the vest. Sending pricing rules to the browser would also expose them publicly, which could be problematic for both parties and lead to price manipulation.

Finally, it's not clear from the proposal whether, if pricing were resolved client-side, there would be any way to report back what price was ultimately chosen.

Contextual vs. Interest Group based

I might miss the next call, so a quick note here. Digging into the various proposals and how they could be pieced together with TURTLEDOVE, there is a central piece that might need more clarity in the explainer (it's mentioned in the bidding-logic segment, and I guess the meaning of the contextual request is broader than you'd expect from its name).

An advertiser (as mentioned also in other contexts) will have certain must-have requirements in order to be willing to spend any budget, including (amongst others) brand safety, viewability, and fraud prevention. These are independent of the respective audience, as an advertiser will not want to bid on any ad inventory that does not meet them.

The interest-group mechanism and bidding process, taken in isolation, are mainly centred around the re-marketing use case and do not expose the data mainly needed to fulfil the above requirements (the advertiser would effectively be bidding on random inventory from his perspective), given that the sandbox's goal is to provide mechanisms that avoid information leaking.

This brings me to the "contextual bid" which will

  • cater for additional use-cases (classical campaigns with publisher driven audiences (open auction with 1st party data), programmatic guaranteed, IO campaigns, ...)
  • need to provide the necessary metadata to the sandbox for interest-group-based processing (re-marketing), so that an advertiser does not render a creative in an unsafe context etc. and can fall back to another one, thus ensuring the above-mentioned points.

The contextual-bid part as described above will mainly function with the existing mechanics we have today, with the notable difference that they will only be able to leverage first-party / server-side data, being stripped of third-party tracking IDs.

Does that reflect the intention, @michaelkleber? To me the wording "contextual bid" is a bit tricky, as contextual advertising is used mainly to describe ad placement that depends solely on the content of the page viewed, and is not necessarily widely known to also be leveraged with first-party audiences.

Added: an issue that will be hard to address is that the contextual bid and the interest-group-based one may not be won by the same advertiser, so the metadata would not be available from the contextual request. One might need to add a brand-safety callback to the interest-group definition.

System robustness to attacks from within

What are TURTLEDOVE protections against malicious browsers or any form of tampering with the bidding process or reporting happening in-browser?

In #20, it has been mentioned that:

each bidding script would be run by the browser in an isolated environment, where it and the publisher page cannot interact.

But it doesn't seem to cover cases where the browser itself is malicious or compromised (browser extensions, etc.).

Since the reporting will lead to payments by the advertisers, it is paramount that they can be assured that the bidding process was conducted fairly and that the reporting is accurate.

Limit on Number of Interest Groups / Web Bundles

In the Aggregate Reporting API repo, it's written:

Pending reports take up storage on the client’s device, so there should be some limits on the total storage this API can use per origin.

I presume we would want similar per-origin limits on interest groups and web bundles in TURTLEDOVE. I think this is worth explicitly stating in the explainer.

This leads to a follow-up question: I would also presume that we would want some global limit on the amount of storage occupied on the client's device. Is this accurate?

I would argue in favor of a global limit, but such a limit does open up some attack vectors. For example, a malevolent actor could, in principle, spoof being many origins and reach the quota for each, so that the data of prior, legitimate origins gets evicted. Perhaps a mechanism is needed such that only origins with a .well-known registration can add data to the browser. Maybe this is the intent, but I don't see it specified.

Beyond this, there doesn't appear to be anything in the specification that would prevent a malevolent actor from writing their own interest groups into the browser under a legitimate origin's name. This could result in eviction of legitimate data, or in nonsense requests during the interest group request.

Clarification on Interest group requests

Hello,
After reading the spec and the open and closed issues, there are still some bits I'm failing to grasp, most of them about the interest-group request.
Given an ad request, the context is the same for all possible advertisers, so all of them get the same info, and all can apply their logic and bid for the impression.
On the other hand, interest groups are unique per domain. Following the example in the proposal, a possible "interest space" of 00-99 groups could exist, per domain.

The proposal mentions:

  • [...]"Instead it contains information about a small collection of interest-group (owner, name) pairs to target"

I'm a bit confused about the "small collection" part. As an example, if 10 publishers add a user to 10 different groups each, how would that small collection be decided?

Focusing on only one "retargeter domain", not all interest groups may be equally valuable. WeReallyLikeShoes may have users added to "LikesSportShoes", to "CartAbandoner", or to both. And in case only one of its interest groups can enter the small collection that is sent in the request, it would prefer to receive CartAbandoner over LikesSportShoes if the current user belongs to both groups. Maybe the interest groups could have a priority field? That would reduce the number of groups to pick from (only the top-priority groups for each publisher).

From the many-retargeters perspective, how would the small collection be applied? Would only some of them actually be called? If that assumption is correct, is there any proposal for how that would be decided?
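The priority-field idea could look something like the sketch below: the browser keeps only the highest-priority group per owner, then ranks those and truncates to the request budget. The field names and the selection rule are assumptions for discussion, not part of the proposal.

```javascript
// Hypothetical selection of the "small collection" of (owner, name)
// pairs using a per-group priority field. Illustrative only.
function pickSmallCollection(groups, maxPerRequest) {
  // Keep only the highest-priority group per owner.
  const bestPerOwner = new Map();
  for (const g of groups) {
    const best = bestPerOwner.get(g.owner);
    if (!best || g.priority > best.priority) bestPerOwner.set(g.owner, g);
  }
  // Rank the owners' best groups and truncate to the request budget.
  return [...bestPerOwner.values()]
    .sort((a, b) => b.priority - a.priority)
    .slice(0, maxPerRequest)
    .map(g => ({ owner: g.owner, name: g.name }));
}
```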

"[..]Browsers could also choose to prevent micro-targeting, i.e. disallow interest groups that are too small[..]".
That makes a lot of sense, but at the same time it means that many "retargeters" would never be able to re-target their niche customers (I'm thinking of any kind of geographically local service). So it would set a bar below which it would make no sense to set interest groups (or run campaigns), because those would always be discarded by the browsers. Knowing the size threshold would therefore be important for many marketers.
(A similar argument can be made about aggregated reporting.)
Avoiding micro-targeting makes complete sense; this is just a tough problem.

And, finally, the proposal defines an Interest group as:
"An "interest group" is a collection of people whom an advertiser or their ad network believes will be interested[..]"
Can you please explain a bit more about the "or their ad network" part?

Thank you for your time!

Jose María Rodriguez

Multiple concerns with this proposal

Hello all. I have read the spec twice and I believe I have the gist of it, even if I don't completely understand how all the details would work in practice. Full disclosure, I work for a DSP in the Real Time Bidding programmatic advertising industry. As such, my concerns and questions will be catered towards specifically what this would mean for my company and similar ones in our space and what it will do to our clients and their expectations of our product offerings.

Here are some of my questions/concerns around specific proposals from the README:

the site operator can add people to a number of interest groups. 

How are they going to make a consistent list of "interest groups"? (Think of one site that uses "automobiles" while another uses "cars".) Doesn't this create a lot of work for website owners? Will every new piece of content then need to be tagged with interest groups for advertisers?

This model can support the use case where an advertiser is unwilling to run their ad on pages about a certain topic,

The gist of this seems to indicate that ad networks must rely on accurate topic choosing by the web publisher. This also seems ripe for fraud: low-quality sites will want to put as many people as possible into as many interest groups as possible, and the ad network has no way of verifying this information for themselves, or of blocking "low-quality domains" that try to show as many ads as possible with minimal relevant content, or content that is of poor quality.

The motivating use cases seem centered almost exclusively around site retargeting and segments. But there are a lot of use cases outside of retargeting.

I'm concerned about an "equalizing effect" this will have on the ad industry, where the possibility of new products / innovation is basically impossible, since everyone will have to use the same exact segmentation/interest group approach. It will no longer be possible to infer new data elements based on user behavior. Maybe this is the intention, to put many/most independent ad agencies out of business and prevent new ones from appearing.

Someone recently asked about ML approaches that would also be severely restricted since we are relying on the browser (and vendors) to do everything. This echoes my concern.

If the winning ad is interest-group targeted, then the browser renders it inside some sort of new environment, an "opaque iframe", which does not allow information exchange with the surrounding page: no postMessage, no way to crawl the window tree using window.parent or window.frames[], etc."

Our advertisers definitely want to know what in-page keywords or domains won the bid, for future improvement. In this case, we'd have no way of knowing this, and therefore no way of focusing on targeting domains with better conversion or click-through rates, or pages (even without domain information) with better keywords, since we have no way of seeing the DOM tree of the winning page in this opaque iframe model. This approach severely limits the ability of advertisers to see ROI.

Similarly, budgeting precision for interest-group-targeted ad campaigns will suffer when the interest-group requests happen only a few times per day. Since interest-group-targeted ads tend to be relatively valuable to advertisers, we expect this loss of budget precision will be a cost worth paying.

This is probably fine for larger advertisers and ad networks with large budgets that can absorb temporary overspends. However, the smaller advertisers that count advertising budget in the tens of dollars instead of thousands of dollars, will likely balk at such a lack of granular control over budget. We experience this first hand with our clients.

Blind rendering in opaque iframes. This is hard because it requires ads that can render without network access, and also requires the switch to aggregate reporting. 

Again, client advertisers currently use 3rd-party ad servers so they can have a neutral third party measure ad effectiveness via metrics (clicks, impressions, etc.). Disabling network access for ads eliminates this possibility entirely. We seem to be asking these advertisers to just "trust the web browser(s)" for measurement, which will likely not be acceptable to many.

Other questions:

  • How do ad networks get notified that their ad was downloaded, viewed, or interacted with (clicked)? How do we determine conversion metrics?
  • Is TURTLEDOVE intended to eventually expand to native mobile ads (iPhone, Android, in-app ads)?
  • Wouldn't having auctions run client-side potentially be ripe for fraud? A malicious script altering the execution of bidding seems possible with this model. Would the bidding JS code be completely sandboxed so that the publisher's page cannot alter it?

Keeping marketing strategies private

In this system, the interest groups are available to the user. That means a competing advertiser B has direct access to specific elements of the marketing strategy of advertiser A. This may already be an issue for many advertisers.

To a much greater extent, having the bidding logic and ad bundles loaded in hundreds of millions of browsers raises serious concerns for marketing strategic planning teams, even if this logic may not be directly available in the clear.
Indeed, marketing strategies often correlate with sensitive and proprietary information such as remaining stock, margin levels, specific partnerships, etc. Companies might not want to take the risk of exposing these, which will result in lower advertising spend, as the performance they would get from it would decrease.

What would prevent someone from reverse engineering certain components? How can we make sure that the logic remains fully hidden?

Creative and ad bundle specs

This proposal doesn't address a potentially big area of advertising that has to do with creatives. With the understanding that you wouldn't necessarily be OK with the status quo, it's not clear whether the only format accepted for the creative is images, or whether creatives could also be javascript, videos, native formats, and more.

Understanding that no network activity would be accepted in the rendering of the winning ad, and that more formats than just images exist, networks likely would need a way to provide all the assets inside the web bundle, including other javascript for rendering or interacting with the user or the page.

On a related note around creatives: aside from tracking, one of the main reasons they usually need network activity is viewability and brand safety checks. These would be handled differently in this new world, given that viewability could be a reporting factor, but brand safety seems a bit more complicated, since it would typically need access to the domain of the site, if not the full URL, in order to be able to block the rendering of the ad. Not exactly sure how this would work in TURTLEDOVE.

Intelligent Customer Discovery (aka Look-alike modeling)

Problem
In our understanding of Turtledove we believe it only supports 2 types of advertising: Retargeting and contextual. This will incentivize advertisers to follow one of the following strategies:

• Create ads that will follow people around the web.
• Take a shotgun approach with contextual and buy low-CPM ads to bombard users with ads.

According to Nielsen, retargeting is one of the most disliked forms of advertising among consumers (link). Another report (link) shows that 79% of users think they are being tracked due to retargeting ads. Even though retargeting is an important use case for a lot of brands, it represents a small percentage of total data-driven advertising spend (think about the last time you went to tide.com).

That’s because most sophisticated advertisers have already progressed from retargeting. For the most part, they want to leverage their own first-party data, much of which has been volunteered to them by long-term consumers, through loyalty programs, etc. This data can be used to model the characteristics of their most loyal customers, and find where those same characteristics may be present elsewhere in the market – where are their next 100k most loyal customers?

Therefore, we believe the focus should be on allowing brands to find users who might be interested in buying their products. There are several key methodologies that turtledove doesn’t address:

  1. Audience Modelling: We should be trying to preserve sophisticated targeting methodologies that leverage a brand's valuable first-party data to find the next 100k users most likely to be interested in a product, show it to them with reasonable frequency, and pay a healthy price. Audience modeling helps consumers discover new products, and it allows small brands to get traction and cut through the noise with the consumers that are likely to be interested in their new products.
  2. Audience Intersection: Another methodology that helps brands find new users is using combinations of audiences. Brands use sophisticated models to understand the type of users they want to reach. If they are limited to single interest-based segments, they will end up wasting money buying ads they don't need. A good example: a small real estate firm is looking for highly affluent individuals with an interest in real estate in a specific DMA. They would try to target "interested in real estate and finance" in the Denver DMA, a query combining two ANDs and a geo target. Without these intersections, they will need to spend precious marketing dollars just to figure out what works.

Publisher and User Impact
Today publishers of all sizes can realize value in the ad space on their sites because advertisers believe that they can find their customers on these sites. Data driven advertising especially helps bring value to publishers with a smaller footprint (local news, sport blogs, etc). However as described above, brands leverage tools beyond just site visits and context to figure out where they can reach their potential customers. If advertisers lose the ability to easily discover new customers, they will not know how to effectively value publisher inventory. Since advertisers still need to reach their customers, this will either lead to them taking a shotgun approach and pay less per ad or move their budgets to a few big publishers (CNN, NYT, ESPN etc). To make up for the lost revenue, publishers will either have to show more ads per page or erect paywalls, neither of which are ideal or economically feasible outcomes for the end user or the long-term future of the internet.

Interest group attributes

It would be useful if there was a generic mechanism allowing advertisers to associate (attribute_name, value) pairs with interest groups on device. This information can then be used in the on-device auction.

A particular use-case is a user visiting a specific page on an advertiser's website. While in principle it is possible to implement this functionality by appending attribute-value pairs to interest group names, such an approach does not conceptually correspond well to the use-case of making an interest-group request where the reply depends on the attributes and their values.
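To make the suggestion concrete, here is a hedged sketch of what an 'attributes' field on joinAdInterestGroup could look like, together with a hypothetical on-device bidding function that reads it. The 'attributes' field and the bidding logic below are this issue's illustration, not part of the spec.

```javascript
// Hypothetical 'attributes' extension, instead of encoding values
// into the group name. Illustrative only; not in the proposal.
var myGroup = {
  'owner': 'www.wereallylikebooks.com',
  'name': 'product-page-visitor',
  'readers': ['first-ad-network.com'],
  'attributes': { 'last_category': 'fantasy', 'visit_count': 3 },
};
// For contrast, today the same data would have to be packed into the
// name, which the reply then cannot easily depend on:
var encodedName = 'product-page-visitor|last_category=fantasy|visit_count=3';

// A bidding function could then read the attributes during the
// on-device auction, e.g. bidding more for repeat visitors:
function computeBid(group, baseBid) {
  return baseBid * (1 + 0.1 * (group.attributes.visit_count || 0));
}
```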

interest group level bidding functions: maximum size and other limitations?

This issue stems directly from #46, but for the sake of clarity I thought it would be better to spin off this particular point of discussion.

Assuming that the interest group bidding logic has to stay in the browser, I think it is fair to say that you expect it to be small enough in size to accommodate the many, many interest groups each advertiser is likely to put their visitors in. And we should expect many interest groups.

A website like Walmart, having 100,000,000 monthly visitors and 43,000,000 products, is likely to have on the order of 100,000 different interest groups. No user will be eligible for all of these interest groups, but it is realistic to assume that each visit to a product page would put the user in an additional interest group. After a few days of browsing, users could be in thousands of different interest groups.

Assuming the browser won't want (or be able) to allocate more than a few megabytes to advertising (not including prefetched creatives), the space allocated to the interest group bidding logic would be in the range of a kilobyte. This kilobyte might seem more than enough to accommodate a simple value and a few rules, but not enough to account for the subtleties and synergies between a particular interest group and a particular contextual situation. Here are a few examples:

  • publisher-advertiser synergies: sports-related interest groups on Walmart would resonate more with sport blogs and photography-enthusiasts groups with photography blogs,
  • format-products synergies: vertical or horizontal banner size will be valued differently depending on the message/creative/product you advertise for,
  • location/time/weather: these will also be of relatively different value depending on the interest group

These subtleties and synergies account for a lot of the performance, and ultimately the revenue publishers receive.

The interest group bidding logic cannot embed all per-publisher synergies unless we heavily bucketize them and keep only the most extreme value to weight the bid, but that would be to the detriment of performance. And if we start to use the contextual signal at (contextual) bid request time to pass this information, we face another issue: in order to preserve performance, every advertiser would need to send a contextual bid for every single opportunity in the market they are addressing. This has the potential to create a huge infrastructure burden for small advertisers (whose user base is small compared to the total population in the market they operate in), and a huge revenue risk for newly created content, or the long tail of publishers, who would be left off these inclusion/exclusion lists.

All in all, we should not neglect this aspect, as it could heavily impact the resulting outcomes (advertising performance / ROI, and de facto the total budget spent). My estimate of the sizing is very rough, so a refined estimation on your side would be very much welcomed!
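A quick back-of-the-envelope version of that sizing estimate, using this issue's own assumed numbers (these are not browser limits, just the assumptions stated above):

```javascript
// Illustrative arithmetic behind the "range of a kilobyte" estimate.
// Both inputs are this issue's assumptions, not spec values.
const storageBudgetBytes = 4 * 1024 * 1024;  // "a few megabytes" for advertising
const groupsPerUser = 4000;                  // "thousands of interest groups"
const bytesPerBiddingLogic = storageBudgetBytes / groupsPerUser;
// bytesPerBiddingLogic comes out to roughly one kilobyte per group.
```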

Running a Test to measure revenue loss

Hi Michael!

I am really interested in running some actual tests of this proposal. I think it would really help inform some of the design considerations, such as the minimum size of an interest group.

I've spoken with a number of engineers on Facebook's Audience Network team to think about how we could go about designing such a test, and we immediately encountered a few big open questions we need to resolve in order to design an experiment. I'll post about them one at a time to simplify the discussion.

  1. Attribution. Currently, we are able to train machine learning systems by providing them with training data like: "We showed this ad, in this context, to this person and it did/did-not lead to a conversion". With Chrome's proposed Conversion Measurement API this would still be possible. We take large numbers of rows of this type of training data and send it to an ML model to learn. In a TURTLEDOVE world, what attributed conversion data will be available for model training? I assume we would NOT be able to use the conversion measurement API in this context and will only have access to aggregated metrics. If that's the case, what aggregate metrics will we have available for model training? Will the reporting be standardized and automatically generated by the browser, or will we have some degree of control here?

I am not sure what you have in mind, nor am I sure what would be the most useful metrics, but here are some random ideas of potential things one might attempt to measure to kick off a discussion:

  • I served campaign_id = 0x15283750234 when I received a private interest group ad request for an unknown context. It did / did-not result in a conversion.
  • In the last 24 hours, I have served a total of 10,000 advertisements on publisher X via the TURTLEDOVE API. Out of those, 234 of them were ads for advertiser Y. The aggregated reporting API tells me that 14 conversions happened on advertiser Y's website as a result of those 234 ad impressions.
  • In the last 7 days, I have served a total of 100 ads to interest group Z across a variety of publishers. Of those ads, 22 of them were ads for advertiser Y. The aggregated reporting API tells me that 1 conversion happened on advertiser Y's website as a result of those 22 ad impressions.
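As an illustration of the second and third bullets, aggregated training data might arrive as rows like the following, with some minimum-count threshold suppressing small slices before anything is reported. The field names and the threshold are assumptions for discussion, not a proposed reporting format.

```javascript
// Hypothetical shape of aggregated conversion data for model training.
// The threshold and field names are illustrative assumptions.
const MIN_REPORTABLE_IMPRESSIONS = 10;  // assume tiny slices are suppressed

const rawSlices = [
  { publisher: 'X', advertiser: 'Y', impressions: 234, conversions: 14 },
  { publisher: 'X', advertiser: 'Z', impressions: 4,   conversions: 1  },
];

// A privacy-preserving aggregation step might drop slices too small
// to report, so models would train only on the surviving rows:
const reportable = rawSlices.filter(
  s => s.impressions >= MIN_REPORTABLE_IMPRESSIONS);
```

The open question above is exactly which dimensions (publisher, advertiser, interest group, campaign) such rows could be keyed on, and whether ad networks get any control over that.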

Thanks in advance for helping us understand these constraints so that we can properly model such an experiment.

Use case for Audience Extension

The main use case for TURTLEDOVE is for advertisers to retarget an aggregate of their users.
Could it also be used by publishers to build audience extension solutions (selling their 1st-party audiences on 3rd-party websites)?
i.e. could The Guardian target an aggregate of past readers of www.theguardian.com/uk/technology on external websites?

Apologies if this question has been raised before, but I couldn't find it. Thanks!

Product-level Turtledove

Hi all,

At RTB House we've put together a more detailed description of "Product-level Turtledove", an idea first discussed in #36, but also relevant to #31 and #41.

https://github.com/jonasz/product_level_turtledove

We think the proposed change would be a great boost to Turtledove product recommendation quality, especially for ecommerce advertisers.

Importantly, assuming product-level perspective, we were able to estimate the high level impact of adopting Turtledove on CTR.

We invite all feedback, and we're looking forward to further discussion!

Best regards,
Jonasz

Opaque Auction

The results of an ad auction have a direct impact on the revenue of multiple parties (publisher, advertiser, and ad tech intermediaries). If a party manipulates the auction, they can receive financial gain at the expense of other parties. Today, there is some trust extended between these parties that is often boosted by transparency in the form of logging, aggregated reporting, and spot checking.
As currently proposed, TURTLEDOVE would force an opaque, client side auction, which would require that trust to be extended to the browser vendor, while significantly reducing or eliminating the transparency that is possible today.

Has there been any feedback from Publishers or Advertisers about their comfort level with this change?

Have there been any proposals which could provide additional transparency to the auction in a privacy focused way?

Anti-fraud, ads.txt, and domain blindness

This is touched upon in #12, #19, and #20, however anti-fraud in itself seems important enough to call out very specifically.

The ads.txt standard was designed to solve for a specific form of fraud called "domain spoofing". In this fraud, an individual (somehow) sends bid requests through programmatic (RTB) channels which claim to originate from example.com (a presumably high value domain), yet (a) the traffic is not "legitimate" in that it does not originate from actual human beings browsing the real web site; and (b) the downstream payee of the fraudulent ad impression is unassociated with the business owner of example.com.

Ads.txt solves for this by forcing the publisher to publicly declare their authorized ad platforms. The path example.com/ads.txt is expected to enumerate ALL of the authorized channels for that domain along with the associated publisher ID on each channel. With this information, a buying platform can validate any particular RTB request against this data, rejecting non-matching platforms or publisher ID values.
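The validation step described above can be sketched roughly as follows. The parsing is deliberately simplified: real ads.txt records also carry a DIRECT/RESELLER relationship field and an optional certification authority ID, and this sketch ignores variables and comments beyond `#`.

```javascript
// Simplified ads.txt parsing and bid-request validation sketch.
function parseAdsTxt(text) {
  return text.split('\n')
    .map(line => line.split('#')[0].trim())   // strip comments
    .filter(line => line.includes(','))       // keep data records only
    .map(line => {
      const [domain, publisherId, relationship] =
        line.split(',').map(s => s.trim());
      return { domain, publisherId, relationship };
    });
}

// A buying platform can reject any RTB request whose (exchange,
// publisher ID) pair is not declared by the claimed domain.
function isAuthorized(adsTxtEntries, bidRequest) {
  return adsTxtEntries.some(e =>
    e.domain === bidRequest.exchangeDomain &&
    e.publisherId === bidRequest.publisherId);
}
```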

While there are other forms of ad fraud, this solution has dramatically reduced the volume of domain spoofing. A solution to this seems necessary.

turtledove-js implementation

Hi all,

We wanted to get better insight into how TURTLEDOVE could work on the web, so we created a simple implementation of it and pushed it to a public GitHub repository. We aimed to provide a solution so similar to the original standard that it could be a drop-in replacement. Until TURTLEDOVE is implemented in browsers, you can take advantage of this project to override the proposed Navigator object's methods and use it the same way you would use TURTLEDOVE itself. You need just one import and one initialization call to enable it. Of course, as we don't aim to modify browser code itself, the simulation is based on existing technologies:

  • localStorage
  • embedded iframes
  • communication by postMessages

All data is stored locally inside the browser, and the turtledove domain acts as TURTLEDOVE would in the proposal - keeping private information private, only exposing well-defined APIs without any private data leakage.

Alongside the core code, we implemented a few sample websites. Everything is currently available on the Internet, so you can play with our demo without worrying about its deployment. The functionality of the core of our simulation is not limited to the demo itself - everyone can write such an example, one just needs some dummy ad network, an advertiser that will put that ad network in the readers field and a publisher that is 'integrated' with the very same ad network.

You can check out more details about simulation in our Github repository:

https://github.com/dervan/turtledove-demo

There you will also find links to example websites. You can add yourself to a few sample user groups and see an ad based on a turtledove-like auction.

We invite you to play with it a bit, align your expectations and vision with our simulation, and we hope that such a demo will be a useful catalyst for discussion about what TURTLEDOVE should look like.

And the very last thing: of course this is not any kind of reference implementation; it has a bunch of imperfections and inaccuracies (as TURTLEDOVE itself does), but we hope that together we will reach some better understanding and a clearer vision of what it will look like.

@michaelkleber, would you like to take a look at our implementation and let us know if you see any obvious inconsistencies with your idea of TD?

Best regards,
Michał Jagielski

Using Interest groups as exclusion list

Hello,

I've been reading the explainer and compared it with the previous PIGIN proposal, and I think there is a missing use case in the new proposal.

This proposal enables retargeting strategies in a privacy-preserving way, but today, for a lot of our ad campaigns, we are actually using third-party cookies to avoid retargeting web users. This would not be possible with TURTLEDOVE.

I understand the move from PIGIN to TURTLEDOVE as addressing privacy concerns and group correlations which could lead to malicious identity analysis. As far as I understand, this explainer ends with two separate worlds:

  • Retargeting ad campaigns which can use multiple user groups to provide their bid
  • contextual / segment targeting ad campaigns which can use contextual and FLOC data to provide their bids (but no access to user groups)

My question is: as an advertiser, what could I do to run contextual (or FLoC-targeted) ad campaigns and ensure that I only serve my ads to new users? This means that I would like to exclude bidding on any user with an interest group coming from my website.

This is particularly important for advertisers willing to focus their ad budgets on new customers, and also for web users who could end up with more ad pressure from an advertiser if they get retargeting ads plus contextual (or segment-targeting) ads from the same advertiser at the same time.

The information "this user has already seen my website" should not be considered a data leak, as hundreds of thousands of users fall into the same large group. We would actually not need to add users to groups, but only to flag users from the advertiser's website and be able to use this generic information during the bidding process.
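If the browser exposed a generic "has any group from this owner" flag to on-device bidding logic, the exclusion could be as simple as the sketch below. This API does not exist in the proposal; it is only an illustration of the use case, with hypothetical names throughout.

```javascript
// Hypothetical on-device check: drop contextual bids for users the
// advertiser has already flagged, so budget goes only to new users.
function contextualBid(baseBid, userInterestGroups, advertiserOwner) {
  const alreadyVisited = userInterestGroups.some(
    g => g.owner === advertiserOwner);
  return alreadyVisited ? 0 : baseBid;  // 0 means "do not bid"
}
```

Note this logic would have to run inside the browser, since the contextual request itself must not reveal whether the user has visited the advertiser's site.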

Any thoughts about this use case ?

Rodolphe

Contextual Bid

This is mentioned in the TURTLEDOVE doc, but I'd like to advocate for its official inclusion.

const contextualBid = 107;

As a possible extension to this idea, the contextual response might also contain a JS bidding function, rather than a fixed bid.

The reason for this is that first-ad-network.com may well have written interest groups into the browser, but may also be running fully contextual campaigns for other advertisers. However, at contextual request time, first-ad-network.com does not know whether any interest groups have been added to the browser.

If no interest groups are present, a fixed bid makes sense.

If there are interest groups present, then we'd almost always want to fall back to using function(adSignals, contextualSignals), simply because it has more data available. For example, there may be a positive interaction between interest groups and the topic of the publisher. This would encourage us to submit a fixed bid of -1 to force the fallback, but we can't know to do that without knowing whether interest groups are present.

We feel that using something more dynamic such as a contextual bidding.js file would allow us to implement logic to achieve this coordination in-browser.
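A minimal sketch of the idea, assuming a hypothetical response format in which the contextual response carries both a fixed bid and an optional bidding function, and the on-device auction prefers the function whenever interest groups are present. All names and the resolution rule are assumptions, not part of the proposal.

```javascript
// Hypothetical contextual response: a fixed bid for group-less users,
// plus a bidding function for users with interest groups.
const contextualResponse = {
  fixedBid: 107,
  bid: function (adSignals, contextualSignals) {
    // e.g. boost the bid when the interest group matches the page topic
    const synergy =
      adSignals.interestGroup === contextualSignals.topic ? 2 : 1;
    return 107 * synergy;
  },
};

// Sketch of how the on-device auction could resolve the two:
function resolveBid(response, adSignals, contextualSignals) {
  if (response.bid && adSignals.interestGroup !== undefined) {
    return response.bid(adSignals, contextualSignals);
  }
  return response.fixedBid;
}
```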

User needs research

There doesn't seem to be any evidence of user research. For example:

The type of ad targeting we propose supporting can be of great value to people browsing the web, who often prefer ads for things they are interested in

Which users have you spoken to, or what research have you done, to support this?

People who like ads that remind them of sites they're interested in can keep seeing those sorts of ads.

What evidence is there that such users exist?

There's a lot of great technical content here. But before designing an API, it would be helpful to do user research to see what it is that users want. Is there a desire for individuals to see why they've been targeted? If blocking specific advertisers is a user need, why don't existing tools meet this demand?

User-tied A/B testing

Currently, user-tied flights are a good way to understand performance and user-preference outcomes for modeling-based capabilities, including creative optimization, and also for measuring the impact of interest-based targeting. For this, the ad network randomly assigns some users to one flight and others to another. The differences between treatment and control flights are measured, and decisions about further mainstreaming the changes are then taken. It is unclear how this will work with TURTLEDOVE, where the final auction happens on the browser side and the ad network does not know which ad was shown to which set of users.

Browser-side personalization (eliminating the privacy-personalization tradeoff)

Hi all,

Under the current TURTLEDOVE specification there is a tradeoff between personalization quality and privacy guarantees. The less the ad network knows about the user, the better privacy we get, but the poorer the ad personalization (the user gets ads that are less interesting to them). That is: if we increase the minimum size of an interest group, we get better privacy but poorer ad personalization.

This is not an issue for big advertisers who will be able to create interest-groups consisting of users with homogeneous interests. Ads created for such interest-groups will be well tailored for each user within a group. However, small advertisers will not be able to do so, and will have to cluster users with varying interests into the same group. Our goal is to mitigate this imbalance between small and big advertisers.

The heart of the issue is that personalization (picking / customizing the items in the ad based on interest groups of the user) happens on the ad network's server.

We propose to eliminate the tradeoff, and thus boost both privacy and personalization quality, by performing the personalization (or a part of it) in the browser:

  • Advertiser's site stores information on user interests in the browser. This data never leaves the browser.
  • During the interest-group request, the browser obtains the ad web-bundle which contains javascript personalization logic.
  • When an interest-based ad wins, the browser-side personalization logic inspects the user interests data and picks the right items from the ad web-bundle. All this happens in an isolated manner and conforms with the opaque-iframe / blind-rendering concept.

From the technical perspective the proposal boils down to a simple extension of the API:

  • Advertiser can save custom personalization data:
      var personalizationData = { ... };
      var myGroup = {'owner' : 'www.wereallylikebooks.com',
                     'name' : 'fantasy-books',
                     'readers' : ['first-ad-network.com',
                                  'second-ad-network.com'],
                     'personalizationData' : personalizationData,
                    };
      navigator.joinAdInterestGroup(myGroup, 30 * kSecsPerDay);
  • The personalizationData is a custom blob of data; its specification is up to the advertiser. In principle, its purpose is to allow tailoring ads better suited to this particular user.
      var personalizationData = {
          // If the web bundle contains more books than we can
          // display, let's focus on user's favorite authors.
          'favorite_authors': ['J. R. R. Tolkien',
                               'J. K. Rowling'],
          // Similarly, we can rank higher the items similar
          // to the ones liked by the user.
          'reviewed/ranked_items': {
              'The Hobbit': 5,
              'Harry Potter and the Prisoner of Azkaban': 5,
              'Harry Potter and the Goblet of Fire': 2
          },
          // Let's not show ads for these items anymore.
          'purchased_items': [
              'Dune',
              'Canticle for Leibowitz'
          ]
      };
  • During the rendering of the ad, the web bundle's javascript can use a simple API call to fetch the personalization data that corresponds to its interest group:
      var personalizationData = document.getInterestGroupPersonalizationData();

That data is then used to select and rank the items in the web bundle to maximize user satisfaction.
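As a sketch of how the web bundle's personalization logic might do that selection and ranking (the rankBundleItems helper, the item shape, and the scoring weights are all hypothetical, not part of the proposal):

```javascript
// Hypothetical sketch: rank the items shipped in the ad web bundle
// using the personalization data returned by
// document.getInterestGroupPersonalizationData(). The scoring is
// illustrative; a real implementation could run an ML model here.
function rankBundleItems(items, personalizationData) {
  const purchased = new Set(personalizationData.purchased_items || []);
  const favoriteAuthors = new Set(personalizationData.favorite_authors || []);
  return items
      // Let's not show ads for items the user already bought.
      .filter(item => !purchased.has(item.title))
      // Prefer items by the user's favorite authors.
      .map(item => ({
        ...item,
        score: favoriteAuthors.has(item.author) ? 2 : 1,
      }))
      .sort((a, b) => b.score - a.score);
}
```

The key property is that this runs entirely inside the opaque iframe, on data already in the browser, so the ranking never reaches the advertiser or the ad network.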

Some observations:

  • This way, from the personalization perspective, we could work with even bigger interest groups without sacrificing personalization quality. This removes the tradeoff from the picture: we win both better privacy and better personalization.
  • We stress that the personalization data saved in the browser is never shared with any entity. It is processed within the web bundle at the time of ad rendering.
  • The browser-side personalization phase happens within the opaque iframe and has no access to contextual signals. Therefore, the mechanism does not introduce any way to leak the contextual information; neither to the advertiser nor to the ad network.
  • The personalization logic could be very lightweight, or could be more computationally demanding. However:
    • It’s done only for the ads that won the auction, not during bidding.
    • In principle, in the future we could allow the web bundle to offload resource-heavy processes to trusted servers.
    • It can be limited by the browser.
  • Even if only lightweight processing is supported, this will still be very beneficial for personalization quality.
  • For a more detailed perspective on personalization, see https://github.com/RTBHOUSE/web-advertising/blob/personalization/personalization.md

The changes to the Turtledove specification are minimal and we are able to boost both personalization and privacy. It would be great to hear your thoughts on this proposal.

SPARROW technical workshop July 16

The Criteo SPARROW folks have proposed a 1 hour technical workshop on SPARROW on Thursday July 16 at 5pm CET = 11am US East Coast time.

Zoom call details are here: https://lists.w3.org/Archives/Public/public-web-adv/2020Jul/0012.html

Purpose of this workshop is to address technical open questions, with a particular emphasis on reporting (and how and why it differs from TURTLEDOVE). You can create issues in advance on GitHub for questions you would like to discuss (https://github.com/WICG/sparrow).
Minutes will be published in the form of answers to these questions.

@lbdvt @BasileLeparmentier Thanks for setting this up, and maybe you should open an issue like this on the SPARROW repo also, so that anyone watching that repo gets a notification about it.

Feed-based Product Retargeting

Advertisers set up campaigns with ad networks by supplying a feed of product SKU ids. Ads shown to these users then contain specific product/SKU items relevant to the user, as long as a sufficient number of users have seen the same product or related complementary products (some minimal privacy threshold, such as 300 or 1,000 users). The decision of which product item to show in the ad is driven by server-side ML algorithms that look at large sets of users, clicks, and their conversion outcomes. Turtledove allows remarketing to basic interest segments like 'visited product view page', but it may not be easy for publishers to manage/track segments per item/SKU and invite browsers to join interest groups at the SKU level. This item is covered partly in the discussions of this issue, where the idea of item_interest_group is discussed.
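As a sketch of what SKU-level interest groups might look like under the current API (the skuInterestGroup helper, the group-name format, and the domains are all hypothetical; joinAdInterestGroup and kSecsPerDay follow the TURTLEDOVE explainer's examples):

```javascript
// Hypothetical sketch: one interest group per product SKU, so an ad
// network could retarget at item granularity. The name format is
// illustrative; nothing in the proposal prescribes it.
const kSecsPerDay = 24 * 60 * 60;

function skuInterestGroup(owner, sku, readers) {
  return {
    owner: owner,
    name: owner + '_sku-' + sku,  // e.g. 'www.example-shop.com_sku-323442'
    readers: readers,
  };
}

// On a product view page the advertiser's tag would then call, e.g.:
// navigator.joinAdInterestGroup(
//     skuInterestGroup('www.example-shop.com', '323442',
//                      ['first-ad-network.com']),
//     30 * kSecsPerDay);
```

The minimal-user-count threshold described above would still have to be enforced (by the browser or the ad network) before such a narrow group could ever be served.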

User Account Level Opts Out

Users can express their personalization choices, including opting out of individual retargeting ads, at an account level with an ad network or publisher. These preferences roam across devices, browsers, and apps wherever the user uses that account to identify themselves. Since Turtledove is scoped to a single browser and the ad network does not know which user is represented when the remarketing request reaches it, there is no way in Turtledove to honor these account-level choices, including opting out of retargeting ads.

Video advertising on the web

Hi there,

Apologies if this question has already been tackled, but I couldn't seem to find anything on it, please redirect me if that's the case.

Have you considered how video web advertising targeted to interest groups could work within the scope of TURTLEDOVE?

Mostly I believe the differences and concerns are around this part of rendering the ad:

Once the winning ad is chosen, it needs to render in the browser. If the winning ad is targeted at an interest group, it implicitly knows about the browser's group membership, and so the ad should be rendered in a privacy-preserving way to avoid leaking information.

Not sure whether the "opaque IFRAME" concept applies to the situation of VAST video ad players injecting winning video ads as pre- or mid-rolls, for example? If not, can you think of alternative techniques to prevent leaking IG information through the winning video URL sent back to the player?

Also, and perhaps related to this, any concerns with the potential need to pre-download and cache potentially large video files, if remote download is to be avoided at win/playback time?

Capabilities of the proposal for publishers

Hi all,

after analyzing the proposal and the comments (as many as possible), the following points remained open for us:

  • Can a publisher control which bidders are allowed to participate in the auction? This is an important point, as publishers have a need to control who is advertising on their inventory.

  • Can a publisher set a floor bid on the inventory? Publishers often want to make sure that the inventory is not sold under a given bid/price.

  • Is it possible to make ad servers compete in the browser? In the proposal this seems possible by chaining consecutive calls to renderInterestGroupAd with different values of metadata.network. This would be similar to a waterfall model, where the first winning bid wins even though another ad server would have won later in the chain. Did you think of an alternative where bidding could happen in parallel for many ad servers?

  • According to the proposal, the contextual request can contain any first-party targeting information (which could be a first-party profile of the user for example). This would mean that if a publisher wants to use its own first-party interest groups only, it could completely bypass the renderInterestGroupAd method to render ads. Is this correct? This would be a way to preserve IO (insertion order) deals.

  • Who controls the bidding rules in the browser (on-device-bid.js)? As commented here, we assume that advertisers define the script depending on their campaigns. For each interest group an advertiser wishes to use, one bidding script can be defined to determine the bid depending on contextual and ad signals. Is this correct?

  • How does this contextual signal of the contextual response look like and where does it come from? If no schema is defined, this would imply that the author of the bidding rules that read the contextual signals (on-device-bid.js) needs to coordinate with the provider of the contextual signals.
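The waterfall chaining described in the third question above might be sketched like this (renderInterestGroupAd and the metadata.network field come from the TURTLEDOVE proposal; the tryNetworksInOrder helper, its return convention, and the domains are hypothetical):

```javascript
// Hypothetical sketch of the waterfall: try each ad server in priority
// order and stop at the first one that renders an interest-group ad,
// even though a later network in the chain might have bid higher.
async function tryNetworksInOrder(frame, networks, baseMetadata) {
  for (const network of networks) {
    const rendered = await frame.renderInterestGroupAd(
        {...baseMetadata, network: network});
    if (rendered) {
      return network;  // first winner ends the waterfall
    }
  }
  return null;  // no interest-group ad; fall back to a contextual ad
}
```

This makes the limitation concrete: the ordering, not the bids, decides the winner, which is exactly why a parallel-bidding alternative is worth asking about.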

In addition, we would be very glad if you could inform us on the progress of the proposal as well as on the further specification/development process.

Best regards,
Angelo Brillout

Working with SSPs and Exchanges

In trying to clarify this flow, let’s see how it would work in the traditional structure of exchanges and DSPs:

var myGroup = {'owner' : 'www.wereallylikeshoes.com',
              'name' : 'www.wereallylikeshoes.com-athletic-shoes',
              'readers' : ['dsp.com', 'google.com']
             };
navigator.joinAdInterestGroup(myGroup, 30 * kSecsPerDay);

How will the browser call the fetch-ads URL? Will it call it for both readers? Or would the idea here be that if dsp.com works with Google, they'll just put google.com in their .well-known/ad-partners.txt while only writing dsp.com in the readers, and Google will be able to forward the encrypted owner/name pair to dsp.com?

But in the general sense of having multiple readers, would it mean that the fetch-ads call goes out to both?

Can COWL be useful for this use case?

Nice! Out of curiosity, have you considered using a more general mechanism like COWL for some of this? For example the opaque iframes can be implemented pretty easily using labeled iframes. I think some of the examples we describe can be retrofitted for this use case (while providing a more general mechanism beyond ads for web developers).

Outcome-based Turtledove

Hi all,

At RTB House we have formalized some of our thoughts on "outcome-based Turtledove", an idea first discussed in #5.

https://github.com/jonasz/outcome_based_turtledove

Benefits include:

  • More accurate bidding possible.
  • Mathematical guarantees on microtargeting prevention.
  • Decoupling of microtargeting prevention and bidding mechanisms.

Please let us know what you think, and we will be happy to discuss this idea further.

Best regards,
Jonasz

Advertiser Ad Quality / Brand Protection

Advertisers do not want their ads shown on questionable sites, as it reflects poorly on them. Rubicon Project knows this well; we have trouble even finding a single free PSA organization that is comfortable being shown on all our publishers.

The TURTLEDOVE proposal seems to suggest that a blacklist/whitelist of meta-data (topics, domains, etc.) could be returned along with each interest-group ad; contextual meta-data could then be provided to the bidding function, which would determine if the interest-group ad could be shown on the given site.

I would guess that higher-end buyers would have a whitelist of publishers they would run on, and that list would be on the order of hundreds of domains. I would also guess these buyers would include a blacklist of terms/topics they would want to layer on top of that which would likely be on the order of hundreds, but reach into the thousands for more sensitive brands.
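Putting the mechanism from the two paragraphs above into a sketch (all field names here are illustrative, not from the proposal):

```javascript
// Hypothetical sketch: an interest-group ad ships with a publisher
// allowlist (hundreds of domains) and blocked topics (hundreds to
// thousands of terms), and the bidding function refuses to bid on
// pages that fail either check.
function brandSafeBid(adSignals, contextualSignals) {
  const allowed = adSignals.publisher_allowlist;   // domains the buyer accepts
  const blocked = adSignals.blocked_topics || [];  // terms layered on top
  const topics = contextualSignals.topics || [];
  if (allowed && !allowed.includes(contextualSignals.domain)) {
    return 0;  // publisher not on the buyer's allowlist: do not bid
  }
  if (blocked.some(t => topics.includes(t))) {
    return 0;  // page touches a blocked topic: do not bid
  }
  return adSignals.base_bid;
}
```

Note that this places the buyer's brand-safety criteria inside a script the publisher (or its agent) controls, which is exactly the departure from server-side RTB decisioning discussed below.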

Today the buyer whitelisting/blacklisting is primarily configured in the buyer's DSP. However, the bidding function would need to be controlled by the publisher, or a publisher agent. RTB was designed to provide ad request information to the DSP, where the decisioning on the appropriateness of a given ad is made server-side. The TURTLEDOVE paradigm would require DSPs to provide specific criteria to the SSP on when a given ad could serve. This would be a significant departure from RTB as we know it.

Additionally, given the number of restrictions that are likely to be attached to a given interest-based ad, the likelihood of a given interest-based ad serving on a given page-load may be small. As a result, the browser would need to request and cache multiple ads to increase the likelihood of an interest-based ad being chosen in any given request. This would result in a higher ad-request to ad-impression ratio for ad serving systems, and a large memory/storage requirement for browser clients.

Intent vs Interest Groups

Interesting stuff.

In the "motivating use case" section it reads to me that it is assumed that the ad network will somehow be able to translate GET https://first-ad-network.com/.well-known/fetch-ads?interest_group=www.wereallylikeshoes.com_athletic-shoes into an intent signal in order to place a bid. Is this indeed the case?

If so, would this require those setting the interest group to use highly descriptive naming conventions so that the ad network could indeed determine the intent signal associated with the group? In other words, if the group in this example were called "item323442" instead of "www.wereallylikeshoes.com_athletic-shoes", the group name would likely be meaningless to programmatic buyers who didn't know they were looking to target something called "item323442" but instead were looking to target just "athletic shoes".

If my assumption here is completely wrong - could you provide some additional clarity as to how intent based targeting will work within the turtledove environment?

How can advertisers learn about interest groups?

In our W3C meeting yesterday (June 2nd), we explained that advertisers could learn about the members of a FLoC by analyzing the behavior (URLs) observed under a given FLoC as part of ad requests. Through this learning, advertisers can reach an audience by intelligently choosing which set of FLoC identifiers to target.

Is there a way to achieve analogous behavior under turtledove such that advertisers can learn about interest groups? Can the aggregate reporting API be used to communicate aggregate URL information to be further analyzed by advertisers?

Browser-side recent advertising events

The TURTLEDOVE proposal briefly suggests functionality for ad frequency control:

 'max-times-per-minute': 1,
 'max-times-per-hour': 6,

Instead of rigid browser-side rules, we suggest supplying information on recent ad events to the bidding function. This way:

  • Much more flexible logic is possible allowing better user experience and bidding precision (in our experience with the CTR model, features based on recent ad events are among the strongest ones)
  • The browser API is much more generic (e.g. there’s no need to decide whether we need hourly or daily counters)
  • User privacy is not affected.
function(adSignals, contextualSignals, recentAdEvents) {
    const kHour = 60 * 60 * 1000;  // one hour in milliseconds
    const now = Date.now();

    // Calculate the base bid:
    let bid = contextualSignals.is_above_the_fold ?
        adSignals.atf_value : adSignals.btf_value;

    // Fine-tune the bid based on former user interactions
    // with ads (or lack thereof):

    // Slightly decrease the bid after each impression.
    for (const imp of recentAdEvents.impressions) {
        if (now - imp.timestamp < kHour) {
            bid *= 0.5;
        } else if (now - imp.timestamp < 24 * kHour) {
            bid *= 0.8;
        }
    }

    // And increase it after each click.
    for (const click of recentAdEvents.clicks) {
        if (now - click.timestamp < 24 * kHour) {
            bid *= 1.2;
        }
    }

    return bid;
}

In a basic version recentAdEvents related to an interest group could be exposed only to the bidding function of this interest group. Potentially, this can be extended to maximize utility.

What do you think?
