Comments (6)
The IDL document names files like `{aggregation_prefix}/YYmmddHHMM-YYmmddHHMM.invalid_uuid_{n}.avro` to be emitted alongside the sum parts by the facilitator. The format of this file is TBD, but I took a stab at specifying it in #2 (short version: it's a list of UUIDs). In that PR, @tlepoint suggested "to add a reason for rejection, for example `INVALID_CIPHERTEXT` or `INVALID_PROOF`." That seems very reasonable to me, but there are a number of places in the pipeline that could fail, all the way from ingestion through sum part construction. I think we need to keep track of invalid packets the whole way through, from ingestor to final sum part construction.
I'm going to run through the pipeline stages I see and try to enumerate error cases so we can agree what to do about them. For each failure case I identify, I have noted how I think it should be handled. The heuristic I'm using for classification is that errors that can be resolved with a software fix to a single component should halt batch processing so we can deploy a fix and try again (e.g., the facilitator is rejecting valid Avro messages because of a typo in facilitator code). Errors caused by an individual packet being malformed for any reason, on the other hand, should not block processing of the rest of the batch, and will be recorded in an "invalid packets" list that moves through the pipeline with the good data.
I will probably end up using "validation" and "verification" interchangeably below, for which I apologize. If someone can make an argument for using one or the other word consistently everywhere, I am all ears.
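The halt-versus-record heuristic can be sketched roughly as follows (an illustration only; all names and structures here are mine, not the actual facilitator code):

```python
# Sketch of the halt-vs-record heuristic; names are illustrative,
# not taken from the actual facilitator implementation.
from enum import Enum, auto

class RejectionReason(Enum):
    """Per-packet failures: record the packet and keep processing."""
    INVALID_PACKET = auto()
    INVALID_CIPHERTEXT = auto()
    INVALID_PARAMETERS = auto()
    MISSING_PEER_VALIDATION = auto()
    INVALID_PROOF = auto()

class BatchError(Exception):
    """Batch-level failures (I/O error, malformed header, bad signature):
    halt processing so a fix can be deployed and the batch retried."""

def process_batch(packets, validate):
    """validate(packet) returns None on success or a RejectionReason.
    Returns the list of (uuid, reason) pairs for rejected packets."""
    invalid = []
    for packet in packets:
        reason = validate(packet)
        if reason is not None:
            # Malformed individual packet: record it and move on.
            invalid.append((packet["uuid"], reason))
    return invalid
```

Batch-level problems would instead raise `BatchError` out of `validate`, aborting the whole run rather than landing in the invalid-packets list.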
Ingestion
Per Apple, if anything goes wrong during ingestion, the relevant packet or batch will be discarded, so there's nothing for data share processors to do.
PHA/Facilitator intake (i.e. generation of validation share)
- I/O errors (file not found, short reads, network failures, etc.): stop processing the batch, retry later.
- Malformed ingestion header (including bad signature): stop processing the batch, alert humans.
- Malformed individual packet (bad encoding): record bad packet with `INVALID_PACKET`, move on.
- Individual packet cannot be decrypted: record bad packet with `INVALID_CIPHERTEXT`, move on.
- Individual packet with bad value of a parameter like `r_pit`: record bad packet with `INVALID_PARAMETERS`, move on.
To enable the intake step to indicate failures to subsequent steps, the `PrioValidityPacket` Avro structure would be changed to contain a union over the triple (`f_r`, `g_r`, `h_r`) and a rejection reason. Keeping the invalid packets inline with the list of valid ones makes it easier to resolve the (ingestion packet, own validation packet, peer validation packet) triples during the aggregation step.
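The changed schema might look something like this (a sketch only; the field names and types beyond those mentioned above are assumptions, not the actual prio-server schema):

```json
{
  "type": "record",
  "name": "PrioValidityPacket",
  "fields": [
    {"name": "uuid", "type": {"type": "string", "logicalType": "uuid"}},
    {
      "name": "contents",
      "type": [
        {
          "type": "record",
          "name": "ValidationShare",
          "fields": [
            {"name": "f_r", "type": "long"},
            {"name": "g_r", "type": "long"},
            {"name": "h_r", "type": "long"}
          ]
        },
        {
          "type": "enum",
          "name": "RejectionReason",
          "symbols": ["INVALID_PACKET", "INVALID_CIPHERTEXT", "INVALID_PARAMETERS"]
        }
      ]
    }
  ]
}
```

The `contents` union carries either a validation share or a rejection reason, so invalid packets stay inline with valid ones in the same file.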
PHA/Facilitator aggregation
- I/O errors (file not found, short reads, network failures, etc.): stop processing the batch, retry later.
- Malformed ingestion header (including bad signature): stop processing the batch, alert humans.
- Mismatch between parameters in validation or ingestion headers (i.e., inconsistent batch ID, name, bins, epsilon, prime, number of servers, or Hamming weight): stop processing the batch, alert humans.
- Mismatch in packet count between validation batches (e.g., facilitator ingestion batch is 100 packets, facilitator emits 100 validation packets, but PHA only emits 50 validation packets): validate packets present in both validation batches, record missing ones as bad packets with `MISSING_PEER_VALIDATION`.
- Verification of individual ingestion packet against verification shares fails: record bad packet with `INVALID_PROOF`, move on.
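Resolving the triples and recording missing peer validations might look like this (a sketch under assumed data structures, not the actual aggregation code):

```python
# Sketch of matching ingestion packets against own and peer validation
# batches during aggregation; names and shapes are illustrative only.
def match_validations(ingestion_uuids, own, peer):
    """own/peer map packet UUID -> validation share.
    Returns (verifiable_uuids, invalid), where invalid pairs each
    missing packet's UUID with a rejection reason."""
    invalid = []
    verifiable = []
    for uuid in ingestion_uuids:
        if uuid not in own or uuid not in peer:
            # A validation batch is missing this packet: record it
            # and keep aggregating the rest of the batch.
            invalid.append((uuid, "MISSING_PEER_VALIDATION"))
        else:
            verifiable.append(uuid)
    return verifiable, invalid
```

Packets in `verifiable` then go through proof verification, which can still reject them individually with `INVALID_PROOF`.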
The `invalid_packet` Avro structure would be augmented to contain a `rejection_reason` field. It also needs a `batch_uuid` field: since the aggregation step sums over multiple batches, the packet's UUID is not sufficient (so the list of UUIDs I used in #2 is already wrong).
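A sketch of the augmented record (the record name and field types are assumptions; only `rejection_reason`, `batch_uuid`, and the reason symbols come from the discussion above):

```json
{
  "type": "record",
  "name": "InvalidPacket",
  "fields": [
    {"name": "uuid", "type": {"type": "string", "logicalType": "uuid"}},
    {"name": "batch_uuid", "type": {"type": "string", "logicalType": "uuid"}},
    {
      "name": "rejection_reason",
      "type": {
        "type": "enum",
        "name": "RejectionReason",
        "symbols": ["INVALID_PARAMETERS", "INVALID_CIPHERTEXT", "INVALID_PACKET",
                    "MISSING_PEER_VALIDATION", "INVALID_PROOF"]
      }
    }
  ]
}
```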
The file emitted during the aggregation step will be named `{aggregation_prefix}/YYmmddHHMM-YYmmddHHMM.invalid_packets_{n}.avro`, since it contains more than just UUIDs now.
We end up with this enumeration of packet rejection reasons, which may appear alongside the sum part sent by the facilitator to the PHA:

- `INVALID_PARAMETERS`
- `INVALID_CIPHERTEXT`
- `INVALID_PACKET`
- `MISSING_PEER_VALIDATION`
- `INVALID_PROOF`
from prio-server.
One other question: if we encounter zero invalid packets going through the whole pipeline, what should the facilitator emit in the `invalid_packets_{n}.avro` file? An empty file? An Avro file containing an empty list?
Cross-posting helpful insights from a colleague from email:
> I'm not sure what the protocol between devices and ingestion servers look like, but are there any failure cases where an individual packet could be rejected but the overall batch can continue? If so, should those failures be reported to the next stage (PHA and facilitator servers) to be rolled forward into invalid packet files?
There are reasons for rejecting, but those rejected packets will not be added to the batch, so there is no need to forward them to the list of invalid packets.
> One other question: if we encounter zero invalid packets going through the whole pipeline, what should the facilitator emit in the invalid_packets_{n}.avro file? An empty file? An Avro file containing an empty list?
An Avro file with an empty list feels like the right answer. The other options mean extra code to distinguish the empty case and do something special with it.
I think our colleague is right on both counts and plan to adopt these recommendations as part of closing this ticket.
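The empty-list choice means consumers need no special case at all, as a small sketch illustrates (names are illustrative, not from prio-server):

```python
# With an Avro file containing an empty list, a consumer's loop simply
# runs zero times; no code is needed to detect "no invalid packets".
def summarize_invalid(invalid_packets):
    """Count rejected packets by reason; illustrative only."""
    counts = {}
    for _uuid, reason in invalid_packets:
        counts[reason] = counts.get(reason, 0) + 1
    return counts
```

An empty file, by contrast, would force every reader to handle a "file exists but is not valid Avro" branch.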
This won't make it for the integration test, punt.
We decided we would onboard the first PHA without this.
The system has been in operation for a year-ish and we haven't ever felt a need to gather and expose this kind of information, so I am closing this as not to be fixed.