
Comments (6)

tgeoghegan commented on August 10, 2024

The IDL document names files like {aggregation_prefix}/YYmmddHHMM-YYmmddHHMM.invalid_uuid_{n}.avro to be emitted alongside the sum parts by the facilitator. The format of this file is TBD, but I took a stab at specifying it in #2 (short version: it's a list of UUIDs). In that PR, @tlepoint suggested "to add a reason for rejection, for example INVALID_CIPHERTEXT or INVALID_PROOF." That seems very reasonable to me, but there are a number of places in the pipeline that could fail, all the way from ingestion through sum part construction. I think we need to keep track of invalid packets the whole way through, from ingestor to final sum part construction.

I'm going to run through the pipeline stages I see and try to enumerate error cases so we can agree on what to do about them. For each failure case I identify, I have noted how I think it should be handled. The heuristic I'm using for classification is that errors that can be resolved with a software fix to a single component should halt batch processing so we can deploy a fix and try again (e.g., the facilitator is rejecting valid Avro messages because of a typo in facilitator code), while errors caused by an individual packet being malformed for any reason should not block processing of the rest of the batch and will instead be recorded in an "invalid packets" list that moves through the pipeline with the good data.
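
A minimal sketch of that handling policy, with illustrative types that are not the actual prio-server code, might look like this:

```rust
/// Errors that should halt processing of the whole batch
/// (I/O failures, malformed headers, bugs in a single component, ...).
#[derive(Debug)]
struct BatchError(String);

/// A per-packet failure, recorded so the rest of the batch can proceed.
#[derive(Debug)]
struct RejectedPacket {
    uuid: String,
    reason: String, // e.g. "INVALID_CIPHERTEXT"; the reasons are enumerated at the end of this comment
}

/// Walk a batch, propagating batch-level errors with `?` but collecting
/// per-packet rejections into an "invalid packets" list that travels
/// through the pipeline alongside the good data.
fn process_batch<P>(
    packets: impl IntoIterator<Item = Result<Result<P, RejectedPacket>, BatchError>>,
) -> Result<(Vec<P>, Vec<RejectedPacket>), BatchError> {
    let mut valid = Vec::new();
    let mut invalid = Vec::new();
    for packet in packets {
        match packet? {
            Ok(p) => valid.push(p),
            Err(rejected) => invalid.push(rejected),
        }
    }
    Ok((valid, invalid))
}
```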

I will probably end up using "validation" and "verification" interchangeably below, for which I apologize. If someone can make an argument for using one or the other word consistently everywhere, I am all ears.

Ingestion

Per Apple, if anything goes wrong during ingestion, the relevant packet or batch will be discarded, so there's nothing for data share processors to do.

PHA/Facilitator intake

(i.e. generation of validation share)

  • I/O errors (file not found, short reads, network failures, etc.): stop processing the batch, retry later.
  • Malformed ingestion header (including bad signature): stop processing the batch, alert humans.
  • Malformed individual packet (bad encoding): record bad packet with INVALID_PACKET, move on.
  • Individual packet cannot be decrypted: record bad packet with INVALID_CIPHERTEXT, move on.
  • Individual packet with a bad value of a parameter (like r_pit): record bad packet with INVALID_PARAMETERS, move on.

To enable the intake step to indicate failures to subsequent steps, the PrioValidityPacket Avro structure would be changed to contain a union over the triple (f_r, g_r, h_r) and a rejection reason. Keeping the invalid packets inline with the list of valid ones makes it easier to resolve the (ingestion packet, own validation packet, peer validation packet) triples during the aggregation step.
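
As a rough sketch only (Rust types rather than the actual Avro definition; field names and the field-element type are assumptions), the changed structure might look something like this:

```rust
/// Either the validity share triple (f_r, g_r, h_r) or the reason this
/// packet was rejected during intake. Encoding field elements as u64 is an
/// assumption made for this sketch.
enum ValidityShareOrRejection {
    Share { f_r: u64, g_r: u64, h_r: u64 },
    /// e.g. "INVALID_CIPHERTEXT"; see the enumeration later in this comment.
    Rejected { reason: String },
}

/// One entry per ingestion packet. Keeping rejected packets inline with the
/// valid ones makes it easy to resolve the (ingestion packet, own validation
/// packet, peer validation packet) triples by UUID during aggregation.
struct PrioValidityPacket {
    uuid: String,
    payload: ValidityShareOrRejection,
}
```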

PHA/Facilitator aggregation

  • I/O errors (file not found, short reads, network failures, etc.): stop processing the batch, retry later.
  • Malformed ingestion header (including bad signature): stop processing the batch, alert humans.
  • Mismatch between parameters in validation or ingestion headers (i.e., inconsistent batch ID, name, bins, epsilon, prime, number of servers, or Hamming weight): stop processing the batch, alert humans.
  • Mismatch in packet count between validation batches (e.g., the facilitator ingestion batch is 100 packets and the facilitator emits 100 validation packets, but the PHA only emits 50 validation packets): validate the packets present in both validation batches, and record the missing ones as bad packets with MISSING_PEER_VALIDATION (see the sketch after this list).
  • Verification of an individual ingestion packet against the verification shares fails: record bad packet with INVALID_PROOF, move on.
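
A rough sketch of the pairing step implied by the packet-count-mismatch case above (hypothetical helper, not actual prio-server code): match own and peer validation packets by UUID, and record the UUIDs with no peer counterpart instead of aborting the batch.

```rust
use std::collections::HashMap;

/// Pair own and peer validation packets by UUID. Packets with no peer
/// counterpart are returned separately so they can be recorded as
/// MISSING_PEER_VALIDATION; the rest of the batch is still verified.
fn pair_validations<V>(
    own: Vec<(String, V)>,  // (packet UUID, own validation share)
    peer: Vec<(String, V)>, // (packet UUID, peer validation share)
) -> (Vec<(String, V, V)>, Vec<String>) {
    let mut peer_by_uuid: HashMap<String, V> = peer.into_iter().collect();
    let mut paired = Vec::new();
    let mut missing_peer_validation = Vec::new();
    for (uuid, own_share) in own {
        match peer_by_uuid.remove(&uuid) {
            Some(peer_share) => paired.push((uuid, own_share, peer_share)),
            None => missing_peer_validation.push(uuid),
        }
    }
    (paired, missing_peer_validation)
}
```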

The invalid_packet Avro structure would be augmented to contain a rejection_reason field. It also needs a batch_uuid field: since the aggregation step sums over multiple batches, the packet's UUID is not sufficient (so the list of UUIDs I used in #2 is already wrong).
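
Sketched as a Rust struct (illustrative only; the real definition would live in the Avro schema, and the field types here are assumptions), the augmented record might look like:

```rust
/// One entry in the invalid packets file emitted by the aggregation step.
struct InvalidPacket {
    /// UUID of the rejected packet.
    uuid: String,
    /// UUID of the batch the packet came from; needed because aggregation
    /// sums over multiple batches, so the packet UUID alone is not enough.
    batch_uuid: String,
    /// One of the rejection reason symbols enumerated at the end of this
    /// comment (e.g. "INVALID_PROOF").
    rejection_reason: String,
}
```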

The file emitted during the aggregation step will be named {aggregation_prefix}/YYmmddHHMM-YYmmddHHMM.invalid_packets_{n}.avro since it contains more than just UUIDs now.

We end up with this enumeration of packet rejection reasons, which may appear alongside the sum part sent by the facilitator to the PHA (a code sketch follows the list):

  • INVALID_PARAMETERS
  • INVALID_CIPHERTEXT
  • INVALID_PACKET
  • MISSING_PEER_VALIDATION
  • INVALID_PROOF
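
For concreteness, that enumeration could map onto a Rust enum like the following sketch, with the ALL_CAPS symbols being what gets written to the invalid packets file (the mapping itself is illustrative, not an agreed schema):

```rust
/// Rejection reasons accumulated across intake and aggregation.
#[derive(Debug, Clone, Copy)]
enum RejectionReason {
    InvalidParameters,
    InvalidCiphertext,
    InvalidPacket,
    MissingPeerValidation,
    InvalidProof,
}

impl RejectionReason {
    /// The symbol recorded in the invalid packets file.
    fn as_str(self) -> &'static str {
        match self {
            RejectionReason::InvalidParameters => "INVALID_PARAMETERS",
            RejectionReason::InvalidCiphertext => "INVALID_CIPHERTEXT",
            RejectionReason::InvalidPacket => "INVALID_PACKET",
            RejectionReason::MissingPeerValidation => "MISSING_PEER_VALIDATION",
            RejectionReason::InvalidProof => "INVALID_PROOF",
        }
    }
}
```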

tgeoghegan commented on August 10, 2024

One other question: if we encounter zero invalid packets going through the whole pipeline, what should the facilitator emit in the invalid_packets_{n}.avro file? An empty file? An Avro file containing an empty list?

tgeoghegan commented on August 10, 2024

Cross-posting helpful insights from a colleague, received over email:

> I'm not sure what the protocol between devices and ingestion servers looks like, but are there any failure cases where an individual packet could be rejected but the overall batch can continue? If so, should those failures be reported to the next stage (PHA and facilitator servers) to be rolled forward into invalid packet files?

There are reasons for rejection at that stage, but the rejected packets will not be added to the batch, so there is no need to forward them to the list of invalid packets.

> One other question: if we encounter zero invalid packets going through the whole pipeline, what should the facilitator emit in the invalid_packets_{n}.avro file? An empty file? An Avro file containing an empty list?

An Avro file with an empty list feels like the right answer. The other options mean extra code to distinguish the empty case and do something special with it.
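
An Avro container file with zero records still carries its header and schema, so producing the empty-list variant is cheap. A hypothetical sketch using the apache_avro crate (an assumption; prio-server's actual Avro library and schema may differ):

```rust
use apache_avro::{Error, Schema, Writer};

/// Produce the bytes of an Avro container file containing an empty list of
/// records: the header and schema are written, but nothing is appended.
fn empty_invalid_packets_file() -> Result<Vec<u8>, Error> {
    // Placeholder schema standing in for the real invalid packets schema.
    let schema = Schema::parse_str(
        r#"{"type": "record", "name": "InvalidPacket",
            "fields": [{"name": "uuid", "type": "string"}]}"#,
    )?;
    let writer = Writer::new(&schema, Vec::new());
    writer.into_inner() // flushes; header + schema, zero records
}
```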

I think our colleague is right on both counts and plan to adopt these recommendations as part of closing this ticket.

tgeoghegan commented on August 10, 2024

This won't make it for the integration test, punt.

tgeoghegan commented on August 10, 2024

We decided we would onboard the first PHA without this.

tgeoghegan commented on August 10, 2024

The system has been in operation for about a year and we haven't ever felt a need to gather and expose this kind of information, so I am closing this as won't-fix.
