privacysandbox / measurement-simulation

This repository contains instructions and scripts to set up and test the Privacy Sandbox Measurement Simulation library.

License: Apache License 2.0

Starlark 1.67% Java 78.73% Python 2.05% Makefile 0.06% HTML 0.13% JavaScript 17.34% CSS 0.03%

measurement-simulation's Introduction

Attribution Reporting Simulation Library

The Attribution Reporting Simulation Library allows you to examine the impact of the Attribution Reporting API by taking historical data and presenting it as if it were collected by the API in real-time. This allows you to compare historical conversion numbers with Attribution Reporting Simulation Library results to see how reporting accuracy would change. You can also use the Simulation Library to experiment with different aggregation key structures and batching strategies, and train optimization models on Simulation Library reports to compare projected performance with models based on current data.

The Attribution Reporting Simulation Library provides a simplified mock environment, allowing you to test parameters and evaluate how the API can satisfy ad tech measurement use cases while making minimal investments in local infrastructure and resources.

This document shows you how to get up and running with the Attribution Reporting Simulation Library. For more details, see OVERVIEW.md.

Build & Run

This repository depends on Bazel 4.2.2 with JDK 11 and Python 3.8. Set the following environment variables (the exact paths depend on your system):

JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
python=/usr/local/bin/python

Simulation CLI arguments

The library accepts the following general simulation arguments on the command line. Each argument has one name when passed to the Python wrapper and another when passed to the Java code:

| Running Python wrapper | Running Java code | Description |
|---|---|---|
| input_directory | inputDirectory | The top-level directory from which the library reads its inputs |
| output_directory | outputDirectory | The directory that will hold results from the simulation |
| source_start_date | sourceStartDate | The first date of attribution source events |
| source_end_date | sourceEndDate | The last date of attribution source events; must be on or after source_start_date |
| attribution_source_file_name | attributionSourceFileName | The file name used to identify the files that hold attribution source events. Default value: "attribution_source.json" |
| trigger_start_date | triggerStartDate | The first date of trigger events |
| trigger_end_date | triggerEndDate | The last date of trigger events; must be on or after trigger_start_date |
| trigger_file_name | triggerFileName | The file name used to identify the files that hold trigger events. Default value: "trigger.json" |
| extension_event_start_date | extensionEventStartDate | The first date of install/uninstall events |
| extension_event_end_date | extensionEventEndDate | The last date of install/uninstall events; must be on or after extension_event_start_date |
| extension_event_file_name | extensionEventFileName | The file name used to identify the files that hold install/uninstall events. Default value: "extension.json" |
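
As a rough illustration of how the Python-wrapper argument names above map onto command-line flags, here is a minimal sketch. The helper `build_cli_args` is hypothetical and not part of the library; it assumes dates are written as `YYYY-MM-DD` (as in the sample run) and only checks the "end on or after start" rule from the table.

```python
from datetime import date


def build_cli_args(options):
    """Turn {"source_start_date": "2022-01-15", ...} into a list of --name=value flags.

    Hypothetical helper for illustration; it is not part of the simulation library.
    """
    # Each end date must fall on or after its matching start date.
    for prefix in ("source", "trigger", "extension_event"):
        start = options.get(f"{prefix}_start_date")
        end = options.get(f"{prefix}_end_date")
        if start and end and date.fromisoformat(end) < date.fromisoformat(start):
            raise ValueError(
                f"{prefix}_end_date must be on or after {prefix}_start_date"
            )
    # Flags keep the order in which the options were given.
    return [f"--{name}={value}" for name, value in options.items()]
```

The resulting list could then be appended to a `bazel run -- //python:main` invocation, for example through `subprocess.run`.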

Configuring Privacy parameters

The library allows you to configure the privacy params for both the Event and Aggregate APIs. These params are located in the library's config directory:

  1. AggregationArgs.properties: Configurable params for Aggregation service.
  2. PrivacyParams.properties: Configurable noising params for Event API.

Modify the params in these properties files, then run the library.
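
If you want to script such changes rather than edit the files by hand, a minimal sketch follows. The function `update_properties` and the key `example_epsilon` in the usage note are hypothetical; the actual parameter names are whatever appears in the two properties files. The sketch only handles plain `key=value` lines, not the full Java properties syntax.

```python
def update_properties(text, overrides):
    """Return .properties text with the given keys replaced.

    Keys that are not already present are appended at the end.
    Hypothetical helper for illustration; handles only simple key=value lines.
    """
    remaining = dict(overrides)
    lines = []
    for line in text.splitlines():
        stripped = line.strip()
        # Comments ("#" or "!") and blank lines are copied through unchanged.
        if stripped and not stripped.startswith(("#", "!")) and "=" in stripped:
            key = stripped.split("=", 1)[0].strip()
            if key in remaining:
                line = f"{key}={remaining.pop(key)}"
        lines.append(line)
    # Append any overrides that did not match an existing key.
    lines.extend(f"{key}={value}" for key, value in remaining.items())
    return "\n".join(lines) + "\n"
```

For example, `update_properties(open(path).read(), {"example_epsilon": "5"})` returns the new file contents, which you would then write back to the same path.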

You can run the simulation library as a standalone Python library, imported as a Python module, or as a JAR:

Standalone Python library

  1. Navigate to the root directory of the library and run bazel build //... to build the repo.
  2. Execute it by running bazel run -- //python:main, followed by the desired arguments, e.g. --input_directory=<path_to_testdata>.

Import as Python module

  1. Copy your script that contains the existing pipeline to the Python directory.

  2. Import the Attribution Reporting Simulation Library and instantiate the simulation runner:

    from simulation_runner_wrapper import SimulationRunnerWrapper
    simulation_runner = SimulationRunnerWrapper()
    
  3. Instantiate the SimulationConfig class to provide command-line arguments:

    from simulation_config import SimulationConfig
    config = SimulationConfig(input_directory=<path_to_testdata>, source_start_date="2022-10-20", ...)
    
  4. Finally, execute the simulation by calling:

    simulation_runner.run(simulation_config=config)
    

Run the JAR directly

Execute the simulation by running bazel run -- //:SimulationRunner followed by desired arguments, e.g. --inputDirectory=<path_to_testdata>.

Sample run

$ bazel run -- //:SimulationRunner --sourceStartDate=2022-01-15 --sourceEndDate=2022-01-16 --triggerStartDate=2022-01-15 --triggerEndDate=2022-02-06 --inputDirectory=<path_to_simulation_library>/testdata/ --outputDirectory=<path_to_output_directory>

After a successful run, you should see the following files and directories in the output directory:

  • input_batches

    • Several .avro files - These are aggregatable batches that are sent to the aggregation service as input.
  • For each .avro file, the following will also be generated:

    • <input_avro_file_name>/output.avro - Output aggregate report
    • <input_avro_file_name>/result_info.json
  • OS/U1/event_reports.json - Event reports for the user "U1" using logs for the OS platform

  • OS/U2/event_reports.json - Event reports for the user "U2" using logs for the OS platform
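
To gather the per-user event reports from this layout, something like the following sketch could work. The function name `load_event_reports` is hypothetical, and the sketch assumes each `event_reports.json` is a single JSON document, which the text above does not state.

```python
import json
from pathlib import Path


def load_event_reports(output_directory):
    """Map (platform, user) to the parsed contents of that user's event_reports.json.

    Hypothetical helper; assumes the <platform>/<user>/event_reports.json layout
    described above and that each file holds one JSON document.
    """
    reports = {}
    for path in sorted(Path(output_directory).glob("*/*/event_reports.json")):
        # The last three path components are <platform>/<user>/event_reports.json.
        platform, user = path.parts[-3], path.parts[-2]
        with path.open() as f:
            reports[(platform, user)] = json.load(f)
    return reports
```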

Reading from the output avro files

You can download the Avro tools jar 1.11.1 here. To read the Avro file in human-readable JSON format, run:

java -jar avro-tools-1.11.1.jar tojson <output_avro_file>

The output has one JSON record per line:

{"bucket": "key1", "metric": <value1>}
{"bucket": "key2", "metric": <value2>}
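
If you redirect that output to a file, a short sketch like the one below can load it into a dictionary. The function name `parse_aggregate_output` is hypothetical; the only assumption taken from the example above is that each line is a JSON object with `bucket` and `metric` fields.

```python
import json


def parse_aggregate_output(jsonl_text):
    """Map each bucket to its metric from line-delimited JSON text.

    Hypothetical helper for illustration; expects one {"bucket": ..., "metric": ...}
    object per line, as printed by the avro-tools tojson command.
    """
    metrics = {}
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        metrics[record["bucket"]] = record["metric"]
    return metrics
```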

Contribution

Please see CONTRIBUTING.md for details.

measurement-simulation's People

Contributors

cshmerling, sanbeiji, suriheemanshu, yawfrempong-goog


measurement-simulation's Issues

IllegalArgumentException: Can not determine a default Coder error

I get the following error when I run:

bazel run -- //:SimulationRunner --sourceStartDate=2022-01-15 --sourceEndDate=2022-01-16 --triggerStartDate=2022-01-15 --triggerEndDate=2022-02-06 --inputDirectory=/input_directory=/path/measurement-simulation/testdata --outputDirectory=/path/measurement-simulation/output-testdata

INFO: Build completed successfully, 20 total actions
Simulating Attribution Reporting API...
Exception in thread "main" java.lang.IllegalArgumentException: Can not determine a default Coder for a 'Create' PTransform that has no elements. Either add elements, call Create.empty(Coder), Create.empty(TypeDescriptor), or call 'withCoder(Coder)' or 'withType(TypeDescriptor)' on the PTransform.
at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument(Preconditions.java:141)
at org.apache.beam.sdk.transforms.Create.getDefaultCreateCoder(Create.java:684)
at org.apache.beam.sdk.transforms.Create.access$300(Create.java:110)
at org.apache.beam.sdk.transforms.Create$Values.expand(Create.java:359)
at org.apache.beam.sdk.transforms.Create$Values.expand(Create.java:277)
at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:548)
at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:482)
at org.apache.beam.sdk.values.PBegin.apply(PBegin.java:44)
at org.apache.beam.sdk.Pipeline.apply(Pipeline.java:177)
at com.google.measurement.DataProcessor.getAttributionSourceFiles(DataProcessor.java:157)
at com.google.measurement.DataProcessor.buildUserToSourceMap(DataProcessor.java:92)
at com.google.measurement.SimulationRunner.run(SimulationRunner.java:87)
at com.google.measurement.SimulationRunner.main(SimulationRunner.java:153)
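
The trace fails inside a `Create` transform built in `DataProcessor.getAttributionSourceFiles`, which suggests (the issue does not confirm this) that no attribution source files were found under the given `--inputDirectory`. A quick pre-flight check along these lines can rule that out. The helper `find_input_files` is hypothetical, and it searches recursively because the expected directory layout is not described here.

```python
from pathlib import Path


def find_input_files(input_directory, file_name="attribution_source.json"):
    """Return sorted paths of all files named file_name under input_directory.

    Hypothetical diagnostic helper; returns an empty list when the directory
    does not exist, which is the case worth checking before running the simulation.
    """
    root = Path(input_directory)
    if not root.is_dir():
        return []
    return sorted(str(p) for p in root.rglob(file_name))
```

An empty result for the path passed as `--inputDirectory` would be consistent with the error above.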

Security Policy violation Binary Artifacts

This issue was automatically created by Allstar.

Security Policy Violation
Project is out of compliance with Binary Artifacts policy: binaries present in source code

Rule Description
Binary Artifacts are an increased security risk in your repository. Binary artifacts cannot be reviewed, allowing the introduction of possibly obsolete or maliciously subverted executables. For more information see the Security Scorecards Documentation for Binary Artifacts.

Remediation Steps
To remediate, remove the generated executable artifacts from the repository.

Artifacts Found

  • lib/LocalTestingTool_deploy.jar

Additional Information
This policy is drawn from Security Scorecards, which is a tool that scores a project's adherence to security best practices. You may wish to run a Scorecards scan directly on this repository for more details.


Allstar has been installed on all Google managed GitHub orgs. Policies are gradually being rolled out and enforced by the GOSST and OSPO teams. Learn more at http://go/allstar

This issue will auto resolve when the policy is in compliance.

Issue created by Allstar. See https://github.com/ossf/allstar/ for more information. For questions specific to the repository, please contact the owner or maintainer.

New LocalTestingTool jar is needed.

Hi team,
It looks like the newer LocalTestingTool jar includes a fix in the JSON writer (LocalJsonResultFileWriter.java) that corrects the bucket ID.
The problem arises when we use the new LocalTestingTool, which can be downloaded with:

VERSION=0.12.0; curl -f -o LocalTestingTool_$VERSION.jar https://aggregation-service-published-artifacts.s3.amazonaws.com/aggregation-service/$VERSION/LocalTestingTool_$VERSION.jar

We get the following exception when trying to process more than one event:

at com.google.inject.internal.InternalProvisionException.toProvisionException(InternalProvisionException.java:251)
	at com.google.inject.internal.InjectorImpl$1.get(InjectorImpl.java:1104)
	at com.google.inject.internal.InjectorImpl.getInstance(InjectorImpl.java:1134)
	at com.google.aggregate.adtech.worker.AggregationWorker.createServiceManager(AggregationWorker.java:31)
	at com.google.aggregate.adtech.worker.LocalRunner.internalMain(LocalRunner.java:55)
	at com.google.measurement.adtech.LocalAggregationRunner.runAggregator(LocalAggregationRunner.java:32)
	at com.google.measurement.adtech.ProcessBatch.processElement(ProcessBatch.java:118)
Caused by: java.lang.IllegalStateException: GlobalOpenTelemetry.set has already been called. GlobalOpenTelemetry.set must be called only once before any calls to GlobalOpenTelemetry.get. If you are using the OpenTelemetrySdk, use OpenTelemetrySdkBuilder.buildAndRegisterGlobal instead. Previous invocation set to cause of this exception.
	at io.opentelemetry.api.GlobalOpenTelemetry.set(GlobalOpenTelemetry.java:104)
	at io.opentelemetry.sdk.OpenTelemetrySdkBuilder.buildAndRegisterGlobal(OpenTelemetrySdkBuilder.java:85)
	at com.google.privacysandbox.otel.OtlpJsonLoggingOTelConfigurationModule.provideOtelConfig(OtlpJsonLoggingOTelConfigurationModule.java:60)
	at com.google.privacysandbox.otel.OtlpJsonLoggingOTelConfigurationModule$$FastClassByGuice$$16364919.GUICE$TRAMPOLINE(<generated>)
	at com.google.privacysandbox.otel.OtlpJsonLoggingOTelConfigurationModule$$FastClassByGuice$$16364919.apply(<generated>)
	at com.google.inject.internal.ProviderMethod$FastClassProviderMethod.doProvision(ProviderMethod.java:260)
	at com.google.inject.internal.ProviderMethod.doProvision(ProviderMethod.java:171)
	at com.google.inject.internal.InternalProviderInstanceBindingImpl$CyclicFactory.provision(InternalProviderInstanceBindingImpl.java:185)
	at com.google.inject.internal.InternalProviderInstanceBindingImpl$CyclicFactory.get(InternalProviderInstanceBindingImpl.java:162)
	at com.google.inject.internal.ProviderToInternalFactoryAdapter.get(ProviderToInternalFactoryAdapter.java:40)
	at com.google.inject.internal.SingletonScope$1.get(SingletonScope.java:169)
	at com.google.inject.internal.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:45)
	at com.google.inject.internal.SingleParameterInjector.inject(SingleParameterInjector.java:40)
	at com.google.inject.internal.SingleParameterInjector.getAll(SingleParameterInjector.java:60)
	at com.google.inject.internal.ConstructorInjector.provision(ConstructorInjector.java:113)
	at com.google.inject.internal.ConstructorInjector.construct(ConstructorInjector.java:91)
	at com.google.inject.internal.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:300)
	at com.google.inject.internal.FactoryProxy.get(FactoryProxy.java:60)
	at com.google.inject.internal.SingleParameterInjector.inject(SingleParameterInjector.java:40)
	at com.google.inject.internal.SingleParameterInjector.getAll(SingleParameterInjector.java:60)
	at com.google.inject.internal.ConstructorInjector.provision(ConstructorInjector.java:113)
	at com.google.inject.internal.ConstructorInjector.construct(ConstructorInjector.java:91)
	at com.google.inject.internal.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:300)
	at com.google.inject.internal.FactoryProxy.get(FactoryProxy.java:60)
	at com.google.inject.internal.SingleParameterInjector.inject(SingleParameterInjector.java:40)
	at com.google.inject.internal.SingleParameterInjector.getAll(SingleParameterInjector.java:60)
	at com.google.inject.internal.ProviderMethod.doProvision(ProviderMethod.java:171)
	at com.google.inject.internal.InternalProviderInstanceBindingImpl$CyclicFactory.provision(InternalProviderInstanceBindingImpl.java:185)
	at com.google.inject.internal.InternalProviderInstanceBindingImpl$CyclicFactory.get(InternalProviderInstanceBindingImpl.java:162)
	at com.google.inject.internal.InjectorImpl$1.get(InjectorImpl.java:1101)
	at com.google.inject.internal.InjectorImpl.getInstance(InjectorImpl.java:1134)
	at com.google.aggregate.adtech.worker.AggregationWorker.createServiceManager(AggregationWorker.java:31)
	at com.google.aggregate.adtech.worker.LocalRunner.internalMain(LocalRunner.java:55)
	at com.google.measurement.adtech.LocalAggregationRunner.runAggregator(LocalAggregationRunner.java:32)
	at com.google.measurement.adtech.ProcessBatch.processElement(ProcessBatch.java:118)
	at com.google.measurement.adtech.ProcessBatch$DoFnInvoker.invokeProcessElement(Unknown Source)
	at org.apache.beam.repackaged.direct_java.runners.core.SimpleDoFnRunner.invokeProcessElement(SimpleDoFnRunner.java:228)
	at org.apache.beam.repackaged.direct_java.runners.core.SimpleDoFnRunner.processElement(SimpleDoFnRunner.java:187)
	at org.apache.beam.repackaged.direct_java.runners.core.SimplePushbackSideInputDoFnRunner.processElementInReadyWindows(SimplePushbackSideInputDoFnRunner.java:79)
	at org.apache.beam.runners.direct.ParDoEvaluator.processElement(ParDoEvaluator.java:244)
	at org.apache.beam.runners.direct.DoFnLifecycleManagerRemovingTransformEvaluator.processElement(DoFnLifecycleManagerRemovingTransformEvaluator.java:54)
	at org.apache.beam.runners.direct.DirectTransformExecutor.processElements(DirectTransformExecutor.java:165)
	at org.apache.beam.runners.direct.DirectTransformExecutor.run(DirectTransformExecutor.java:129)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.Throwable
	at io.opentelemetry.api.GlobalOpenTelemetry.set(GlobalOpenTelemetry.java:112)
	... 47 more
	

Any possible solution here?
