aangelopoulos / ppi_py
A package for statistically rigorous scientific discovery using machine learning. Implements prediction-powered inference.
License: MIT License
I'm trying to run the power analysis from the examples on NPS values. My data for Y_total and Yhat_total is in the attached file. I copied the code from the examples and modified it for my use case.
I do not understand the error I am getting. I am trying to report confidence intervals for the average value of the NPS. I can get this to work for percentages. Can you explain the reason for the error?
import numpy as np
from scipy.optimize import brentq
from ppi_py import ppi_mean_ci, classical_mean_ci

# Y_total and Yhat_total come from the attached file;
# n_total is the total number of data points.

# Find n such that we reject H0: average NPS score <= 3 with probability 80% using a test at level alpha
num_experiments = 100
# One fixed random permutation per experiment, shared by both tests
list_rand_idx = [
    np.random.permutation(n_total) for _ in range(num_experiments)
]
alpha = 0.05

def _to_invert_ppi(n):
    n = int(n)
    nulls_rejected = 0
    # Data setup: the first n points are labeled, the rest are unlabeled
    for i in range(num_experiments):
        rand_idx = list_rand_idx[i]
        _Yhat = Yhat_total[rand_idx[:n]]
        _Y = Y_total[rand_idx[:n]]
        _Yhat_unlabeled = Yhat_total[rand_idx[n:]]
        ppi_ci = ppi_mean_ci(_Y, _Yhat, _Yhat_unlabeled, alpha=alpha)
        # Reject H0 when the lower confidence bound exceeds 3
        if ppi_ci[0] > 3:
            nulls_rejected += 1
    # Root of this function: the n at which the rejection rate equals 80%
    return nulls_rejected / num_experiments - 0.8

def _to_invert_classical(n):
    n = int(n)
    nulls_rejected = 0
    # Data setup: only the first n labeled points are used
    for i in range(num_experiments):
        rand_idx = list_rand_idx[i]
        _Y = Y_total[rand_idx[:n]]
        classical_ci = classical_mean_ci(_Y, alpha=alpha)
        if classical_ci[0] > 3:
            nulls_rejected += 1
    return nulls_rejected / num_experiments - 0.8

n_ppi = int(brentq(_to_invert_ppi, 1, 1000, xtol=1))
n_classical = int(brentq(_to_invert_classical, 1, 1000, xtol=1))
print(
    f"The PPI test requires n={n_ppi} labeled data points to reject the null."
)
print(
    f"The classical test requires n={n_classical} labeled data points to reject the null."
)
The computation of lhat is slow due to a for loop to compute the numerator of the optimal lambda.
Replacing this for loop with a matrix product drastically improves performance.
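As a rough illustration of the kind of change described, here is a minimal sketch. It assumes the numerator is a sum of per-sample outer products of two gradient arrays. The names `grads`, `grads_hat`, `numerator_loop`, and `numerator_matmul` are hypothetical and not taken from the ppi_py source; the point is only that the loop and the matrix product give the same result.

```python
import numpy as np

def numerator_loop(grads, grads_hat):
    """Sum of per-sample outer products, accumulated one row at a time."""
    n, d = grads.shape
    total = np.zeros((d, d))
    for i in range(n):
        total += np.outer(grads[i], grads_hat[i])
    return total

def numerator_matmul(grads, grads_hat):
    """The same sum written as a single (d, n) @ (n, d) matrix product."""
    return grads.T @ grads_hat

# Hypothetical data: 500 samples, 3-dimensional gradients
rng = np.random.default_rng(0)
grads = rng.normal(size=(500, 3))
grads_hat = rng.normal(size=(500, 3))

# Both versions agree up to floating-point error
same = np.allclose(
    numerator_loop(grads, grads_hat), numerator_matmul(grads, grads_hat)
)
```

The matrix product hands the whole accumulation to optimized linear algebra routines instead of running a Python-level iteration per sample, which is where the speedup would come from.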
Thanks for this awesome package. As far as I can tell, both the package docs and the related research papers mostly speak of "labels" for X and Y. However, some of the examples also use continuous variables. Will this package also work with continuous variables for X, Y, Yhat, X_unlabeled, etc.?
The ppi_mean_ci function does not handle cases well where the labelled/unlabelled datapoints are multidimensional.
The observation weights do not broadcast well with the Y matrix.
For instance, here, weights of shape (n_obs,) can't be multiplied with Y when the latter is of shape (n_obs, d).
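The shape mismatch can be reproduced with plain NumPy, independent of ppi_py. The sketch below uses made-up data and a hypothetical weight vector `w`. Adding a trailing axis to the weights is one possible way to make the product broadcast, not necessarily the fix adopted in the package.

```python
import numpy as np

n_obs, d = 5, 3
Y = np.arange(n_obs * d, dtype=float).reshape(n_obs, d)  # shape (n_obs, d)
w = np.ones(n_obs) / n_obs  # hypothetical weights, shape (n_obs,)

# NumPy aligns trailing axes, so (n_obs,) against (n_obs, d) compares 5 with 3
try:
    w * Y
    broadcast_failed = False
except ValueError:
    broadcast_failed = True

# Reshaping the weights to (n_obs, 1) lets them broadcast across the d columns
weighted = w[:, None] * Y
weighted_mean = weighted.sum(axis=0)  # shape (d,)
```

With uniform weights, `weighted_mean` matches `Y.mean(axis=0)`, which gives a quick sanity check on the reshaped product.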