The ppi_py repository from aangelopoulos

ppi_py's Issues

Power Analysis Error - ValueError: f(a) and f(b) must have different signs

I'm trying to run the power analysis from the examples on NPS values. The data for Y_total and Yhat_total is in the attached file. I copied the code from the examples and modified it for my use case.

I do not understand the error I am getting. I am trying to report confidence intervals for the average NPS value. I can get this to work for percentages. Can you explain what causes the error?

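import numpy as np
from scipy.optimize import brentq
from ppi_py import ppi_mean_ci, classical_mean_ci

# Y_total, Yhat_total, and n_total are assumed to be loaded from the attached file.
# Find n such that we reject H0: average NPS score <= 3 with probability 80% using a test at level alpha
num_experiments = 100
list_rand_idx = [
    np.random.permutation(n_total) for _ in range(num_experiments)
]
alpha = 0.05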


def _to_invert_ppi(n):
    n = int(n)
    nulls_rejected = 0
    # Data setup
    for i in range(num_experiments):
        rand_idx = list_rand_idx[i]
        _Yhat = Yhat_total[rand_idx[:n]]
        _Y = Y_total[rand_idx[:n]]
        _Yhat_unlabeled = Yhat_total[rand_idx[n:]]

        ppi_ci = ppi_mean_ci(_Y, _Yhat, _Yhat_unlabeled, alpha=alpha)
        if ppi_ci[0] > 3:
            nulls_rejected += 1
    return nulls_rejected / num_experiments - 0.8


def _to_invert_classical(n):
    n = int(n)
    nulls_rejected = 0
    # Data setup
    for i in range(num_experiments):
        rand_idx = list_rand_idx[i]
        _Y = Y_total[rand_idx[:n]]

        classical_ci = classical_mean_ci(_Y, alpha=alpha)

        if classical_ci[0] > 3:
            nulls_rejected += 1
    return nulls_rejected / num_experiments - 0.8


n_ppi = int(brentq(_to_invert_ppi, 1, 1000, xtol=1))
n_classical = int(brentq(_to_invert_classical, 1, 1000, xtol=1))
print(
    f"The PPI test requires n={n_ppi} labeled data points to reject the null."
)
print(
    f"The classical test requires n={n_classical} labeled data points to reject the null."
)

trial_values.json
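
The error message in the title is raised by `scipy.optimize.brentq`, which requires the function to take opposite signs at the two ends of the search interval. In the code above that means the empirical power must be below 0.8 at n=1 and above 0.8 at n=1000; if it is on the same side of 0.8 at both ends, there is no bracketed root to find. The sketch below illustrates this with a hypothetical step function, `power_gap`, standing in for `_to_invert_ppi`; the threshold of 50 and the helper `has_sign_change` are assumptions made for illustration and are not taken from the attached data.

```python
import numpy as np
from scipy.optimize import brentq


def power_gap(n):
    # Hypothetical stand-in for _to_invert_ppi: empirical power minus 0.8.
    # Assumed power curve: 0 below n=50, 1 from n=50 onward.
    return (1.0 if int(n) >= 50 else 0.0) - 0.8


def has_sign_change(f, a, b):
    # brentq only works when f(a) and f(b) have opposite signs.
    return np.sign(f(a)) != np.sign(f(b))


bracket_ok = has_sign_change(power_gap, 1, 1000)   # -0.8 at n=1, +0.2 at n=1000
bracket_bad = has_sign_change(power_gap, 1, 40)    # -0.8 at both ends

# With a bad bracket, brentq raises the ValueError from the issue title.
try:
    brentq(power_gap, 1, 40, xtol=1)
    raised = False
except ValueError:
    raised = True

# With a valid bracket, the search converges near the threshold.
n_required = int(brentq(power_gap, 1, 1000, xtol=1))
```

Evaluating the function at both endpoints before calling `brentq` shows which side of 0.8 the power sits on, and therefore whether the upper bound needs to be raised or the target is already met at the lower bound.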

The optimal lambda computation can be optimized

The computation of lhat is slow because the numerator of the optimal lambda is accumulated in a for loop.
Replacing this for loop with a matrix product drastically improves performance.
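
As a rough illustration of the proposed change, the sketch below assumes the loop accumulates symmetrised outer products of centred per-sample gradients. The library's actual loop is not reproduced here, and the array names `grads` and `grads_hat` and the random data are hypothetical. Under that assumption, the sum of outer products over rows is the same as a single matrix product.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 3
grads = rng.normal(size=(n, d))      # hypothetical per-sample gradients (labels)
grads_hat = rng.normal(size=(n, d))  # hypothetical per-sample gradients (predictions)

# Centre each column once, outside any loop.
g = grads - grads.mean(axis=0)
h = grads_hat - grads_hat.mean(axis=0)

# Loop version: one pair of outer products per sample.
cov_loop = np.zeros((d, d))
for i in range(n):
    cov_loop += (np.outer(g[i], h[i]) + np.outer(h[i], g[i])) / n

# Matrix-product version: sum_i outer(g[i], h[i]) equals g.T @ h.
cov_mat = (g.T @ h + h.T @ g) / n
```

The two results agree up to floating-point error, while the second form hands the whole accumulation to a single optimised matrix multiplication.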

Can "labels" also be continuous variables?

Thanks for this awesome package. As far as I can tell, both the package docs and the related research papers mostly speak of "labels" for X and Y. However, some of the examples also use continuous variables. Will this package also work with continuous variables for X, Y, Yhat, X_unlabeled, etc.?

`ppi_mean_ci` does not support multidimensional inference problems

The ppi_mean_ci function does not handle cases where the labelled/unlabelled datapoints are multidimensional.

The observation weights do not broadcast correctly against the Y matrix.
For instance, here, weights of shape (n_obs,) can't be multiplied with Y when the latter has shape (n_obs, d).
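
The shape mismatch can be reproduced with plain NumPy, independently of ppi_py. The sketch below uses made-up values for `n_obs`, `d`, `Y`, and `w`. It shows the failing product and one possible fix, adding a trailing axis to the weights; this is not necessarily how the library should or does resolve it.

```python
import numpy as np

n_obs, d = 5, 3
Y = np.arange(n_obs * d, dtype=float).reshape(n_obs, d)  # shape (n_obs, d)
w = np.linspace(0.5, 1.5, n_obs)                         # shape (n_obs,)

# NumPy aligns trailing axes, so (n_obs,) is matched against d and fails.
try:
    _ = w * Y
    broadcast_failed = False
except ValueError:
    broadcast_failed = True

# A trailing axis gives the weights shape (n_obs, 1), which broadcasts
# across the d columns of Y.
weighted = w[:, None] * Y
weighted_mean = weighted.mean(axis=0)  # one weighted mean per dimension
```

Note that when `n_obs` happens to equal `d`, the original product does not raise but silently weights columns instead of rows, so an explicit reshape is the safer choice.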
