
aangelopoulos / ppi_py


A package for statistically rigorous scientific discovery using machine learning. Implements prediction-powered inference.

License: MIT License

Python 100.00%
confidence-interval confidence-intervals inference machine-learning p-value p-values statistical-analysis statistical-inference statistics

ppi_py's People

Contributors: aangelopoulos, actions-user, eltociear, pierreboyeau, tijana-zrnic


ppi_py's Issues

Power Analysis Error - ValueError: f(a) and f(b) must have different signs

I'm trying to do the power analysis as found in the examples for NPS values. I have data that I shared in the attached file for Y_total and Yhat_total. I copied the code from the examples and modified it for my use case.

I do not understand the error I am getting. I am trying to report confidence intervals for the average NPS value. The same approach works for me with percentages. Can you explain what causes the error?

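# Imports as used in the ppi_py examples this snippet was adapted from
import numpy as np
from scipy.optimize import brentq
from ppi_py import ppi_mean_ci, classical_mean_ci

# Y_total and Yhat_total are assumed to be NumPy arrays loaded from the attached file
n_total = Y_total.shape[0]

# Find n such that we reject H0: average NPS score <= 3 with probability 80% using a test at level alpha
num_experiments = 100
list_rand_idx = [
    np.random.permutation(n_total) for _ in range(num_experiments)
]
alpha = 0.05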


def _to_invert_ppi(n):
    n = int(n)
    nulls_rejected = 0
    # Data setup
    for i in range(num_experiments):
        rand_idx = list_rand_idx[i]
        _Yhat = Yhat_total[rand_idx[:n]]
        _Y = Y_total[rand_idx[:n]]
        _Yhat_unlabeled = Yhat_total[rand_idx[n:]]

        ppi_ci = ppi_mean_ci(_Y, _Yhat, _Yhat_unlabeled, alpha=alpha)
        if ppi_ci[0] > 3:
            nulls_rejected += 1
    return nulls_rejected / num_experiments - 0.8


def _to_invert_classical(n):
    n = int(n)
    nulls_rejected = 0
    # Data setup
    for i in range(num_experiments):
        rand_idx = list_rand_idx[i]
        _Y = Y_total[rand_idx[:n]]

        classical_ci = classical_mean_ci(_Y, alpha=alpha)

        if classical_ci[0] > 3:
            nulls_rejected += 1
    return nulls_rejected / num_experiments - 0.8


n_ppi = int(brentq(_to_invert_ppi, 1, 1000, xtol=1))
n_classical = int(brentq(_to_invert_classical, 1, 1000, xtol=1))
print(
    f"The PPI test requires n={n_ppi} labeled data points to reject the null."
)
print(
    f"The classical test requires n={n_classical} labeled data points to reject the null."
)

trial_values.json
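
For context on the error message itself: `scipy.optimize.brentq` raises `ValueError: f(a) and f(b) must have different signs` whenever the function has the same sign at both ends of the search interval. In the snippet above, that would mean the rejection rate is on the same side of 0.8 at both n=1 and n=1000. The sketch below reproduces this with a toy power curve. The curve, the `scale` parameter, and the helper names are hypothetical and are not taken from the attached data or from ppi_py.

```python
import math
from scipy.optimize import brentq

TARGET_POWER = 0.8


def make_power_gap(scale):
    """Toy (hypothetical) power curve minus the target: 1 - exp(-n / scale) - 0.8."""
    def gap(n):
        return (1.0 - math.exp(-n / scale)) - TARGET_POWER
    return gap


def has_sign_change(f, lo, hi):
    """brentq can only be used when f(lo) and f(hi) have opposite signs."""
    return f(lo) * f(hi) < 0


# Power reaches 0.8 somewhere inside [1, 1000]: brentq succeeds.
reachable = make_power_gap(scale=200.0)
# Power stays below 0.8 on all of [1, 1000]: brentq raises the ValueError.
unreachable = make_power_gap(scale=5000.0)

for name, f in [("reachable", reachable), ("unreachable", unreachable)]:
    if has_sign_change(f, 1, 1000):
        print(name, "root near n =", brentq(f, 1, 1000))
    else:
        print(name, "no sign change: f(1) =", f(1), "f(1000) =", f(1000))
```

Printing `_to_invert_ppi(1)` and `_to_invert_ppi(1000)` before calling `brentq` would show which case applies to the real data. If both values are negative, the target power is never reached within the available sample, and if both are positive, it is already exceeded at the lower bound.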

Can "labels" also be continuous variables?

Thanks for this awesome package. As far as I can tell, both the package docs and the related research papers mostly speak of "labels" for X and Y. However, some of the examples also use continuous variables. Will this package also work when X, Y, Yhat, X_unlabeled, etc. are continuous?
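
As a point of reference, the textbook prediction-powered mean estimate only uses sample means and variances, so the formula itself does not require discrete outcomes. The sketch below is a plain NumPy version of that estimator with a normal-approximation interval, run on synthetic continuous data. It is not ppi_py's implementation: the function name `ppi_mean_ci_sketch` and the simulated data are assumptions made for illustration.

```python
from statistics import NormalDist

import numpy as np


def ppi_mean_ci_sketch(Y, Yhat, Yhat_unlabeled, alpha=0.05):
    """Hypothetical helper: PPI mean estimate with a normal-approximation CI."""
    Y = np.asarray(Y, dtype=float)
    Yhat = np.asarray(Yhat, dtype=float)
    Yhat_unlabeled = np.asarray(Yhat_unlabeled, dtype=float)
    n, N = len(Y), len(Yhat_unlabeled)

    # Mean of predictions on unlabeled data, corrected by the average
    # prediction error (the "rectifier") measured on labeled data.
    rectifier = Y - Yhat
    estimate = Yhat_unlabeled.mean() + rectifier.mean()

    # The two terms come from independent samples, so their variances add.
    se = np.sqrt(Yhat_unlabeled.var(ddof=1) / N + rectifier.var(ddof=1) / n)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return estimate, (estimate - z * se, estimate + z * se)


# Synthetic continuous outcomes with biased, noisy predictions.
rng = np.random.default_rng(0)
n, N = 200, 5000
Y = rng.normal(3.0, 2.0, n)
Yhat = Y + rng.normal(0.2, 0.5, n)
Yhat_unlabeled = rng.normal(3.0, 2.0, N) + rng.normal(0.2, 0.5, N)

estimate, (lo, hi) = ppi_mean_ci_sketch(Y, Yhat, Yhat_unlabeled)
print(f"estimate={estimate:.3f}, 95% CI=({lo:.3f}, {hi:.3f})")
```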

`ppi_mean_ci` does not support multidimensional inference problems

The ppi_mean_ci function does not handle cases where the labeled or unlabeled data points are multidimensional.

The observation weights do not broadcast correctly against the Y matrix.
For instance, weights of shape (n_obs,) cannot be multiplied with Y when Y has shape (n_obs, d).
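
The shape mismatch can be reproduced with NumPy alone. The sketch below uses made-up sizes and arrays (`n_obs`, `d`, `weights`, `Y` are illustrative, not ppi_py internals) to show the failing product and one common workaround, which is to add a trailing axis to the weights so they broadcast across the d columns.

```python
import numpy as np

n_obs, d = 5, 3
Y = np.arange(n_obs * d, dtype=float).reshape(n_obs, d)  # shape (n_obs, d)
weights = np.ones(n_obs)                                 # shape (n_obs,)

# NumPy aligns shapes from the last axis, so (n_obs,) is compared
# against d rather than n_obs, and the product fails when they differ.
try:
    weights * Y
    broadcast_failed = False
except ValueError as err:
    broadcast_failed = True
    print("broadcast error:", err)

# Reshaping the weights to (n_obs, 1) lets each weight scale a full row.
weighted = weights[:, None] * Y
print(weighted.shape)
```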
