osofr / condensier

3 stars, 3 watchers, 1 fork, 1.68 MB

Non-parametric conditional density estimation with binned conditional histograms

License: MIT License

Languages: R 98.52%, C++ 1.21%, TeX 0.27%
Topics: density, learner-density, hazard, bin-hazard, cross-validation, likelihood

condensier's People

Contributors

frbl, nhejazi, osofr


Forkers

rfherrerac

condensier's Issues

What is the license?

The DESCRIPTION file says the license is GPL-2; the README says it is MIT.

It probably does not matter much either way, since both licenses are open source, but GPL-2 is copyleft while MIT is permissive.

Implement sampling from a long-format density fit

  • sample_value() will not work when using pool = TRUE.
  • The approach needs to somehow replicate the functionality in

        for (k_i in seq_along(private$PsAsW.models)) {
          sampleA_newcat <- private$PsAsW.models[[k_i]]$sampleA(newdata = newdata, ...)
          if (k_i == 1L) sampleA_mat[, k_i] <- sampleA_newcat
          # carry forward all previously sampled 1's (degenerate once a bin is chosen for the first time)
          if (k_i > 1) {
            # if you succeeded at the previous bin, your 1L is carried through till the end:
            sampleA_mat[(sampleA_mat[, k_i - 1] == 1L), k_i] <- 1L
            # if you haven't succeeded at the previous bin, you get a chance to succeed at this category:
            sampleA_mat[(sampleA_mat[, k_i - 1] == 0L), k_i] <- sampleA_newcat[(sampleA_mat[, k_i - 1] == 0L)]
          }
        }

    but based on a single (pooled) model fit for all bin hazards.
  • This needs to be done directly inside BinOutModel$sampleA. One potential approach is to re-create the loop over nbins, mutate newdata with the current bin indicator inside the loop, and call predict on the same pooled fit at each iteration; see the sketch after this list.
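
A minimal sketch of that idea, in standalone form. The function name sample_bin_pooled, the bin_ID column, and the predict_hazard() helper are hypothetical placeholders, not condensier's actual API; the real implementation would live inside BinOutModel$sampleA.

  # Hypothetical sketch: sample a bin index from a single pooled bin-hazard fit.
  # `pooled_fit`, `predict_hazard()` and the `bin_ID` column are assumptions, not condensier API.
  sample_bin_pooled <- function(pooled_fit, newdata, nbins) {
    n <- nrow(newdata)
    sampled_bin <- rep(NA_integer_, n)   # final bin index for each observation
    at_risk <- rep(TRUE, n)              # rows that have not yet "succeeded" at an earlier bin
    for (k_i in seq_len(nbins)) {
      newdata$bin_ID <- k_i              # mutate newdata with the current bin indicator
      hazard_k <- predict_hazard(pooled_fit, newdata = newdata)  # P(bin = k | bin >= k, W)
      success <- rbinom(n, size = 1L, prob = hazard_k) == 1L
      sampled_bin[at_risk & success] <- k_i   # once a bin is chosen, that choice is carried forward
      at_risk <- at_risk & !success
    }
    sampled_bin[is.na(sampled_bin)] <- nbins  # rows that never succeeded fall into the last bin
    sampled_bin
  }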

fitting densities with weights

For a particular application I've run into, it would be very useful to be able to incorporate a weights argument into fit_density, similar in style to what's currently present in standard methods like glm. This would likely be rather easily accomplished by incorporating weights into the step where the likelihood is fit within the fit_density function (I'm not familiar with the code base for this package; otherwise, I'd offer a solution via a PR and not simply open an issue).

Would it be possible to incorporate this weights argument if it's trivial @osofr? I'd be happy to help if I can.
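
For concreteness, the proposed call might look like the following. The weights argument is purely hypothetical here (it does not exist in fit_density() yet) and is meant to mirror the weights argument of stats::glm(), with the weights passed through to each bin-hazard (likelihood) fit.

  library("condensier")

  # toy data
  set.seed(1)
  dat <- data.frame(W = rbinom(500, size = 1, prob = 0.5))
  dat$Y <- rnorm(500, mean = 10 + 15 * dat$W, sd = 5)
  w <- runif(nrow(dat))  # e.g. frequency or inverse-probability weights

  # Proposed interface -- `weights` is hypothetical, not part of the current fit_density():
  dens_fit <- fit_density(X = "W", Y = "Y", input_data = dat,
                          nbins = 10, weights = w)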

Weird sampling issue

Hi @osofr,

I couldn't sleep so I created a minimal working example for the problem I told you about. I wrote it without my glasses on, so sorry for any typos :-)

The code:

 library('condensier')
 library('data.table')
 library('speedglm')
 library('magrittr')

 nbins <- 10
 w_prob <- 0.75
 nobs <- 13000

 X <- rbinom(n=nobs,size=1, prob=w_prob)
 mean <- 10 + (15 * X)
 Y <- (rnorm(nobs,mean,5))
 dat <- data.table(X = X, Y = Y)


 bin_estimator <- condensier::glmR6$new()

 dens_fit <- condensier::fit_density(X = 'X',
                         Y = 'Y',
                         input_data = dat,
                         nbins = nbins,
                         bin_estimator = bin_estimator)

 res <- estimated_probabilities <- condensier::predict_probability(dens_fit, dat)

 # Expect TRUE twice
 print(mean(res[dat$X == 1 & dat$Y >= 17]) > mean(res[dat$X == 0 & dat$Y < 17]))
 print(mean(res[dat$X == 0 & dat$Y >= 17]) < mean(res[dat$X == 1 & dat$Y < 17]))

 print(mean(res[dat$X == 1 & dat$Y >= 17])) # Expect high
 print(mean(res[dat$X == 1 & dat$Y < 17])) # Expect low
 print(mean(res[dat$X == 0 & dat$Y >= 17])) # Expect low
 print(mean(res[dat$X == 0 & dat$Y < 17])) # Expect high

 res <- condensier::sample_value(dens_fit, dat)
 dat$Y[dat$X==0] %>% mean %>% print # expect ~ 10
 dat$Y[dat$X==1] %>% mean %>% print # expect ~ 25
 res[dat$X==0] %>% mean %>% print # expect ~ 10
 res[dat$X==1] %>% mean %>% print # expect ~ 25

And the output:

TRUE
FALSE

0.05643434
0.01318017
0.04243487
0.003734035

10.06833
25.00594
25.05065
25.00862

As you can see, the outputs are not entirely what I expected them to be. I'm running this code using the condensier version in my branch, but I could try in the morning to run it off master (or an earlier version) to see whether the problem persists.

I hope this clarifies the problem a bit.

Best,
Frank

Poor pooled estimates in simple case

The following code attempts to fit a marginal density using both pooled and unpooled condensier estimates by way of sl3. The true density is standard normal. The unpooled estimates (red) approximate the true density (blue), but the pooled estimates (green) do not.

options(sl3.verbose = FALSE)
library("condensier")
library("sl3")
library("simcausal")
library(ggplot2)


D <- DAG.empty()
D <- D + node("I", distr = "rbern", prob = 1) +
     node("B", distr = "rnorm", mean = 0, sd = 1)

D <- set.DAG(D, n.test = 10)
datO <- sim(D, n = 10000, rndseed = 12345)

# ================================================================================
task <- sl3_Task$new(datO, covariates=c("I"),outcome="B")
lrn_unpooled <- Lrnr_condensier$new(nbins = 25, bin_method = "equal.len", pool = FALSE,
                            bin_estimator = Lrnr_glm_fast$new(family = binomial()))
lrn_unpooled_fit <- lrn_unpooled$train(task)

lrn_pooled <- Lrnr_condensier$new(nbins = 25, bin_method = "equal.len", pool = TRUE,
                                    bin_estimator = Lrnr_glm_fast$new(family = binomial()))
lrn_pooled_fit <- lrn_pooled$train(task)


# ================================================================================


# evaluate fit on training data
datO$pred_fB_unpooled <- lrn_unpooled_fit$predict(task)
datO$pred_fB_pooled <- lrn_pooled_fit$predict(task)
datO$fB <- dnorm(datO$B)
long <- melt(datO, id=c("B"), measure=c("pred_fB_unpooled","pred_fB_pooled","fB"))
ggplot(long, aes(x=B, y=value, color=variable)) + geom_point() + theme_bw()

[figure: predicted density vs. B — unpooled estimates (red) track the true density (blue); pooled estimates (green) do not]

Add simcausal to remotes

Looks like simcausal isn't on CRAN anymore. Can you add its github repo to the list of remotes in DESCRIPTION?
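
For reference, that would be a one-line addition to DESCRIPTION, assuming the canonical repository is osofr/simcausal (adjust if it lives elsewhere); the package stays listed in its current dependency field, and Remotes only tells installation tools where to fetch it from:

  Remotes:
      osofr/simcausal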
