osofr / condensier
Non-parametric conditional density estimation with binned conditional histograms
License: MIT License
Regular multinomial regressions / classifiers will typically predict the probability of each class (for each observation). condensier needs a predict_class or predict.condensier function that loops over all available categorical classes and separately predicts the probability of each class.
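A minimal sketch of what such a loop might look like. Note that predict_class() is not an existing condensier function; the only package call assumed here is predict_probability(fit, newdata), which the examples below use, and the predict function is injectable so the sketch is testable in isolation:

```r
# Illustrative sketch only: predict_class() is hypothetical, not a condensier API.
# For each candidate class k, set the outcome column to k and ask the fitted
# conditional density for P(Y = k | X); then normalize across classes.
predict_class <- function(dens_fit, newdata, outcome, classes,
                          predict_fun = condensier::predict_probability) {
  probs <- vapply(classes, function(k) {
    nd <- newdata
    nd[[outcome]] <- k          # evaluate the density at outcome value k
    predict_fun(dens_fit, nd)   # P(Y = k | X) for every row
  }, numeric(nrow(newdata)))
  colnames(probs) <- as.character(classes)
  probs / rowSums(probs)        # rows sum to one across classes
}
```

The injectable predict_fun also makes it easy to reuse the same loop with a pooled or unpooled fit.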
The DESCRIPTION file says the license is GPL-2; the README says it is MIT. It probably does not matter much either way, since both are open-source licenses, but GPL-2 is more "copyleft" than MIT, so the two files should be made consistent.
sample_value() will not work when using pool = TRUE. See condensier/R/SummariesModelClass.R, lines 205 to 215 in 9e177e5 (BinOutModel$sampleA).
One potential approach is to re-create a loop over nbins; inside the loop, mutate newdata with a new bin indicator, then repeatedly call predict against the same fit.

For a particular application I've run into, it would be very useful to be able to incorporate a weights argument into fit_density, similar in style to what's currently present in standard methods like glm. This could likely be accomplished by incorporating the weights into the step where the likelihood is fit within the fit_density function (I'm not familiar with the code base for this package; otherwise, I'd offer a solution via a PR rather than simply opening an issue).

Would it be possible to incorporate this weights argument if it's trivial, @osofr? I'd be happy to help if I can.
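To illustrate the intended semantics, here is how observation weights enter an ordinary glm fit. A weights argument in fit_density could, in principle, be forwarded to each per-bin binomial regression in the same way; this is a sketch of the idea using base R only, not the package's actual interface:

```r
set.seed(1)
x <- rnorm(200)
y <- rbinom(200, 1, plogis(0.5 * x))
w <- sample(1:3, 200, replace = TRUE)  # frequency weights per observation

# Unweighted vs. weighted logistic fits: the weighted fit maximizes
# sum_i w_i * logLik_i, which is what condensier's per-bin regressions
# would maximize if a weights argument were passed through.
fit_unw <- glm(y ~ x, family = binomial())
fit_w   <- glm(y ~ x, family = binomial(), weights = w)
coef(fit_unw)
coef(fit_w)  # coefficients shift once the weights vary across observations
```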
Hi @osofr,
I couldn't sleep so I created a minimal working example for the problem I told you about. I wrote it without my glasses on, so sorry for any typos :-)
The code:
library('condensier')
library('data.table')
library('speedglm')
library('magrittr')
nbins <- 10
w_prob <- 0.75
nobs <- 13000
X <- rbinom(n = nobs, size = 1, prob = w_prob)
mu <- 10 + 15 * X  # avoid shadowing base::mean
Y <- rnorm(nobs, mu, 5)
dat <- data.table(X = X, Y = Y)
bin_estimator <- condensier::glmR6$new()
dens_fit <- condensier::fit_density(X = 'X',
Y = 'Y',
input_data = dat,
nbins = nbins,
bin_estimator = bin_estimator)
res <- condensier::predict_probability(dens_fit, dat)
# Expect TRUE twice
print(mean(res[dat$X == 1 & dat$Y >= 17]) > mean(res[dat$X == 0 & dat$Y < 17]))
print(mean(res[dat$X == 0 & dat$Y >= 17]) < mean(res[dat$X == 1 & dat$Y < 17]))
print(mean(res[dat$X == 1 & dat$Y >= 17])) # Expect high
print(mean(res[dat$X == 1 & dat$Y < 17])) # Expect low
print(mean(res[dat$X == 0 & dat$Y >= 17])) # Expect low
print(mean(res[dat$X == 0 & dat$Y < 17])) # Expect high
res <- condensier::sample_value(dens_fit, dat)
dat$Y[dat$X == 0] %>% mean %>% print # Expect ~ 10
dat$Y[dat$X == 1] %>% mean %>% print # Expect ~ 25
res[dat$X == 0] %>% mean %>% print   # Expect ~ 10
res[dat$X == 1] %>% mean %>% print   # Expect ~ 25
And the output:
TRUE
FALSE
0.05643434
0.01318017
0.04243487
0.003734035
10.06833
25.00594
25.05065
25.00862
As you can see, the outputs are not entirely what I expected them to be. I'm running this code using the condensier version on my branch, but I could try in the morning to run it on master (or an earlier version) to see whether the problem persists.
I hope this clarifies the problem a bit.
Best,
Frank
The following code attempts to fit a marginal density using both pooled and unpooled condensier estimates, by way of sl3. The true density is standard normal. The unpooled estimates (red) approximate the true density (blue), but the pooled estimates (green) do not.
options(sl3.verbose = FALSE)
library("condensier")
library("sl3")
library("simcausal")
library("ggplot2")
library("data.table")  # needed for melt() below
D <- DAG.empty()
D <- D + node("I", distr = "rbern", prob = 1) +
node("B", distr = "rnorm", mean = 0, sd = 1)
D <- set.DAG(D, n.test = 10)
datO <- sim(D, n = 10000, rndseed = 12345)
# ================================================================================
task <- sl3_Task$new(datO, covariates=c("I"),outcome="B")
lrn_unpooled <- Lrnr_condensier$new(nbins = 25, bin_method = "equal.len", pool = FALSE,
bin_estimator = Lrnr_glm_fast$new(family = binomial()))
lrn_unpooled_fit <- lrn_unpooled$train(task)
lrn_pooled <- Lrnr_condensier$new(nbins = 25, bin_method = "equal.len", pool = TRUE,
bin_estimator = Lrnr_glm_fast$new(family = binomial()))
lrn_pooled_fit <- lrn_pooled$train(task)
# ================================================================================
# evaluate fit on training data
datO$pred_fB_unpooled <- lrn_unpooled_fit$predict(task)
datO$pred_fB_pooled <- lrn_pooled_fit$predict(task)
datO$fB <- dnorm(datO$B)
long <- melt(datO, id=c("B"), measure=c("pred_fB_unpooled","pred_fB_pooled","fB"))
ggplot(long, aes(x=B, y=value, color=variable)) + geom_point() + theme_bw()
Looks like simcausal isn't on CRAN anymore. Can you add its github repo to the list of remotes in DESCRIPTION?
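Assuming the package still lives at github.com/osofr/simcausal (an assumption; adjust if the repo has moved), the DESCRIPTION entry would look something like:

```
Remotes:
    osofr/simcausal
```

With that in place, remotes::install_deps() and devtools can resolve simcausal from GitHub instead of CRAN.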