
jaredhuling / oem


Penalized least squares estimation using the Orthogonalizing EM (OEM) algorithm

Home Page: http://jaredhuling.org/oem

C++ 53.99% R 20.56% C 0.52% CSS 0.44% HTML 24.49%
variable-selection cran r-package machine-learning penalized-regression lasso group-lasso oem oem-algorithm mcp

oem's People

Contributors

jaredhuling, yixuan

oem's Issues

Support AIC and BIC calculation

Would it be a lot of trouble to define methods to calculate AIC and BIC values for a fitted oem model, similar to the way the ncvreg package does this (i.e., returning a vector of AIC or BIC values, one for each lambda value that was used)? This would help in selecting the best lambda based on the AIC or BIC criterion, which is faster than using cross-validation.
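
For illustration, here is a rough base-R sketch of what such a calculation might look like for a Gaussian model. `ic_path` is a hypothetical helper, not part of oem or ncvreg, and it assumes the coefficient path is a matrix with the intercept in the first row. The degrees of freedom are approximated by the number of nonzero coefficients, which is a common choice for the lasso but only an approximation for other penalties.

```r
# Hypothetical helper (not part of oem): AIC/BIC along a coefficient path
# for a Gaussian model. `beta` is a (p + 1) x nlambda matrix whose first
# row is the intercept; df is approximated by the number of nonzero
# coefficients (including the intercept).
ic_path <- function(x, y, beta) {
  n      <- length(y)
  fits   <- cbind(1, x) %*% beta                   # n x nlambda fitted values
  rss    <- colSums((y - fits)^2)                  # residual sum of squares
  df     <- colSums(beta != 0)
  loglik <- -n / 2 * (log(2 * pi * rss / n) + 1)   # Gaussian log-likelihood at the MLE of sigma^2
  data.frame(df = df,
             aic = -2 * loglik + 2 * df,
             bic = -2 * loglik + log(n) * df)
}

set.seed(1)
x <- matrix(rnorm(100 * 3), ncol = 3)
y <- drop(1 + 2 * x[, 1] + rnorm(100))
# Two candidate coefficient vectors standing in for two lambda values
beta <- cbind(c(mean(y), 0, 0, 0), coef(lm(y ~ x)))
ic   <- ic_path(x, y, beta)
best <- which.min(ic$bic)   # column index of the BIC-selected model
```

The same idea would apply to a real fit by passing its coefficient matrix as `beta`, assuming it is laid out the same way.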

Mean Squared Error - OEM

Hi Jared,

We have constructed a model for Adaptive Group Lasso and now want to run an out-of-sample prediction.
We are struggling to make that prediction and to measure the corresponding MSE.

Please find the code below; we hope this is something you can help us with!

# second lasso step

m2 <- oem(x=Z[,-1], y=y, intercept=TRUE, penalty="grp.lasso", groups=group.ind.lasso, group.weights=lasso.weights, ncores=8)
B <- m2$beta[[1]]
n.nonzeros <- colSums(abs(B)>0)
lssr <- log(colSums(((Z %*% B)-YY)^2))
lssr[!is.finite(lssr)] <- 1e10

# information criterion

b_np <- solve(crossprod(Z), crossprod(Z,y))
group.norm.ols <- sapply(1:p2, function(idx) sqrt(sum((b_np[-1][group.ind.lasso==idx])^2)))
group.norm.ols <- matrix(rep(group.norm.ols,times=dim(B)[2]),ncol=dim(B)[2],nrow=p2)
group.norm.lasso <- sapply(1:dim(B)[2],function(i) sapply(1:p2, function(idx) sqrt(sum((B[-1,][group.ind.lasso==idx,i])^2))))
df <- (Kn+1) * colSums((group.norm.lasso / group.norm.ols)) + n.nonzeros / (Kn+2)
bic <- lssr + df * log(n)/n
min.ic <- which.min(bic)  # index of the first minimum, so min.ic is always a single value

if(n.nonzeros[min.ic]>1) {
selected.chars.final <- unique(sapply(strsplit(names(B[abs(B[,min.ic])>0,min.ic])[-1],"[.]"),function(x) x[1]))
} else { selected.chars.final <- NA }
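
For the out-of-sample part, one possible approach is sketched below in base R. `oos_mse` is a hypothetical helper, not an oem function; it assumes a coefficient matrix shaped like `B` above (first row is the intercept, one column per lambda) and a held-out design matrix without an intercept column. The data and the two stand-in coefficient vectors are simulated purely for illustration.

```r
# Hypothetical helper: out-of-sample MSE for every column of a coefficient
# matrix `B` (first row = intercept), e.g. one column per lambda value.
oos_mse <- function(B, x_test, y_test) {
  pred <- cbind(1, x_test) %*% B      # n_test x nlambda predictions
  colMeans((y_test - pred)^2)
}

set.seed(42)
n <- 200; p <- 4
x <- matrix(rnorm(n * p), ncol = p)
y <- drop(0.5 + x %*% c(1, -1, 0, 0) + rnorm(n))
train <- sample(n, 150)

# Stand-ins for a fitted path: an intercept-only model and an OLS fit,
# both estimated on the training rows only
b_null <- c(mean(y[train]), rep(0, p))
b_ols  <- coef(lm(y[train] ~ x[train, ]))
B_demo <- cbind(null = b_null, ols = b_ols)

mse <- oos_mse(B_demo, x[-train, ], y[-train])
```

The key point is that the coefficients come from the training rows only, while the MSE is computed on rows the fit never saw.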

AUPRC as a loss function

Hi all,

Thanks for the excellent package! Is there any way to use the Area Under the Precision-Recall Curve (AUPRC) as the measure to evaluate cross-validation? This would be useful in cases of an imbalanced dataset.

Saito, T., & Rehmsmeier, M. (2015). The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. PloS one, 10(3), e0118432.

Cheers,
Pedro
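
As a point of reference, the measure itself can be computed in a few lines of base R. The sketch below uses the average-precision form of the AUPRC; `auprc` is a hypothetical helper and not an existing oem option, and tied scores are simply taken in the order `order()` returns them.

```r
# Hypothetical helper (not an oem function): area under the precision-recall
# curve via average precision. `labels` are 0/1, `scores` are predicted
# probabilities or any other ranking score.
auprc <- function(labels, scores) {
  stopifnot(length(labels) == length(scores), any(labels == 1))
  ord  <- order(scores, decreasing = TRUE)
  hits <- labels[ord] == 1
  precision <- cumsum(hits) / seq_along(hits)   # precision at each cutoff
  mean(precision[hits])                         # average over the positives
}

ap <- auprc(c(1, 0, 1, 0), c(0.9, 0.8, 0.7, 0.1))   # (1 + 2/3) / 2
```

Applied per fold to held-out predictions, a function like this could serve as a cross-validation measure outside the package.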

fatal error: 'Linalg/AVX.h' file not found

Hi. The package looks awesome and I want to try it. During installation I got the following error:

In file included from oem_dense.cpp:3:
./DataStd.h:10:10: fatal error: 'Linalg/AVX.h' file not found
#include "Linalg/AVX.h"
^
1 error generated.
make: *** [oem_dense.o] Error 1
ERROR: compilation failed for package ‘oem’
removing ‘/Users/dymitriyselivanov/Library/R/3.3/library/oem’
Error: Command failed (1)

separate lambda sequence for each penalty

Currently, the user is forced to use the same sequence of tuning parameters (lambda) for each penalty, yet different penalties (like elastic net vs. lasso vs. group lasso) require their tuning parameters to be on different scales. A fix would be to change all code structures to allow a separate sequence for each penalty. If the user specifies lambda=NULL, a sequence will be computed for each penalty. Users can specify a vector for lambda, which would apply the same sequence to all penalties, or supply a list of vectors, one per penalty.

This issue will take a decent amount of work to resolve.
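
The three input forms described above could be normalized along these lines. This is only a sketch of the proposed interface: `normalize_lambda` is a hypothetical name, and none of this is oem's actual code.

```r
# Hypothetical helper sketching the proposed interface; not oem's actual code.
# Returns a named list with one lambda entry per penalty.
normalize_lambda <- function(lambda, penalties) {
  if (is.null(lambda)) {
    # NULL placeholders: a sequence would be computed later for each penalty
    out <- vector("list", length(penalties))
  } else if (is.list(lambda)) {
    # one user-supplied sequence per penalty
    stopifnot(length(lambda) == length(penalties))
    out <- lapply(lambda, function(l) sort(as.numeric(l), decreasing = TRUE))
  } else {
    # a single vector is reused for all penalties
    out <- rep(list(sort(as.numeric(lambda), decreasing = TRUE)), length(penalties))
  }
  names(out) <- penalties
  out
}

shared   <- normalize_lambda(c(0.1, 1), c("lasso", "grp.lasso"))
separate <- normalize_lambda(list(c(1, 0.5), c(10, 5, 1)), c("lasso", "grp.lasso"))
auto     <- normalize_lambda(NULL, c("lasso", "mcp"))
```

Returning a list in every case means downstream code only has to handle one shape.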

Support observation weights

I noticed that observation weights are not yet supported. Are these not simple to incorporate by noticing that weighted least squares minimizes sum( w * (Y - X %*% beta)^2 ) = sum_i( (sqrt(w[i]) * X[i, ] %*% beta - sqrt(w[i]) * Y[i])^2 ), meaning that one could simulate observation weights simply by multiplying both the design matrix X and the dependent variable Y by sqrt(weights)?
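
The identity is easy to check numerically for unpenalized least squares. The sketch below uses only base R (`lm.wfit` and `lm.fit` from stats) on simulated data. Note that the intercept column is scaled along with everything else here; whether the trick carries over unchanged to a fitter that adds its own intercept or standardizes columns internally is an open assumption.

```r
set.seed(1)
n <- 50
X <- cbind(1, matrix(rnorm(n * 2), ncol = 2))   # explicit intercept column
Y <- drop(X %*% c(1, 2, -1) + rnorm(n))
w <- runif(n, 0.5, 2)

# Weighted least squares fitted directly ...
b_wls <- lm.wfit(X, Y, w)$coefficients
# ... and ordinary least squares on rows scaled by sqrt(w)
b_scaled <- lm.fit(sqrt(w) * X, sqrt(w) * Y)$coefficients

same <- all.equal(unname(b_wls), unname(b_scaled))
```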

Support box constraints?

Would you see any possibility to support box constraints on the fitted coefficients, similar to what can be achieved with lower.limits and upper.limits in glmnet?

Mnet or Snet penalties

Were there any plans to add Mnet or Snet penalties? I might give it a try, if you don't mind.

Error Using OEM with Categorical Variables

Hello,
I am trying to use OEM for grouped LASSO on a tall dataset which contains many categorical variables (and as a result, lots of binary variables in the model matrix). While running cv.oem, I keep getting the following error:

Error in oemfit.binomial(is.sparse, x, y, family, penalty, weights, groups, :
TridiagEigen: failed to compute all the eigenvalues

This does not just extend to my use case, but also to smaller datasets like the birthwt data from the MASS package. I am pasting a reproducible example below:

```
library(MASS)
library(splines)
library(oem)

#Load and create Model Matrix
data("birthwt")
View(birthwt)

birthwt$race = as.factor(birthwt$race)
birthwt$smoke = as.factor(birthwt$smoke)
birthwt$low = as.factor(birthwt$low)

X = model.matrix(low~ns(age,3)+ns(lwt,3)+race+smoke+ptl, birthwt)[,-1]
Y = birthwt$low

#Define Groups
grouping = c(1,1,1,2,2,2,3,3,4,5,5,5)

#Run cv.oem for Logistic Regression with Group LASSO penalty:
cvoem = cv.oem(X, Y, family = "binomial", penalty = "grp.lasso", groups = grouping, nfolds = 10)
```

Any help regarding: 1) an explanation of the issue and 2) a workaround would be much appreciated. cv.gglasso is just too slow!

Thanks.
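
One thing that may be worth confirming in an example like this is that the group vector has exactly one entry per column of the model matrix (the vector above has 12 entries). A way to avoid counting by hand is to derive the groups from the `assign` attribute that `model.matrix` attaches to its result. The sketch below uses a small made-up data frame rather than birthwt, and does not claim to explain the eigenvalue error.

```r
# Toy data standing in for a data frame with factors and numeric columns
df <- data.frame(
  f = factor(c("a", "b", "c", "a", "b", "c")),
  g = factor(c("u", "v", "u", "v", "u", "v")),
  x = c(0.1, 0.4, 0.3, 0.9, 0.5, 0.7)
)
mm <- model.matrix(~ f + g + x, df)
X_demo <- mm[, -1]                    # drop the intercept column
groups <- attr(mm, "assign")[-1]      # term index of each remaining column

# One group label per column, with dummy columns of a factor sharing a label
stopifnot(length(groups) == ncol(X_demo))
```

Multi-column terms such as spline bases are treated the same way, since all columns generated by one term share its index.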

n slightly > p

Dear OEM,

I have been using cv.oem while trying to keep n > p. The problem is that once n gets close to p, the R session either crashes or does not finish, on both Windows and Linux (in both RStudio and the console). The issue gets worse with grouping and cross-validation. Please find my example below.

Thank you.

nobs  <- 150
nvars <- 60

X <- matrix(rnorm(nobs * nvars), ncol = nvars)
group.indicators <- rep(1:(60/10), each = 6)
y <- rbinom(nobs, 1, 
  prob = 1 / (1 + exp(-drop(X %*% c(0.15, 0.15, -0.15, -0.15, 0.25, rep(0, nvars - 5)))))
)

input <- X

train_rows <- sample(1:nobs, 0.66 * nobs)
x.train <- as.matrix(input[train_rows, ])
x.test <- as.matrix(input[- (train_rows), ])

y.train <- y[train_rows]
y.test <- y[-train_rows]

cvfit <- oem::cv.oem(
  x = x.train, y = y.train,
  penalty = "grp.lasso",
  groups = group.indicators,
  type.measure = "auc",
  nlambda = 100,
  grouped = TRUE,
  family = "binomial"
)
> sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.4 LTS

Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.6.0
LAPACK: /usr/lib/lapack/liblapack.so.3.6.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_GB.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_GB.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] compiler_3.4.3 tools_3.4.3    yaml_2.1.19 
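
A small observation on the example above, checked with base R only: the group vector as written has 36 entries, while the design matrix has 60 columns. Whether that mismatch is what triggers the crash is not established here; the 60-element alternative below is just one plausible reading of the intent.

```r
nvars <- 60
group.indicators <- rep(1:(60 / 10), each = 6)
n_groups_given <- length(group.indicators)   # 6 groups x 6 entries

# One possible 60-element alternative: six groups of ten columns each
groups_fixed <- rep(1:(nvars / 10), each = 10)
stopifnot(length(groups_fixed) == nvars)
```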

Unable to install OEM

Hi, I am trying to install the oem package through devtools and install.packages, but there seems to be something wrong with the package. Does anyone know another way to download OEM? :)

devtools::install_github("jaredhuling/oem")

Error: Failed to install 'oem' from GitHub:
  (converted from warning) installation of package ‘/var/folders/yq/wkf_lslx1p92415pjkl238_c0000gn/T//Rtmp9eG72C/file5538fb525f0/oem_2.0.10.tar.gz’ had non-zero exit status

oem for real data analysis

I tried using the oem package for variable selection on a fairly large dataset: about 1 million observations and around 60 variables, both numerical and categorical. I used sparse.model.matrix to convert the categorical variables to numerical ones, which gives a dataset with some 1500 columns. But then X^TX is no longer invertible, so the oem function does not work. Can you recommend what to do when I want to perform variable selection on a dataset with a huge number of rows, some categorical explanatory variables, and a few numerical ones?
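
One way a singular X^TX can arise with categorical data is redundant indicator columns. The base-R sketch below, on simulated factors, shows how a rank check with `qr()` exposes this; it is only an illustration of the diagnosis, and the actual cause in a given dataset (for example empty or duplicated levels) may be different.

```r
set.seed(1)
n  <- 200
f1 <- factor(sample(c("a", "b", "c"), n, replace = TRUE))
f2 <- factor(sample(c("u", "v"), n, replace = TRUE))

# Indicator columns for every level of both factors: each block sums to a
# column of ones, so the columns are linearly dependent
X_full <- cbind(model.matrix(~ f1 - 1), model.matrix(~ f2 - 1))
# Dropping one reference level per factor removes that dependence
X_ref  <- model.matrix(~ f1 + f2)[, -1]

rank_full <- qr(X_full)$rank   # smaller than ncol(X_full)
rank_ref  <- qr(X_ref)$rank    # equal to ncol(X_ref)
```

Comparing the rank with the column count before fitting gives a quick indication of whether columns need to be dropped or merged.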

Installation error on Windows

I'm getting the following error when installing oem on Windows, using devtools::install_github("jaredhuling/oem")

*** arch - i386
C:/Rtools/mingw_32/bin/g++ -std=gnu++11 -I"C:/PROGRA1/R/R-361.2/include" -DNDEBUG -I. -DNDEBUG -I"C:/Users/Bryor/Documents/R/win-library/3.6/Rcpp/include" -I"C:/Users/Bryor/Documents/R/win-library/3.6/RcppEigen/include" -I"C:/Users/Bryor/Documents/R/win-library/3.6/BH/include" -I"C:/Users/Bryor/Documents/R/win-library/3.6/bigmemory/include" -I"C:/Users/Bryor/Documents/R/win-library/3.6/RcppArmadillo/include" -fopenmp -O3 -c lanczos.cpp -o lanczos.o
C:/Rtools/mingw_32/bin/g++ -std=gnu++11 -I"C:/PROGRA1/R/R-361.2/include" -DNDEBUG -I. -DNDEBUG -I"C:/Users/Bryor/Documents/R/win-library/3.6/Rcpp/include" -I"C:/Users/Bryor/Documents/R/win-library/3.6/RcppEigen/include" -I"C:/Users/Bryor/Documents/R/win-library/3.6/BH/include" -I"C:/Users/Bryor/Documents/R/win-library/3.6/bigmemory/include" -I"C:/Users/Bryor/Documents/R/win-library/3.6/RcppArmadillo/include" -fopenmp -O3 -c oem.cpp -o oem.o
C:/Rtools/mingw_32/bin/g++ -std=gnu++11 -I"C:/PROGRA1/R/R-361.2/include" -DNDEBUG -I. -DNDEBUG -I"C:/Users/Bryor/Documents/R/win-library/3.6/Rcpp/include" -I"C:/Users/Bryor/Documents/R/win-library/3.6/RcppEigen/include" -I"C:/Users/Bryor/Documents/R/win-library/3.6/BH/include" -I"C:/Users/Bryor/Documents/R/win-library/3.6/bigmemory/include" -I"C:/Users/Bryor/Documents/R/win-library/3.6/RcppArmadillo/include" -fopenmp -O3 -c oem_big.cpp -o oem_big.o
C:/Rtools/mingw_32/bin/g++ -std=gnu++11 -I"C:/PROGRA1/R/R-361.2/include" -DNDEBUG -I. -DNDEBUG -I"C:/Users/Bryor/Documents/R/win-library/3.6/Rcpp/include" -I"C:/Users/Bryor/Documents/R/win-library/3.6/RcppEigen/include" -I"C:/Users/Bryor/Documents/R/win-library/3.6/BH/include" -I"C:/Users/Bryor/Documents/R/win-library/3.6/bigmemory/include" -I"C:/Users/Bryor/Documents/R/win-library/3.6/RcppArmadillo/include" -fopenmp -O3 -c oem_dense.cpp -o oem_dense.o
C:/Rtools/mingw_32/bin/g++ -std=gnu++11 -I"C:/PROGRA1/R/R-361.2/include" -DNDEBUG -I. -DNDEBUG -I"C:/Users/Bryor/Documents/R/win-library/3.6/Rcpp/include" -I"C:/Users/Bryor/Documents/R/win-library/3.6/RcppEigen/include" -I"C:/Users/Bryor/Documents/R/win-library/3.6/BH/include" -I"C:/Users/Bryor/Documents/R/win-library/3.6/bigmemory/include" -I"C:/Users/Bryor/Documents/R/win-library/3.6/RcppArmadillo/include" -fopenmp -O3 -c oem_fb_big.cpp -o oem_fb_big.o
g++ -I"C:/PROGRA1/R/R-361.2/include" -DNDEBUG -I. -DNDEBUG -I"C:/Users/Bryor/Documents/R/win-library/3.6/Rcpp/include" -I"C:/Users/Bryor/Documents/R/win-library/3.6/RcppEigen/include" -I"C:/Users/Bryor/Documents/R/win-library/3.6/BH/include" -I"C:/Users/Bryor/Documents/R/win-library/3.6/bigmemory/include" -I"C:/Users/Bryor/Documents/R/win-library/3.6/RcppArmadillo/include" -O3 -Wall -std=gnu99 -mtune=generic -c oem_init.c -o oem_init.o
cc1plus.exe: warning: command line option '-std=gnu99' is valid for C/ObjC but not for C++
C:/Rtools/mingw_32/bin/g++ -std=gnu++11 -I"C:/PROGRA1/R/R-361.2/include" -DNDEBUG -I. -DNDEBUG -I"C:/Users/Bryor/Documents/R/win-library/3.6/Rcpp/include" -I"C:/Users/Bryor/Documents/R/win-library/3.6/RcppEigen/include" -I"C:/Users/Bryor/Documents/R/win-library/3.6/BH/include" -I"C:/Users/Bryor/Documents/R/win-library/3.6/bigmemory/include" -I"C:/Users/Bryor/Documents/R/win-library/3.6/RcppArmadillo/include" -fopenmp -O3 -c oem_logistic_dense.cpp -o oem_logistic_dense.o
C:/Rtools/mingw_32/bin/g++ -std=gnu++11 -I"C:/PROGRA1/R/R-361.2/include" -DNDEBUG -I. -DNDEBUG -I"C:/Users/Bryor/Documents/R/win-library/3.6/Rcpp/include" -I"C:/Users/Bryor/Documents/R/win-library/3.6/RcppEigen/include" -I"C:/Users/Bryor/Documents/R/win-library/3.6/BH/include" -I"C:/Users/Bryor/Documents/R/win-library/3.6/bigmemory/include" -I"C:/Users/Bryor/Documents/R/win-library/3.6/RcppArmadillo/include" -fopenmp -O3 -c oem_logistic_sparse.cpp -o oem_logistic_sparse.o
C:/Rtools/mingw_32/bin/g++ -std=gnu++11 -I"C:/PROGRA1/R/R-361.2/include" -DNDEBUG -I. -DNDEBUG -I"C:/Users/Bryor/Documents/R/win-library/3.6/Rcpp/include" -I"C:/Users/Bryor/Documents/R/win-library/3.6/RcppEigen/include" -I"C:/Users/Bryor/Documents/R/win-library/3.6/BH/include" -I"C:/Users/Bryor/Documents/R/win-library/3.6/bigmemory/include" -I"C:/Users/Bryor/Documents/R/win-library/3.6/RcppArmadillo/include" -fopenmp -O3 -c oem_sparse.cpp -o oem_sparse.o
C:/Rtools/mingw_32/bin/g++ -std=gnu++11 -I"C:/PROGRA1/R/R-361.2/include" -DNDEBUG -I. -DNDEBUG -I"C:/Users/Bryor/Documents/R/win-library/3.6/Rcpp/include" -I"C:/Users/Bryor/Documents/R/win-library/3.6/RcppEigen/include" -I"C:/Users/Bryor/Documents/R/win-library/3.6/BH/include" -I"C:/Users/Bryor/Documents/R/win-library/3.6/bigmemory/include" -I"C:/Users/Bryor/Documents/R/win-library/3.6/RcppArmadillo/include" -fopenmp -O3 -c oem_xtx.cpp -o oem_xtx.o
C:/Rtools/mingw_32/bin/g++ -std=gnu++11 -I"C:/PROGRA1/R/R-361.2/include" -DNDEBUG -I. -DNDEBUG -I"C:/Users/Bryor/Documents/R/win-library/3.6/Rcpp/include" -I"C:/Users/Bryor/Documents/R/win-library/3.6/RcppEigen/include" -I"C:/Users/Bryor/Documents/R/win-library/3.6/BH/include" -I"C:/Users/Bryor/Documents/R/win-library/3.6/bigmemory/include" -I"C:/Users/Bryor/Documents/R/win-library/3.6/RcppArmadillo/include" -fopenmp -O3 -c oem_xval_dense.cpp -o oem_xval_dense.o
C:/Rtools/mingw_32/bin/g++ -std=gnu++11 -I"C:/PROGRA1/R/R-361.2/include" -DNDEBUG -I. -DNDEBUG -I"C:/Users/Bryor/Documents/R/win-library/3.6/Rcpp/include" -I"C:/Users/Bryor/Documents/R/win-library/3.6/RcppEigen/include" -I"C:/Users/Bryor/Documents/R/win-library/3.6/BH/include" -I"C:/Users/Bryor/Documents/R/win-library/3.6/bigmemory/include" -I"C:/Users/Bryor/Documents/R/win-library/3.6/RcppArmadillo/include" -fopenmp -O3 -c utils.cpp -o utils.o
C:\Rtools\mingw_32\bin\nm.exe: oem_init.o: File format not recognized
C:/Rtools/mingw_32/bin/g++ -shared -s -static-libgcc -o oem.dll tmp.def lanczos.o oem.o oem_big.o oem_dense.o oem_fb_big.o oem_init.o oem_logistic_dense.o oem_logistic_sparse.o oem_sparse.o oem_xtx.o oem_xval_dense.o utils.o -fopenmp -LC:/PROGRA1/R/R-361.2/bin/i386 -lRlapack -LC:/PROGRA1/R/R-361.2/bin/i386 -lRblas -lgfortran -lm -lquadmath -LC:/PROGRA1/R/R-361.2/bin/i386 -lR
oem_init.o: file not recognized: File format not recognized
collect2.exe: error: ld returned 1 exit status
no DLL was created
ERROR: compilation failed for package 'oem'

* removing 'C:/Users/Bryor/Documents/R/win-library/3.6/oem'
Error: Failed to install 'oem' from GitHub:
  (converted from warning) installation of package ‘C:/Users/Bryor/AppData/Local/Temp/Rtmp4auAQT/file53b832487104/oem_2.0.10.tar.gz’ had non-zero exit status
