rudolfjagdhuber / exhaustivesearch

ExhaustiveSearch: A Fast and Scalable Exhaustive Feature Selection Framework

Home Page: https://github.com/RudolfJagdhuber/ExhaustiveSearch/

License: GNU General Public License v3.0

Languages: R 48.20%, C++ 51.80%

Topics: aic, exhaustive-search, feature-selection, linear-regression, logistic-regression, machine-learning, model-selection, mse, r-package

exhaustivesearch's Issues

All R-CMD-checks currently fail

For some reason, all current checks are failing.

This is not related to changes I made. I reran the tests on a previous stable version, which had passed with 0/0/0, and it now also fails with the same cryptic error.

On my local machine and on rhub, all (extended) checks are successful.

The warnings refer to Rcpp, not to my package, so I expect this to resolve itself over time. There is no problem with the current version.

installation error on solaris (submission review)

In the final checks, after the package was accepted by the reviewer, errors on installation at r-patched-solaris-x86 were found (Everything else worked).
Here is the install log:

* installing to library ‘/home/ripley/R/Lib32’
* installing *source* package ‘ExhaustiveSearch’ ...
** package ‘ExhaustiveSearch’ successfully unpacked and MD5 sums checked
** using staged installation
** libs
/opt/csw/bin/g++ -std=gnu++11 -I"/home/ripley/R/gcc/include" -DNDEBUG  -I'/home/ripley/R/Lib32/Rcpp/include' -I'/home/ripley/R/Lib32/RcppArmadillo/include' -I/opt/csw/include -I/usr/local/include -fopenmp -fPIC  -O2  -c Combination.cpp -o Combination.o
In file included from Combination.cpp:4:0:
Combination.h:16:2: error: ‘size_t’ does not name a type
  size_t m_nCombinations;
  ^
Combination.h:18:3: error: ‘size_t’ does not name a type
   size_t m_nBatches;
   ^
Combination.h:20:15: error: ‘size_t’ was not declared in this scope
   std::vector<size_t> m_batchSizes;
               ^
Combination.h:20:15: note: suggested alternatives:
In file included from /opt/csw/include/c++/5.2.0/bits/stl_algobase.h:59:0,
                 from /opt/csw/include/c++/5.2.0/vector:60,
                 from Combination.h:3,
                 from Combination.cpp:4:
/opt/csw/include/c++/5.2.0/i386-pc-solaris2.10/bits/c++config.h:196:26: note:   ‘std::size_t’
   typedef __SIZE_TYPE__  size_t;
                          ^
/opt/csw/include/c++/5.2.0/i386-pc-solaris2.10/bits/c++config.h:196:26: note:   ‘std::size_t’
In file included from Combination.cpp:4:0:
Combination.h:20:21: error: template argument 1 is invalid
   std::vector<size_t> m_batchSizes;
                     ^

Evidently, size_t is not recognized as a type here.
This should be very easy to fix by including the missing header that declares it.

  • Make fixes
  • Test explicitly for the given environment
  • Resubmit fixed version
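
A minimal sketch of what the fix could look like, assuming the header only needs the standard declaration of size_t. The member names are taken from the install log; the class layout, constructor, and accessors are hypothetical stand-ins and not the package's actual code. In C++, <cstddef> declares std::size_t, and writing the qualified name avoids relying on whether a given platform also exposes the unqualified size_t through <vector>.

```cpp
#include <cassert>
#include <cstddef>  // declares std::size_t on every conforming platform
#include <vector>

// Hypothetical stand-in for the class declared in Combination.h.
// Only the three member names come from the install log.
class Combination {
public:
    explicit Combination(std::size_t nBatches)
        : m_nCombinations(0), m_nBatches(nBatches), m_batchSizes(nBatches, 0) {
        assert(nBatches > 0);
    }

    std::size_t nCombinations() const { return m_nCombinations; }
    std::size_t nBatches() const { return m_nBatches; }
    const std::vector<std::size_t>& batchSizes() const { return m_batchSizes; }

private:
    std::size_t m_nCombinations;            // qualified name, no reliance on ::size_t
    std::size_t m_nBatches;
    std::vector<std::size_t> m_batchSizes;
};
```

Qualifying every use as std::size_t is the more portable of the two options; including <stddef.h> and keeping the unqualified name would presumably also satisfy this compiler.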

Improve batch splitting algorithm

Currently, combinations are not consistently split into nThreads partitions.

Improve the simple "split-by-first-digit" algorithm to something more consistent.

Main question: how can we consistently find the combinations that split the whole task into B batches of approximately equal size?

E.g., the task is all combinations of 1 to 10, split into two partitions:

  • Range (1) to (1, 2, 3, 4, 5) contains 511 combinations
  • Range (1, 2, 3, 4, 6) to (1, 2, 3, 4, 5, 6, 7, 8, 9, 10) contains 512 combinations
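
One possible way to get consistent boundaries is to give every combination a rank in a fixed enumeration order, cut the rank range into B contiguous pieces, and convert each boundary rank back into a combination. The sketch below assumes an order of my own choosing (by subset size, then lexicographic within a size), which is not necessarily the order the package iterates in, so its boundaries will not match the example above. The function names binom, batchStart, and unrank are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Binomial coefficient C(n, k); exact as long as intermediates fit in 64 bits.
std::uint64_t binom(int n, int k) {
    if (k < 0 || k > n) return 0;
    std::uint64_t result = 1;
    for (int i = 1; i <= k; ++i) {
        // After this step, result == C(n - k + i, i), so the division is exact.
        result = result * static_cast<std::uint64_t>(n - k + i)
                        / static_cast<std::uint64_t>(i);
    }
    return result;
}

// Rank of the first combination in batch b when `total` ranks are cut into
// nBatches contiguous ranges whose sizes differ by at most one.
// Passing b == nBatches yields `total`, i.e. the end of the last batch.
std::uint64_t batchStart(std::uint64_t total, std::uint64_t nBatches,
                         std::uint64_t b) {
    assert(nBatches > 0 && b <= nBatches);
    const std::uint64_t base = total / nBatches;
    const std::uint64_t extra = total % nBatches;  // first `extra` batches get +1
    return b * base + (b < extra ? b : extra);
}

// Combination of {1..n} at position `rank` (0-based) in the assumed order:
// all subsets of size 1, then size 2, ..., each size in lexicographic order.
std::vector<int> unrank(int n, std::uint64_t rank) {
    assert(n > 0 && n < 64);
    assert(rank < (std::uint64_t(1) << n) - 1);

    // Find the subset size k that contains this rank.
    int k = 1;
    while (rank >= binom(n, k)) {
        rank -= binom(n, k);
        ++k;
    }

    // Lexicographic unranking within size k.
    std::vector<int> combo;
    int x = 1;
    for (int i = 0; i < k; ++i) {
        // Number of completions if position i holds the value x.
        std::uint64_t count = binom(n - x, k - i - 1);
        while (rank >= count) {
            rank -= count;
            ++x;
            count = binom(n - x, k - i - 1);
        }
        combo.push_back(x);
        ++x;
    }
    return combo;
}
```

For the task above there are 2^10 - 1 = 1023 non-empty combinations, so batch b would cover the ranks from batchStart(1023, B, b) up to, but not including, batchStart(1023, B, b + 1), and unrank(10, ...) of the start rank gives the first combination each thread has to evaluate. Because the boundaries depend only on n and B, they are the same wherever the task is executed.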

Review for submission on CRAN

After the first submission, a review with additional tasks came back. These are:

  • If there are references, add these to the DESCRIPTION.

  • Please add \value{} to the .Rd files regarding exported methods and explain the functions results in the documentation. Please write about the structure of the output (class) and also what the output means. (If a function does not return a value, please document that too, e.g. \value{No return value, called for side effects} or similar)

  • \dontrun should only be used if the example really cannot be executed (e.g. because of missing additional software, missing API keys, ...) by the user. That's why wrapping examples in \dontrun{} adds the comment ("# Not run:") as a warning for the user. Does not seem necessary. Please unwrap the examples if they are executable in < 5 sec, or replace \dontrun{} with \donttest{}.

  • Fix, add comments and resubmit.

Estimate of remaining runtime.

Instead of (or in addition to) the elapsed runtime of the exhaustive search, a useful piece of information for the status logs during execution would be the expected remaining runtime.

Should be easy to implement.
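
A minimal sketch of such an estimate, assuming that every model evaluation costs roughly the same, so the remaining time can be extrapolated linearly from the progress so far. The function name and the convention of returning a negative value when no estimate is available are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Linear extrapolation of the remaining runtime in seconds.
// Returns -1.0 while nothing has been evaluated yet (no basis for an estimate).
double estimateRemainingSeconds(double elapsedSeconds, std::size_t done,
                                std::size_t total) {
    assert(done <= total);
    if (done == 0) return -1.0;
    const double secondsPerItem = elapsedSeconds / static_cast<double>(done);
    return secondsPerItem * static_cast<double>(total - done);
}
```

If models with more features take noticeably longer to fit, this estimate would be too optimistic early on, so the status log should probably present it as approximate.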

Include checks (e.g. R-CMD-check, Travis)

Many repositories include badges that show the results of automated checks on the current GitHub version.

I think this is a nice feature to include here as well.

Make a README for GitHub

The GitHub page needs a README.md with:

  • motivation for package
  • installation
  • examples
  • performance
  • ...

Splitting Exhaustive Tasks

An exhaustive evaluation could be split into multiple smaller subtasks. These need to be consistent wherever you execute them.

This should be easy to implement, as the task is already split into consistent batches.

-> Split into more batches and select which one to execute via the function call.

This would also need logic similar to the one in C++ to combine the multiple result objects correctly.
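
A rough sketch of what combining partial results could look like, assuming each subtask returns its own list of best models and that a lower score is better (as with AIC or MSE). The ScoredModel type and the mergeTopN function are hypothetical and do not reflect the package's actual result objects.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result entry: a performance score and the features of the model.
struct ScoredModel {
    double score;
    std::vector<int> features;
};

// Merge the top lists of several subtasks and keep the n best entries overall.
// Assumes lower scores are better.
std::vector<ScoredModel> mergeTopN(
        const std::vector<std::vector<ScoredModel>>& partials, std::size_t n) {
    std::vector<ScoredModel> all;
    for (const auto& partial : partials) {
        all.insert(all.end(), partial.begin(), partial.end());
    }
    // stable_sort keeps the subtask order for ties, so the merged result does
    // not depend on anything but the order in which the partials are passed.
    std::stable_sort(all.begin(), all.end(),
                     [](const ScoredModel& a, const ScoredModel& b) {
                         return a.score < b.score;
                     });
    if (all.size() > n) {
        all.erase(all.begin() + static_cast<std::ptrdiff_t>(n), all.end());
    }
    return all;
}
```

The merged list is only guaranteed to match a single full run if every subtask keeps at least n results itself, since any one subtask could contain all of the n best models.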

Allow user interrupts from R

The execution in C++ is multi-threaded, so simple calls to Rcpp::checkUserInterrupt() from the worker threads won't work.

Furthermore, for some reason I cannot get RcppThread working with futures: the code does not compile in a setting where everything else is the same and only std::thread is replaced by RcppThread::Thread. So I also cannot use this framework out of the box.

Currently, R crashes when an interrupt is requested. This needs to be solved somehow.
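
One pattern that might work here is to let only the main thread ask whether an interrupt is pending and to pass that information to the workers through an atomic flag, so no worker ever touches the R API. The sketch below is plain C++ without any R dependency: the interruptRequested callback is a hypothetical placeholder for whatever interrupt check the package would perform on the main thread, and runCancellable is not an existing function.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Runs nThreads workers with itemsPerThread work items each and returns how
// many items were completed before the workers finished or were cancelled.
std::size_t runCancellable(std::size_t nThreads, std::size_t itemsPerThread,
                           const std::function<bool()>& interruptRequested) {
    assert(nThreads > 0);
    std::atomic<bool> stop(false);
    std::atomic<std::size_t> done(0);
    std::atomic<std::size_t> running(nThreads);

    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < nThreads; ++t) {
        workers.emplace_back([&stop, &done, &running, itemsPerThread] {
            for (std::size_t i = 0; i < itemsPerThread && !stop.load(); ++i) {
                // Placeholder for evaluating one model.
                done.fetch_add(1);
            }
            running.fetch_sub(1);
        });
    }

    // Only the calling (main) thread polls for interrupts.
    while (running.load() > 0) {
        if (interruptRequested()) stop.store(true);
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }

    // Workers are always joined, so no thread outlives this function.
    for (auto& worker : workers) worker.join();
    return done.load();
}
```

Because the workers are joined before returning, the caller could then raise the interrupt in R from the main thread once everything has shut down cleanly, instead of unwinding while threads are still running.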

R-cmd-check: Note on compiled code

Note: information on .o files for x64 is not available
File './ExhaustiveSearch.Rcheck/ExhaustiveSearch/libs/x64/ExhaustiveSearch.dll':
Found 'abort', possibly from 'abort' (C), 'runtime' (Fortran)
Found 'exit', possibly from 'exit' (C), 'stop' (Fortran)
Found 'printf', possibly from 'printf' (C)

Compiled code should not call entry points which might terminate R nor
write to stdout/stderr instead of to the console, nor use Fortran I/O
nor system RNGs. The detected symbols are linked into the code but
might come from libraries and not actually be called.

See 'Writing portable packages' in the 'Writing R Extensions' manual.

Release ExhaustiveSearch 1.0.0

Prepare for release:

  • Check that description is informative
  • Check licensing of included files
  • usethis::use_cran_comments()
  • devtools::check(remote = TRUE, manual = TRUE)
  • devtools::check_win_devel()
  • rhub::check_for_cran()
  • urlchecker::url_check()
  • Update cran-comments.md

Submit to CRAN:

  • usethis::use_version('major')
  • devtools::submit_cran()
  • Approve email

Wait for CRAN...

  • Accepted 🎉
  • usethis::use_news_md()
  • usethis::use_github_release()
  • usethis::use_dev_version()
  • Update install instructions in README

Add some analyses and plots of the result

Currently, the only functionality on the results of the exhaustive search is to have a nice print function.

It would also be nice to have a custom plot(), or other very basic analyses of the obtained results.
