vinecopulib / vinecopulib
A C++ library for vine copula models (w/ interfaces to R + Python)
License: MIT License
Can we delete it now?
Should we ask Travis to enable OSX builds for vinecopulib?
https://github.com/tvatter/vinecopulib/blob/clean-up/src/bicop/indep.cpp#L53
/home/vagrant/devel/vinecopulib/src/bicop/indep.cpp: In member function ‘virtual Eigen::MatrixXd vinecopulib::IndepBicop::tau_to_parameters(const double&)’:
/home/vagrant/devel/vinecopulib/src/bicop/indep.cpp:53:31: error: call of overloaded ‘Matrix(int)’ is ambiguous
Eigen::VectorXd pars(0);
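A likely explanation, stated as an assumption because the Eigen version is not given in the log: the literal 0 is both an int and a null pointer constant, so a class that has one constructor taking a signed size and another taking a const double* cannot choose between them. The stand-in type below (MockVector, a hypothetical name, not Eigen) reproduces the situation and shows one possible way around it, namely passing a typed size instead of the bare literal.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in, NOT Eigen: two single-argument constructors,
// one taking a size and one taking a pointer to existing data.
struct MockVector {
    std::ptrdiff_t size;

    explicit MockVector(std::ptrdiff_t n) : size(n) { assert(n >= 0); }
    // -1 marks "constructed from a null data pointer" in this mock.
    explicit MockVector(const double* data) : size(data ? 1 : -1) {}
};

// MockVector ambiguous(0);
// ^ rejected where std::ptrdiff_t is wider than int (e.g. x86_64 Linux):
//   int -> ptrdiff_t and 0 -> null pointer are equally good conversions.

// Passing a variable of the size type makes the size constructor an exact match.
inline MockVector make_empty()
{
    std::ptrdiff_t n = 0;
    return MockVector(n);
}
```

Whether the same change is the right fix for indep.cpp depends on the Eigen version in use, so treat this as a starting point only.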
@tvatter
So I finally found a "bug" I've been searching for for two days in the structure selection code. The bug is that
I tried to read the R-vine matrix in a way that makes sense, and not the way VineCopula wants me to do it ;) That reminded me of a few other things I didn't like about the way the matrix is oriented. Let me explain with a small example:
4 0 0 0
3 3 0 0
2 2 2 0
1 1 1 1
I would read the first column as 4, 3; 2 1, but in VineCopula it's supposed to be read as 3, 4; 2 1.
One would expect the (i, j) entry of the matrix to be related to edge j in tree i. Instead it relates to edge j in tree 5 - i. That's also one of the reasons why indexing this matrix in algorithms is so painful (and I reversed the matrices internally). Also it doesn't look like a grapevine ;)
So a solution to both points would be to store the matrix as
1 1 1 1
2 2 2 0
3 3 0 0
4 0 0 0
and read it from the bottom up, e.g. 4, 3 ; 2 1 etc.
So my question is: Should we use this storage order for the matrix?
The downside would be that we risk confusing people. There will be three different ways to write this matrix out there (the third is the "upper triangular style" used in Harry Joe's book and library, which causes the same indexing problems).
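To make the proposed layout concrete, here is a minimal sketch using plain std::vector instead of Eigen. The names VineMatrix, flip_rows and edge_label are hypothetical, and the reading rule (conditioned pair from the anti-diagonal entry and the entry in the row of the tree, conditioning set from the rows above, listed bottom-up) is my interpretation of the example above, not library code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

using VineMatrix = std::vector<std::vector<int>>;

// Reverse the row order; converts between the VineCopula layout and the
// proposed "grapevine" layout (and back).
inline VineMatrix flip_rows(const VineMatrix& m)
{
    return VineMatrix(m.rbegin(), m.rend());
}

// Label of edge `edge` in tree `tree` (both 0-based) for the proposed layout:
// row `tree` belongs to tree `tree + 1`, so no "5 - i" style index reversal.
inline std::string edge_label(const VineMatrix& m, std::size_t tree, std::size_t edge)
{
    const std::size_t d = m.size();
    assert(tree + edge + 1 < d);  // tree t (0-based) has d - 1 - t edges

    // Conditioned pair: anti-diagonal entry of the column, then row `tree`.
    std::string label = std::to_string(m[d - 1 - edge][edge]) + ", "
                      + std::to_string(m[tree][edge]);

    // Conditioning set: the rows above `tree`, read from the bottom up.
    if (tree > 0) label += " ;";
    for (std::size_t k = tree; k-- > 0;)
        label += " " + std::to_string(m[k][edge]);
    return label;
}
```

With the 4 x 4 example, flipping the VineCopula matrix gives the proposed one, and edge_label(proposed, 2, 0) yields "4, 3 ; 2 1".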
The following install logic for CMake will place it in the correct place for OSX/Linux:
install(
FILES
vinecopulib-config.cmake
DESTINATION
share/vinecopulib
)
Python bindings do not require Boost 1.63 anymore (since we use NumPy C bindings instead of the Boost.NumPy).
Should we relax the requirement in cmake/findDependencies.cmake?
(I remember that there was something requiring a recent version in the C++ vinecopulib code, but I don't remember what it was or which version it was.)
Finally, if 1.63 is not required, perhaps we can skip the Ubuntu update in Travis?
In file included from /home/vagrant/devel/vinecopulib/src/../include/bicop_class.hpp:38:0,
from /home/vagrant/devel/vinecopulib/src/../include/bicop_parametric.hpp:23,
from /home/vagrant/devel/vinecopulib/src/../include/bicop_archimedean.hpp:23,
from /home/vagrant/devel/vinecopulib/src/bicop_archimedean.cpp:20:
/home/vagrant/devel/vinecopulib/src/../include/c_tools.h:8:0: warning: "false" redefined
#define false 0
^
In file included from /usr/include/c++/4.9/bits/atomic_base.h:36:0,
from /usr/include/c++/4.9/atomic:41,
from /usr/include/boost/math/special_functions/detail/bernoulli_details.hpp:19,
from /usr/include/boost/math/special_functions/bernoulli.hpp:16,
from /usr/include/boost/math/special_functions/gamma.hpp:35,
from /home/vagrant/devel/vinecopulib/src/../include/boost_tools.hpp:23,
from /home/vagrant/devel/vinecopulib/src/../include/bicop_class.hpp:23,
from /home/vagrant/devel/vinecopulib/src/../include/bicop_parametric.hpp:23,
from /home/vagrant/devel/vinecopulib/src/../include/bicop_archimedean.hpp:23,
from /home/vagrant/devel/vinecopulib/src/bicop_archimedean.cpp:20:
/usr/lib/gcc/x86_64-linux-gnu/4.9/include/stdbool.h:42:0: note: this is the location of the previous definition
#define false false
^
In file included from /home/vagrant/devel/vinecopulib/src/../include/bicop_class.hpp:38:0,
from /home/vagrant/devel/vinecopulib/src/../include/bicop_parametric.hpp:23,
from /home/vagrant/devel/vinecopulib/src/../include/bicop_archimedean.hpp:23,
from /home/vagrant/devel/vinecopulib/src/bicop_archimedean.cpp:20:
/home/vagrant/devel/vinecopulib/src/../include/c_tools.h:9:0: warning: "true" redefined
#define true (!false)
^
In file included from /usr/include/c++/4.9/bits/atomic_base.h:36:0,
from /usr/include/c++/4.9/atomic:41,
from /usr/include/boost/math/special_functions/detail/bernoulli_details.hpp:19,
from /usr/include/boost/math/special_functions/bernoulli.hpp:16,
from /usr/include/boost/math/special_functions/gamma.hpp:35,
from /home/vagrant/devel/vinecopulib/src/../include/boost_tools.hpp:23,
from /home/vagrant/devel/vinecopulib/src/../include/bicop_class.hpp:23,
from /home/vagrant/devel/vinecopulib/src/../include/bicop_parametric.hpp:23,
from /home/vagrant/devel/vinecopulib/src/../include/bicop_archimedean.hpp:23,
from /home/vagrant/devel/vinecopulib/src/bicop_archimedean.cpp:20:
/usr/lib/gcc/x86_64-linux-gnu/4.9/include/stdbool.h:43:0: note: this is the location of the previous definition
#define true true
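The warnings above come from c_tools.h defining false and true as macros after stdbool.h (pulled in through the C++ standard headers) has already defined them. One possible fix, sketched under the assumption that c_tools.h only needs these macros when it is compiled as C, is to guard the definitions:

```cpp
// Sketch of a guarded replacement for the two #define lines in c_tools.h.
#ifndef __cplusplus            // C++ already has true/false as keywords
#  ifndef false
#    define false 0
#  endif
#  ifndef true
#    define true (!false)
#  endif
#endif

#include <type_traits>

// In C++ the keywords are left alone, so `true` keeps its bool type.
static_assert(std::is_same<decltype(true), bool>::value,
              "true must stay a bool keyword in C++");
```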
GitHub has its "release" mechanism. Currently, it shows no releases for vinecopulib:
This can be achieved by tagging a commit. Before doing it, I suggest committing the git_revision.hpp file, so it will be present in the zip/tarball downloadable by the users. After releasing, the file can be removed from the repo (it is autogenerated by CMake for users obtaining the library via git).
Here is a hello world Boost.Graph code demonstrating:
Code:
#include <iostream> // std::cout is used in main()
#include <boost/graph/adjacency_list.hpp>
#include <boost/graph/graph_utility.hpp>
#include <Eigen/Dense>
typedef boost::adjacency_list<
boost::setS, // store neighbors in a std::set
boost::vecS, // store vertices in a std::vector
boost::undirectedS
> GraphT;
GraphT make_complete_graph(int n)
{
auto graph = GraphT(n);
auto vertices = boost::vertices(graph);
auto &first = vertices.first;
auto &end = vertices.second;
for (auto &va = first; va != end; ++va)
for (auto vb = va+1; vb != end; ++vb)
add_edge(*va, *vb, graph);
return graph;
}
Eigen::MatrixXd adjacency_matrix(const GraphT &graph)
{
auto n = boost::num_vertices(graph);
auto m = Eigen::MatrixXd(n,n);
auto mm = m.triangularView<Eigen::Upper>();
auto edges = boost::edges(graph);
auto &first = edges.first;
auto &end = edges.second;
m.setZero();
for (auto edge = first; edge != end; ++edge)
{
int i = boost::source(*edge, graph),
j = boost::target(*edge, graph);
m(i,j) = 1;
}
return mm;
}
int main()
{
auto g = make_complete_graph(5);
std::cout << "complete graph" << std::endl;
boost::print_graph(g);
std::cout << adjacency_matrix(g) << std::endl;
boost::remove_edge(4,0, g);
std::cout << "after removal of one edge" << std::endl;
boost::print_graph(g);
std::cout << adjacency_matrix(g) << std::endl;
}
Output:
complete graph
0 <--> 1 2 3 4
1 <--> 0 2 3 4
2 <--> 0 1 3 4
3 <--> 0 1 2 4
4 <--> 0 1 2 3
0 1 1 1 1
0 0 1 1 1
0 0 0 1 1
0 0 0 0 1
0 0 0 0 0
after removal of one edge
0 <--> 1 2 3
1 <--> 0 2 3 4
2 <--> 0 1 3 4
3 <--> 0 1 2 4
4 <--> 1 2 3
0 1 1 1 0
0 0 1 1 1
0 0 0 1 1
0 0 0 0 1
0 0 0 0 0
Below we post an example of how to call Boost.Graph's implementation of Prim's MST algorithm (which minimises the sum of edge weights). Boost also implements the Kruskal algorithm but its complexity is higher when starting with a complete graph.
#include <iostream>
#include <vector> // std::vector holds the predecessor map
#include <boost/graph/adjacency_list.hpp>
#include <boost/graph/prim_minimum_spanning_tree.hpp>
#include <boost/graph/graph_utility.hpp>
using VertexT = boost::no_property;
using EdgeT = boost::property < boost::edge_weight_t, double >; // tau
typedef boost::adjacency_list <
boost::vecS,
boost::vecS,
boost::undirectedS,
VertexT,
EdgeT
> GraphT;
GraphT MST(const GraphT &in)
{
std::vector<int> p(num_vertices(in));
prim_minimum_spanning_tree(in, p.data());
GraphT out;
for (int source = 0; source < p.size(); ++source)
{
auto target = p[source];
if (source != target)
boost::add_edge(source, p[source], out);
// TODO: return weights?
}
return out;
}
int main()
{
GraphT g(5);
boost::add_edge(0,2, 1, g);
boost::add_edge(1,3, 1, g);
boost::add_edge(1,4, 2, g);
boost::add_edge(2,1, 7, g);
boost::add_edge(2,3, 3, g);
boost::add_edge(3,4, 1, g);
boost::add_edge(4,0, 1, g);
boost::print_graph(g);
std::cout << std::endl;
boost::print_graph(MST(g));
}
Output:
0 <--> 2 4
1 <--> 3 4 2
2 <--> 0 1 3
3 <--> 1 2 4
4 <--> 1 3 0
0 <--> 2 4
1 <--> 3
2 <--> 0
3 <--> 1 4
4 <--> 3 0
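For readers without Boost at hand, the sketch below shows what the predecessor vector p encodes, using an array-based Prim on a dense weight matrix (O(n^2), which is a natural fit for a complete graph). This is not Boost's implementation; prim_dense, tree_weight and kNoEdge are hypothetical names, and missing edges are represented by infinity.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

using Weights = std::vector<std::vector<double>>;
const double kNoEdge = std::numeric_limits<double>::infinity();

// Returns parent[v] for every vertex; parent[v] == v marks the root
// (and any vertex that cannot be reached from it).
inline std::vector<std::size_t> prim_dense(const Weights& w, std::size_t root = 0)
{
    const std::size_t n = w.size();
    std::vector<std::size_t> parent(n);
    for (std::size_t v = 0; v < n; ++v) parent[v] = v;
    if (n == 0) return parent;
    assert(root < n);

    std::vector<double> best(n, kNoEdge);  // cheapest known edge into the tree
    std::vector<bool> in_tree(n, false);
    best[root] = 0.0;

    for (std::size_t step = 0; step < n; ++step) {
        // Pick the cheapest vertex that is not yet in the tree.
        std::size_t u = n;
        for (std::size_t v = 0; v < n; ++v)
            if (!in_tree[v] && (u == n || best[v] < best[u])) u = v;
        if (best[u] == kNoEdge) break;  // remaining vertices are unreachable
        in_tree[u] = true;

        // Relax the edges leaving u.
        for (std::size_t v = 0; v < n; ++v)
            if (!in_tree[v] && w[u][v] < best[v]) {
                best[v] = w[u][v];
                parent[v] = u;
            }
    }
    return parent;
}

// Sum of the weights of the tree edges (v, parent[v]).
inline double tree_weight(const Weights& w, const std::vector<std::size_t>& parent)
{
    double total = 0.0;
    for (std::size_t v = 0; v < parent.size(); ++v)
        if (parent[v] != v) total += w[v][parent[v]];
    return total;
}
```

Filling the matrix with the seven weighted edges from the Boost example should give the same four tree edges as the printed output, with a total weight of 4.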
Hi,
So I was thinking about GSL vs Boost for the pdf/cdf/quantile. To mix GSL and Eigen, we can do:
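#include <functional> // std::ptr_fun (deprecated in C++11, removed in C++17)
#include <Eigen/Dense>
#include <gsl/gsl_cdf.h>
#include <gsl/gsl_randist.h>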
template<typename T> T dnorm_gsl(const T& x)
{
return x.unaryExpr(std::ptr_fun(gsl_ran_ugaussian_pdf));
};
template<typename T> T pnorm_gsl(const T& x)
{
return x.unaryExpr(std::ptr_fun(gsl_cdf_ugaussian_P));
};
template<typename T> T qnorm_gsl(const T& x)
{
return x.unaryExpr(std::ptr_fun(gsl_cdf_ugaussian_Pinv));
};
Equivalently, for boost, we can do:
#include <boost/bind.hpp>
#include <boost/math/distributions.hpp>
#include <boost/function.hpp>
template<typename T> T dnorm(const T& x)
{
boost::math::normal std_normal;
return x.unaryExpr(boost::bind<double>(boost::math::pdf<boost::math::normal,double>, std_normal, _1));
};
template<typename T> T pnorm(const T& x)
{
boost::math::normal std_normal;
return x.unaryExpr(boost::bind<double>(boost::math::cdf<boost::math::normal,double>, std_normal, _1));
};
template<typename T> T qnorm(const T& x)
{
boost::math::normal std_normal;
return x.unaryExpr(boost::bind<double>(boost::math::quantile<boost::math::normal,double>, std_normal, _1));
};
Note that the templates are used to cover both matrices and vectors.
At first, I thought that the drawback of the Boost implementation was the instantiation of the distribution. However, when benchmarking with a sample size of 1e5, I noticed that GSL is simply faster (~4 times for the pdf and ~12 times for the cdf/quantile).
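As a dependency-free baseline for this kind of comparison (an addition of mine, not something benchmarked in the thread), the standard normal pdf and cdf can also be written with <cmath> alone and applied elementwise. The names dnorm_std, pnorm_std and map_elementwise are hypothetical, and std::vector stands in for the Eigen types.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Standard normal density.
inline double dnorm_std(double x)
{
    const double two_pi = 2.0 * 3.14159265358979323846;
    return std::exp(-0.5 * x * x) / std::sqrt(two_pi);
}

// Standard normal cdf via the complementary error function (C++11).
inline double pnorm_std(double x)
{
    return 0.5 * std::erfc(-x / std::sqrt(2.0));
}

// Apply a scalar function to every element, similar in spirit to unaryExpr.
template <typename F>
std::vector<double> map_elementwise(const std::vector<double>& x, F f)
{
    std::vector<double> out(x.size());
    std::transform(x.begin(), x.end(), out.begin(), f);
    return out;
}
```

A quantile function is deliberately left out, since the standard library offers no inverse error function.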
All install() commands are in an if-not-windows block as of now (vinecopulib/cmake/buildTargets.cmake, line 39 in 724bd60).
Execution of the installation on Windows can be done with:
cmake --build . --config Release --target install
bicop_class.cpp, bicop_class.hpp
The Bicop class is our main class for bivariate copula models. It is the base class from which classes for copula families inherit.
An open question is how we implement the estimation and selection methods:
Bicop object, or
Bicop class.
Each copula family is implemented as a class inheriting from the base class Bicop. For each virtual method of Bicop, a family-specific method has to be implemented.
[Edit with items from the discussion, sorted by the (arguable) order in which tasks should be dealt with]
As discussed, it is now time for a clean-up:
Done:
#includes from .hpp into .cpp files whenever possible.
Bicop::select() arguments and their order to Vinecop::select().
preselect_family and get_c1c2
Well, we've removed the git_revision.hpp but there is no alternative. Shall we include the version number somewhere in vinecopulib.hpp?
There are set_parameters() and set_rotation(), but no set_family(). Is that intentional?
I wrote my unit tests in #18 and #19 by hardcoding input values and VineCopula output, mainly because I was too lazy to deal with the R interface (we will remove it at some point anyway).
It is a bit ugly, but there's little reason not to do it. The tests catch 99% of the errors the R interface would and can be used even with R interface removed. A cleaner version of this would be to write an R script that writes inputs and results into a file which is loaded from within the tests.
What do you think?
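The "cleaner version" mentioned above could look roughly like the sketch below: reference values written by an R script are read from a whitespace-separated text file and compared with a tolerance. The file format, the helper names read_values and all_close, and the default tolerance are assumptions for illustration only.

```cpp
#include <cmath>
#include <cstddef>
#include <istream>
#include <sstream>
#include <vector>

// Read whitespace-separated numbers until the stream ends or fails.
// Works with std::ifstream for files and std::istringstream for in-memory data.
inline std::vector<double> read_values(std::istream& in)
{
    std::vector<double> out;
    double v = 0.0;
    while (in >> v) out.push_back(v);
    return out;
}

// True if both vectors have the same length and agree elementwise within tol.
inline bool all_close(const std::vector<double>& a,
                      const std::vector<double>& b,
                      double tol = 1e-8)
{
    if (a.size() != b.size()) return false;
    for (std::size_t i = 0; i < a.size(); ++i)
        if (std::fabs(a[i] - b[i]) > tol) return false;
    return true;
}
```

In a test, the stream would be a std::ifstream opened on the file produced by the R script, and the second vector would hold the values computed by the C++ code.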
Here's a hello-world code demonstrating C++11/Boost.Graph:
HTH,
S.
boost_graph.hpp
#pragma once
#include <boost/graph/adjacency_list.hpp>
// to allow for (auto e : boost::edges(g)) notation
namespace std
{
template <class T>
T begin(const std::pair<T,T>& eItPair) { return eItPair.first; }
template <class T>
T end(const std::pair<T,T>& eItPair) { return eItPair.second; }
}
example.cpp
#include <iostream>
#include "ctpl_stl.h" // https://github.com/vit-vit/CTPL
#include "boost_graph.hpp" // Boost graph includes + tweaks
// types
struct VertexT {};
struct EdgeT
{
double data = 1.0; // initialised so rocketscience() never reads an indeterminate value
std::shared_future<double> tau;
};
typedef boost::adjacency_list<
boost::setS, // store neighbors in a std::set
boost::vecS, // store vertices in a std::vector
boost::bidirectionalS,
VertexT,
EdgeT> GraphT;
typedef unsigned V;
typedef std::pair<V, V> E;
// dummy number crunching function
double rocketscience(double data)
{
return 44 * data;
}
// example: edge traversal + thread-pool submit task / get output
int main()
{
// data
E e[] = { E(1,2), E(2,3), E(1,3) };
GraphT g(std::begin(e), std::end(e), 5);
// initialise a thread pool
ctpl::thread_pool pool(2);
// do some rocket science for each edge on a thread pool in the background
for (auto e : boost::edges(g))
g[e].tau = pool.push([e, g](int) { return rocketscience(g[e].data); });
// print the results of computation (waiting if any of the results did not finish)
for (auto e : boost::edges(g))
std::cout << boost::source(e, g) << ' ' << boost::target(e, g) << " -> " << g[e].tau.get() << std::endl;
}
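One caveat on boost_graph.hpp above: adding function overloads to namespace std is formally undefined behaviour. A small range wrapper gives the same for-loop syntax without touching std. The names IteratorRange and make_range are hypothetical, and the sketch uses std::multimap::equal_range (which also returns a pair of iterators) so that it runs without Boost.

```cpp
#include <map>
#include <utility>

// Thin wrapper that turns a pair of iterators into something range-for accepts.
template <class It>
struct IteratorRange {
    It first;
    It last;
    It begin() const { return first; }
    It end() const { return last; }
};

template <class It>
IteratorRange<It> make_range(const std::pair<It, It>& p)
{
    return IteratorRange<It>{p.first, p.second};
}

// Usage example: sum all values stored under one key of a multimap.
inline int sum_values_for_key(const std::multimap<int, int>& m, int key)
{
    int sum = 0;
    for (const auto& kv : make_range(m.equal_range(key)))
        sum += kv.second;
    return sum;
}
```

With Boost.Graph the loop would then read for (auto e : make_range(boost::edges(g))), since boost::edges returns a std::pair of edge iterators.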
What about using:
#include <boost/math/constants/constants.hpp>
and boost::math::constants::pi<double>()
instead of
#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif
in elliptical.cpp?
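If pulling in another Boost header is not wanted, a third option would be a project-local constexpr constant. The sketch below assumes a hypothetical constants namespace and illustrates its use with the standard relation rho = sin(pi * tau / 2) between Kendall's tau and the correlation parameter of an elliptical copula; it is not taken from elliptical.cpp.

```cpp
#include <cmath>

namespace constants {
// Hypothetical project-local constant; same digits as the M_PI fallback.
constexpr double pi = 3.14159265358979323846;
}

// Kendall's tau -> correlation parameter of an elliptical copula.
inline double tau_to_rho(double tau)
{
    return std::sin(0.5 * constants::pi * tau);
}
```

Unlike the macro, the constant is typed and scoped, so it cannot clash with an M_PI that some platform headers define.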
C:\projects\vinecopulib\src\bicop_parametric.cpp(68): warning C4244: 'initializing': conversion from 'double' to 'int', possible loss of data
C:\projects\vinecopulib\src\bicop_parametric.cpp(98): warning C4244: 'argument': conversion from 'double' to 'int', possible loss of data
C:\projects\vinecopulib\src\bicop_parametric.cpp(160): warning C4244: 'argument': conversion from 'double' to 'int', possible loss of data
results in build failure:
/home/vagrant/devel/vinecopulib/test/src_test/include/r_instance.hpp:24:21: fatal error: RInside.h: No such file or directory
#include <RInside.h>
^
compilation terminated.
I think that it should be vcl_matrix instead of vc_matrix at lines 97 and 98 of test_vinecop_class.cpp. However, if we do that, the test fails.
both 32-bit and 64-bit builds:
64-bit build:
As of now, some code is within the vinecopulib namespace, and some within other namespaces. Was that intentional?
/home/vagrant/devel/vinecopulib/src/vinecop/class.cpp: In member function ‘void vinecopulib::Vinecop::update_vinecop(std::vector<boost::adjacency_list<boost::vecS, boost::vecS, boost::undirectedS, tools_structselect::VertexProperties, boost::property<boost::edge_weight_t, double, tools_structselect::EdgeProperties> > >&)’:
/home/vagrant/devel/vinecopulib/src/vinecop/class.cpp:273:23: error: no match for ‘operator=’ (operand types are ‘Eigen::ArrayWrapper<Eigen::Matrix<long unsigned int, -1, -1> >’ and ‘int’)
mat.array() = 0;
We've been mentioning the idea of including serialization in order to be able to save/load objects from this library.
I've been exploring the possibilities and narrowed things to:
a) Boost serialization
b) Cereal
Pros of a): we're already using Boost.
Cons of a): it's complex and arguably more importantly it doesn't seem to be header only (and hence not included in the BH R package).
Pros of b): simple, header only and there is a Rcereal R package.
Cons of b): adding a new dependency.
Any thoughts?
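Independently of which library is chosen, it may help to see how little state a bivariate copula needs to store. The sketch below uses neither Boost.Serialization nor cereal; it is a plain-text round trip with standard streams, and BicopState, save and load are hypothetical names rather than part of vinecopulib.

```cpp
#include <cstddef>
#include <iomanip>
#include <istream>
#include <limits>
#include <ostream>
#include <sstream>  // std::stringstream for in-memory round trips
#include <vector>

// Hypothetical minimal state of a bivariate copula.
struct BicopState {
    int family = 0;
    int rotation = 0;
    std::vector<double> parameters;
};

// Write "family rotation n p1 ... pn" on one line.
inline void save(std::ostream& out, const BicopState& s)
{
    out << s.family << ' ' << s.rotation << ' ' << s.parameters.size();
    // max_digits10 guarantees that doubles survive the text round trip.
    out << std::setprecision(std::numeric_limits<double>::max_digits10);
    for (double p : s.parameters) out << ' ' << p;
    out << '\n';
}

// Read the format written by save(); returns false on malformed input.
inline bool load(std::istream& in, BicopState& s)
{
    std::size_t n = 0;
    if (!(in >> s.family >> s.rotation >> n)) return false;
    s.parameters.assign(n, 0.0);
    for (double& p : s.parameters)
        if (!(in >> p)) return false;
    return true;
}
```

A real solution would also need versioning and a way to nest pair copulas inside a vine, which is where a dedicated library starts to pay off.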
Currently, the binaries are placed outside of the build directory, which is, in my opinion, counterintuitive (i.e., not the default behaviour of CMake in most (all) projects I've tried). It also makes it impossible to keep several builds side by side without overwriting the binaries, e.g.:
If a user wants to put the binaries in "vinecopulib/bin", this would still be possible by setting CMAKE_INSTALL_PREFIX and running "make install".
What about placing the binaries within "build"?
Currently, the speed of the library is comparable to VineCopula at best (e.g., for the pdf evaluation of a vine or the selection of a bivariate copula). I open this issue in order to list the potential areas of improvement.
cut_and_rotate
binaryExpr to compute stuff for Archimedean families instead of calling unaryExpr for the generator and its derivatives separately.
Any ideas?
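To illustrate the single-pass idea, here is a sketch that evaluates the Clayton density for pairs (u, v) in one binary transform, instead of computing intermediate vectors in separate passes. It uses std::transform on std::vector as a stand-in for Eigen's binaryExpr, and the function names are hypothetical. The density used is the usual closed form for theta > 0.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Clayton copula density for theta > 0:
// c(u, v) = (1 + theta) (u v)^(-(1 + theta)) (u^-theta + v^-theta - 1)^(-(2 + 1/theta))
inline double clayton_pdf(double u, double v, double theta)
{
    const double s = std::pow(u, -theta) + std::pow(v, -theta) - 1.0;
    return (1.0 + theta) * std::pow(u * v, -(1.0 + theta))
         * std::pow(s, -(2.0 + 1.0 / theta));
}

// One pass over both inputs, no temporaries for intermediate terms.
inline std::vector<double> clayton_pdf_all(const std::vector<double>& u,
                                           const std::vector<double>& v,
                                           double theta)
{
    assert(u.size() == v.size());
    std::vector<double> out(u.size());
    std::transform(u.begin(), u.end(), v.begin(), out.begin(),
                   [theta](double a, double b) { return clayton_pdf(a, b, theta); });
    return out;
}
```

Whether this is actually faster than the separate unaryExpr calls would have to be measured; the sketch only shows the shape of the change.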
When running cmake, you may have noticed the following warning:
Policy CMP0048 is not set: project() command manages VERSION variables. Run "cmake --help-policy CMP0048" for policy details. Use the cmake_policy command to set the policy and suppress this warning.
This is a known issue of gtest and I don't really know what we can do about it.
https://ci.appveyor.com/project/tvatter/vinecopulib/build/1.0.319/job/6ioq0nov78veoygi
CMake Warning (dev) at build-release/googletest-src/CMakeLists.txt:3 (project):
Policy CMP0048 is not set: project() command manages VERSION variables.
Run "cmake --help-policy CMP0048" for policy details. Use the cmake_policy
command to set the policy and suppress this warning.
The following variable(s) would be set to empty:
PROJECT_VERSION
PROJECT_VERSION_MAJOR
PROJECT_VERSION_MINOR
PROJECT_VERSION_PATCH
PROJECT_VERSION_TWEAK
This warning is for project developers. Use -Wno-dev to suppress it.
CMake Warning (dev) at build-release/googletest-src/googlemock/CMakeLists.txt:40 (project):
Policy CMP0048 is not set: project() command manages VERSION variables.
Run "cmake --help-policy CMP0048" for policy details. Use the cmake_policy
command to set the policy and suppress this warning.
The following variable(s) would be set to empty:
PROJECT_VERSION
PROJECT_VERSION_MAJOR
PROJECT_VERSION_MINOR
PROJECT_VERSION_PATCH
PROJECT_VERSION_TWEAK
This warning is for project developers. Use -Wno-dev to suppress it.
CMake Warning (dev) at build-release/googletest-src/googletest/CMakeLists.txt:47 (project):
Policy CMP0048 is not set: project() command manages VERSION variables.
Run "cmake --help-policy CMP0048" for policy details. Use the cmake_policy
command to set the policy and suppress this warning.
The following variable(s) would be set to empty:
PROJECT_VERSION
PROJECT_VERSION_MAJOR
PROJECT_VERSION_MINOR
PROJECT_VERSION_PATCH
PROJECT_VERSION_TWEAK
This warning is for project developers. Use -Wno-dev to suppress it.
Things that are missing:
For now, it's convenient to use the same numbering as the VineCopula package:
In the foreseeable future, we may want to change this, because we will include several families that are not part of VineCopula. I think it gets very confusing if we just take the smallest free integer whenever we implement a new family. For example, we could use the number as an indication of the number of parameters, like:
family_ < 100: nonparametric
100 <= family_ < 200: one parameter
200 <= family_ < 300: two parameters
Comments, suggestions?
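The proposed ranges translate directly into a small lookup. The sketch below is only an illustration of the scheme; ParameterClass and parameter_class are hypothetical names, and the lower bound of 0 is an assumption, since the proposal does not state one.

```cpp
#include <cassert>

enum class ParameterClass { nonparametric, one_parameter, two_parameters };

// Map a family code to its parameter class according to the proposed ranges.
inline ParameterClass parameter_class(int family)
{
    assert(family >= 0 && family < 300);  // lower bound is an assumption
    if (family < 100) return ParameterClass::nonparametric;
    if (family < 200) return ParameterClass::one_parameter;
    return ParameterClass::two_parameters;
}
```

A side benefit of such a scheme is that the number of parameters can be checked from the family code alone, before any family-specific class is constructed.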
This requires e.g. making the nlopt include available at the time of building the bindings, which should not be needed (?).