Discussions from the vinecopulib issue tracker

std::endl instead of "\n" in error messages? (to make it work on Windows)

enable OSX on Travis

Should we ask Travis to enable OSX builds for vinecopulib?

build failure on clean-up branch with gcc 6.3

https://github.com/tvatter/vinecopulib/blob/clean-up/src/bicop/indep.cpp#L53

/home/vagrant/devel/vinecopulib/src/bicop/indep.cpp: In member function ‘virtual Eigen::MatrixXd vinecopulib::IndepBicop::tau_to_parameters(const double&)’:
/home/vagrant/devel/vinecopulib/src/bicop/indep.cpp:53:31: error: call of overloaded ‘Matrix(int)’ is ambiguous
         Eigen::VectorXd pars(0);
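The ambiguity comes from the literal 0. It converts equally well to the size type and to a null pointer, and the Eigen::Matrix constructors in the Eigen version used here presumably include both a size constructor and a data-pointer constructor. Default construction (`Eigen::VectorXd pars;`) already gives an empty dynamic-size vector and avoids the problem. The sketch below reproduces the situation without Eigen, using a hypothetical stand-in class `ToyVector` that keeps only those two constructors.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for Eigen::VectorXd, reduced to a size constructor
// and a data-pointer constructor.
struct ToyVector
{
    std::ptrdiff_t size = 0;
    ToyVector() = default;
    explicit ToyVector(std::ptrdiff_t n) : size(n) {}
    explicit ToyVector(const double* /*data*/) : size(1) {}
};

// ToyVector v(0);  // error: ambiguous, the literal 0 converts both to
//                  // std::ptrdiff_t and to a null const double*

// Unambiguous alternatives that both produce an empty vector:
inline ToyVector empty_default()
{
    return ToyVector();      // default constructor, size 0
}

inline ToyVector empty_sized()
{
    std::ptrdiff_t n = 0;    // a variable, not the literal 0, so no null-pointer conversion
    return ToyVector(n);     // exact match for the size constructor
}
```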

@tvatter
So I finally found a "bug" I had been hunting for two days in the structure selection code: I tried to read the R-vine matrix in a way that makes sense, and not the way VineCopula wants me to do it ;) That reminded me of other things I didn't like about the way the matrix is oriented. Let me explain with a small example:

Intuitively I read the edge in the last tree as 4, 3; 2 1, but in VineCopula it's supposed to be read as 3, 4; 2 1.
One would expect that the (i, j) entry of the matrix is related to edge j in tree i. Instead it relates to edge j in tree 5 - i. That's also one of the reasons why indexing this matrix in algorithms is so painful (and I reversed the matrices internally). Also it doesn't look like a grapevine ;)

So a solution to both points would be to store the matrix as

and read it from the bottom up, e.g. 4, 3 ; 2 1 etc.

So my question is: Should we use this storage order for the matrix?

The downside would be that we risk confusing people: there would be three different ways to write this matrix out there (the third is the "upper triangular style" used in Harry Joe's book and library, which causes the same indexing problems).
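Assuming the proposed storage amounts to the VineCopula matrix with its rows in reverse order (so that the row index grows with the tree level instead of shrinking, as described above), converting between the two conventions is a row flip. The sketch uses a nested std::vector as a stand-in for an Eigen matrix; with Eigen, `m.colwise().reverse()` should do the same. Since flipping twice restores the original, one function covers both directions.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Square matrix stored row by row (stand-in for an Eigen matrix).
using IntMatrix = std::vector<std::vector<int>>;

// Reverse the order of the rows (an "upside-down" flip). Applying it twice
// gives back the original matrix.
IntMatrix flip_rows(IntMatrix m)
{
    std::reverse(m.begin(), m.end());
    return m;
}
```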

create vinecopulib-config.cmake to enable CMake users to do find_package(vinecopulib)

The following CMake install logic will place it in the correct location on OSX/Linux:

install(
  FILES
    vinecopulib-config.cmake
  DESTINATION
    share/vinecopulib
)

Boost version requirement?

Python bindings do not require Boost 1.63 anymore (since we use NumPy C bindings instead of the Boost.NumPy).

Should we relax the requirement in cmake/findDependencies.cmake?
(I remember that something in the C++ vinecopulib code required a recent version, but I don't remember what it was or which version.)

Finally, if 1.63 is not required, perhaps we can skip the Ubuntu update in Travis?

ASAN-reported error in test_bicop_select on travis

https://travis-ci.org/tvatter/vinecopulib/builds/196939374#L7384

silence preprocessor redefinition warnings

In file included from /home/vagrant/devel/vinecopulib/src/../include/bicop_class.hpp:38:0,
                 from /home/vagrant/devel/vinecopulib/src/../include/bicop_parametric.hpp:23,
                 from /home/vagrant/devel/vinecopulib/src/../include/bicop_archimedean.hpp:23,
                 from /home/vagrant/devel/vinecopulib/src/bicop_archimedean.cpp:20:
/home/vagrant/devel/vinecopulib/src/../include/c_tools.h:8:0: warning: "false" redefined
 #define false 0
 ^
In file included from /usr/include/c++/4.9/bits/atomic_base.h:36:0,
                 from /usr/include/c++/4.9/atomic:41,
                 from /usr/include/boost/math/special_functions/detail/bernoulli_details.hpp:19,
                 from /usr/include/boost/math/special_functions/bernoulli.hpp:16,
                 from /usr/include/boost/math/special_functions/gamma.hpp:35,
                 from /home/vagrant/devel/vinecopulib/src/../include/boost_tools.hpp:23,
                 from /home/vagrant/devel/vinecopulib/src/../include/bicop_class.hpp:23,
                 from /home/vagrant/devel/vinecopulib/src/../include/bicop_parametric.hpp:23,
                 from /home/vagrant/devel/vinecopulib/src/../include/bicop_archimedean.hpp:23,
                 from /home/vagrant/devel/vinecopulib/src/bicop_archimedean.cpp:20:
/usr/lib/gcc/x86_64-linux-gnu/4.9/include/stdbool.h:42:0: note: this is the location of the previous definition
 #define false false
 ^
In file included from /home/vagrant/devel/vinecopulib/src/../include/bicop_class.hpp:38:0,
                 from /home/vagrant/devel/vinecopulib/src/../include/bicop_parametric.hpp:23,
                 from /home/vagrant/devel/vinecopulib/src/../include/bicop_archimedean.hpp:23,
                 from /home/vagrant/devel/vinecopulib/src/bicop_archimedean.cpp:20:
/home/vagrant/devel/vinecopulib/src/../include/c_tools.h:9:0: warning: "true" redefined
 #define true (!false)
 ^
In file included from /usr/include/c++/4.9/bits/atomic_base.h:36:0,
                 from /usr/include/c++/4.9/atomic:41,
                 from /usr/include/boost/math/special_functions/detail/bernoulli_details.hpp:19,
                 from /usr/include/boost/math/special_functions/bernoulli.hpp:16,
                 from /usr/include/boost/math/special_functions/gamma.hpp:35,
                 from /home/vagrant/devel/vinecopulib/src/../include/boost_tools.hpp:23,
                 from /home/vagrant/devel/vinecopulib/src/../include/bicop_class.hpp:23,
                 from /home/vagrant/devel/vinecopulib/src/../include/bicop_parametric.hpp:23,
                 from /home/vagrant/devel/vinecopulib/src/../include/bicop_archimedean.hpp:23,
                 from /home/vagrant/devel/vinecopulib/src/bicop_archimedean.cpp:20:
/usr/lib/gcc/x86_64-linux-gnu/4.9/include/stdbool.h:43:0: note: this is the location of the previous definition
 #define true true
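One way to silence these warnings is to let c_tools.h define the macros only when it is compiled as C and <stdbool.h> has not already provided them; in C++, true and false are keywords and should not be redefined at all. In pure C99 code, simply including <stdbool.h> would be the cleanest option. A sketch of the guard (`__bool_true_false_are_defined` is the macro <stdbool.h> defines):

```cpp
// Sketch for c_tools.h: provide C boolean macros only for C translation
// units that have not included <stdbool.h>. C++ translation units (such
// as bicop_archimedean.cpp) skip this block and keep the keywords.
#if !defined(__cplusplus) && !defined(__bool_true_false_are_defined)
#define false 0
#define true (!false)
#endif
```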

github release tag for 0.0

GitHub has its own "release" mechanism. Currently, it shows no releases for vinecopulib.

A release can be created by tagging a commit. Before doing so, I suggest committing the git_revision.hpp file so that it is present in the zip/tarball downloadable by users. After the release, the file can be removed from the repo again (it is autogenerated by CMake for users obtaining the library via git).

introduce Appveyor CI for Windows

Boost.Graph hello worlds

Here is a hello-world Boost.Graph program demonstrating:

generating a complete graph
computing an adjacency matrix (as an Eigen matrix)
edge removal

Code:

#include <algorithm> // std::min, std::max
#include <iostream>
#include <iterator>  // std::next

#include <boost/graph/adjacency_list.hpp>
#include <boost/graph/graph_utility.hpp>

#include <Eigen/Dense>

typedef boost::adjacency_list<
  boost::setS, // store neighbors in a std::set
  boost::vecS, // store vertices in a std::vector
  boost::undirectedS
  > GraphT;

// complete graph on n vertices: one edge for every pair va < vb
GraphT make_complete_graph(int n)
{
    GraphT graph(n);

    auto vertices = boost::vertices(graph);
    auto first = vertices.first;
    auto end = vertices.second;

    for (auto va = first; va != end; ++va)
        for (auto vb = std::next(va); vb != end; ++vb)
            boost::add_edge(*va, *vb, graph);

    return graph;
}


// 0/1 adjacency matrix; each undirected edge is stored once, above the diagonal
Eigen::MatrixXd adjacency_matrix(const GraphT &graph)
{
    auto n = boost::num_vertices(graph);
    Eigen::MatrixXd m = Eigen::MatrixXd::Zero(n, n);

    auto edges = boost::edges(graph);
    for (auto edge = edges.first; edge != edges.second; ++edge)
    {
        auto i = boost::source(*edge, graph),
             j = boost::target(*edge, graph);
        m(std::min(i, j), std::max(i, j)) = 1;
    }

    return m;
}


int main()
{
    auto g = make_complete_graph(5);

    std::cout << "complete graph" << std::endl;
    boost::print_graph(g);
    std::cout << adjacency_matrix(g) << std::endl;

    boost::remove_edge(4,0, g);

    std::cout << "after removal of one edge" << std::endl;
    boost::print_graph(g);
    std::cout << adjacency_matrix(g) << std::endl;
}

Output:

complete graph
0 <--> 1 2 3 4
1 <--> 0 2 3 4
2 <--> 0 1 3 4
3 <--> 0 1 2 4
4 <--> 0 1 2 3
0 1 1 1 1
0 0 1 1 1
0 0 0 1 1
0 0 0 0 1
0 0 0 0 0
after removal of one edge
0 <--> 1 2 3
1 <--> 0 2 3 4
2 <--> 0 1 3 4
3 <--> 0 1 2 4
4 <--> 1 2 3
0 1 1 1 0
0 0 1 1 1
0 0 0 1 1
0 0 0 0 1
0 0 0 0 0

why not use enums for tree_criterion, parametric_method, selection_criterion?
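A minimal sketch of what this could look like, assuming the options are currently passed as strings. The enum and value names below are hypothetical (only AIC/BIC and MLE/itau are shown) and do not reflect the full set of criteria. A typo in an enum value is a compile-time error, whereas a misspelled string is only caught at run time; a single conversion point is still needed for string input coming from R or Python.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical enums replacing string-valued options.
enum class SelectionCriterion { aic, bic };
enum class ParametricMethod { mle, itau };

// Single place where user-facing strings are mapped to enum values.
inline SelectionCriterion to_selection_criterion(const std::string& s)
{
    if (s == "aic") return SelectionCriterion::aic;
    if (s == "bic") return SelectionCriterion::bic;
    throw std::invalid_argument("unknown selection criterion: " + s);
}

// Reverse mapping, e.g. for printing a fitted model's settings.
inline std::string criterion_name(SelectionCriterion c)
{
    switch (c) {
        case SelectionCriterion::aic: return "aic";
        case SelectionCriterion::bic: return "bic";
    }
    throw std::logic_error("unhandled SelectionCriterion");
}
```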

Boost.Graph MST hello world

Below is an example of how to call Boost.Graph's implementation of Prim's MST algorithm (which minimises the sum of edge weights). Boost also implements Kruskal's algorithm, but it sorts all edges first, which makes it more expensive when starting from a complete graph.

#include <iostream>
#include <vector>
#include <boost/graph/adjacency_list.hpp>
#include <boost/graph/prim_minimum_spanning_tree.hpp>
#include <boost/graph/graph_utility.hpp>

using VertexT = boost::no_property;
using EdgeT = boost::property < boost::edge_weight_t, double >; // tau

typedef boost::adjacency_list <
    boost::vecS,
    boost::vecS,
    boost::undirectedS,
    VertexT,
    EdgeT
    > GraphT;

GraphT MST(const GraphT &in)
{
    // predecessor map: p[v] is the parent of v in the tree, p[root] == root
    std::vector<GraphT::vertex_descriptor> p(boost::num_vertices(in));

    boost::prim_minimum_spanning_tree(in, p.data());

    GraphT out(boost::num_vertices(in));
    for (GraphT::vertex_descriptor source = 0; source < p.size(); ++source)
    {
        auto target = p[source];
        if (source != target)
            boost::add_edge(source, target, out);
        // TODO: return weights?
    }

    return out;
}

int main()
{
    GraphT g(5);
    boost::add_edge(0,2, 1, g);
    boost::add_edge(1,3, 1, g);
    boost::add_edge(1,4, 2, g);
    boost::add_edge(2,1, 7, g);
    boost::add_edge(2,3, 3, g);
    boost::add_edge(3,4, 1, g);
    boost::add_edge(4,0, 1, g);

    boost::print_graph(g);
    std::cout << std::endl;
    boost::print_graph(MST(g));
}

Output:

0 <--> 2 4
1 <--> 3 4 2
2 <--> 0 1 3
3 <--> 1 2 4
4 <--> 1 3 0

0 <--> 2 4
1 <--> 3
2 <--> 0
3 <--> 1 4
4 <--> 3 0
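For comparison, on a complete graph with d vertices Prim's algorithm can also be written with a plain array instead of a priority queue, which takes O(d^2) time, i.e. linear in the number of edges. The sketch below is a standard-library-only illustration, not what Boost does internally. `dense_prim` is a hypothetical name, missing edges are encoded as +infinity, and the root points to itself, mirroring the `p` vector filled by `prim_minimum_spanning_tree` above. On the example graph it yields the predecessors {0, 3, 0, 4, 0}, the same tree as the Boost output. For vine structure selection, where the sum of |tau| should be maximised, one would pass transformed weights such as 1 - |tau| (an assumption of this sketch).

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Array-based Prim's algorithm on a d x d symmetric weight matrix.
// Returns pred, where pred[v] is the parent of v and pred[0] == 0 (root).
std::vector<std::size_t> dense_prim(const std::vector<std::vector<double>>& w)
{
    const std::size_t d = w.size();
    const double inf = std::numeric_limits<double>::infinity();
    std::vector<std::size_t> pred(d, 0);
    std::vector<double> dist(d, inf);   // cheapest known edge into the tree
    std::vector<bool> in_tree(d, false);
    if (d == 0) return pred;
    dist[0] = 0.0;

    for (std::size_t step = 0; step < d; ++step) {
        // pick the cheapest vertex not yet in the tree
        std::size_t u = d;
        for (std::size_t v = 0; v < d; ++v)
            if (!in_tree[v] && (u == d || dist[v] < dist[u]))
                u = v;
        in_tree[u] = true;

        // update the cheapest connection of every remaining vertex
        for (std::size_t v = 0; v < d; ++v)
            if (!in_tree[v] && w[u][v] < dist[v]) {
                dist[v] = w[u][v];
                pred[v] = u;
            }
    }
    return pred;
}
```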

Boost vs GSL

Hi,

So I was thinking about GSL vs Boost for the pdf/cdf/quantile. To mix GSL and Eigen, we can do:

#include <gsl/gsl_cdf.h>
#include <gsl/gsl_randist.h>

// lambdas are passed to unaryExpr: std::ptr_fun is deprecated in C++11 and removed in C++17
template<typename T> T dnorm_gsl(const T& x)
{
    return x.unaryExpr([](double v) { return gsl_ran_ugaussian_pdf(v); });
}
template<typename T> T pnorm_gsl(const T& x)
{
    return x.unaryExpr([](double v) { return gsl_cdf_ugaussian_P(v); });
}
template<typename T> T qnorm_gsl(const T& x)
{
    return x.unaryExpr([](double v) { return gsl_cdf_ugaussian_Pinv(v); });
}

Equivalently, for Boost, we can do:

#include <boost/math/distributions/normal.hpp>

// the lambdas capture the distribution object instead of using boost::bind
template<typename T> T dnorm(const T& x)
{
    boost::math::normal std_normal;
    return x.unaryExpr([&std_normal](double v) { return boost::math::pdf(std_normal, v); });
}

template<typename T> T pnorm(const T& x)
{
    boost::math::normal std_normal;
    return x.unaryExpr([&std_normal](double v) { return boost::math::cdf(std_normal, v); });
}

template<typename T> T qnorm(const T& x)
{
    boost::math::normal std_normal;
    return x.unaryExpr([&std_normal](double v) { return boost::math::quantile(std_normal, v); });
}

Note that the templates let the same functions accept both matrices and vectors.

At first, I thought that the drawback of the Boost implementation was the instantiation of the distribution. However, when benchmarking with a sample size of 1e5, I noticed that GSL is simply faster (~4 times for the pdf and ~12 times for the cdf/quantile).
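For the pdf and cdf there is also a third option that needs neither GSL nor Boost: <cmath> alone, using the identity Phi(x) = erfc(-x / sqrt(2)) / 2. The standard library has no normal quantile function, so qnorm would still need GSL, Boost or a dedicated approximation. The function names below are hypothetical; they could be passed to unaryExpr like the GSL functions above, and whether they are faster than GSL would have to be measured.

```cpp
#include <cassert>
#include <cmath>

// Standard normal density, 1/sqrt(2 pi) written out to avoid M_PI.
inline double dnorm_std(double x)
{
    const double inv_sqrt_2pi = 0.3989422804014327;
    return inv_sqrt_2pi * std::exp(-0.5 * x * x);
}

// Standard normal cdf; erfc avoids the cancellation that
// (1 + erf(x / sqrt(2))) / 2 suffers in the lower tail.
inline double pnorm_std(double x)
{
    return 0.5 * std::erfc(-x / std::sqrt(2.0));
}
```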

enable install target for Windows

All install() commands are currently in an if (NOT WIN32) block (vinecopulib/cmake/buildTargets.cmake, line 39 in 724bd60).

On Windows, the installation can then be run with:

cmake --build . --config Release --target install

build example programs in Appveyor

The `Bicop` class

bicop_class.cpp, bicop_class.hpp

The Bicop class is our main class for bivariate copula models. It is the base class from which classes for copula families inherit.

An open question is how we implement the estimation and selection methods:

as free functions returning a pointer to a Bicop object, or
as methods of the Bicop class.

Copula families

Each copula family is implemented as a class inheriting from the base class Bicop. For each virtual method of Bicop, a family-specific method has to be implemented.
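A much simplified sketch of this design, combined with the first option above (a free function returning a pointer to a Bicop object). All names and signatures are hypothetical and far smaller than the real interface; the family codes 0 and 3 follow the VineCopula numbering (independence, Clayton). The Clayton formulas used are C(u, v) = (u^-theta + v^-theta - 1)^(-1/theta) and its density.

```cpp
#include <cassert>
#include <cmath>
#include <memory>
#include <stdexcept>

// Base class: every family overrides the virtual methods.
class Bicop
{
public:
    virtual ~Bicop() = default;
    virtual double pdf(double u, double v) const = 0;
    virtual double cdf(double u, double v) const = 0;
};

class IndepBicop : public Bicop
{
public:
    double pdf(double, double) const override { return 1.0; }
    double cdf(double u, double v) const override { return u * v; }
};

class ClaytonBicop : public Bicop
{
public:
    explicit ClaytonBicop(double theta) : theta_(theta) {}  // requires theta > 0
    double pdf(double u, double v) const override
    {
        double s = std::pow(u, -theta_) + std::pow(v, -theta_) - 1.0;
        return (1.0 + theta_) * std::pow(u * v, -1.0 - theta_) *
               std::pow(s, -2.0 - 1.0 / theta_);
    }
    double cdf(double u, double v) const override
    {
        double s = std::pow(u, -theta_) + std::pow(v, -theta_) - 1.0;
        return std::pow(s, -1.0 / theta_);
    }
private:
    double theta_;
};

// Free function returning a pointer to the base class.
inline std::unique_ptr<Bicop> make_bicop(int family, double theta = 0.0)
{
    switch (family) {
        case 0: return std::unique_ptr<Bicop>(new IndepBicop());
        case 3: return std::unique_ptr<Bicop>(new ClaytonBicop(theta));
        default: throw std::invalid_argument("family not implemented in this sketch");
    }
}
```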

Elliptical families bicop_elliptical.cpp, bicop_elliptical.hpp

Gaussian bicop_normal.cpp, bicop_normal.hpp
Student t bicop_student.cpp, bicop_student.hpp

Archimedean families bicop_archimedean.cpp, bicop_archimedean.hpp

Clayton bicop_clayton.cpp, bicop_clayton.hpp
Gumbel bicop_gumbel.cpp, bicop_gumbel.hpp
Frank bicop_frank.cpp, bicop_frank.hpp
Joe bicop_joe.cpp, bicop_joe.hpp
BB families

Other parametric families

Tawns

Nonparametric methods

Transformation kernel
Bernstein copula

Refactoring of Vinecop class

As discussed in #49:

Change to more natural matrix style (see #48 for the discussion)
New methods get_families(), get_rotations() that return the full model matrix.
Unit tests for structure selection.

Clean-Up

[Edit: items from the discussion, sorted by the (arguable) order in which the tasks should be dealt with]

As discussed, it is now time for a clean-up:

Beef up Appveyor builds #44
Make subfolders bicop/vinecop/tools in src and include
Update documentation everywhere
Update readme (e.g., for R dependencies)

Done:

git_revision.hpp does not get installed with "make install"

expose version number through a header file

Well, we've removed the git_revision.hpp but there is no alternative - shall we include the version number somewhere in vinecopulib.hpp?
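A minimal sketch of such a header, assuming plain macros are acceptable. The file location, macro names and the function `version_string` are hypothetical, and 0.0.1 is only a placeholder; the numbers could be filled in by CMake (for example with configure_file) or bumped by hand on each release.

```cpp
#include <cassert>
#include <string>

// Hypothetical version header, e.g. include/vinecopulib/version.hpp
#define VINECOPULIB_VERSION_MAJOR 0
#define VINECOPULIB_VERSION_MINOR 0
#define VINECOPULIB_VERSION_PATCH 1

// Single integer for preprocessor checks,
// e.g. #if VINECOPULIB_VERSION >= 100 means "0.1.0 or later"
#define VINECOPULIB_VERSION \
    (VINECOPULIB_VERSION_MAJOR * 10000 + VINECOPULIB_VERSION_MINOR * 100 + \
     VINECOPULIB_VERSION_PATCH)

namespace vinecopulib {
// Human-readable version, e.g. for printing or for bindings.
inline std::string version_string()
{
    return std::to_string(VINECOPULIB_VERSION_MAJOR) + "." +
           std::to_string(VINECOPULIB_VERSION_MINOR) + "." +
           std::to_string(VINECOPULIB_VERSION_PATCH);
}
}  // namespace vinecopulib
```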

bicop::set_family?

there are set_parameters() and set_rotation(), but no set_family() - is it intentional?

Getting rid of R+Rcpp dependencies

I wrote my unit tests in #18 and #19 by hardcoding input values and VineCopula output, mainly because I was too lazy to deal with the R interface (we will remove it at some point anyway).

It is a bit ugly, but there's little reason not to do it. The tests catch 99% of the errors the R interface would and can be used even with R interface removed. A cleaner version of this would be to write an R script that writes inputs and results into a file which is loaded from within the tests.
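The test side of that cleaner version could look roughly like this. The file format (one test case per line, whitespace-separated numbers, '#' for comments) and the function name `read_reference` are assumptions; on the R side such a file could be written with write.table(..., row.names = FALSE, col.names = FALSE).

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Read whitespace-separated numbers, one test case per line.
std::vector<std::vector<double>> read_reference(std::istream& in)
{
    std::vector<std::vector<double>> rows;
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty() || line[0] == '#') continue;  // skip blanks/comments
        std::istringstream fields(line);
        std::vector<double> row;
        double x;
        while (fields >> x) row.push_back(x);
        // extraction must stop at end of line, not at a malformed token
        if (!fields.eof()) throw std::runtime_error("bad number in: " + line);
        rows.push_back(row);
    }
    return rows;
}

// Convenience overload for a file path.
std::vector<std::vector<double>> read_reference(const std::string& path)
{
    std::ifstream file(path);
    if (!file) throw std::runtime_error("cannot open " + path);
    return read_reference(file);
}
```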

What do you think?

vinecopulib namespace - should all the library code go there?

graph + thread pool prototypes

Here's some hello-world code demonstrating C++11 and Boost.Graph:

graph edge traversal
thread-pool logic: submitting background tasks for each edge, getting result (after waiting if needed)

HTH,
S.

boost_graph.hpp

#pragma once
#include <boost/graph/adjacency_list.hpp>
#include <boost/range/iterator_range.hpp>

// Boost.Range's iterator_range turns the (first, last) pair returned by
// boost::edges() into something a range-based for loop accepts, e.g.
//     for (auto e : boost::make_iterator_range(boost::edges(g)))
// without adding begin/end overloads to namespace std (undefined behaviour).

example.cpp

#include <future>
#include <iostream>
#include <iterator>
#include <utility>

#include "ctpl_stl.h" // https://github.com/vit-vit/CTPL
#include "boost_graph.hpp"  // Boost graph includes + tweaks


// types
struct VertexT {};
struct EdgeT
{
    double data = 0.0;               // input of the computation
    std::shared_future<double> tau;  // result, available once the task has run
};

typedef boost::adjacency_list<
  boost::setS, // store neighbors in a std::set
  boost::vecS, // store vertices in a std::vector
  boost::bidirectionalS,
  VertexT,
  EdgeT> GraphT;

typedef unsigned V;
typedef std::pair<V, V> E;


// dummy number crunching function
double rocketscience(double data)
{
    return 44 * data;
}


// example: edge traversal + thread-pool submit task / get output
int main()
{
    // data
    E edge_list[] = { E(1,2), E(2,3), E(1,3) };
    GraphT g(std::begin(edge_list), std::end(edge_list), 5);

    // initialise a thread pool with 2 worker threads
    ctpl::thread_pool pool(2);

    // do some rocket science for each edge on a thread pool in the background;
    // only the edge's input is captured (capturing g by value would copy the
    // whole graph for every task)
    for (auto e : boost::make_iterator_range(boost::edges(g)))
    {
        double data = g[e].data;
        g[e].tau = pool.push([data](int) { return rocketscience(data); });
    }

    // print the results of computation (waiting if any of the results did not finish)
    for (auto e : boost::make_iterator_range(boost::edges(g)))
        std::cout << boost::source(e, g) << ' ' << boost::target(e, g)
                  << " -> " << g[e].tau.get() << std::endl;
}

boost::math::constants::pi instead of M_PI?

what about using:

#include <boost/math/constants/constants.hpp>

and boost::math::constants::pi<double>()

instead of

#ifndef M_PI
#define M_PI       3.14159265358979323846
#endif

in elliptical.cpp?

fix tests on Windows

fix/silence conversion warnings reported by Visual Studio

  C:\projects\vinecopulib\src\bicop_parametric.cpp(68): warning C4244: 'initializing': conversion from 'double' to 'int', possible loss of data 
  C:\projects\vinecopulib\src\bicop_parametric.cpp(98): warning C4244: 'argument': conversion from 'double' to 'int', possible loss of data 
  C:\projects\vinecopulib\src\bicop_parametric.cpp(160): warning C4244: 'argument': conversion from 'double' to 'int', possible loss of data
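The offending lines are not shown here, so whether truncation or rounding is intended is an assumption. If the double is meant to hold an integer value (for instance a family code), an explicit conversion documents that intent and removes C4244. The hypothetical helper below rounds first, so that a value such as 2.9999999999 produced by floating-point arithmetic becomes 3 rather than 2; if truncation is really wanted, a plain static_cast<int>(x) is enough.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <stdexcept>

// Hypothetical helper: explicit, range-checked double -> int conversion.
inline int to_int(double x)
{
    double r = std::round(x);
    if (r < std::numeric_limits<int>::min() || r > std::numeric_limits<int>::max())
        throw std::out_of_range("value does not fit into int");
    return static_cast<int>(r);
}
```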

CMake does not check for presence of RInside

results in build failure:

/home/vagrant/devel/vinecopulib/test/src_test/include/r_instance.hpp:24:21: fatal error: RInside.h: No such file or directory
 #include <RInside.h>
                     ^
compilation terminated.

vcl_matrix instead of vc_matrix in test_vinecop_class.cpp

I think that it should be vcl_matrix instead of vc_matrix at lines 97 and 98 of test_vinecop_class.cpp. However, if we do that, the test fails.

add an Appveyor status badge to README.md :)

https://www.appveyor.com/docs/status-badges/

fix warning on Windows

both 32-bit and 64-bit builds:

test\src_test/include/test_vinecop_class.hpp(64): warning C4244: 'argument': conversion from 'double' to 'int', possible loss of data
test\src_test/include/test_vinecop_class.hpp(91): warning C4244: 'argument': conversion from 'double' to 'int', possible loss of data

64-bit build:

create packages / find packagers for Debian, Macports, Fedora, ...

move all stuff in misc into vinecopulib namespace?

as of now, some code is within vinecopulib, and some within other namespaces - was it intentional?

mat.array() = 0 compilation error (int vs. size_t)?

/home/vagrant/devel/vinecopulib/src/vinecop/class.cpp: In member function ‘void vinecopulib::Vinecop::update_vinecop(std::vector<boost::adjacency_list<boost::vecS, boost::vecS, boost::undirectedS, tools_structselect::VertexProperties, boost::property<boost::edge_weight_t, double, tools_structselect::EdgeProperties> > >&)’:
/home/vagrant/devel/vinecopulib/src/vinecop/class.cpp:273:23: error: no match for ‘operator=’ (operand types are ‘Eigen::ArrayWrapper<Eigen::Matrix<long unsigned int, -1, -1> >’ and ‘int’)
         mat.array() = 0;

Serialization

We've been mentioning the idea of including serialization in order to be able to save/load objects from this library.

I've been exploring the possibilities and narrowed things to:

a) Boost serialization
b) Cereal

Pros of a): we're already using Boost.
Cons of a): it's complex and, arguably more importantly, it doesn't seem to be header-only (and hence is not included in the BH R package).

Pros of b): simple, header only and there is a Rcereal R package.
Cons of b): adding a new dependency.

Any thoughts?

Python bindings

NumPy hello world (depends on fixes in CMake, Debian package and Boost.Python)
should Windows binary have _d suffix in Debug mode?
add license / copyright header
update README.md to cover Python bindings dependencies etc
CMake option to disable bindings compilation + execution of it in Travis
serializability?

should the binaries be placed within "build" directory?

Currently, the binaries are placed outside of the build directory, which is, in my opinion, counterintuitive (i.e., not the default behaviour in most (all) CMake projects I've tried). It also makes it impossible to keep several builds side by side without them overwriting each other's binaries, e.g.:

build-O3
build-Ofast
build-debug
...

If a user wants to put the binaries in "vinecopulib/bin", this would still be possible by setting CMAKE_INSTALL_PREFIX and running "make install".

What about placing the binaries within "build"?

Need for speed

Currently, the speed of the library is comparable to VineCopula at best (e.g., for the pdf evaluation of a vine or the selection of a bivariate copula). I'm opening this issue to list potential areas of improvement.

Avoid copying the data all the time with cut_and_rotate
Implement the analytical gradients
Use something like binaryExpr to compute stuff for Archimedean families instead of calling unaryExpr for the generator and its derivatives separately.
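The idea behind the last point can be sketched without Eigen: compute the generator and its derivative in the same pass, so that shared terms are evaluated once and the data is traversed once. The example assumes the Clayton generator phi(t) = (t^-theta - 1) / theta with derivative phi'(t) = -t^(-theta - 1); both share t^-theta, so std::pow is called once per element instead of twice. Names are hypothetical; with Eigen, a custom functor or a temporary array holding the shared term would play the same role.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct GeneratorValues
{
    std::vector<double> phi;   // generator
    std::vector<double> dphi;  // first derivative
};

// One loop computes both quantities for the Clayton family.
GeneratorValues clayton_generator_and_derivative(const std::vector<double>& t,
                                                 double theta)
{
    GeneratorValues out;
    out.phi.resize(t.size());
    out.dphi.resize(t.size());
    for (std::size_t i = 0; i < t.size(); ++i) {
        double p = std::pow(t[i], -theta);  // shared term t^(-theta)
        out.phi[i] = (p - 1.0) / theta;
        out.dphi[i] = -p / t[i];            // t^(-theta - 1) = t^(-theta) / t
    }
    return out;
}
```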

Any ideas?

gtest policy CMP0048

When running cmake, you may have noticed the following warning:

Policy CMP0048 is not set: project() command manages VERSION variables. Run "cmake --help-policy CMP0048" for policy details. Use the cmake_policy command to set the policy and suppress this warning.

This is a known issue of gtest and I don't really know what we can do about it.

CMake dev warning on Appveyor

https://ci.appveyor.com/project/tvatter/vinecopulib/build/1.0.319/job/6ioq0nov78veoygi

CMake Warning (dev) at build-release/googletest-src/CMakeLists.txt:3 (project):
  Policy CMP0048 is not set: project() command manages VERSION variables.
  Run "cmake --help-policy CMP0048" for policy details.  Use the cmake_policy
  command to set the policy and suppress this warning.
  The following variable(s) would be set to empty:
    PROJECT_VERSION
    PROJECT_VERSION_MAJOR
    PROJECT_VERSION_MINOR
    PROJECT_VERSION_PATCH
    PROJECT_VERSION_TWEAK
This warning is for project developers.  Use -Wno-dev to suppress it.
CMake Warning (dev) at build-release/googletest-src/googlemock/CMakeLists.txt:40 (project):
  Policy CMP0048 is not set: project() command manages VERSION variables.
  Run "cmake --help-policy CMP0048" for policy details.  Use the cmake_policy
  command to set the policy and suppress this warning.
  The following variable(s) would be set to empty:
    PROJECT_VERSION
    PROJECT_VERSION_MAJOR
    PROJECT_VERSION_MINOR
    PROJECT_VERSION_PATCH
    PROJECT_VERSION_TWEAK
This warning is for project developers.  Use -Wno-dev to suppress it.
CMake Warning (dev) at build-release/googletest-src/googletest/CMakeLists.txt:47 (project):
  Policy CMP0048 is not set: project() command manages VERSION variables.
  Run "cmake --help-policy CMP0048" for policy details.  Use the cmake_policy
  command to set the policy and suppress this warning.
  The following variable(s) would be set to empty:
    PROJECT_VERSION
    PROJECT_VERSION_MAJOR
    PROJECT_VERSION_MINOR
    PROJECT_VERSION_PATCH
    PROJECT_VERSION_TWEAK
This warning is for project developers.  Use -Wno-dev to suppress it.

0.0.1 tarball does not ship with git_revision.hpp -> "make install" will fail

should we have accessor methods for fit controls?

beef up Appveyor builds

Things that are missing:

building and running the R-dependent tests (needs .lib files for R which are not part of the default distribution)
compilation for x64 (in addition to i386)
properly handling Debug and Release modes
doing "make install" and testing it (this is also not done for Travis actually)

Enumeration of families

For now, it's convenient to use the same numbering as the VineCopula package:

0: Independence
1: Gauss
2: Student
3: Clayton
4: Gumbel
5: Frank
6: Joe
....

In the foreseeable future, we may want to change this, because we will include several families that are not part of VineCopula. I think it gets very confusing if we just take the smallest free integer whenever we implement a new family. For example, we could use the number as an indication of the number of parameters, like:

family_ < 100 nonparametric
100 <= family_ < 200 one parameter
200 <= family_ < 300 two parameters
and so on

Comments, suggestions?
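Under the proposed scheme, the number of parameters can be read off the hundreds digit of the code. A hypothetical sketch of the decoding (where independence would fit is left open, as in the proposal):

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical classification following the proposed numbering.
enum class FamilyClass { nonparametric, one_parameter, two_parameters };

inline FamilyClass family_class(int family)
{
    if (family < 0) throw std::invalid_argument("negative family code");
    switch (family / 100) {            // hundreds digit
        case 0: return FamilyClass::nonparametric;
        case 1: return FamilyClass::one_parameter;
        case 2: return FamilyClass::two_parameters;
        default: throw std::invalid_argument("family code out of range");
    }
}
```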
