frovedis / frovedis

Framework of vectorized and distributed data analytics

License: BSD 2-Clause "Simplified" License

Makefile 0.30% C++ 71.57% Python 12.57% Shell 0.23% Scala 11.04% Java 1.20% C 0.47% Assembly 2.61%
sx-aurora-tsubasa machine-learning spark scikit-learn distributed-computing vectorization mpi

frovedis's Introduction

Frovedis:
FRamework Of VEctorized and DIStributed data analytics

1. Introduction

Frovedis is high-performance middleware for data analytics. It is written in C++ and utilizes MPI for communication between the servers.

It provides

  • Spark-like API for distributed processing
  • Matrix library built on the above API
  • Machine learning algorithm library
  • Dataframe for preprocessing
  • Spark/Python interface for easy utilization

Our primary target architecture is SX-Aurora TSUBASA, which is NEC's vector computer; these libraries are carefully written to support vectorization. However, they are just standard C++ programs and can run efficiently on other architectures like x86.

The machine learning algorithm library performs very well on sparse datasets, especially on SX-Aurora TSUBASA. For logistic regression, compared to Spark on x86, it ran more than 10x faster on x86 and more than 100x faster on SX-Aurora TSUBASA.

In addition, it provides a Spark/Python interface that is mostly compatible with Spark MLlib and Python scikit-learn. If you already use these libraries, you can adopt Frovedis easily. On SX-Aurora TSUBASA, Spark/Python runs on the x86 side and the middleware runs on the VE (Vector Engine), so users get high performance without having to deal with hardware details.

2. Installation

If you want a prebuilt binary, please check "releases", which include rpm files. If your environment is supported, installing from the rpm is the easiest way.

Please make sure that your hostname (as reported by the hostname command) is in /etc/hosts or registered in DNS; otherwise the framework may not work correctly.

If you want to build the framework on SX-Aurora TSUBASA, we recommend using our build tools together with the VE version of boost. Please follow the instructions in the README.md of the build tools.

On other platforms, please follow the instructions.

3. Getting started

Here, we assume that you have installed Frovedis from the prebuilt binary. Please also refer to /opt/nec/frovedis/getting_started.md.

If you want to use the VE version, set up your environment variables with:

$ source /opt/nec/frovedis/ve/bin/veenv.sh

If you want to use the x86 version, use the following instead:

$ source /opt/nec/frovedis/x86/bin/x86env.sh

${INSTALLPATH} below is /opt/nec/frovedis/ve/ in the case of VE, and /opt/nec/frovedis/x86/ in the case of x86.

3.1 C++ interface

The tutorial is in the source code tree and is installed in ${INSTALLPATH}/doc/tutorial/tutorial.[md,pdf]. The directory also contains small programs that are explained in the tutorial. You can copy the source files into your home directory and compile them yourself. The Makefile and Makefile.in.[x86, etc.] contain the compilation configuration, such as compilation options and paths to include files and libraries. You can reuse them for your own programs.

The small programs in the tutorial directory look like this:

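#include <frovedis.hpp>
#include <iostream>
#include <vector>

// function applied to each element by map()
int two_times(int i) {return i*2;}

int main(int argc, char* argv[]){
  // sets up Frovedis; it is cleaned up when `use` goes out of scope
  frovedis::use_frovedis use(argc, argv);

  std::vector<int> v = {1,2,3,4,5,6,7,8};
  // distribute v across the processes as a dvector
  auto d1 = frovedis::make_dvector_scatter(v);
  // apply two_times to every element in a distributed way
  auto d2 = d1.map(two_times);
  // collect the results back into a std::vector
  auto r = d2.gather();
  for(auto i: r) std::cout << i << std::endl;
}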

This program creates a distributed vector from a std::vector, doubles its elements in a distributed way, and then gathers them into a std::vector again. As you can see, you can write a distributed program much more easily and concisely than an MPI program.

In addition, there are also sample programs installed in ${INSTALLPATH}/samples directory. You can also use them as reference when you write your own programs.

Manuals are installed in ${INSTALLPATH}/doc/manual/manual_cpp.[md,pdf]; they are more detailed than the tutorial. Man pages are also installed, so you can run, for example, man dvector.

3.2 Spark/Python interface

You can utilize the predefined functionalities from Spark/Python, which includes machine learning algorithms, matrix operations, and dataframe operations.

This is implemented as a server that accepts RPCs (remote procedure calls) from Spark or Python to provide the above functionalities. The server can run on both VE and x86: if veenv.sh is sourced, the VE version of the server is used; if x86env.sh is sourced, the x86 version is used.

To use these functionalities, you only need to change the imported packages or modules and add a few lines of code to your Spark or Python scikit-learn program.

For example, in the case of Spark, if the original program is:

import org.apache.spark.mllib.classification.LogisticRegressionWithSGD
...
val model = LogisticRegressionWithSGD.train(data)

Then, the modified program is:

import com.nec.frovedis.mllib.classification.LogisticRegressionWithSGD
...
FrovedisServer.initialize(...)
val model = LogisticRegressionWithSGD.train(data)
FrovedisServer.shut_down()

Here, only the imported package is changed, and server initialization and shutdown are added. Other than that, you do not have to change the program.

In the case of Python / scikit-learn, if the original program is:

from sklearn.linear_model import LogisticRegression
...
clf = LogisticRegression(...).fit(X,y)

Then, the modified program is:

from frovedis.mllib.linear_model import LogisticRegression
...
FrovedisServer.initialize(...)
clf = LogisticRegression(...).fit(X,y)
FrovedisServer.shut_down()

Similarly, only the imported module is changed, and server initialization and shutdown are added.

The Spark tutorial is in the source code tree and is installed in ${INSTALLPATH}/doc/tutorial_spark/tutorial.[md,pdf]. The Python tutorial is also in the source code tree and is installed in ${INSTALLPATH}/doc/tutorial_python/tutorial.[md,pdf].

Each directory also contains small programs that are explained in the tutorial. You can copy the source files into your home directory and run them yourself.

Other Spark demo programs are installed in ${X86_INSTALLPATH}/foreign_if_demo/spark, and Python demo programs are installed in ${X86_INSTALLPATH}/foreign_if_demo/python, where ${X86_INSTALLPATH} is /opt/nec/frovedis/x86/.

To try them, please copy these directories into your home directory (the demos create files). The scripts ./foreign_if_demo/spark/run_demo.sh and ./foreign_if_demo/python/run_demo.sh run the demos.

Manuals for Spark and Python are installed in ${INSTALLPATH}/doc/manual/manual_spark.[md,pdf] and ${INSTALLPATH}/doc/manual/manual_python.[md,pdf]. Man pages are also installed; for example, run man -s 3s logistic_regression or man -s 3p logistic_regression. The -s 3s option shows the Spark manual, and the -s 3p option shows the Python manual.

4. Other documents

The following links may be useful for understanding the framework in more detail:

5. License

The license of this software is in the LICENSE file.

This software includes third-party software. The third_party directory contains it together with the corresponding licenses.

frovedis's Issues

Error in building sample graph algorithms

Hi.

I am trying to build the sample graph algorithms but am getting the following error:

[user@ws-067 graph] ls
cc_BFS.cc  Makefile  pagerank.cc  sssp.cc
[user@ws-067 graph] make
mpic++ -c -fPIC -g -Wall -O3 -std=c++11 -Wno-unknown-pragmas -Wno-sign-compare -pthread -I/opt/nec/nosupport/frovedis/x86/include sssp.cc -o sssp.o
In file included from /opt/nec/nosupport/frovedis/x86/include/frovedis/ml/graph/graph.hpp:6:0,
                 from sssp.cc:27:
/opt/nec/nosupport/frovedis/x86/include/frovedis/ml/graph/set_union_multivec.hpp:227:35: fatal error: ./set_operations.incl1: No such file or directory
  #include "./set_operations.incl1"
                                   ^
compilation terminated.
make: *** [sssp.o] Error 1

Looking at /opt/nec/nosupport/frovedis/x86/include/frovedis/ml/graph/set_union_multivec.hpp, I can see that the missing headers are actually located in /opt/nec/nosupport/frovedis/dataframe/. Accordingly, I uncommented the corresponding include statements in set_union_multivec.hpp. Now compilation moves on to the next step, with a different set of errors:

mpic++ -c -fPIC -g -Wall -O3 -std=c++11 -Wno-unknown-pragmas -Wno-sign-compare -pthread -I/opt/nec/nosupport/frovedis/x86/include sssp.cc -o sssp.o
In file included from /opt/nec/nosupport/frovedis/x86/include/frovedis/ml/graph/graph.hpp:6:0,
                 from sssp.cc:27:
/opt/nec/nosupport/frovedis/x86/include/frovedis/ml/graph/set_union_multivec.hpp: In function ‘size_t frovedis::set_union_vertical_unrolled(std::vector<_RealType>&, std::vector<_RealType>&, std::vector<_RealType>&, std::vector<_RealType>&)’:
/opt/nec/nosupport/frovedis/x86/include/frovedis/ml/graph/set_union_multivec.hpp:310:10: error: ‘valid_3’ was not declared in this scope
       if(valid_3[j]) {
          ^
/opt/nec/nosupport/frovedis/x86/include/frovedis/ml/graph/set_union_multivec.hpp:311:19: error: ‘leftelm3’ was not declared in this scope
         bool eq = leftelm3[j] == rightelm3[j];
                   ^
/opt/nec/nosupport/frovedis/x86/include/frovedis/ml/graph/set_union_multivec.hpp:311:34: error: ‘rightelm3’ was not declared in this scope
         bool eq = leftelm3[j] == rightelm3[j];
                                  ^
/opt/nec/nosupport/frovedis/x86/include/frovedis/ml/graph/set_union_multivec.hpp:314:14: error: ‘out_idx_3’ was not declared in this scope
           op[out_idx_3[j]++] = leftelm3[j];
              ^
/opt/nec/nosupport/frovedis/x86/include/frovedis/ml/graph/set_union_multivec.hpp:315:11: error: ‘left_idx_3’ was not declared in this scope
           left_idx_3[j]++;
           ^
/opt/nec/nosupport/frovedis/x86/include/frovedis/ml/graph/set_union_multivec.hpp:317:14: error: ‘out_idx_3’ was not declared in this scope
           op[out_idx_3[j]++] = rightelm3[j];
              ^
/opt/nec/nosupport/frovedis/x86/include/frovedis/ml/graph/set_union_multivec.hpp:318:11: error: ‘right_idx_3’ was not declared in this scope
           right_idx_3[j]++;
           ^
/opt/nec/nosupport/frovedis/x86/include/frovedis/ml/graph/set_union_multivec.hpp:321:11: error: ‘right_idx_3’ was not declared in this scope
           right_idx_3[j]++;
           ^
/opt/nec/nosupport/frovedis/x86/include/frovedis/ml/graph/set_union_multivec.hpp:323:12: error: ‘left_idx_3’ was not declared in this scope
         if(left_idx_3[j] == left_idx_stop_3[j] ||
            ^
/opt/nec/nosupport/frovedis/x86/include/frovedis/ml/graph/set_union_multivec.hpp:323:29: error: ‘left_idx_stop_3’ was not declared in this scope
         if(left_idx_3[j] == left_idx_stop_3[j] ||
                             ^
/opt/nec/nosupport/frovedis/x86/include/frovedis/ml/graph/set_union_multivec.hpp:324:12: error: ‘right_idx_3’ was not declared in this scope
            right_idx_3[j] == right_idx_stop_3[j]) {
            ^
/opt/nec/nosupport/frovedis/x86/include/frovedis/ml/graph/set_union_multivec.hpp:324:30: error: ‘right_idx_stop_3’ was not declared in this scope
            right_idx_3[j] == right_idx_stop_3[j]) {
                              ^
/opt/nec/nosupport/frovedis/x86/include/frovedis/ml/graph/set_union_multivec.hpp:337:52: error: ‘valid_3’ was not declared in this scope
       if(valid_0[i] || valid_1[i] || valid_2[i] || valid_3[i])
                                                    ^
/opt/nec/nosupport/frovedis/x86/include/frovedis/ml/graph/set_union_multivec.hpp:373:44: error: ‘out_idx_3’ was not declared in this scope
     out_idx[GRAPH_SET_VLEN_EACH * 3 + i] = out_idx_3[i];
                                            ^
/opt/nec/nosupport/frovedis/x86/include/frovedis/ml/graph/set_union_multivec.hpp:374:45: error: ‘left_idx_3’ was not declared in this scope
     left_idx[GRAPH_SET_VLEN_EACH * 3 + i] = left_idx_3[i];
                                             ^
/opt/nec/nosupport/frovedis/x86/include/frovedis/ml/graph/set_union_multivec.hpp:375:46: error: ‘right_idx_3’ was not declared in this scope
     right_idx[GRAPH_SET_VLEN_EACH * 3 + i] = right_idx_3[i];
                                              ^
make: *** [sssp.o] Error 1

I am wondering if you can help with compiling this sample program.

frovedis-0.8.0-1.x86_64.rpm might be corrupt

yum reports the following error.

  Updating   : frovedis-0.8.0-1.x86_64    1/2
Error unpacking rpm package frovedis-0.8.0-1.x86_64
error: unpacking of archive failed on file /opt/nec/nosupport/frovedis/x86/samples/wikipedia2matrix/sample.txt;5bffa74c: cpio: read

The file corruption can be confirmed using cpio as well.

$ rpm2cpio frovedis-0.8.0-1.x86_64.rpm | cpio --verbose -id
./opt/nec/nosupport/frovedis
./opt/nec/nosupport/frovedis/data
./opt/nec/nosupport/frovedis/data/demo
 :
./opt/nec/nosupport/frovedis/x86/samples/wikipedia2matrix
./opt/nec/nosupport/frovedis/x86/samples/wikipedia2matrix/Makefile
./opt/nec/nosupport/frovedis/x86/samples/wikipedia2matrix/README
cpio: premature end of file

Other rpm files (frovedis-0.8.0-0.x86_64.rpm, frovedis-0.8.0-3.x86_64.rpm) are not corrupt, though.

BTW, I always appreciate this great software.

Rpm install and conda

Hello,

Just wondering how to combine/link the rpm install with a conda environment.

In other words, can we install into a conda env?

Thanks