intel-bigdata / spark-pmof

Spark Shuffle Optimization with RDMA+AEP

License: Apache License 2.0

Scala 10.08% Java 2.89% C++ 85.04% Makefile 0.09% CMake 0.37% C 1.53%
aep rdma shuffle spark

spark-pmof's People

Contributors

dependabot[bot], eugene-mark, jian-zhang, kelvin-qin, xuechendi

spark-pmof's Issues

Enable PMoF run in fsdax mode

Spark-PMoF currently does not work in AEP's fsdax mode.
fsdax should be supported as an option, so that users who are not using an RDMA NIC (RDMA is complex to set up) can still run on persistent memory.
The code therefore needs to be modified to run in fsdax mode:

  1. Create the fsdax pool file and specify its size in Scala (this differs from devdax).
  2. When creating a pool, detect whether the device type is fsdax or devdax, and add the corresponding checks to the native code.
  3. Address other potential risks.
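
One possible way to tell the two modes apart is sketched below in Python for brevity; the same check could be written in the native code with stat(2). It assumes that a devdax namespace appears as a character device (such as /dev/dax0.0) and that an fsdax pool is a regular file on a DAX-mounted filesystem. The function names are hypothetical and not part of Spark-PMoF.

```python
import os
import stat


def classify_pmem_mode(st_mode: int) -> str:
    """Classify a pmem path from its st_mode bits.

    Assumption: devdax is exposed as a character device, while an
    fsdax pool is a regular file on a DAX-mounted filesystem.
    """
    if stat.S_ISCHR(st_mode):
        return "devdax"
    if stat.S_ISREG(st_mode):
        return "fsdax"
    return "unknown"


def classify_pmem_path(path: str) -> str:
    """Stat the path and classify it; a missing path is reported as 'missing'."""
    try:
        return classify_pmem_mode(os.stat(path).st_mode)
    except FileNotFoundError:
        # An fsdax pool file may simply not have been created yet.
        return "missing"
```

A caller could treat "missing" as "create an fsdax pool file of the configured size", while "devdax" would keep the existing code path.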

libhpnl exception when RDMA is disabled

The Spark job starts, but at the map stage (stage 1) it is terminated by the error below:
java:229613 terminated with signal 11 at PC=7f1dec8dfeab SP=7f1dc4ba23a0. Backtrace:
/usr/local/lib/libhpnl.so(_ZN23CQExternalDemultiplexer10wait_eventEPP6fid_eqPiS3_+0x2f)[0x7f1dec8dfeab]
/usr/local/lib/libhpnl.so(_ZN17ExternalCqService13wait_cq_eventEiPP6fid_eqPiS3_+0x66)[0x7f1dec8e2804]
/usr/local/lib/libhpnl.so(Java_com_intel_hpnl_core_CqService_wait_1cq_1event+0x56)[0x7f1dec8e23e9]

Failed to open pmem pool

Hello guys.

I am trying to run your project (release v1.0.2) on a Spark standalone cluster over Intel Optane persistent memory, but I have some problems with the deployment. I followed this guide (https://github.com/Intel-bigdata/Spark-PMoF/blob/master/doc/Spark-PMoF-enabling-guide.pdf), but I found some differences from the master branch:

  • spark.shuffle.manager org.apache.spark.shuffle.pmof.RdmaShuffleManager (I can't find this class in the project).
    I use spark.shuffle.manager org.apache.spark.shuffle.pmof.PmofShuffleManager instead of RdmaShuffleManager.

But when I run the Databricks TPC benchmark, I receive this error:

Metastore DB connected: jdbc:sqlite:/tmp/spark-e2ba2c50-4d03-4bf1-aac5-430c740ef8ab/executor-c2b3dd32-9736-42e7-b7f5-71bf1b0820e7/spark_shuffle_meta.db
UPDATE devices SET mount_count = 4 WHERE device = '/dev/dax0.0'

Metastore DB: get unused device, should be /dev/dax0.0.
failed to open pmem pool, errmsg: invalid major version (0)

Before that, I formatted my namespace as described in your document:
Install and configure DCPM

  1) Please install ipmctl and ndctl according to your OS version
  2) Run ipmctl show -dimm to check whether the DIMMs can be recognized
  3) Run ipmctl create -goal PersistentMemoryType=AppDirect to create AD mode
  4) Run ndctl list -R; you will see region0 and region1 on screen

  Suppose we have 4x DCPM on two sockets.
  a) Run ndctl create-namespace -m devdax -r region0 -s 120g
  e) Then we will see /dev/dax0.0

My spark-defaults configuration is as follows (only for testing pmem, without RDMA):

spark.executor.extraClassPath      /opt/benchmarks_directory/Spark-PMoF/core/target/java-1.0-jar-with-dependencies.jar:/opt/benchmarks_directory/spark-sql-perf/target/scala-2.11/spark-sql-perf_2.11-0.5.1-SNAPSHOT.jar
spark.driver.extraClassPath        /opt/benchmarks_directory/Spark-PMoF/core/target/java-1.0-jar-with-dependencies.jar:/opt/benchmarks_directory/spark-sql-perf/target/scala-2.11/spark-sql-perf_2.11-0.5.1-SNAPSHOT.jar

spark.shuffle.manager org.apache.spark.shuffle.pmof.PmofShuffleManager

#new version
#spark.shuffle.manager org.apache.spark.shuffle.pmof.RdmaShuffleManager
spark.shuffle.pmof.enable_rdma false
spark.shuffle.pmof.enable_pmem true
spark.shuffle.pmof.max_stage_num 1
spark.shuffle.pmof.max_task_num 50000
spark.shuffle.spill.pmof.MemoryThreshold 16777216
spark.shuffle.pmof.pmem_capacity 100340914688
spark.shuffle.pmof.pmem_list /dev/dax0.0
spark.shuffle.pmof.dev_core_set dax0:0-71,dax0:0-71,dax1:0-71,dax1:0-71,dax0:0-71,dax0:0-71
spark.shuffle.pmof.server_buffer_nums 64
spark.shuffle.pmof.client_buffer_nums 64
spark.shuffle.pmof.map_serializer_buffer_size 262144
spark.shuffle.pmof.reduce_serializer_buffer_size 262144
spark.shuffle.pmof.chunk_size 262144
spark.shuffle.pmof.server_pool_size 3
spark.shuffle.pmof.client_pool_size 3
spark.shuffle.pmof.shuffle_block_size 2097152
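
When comparing a configuration like this against the guide, it can help to list only the PMoF settings that are actually active. The Python sketch below assumes the simple "key value" layout used above, with "#" starting a comment line; it does not cover every syntax Spark accepts, and the function names are hypothetical.

```python
def parse_spark_defaults(text: str) -> dict:
    """Parse 'key value' lines, skipping blank lines and '#' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)  # split on the first run of whitespace
        key = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        settings[key] = value
    return settings


def pmof_settings(settings: dict) -> dict:
    """Keep only spark.shuffle.pmof.* keys, with the prefix removed."""
    prefix = "spark.shuffle.pmof."
    return {k[len(prefix):]: v for k, v in settings.items() if k.startswith(prefix)}


SAMPLE = """
spark.shuffle.manager org.apache.spark.shuffle.pmof.PmofShuffleManager
#spark.shuffle.manager org.apache.spark.shuffle.pmof.RdmaShuffleManager
spark.shuffle.pmof.enable_rdma false
spark.shuffle.pmof.enable_pmem true
spark.shuffle.pmof.pmem_list /dev/dax0.0
"""

if __name__ == "__main__":
    for name, value in sorted(pmof_settings(parse_spark_defaults(SAMPLE)).items()):
        print(name, "=", value)
```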

My third-party library stack is as follows (I use these versions according to https://github.com/Intel-bigdata/Spark-PMoF/blob/master/docker/ubuntu18/DockerFile):

  • spark-2.3.0-bin-hadoop2.7
  • pmdk 1.6
  • libfabric v1.8.0
  • HPNL spark-pmof-test branch

Can you help me? If there is a library stack that you recommend, I would appreciate knowing it.

RDMA Enabling cannot allocate memory

ENV:

  • Spark 2.3.1(hdp)
  • Hadoop 3.1.0
  • HiBench Terasort workload 500G data
  • RDMA NIC: CX4 (or CX3 Pro)

When I enabled RDMA, I encountered the error fi_mr_reg: cannot allocate memory, which subsequently caused an NPE. With the CX4 NIC, there is also an ArrayIndexOutOfBoundsException. The rping test passes.
The initial configuration of RDMA is very complicated. Is there a simpler way to enable PMoF with RDMA?
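
The report does not establish a cause. One thing that is often worth ruling out when memory registration fails with "cannot allocate memory" is the locked-memory limit (RLIMIT_MEMLOCK) of the process, because RDMA registration pins pages. The Python sketch below reads that limit with the standard resource module; the helper names are hypothetical, and the check is a diagnostic aid under that assumption, not a confirmed fix for this issue.

```python
import resource


def memlock_allows(needed_bytes: int, soft_limit: int) -> bool:
    """Return True if a locked-memory soft limit can cover needed_bytes."""
    if soft_limit == resource.RLIM_INFINITY:
        return True  # "unlimited"
    return needed_bytes <= soft_limit


def current_memlock_soft_limit() -> int:
    """Soft RLIMIT_MEMLOCK of the current process, in bytes."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    return soft


if __name__ == "__main__":
    limit = current_memlock_soft_limit()
    needed = 1 << 30  # hypothetical registration size: 1 GiB
    print("memlock soft limit:", "unlimited" if limit == resource.RLIM_INFINITY else limit)
    print("enough for 1 GiB:", memlock_allows(needed, limit))
```

The limit that matters is the one seen by the executor process, so the check would have to run in the same environment as the Spark executors.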

Delete the residual file of fsdax

See "Enable PMoF run in fsdax mode".
There appears to be a new problem: after the job finishes, the shuffle file is not deleted automatically,
even though pmempool info --stats <file> shows that its utilization is almost zero.
A file delete operation needs to be added for fsdax mode.

Tips:
In fsdax mode, you can adjust the number of executors more freely, and the program may run faster.

Client connection and RPMP data server connection failure issue

In my deployment with one proxy and one data server, everything works normally before any RPMP client request arrives: the data server periodically sends heartbeats to the proxy as expected. But after a client writes or reads data one or more times (I used the put_and_get test), the data server fails to send heartbeats to the proxy, and from then on client writes and reads fail. I found that some threads in the proxy exit, which at the very least leaves the heartbeat messages from the data server unanswered.

The commit below is involved in this bug. Please help fix it.
Persist data put job status for future potential job recovery. (#118)

Pass IP address to RPMP server process from start script

The start script can get the server IP address from the config, and it uses ssh to log in to that host and launch the server. During the launch, the corresponding IP address can be passed to the server process. This looks more straightforward and can avoid some potential issues.
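
A minimal sketch of the idea, in Python: build the ssh command so that the address read from the config is both the ssh target and an argument to the server process. The server path and the flag name used in the example are placeholders, since the actual RPMP command line is not given here.

```python
import shlex


def build_launch_command(host: str, server_cmd: list, address_flag: str) -> list:
    """Build an ssh invocation that starts the server on `host` and passes
    the same address to the server process.

    `server_cmd` and `address_flag` are supplied by the caller; the values
    used in the example below are hypothetical.
    """
    remote = " ".join(shlex.quote(part) for part in [*server_cmd, address_flag, host])
    return ["ssh", host, remote]


if __name__ == "__main__":
    # Hypothetical server path and flag name, for illustration only.
    print(build_launch_command("10.0.0.5", ["/opt/rpmp/bin/server"], "--address"))
```

The returned list can be handed to subprocess.run without going through a local shell, so only the remote side interprets the quoted command string.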

Error when freeing the devdax pool

In devdax mode, an unknown error occurs during pool cleanup when running a job with a large data volume, causing the process to be killed.
This does not affect the correctness of the current job, but it may cause the next job to fail, for example with a busy or unavailable devdax device.
It needs to be fixed.

Delete fsdax files independently

When running PMoF jobs with large volumes of data, the fsdax files often cannot be deleted, which affects the next job.
For example, when running a 2TB Terasort test, the fsdax file cannot always be deleted.
The problem may lie in the pool cleanup, but fsdax does not need pool cleanup: the files can be deleted directly with POSIX operations.
It is therefore recommended to separate the fsdax file deletion from the devdax cleanup path.
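
The POSIX-style deletion could look roughly like the Python sketch below, which unlinks regular files with a given name prefix in a directory on the fsdax mount and skips everything else. The directory layout and the prefix are hypothetical; the real shuffle file naming in Spark-PMoF may differ.

```python
import os


def delete_fsdax_pool_files(directory: str, prefix: str) -> list:
    """Unlink regular files in `directory` whose names start with `prefix`.

    Returns the paths that were removed. No pool cleanup is performed:
    the files are removed with a plain unlink, as suggested for fsdax.
    """
    removed = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if name.startswith(prefix) and os.path.isfile(path):
            os.unlink(path)
            removed.append(path)
    return removed
```

Keeping this separate from the devdax path means a failure in pool cleanup cannot leave fsdax files behind for the next job.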

PMEM 2M aligned issue

If the pmem namespace is configured with 2M alignment, the pmem object address cannot be registered as an RDMA buffer. We need to configure it with 4K alignment instead, but pmem write performance is then worse than with 2M alignment.

2M aligned (default)
ndctl create-namespace -r region0 -f -e namespace0.0 -m devdax
4K aligned
ndctl create-namespace -r region0 -f -e namespace0.0 -m devdax -a 4k
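
To check which alignment an existing namespace uses, the JSON printed by ndctl list can be inspected. The Python sketch below assumes that each namespace object carries "dev" and "align" fields; the sample string in the usage note is illustrative and is not captured output.

```python
import json


def namespace_alignments(ndctl_json: str) -> dict:
    """Map namespace name to its 'align' value (None if the field is absent).

    Assumption: the input is a JSON object or array of namespace objects
    with 'dev' and, for dax namespaces, 'align' fields.
    """
    data = json.loads(ndctl_json)
    if isinstance(data, dict):
        data = [data]
    return {ns["dev"]: ns.get("align") for ns in data if "dev" in ns}


def is_4k_aligned_namespace(align) -> bool:
    """True when the namespace was created with 4K alignment."""
    return align == 4096
```

For example, namespace_alignments('[{"dev":"namespace0.0","align":2097152}]') returns {"namespace0.0": 2097152}, which is_4k_aligned_namespace reports as not 4K aligned.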

Merge with upstream Spark?

Not sure if this was discussed, but is it possible to merge this work into upstream Spark?
Or is the plan to continue maintaining Spark-PMoF as a separate project?

Thank you
