hpc / spindle
Scalable dynamic library and python loading in HPC environments
License: Other
=============================================================================
== SPINDLE: Scalable Parallel Input Network for Dynamic Load Environments ==
=============================================================================

Authors:
SPINDLE: Matthew LeGendre (legendre1 at llnl dot gov)
         W.Frings <W.Frings at fz-juelich dot de>
COBO: Adam Moody <moody20 at llnl dot gov>

Version: 0.13 (Aug 2020)

Summary:
===========
Spindle is a tool for improving the performance of dynamic library and python loading in HPC environments.

Documentation:
============
https://computing.llnl.gov/projects/spindle/software

Overview:
============
Using dynamically-linked libraries is common in most computational environments, but they can cause serious problems when used on large clusters and supercomputers. Shared libraries are frequently stored on shared file systems, such as NFS. When thousands of processes simultaneously start and attempt to search for and load libraries, it resembles a denial-of-service attack against the shared file system. This "attack" not only slows down the application but also impacts every user on the system. We encountered cases where it took over ten hours for a dynamically-linked MPI application running on 16K processes to reach main.

Spindle presents a novel solution to this problem. It transparently runs alongside your distributed application and takes over its library loading mechanism. When processes start to load a new library, Spindle intercepts the operation, designates one process to read the file from the shared file system, then distributes the library's contents to every process with a scalable broadcast operation.

Spindle is very scalable. On a cluster at LLNL the Pynamic benchmark (which measures library loading performance) was unable to scale much past 100 nodes. Even at that small scale it was causing significant performance problems that were impacting everyone on the cluster. When running Pynamic under Spindle, we were able to scale up to the max job size at 1,280 nodes without showing any signs of file-system stress or library-related slowdowns.

Unlike competing solutions, Spindle does not require any special hardware, and libraries do not have to be staged into any special locations. Applications work out of the box and do not need any special compile or link flags. Spindle is completely userspace and does not require kernel patches or root privileges. Spindle can trigger scalable loading of dlopened libraries, dependent libraries, executables, python modules and specified application data files.

Compilation:
============
Please see the INSTALL file in the Spindle source tree.

Usage:
======
Put 'spindle' before your job launch command. For example:
  spindle mpirun -n 128 mpi_hello_world
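The scalability argument above rests on the broadcast step: one process reads the file, and copies then fan out over a tree instead of every process hitting the shared file system. A minimal sketch of the arithmetic for a binomial-tree broadcast follows. It is an illustration only, not Spindle's COBO implementation, and the function names are hypothetical.

```c
#include <assert.h>

/* Rounds a binomial-tree broadcast needs so that all nprocs ranks hold the
 * data, i.e. ceil(log2(nprocs)). In round k, every rank r < 2^k that already
 * has the data forwards it to rank r + 2^k. */
int bcast_rounds(int nprocs)
{
    assert(nprocs >= 1);
    int rounds = 0;
    long have = 1;          /* initially only the reading process has the data */
    while (have < nprocs) {
        have *= 2;          /* each holder forwards to one new rank per round */
        rounds++;
    }
    return rounds;
}

/* Rank that forwards the data to `rank` in this scheme: clear its highest set bit. */
int bcast_parent(int rank)
{
    assert(rank > 0);
    int high = 1;
    while (high <= rank / 2)
        high *= 2;
    return rank - high;
}
```

For the 16K-process case mentioned above, bcast_rounds(16384) is 14: a single file-system read followed by 14 forwarding rounds, instead of 16,384 independent reads of the same file.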
Running an application executable built with the BIND_NOW option under Spindle causes a segmentation fault. I saw the fault on an x86 cluster and on an aarch64 cluster.
I confirmed the following reproduction steps on the x86 cluster.
The dynamic linker (glibc) version on the x86 cluster:
$ LC_ALL=C ldd --version
ldd (GNU libc) 2.17
Copyright (C) 2012 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Written by Roland McGrath and Ulrich Drepper.
I downloaded v0.12 from https://github.com/hpc/Spindle/releases/tag/v0.12 and built it.
Prepare a simple application built with BIND_NOW and run it with Spindle as follows:
$ cat hello.c
#include <stdio.h>

int main(int argc, char *argv[])
{
    printf("Hello world!\n");
    return 0;
}
$ gcc -Wl,-z,now -o hello_bind_now hello.c
SPINDLE_DEBUG=3 TMPDIR='/tmp' spindle --location='/tmp' mpiexec -np 1 spindlemarker $(pwd)/hello_bind_now
<Aug 31 16:19:45> <Launchmon> (INFO): The RM process has just been forked and exec'ed.
<Aug 31 16:19:45> <Launchmon> (INFO): Just continued the RM process out of the first trap
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 0 PID 247311 RUNNING AT 10.xx.yy.zz
= KILLED BY SIGNAL: 11 (Segmentation fault)
===================================================================================
Without the BIND_NOW option, the application runs correctly under Spindle.
$ gcc -o hello hello.c
SPINDLE_DEBUG=3 TMPDIR='/tmp' spindle --location='/tmp' mpiexec -np 1 spindlemarker $(pwd)/hello
<Aug 31 16:20:26> <Launchmon> (INFO): The RM process has just been forked and exec'ed.
<Aug 31 16:20:26> <Launchmon> (INFO): Just continued the RM process out of the first trap
Hello world!
In the debug output, the Spindle client appears to stop with the following log:
[Client.0.252100@auditclient_common.c:92] la_objopen - la_objopen(): loading /lib64/libc.so.6, link_map = 0x2b60c23859c8, lmid = LM_ID_BASE, cookie = 0x2b60c2385e30
[Client.0.252100@auditclient_common.c:116] la_activity - la_activity(): cookie = 0x2b60c25685c0; flag = LA_ACT_CONSISTENT
[[email protected]:30] remove_lib_rogot - Checking whether /lib64/libc.so.6 has R GOT
[[email protected]:41] remove_lib_rogot - Changing /lib64/libc.so.6 R GOT to RW GOT from 2b60c2b40000 to 2b60c2b44000
[[email protected]:30] remove_lib_rogot - Checking whether /lib64/ld-linux-x86-64.so.2 has R GOT
[[email protected]:41] remove_lib_rogot - Changing /lib64/ld-linux-x86-64.so.2 R GOT to RW GOT from 2b60c2566000 to 2b60c2567000
[[email protected]:39] spindle_la_activity - la_activity(): cookie = 0x2b60c25685c0; flag = LA_ACT_CONSISTENT
[Server.252113@ldcs_api_listen.c:174] ldcs_listen - Select returned data. Calling callback for fd 14 id=0
[Server.252113@ldcs_audit_server_client_cb.c:61] _ldcs_client_CB - Receiving message from client 0 on fd 14
[Server.252113@ldcs_api_pipe.c:387] _ldcs_read_pipe - before read from fifo 14, bytes_to_read = 8
[Server.252113@ldcs_api_pipe.c:398] _ldcs_read_pipe - read from fifo: 0 bytes ...
[Server.252113@ldcs_api_pipe.c:338] ldcs_recv_msg_static_pipe - Client disconnected. Returning END message
The output of readelf -d for each application binary:
$ LC_ALL=C readelf -d hello_bind_now
Dynamic section at offset 0xdd8 contains 26 entries:
Tag Type Name/Value
0x0000000000000001 (NEEDED) Shared library: [libc.so.6]
0x000000000000000c (INIT) 0x4003e0
0x000000000000000d (FINI) 0x4005c4
0x0000000000000019 (INIT_ARRAY) 0x600dc0
0x000000000000001b (INIT_ARRAYSZ) 8 (bytes)
0x000000000000001a (FINI_ARRAY) 0x600dc8
0x000000000000001c (FINI_ARRAYSZ) 8 (bytes)
0x000000006ffffef5 (GNU_HASH) 0x400298
0x0000000000000005 (STRTAB) 0x400318
0x0000000000000006 (SYMTAB) 0x4002b8
0x000000000000000a (STRSZ) 61 (bytes)
0x000000000000000b (SYMENT) 24 (bytes)
0x0000000000000015 (DEBUG) 0x0
0x0000000000000003 (PLTGOT) 0x600fc8
0x0000000000000002 (PLTRELSZ) 72 (bytes)
0x0000000000000014 (PLTREL) RELA
0x0000000000000017 (JMPREL) 0x400398
0x0000000000000007 (RELA) 0x400380
0x0000000000000008 (RELASZ) 24 (bytes)
0x0000000000000009 (RELAENT) 24 (bytes)
0x0000000000000018 (BIND_NOW)
0x000000006ffffffb (FLAGS_1) Flags: NOW
0x000000006ffffffe (VERNEED) 0x400360
0x000000006fffffff (VERNEEDNUM) 1
0x000000006ffffff0 (VERSYM) 0x400356
0x0000000000000000 (NULL) 0x0
$
$ LC_ALL=C readelf -d hello
Dynamic section at offset 0xe28 contains 24 entries:
Tag Type Name/Value
0x0000000000000001 (NEEDED) Shared library: [libc.so.6]
0x000000000000000c (INIT) 0x4003e0
0x000000000000000d (FINI) 0x4005c4
0x0000000000000019 (INIT_ARRAY) 0x600e10
0x000000000000001b (INIT_ARRAYSZ) 8 (bytes)
0x000000000000001a (FINI_ARRAY) 0x600e18
0x000000000000001c (FINI_ARRAYSZ) 8 (bytes)
0x000000006ffffef5 (GNU_HASH) 0x400298
0x0000000000000005 (STRTAB) 0x400318
0x0000000000000006 (SYMTAB) 0x4002b8
0x000000000000000a (STRSZ) 61 (bytes)
0x000000000000000b (SYMENT) 24 (bytes)
0x0000000000000015 (DEBUG) 0x0
0x0000000000000003 (PLTGOT) 0x601000
0x0000000000000002 (PLTRELSZ) 72 (bytes)
0x0000000000000014 (PLTREL) RELA
0x0000000000000017 (JMPREL) 0x400398
0x0000000000000007 (RELA) 0x400380
0x0000000000000008 (RELASZ) 24 (bytes)
0x0000000000000009 (RELAENT) 24 (bytes)
0x000000006ffffffe (VERNEED) 0x400360
0x000000006fffffff (VERNEEDNUM) 1
0x000000006ffffff0 (VERSYM) 0x400356
0x0000000000000000 (NULL) 0x0
$
Both of the options in the above title make Spindle invoke the application executable from global storage rather than local storage. The execv in the bootstrapper can fail when it is given a relative path to a global executable. This doesn't happen for local executables, since we construct that path ourselves and make sure it is absolute.
The execv call in spindle_bootstrap should therefore be changed to execvp.
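For reference, execvp differs from execv only for names that contain no slash: such a name is looked up in each PATH directory instead of being opened relative to the current directory. A minimal sketch of that lookup rule follows. `resolve_like_execvp` is a hypothetical helper, and the fallback PATH of /bin:/usr/bin is an assumption based on glibc's default.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Mimic execvp's lookup. Names containing '/' are used as-is (relative to the
 * current directory). Bare names are tried in each PATH directory; an empty
 * PATH element means the current directory. Returns 0 and writes the chosen
 * path to out, or -1 if no executable candidate is found. */
int resolve_like_execvp(const char *name, const char *path_env,
                        char *out, size_t outlen)
{
    assert(name != NULL && out != NULL && outlen > 0);
    if (strchr(name, '/')) {
        int n = snprintf(out, outlen, "%s", name);
        return (n >= 0 && (size_t)n < outlen) ? 0 : -1;
    }
    const char *p = path_env ? path_env : "/bin:/usr/bin";  /* assumed default */
    for (;;) {
        const char *colon = strchr(p, ':');
        size_t dirlen = colon ? (size_t)(colon - p) : strlen(p);
        int n = dirlen ? snprintf(out, outlen, "%.*s/%s", (int)dirlen, p, name)
                       : snprintf(out, outlen, "./%s", name);
        if (n > 0 && (size_t)n < outlen && access(out, X_OK) == 0)
            return 0;
        if (!colon)
            break;
        p = colon + 1;
    }
    return -1;
}
```

Note that a name such as ./hello or bin/hello is still resolved against the current directory by both execv and execvp; the switch only changes the behavior for bare names.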
The currently available Spack package for Spindle pulls release 0.8.1, which fails to compile spindle_logd.cc in the logging directory because of a narrowing error:
133 CXX spindle_logd.o
>> 134 spindle_logd.cc:65:76: error: narrowing conversion of '255' from 'int' to 'char' inside { } [-Wnarrowing]
135 static char exitcode[8] = { 0x01, 0xff, 0x03, 0xdf, 0x05, 0xbf, 0x07, '\n' };
136 ^
>> 137 spindle_logd.cc:65:76: error: narrowing conversion of '223' from 'int' to 'char' inside { } [-Wnarrowing]
>> 138 spindle_logd.cc:65:76: error: narrowing conversion of '191' from 'int' to 'char' inside { } [-Wnarrowing]
139 CCLD libspindlelogc.la
140 make[2]: *** [Makefile:386: spindle_logd.o] Error 1
141 make[2]: Leaving directory '/tmp/asill/spack-stage/spack-stage-spindle-0.8.1-u6g66hhvbkxfa7n32x2gzferzpurspf3/spack-src/logging'
142 make[1]: *** [Makefile:319: all-recursive] Error 1
143 make[1]: Leaving directory '/tmp/asill/spack-stage/spack-stage-spindle-0.8.1-u6g66hhvbkxfa7n32x2gzferzpurspf3/spack-src'
144 make: *** [Makefile:248: all] Error 2
I can work around this with ./bin/spack install spindle cxxflags="-Wno-narrowing",
but the Spack package should probably be updated, and the code fixed in the older tarball for manual installations.
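A minimal sketch of one possible source-level fix follows, assuming the bytes only need to keep their bit patterns. Casting the out-of-range values explicitly keeps the array's char type while making the initializer no longer a narrowing conversion under C++11 and later; the line is equally valid C. `is_exitcode` is a hypothetical helper that shows the array being used as raw bytes.

```c
#include <assert.h>
#include <string.h>

/* 0xff, 0xdf and 0xbf do not fit in a signed char. The explicit (char) casts
 * make the conversion deliberate, so C++ brace initialization accepts them. */
static char exitcode[8] = { 0x01, (char)0xff, 0x03, (char)0xdf,
                            0x05, (char)0xbf, 0x07, '\n' };

/* Hypothetical helper: does buf start with the 8-byte exit sequence? */
int is_exitcode(const char *buf)
{
    assert(buf != NULL);
    return memcmp(buf, exitcode, sizeof exitcode) == 0;
}
```

Declaring the array as unsigned char would be an alternative, if the code that consumes it accepts that type.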
When I tried Spindle, I found that $ORIGIN in the RPATH of a nested dependency is not handled correctly, and the process cannot load some libraries.
Example: when a Python script in my environment imports matplotlib,
/path/to/lib/python2.7/site-packages/numpy/core/multiarray.so
is staged by Spindle as
/tmp/spindle.PIDNUM/b0-_path_to_lib_python2.7_site-packages_numpy_core_multiarray.so
and it needs
$ORIGIN/../.libs/tls/x86_64/libopenblasp.so
In this case, $ORIGIN/../.libs/tls/x86_64/libopenblasp.so should expand to /path/to/lib/python2.7/site-packages/numpy/core/../.libs/libopenblasp.so.
However, Spindle expands it to /tmp/spindle.PIDNUM/../.libs/tls/x86_64/libopenblasp.so.
That is, Spindle expands $ORIGIN to /tmp/spindle.PIDNUM/ instead of /path/to/lib/python2.7/site-packages/numpy/core/.
As a result, the process cannot load multiarray.so.
This issue may be similar to #17, but the current Spindle runs with --debug=yes by default.
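The dynamic loader substitutes $ORIGIN with the directory of the object that carries the RPATH entry. For multiarray.so, that should be its original location on the global file system. A minimal sketch of the substitution follows. `expand_origin` is a hypothetical helper, not Spindle code, and it handles only $ORIGIN and ${ORIGIN} (not $LIB or $PLATFORM).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Length of a $ORIGIN or ${ORIGIN} token at p, or 0 if there is none.
 * "$ORIGINAL" does not match: the name must end at a non-identifier char. */
static size_t origin_token_len(const char *p)
{
    if (strncmp(p, "${ORIGIN}", 9) == 0)
        return 9;
    if (strncmp(p, "$ORIGIN", 7) == 0 &&
        !(isalnum((unsigned char)p[7]) || p[7] == '_'))
        return 7;
    return 0;
}

/* Replace every $ORIGIN in an RPATH entry with the directory part of
 * object_path. Returns 0 on success, -1 if the result does not fit in out. */
int expand_origin(const char *entry, const char *object_path,
                  char *out, size_t outlen)
{
    assert(entry != NULL && object_path != NULL && out != NULL);
    if (outlen == 0)
        return -1;
    const char *slash = strrchr(object_path, '/');
    const char *dir = slash ? object_path : ".";
    size_t dirlen = !slash ? 1
                  : (slash == object_path ? 1 : (size_t)(slash - object_path));
    size_t used = 0;

    while (*entry) {
        size_t tok = origin_token_len(entry);
        const char *src = tok ? dir : entry;
        size_t len = tok ? dirlen : 1;
        if (used + len >= outlen)
            return -1;                  /* keep room for the terminating NUL */
        memcpy(out + used, src, len);
        used += len;
        entry += tok ? tok : 1;
    }
    out[used] = '\0';
    return 0;
}
```

Given the original path /path/to/lib/python2.7/site-packages/numpy/core/multiarray.so, the entry $ORIGIN/../.libs expands to /path/to/lib/python2.7/site-packages/numpy/core/../.libs. Given the staged /tmp/spindle.PIDNUM/b0-... path instead, it expands to /tmp/spindle.PIDNUM/../.libs, which matches the incorrect expansion described above.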
A plain Spack installation, or a manual installation without an active MPI variant defined, fails with a missing mpi.h error in the testsuite:
352 make[3]: Entering directory '/tmp/asill/spack-stage/spack-stage-spindle-0.8.1-av65uymhbjk5xlot4r7o7zrdplcrathu/spack-src/testsuite'
353 CC test_driver-test_driver.o
354 CC test_driver_libs-test_driver.o
>> 355 test_driver.c:17:10: fatal error: mpi.h: No such file or directory
356 #include <mpi.h>
357 ^~~~~~~
358 compilation terminated.
>> 359 test_driver.c:17:10: fatal error: mpi.h: No such file or directory
360 #include <mpi.h>
361 ^~~~~~~
362 compilation terminated.
363 make[3]: *** [Makefile:340: test_driver-test_driver.o] Error 1
364 make[3]: *** Waiting for unfinished jobs....
365 make[3]: *** [Makefile:356: test_driver_libs-test_driver.o] Error 1
Does Spindle require a specific MPI package to be set up to provide mpi.h? If so, is a separate Spindle instance required for each MPI variant we want to use? We have many MPI variants in use, so the latter would definitely be a hassle, but I suspect I am missing something obvious here.
./configure --enable-sec-none --with-hostbin=/scratch/pmpi/dsolt/WORKSPACE/spindle/myscript.sh
make
make[4]: Entering directory `/scratch/pmpi/dsolt/WORKSPACE/spindle/Spindle/src/client/beboot'
CC spindle_bootstrap-spindle_bootstrap.o
CC spindle_bootstrap-parseloc.o
CC spindle_bootstrap-spindle_mkdir.o
make[4]: *** No rule to make target `../auditclient/exec_util.c', needed by `spindle_bootstrap-exec_util.o'. Stop.
make[4]: Leaving directory `/scratch/pmpi/dsolt/WORKSPACE/spindle/Spindle/src/client/beboot'
CentOS 7.6 on Intel Westmere
After building and installing LaunchMON, I cloned Spindle from its current GitHub location and tried to build it from source. The "make" step fails with the following problem, which appears to come from finding and using libfuncdict.so in the testsuite area:
Making all in testsuite
make[2]: Entering directory `/root/spindle-test/Spindle/testsuite'
CCLD libfuncdict.so
/bin/sh: not-found: command not found
make[2]: *** [libfuncdict.so] Error 127
make[2]: Leaving directory `/root/spindle-test/Spindle/testsuite'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/root/spindle-test/Spindle'
make: *** [all] Error 2
Separately, when installing via Spack, the build seems to hang. Looking for recently modified files, I see the following:
# find /tmp/root/spack-stage/spack-stage-spindle-0.8.1-udgnp63afjbx3uj5nz2k5qo7237kx442/ -mmin -10
/tmp/root/spack-stage/spack-stage-spindle-0.8.1-udgnp63afjbx3uj5nz2k5qo7237kx442/spack-src/testsuite
/tmp/root/spack-stage/spack-stage-spindle-0.8.1-udgnp63afjbx3uj5nz2k5qo7237kx442/spack-src/testsuite/libtest4000.so
Checking the shared library that is open, it also appears to be missing libfuncdict.so:
# ldd /tmp/root/spack-stage/spack-stage-spindle-0.8.1-udgnp63afjbx3uj5nz2k5qo7237kx442/spack-src/testsuite/libtest4000.so
linux-vdso.so.1 => (0x00007ffd0472d000)
libfuncdict.so => not found
libc.so.6 => /lib64/libc.so.6 (0x00007f993a6e5000)
/lib64/ld-linux-x86-64.so.2 (0x00007f993afb3000)
I have a question about using Spindle with the Open MPI launcher.
Is there any way to use Spindle with the Open MPI launcher without MPIR?
For example, can Spindle run with PMIx instead of MPIR?
If there is currently no way, are there any plans to support PMIx?
I'm trying to install Spindle, and make is failing with:
/bin/sh: not-found: command not found
make[2]: *** [libfuncdict.so] Error 127
make[2]: Leaving directory `/spindle/testsuite'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/spindle'
make: *** [all] Error 2
Is this something provided by an MPI library? I hadn't installed one in the container yet. (Is MPI required for Spindle, or does Spindle also help with other kinds of loads outside of MPI?)
A new page on the LLNL software portal (https://software.llnl.gov/radiuss/) will soon dynamically pull in RADIUSS repos. To achieve this, we need RADIUSS-related repos outside the LLNL org to be tagged with relevant topics. See LLNL/llnl.github.io#17, LLNL/llnl.github.io#151, & https://github.com/LLNL/llnl.github.io/blob/new-home-page/radiuss/README.md for additional context.
For Spindle, please add the topics performance and radiuss.
Also, you may want to update the computation.llnl.gov link in the README to https://computation.llnl.gov/projects/spindle/.
This feature is currently available by running with --debug=yes. It also leaves the first page of the global file mapped while the remaining pages are mapped to the local file. During testing, determine whether the entire local file can be used.
Also, determine whether core dumping will work as expected with the text and data remapped in this way.
I'm getting errors, both when running the tests and when trying to use Spindle, saying that Spindle cannot connect to a session. I'm installing as follows:
./configure --with-munge-dir=/etc/munge --enable-sec-munge --with-slurm-dir=/etc/slurm --with-testrm=slurm
make
make install
I've tried this with both slurm and openmpi as the "testrm". Then I build and run the tests:
cd testsuite
make
./runTests
but no matter what I do (using either the slurm or the openmpi template, both of which I have), I see this error:
Running: ./run_driver --partial --session
ERROR: Spindle could not connect to session tn2VYQ
I saw the same error when simply trying to use Spindle, so I've gone back to the tests to debug. Note that I do have a /tmp area:
ls /tmp/
ccFjQGLR.s ks-script-eC059Y spin.kT6PPu spin.tn2VYQ spin.Un7RTL yum.log
Update: I think the problem may be that the processes need to see the same /tmp area, so I'm rebuilding the containers with a shared /tmp area and will report back.
This is discussed some in pull request #9.
I discovered yesterday that when I used prefix=/foo/bar/linux-rhel7 for configure, the C preprocessor was expanding that to "/foo/bar/1-rhel7". This is because "linux" was itself a macro which expanded to "1" in the code when $SRC/src/logging/spindle_logc.c was pre-processed.
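The mechanism is the usual two-level stringification idiom: the argument is macro-expanded before # turns it into a string, so any token in the prefix that happens to be a macro gets replaced. A minimal sketch follows. The macro names SPINDLE_PREFIX and SPINDLE_PREFIX_STR are hypothetical stand-ins for however the configure prefix reaches spindle_logc.c.

```c
#include <assert.h>
#include <string.h>

#define STR2(x) #x
#define STR(x) STR2(x)      /* expands x first, then stringifies the result */

/* Hypothetical: prefix passed unquoted, e.g. -DSPINDLE_PREFIX=/foo/bar/linux-rhel7 */
#ifndef SPINDLE_PREFIX
#define SPINDLE_PREFIX /foo/bar/linux-rhel7
#endif

/* Fragile: in GNU C modes `linux` is a predefined macro expanding to 1, so the
 * identifier "linux" inside the prefix is replaced before stringification. */
const char *fragile_prefix(void) { return STR(SPINDLE_PREFIX); }

/* Robust: pass the prefix as a string literal, e.g.
 * -DSPINDLE_PREFIX_STR='"/foo/bar/linux-rhel7"', so nothing is macro-expanded. */
#ifndef SPINDLE_PREFIX_STR
#define SPINDLE_PREFIX_STR "/foo/bar/linux-rhel7"
#endif
const char *robust_prefix(void) { return SPINDLE_PREFIX_STR; }
```

Compiling in a strict ISO mode (for example -std=c99 instead of -std=gnu99) or adding -Ulinux also avoids the problem, because only the reserved __linux__ spelling stays predefined.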
It has been noticed that Python sometimes performs fstat() on local .py files while performing stat() on global .pyc files. This may yield unexpected results when it compares the modification times of the .py and .pyc files.
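One possible cause is that the staged local .py copy receives a fresh modification time when it is written. That is an assumption, since the note above does not say where the mismatch comes from; if it holds, the .py would look newer than the global .pyc. A possible mitigation is to copy the original file's mtime onto the staged copy. A minimal sketch follows, using the POSIX futimens call. `copy_mtime` is a hypothetical helper, not something Spindle is documented to do.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

/* Give the staged copy (dst_fd) the same modification time as the original
 * (src_fd), leaving dst's access time unchanged.
 * Returns 0 on success, -1 on error (errno set). */
int copy_mtime(int src_fd, int dst_fd)
{
    assert(src_fd >= 0 && dst_fd >= 0);
    struct stat st;
    if (fstat(src_fd, &st) != 0)
        return -1;
    struct timespec times[2];
    times[0].tv_sec = 0;
    times[0].tv_nsec = UTIME_OMIT;  /* access time: do not touch */
    times[1] = st.st_mtim;          /* modification time: copy from the original */
    return futimens(dst_fd, times);
}
```

With matching mtimes, fstat() on the local copy and stat() on the global file would report the same value, so the .py/.pyc comparison no longer depends on which file was looked at.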
The Spack package needs to be updated to v0.13. Adding [email protected] to a Spack environment and running spack install gives the warning:
==> Warning: There is no checksum on file to fetch [email protected] safely.
The license files in Spindle have the incorrect street address for the Free Software Foundation:
59 Temple Place, Suite 330, Boston, MA 02111-1307 USA.
This should be updated to the new address:
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
https://fedoraproject.org/wiki/Common_Rpmlint_issues#incorrect-fsf-address
Missing support for ppc64le (Power8 and later) processors. Currently, the ppc64 and x86_64 architectures are supported.
Are there any plans to add support for this architecture in the future?
/scratch/pmpi/dsolt/WORKSPACE/spindle/Spindle/src/fe/hostbin/launch_hostbin.cc:297: undefined reference to `pthread_join'
../hostbin/.libs/libhostbin.a(libhostbin_la-launch_hostbin.o): In function `IOThread::~IOThread()':
/scratch/pmpi/dsolt/WORKSPACE/spindle/Spindle/src/fe/hostbin/launch_hostbin.cc:297: undefined reference to `pthread_join'
collect2: error: ld returned 1 exit status
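This undefined reference most likely means the final link does not pull in the POSIX threads library. On glibc releases before 2.34, pthread_join lived in a separate libpthread, which had to be added with -pthread or -lpthread. Because libhostbin.a is a static archive, the -lpthread would also have to come after it on the link line. A minimal C reproduction of the create/join pattern that the IOThread destructor depends on follows; `io_worker` and `run_and_join` are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for an I/O thread body. */
static void *io_worker(void *arg)
{
    int *counter = arg;
    *counter += 1;
    return NULL;
}

/* Start a thread and wait for it, the same create/join pair the destructor
 * needs. Returns 0 on success, non-zero on failure. */
int run_and_join(int *counter)
{
    assert(counter != NULL);
    pthread_t tid;
    if (pthread_create(&tid, NULL, io_worker, counter) != 0)
        return -1;
    return pthread_join(tid, NULL);
}
```

Building this with gcc -pthread links on any glibc version; on glibc 2.34 and later the thread functions are part of libc itself, so the error would not appear there even without the flag.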