parres / isx
Scalable Integer Sort application for co-design in the exascale era
License: Other
In the NPB Integer Sort, keys are not only placed within the range of each PE's bucket boundaries, they are also in sorted order within the bucket. If ISx is inspired by the NPB Integer Sort, why does it only send the keys to the respective PE's bucket and NOT actually sort the keys within the bucket? Or is ISx implemented only to study the all-to-all communication pattern of the bucket sort algorithm?
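For reference, the in-bucket ordering the question refers to could be produced with a counting sort once a PE holds all keys in its range. The sketch below is illustrative only: `count_sort_bucket` and its parameters are hypothetical names, and it is not taken from ISx or NPB.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper (not from ISx or NPB): sort `n` keys in place,
 * assuming every key lies in the half-open range [lo, hi).
 * Returns 0 on success, -1 on a bad range, an out-of-range key,
 * or an allocation failure. */
static int count_sort_bucket(int *keys, size_t n, int lo, int hi)
{
    if (hi <= lo) return -1;

    /* Compute the range width in a wider type to avoid int overflow. */
    size_t range = (size_t)((long long)hi - (long long)lo);
    size_t *counts = calloc(range, sizeof *counts);
    if (counts == NULL) return -1;

    /* Pass 1: count how often each key value occurs. */
    for (size_t i = 0; i < n; i++) {
        if (keys[i] < lo || keys[i] >= hi) {
            free(counts);
            return -1;   /* keys[] is still unmodified at this point */
        }
        counts[(size_t)((long long)keys[i] - lo)]++;
    }

    /* Pass 2: rewrite the keys in ascending order. */
    size_t out = 0;
    for (size_t k = 0; k < range; k++)
        for (size_t c = 0; c < counts[k]; c++)
            keys[out++] = (int)((long long)lo + (long long)k);

    free(counts);
    return 0;
}
```

The cost is proportional to the number of keys plus the width of the bucket's key range, so it only pays off when that range is comparable to the number of keys held.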
I am having problems trying to build the SHMEM port of the ISx benchmark on our Cray XC40 using PrgEnv-gnu
(GCC 6.3.0). The error I get is:
$ make
cc -Wall -Wextra -std=c99 -D SCALING_OPTION=1 -c isx.c -o obj/isx.o_s
cc -Wall -Wextra -std=c99 -D SCALING_OPTION=1 -c pcg_basic.c -o obj/pcg_basic.o_s
cc -Wall -Wextra -std=c99 -D SCALING_OPTION=1 -c timer.c -o obj/timer.o_s
cc obj/isx.o_s obj/pcg_basic.o_s obj/timer.o_s -o bin/isx.strong -lrt -lm
obj/isx.o_s: In function `verify_results':
isx.c:(.text+0xd34): relocation truncated to fit: R_X86_64_32S against symbol `llWrk' defined in COMMON section in obj/isx.o_s
obj/isx.o_s: In function `log_times':
isx.c:(.text+0xe00): relocation truncated to fit: R_X86_64_32S against symbol `timers' defined in COMMON section in obj/timer.o_s
isx.c:(.text+0xe23): relocation truncated to fit: R_X86_64_32S against symbol `timers' defined in COMMON section in obj/timer.o_s
isx.c:(.text+0xe3e): relocation truncated to fit: R_X86_64_32S against symbol `timers' defined in COMMON section in obj/timer.o_s
isx.c:(.text+0xe61): relocation truncated to fit: R_X86_64_32S against symbol `timers' defined in COMMON section in obj/timer.o_s
obj/isx.o_s: In function `print_timer_names':
isx.c:(.text+0x10b2): relocation truncated to fit: R_X86_64_32S against symbol `timers' defined in COMMON section in obj/timer.o_s
isx.c:(.text+0x10f2): relocation truncated to fit: R_X86_64_32S against symbol `timers' defined in COMMON section in obj/timer.o_s
obj/isx.o_s: In function `print_timer_values':
isx.c:(.text+0x125b): relocation truncated to fit: R_X86_64_32S against symbol `timers' defined in COMMON section in obj/timer.o_s
isx.c:(.text+0x127d): relocation truncated to fit: R_X86_64_32S against symbol `timers' defined in COMMON section in obj/timer.o_s
isx.c:(.text+0x12c3): relocation truncated to fit: R_X86_64_32S against symbol `timers' defined in COMMON section in obj/timer.o_s
isx.c:(.text+0x12e5): additional relocation overflows omitted from the output
collect2: error: ld returned 1 exit status
make: *** [isx.strong] Error 1
I assume this is due to the large static array my_bucket_keys defined in isx.c, so I tried to compile with -mcmodel=large, which only shifts the problem to some PMI function, so the build still fails.
The MPI version of ISx builds just fine. Is this a known problem? Could it be a system configuration issue? Any help is much appreciated.
Building with OpenMPI on GCC:
timer.c:100:24: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
mpicc obj/pcg_basic.o_s obj/isx.o_s obj/timer.o_s -o bin/isx.strong -lrt
/usr/lib64/gcc/x86_64-suse-linux/4.6/../../../../x86_64-suse-linux/bin/ld: obj/isx.o_s: undefined reference to symbol 'ceil@@GLIBC_2.2.5'
/usr/lib64/gcc/x86_64-suse-linux/4.6/../../../../x86_64-suse-linux/bin/ld: note: 'ceil@@GLIBC_2.2.5' is defined in DSO /lib64/libm.so.6 so try adding it to the linker command line
/lib64/libm.so.6: could not read symbols: Invalid operation
The link line in the Makefile needs -lm added.
In the SHMEM version:
// Verify the final number of keys equals the initial number of keys
static long long int total_num_keys = 0;
shmem_longlong_sum_to_all(&total_num_keys, &my_bucket_size, 1, 0, 0, NUM_PES, llWrk, pSync);
shmem_barrier_all();
my_bucket_size should be a symmetric object (the spec requires both the source and the dest of a collective to be symmetric).
The "atoi" function is used to read the command-line value for "TOTAL_KEYS" or "NUM_KEYS_PER_PE", and the result is then cast to "uint64_t". The problem is that "atoi" returns an int, so it cannot represent values of 2^31 or more, and the value that reaches the cast is already wrong. "strtoull" can be used instead.
Also, please comment on whether the application can scale beyond 2^31 keys per PE in a weak-scaling experiment, assuming sufficient memory is available.
In particular, for the SHMEM version, where symmetric heap memory is limited to 2^28 elements: will increasing the symmetric heap size remove the scaling limitation, and is there any limit on how much memory can be allocated on the symmetric heap?
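A minimal sketch of the "strtoull" approach suggested above. The helper name `parse_u64` and its return convention are assumptions for illustration and do not appear in ISx.

```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper (not part of ISx): parse a non-negative decimal
 * integer from `text` into *out. Returns 0 on success, -1 on error. */
static int parse_u64(const char *text, uint64_t *out)
{
    if (text == NULL || out == NULL) return -1;

    /* strtoull accepts a leading '-' and wraps the value around,
     * so skip whitespace and reject a minus sign explicitly. */
    const char *p = text;
    while (isspace((unsigned char)*p)) p++;
    if (*p == '-' || *p == '\0') return -1;

    char *end = NULL;
    errno = 0;
    unsigned long long value = strtoull(p, &end, 10);

    if (errno == ERANGE) return -1;           /* too large for unsigned long long */
    if (end == p || *end != '\0') return -1;  /* no digits, or trailing characters */
    if (value > UINT64_MAX) return -1;        /* only matters if unsigned long long is wider than 64 bits */

    *out = (uint64_t)value;
    return 0;
}
```

With a helper like this, a call such as parse_u64(argv[1], &total_keys) could replace the atoi call and the cast, and a return value of -1 lets the program reject bad input instead of silently continuing with a wrong key count. Here total_keys stands in for whichever variable holds the parsed count.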
MPI/isx.c contains two instances of MPI_Barrier before MPI_Allgather. MPI_Allgather has barrier semantics for finite count due to data dependencies and does not need this unless the MPI implementation sucks, in which case it should be a runtime option to barrier in such cases (Cray MPI already supports this via an environment variable, by the way).
In both MPI and SHMEM subdirectories:
make
cc -Wall -Wextra -std=c99 -D SCALING_OPTION=1 -c pcg_basic.c -o obj/pcg_basic.o_s
Assembler messages:
Fatal error: can't create obj/pcg_basic.o_s: No such file or directory
make: *** [obj/pcg_basic.o_s] Error 1
The "obj" subdirectory is not created by make. Also "bin" needs to be created during link.
The SHMEM version of ISx uses several deprecated routines (the atomics, sum_to_all, fcollect, etc.), which produces a bunch of warnings. No big deal, just tracking it here.
Suggestion: allow people to override OPTFLAGS etc. in Makefile through optional local make.def (or better name, perhaps) file, so that a pristine Makefile can always be pulled/merged.
I think normal (Gaussian) distributions are what the natural and social sciences use to represent real-valued random variables. In the original version, the number of keys on each rank is kept roughly equal:
/* Determine Redistibution of keys: accumulate the bucket size totals
till this number surpasses NUM_KEYS (which the average number of keys
per processor). Then all keys in these buckets go to processor 0.
Continue accumulating again until supassing 2*NUM_KEYS. All keys
in these buckets go to processor 1, etc.
......
*/
If "NUM_BUCKETS = NUM_PES", I think the amount of computation will be unbalanced across PEs. Why was it done this way? Could you describe the reasoning?
I'm curious and would like to learn from you. Thanks very much.
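The redistribution rule in the quoted comment can be sketched as follows. This is a reading of the comment only, not the original code: `assign_buckets`, `owner`, and the other names are hypothetical, and "surpasses" is interpreted as strictly greater than.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch of the rule in the quoted comment.
 * Buckets are visited in order while a running total of their sizes is
 * kept. Buckets go to processor 0 until the total surpasses avg_keys,
 * then to processor 1 until it surpasses 2*avg_keys, and so on.
 * owner[i] receives the processor assigned to bucket i. */
static void assign_buckets(const uint64_t *bucket_sizes, size_t num_buckets,
                           uint64_t avg_keys, int num_procs, int *owner)
{
    uint64_t running_total = 0;
    int proc = 0;

    for (size_t i = 0; i < num_buckets; i++) {
        owner[i] = proc;                 /* the surpassing bucket stays on proc */
        running_total += bucket_sizes[i];

        /* Advance by at most one processor per bucket; the last
         * processor takes whatever remains. */
        if (proc < num_procs - 1 &&
            running_total > (uint64_t)(proc + 1) * avg_keys) {
            proc++;
        }
    }
}
```

With many small buckets per processor, the boundaries can follow the actual key distribution, so a skewed input still gives each processor roughly avg_keys keys. With one bucket per PE there is nothing to regroup, which is the imbalance the question is asking about.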