parres / kernels

404.0 38.0 106.0 14.41 MB

This is a set of simple programs that can be used to explore the features of a parallel platform.

Home Page: https://groups.google.com/forum/#!forum/parallel-research-kernels

License: Other

C 44.36% C++ 28.35% Makefile 3.01% Shell 1.56% Fortran 12.08% Python 4.53% Julia 0.70% MATLAB 0.54% Rust 1.75% Cuda 1.89% Java 0.26% Go 0.26% Lua 0.07% C# 0.17% Ruby 0.07% Scala 0.24% Ada 0.15%
parallel-programming c c-plus-plus mpi fortran2008 python3 julia pgas openmp shmem

kernels's People

Contributors

atlantapepsi, caizixian, crtrott, dalcinl, elliottslaughter, gentryx, hattom, homerdin, illuhad, jbrodman, jcownie-intel, jeffamstutz, jeffhammond, kazutomo, magnatelee, mhoemmen, miguelraz, mrogowski, nelsonje, philmiller-charmworks, rbuch, reble, rfvander, rodburns, s-sajid-ali, tisaac, zbeekman


kernels's Issues

do not use Travis compiler feature; just use env var

See travis-ci/travis-ci#6273 for details. Because CC=icc is impossible with Travis today, and because Python and Julia aren't using any C/C++ compilers, it makes sense to roll compiler choice into regular environment variables. This will eliminate all of our compiler-oriented exclusions (e.g. don't run GCC UPC with Clang; only run Python and Julia when CC=clang) and thus simplify the Travis infrastructure.

See https://github.com/boostorg/hana/blob/master/.travis.yml for useful example.

This will also play nicer with compiler versions, since we can just set e.g. CC=gcc-5 directly rather than detect CC=gcc and then overwrite it or change PATH.
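The hana .travis.yml linked above illustrates the pattern. A minimal sketch of what our matrix could look like (the PRK_TARGET values and package/source names here are illustrative, not the project's actual configuration):

```yaml
# Compiler chosen via env vars instead of Travis's compiler: key,
# so versioned compilers like gcc-5 can be requested directly.
matrix:
  include:
    - env: CC=gcc-5 PRK_TARGET=allserial
      addons: {apt: {packages: [gcc-5], sources: [ubuntu-toolchain-r-test]}}
    - env: CC=clang-3.8 PRK_TARGET=allserial
      addons: {apt: {packages: [clang-3.8], sources: [llvm-toolchain-precise-3.8]}}
```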

support more extensible command-line argument parsing

The current method of providing arguments is brittle, because arguments are parsed in order and one cannot make argument N-1 optional if argument N is required (see MPI-3 RMA transpose for an example where there are multiple sets of optional arguments).

Here are some potential solutions:

This change would also mitigate my concern in #17.

Support mpich.org failover for Hydra

This is low priority...

mpich.org went down this morning and it broke a number of our builds. I worked around this using my Github fork, but only for MPICH, not Hydra. This means that Sandia OpenSHMEM, which depends on Hydra, is still vulnerable. We can fix this, but it is ever so slightly more involved because Hydra isn't hosted as a standalone Git repo that I can mirror. We will need to download MPICH from the mirror and just build Hydra.

pic.c isn't strict C99

pic.c assumes that M_PI is provided by math.h, which is only true when compiling with -std=gnu99, but not -std=c99.

The solution is trivial. I will push a fix as soon as I test it.

pic.c(174): error: identifier "M_PI" is undefined
    double      step = M_PI / (L-1);
                       ^
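The fix can be as small as a guarded fallback definition, since M_PI is a POSIX extension rather than part of C99's math.h (the grid_step helper below is illustrative, standing in for the failing line in pic.c):

```c
#include <math.h>   /* may not define M_PI under -std=c99 */

/* M_PI is not required by C99, so supply it when the header does not. */
#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

/* Stand-in for the failing line at pic.c:174. */
static double grid_step(int L)
{
    return M_PI / (L - 1);
}
```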

test with Open-MPI in Travis

That the bug fixed by 75f9b59 went undetected for 9 months indicates that testing only with MPICH is not optimal: all MPI handles are typedef'd to int there, so compilers cannot catch stupid errors like transposed arguments to MPI_Abort.

Coarray fortran stencil code fails for certain numbers of images

@afanfa
Even though I request only 5 images, the stencil kernel thinks there are 6.

[rfvander@esgmonster Stencil]$ ./stencil-coarray 3 50 
Parallel Research Kernels version 2.16
CAF stencil execution on 2D grid
Number of images     =        5
Grid size            =       50
Radius of stencil    =        2
Type of stencil      = star
Data type            = double precision
Compact representation of stencil loop body
Untiled
Number of iterations =        3
forrtl: severe (772): Image number 6 is not a valid image number; valid numbers are 1 to 5
In coarray image 5
Image              PC                Routine            Line        Source             
libicaf.so         00007FC70DC73AFA  Unknown               Unknown  Unknown
stencil-coarray    0000000000407233  Unknown               Unknown  Unknown
stencil-coarray    0000000000403F1E  Unknown               Unknown  Unknown
libc.so.6          000000308521ED5D  Unknown               Unknown  Unknown
stencil-coarray    0000000000403DA9  Unknown               Unknown  Unknown

application called MPI_Abort(comm=0x84000000, 3) - process 4
forrtl: error (69): process interrupted (SIGINT)
In coarray image 1
Image              PC                Routine            Line        Source             
stencil-coarray    000000000047DD61  Unknown               Unknown  Unknown
stencil-coarray    000000000047BBC7  Unknown               Unknown  Unknown
stencil-coarray    000000000044C594  Unknown               Unknown  Unknown
stencil-coarray    000000000044C3A6  Unknown               Unknown  Unknown
stencil-coarray    000000000042FCF4  Unknown               Unknown  Unknown
stencil-coarray    0000000000408764  Unknown               Unknown  Unknown
Unknown            0000003085E0F7E0  Unknown               Unknown  Unknown
libpthread.so.0    0000003085E0B68A  Unknown               Unknown  Unknown
libmpi.so.12       00007F46461225A5  Unknown               Unknown  Unknown
libmpi.so.12       00007F4645F8158A  Unknown               Unknown  Unknown
libmpi.so.12       00007F464608CC29  Unknown               Unknown  Unknown
libmpi.so.12       00007F464608D26A  Unknown               Unknown  Unknown
libmpi.so.12       00007F4645F65EEF  Unknown               Unknown  Unknown
libmpi.so.12       00007F4645F65875  Unknown               Unknown  Unknown
libmpi.so.12       00007F4645F656CC  Unknown               Unknown  Unknown
libmpi.so.12       00007F4645F655FC  Unknown               Unknown  Unknown
libmpi.so.12       00007F4645F67482  Unknown               Unknown  Unknown
libicaf.so         00007F4645BF839E  Unknown               Unknown  Unknown
stencil-coarray    00000000004057D4  Unknown               Unknown  Unknown
stencil-coarray    0000000000403F1E  Unknown               Unknown  Unknown
libc.so.6          000000308521ED5D  Unknown               Unknown  Unknown
stencil-coarray    0000000000403DA9  Unknown               Unknown  Unknown

application called MPI_Abort(comm=0x84000000, 3) - process 0
forrtl: error (69): process interrupted (SIGINT)
In coarray image 2
Image              PC                Routine            Line        Source             
stencil-coarray    000000000047DD61  Unknown               Unknown  Unknown
stencil-coarray    000000000047BBC7  Unknown               Unknown  Unknown
stencil-coarray    000000000044C594  Unknown               Unknown  Unknown
stencil-coarray    000000000044C3A6  Unknown               Unknown  Unknown
stencil-coarray    000000000042FCF4  Unknown               Unknown  Unknown
stencil-coarray    0000000000408764  Unknown               Unknown  Unknown
Unknown            0000003085E0F7E0  Unknown               Unknown  Unknown
libpthread.so.0    0000003085E0B68A  Unknown               Unknown  Unknown
libmpi.so.12       00007F528AA355A5  Unknown               Unknown  Unknown
libmpi.so.12       00007F528A89458A  Unknown               Unknown  Unknown
libmpi.so.12       00007F528A99FC29  Unknown               Unknown  Unknown
libmpi.so.12       00007F528A99F922  Unknown               Unknown  Unknown
libmpi.so.12       00007F528A878E55  Unknown               Unknown  Unknown
libmpi.so.12       00007F528A878875  Unknown               Unknown  Unknown
libmpi.so.12       00007F528A8786CC  Unknown               Unknown  Unknown
libmpi.so.12       00007F528A8785FC  Unknown               Unknown  Unknown
libmpi.so.12       00007F528A87A482  Unknown               Unknown  Unknown
libicaf.so         00007F528A50B39E  Unknown               Unknown  Unknown
stencil-coarray    00000000004057D4  Unknown               Unknown  Unknown
stencil-coarray    0000000000403F1E  Unknown               Unknown  Unknown
libc.so.6          000000308521ED5D  Unknown               Unknown  Unknown
stencil-coarray    0000000000403DA9  Unknown               Unknown  Unknown

application called MPI_Abort(comm=0x84000000, 3) - process 1
forrtl: error (69): process interrupted (SIGINT)
In coarray image 3
Image              PC                Routine            Line        Source             
stencil-coarray    000000000047DD61  Unknown               Unknown  Unknown
stencil-coarray    000000000047BBC7  Unknown               Unknown  Unknown
stencil-coarray    000000000044C594  Unknown               Unknown  Unknown
stencil-coarray    000000000044C3A6  Unknown               Unknown  Unknown
stencil-coarray    000000000042FCF4  Unknown               Unknown  Unknown
stencil-coarray    0000000000408764  Unknown               Unknown  Unknown
Unknown            0000003085E0F7E0  Unknown               Unknown  Unknown
libpthread.so.0    0000003085E0B68A  Unknown               Unknown  Unknown
libmpi.so.12       00007F808DDC15A5  Unknown               Unknown  Unknown
libmpi.so.12       00007F808DC2058A  Unknown               Unknown  Unknown
libmpi.so.12       00007F808DD2BC29  Unknown               Unknown  Unknown
libmpi.so.12       00007F808DD2B922  Unknown               Unknown  Unknown
libmpi.so.12       00007F808DC04E55  Unknown               Unknown  Unknown
libmpi.so.12       00007F808DC04875  Unknown               Unknown  Unknown
libmpi.so.12       00007F808DC046CC  Unknown               Unknown  Unknown
libmpi.so.12       00007F808DC045FC  Unknown               Unknown  Unknown
libmpi.so.12       00007F808DC06482  Unknown               Unknown  Unknown
libicaf.so         00007F808D89739E  Unknown               Unknown  Unknown
stencil-coarray    00000000004057D4  Unknown               Unknown  Unknown
stencil-coarray    0000000000403F1E  Unknown               Unknown  Unknown
libc.so.6          000000308521ED5D  Unknown               Unknown  Unknown
stencil-coarray    0000000000403DA9  Unknown               Unknown  Unknown

application called MPI_Abort(comm=0x84000000, 3) - process 2
forrtl: error (69): process interrupted (SIGINT)
In coarray image 4
Image              PC                Routine            Line        Source             
stencil-coarray    000000000047DD61  Unknown               Unknown  Unknown
stencil-coarray    000000000047BBC7  Unknown               Unknown  Unknown
stencil-coarray    000000000044C594  Unknown               Unknown  Unknown
stencil-coarray    000000000044C3A6  Unknown               Unknown  Unknown
stencil-coarray    000000000042FCF4  Unknown               Unknown  Unknown
stencil-coarray    0000000000408764  Unknown               Unknown  Unknown
Unknown            0000003085E0F7E0  Unknown               Unknown  Unknown
libpthread.so.0    0000003085E0B68A  Unknown               Unknown  Unknown
libmpi.so.12       00007FA8A59775A5  Unknown               Unknown  Unknown
libmpi.so.12       00007FA8A57D658A  Unknown               Unknown  Unknown
libmpi.so.12       00007FA8A58E1C29  Unknown               Unknown  Unknown
libmpi.so.12       00007FA8A58E1922  Unknown               Unknown  Unknown
libmpi.so.12       00007FA8A57BAE55  Unknown               Unknown  Unknown
libmpi.so.12       00007FA8A57BA875  Unknown               Unknown  Unknown
libmpi.so.12       00007FA8A57BA6CC  Unknown               Unknown  Unknown
libmpi.so.12       00007FA8A57BA5FC  Unknown               Unknown  Unknown
libmpi.so.12       00007FA8A57BC482  Unknown               Unknown  Unknown
libicaf.so         00007FA8A544D39E  Unknown               Unknown  Unknown
stencil-coarray    00000000004057D4  Unknown               Unknown  Unknown
stencil-coarray    0000000000403F1E  Unknown               Unknown  Unknown
libc.so.6          000000308521ED5D  Unknown               Unknown  Unknown
stencil-coarray    0000000000403DA9  Unknown               Unknown  Unknown

application called MPI_Abort(comm=0x84000000, 3) - process 3

implement verbose mode

We should agree on the type of thing that gets reported when we enable VERBOSE mode for the kernels, and then implement that mode for all kernels. - @rfvander

implement get-driven transpose

We currently use Put in the PGAS versions of Transpose. We can use Get instead. This shouldn't make a huge difference, but it is worth having both.

RFC: support PGI compilers in Travis?

https://github.com/nemequ/pgi-travis shows how one can do it.

I have not decided if it is worthwhile to do this.

Pro

  • Validates against yet another C compiler toolchain.
  • Provides a second Fortran compiler toolchain (since Clang has none).

Con

  • I am not aware of any novel features this provides relative to GCC.
  • Lacks Fortran 2008 support, including do concurrent, norm2 and coarrays, all of which are used in the Fortran PRKs. I will summarize these in a second ticket.

add Fortran coarray PRKs

There are two issues here:

  • All PRK code is in C. This issue requires us to convert them to Fortran (non-exclusively, obviously).
  • Fortran coarray semantics match SHMEM pretty well. Those are probably the best PRK set to port.

use zero-copy send-recv in FG-MPI

FG-MPI provides zero-copy send-recv, which we can/should use within a node.

This might require some changes to memory allocation, especially in transpose, but I need to think about this more.

int MPIX_Zsend(const void **, int, MPI_Datatype, int, int, MPI_Comm);
int MPIX_Izsend(const void **, int, MPI_Datatype, int, int, MPI_Comm, MPI_Request *);
int MPIX_Zrecv(void **, int, MPI_Datatype, int, int, MPI_Comm, MPI_Status *);
int MPIX_Izrecv(void **, int, MPI_Datatype, int, int, MPI_Comm, MPI_Request *);

Particularly since I am working on ownership-passing as a feature for MPI-4 (mpi-forum/mpi-issues#32), it would be good to show the value of this using the PRK.

Legion: Default mapper failed allocation for region requirement

This machine has 64 GB. What sort of nonsensical runtime cannot do a 7200x7200 transpose with that?

[jrhammon@esgmonster Transpose]$ ./transpose 1 10 $((36*100)) 1 32
Parallel Research Kernels version 2.17
Legion matrix transpose: B = A^T
Number of threads    = 1
Matrix order         = 3600
Number of iterations = 10
Tile size            = 1
Rate (MB/s): 1516.739482 Avg time (s): 0.136714
Solution validates
[jrhammon@esgmonster Transpose]$ ./transpose 1 10 $((36*200)) 1 32
Parallel Research Kernels version 2.17
Legion matrix transpose: B = A^T
Number of threads    = 1
Matrix order         = 7200
Number of iterations = 10
Tile size            = 1
[0 - 7f8381de8700] {5}{default_mapper}: Default mapper failed allocation for region requirement 0 of task unnamed_task_3 (UID 14) in memory 1e00000000000000 for processor 1d00000000000001. This means the working set of your application is too big for the allotted capacity of the given memory under the default mapper's mapping scheme. You have three choices: ask Realm to allocate more memory, write a custom mapper to better manage working sets, or find a bigger machine. Good luck!
transpose: /opt/legion/git/runtime/mappers/default_mapper.cc:2149: void Legion::Mapping::DefaultMapper::default_report_failed_instance_creation(const Legion::Task&, unsigned int, Legion::Processor, Legion::Memory) const: Assertion `false' failed.
Aborted (core dumped)

use better solution for overflow in transpose

@rfvander In looking more at the code where we handle integer overflows in transpose, I think we did it wrong. Of course, this is my fault, since I was the one who started promoting 32b integers to 64b integers.

The overflow issue emerges only when we multiply two integers, since our square matrices will never be anywhere near 2B by 2B. In Fortran, we will never have any issue indexing with 32b integers, because we only index the dimensions independently. However, in the C code when we do A[i*order+j], we overflow.

I propose that we go back to 32b integers and handle the overflow by explicitly casting inside of multiplied expressions like A[i*order+j], which will become A[(size_t)i*(size_t)order+(size_t)j] or by using C99 VLAs.

One motivation for this is that modern processors are still better at handling 32b loop indices.
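The proposed casting fix can be sketched as a small helper (name hypothetical; assumes a 64-bit size_t): the loop indices stay 32-bit, but the index arithmetic is promoted before the multiply, so i*order+j cannot overflow for large matrix orders.

```c
#include <stddef.h>

/* Keep 32-bit loop indices but compute the flat index in size_t,
   as in A[(size_t)i*(size_t)order+(size_t)j]. */
static size_t flat_index(int i, int j, int order)
{
    return (size_t)i * (size_t)order + (size_t)j;
}
```

For order = 60000, i*order alone already exceeds INT_MAX, so the unpromoted 32-bit expression would overflow.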

implement prk_malloc

We need a way to control alignment transparently. We should replace malloc with prk_malloc and implement the latter in terms of posix_memalign, _mm_malloc, or malloc, depending on context.

We can make the alignment a runtime option via an environment variable PRK_ALIGNMENT, at least for C. For Fortran, it will have to be a compile-time option.
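A minimal sketch of what prk_malloc might look like in C, assuming posix_memalign is available (the default alignment and validation logic are assumptions, not settled design):

```c
#define _POSIX_C_SOURCE 200112L
#include <stdlib.h>

/* Sketch: alignment comes from the PRK_ALIGNMENT environment variable,
   defaulting to 64 bytes (one cache line on common hardware). */
void *prk_malloc(size_t bytes)
{
    size_t alignment = 64;
    const char *env = getenv("PRK_ALIGNMENT");
    if (env != NULL) {
        long a = atol(env);
        /* posix_memalign requires a power of two multiple of sizeof(void*) */
        if (a >= (long)sizeof(void *) && (a & (a - 1)) == 0) {
            alignment = (size_t)a;
        }
    }
    void *ptr = NULL;
    return (posix_memalign(&ptr, alignment, bytes) == 0) ? ptr : NULL;
}
```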

Fortran 2008 pretty stencil

With Fortran array notation, we should be able to do the equivalent of the Numpy stencil implementation that doesn't require explicit loops on the domain.

This is low priority, but very easy to do, given how similar Numpy and Fortran are with respect to array notation.

Legion breaks 'make clean'

When not using Legion, make clean is broken:

cd LEGION/Stencil;          /Applications/Xcode.app/Contents/Developer/usr/bin/make clean
Makefile:58: /runtime/runtime.mk: No such file or directory
make[1]: *** No rule to make target `/runtime/runtime.mk'.  Stop.

add HPX kernels

These are being implemented. Add to the repo when appropriate.

HPX = HPX-5 from IU

Avoid inconsistencies in kernels that contain code written by scripts

In Stencil, AMR, and Branch, some code snippets are written by scripts and then included in the main program. When you build the code with new command-line parameters (for example, RADIUS for the stencil kernels), the auto-generated snippets do not get regenerated, which may lead to inconsistencies. A "make clean" in that directory may look like a clean slate, but the snippets are only removed by "make veryclean". This looked like a convenience, since in Branch the script may take a long time to write that code, but it can easily lead to confusion. I will fix this.

implement OpenCL kernels

  1. Implement flat OpenCL along the lines of serial or OpenMP.
  2. Implement MPI1+OpenCL as a baseline for D+X implementations (where D=distributed model, e.g. MPI, SHMEM, etc.).

new build system

I want to write a totally awesome autotools build system some day.

In case any lurkers ask, "Have you thought about CMake?", the answer is "Yes, and every one of those thoughts is a moment I'd like to have back."

add HPX++ kernels

There has been some effort to implement these. This issue is to remind us to integrate it as appropriate.

Charm++/AMPI SMP-build runtime parameters missing

SMP builds of Charm++ and AMPI currently require that users provide runtime parameters for the numbers of worker threads and communication threads desired per node, for best performance on a variety of node architectures.

SMP mode means that the runtime has a dedicated communication thread for each process. So in addition to '+p N' which represents the total number of processing elements available for application chares to run on, you will want to add '++ppn' to specify the number of processing elements (cores/hyperthreads) per node.

Additionally you can set the mapping of worker and communication threads within a process using '+pemap' to specify the mapping of worker PEs within a node and '+commap' to specify the mapping of communication threads within a node.

A simple SMP build run command on a node with 8 cores, using 1 process with 7 worker threads and 1 dedicated communication thread would be:
./charmrun ./program +p 7 ++ppn 7

Or with explicit PE and comm. thread mapping:
./charmrun ./program +p 7 ++ppn 7 +pemap 0-6 +commap 7

I see in the travis build script that you are building Charm++/AMPI in SMP mode, but that the test scripts do not use these options. Running in SMP mode without passing these parameters should generate some warnings at the top of charmrun's output about oversubscribing threads to your available resources if you run with a large enough '+p' option compared to your machine's core count. The current default will assume that the '+p N' option is how many processes you want if there is no '++ppn' option, and then assign one worker thread and one comm. thread for each of those processes, generating 2*N threads in total.

We are working on an automated/portable command line option that will pick a reasonable configuration for a given machine. Here is some more information on the current runtime parameters:
http://charm.cs.illinois.edu/manuals/html/charm++/C.html#SECTION05322000000000000000

add OCR kernels

There has been work on this already. This issue is to remind us to integrate it as appropriate.

support Clang for OpenMP testing

@jcownie-intel would like us to do this.

If Clang 3.8 (the most recent package in the whitelist) supports OpenMP, then this is easy. If we have to build from source, it's not worth it. There may be a middle ground (package install from userspace).

merge/systematize MPI1, AMPI and FG_MPI

The MPI-1 oriented implementations that use standard MPI, AMPI and FG-MPI are not different enough to justify separate source files. We should figure out a way to merge them and support the minor differences with preprocessor macros, etc.

The changes encouraged in #83 might be large enough to justify a separate code, although they may be rather similar to the MPI+MPI aka MPISHM implementation. Need to think about this more...

SHMEM Transpose Kernel on 1 PE

The SHMEM transpose kernel fails spectacularly when run on 1 PE. The error comes from free(Work_in_p), which is never allocated in single-PE runs.

$ mpiexec.hydra -np 1 ./SHMEM/Transpose/transpose 10 1000
Parallel Research Kernels version 2.15
SHMEM matrix transpose: B = A^T
Number of ranks      = 1
Matrix order         = 1000
Number of iterations = 10
Tile size            = 32
Solution validates
Rate (MB/s): 6252.165050 Avg time (s): 0.002559
transpose(43428,0x7fff78246000) malloc: *** error for object 0x1: pointer being freed was not allocated
*** set a breakpoint in malloc_error_break to debug
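The usual fix for this pattern is to initialize the buffer pointer to NULL so that the single-PE path, which skips the allocation, can still free it safely (the function below is an illustrative reduction of the bug, not the kernel's actual code):

```c
#include <stdlib.h>

/* Work_in_p follows the name in the issue; the surrounding logic
   is illustrative. free(NULL) is defined to be a no-op, so a
   NULL-initialized pointer makes unconditional cleanup safe. */
int transpose_cleanup_demo(int Num_procs)
{
    double *Work_in_p = NULL;            /* NULL, never uninitialized */
    if (Num_procs > 1) {
        Work_in_p = malloc(1000 * sizeof(double));
        if (Work_in_p == NULL) return 1;
    }
    /* ... transpose communication would use Work_in_p here ... */
    free(Work_in_p);                     /* safe on both paths */
    return 0;
}
```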

Fortran coarray Stencil incorrect (sometimes)

Correctness depends on the problem size...

Right

[jrhammon@esgmonster Stencil]$ ./stencil-coarray 10 300
Parallel Research Kernels version 2.16
CAF stencil execution on 2D grid
Number of images     =       36
Grid size            =      300
Radius of stencil    =        2
Type of stencil      = star
Data type            = double precision
Compact representation of stencil loop body
Untiled
Number of iterations =       10
Solution validates
Rate (MFlops/s):    133.669145 Avg time (s):      0.012454

Wrong

[jrhammon@esgmonster Stencil]$ ./stencil-coarray 10 400
Parallel Research Kernels version 2.16
CAF stencil execution on 2D grid
Number of images     =       36
Grid size            =      400
Radius of stencil    =        2
Type of stencil      = star
Data type            = double precision
Compact representation of stencil loop body
Untiled
Number of iterations =       10
ERROR: L1 norm =     21.557800 Reference L1 norm =     22.000000
Rate (MFlops/s):    216.659568 Avg time (s):      0.013752

@afanfa Did you ever see this with OpenCoarrays? I don't test with that, and wonder if I need to investigate if this is an Intel compiler bug.

implement multiple RMA variants in stencil

For Stencil, one can use FENCE, PSCW or passive target RMA. We currently use FENCE, but PSCW is a useful comparison. For completeness, we can also implement the passive target RMA equivalent of SHMEM.

factor out transpose kernel

We should have one and only one implementation of each variant of local transpose. We need more than one because of how it is used with local memory layout, but we should not have O(n) transposes, one for each programming model.

This will eliminate issues related to restrict in C++ (e.g. Charm++ and Grappa) as well as give us e.g. SHMEM+OpenMP for free, at least for transpose.
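A sketch of what the single shared routine might look like (the name and signature are hypothetical): a tiled local transpose B = A^T on a contiguous order x order block that any of the programming models could call on its local data.

```c
#include <stddef.h>

/* Hypothetical shared local transpose: B = A^T, tiled for cache reuse.
   Index arithmetic is done in size_t to avoid 32-bit overflow. */
static void prk_local_transpose(int order, int tile,
                                const double *restrict A,
                                double *restrict B)
{
    for (int it = 0; it < order; it += tile)
        for (int jt = 0; jt < order; jt += tile)
            for (int i = it; i < it + tile && i < order; i++)
                for (int j = jt; j < jt + tile && j < order; j++)
                    B[(size_t)j * order + i] = A[(size_t)i * order + j];
}
```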

eliminate all format errors

This issue will be closed when the PRK can be built with CFLAGS="-Wformat -Werror". We have an enormous number of cases where we pass a long or a long long to printf with %d.
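For reference, the specifier pairings that survive -Wformat -Werror (the format_demo helper is illustrative; the expected output assumes the usual 8-byte double):

```c
#include <stdio.h>
#include <string.h>
#include <inttypes.h>

/* Match each integer type to its printf specifier:
   long -> %ld, int64_t -> PRId64, size_t -> %zu. */
void format_demo(char *buf, size_t n)
{
    long    l  = 123456789L;
    int64_t ll = INT64_C(9876543210);
    size_t  sz = sizeof(double);
    snprintf(buf, n, "%ld %" PRId64 " %zu", l, ll, sz);
}
```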

INFO: summarize PGI Fortran bugs that affect PRK

These are not bugs in the PRK, and they are not strictly PGI bugs either: PGI does not claim Fortran 2008 support (see e.g. http://fortranwiki.org/fortran/show/Fortran+2008+status).

This issue exists primarily to summarize the issues and to ensure no one files bug reports against the PRK project because many of the Fortran codes cannot be compiled with PGI 16.10.

norm2 intrinsic

This is a known and acknowledged issue (https://www.pgroup.com/userforum/viewtopic.php?t=5321&sid=6fd5d9a3a6077568d9e0c3b5929d8846)

pgfortran -Mpreprocess -O2 -DPRKVERSION="'2.16'" -DRADIUS=  transpose-pretty.f90 -o transpose-pretty
PGF90-S-0038-Symbol, norm2, has not been explicitly declared (transpose-pretty.f90)
  0 inform,   0 warnings,   1 severes, 0 fatal for main
make: *** [transpose-pretty] Error 2

do concurrent

pgfortran -Mpreprocess -O2 -DPRKVERSION="'2.16'" -DRADIUS=  transpose.f90 -o transpose
PGF90-S-0034-Syntax error at or near : (transpose.f90: 181)
PGF90-S-0034-Syntax error at or near : (transpose.f90: 182)
PGF90-S-0034-Syntax error at or near : (transpose.f90: 201)
PGF90-S-0034-Syntax error at or near : (transpose.f90: 202)
PGF90-S-0034-Syntax error at or near : (transpose.f90: 243)
PGF90-S-0034-Syntax error at or near : (transpose.f90: 244)
PGF90-S-0034-Syntax error at or near : (transpose.f90: 263)
PGF90-S-0034-Syntax error at or near : (transpose.f90: 264)
PGF90-S-0034-Syntax error at or near : (transpose.f90: 309)
PGF90-S-0034-Syntax error at or near : (transpose.f90: 310)
  0 inform,   0 warnings,  10 severes, 0 fatal for main
make: *** [transpose] Error 2

Coarrays

For example:

pgfortran -Mpreprocess -O2 -DPRKVERSION="'2.16'" -DRADIUS=  transpose-coarray.f90  -o transpose-coarray
PGF90-S-0034-Syntax error at or near '(/' (transpose-coarray.f90: 76)
PGF90-S-0134-Illegal attribute - duplicate allocatable (transpose-coarray.f90: 77)
PGF90-S-0034-Syntax error at or near '(/' (transpose-coarray.f90: 77)
PGF90-S-0134-Illegal attribute - duplicate allocatable (transpose-coarray.f90: 78)
PGF90-S-0034-Syntax error at or near '(/' (transpose-coarray.f90: 162)
PGF90-S-0034-Syntax error at or near '(/' (transpose-coarray.f90: 168)
PGF90-S-0034-Syntax error at or near : (transpose-coarray.f90: 192)
PGF90-S-0034-Syntax error at or near : (transpose-coarray.f90: 193)
PGF90-S-0034-Syntax error at or near : (transpose-coarray.f90: 203)
PGF90-S-0034-Syntax error at or near identifier all (transpose-coarray.f90: 210)
PGF90-S-0034-Syntax error at or near identifier all (transpose-coarray.f90: 215)
PGF90-S-0034-Syntax error at or near '(/' (transpose-coarray.f90: 238)
PGF90-S-0034-Syntax error at or near : (transpose-coarray.f90: 243)
PGF90-S-0034-Syntax error at or near : (transpose-coarray.f90: 244)
PGF90-S-0034-Syntax error at or near : (transpose-coarray.f90: 260)
PGF90-S-0034-Syntax error at or near identifier all (transpose-coarray.f90: 265)
PGF90-S-0034-Syntax error at or near : (transpose-coarray.f90: 274)
PGF90-S-0034-Syntax error at or near identifier all (transpose-coarray.f90: 279)
PGF90-S-0034-Syntax error at or near identifier all (transpose-coarray.f90: 283)
  0 inform,   0 warnings,  19 severes, 0 fatal for main
make: *** [transpose-coarray] Error 2
