rodarima / cpic

Particle in Cell simulation of plasma in C

License: GNU General Public License v3.0

Makefile 2.87% C 64.52% C++ 1.36% MATLAB 1.21% Python 3.50% Gnuplot 0.01% Shell 24.75% Assembly 1.78%

cpic's Issues

plist_close_remove: Assertion `p->magic[iv] == MAGIC_PARTICLE' failed

cpic: src/plist.c:353: void plist_sanity_check(plist_t *): Assertion `p->magic[iv] == MAGIC_PARTICLE' failed.

Thread 1 "cpic" received signal SIGABRT, Aborted.
0x00007ffff755cce5 in raise () from /usr/lib/libc.so.6
(gdb) bt
#0  0x00007ffff755cce5 in raise () from /usr/lib/libc.so.6
#1  0x00007ffff7546857 in abort () from /usr/lib/libc.so.6
#2  0x00007ffff7546727 in __assert_fail_base.cold () from /usr/lib/libc.so.6
#3  0x00007ffff7555426 in __assert_fail () from /usr/lib/libc.so.6
#4  0x000055555556777c in plist_sanity_check (l=0x55555571bf78) at src/plist.c:353
#5  0x00005555555681a0 in plist_close_remove (l=0x55555571bf78) at src/plist.c:1178
#6  0x0000555555567c5d in plist_close (l=0x55555571bf78) at src/plist.c:1213
#7  0x00005555555597ce in collect_pass (ex=0x7fffffffdcc0, dim=0) at src/comm_plasma.c:514
#8  0x0000555555559547 in collect_pset (c=0x55555571a360, set=0x55555571bf70, dim=0) at src/comm_plasma.c:566
#9  0x00005555555592e7 in collect_pchunk (c=0x55555571a360, dim=0) at src/comm_plasma.c:590
#10 0x0000555555558ce0 in collect_plasma (sim=0x555555717560, dim=0) at src/comm_plasma.c:614
#11 0x000055555555891b in comm_plasma_x (sim=0x555555717560, global_exchange=1) at src/comm_plasma.c:847
#12 0x0000555555558718 in comm_plasma (sim=0x555555717560, global_exchange=1) at src/comm_plasma.c:1103
#13 0x0000555555565987 in particle_comm_initial (sim=0x555555717560) at src/particle.c:266
#14 0x000055555556c736 in sim_pre_step (sim=0x555555717560) at src/sim.c:196
#15 0x000055555556bd89 in sim_init (conf=0x7fffffffe1e0, quiet=0) at src/sim.c:283
#16 0x000055555555cc44 in main (argc=2, argv=0x7fffffffe338) at src/cpic.c:169

collect_pset: Assertion `plist_isempty(ex.q1)' failed

(gdb) bt
#0  0x00007f66a6977ce5 in raise () from /usr/lib/libc.so.6
#1  0x00007f66a6961857 in abort () from /usr/lib/libc.so.6
#2  0x00007f66a6961727 in __assert_fail_base.cold () from /usr/lib/libc.so.6
#3  0x00007f66a6970426 in __assert_fail () from /usr/lib/libc.so.6
#4  0x0000000000405738 in collect_pset (c=0x7f669c14d200, set=0x7f669c02ca50, dim=1) at src/comm_plasma.c:564
#5  0x00000000004054e7 in collect_pchunk (c=0x7f669c14d200, dim=1) at src/comm_plasma.c:590
#6  0x000000000040826c in nanos6_unpacked_task_region_collect_plasma_fast0 ()
#7  0x00000000004082bc in nanos6_ol_task_region_collect_plasma_fast0 ()
#8  0x00007f66a640b11e in ExecutionWorkflow::executeTask(Task*, ComputePlace*, MemoryPlace*) ()
   from /usr/lib/libnanos6-optimized-linear-regions-fragmented.so
#9  0x00007f66a63e84b8 in WorkerThread::handleTask(CPU*) () from /usr/lib/libnanos6-optimized-linear-regions-fragmented.so
#10 0x00007f66a6408fa8 in TaskBlocking::taskBlocks(WorkerThread*, Task*, ThreadManagerPolicy::thread_run_inline_policy_t) ()
   from /usr/lib/libnanos6-optimized-linear-regions-fragmented.so
#11 0x00007f66a6409a61 in nanos6_taskwait () from /usr/lib/libnanos6-optimized-linear-regions-fragmented.so
#12 0x00007f66a6b34641 in nanos6_taskwait (invocation_source=0x422198 "src/sim.c:536:10") at loader/indirect-symbols/taskwait.c:21
#13 0x000000000041af32 in sim_step (sim=0x7f669c147240) at src/sim.c:536
#14 0x000000000041b2f8 in sim_run (sim=0x7f669c147240) at src/sim.c:608
#15 0x0000000000408e14 in main (argc=2, argv=0x7ffe0229e6b8) at src/cpic.c:178
(gdb) f 4
#4  0x0000000000405738 in collect_pset (c=0x7f669c14d200, set=0x7f669c02ca50, dim=1) at src/comm_plasma.c:564
564             assert(plist_isempty(ex.q1));
(gdb) p ex.q1
$1 = (plist_t *) 0x7f669c02cc10
(gdb) p ex.q1->b
$2 = (struct pblock *) 0x7f668e200000
(gdb) p ex.q1->b[0]
$3 = {{{n = 1024, npacks = 256, nfpacks = 256, next = 0x7f6657600000, prev = 0x7f665a200000},
    _pad = "\000\004\000\000\000\000\000\000\000\001\000\000\000\000\000\000\000\001\000\000\000\000\000\000\000\000`Wf\177\000\000\000\000 Zf\177", '\000' <repeats 89 times>}, p = 0x7f668e200080}
(gdb) p ex.q1->b[1]
$4 = {{{n = 1611526157, npacks = 1611526157, nfpacks = 1611526157, next = 0x600df00d, prev = 0x79e82},
    _pad = "\r\360\r`\000\000\000\000\r\360\r`\000\000\000\000\r\360\r`\000\000\000\000\r\360\r`\000\000\000\000\202\236\a\000\000\000\000\000\302\227\a\000\000\000\000\000\202\227\a\000\000\000\000\000\336\305\006\000\000\000\000\000\000\000\000^\213\261\030@\000\000\000 \032b\027@\000\000\000\374\266\331\026@\000\000\000\b\220\344\030@\000\000\300;\323!@@\000\000\000\375\212\330G@\000\000@\217\254\206O@\000\000\000\340\244gF@"}, p = 0x7f668e200100}
(gdb) p ex.q1->b->next
$5 = (struct pblock *) 0x7f6657600000
(gdb) p ex.q1->b->next[0]
$6 = {{{n = 1024, npacks = 256, nfpacks = 256, next = 0x7f6657200000, prev = 0x7f668e200000},
    _pad = "\000\004\000\000\000\000\000\000\000\001\000\000\000\000\000\000\000\001\000\000\000\000\000\000\000\000 Wf\177\000\000\000\000 \216f\177", '\000' <repeats 89 times>}, p = 0x7f6657600080}
(gdb)

Cyclotron hard test fails

The electron moves in a larger orbit. The biggest error, in the 10 m radius test, is 0.00625 m in the +X direction (a relative error of 6.25e-4). The angular velocity is therefore slightly lower, so the hard check accumulates a linearly growing error.

Particle out of chunk with 16 processes

src/interpolate.c:185: void interpolate_p2f(pchunk_t *, vi64 *, vf64 *, vf64 *, vf64 *, vf64 *, vf64, mat_t *): Assertion `x[Y][iv] >= c->x0[Y][iv]' failed

(gdb) p c->x0[Y]
$1 = {12, 12, 12, 12}
(gdb) p x[Y]
$2 = {0, 5.0058335984118427e-317, 5.0058335984118427e-317, 5.0093750609612327e-317}
(gdb) p c->x1[Y]
$3 = {16, 16, 16, 16}

The ppack is bad:

(gdb) p/x p->magic
$20 = {0x7fa8a70d5738, 0x92fc20, 0x1, 0x0}

Deadlock with TAMPI enabled

(gdb) thr 21
[Switching to thread 21 (Thread 0x7efc477fd740 (LWP 1835745))]
#0  0x00007efca2441cf5 in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib/libpthread.so.0
(gdb) bt
#0  0x00007efca2441cf5 in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib/libpthread.so.0
#1  0x00007efca2113ea1 in __gthread_cond_wait (__mutex=<optimized out>, __cond=<optimized out>)
    at /build/gcc/src/gcc-build/x86_64-pc-linux-gnu/libstdc++-v3/include/x86_64-pc-linux-gnu/bits/gthr-default.h:865
#2  std::condition_variable::wait (this=<optimized out>, __lock=...) at /build/gcc/src/gcc/libstdc++-v3/src/c++11/condition_variable.cc:53
#3  0x00007efca1d38efc in TaskBlocking::taskBlocks(WorkerThread*, Task*, ThreadManagerPolicy::thread_run_inline_policy_t) ()
   from /usr/lib/libnanos6-optimized-linear-regions-fragmented.so
#4  0x00007efca1d2e774 in nanos6_block_current_task () from /usr/lib/libnanos6-optimized-linear-regions-fragmented.so
#5  0x00007efca245eb89 in nanos6_block_current_task (blocking_context=0x7efc981a65d0) at loader/indirect-symbols/blocking.c:34
#6  0x00007efca2d314bb in ?? () from /usr/lib/libtampi-c.so.0
#7  0x00007efca2d36a53 in MPI_Recv () from /usr/lib/libtampi-c.so.0
#8  0x0000000000408dd5 in recv_plist_y (sim=0x7efc98147240, l=0x7efc9814f858, src=0, ic=13) at src/comm_plasma.c:944
#9  0x000000000040872e in recv_pchunk_y (sim=0x7efc98147240, c=0x7efc9814e580) at src/comm_plasma.c:1009
#10 0x0000000000409568 in nanos6_unpacked_task_region_exchange_plasma_y1 ()
#11 0x000000000040959c in nanos6_ol_task_region_exchange_plasma_y1 ()
#12 0x00007efca1d3b11e in ExecutionWorkflow::executeTask(Task*, ComputePlace*, MemoryPlace*) ()
   from /usr/lib/libnanos6-optimized-linear-regions-fragmented.so
#13 0x00007efca1d184b8 in WorkerThread::handleTask(CPU*) () from /usr/lib/libnanos6-optimized-linear-regions-fragmented.so
#14 0x00007efca1d18c1b in WorkerThread::body() () from /usr/lib/libnanos6-optimized-linear-regions-fragmented.so
#15 0x00007efca1d38cd1 in kernel_level_thread_body_wrapper(void*) () from /usr/lib/libnanos6-optimized-linear-regions-fragmented.so
#16 0x00007efca243b46f in start_thread () from /usr/lib/libpthread.so.0
#17 0x00007efca236b3d3 in clone () from /usr/lib/libc.so.6

interpolate_p2f: Assertion `x[Y][iv] >= c->x0[Y][iv]' failed

Likely we have not wrapped some particle, as a run with 1 MPI process does not go through comm_plasma_y.

(gdb) p x[1][iv]
$2 = -0.00022396883585063811

The boundary conditions are applied in comm_plasma_x, which is a bit odd, but it allows us to use only two ifs per particle to determine whether it must move outside the chunk.

We could move the boundary condition outside comm_plasma, but then we couldn't check afterwards that no lost particles remain.
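
As a minimal sketch, the two comparisons mentioned above amount to a periodic wrap of one coordinate (the function and parameter names here are illustrative assumptions, not taken from the sources):

/* Hedged sketch: wrap one periodic coordinate with only two branches per
 * particle. The coordinate pointer and the box length L are assumed names. */
static inline void wrap_periodic(double *coord, double L)
{
	if (*coord >= L)
		*coord -= L;    /* crossed the upper boundary */
	else if (*coord < 0.0)
		*coord += L;    /* crossed the lower boundary */
}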

Specify output directory

The configuration should allow specifying where the simulation data is stored and the sampling frequency.
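
As a purely hypothetical illustration (both the key names and the libconfig-style syntax are assumptions, not taken from the existing configuration files):

output = {
	path = "data/";    # directory where simulation output is written
	period = 100;      # write a sample every 100 iterations
};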

Harmonic motion test with 2 electrons fails

The two electrons move in the opposite direction to what is expected. By using -dtqm2 instead, harmonic motion is observed; however, the oscillation frequency is not close to the expected value.
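
Assuming the mover uses a standard leapfrog velocity update, and that dtqm2 stands for Δt·q/(2m) (both are assumptions, not confirmed from the sources), the sign of that factor sets the direction of the electric kick:

u^{n+1/2} = u^{n-1/2} + \frac{q\,\Delta t}{m}\, E(x^n)

so flipping the sign of dtqm2 reverses the force, which would be consistent with the electrons initially moving in the wrong direction.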

Deadlock with 4 processes

Stuck in recv_plist_y

(gdb) ea
Thr  #   Function               Source
1    2   term_handler()         src/cpic.c:47
3    8   recv_plist_y()         src/comm_plasma.c:957
6    8   recv_plist_y()         src/comm_plasma.c:957
7    8   recv_plist_y()         src/comm_plasma.c:957
9    8   recv_plist_y()         src/comm_plasma.c:957
12   8   recv_plist_y()         src/comm_plasma.c:957
13   8   recv_plist_y()         src/comm_plasma.c:957
14   6   sim_pre_step()         src/sim.c:201
19   8   recv_plist_y()         src/comm_plasma.c:957
22   8   recv_plist_y()         src/comm_plasma.c:957
23   8   recv_plist_y()         src/comm_plasma.c:957
24   8   recv_plist_y()         src/comm_plasma.c:957
28   8   recv_plist_y()         src/comm_plasma.c:957
31   8   recv_plist_y()         src/comm_plasma.c:957
34   8   recv_plist_y()         src/comm_plasma.c:957
36   8   recv_plist_y()         src/comm_plasma.c:957
38   8   recv_plist_y()         src/comm_plasma.c:957
39   8   recv_plist_y()         src/comm_plasma.c:957

(gdb) thread 3
[Switching to thread 3 (Thread 0x7f9865ffa740 (LWP 1875018))]
#0  0x00007f9876144cf5 in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib/libpthread.so.0

(gdb) bt
#0  0x00007f9876144cf5 in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib/libpthread.so.0
#1  0x00007f9875e16ea1 in __gthread_cond_wait (__mutex=<optimized out>, __cond=<optimized out>)
    at /build/gcc/src/gcc-build/x86_64-pc-linux-gnu/libstdc++-v3/include/x86_64-pc-linux-gnu/bits/gthr-default.h:865
#2  std::condition_variable::wait (this=<optimized out>, __lock=...) at /build/gcc/src/gcc/libstdc++-v3/src/c++11/condition_variable.cc:53
#3  0x00007f9875a3befc in TaskBlocking::taskBlocks(WorkerThread*, Task*, ThreadManagerPolicy::thread_run_inline_policy_t) ()
   from /usr/lib/libnanos6-optimized-linear-regions-fragmented.so
#4  0x00007f9875a31774 in nanos6_block_current_task () from /usr/lib/libnanos6-optimized-linear-regions-fragmented.so
#5  0x00007f9876161b89 in nanos6_block_current_task (blocking_context=0x7f97f4206790) at loader/indirect-symbols/blocking.c:34
#6  0x00007f9876a344bb in ?? () from /usr/lib/libtampi-c.so.0
#7  0x00007f9876a39a53 in MPI_Recv () from /usr/lib/libtampi-c.so.0
#8  0x0000000000407ff2 in recv_plist_y (sim=0x7f97f416a8c0, l=0x7f97f4172dd0, src=3, ic=11) at src/comm_plasma.c:957
#9  0x000000000040792e in recv_pchunk_y (sim=0x7f97f416a8c0, c=0x7f97f416e080) at src/comm_plasma.c:1021
#10 0x0000000000408768 in nanos6_unpacked_task_region_exchange_plasma_y1 ()
#11 0x000000000040879c in nanos6_ol_task_region_exchange_plasma_y1 ()
#12 0x00007f9875a3e11e in ExecutionWorkflow::executeTask(Task*, ComputePlace*, MemoryPlace*) ()
   from /usr/lib/libnanos6-optimized-linear-regions-fragmented.so
#13 0x00007f9875a1b4b8 in WorkerThread::handleTask(CPU*) () from /usr/lib/libnanos6-optimized-linear-regions-fragmented.so
#14 0x00007f9875a1bc1b in WorkerThread::body() () from /usr/lib/libnanos6-optimized-linear-regions-fragmented.so
#15 0x00007f9875a11c91 in kernel_level_thread_body_wrapper(void*) () from /usr/lib/libnanos6-optimized-linear-regions-fragmented.so
#16 0x00007f987613e46f in start_thread () from /usr/lib/libpthread.so.0
#17 0x00007f987606e3d3 in clone () from /usr/lib/libc.so.6
(gdb)

Wrong harmonic frequency

The simple electron pair bounces at a different frequency than expected. The theoretical period is 14.2 seconds, but the experimental one is 10.4 seconds.
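
For reference, the theoretical value is consistent with the wp and T columns in the log below:

T = \frac{2\pi}{\omega_p} = \frac{2\pi}{4.419417 \times 10^{-1}} \approx 14.217\ \text{s}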

iter=   1 wp=4.419417e-01 T=1.421723e+01 ni=284.3  r0=(4.000000e+00 1.600000e+01)  r1=(2.800000e+01 1.600000e+01)
iter= 207 wp=4.419417e-01 T=1.421723e+01 ni=284.3  r0=(4.000363e+00 1.600000e+01)  r1=(2.799964e+01 1.600000e+01)
iter= 208 wp=4.419417e-01 T=1.421723e+01 ni=284.3  r0=(4.000656e+00 1.600000e+01)  r1=(2.799934e+01 1.600000e+01)
iter= 414 wp=4.419417e-01 T=1.421723e+01 ni=284.3  r0=(4.000043e+00 1.600000e+01)  r1=(2.799996e+01 1.600000e+01)
iter= 620 wp=4.419417e-01 T=1.421723e+01 ni=284.3  r0=(4.000156e+00 1.600000e+01)  r1=(2.799984e+01 1.600000e+01)
iter= 826 wp=4.419417e-01 T=1.421723e+01 ni=284.3  r0=(4.000978e+00 1.600000e+01)  r1=(2.799902e+01 1.600000e+01)
iter= 827 wp=4.419417e-01 T=1.421723e+01 ni=284.3  r0=(4.000180e+00 1.600000e+01)  r1=(2.799982e+01 1.600000e+01)
iter=1033 wp=4.419417e-01 T=1.421723e+01 ni=284.3  r0=(4.000033e+00 1.600000e+01)  r1=(2.799997e+01 1.600000e+01)
iter=1239 wp=4.419417e-01 T=1.421723e+01 ni=284.3  r0=(4.000611e+00 1.600000e+01)  r1=(2.799939e+01 1.600000e+01)
iter=1240 wp=4.419417e-01 T=1.421723e+01 ni=284.3  r0=(4.000399e+00 1.600000e+01)  r1=(2.799960e+01 1.600000e+01)
iter=1446 wp=4.419417e-01 T=1.421723e+01 ni=284.3  r0=(4.000001e+00 1.600000e+01)  r1=(2.800000e+01 1.600000e+01)
iter=1652 wp=4.419417e-01 T=1.421723e+01 ni=284.3  r0=(4.000328e+00 1.600000e+01)  r1=(2.799967e+01 1.600000e+01)
iter=1653 wp=4.419417e-01 T=1.421723e+01 ni=284.3  r0=(4.000702e+00 1.600000e+01)  r1=(2.799930e+01 1.600000e+01)
iter=1859 wp=4.419417e-01 T=1.421723e+01 ni=284.3  r0=(4.000057e+00 1.600000e+01)  r1=(2.799994e+01 1.600000e+01)
iter=2065 wp=4.419417e-01 T=1.421723e+01 ni=284.3  r0=(4.000134e+00 1.600000e+01)  r1=(2.799987e+01 1.600000e+01)
iter=2271 wp=4.419417e-01 T=1.421723e+01 ni=284.3  r0=(4.000922e+00 1.600000e+01)  r1=(2.799908e+01 1.600000e+01)
iter=2272 wp=4.419417e-01 T=1.421723e+01 ni=284.3  r0=(4.000204e+00 1.600000e+01)  r1=(2.799980e+01 1.600000e+01)
iter=2478 wp=4.419417e-01 T=1.421723e+01 ni=284.3  r0=(4.000023e+00 1.600000e+01)  r1=(2.799998e+01 1.600000e+01)
iter=2684 wp=4.419417e-01 T=1.421723e+01 ni=284.3  r0=(4.000568e+00 1.599996e+01)  r1=(2.799943e+01 1.600004e+01)
iter=2685 wp=4.419417e-01 T=1.421723e+01 ni=284.3  r0=(4.000436e+00 1.599996e+01)  r1=(2.799956e+01 1.600004e+01)
iter=2891 wp=4.419417e-01 T=1.421723e+01 ni=284.3  r0=(3.999934e+00 1.599419e+01)  r1=(2.800007e+01 1.600581e+01)
iter=3095 wp=4.419417e-01 T=1.421723e+01 ni=284.3  r0=(3.988828e+00 1.512723e+01)  r1=(2.801117e+01 1.687277e+01)
iter=3096 wp=4.419417e-01 T=1.421723e+01 ni=284.3  r0=(3.978425e+00 1.510344e+01)  r1=(2.802157e+01 1.689656e+01)
iter=3097 wp=4.419417e-01 T=1.421723e+01 ni=284.3  r0=(3.971761e+00 1.507877e+01)  r1=(2.802824e+01 1.692123e+01)
iter=3098 wp=4.419417e-01 T=1.421723e+01 ni=284.3  r0=(3.968834e+00 1.505317e+01)  r1=(2.803117e+01 1.694683e+01)
iter=3099 wp=4.419417e-01 T=1.421723e+01 ni=284.3  r0=(3.969634e+00 1.502664e+01)  r1=(2.803037e+01 1.697336e+01)
iter=3100 wp=4.419417e-01 T=1.421723e+01 ni=284.3  r0=(3.974149e+00 1.499913e+01)  r1=(2.802585e+01 1.700087e+01)
iter=3101 wp=4.419417e-01 T=1.421723e+01 ni=284.3  r0=(3.982358e+00 1.497064e+01)  r1=(2.801764e+01 1.702936e+01)
iter=3102 wp=4.419417e-01 T=1.421723e+01 ni=284.3  r0=(3.994234e+00 1.494114e+01)  r1=(2.800577e+01 1.705886e+01)

TAMPI is intercepting FFTW MPI calls

(gdb) bt
#0  0x00007f92ea4accf5 in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib/libpthread.so.0
#1  0x00007f92ea17eea1 in __gthread_cond_wait (__mutex=<optimized out>, __cond=<optimized out>)
    at /build/gcc/src/gcc-build/x86_64-pc-linux-gnu/libstdc++-v3/include/x86_64-pc-linux-gnu/bits/gthr-default.h:865
#2  std::condition_variable::wait (this=<optimized out>, __lock=...) at /build/gcc/src/gcc/libstdc++-v3/src/c++11/condition_variable.cc:53
#3  0x00007f92e9da3efc in TaskBlocking::taskBlocks(WorkerThread*, Task*, ThreadManagerPolicy::thread_run_inline_policy_t) ()
   from /usr/lib/libnanos6-optimized-linear-regions-fragmented.so
#4  0x00007f92e9d99774 in nanos6_block_current_task () from /usr/lib/libnanos6-optimized-linear-regions-fragmented.so
#5  0x00007f92ea4c9b89 in nanos6_block_current_task (blocking_context=0x7f92601b24d0) at loader/indirect-symbols/blocking.c:34
#6  0x00007f92eada3f29 in MPI_Sendrecv () from /usr/lib/libtampi-c.so.0
#7  0x00007f92eaaf9d5d in ?? () from /usr/lib/libfftw3_mpi.so.3
#8  0x00007f92eaaf9ddf in ?? () from /usr/lib/libfftw3_mpi.so.3
#9  0x00007f92eaafd6e3 in ?? () from /usr/lib/libfftw3_mpi.so.3
#10 0x00007f92eaaffe76 in ?? () from /usr/lib/libfftw3_mpi.so.3
#11 0x000000000041c4b6 in MFT_solve (sim=0x7f92601526d0, s=0x7f9260153e40, x=0x7f9260154030, b=0x7f926010c340) at src/solver.c:468
#12 0x000000000041c0ed in solve_xy (sim=0x7f92601526d0, s=0x7f9260153e40, phi=0x7f9260154030, rho=0x7f926010c340) at src/solver.c:592
#13 0x000000000040adda in field_phi_solve (sim=0x7f92601526d0) at src/field.c:439
#14 0x000000000040b876 in nanos6_unpacked_task_region_stage_field_E1 ()
#15 0x000000000040b8ac in nanos6_ol_task_region_stage_field_E1 ()
#16 0x00007f92e9da611e in ExecutionWorkflow::executeTask(Task*, ComputePlace*, MemoryPlace*) ()
   from /usr/lib/libnanos6-optimized-linear-regions-fragmented.so
#17 0x00007f92e9d834b8 in WorkerThread::handleTask(CPU*) () from /usr/lib/libnanos6-optimized-linear-regions-fragmented.so
#18 0x00007f92e9d83c1b in WorkerThread::body() () from /usr/lib/libnanos6-optimized-linear-regions-fragmented.so
#19 0x00007f92e9d79c91 in kernel_level_thread_body_wrapper(void*) () from /usr/lib/libnanos6-optimized-linear-regions-fragmented.so
#20 0x00007f92ea4a646f in start_thread () from /usr/lib/libpthread.so.0
#21 0x00007f92ea3d63d3 in clone () from /usr/lib/libc.so.6

Inconsistent scaling of FFTW

When testing FFTW in a dummy experiment, the scaling is not too bad with 4096x4096 points:

np=1 n=4096 mean=0.148971 std=0.00015471 sem=6.91884e-05
np=2 n=4096 mean=0.205583 std=0.000514282 sem=0.000229994
np=4 n=4096 mean=0.104661 std=0.00162727 sem=0.000727735
np=8 n=4096 mean=0.064036 std=0.00103701 sem=0.000463764
np=16 n=4096 mean=0.0428545 std=0.000838381 sem=0.000374935
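
For reference, relative to np=1 those mean times correspond to speedups of roughly 0.72x (np=2), 1.42x (np=4), 2.33x (np=8) and 3.48x (np=16).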

But in the simulation, as the number of processes increases, the FFT execution time doesn't decrease proportionally.

All times are in seconds, first iteration only. fftf = FFT forward, fftr = FFT reverse.
np=1 Solver fftf/fftr/comp/total: 1.490792e-01 / 1.433561e-01 / 7.316658e-01 / 1.303466e+01
np=2 Solver fftf/fftr/comp/total: 2.164974e-01 / 2.000124e-01 / 6.416426e-01 / 7.061025e+00
np=4 Solver fftf/fftr/comp/total: 1.755040e-01 / 1.683762e-01 / 5.648114e-01 / 9.163356e+00
np=8 Solver fftf/fftr/comp/total: 1.804677e-01 / 1.837146e-01 / 5.759790e-01 / 1.688759e+01
np=16 Solver fftf/fftr/comp/total: 2.637597e-01 / 2.404443e-01 / 6.811108e-01 / 3.733283e+01

MFT solver takes most of the time with more than 1 process

It seems the communication in the FFT may cause this issue. Can we investigate the plan-creation overhead compared to the FFT computation itself?
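
A minimal standalone sketch to separate the two costs, using the stock FFTW MPI API (the 4096x4096 size is taken from the dummy experiment above; nothing below comes from the cpic sources):

#include <stdio.h>
#include <mpi.h>
#include <fftw3-mpi.h>

int main(int argc, char *argv[])
{
	/* Same transform size as the dummy scaling experiment */
	const ptrdiff_t N0 = 4096, N1 = 4096;
	ptrdiff_t alloc, local_n0, local_0_start;
	double t0, t1, t2;

	MPI_Init(&argc, &argv);
	fftw_mpi_init();

	/* Local portion of the distributed 2D array on this rank */
	alloc = fftw_mpi_local_size_2d(N0, N1, MPI_COMM_WORLD,
			&local_n0, &local_0_start);
	fftw_complex *data = fftw_alloc_complex(alloc);
	for (ptrdiff_t i = 0; i < alloc; i++)
		data[i][0] = data[i][1] = 0.0;

	t0 = MPI_Wtime();
	fftw_plan p = fftw_mpi_plan_dft_2d(N0, N1, data, data,
			MPI_COMM_WORLD, FFTW_FORWARD, FFTW_ESTIMATE);
	t1 = MPI_Wtime();
	fftw_execute(p);        /* one forward transform, including MPI transposes */
	t2 = MPI_Wtime();

	printf("plan=%e s  exec=%e s\n", t1 - t0, t2 - t1);

	fftw_destroy_plan(p);
	fftw_free(data);
	fftw_mpi_cleanup();
	MPI_Finalize();
	return 0;
}

Built with something like mpicc -lfftw3_mpi -lfftw3 -lm and run under mpirun -n <np>, this prints the plan-creation and execution times per rank.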

$ mpirun -n 2 ./cpic conf/simd.conf
...
stats iter=47 last=4.125259e-01 mean=4.302998e-01 std=1.540215e-02 sem=2.246635e-03 rsem=5.221092e-03 mem=46628 solver=3.916919e-01
stats iter=48 last=3.771588e-01 mean=4.299295e-01 std=1.545187e-02 sem=2.230286e-03 rsem=5.187562e-03 mem=46628 solver=3.912867e-01
stats iter=49 last=3.663834e-01 mean=4.288526e-01 std=1.704752e-02 sem=2.435359e-03 rsem=5.678780e-03 mem=46628 solver=3.902378e-01
Total time: 2.138171e+01 s
1.963117e+01 91.8% field_E
1.704385e-01  0.8% particle_x
1.004637e+00  4.7% particle_wrap
3.134956e-01  1.5% field_rho
2.668671e-01  1.2% particle_E
0.000000e+00  0.0% output_particles
0.000000e+00  0.0% output_fields
Simulation ends
Total time: 2.138590e+01 s
1.946058e+01 91.0% field_E
1.763841e-01  0.8% particle_x
1.109353e+00  5.2% particle_wrap
2.112792e-01  1.0% field_rho
2.796531e-01  1.3% particle_E
0.000000e+00  0.0% output_particles
0.000000e+00  0.0% output_fields
Simulation ends

$ mpirun -n 1 ./cpic conf/simd.conf
stats iter=47 last=7.367235e-02 mean=7.470654e-02 std=4.401337e-03 sem=6.420010e-04 rsem=8.593639e-03 mem=50224 solver=1.516419e-03
stats iter=48 last=7.382230e-02 mean=7.468499e-02 std=4.356821e-03 sem=6.288529e-04 rsem=8.420071e-03 mem=50224 solver=1.513608e-03
stats iter=49 last=7.412300e-02 mean=7.466739e-02 std=4.312960e-03 sem=6.161371e-04 rsem=8.251756e-03 mem=50224 solver=1.510907e-03
Total time: 3.733411e+00 s
1.557126e-01  4.2% field_E
2.507990e-01  6.7% particle_x
2.550165e+00 68.3% particle_wrap
3.151033e-01  8.4% field_rho
4.681872e-01 12.5% particle_E
0.000000e+00  0.0% output_particles
0.000000e+00  0.0% output_fields
Simulation ends

ppack is empty after pwin_step in MODIFY mode

cpic: src/plist.c:1425: int pwin_step_modify(pwin_t *): Assertion `!ppack_isempty(&w->b->p[w->ip])' failed.

Thread 1 "cpic" received signal SIGABRT, Aborted.
0x00007ffff755cce5 in raise () from /usr/lib/libc.so.6
(gdb) bt
#0  0x00007ffff755cce5 in raise () from /usr/lib/libc.so.6
#1  0x00007ffff7546857 in abort () from /usr/lib/libc.so.6
#2  0x00007ffff7546727 in __assert_fail_base.cold () from /usr/lib/libc.so.6
#3  0x00007ffff7555426 in __assert_fail () from /usr/lib/libc.so.6
#4  0x000055555556885a in pwin_step_modify (w=0x7fffffffdea0) at src/plist.c:1425
#5  0x00005555555683b4 in pwin_step (w=0x7fffffffdea0) at src/plist.c:1444
#6  0x000055555555b375 in periodic_boundary_plist (sim=0x555555717460, l=0x555555632e98, d=0) at src/comm_plasma.c:786
#7  0x000055555555b279 in periodic_boundary_pchunk (sim=0x555555717460, c=0x555555719a40, d=0) at src/comm_plasma.c:802
#8  0x00005555555590c2 in periodic_boundary (sim=0x555555717460, d=0) at src/comm_plasma.c:837
#9  0x00005555555589df in comm_plasma_x (sim=0x555555717460, global_exchange=0) at src/comm_plasma.c:896
#10 0x0000555555558718 in comm_plasma (sim=0x555555717460, global_exchange=0) at src/comm_plasma.c:1124
#11 0x0000555555562c4a in stage_plasma_r (sim=0x555555717460) at src/mover.c:316
#12 0x000055555556c82d in sim_step (sim=0x555555717460) at src/sim.c:502
#13 0x000055555556cbf8 in sim_run (sim=0x555555717460) at src/sim.c:608
#14 0x000055555555ccc4 in main (argc=2, argv=0x7fffffffe338) at src/cpic.c:178

Garbage particles fail to pass velocity check

The garbage particles are being compared in check_velocity, but at some point they have a velocity greater than the maximum, so the simulation aborts.

We can either stop updating those particles, or fix their velocity before they go into the plist_update_r stage.
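
A rough sketch of the first option, skipping lanes whose magic value is not MAGIC_PARTICLE (magic[] and MAGIC_PARTICLE appear in the asserts above, but the velocity array u and the limit u_max used here are hypothetical names):

/* Hedged sketch: only enforce the velocity limit on lanes that hold real
 * particles; garbage lanes are identified by their magic value. */
for (iv = 0; iv < MAX_VEC; iv++)
{
	if (p->magic[iv] != MAGIC_PARTICLE)
		continue;       /* garbage lane: skip the velocity check */

	assert(fabs(p->u[d][iv]) <= u_max[d]);
}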

Segfault with 2 processes

$ mpirun -n 2 ./cpic conf/simd.conf
...
No output path specified, output will not be saved
begin sim_pre_step
begin sim_pre_step
[xeon07:1832490] *** Process received signal ***
[xeon07:1832490] Signal: Segmentation fault (11)
[xeon07:1832490] Signal code: Address not mapped (1)
[xeon07:1832490] Failing at address: 0x7f5a62a00000
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 0 on node xeon07 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------

Now the journey begins: how to attach GDB before OpenMPI kills my process.
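
One generic trick, not specific to cpic, is to park each rank until a debugger attaches (everything below is a sketch of that well-known technique, not existing code in the repository):

#include <stdio.h>
#include <unistd.h>

/* Hedged sketch: each rank prints its PID and spins until a debugger
 * attaches and clears `hold`:
 *   gdb -p <pid>
 *   (gdb) set var hold = 0
 *   (gdb) continue */
static volatile int hold = 1;

static void wait_for_debugger(void)
{
	fprintf(stderr, "pid %d waiting for debugger\n", (int) getpid());
	while (hold)
		sleep(1);
}

Calling wait_for_debugger() right after MPI_Init() on each rank leaves enough time to attach to the failing process before the segfault is reached.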

cyclotron.test: src/plist.c:329: void plist_sanity_check(plist_t *): Assertion `l->b' failed

(gdb) bt
#0  0x00007f1bc8f25ce5 in raise () from /usr/lib/libc.so.6
#1  0x00007f1bc8f0f857 in abort () from /usr/lib/libc.so.6
#2  0x00007f1bc8f0f727 in __assert_fail_base.cold () from /usr/lib/libc.so.6
#3  0x00007f1bc8f1e426 in __assert_fail () from /usr/lib/libc.so.6
#4  0x0000000000414aac in plist_sanity_check (l=0x7f1ac8040728) at src/plist.c:329
#5  0x000000000041b6aa in collect_pset (c=0x7f1ac818d4c0, set=0x7f1ac8040720, dim=0) at src/comm_plasma.c:561
#6  0x000000000041b4e7 in collect_pchunk (c=0x7f1ac818d4c0, dim=0) at src/comm_plasma.c:593
#7  0x000000000041e156 in nanos6_unpacked_task_region_collect_plasma0 ()
#8  0x000000000041e1c3 in nanos6_ol_task_region_collect_plasma0 ()
#9  0x00007f1bc8bda65e in ExecutionWorkflow::executeTask(Task*, ComputePlace*, MemoryPlace*) ()
   from /usr/lib/libnanos6-optimized-linear-regions-fragmented.so
#10 0x00007f1bc8bb6dd8 in WorkerThread::handleTask(CPU*) () from /usr/lib/libnanos6-optimized-linear-regions-fragmented.so
#11 0x00007f1bc8bb753b in WorkerThread::body() () from /usr/lib/libnanos6-optimized-linear-regions-fragmented.so
#12 0x00007f1bc8bada21 in kernel_level_thread_body_wrapper(void*) () from /usr/lib/libnanos6-optimized-linear-regions-fragmented.so
#13 0x00007f1bc90b946f in start_thread () from /usr/lib/libpthread.so.0
#14 0x00007f1bc8fe93d3 in clone () from /usr/lib/libc.so.6

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.