
Comments (14)

jeffhammond commented on September 4, 2024

Reductions return the result at all PEs, so every PE must wait to return until the result is generated. Thus, it is impossible for the shmem_int_put to occur before dest is written with the result of the collective.

krzikalla commented on September 4, 2024

Thus there is an implicit barrier at the end of every reduction? Shouldn't this be stated somewhere in the spec to reduce the potential implementation space accordingly?

jdinan commented on September 4, 2024

naveen-rn commented on September 4, 2024

I'm not sure whether the 1.4 spec change resolves this issue. The new change only seems to clarify the use of the buffer by the local PE.

Thus, it is impossible for the shmem_int_put to occur before dest is written with the result of the collective.

In the original 2-PE example, this is correct. But say we have 4 PEs participating in the reduction, and PE-1 and PE-3 return from the reduction while PE-0 and PE-2 are still computing it. Then PE-1 or PE-3 can still modify the source/dest buffer on PE-0 or PE-2. FWIU there is no implicit barrier after the all-reduce; it is the user's responsibility to add an active-set-based barrier to get the intended behavior.
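
A minimal sketch of the kind of pattern at issue (hypothetical code, not the original reproducer; it uses the pre-1.5 active-set reduction API and assumes dest lives in symmetric memory):

```c
#include <shmem.h>

/* Symmetric data: these objects exist on every PE. */
long pSync[SHMEM_REDUCE_SYNC_SIZE];
int  pWrk[SHMEM_REDUCE_MIN_WRKDATA_SIZE];
int  source = 1;
int  dest   = 0;

int main(void)
{
    shmem_init();

    for (int i = 0; i < SHMEM_REDUCE_SYNC_SIZE; i++)
        pSync[i] = SHMEM_SYNC_VALUE;
    shmem_barrier_all();              /* pSync initialized on all PEs */

    /* All PEs participate; every PE's dest receives the sum. */
    shmem_int_sum_to_all(&dest, &source, 1, 0, 0, shmem_n_pes(),
                         pWrk, pSync);

    /* PE 1 may return from the reduction while PE 0 is still inside it,
     * so this put can land in PE 0's dest while PE 0 is still using the
     * buffer; this is the race discussed in this thread. */
    if (shmem_my_pe() == 1) {
        int one = 1;
        shmem_int_put(&dest, &one, 1, 0);
    }

    shmem_finalize();
    return 0;
}
```

Nothing in the reduction semantics forces PE 0 to have left the call before the put arrives.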

jdinan commented on September 4, 2024

Ahh, I didn't read the example closely enough. Yes, according to the specification that is a race. Completion of the reduction at PE 1 does not guarantee completion at any other PE. This is a race even for two PEs, since OpenSHMEM does not define an ordering between operations performed by PE 1 within the reduction (which could, for example, be bounce buffered and converted to non-blocking) and the subsequent put.

naveen-rn commented on September 4, 2024

This is a race even for two PEs

Yes, you are correct - even for 2 PEs this is a race.

jeffhammond commented on September 4, 2024

krzikalla commented on September 4, 2024

@jeffhammond: If the implementation of a reduction calculates it at a particular place, then it's actually an all-to-one followed by a one-to-all data dependency, isn't it? IMHO even then one PE could receive the result and proceed before another PE.

@jdinan: This clarification is a good start. I think a statement about remote memory accesses is still needed. Something like this:

"Accessing memory involved in a collective routine while the PE is processing that collective, results in undefined behavior. Since PEs can enter and exit collectives at different times, accessing such memory remotely requires some additional synchronization with the corresponding remote PE."

I guess someone can phrase it better.

jdinan commented on September 4, 2024

@jeffhammond Exit from all-reduce implies that all processes have reached the call to all-reduce, but it doesn't carry the barrier guarantee of ordering/completion of pending RMA operations.

@krzikalla This would be a good change. We likely need similar verbiage for all of the collectives.

jeffhammond commented on September 4, 2024

@jdinan I meant an execution barrier in the abstract sense of synchronizing all processes, not shmem_barrier. My previous post has been edited for clarity.

Do you really think we need to explicitly tell users that reductions do not synchronize RMA? If we are going to clarify anything, we should list all of the (few) operations that actually synchronize RMA, rather than note ones that do not.

AFAIK, the operations that remotely synchronize RMA in some way are shmem_fence, shmem_quiet, and shmem_barrier(_all).
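
Continuing the hypothetical sketch above, placing one of those calls between the collective and the remote access is enough; a global barrier is the simplest choice (an active-set shmem_barrier over the reduction's active set would also do, as noted earlier):

```c
/* Reuses the declarations from the hypothetical sketch above. */
shmem_int_sum_to_all(&dest, &source, 1, 0, 0, shmem_n_pes(),
                     pWrk, pSync);

/* No PE passes this point until every PE has exited the reduction,
 * and previously issued stores/puts are completed, so the put below
 * cannot land while PE 0 is still inside the collective. */
shmem_barrier_all();

if (shmem_my_pe() == 1) {
    int one = 1;
    shmem_int_put(&dest, &one, 1, 0);
}
```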

jdinan commented on September 4, 2024

I would hope that's not necessary. I think the change that @krzikalla suggested, which clarifies that no completion guarantees are made with respect to remote buffers, should cover it.

jdinan commented on September 4, 2024

Collectives section committee, please review and determine if any clarifications should be added.

davidozog commented on September 4, 2024

I think @krzikalla's clarifying statement is a very good one. I'd like to propose a few minor edits if that's ok:

Accessing symmetric memory involved in a collective routine while the PE is processing that collective results in undefined behavior. Since PEs can enter and exit collectives at different times, accessing such memory remotely may require some additional synchronization between communicating PEs.

"symmetric" memory because that's the problem this statement is tackling (right...?).
"may" because some implementation-specific collective algorithms are indeed synchronizing.
The last sentence was a little hard for me to parse as originally written, but I'm not quite sure the above is much of an improvement... 🤷‍♂

nspark commented on September 4, 2024

Closed by 1.5rc1

