
Comments (14)

jeffhammond commented on September 4, 2024

Reductions return the result at all PEs, so every PE must wait to return until the result is generated. Thus, it is impossible for the shmem_int_put to occur before dest is written with the result of the collective.

krzikalla commented on September 4, 2024

Thus there is an implicit barrier at the end of every reduction? Shouldn't this be stated somewhere in the spec to reduce the potential implementation space accordingly?

jdinan commented on September 4, 2024

naveen-rn commented on September 4, 2024

I'm not sure whether the 1.4 spec change resolves this issue. The new change only seems to clarify the use of the buffer by the local PE.

Thus, it is impossible for the shmem_int_put to occur before dest is written with the result of the collective.

In the original 2-PE example, this is correct. But say we have 4 PEs participating in the reduction, and PE-1 and PE-3 return from the reduction while PE-0 and PE-2 are still computing it. Then PE-1 or PE-3 can still modify the source/dest buffer on PE-0 or PE-2. FWIU there is no implicit barrier after the all-reduce; it is the user's responsibility to add an active-set-based barrier to get the intended behavior.
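
A minimal sketch of the kind of pattern at issue (hypothetical code, not the original reproducer; it uses the pre-1.5 active-set reduction API and assumes dest lives in symmetric memory):

```c
#include <shmem.h>

/* Symmetric data: these objects exist on every PE. */
long pSync[SHMEM_REDUCE_SYNC_SIZE];
int  pWrk[SHMEM_REDUCE_MIN_WRKDATA_SIZE];
int  source = 1;
int  dest   = 0;

int main(void)
{
    shmem_init();

    for (int i = 0; i < SHMEM_REDUCE_SYNC_SIZE; i++)
        pSync[i] = SHMEM_SYNC_VALUE;
    shmem_barrier_all();              /* pSync initialized on all PEs */

    /* All PEs participate; every PE's dest receives the sum. */
    shmem_int_sum_to_all(&dest, &source, 1, 0, 0, shmem_n_pes(),
                         pWrk, pSync);

    /* PE 1 may return from the reduction while PE 0 is still inside it,
     * so this put can land in PE 0's dest while PE 0 is still using the
     * buffer; this is the race discussed in this thread. */
    if (shmem_my_pe() == 1) {
        int one = 1;
        shmem_int_put(&dest, &one, 1, 0);
    }

    shmem_finalize();
    return 0;
}
```

Nothing in the reduction semantics forces PE 0 to have left the call before the put arrives.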

jdinan commented on September 4, 2024

Ahh, I didn't read the example closely enough. Yes, according to the specification that is a race. Completion of the reduction at PE 1 does not guarantee completion at any other PE. This is a race even for two PEs, since OpenSHMEM does not define an ordering between operations performed by PE 1 within the reduction (which could, for example, be bounce buffered and converted to non-blocking) and the subsequent put.

naveen-rn commented on September 4, 2024

This is a race even for two PEs

Yes, you are correct - even for 2 PEs this is a race.

jeffhammond commented on September 4, 2024

krzikalla commented on September 4, 2024

@jeffhammond: If the implementation of a reduction calculates it at a particular place, then it's actually an all-to-one followed by a one-to-all data dependency, isn't it? IMHO even then one PE could receive the result and proceed before another PE.

@jdinan: This clarification is a good start. I think a statement about remote memory accesses is still needed. Something like this:

"Accessing memory involved in a collective routine while the PE is processing that collective, results in undefined behavior. Since PEs can enter and exit collectives at different times, accessing such memory remotely requires some additional synchronization with the corresponding remote PE."

I guess someone can phrase it better.

jdinan commented on September 4, 2024

@jeffhammond Exit from all-reduce implies that all processes have reached the call to all-reduce, but it doesn't carry the barrier guarantee of ordering/completion of pending RMA operations.

@krzikalla This would be a good change. We likely need similar verbiage for all of the collectives.

jeffhammond commented on September 4, 2024

@jdinan I meant an execution barrier in the abstract sense of synchronizing all processes, not shmem_barrier. My previous post has been edited for clarity.

Do you really think we need to explicitly tell users that reductions do not synchronize RMA? If we are going to clarify anything, we should list all of the (few) operations that actually synchronize RMA, rather than note ones that do not.

AFAIK, the operations that remotely synchronize RMA in some way are shmem_fence, shmem_quiet, and shmem_barrier(_all).
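
Continuing the hypothetical sketch above, placing one of those calls between the collective and the remote access is enough; a global barrier is the simplest choice (an active-set shmem_barrier over the reduction's active set would also do, as noted earlier):

```c
/* Reuses the declarations from the hypothetical sketch above. */
shmem_int_sum_to_all(&dest, &source, 1, 0, 0, shmem_n_pes(),
                     pWrk, pSync);

/* No PE passes this point until every PE has exited the reduction,
 * and previously issued stores/puts are completed, so the put below
 * cannot land while PE 0 is still inside the collective. */
shmem_barrier_all();

if (shmem_my_pe() == 1) {
    int one = 1;
    shmem_int_put(&dest, &one, 1, 0);
}
```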

jdinan commented on September 4, 2024

I would hope that's not necessary. I think the change that @krzikalla suggested, which clarifies that no completion guarantees are made with respect to remote buffers, should cover it.

jdinan commented on September 4, 2024

Collectives section committee, please review and determine if any clarifications should be added.

davidozog commented on September 4, 2024

I think @krzikalla's clarifying statement is a very good one. I'd like to propose a few minor edits if that's ok:

Accessing symmetric memory involved in a collective routine while the PE is processing that collective results in undefined behavior. Since PEs can enter and exit collectives at different times, accessing such memory remotely may require some additional synchronization between communicating PEs.

"symmetric" memory because that's the problem this statement is tackling (right...?).
"may" because some implementation-specific collective algorithms are indeed synchronizing.
The last sentence was a little hard for me to parse as originally written, but I'm not quite sure the above is much of an improvement... 🤷‍♂

nspark commented on September 4, 2024

Closed by 1.5rc1

