Comments (9)

lindsayad commented on August 25, 2024

@roystgnr, probably you or I should profile this at some point

from moose.

roystgnr commented on August 25, 2024

With the file as downloaded I get 23.3s runtime on one core, 18.2 on 2, 14.7 on 4, 13.4 on 8, 14.3 on 16. Obviously not optimal scaling. With a distributed mesh I get 28.6s on one core, 17.5 on 2, 10.0 on 4, 6.7 on 8, 5.2 on 16; not great, albeit much better.
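To make the gap concrete, the timings quoted above can be turned into speedup and parallel efficiency. This is only a minimal sketch: the dictionary names and the `scaling_table` helper are illustrative, and only the timings themselves come from the measurements above.

```python
# Wall-clock times in seconds, keyed by rank count, copied from the comment above.
replicated = {1: 23.3, 2: 18.2, 4: 14.7, 8: 13.4, 16: 14.3}
distributed = {1: 28.6, 2: 17.5, 4: 10.0, 8: 6.7, 16: 5.2}


def scaling_table(timings):
    """Return {ranks: (speedup, efficiency)} relative to the 1-rank run."""
    t1 = timings[1]
    table = {}
    for ranks, t in sorted(timings.items()):
        speedup = t1 / t
        # Efficiency is the fraction of ideal (linear) speedup achieved.
        table[ranks] = (speedup, speedup / ranks)
    return table


if __name__ == "__main__":
    for name, timings in (("replicated", replicated), ("distributed", distributed)):
        print(name)
        for ranks, (speedup, eff) in scaling_table(timings).items():
            print(f"  {ranks:>2} ranks: speedup {speedup:4.2f}x, efficiency {eff:5.1%}")
```

On 16 ranks this works out to roughly 1.6x for the replicated mesh versus 5.5x for the distributed mesh.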

And ... qualitatively, none of that really surprises me? The key word in ReplicatedMesh is Replicated; there's a whole separate copy on every rank. On a distributed mesh, if you use twice as many ranks, each has a little over half as many elements (or a lot over; on 16 processors we're at ~6500 each, and I'll bet there's a non-negligible ratio ghosted at that point), so modifying the mesh means every processor is doing a little over half as much work, so in the best case scenario that happens twice as fast, though Amdahl's Law is real and eventually bites you somewhere.

But on a non-distributed mesh (ReplicatedMesh in this case, but a serialized DistributedMesh has the same problem), every processor still has to work on all 100000 elements, and in the best case scenario using twice as many processors means that happens ... exactly as fast. No speedup at all. Even if some part of the calculation is O(N_elem/N_procs), anything that has to iterate over the whole mesh will be O(N_elem) and that's what dominates.
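The argument above amounts to a per-rank cost model. Here is a toy sketch of it, assuming per-rank work is simply proportional to the number of elements a rank has to touch and ignoring communication entirely. The function names and the `ghost_fraction` knob are hypothetical, not anything from libMesh or MOOSE.

```python
def replicated_work(n_elem, n_procs):
    """Elements each rank must touch when every rank holds the whole mesh."""
    return n_elem  # independent of n_procs: a full copy lives on every rank


def distributed_work(n_elem, n_procs, ghost_fraction=0.0):
    """Elements each rank touches when the mesh is partitioned.

    ghost_fraction is a hypothetical knob: ghosted elements as a fraction of
    the locally owned ones. A single rank has no ghosts.
    """
    local = n_elem / n_procs
    if n_procs == 1:
        return local
    return local * (1.0 + ghost_fraction)


def ideal_speedup(work_fn, n_elem, n_procs, **kwargs):
    """Best-case speedup over one rank under this model."""
    return work_fn(n_elem, 1, **kwargs) / work_fn(n_elem, n_procs, **kwargs)


if __name__ == "__main__":
    n_elem = 100000
    for p in (1, 2, 4, 8, 16):
        print(p,
              ideal_speedup(replicated_work, n_elem, p),
              ideal_speedup(distributed_work, n_elem, p, ghost_fraction=0.04))
```

The 0.04 ghost fraction is an assumed value, picked only because it puts 16 ranks at about 6500 elements each, in line with the figure quoted above. Under this model the replicated speedup is pinned at 1 for any rank count, while the distributed speedup falls a little short of the rank count.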

We changed the name SerialMesh to ReplicatedMesh at one point to avoid misleading users into thinking that you couldn't still do parallel computations on one, but perhaps that name now overstates the amount of parallelism possible, especially when users want to change the mesh frequently? I originally read this issue title as worrying about scalability with a distributed mesh, because that's what makes sense to worry about; scalable modification of a replicated data structure is mathematically impossible.

While I was worrying, I noticed there's a ton we might still do on DistributedMesh performance, though:

We should probably benchmark the libMesh --with-mapvector-chunk-size=64 (or 40, 16; we'd try a few) option with MOOSE. I get a few percent speedup on libMesh DistributedMesh benchmarks with it, and if memory benchmarking and larger scale benchmarking agrees with my timing benchmarks then it'd make a better default than =1.

We really need (this is on my TODO list but won't be done before fall) a better way to say "this mesh has changed, but only in these ways". We're spending about 1/3 of our distributed-mesh runtime on repartitioning, not because we're doing that at every time step, but because our mesh generators can't currently seem to so much as set a subdomain without calling prepare_the_kitchen_sink_for_use() afterwards! (This will help ReplicatedMesh performance noticeably too.)

libMesh build_cube() should be done fully-distributed on a distributed mesh. There's a distributed Cartesian mesh builder in MOOSE but we don't need its features for most users, we just need drop-in-replacement simplicity with scalability.

I've never tried to optimize DistributedMesh::renumber_nodes_and_elements(), but maybe that was a mistake? I've never seen it show up on perf logging before but here it's like 20% in the 16-rank case.

hugary1995 commented on August 25, 2024

Thanks @roystgnr, your response is always very thorough!

I think I follow most of what you said. In our use case, we are able to use a userobject to figure out, on each rank, which local elements should change subdomain ID. In an ideal world, it would be best to tell the replicated mesh "only these local elements have changed" in a meshChanged() call. But I gather that's not possible as of now. Even if it were possible, a parallel mesh sync would probably dominate the overhead.

So I guess the conclusion for us is that we really should use distributed mesh...

hugary1995 commented on August 25, 2024

If you have profiled a replicated mesh case, do you mind sharing the profiling results with us, e.g. the top 20 entries?

roystgnr commented on August 25, 2024

> But I gather that's not possible as of now.

It's totally possible! The trouble is that, either you sync the changes to ghosted elements (i.e. all elements, on a ReplicatedMesh) on every other rank, or what you have afterward is neither Replicated nor Distributed, it's just Broken.
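A toy illustration of that failure mode, with plain Python lists standing in for per-rank mesh copies. Nothing here is libMesh or MOOSE API: the helper names, the round-robin ownership, and the serial "sync" loop are all assumptions made for the sketch.

```python
def apply_local_changes(copies, owner, changes):
    """Each rank applies only the subdomain changes for elements it owns.

    copies:  list of per-rank lists, copies[r][e] = subdomain ID of element e
             as seen by rank r (a full copy on every rank, as on a replicated mesh)
    owner:   owner[e] = rank that owns element e
    changes: {element: new subdomain ID}
    """
    for elem, new_id in changes.items():
        copies[owner[elem]][elem] = new_id


def sync_changes(copies, owner):
    """Copy each element's value from its owning rank to every other rank."""
    for elem in range(len(owner)):
        value = copies[owner[elem]][elem]
        for rank_copy in copies:
            rank_copy[elem] = value


def is_consistent(copies):
    """True when every rank sees the same subdomain IDs."""
    return all(c == copies[0] for c in copies)


if __name__ == "__main__":
    n_ranks, n_elem = 4, 8
    owner = [e % n_ranks for e in range(n_elem)]     # hypothetical round-robin ownership
    copies = [[0] * n_elem for _ in range(n_ranks)]  # every rank starts identical

    apply_local_changes(copies, owner, {1: 7, 6: 7})
    print(is_consistent(copies))  # False: only the owning ranks saw the change

    sync_changes(copies, owner)
    print(is_consistent(copies))  # True: all copies agree again
```

Note that the sync step touches every element on every rank, which is the O(N_elem) cost that a replicated mesh cannot avoid.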

> So I guess the conclusion for us is that we really should use distributed mesh...

Yup. I get that that's a scary conclusion, though. Distributed mesh code for mesh modification is usually harder to write, and if you want the full scalability you need your runs to only (or at least only at full scale) be using distributed-aware code, whereas a lot of code is still either incompatible with a distributed mesh or is only "compatible" by temporarily serializing and thereby breaking your scalability (except where it's done on a small scale, e.g. on a 2D mesh that'll later be extruded). In my runs I think the only non-distributed code I saw was build_cube(), and on just 16 cores that ended up taking up like 30% of CPU time!

roystgnr commented on August 25, 2024

I haven't properly profiled, just looked at libMesh PerfLog and glanced at MOOSE output, attached here:
replicated16.txt

hugary1995 commented on August 25, 2024

Thanks! I hope 10 years from now MOOSE will be fully, fully compatible with distributed mesh.

I am okay with closing this issue. Or if you want to keep it open, feel free to change the label to mark it as a feature request.

lindsayad commented on August 25, 2024

I would say it's pretty compatible. But the core MOOSE team is unlikely to be able to add distributed mesh support to all the features provided by the community. It would be great if distributed mesh support for CoupledVarThresholdElementSubdomainModifier came from the community.

hugary1995 commented on August 25, 2024

> It would be great if distributed mesh support for CoupledVarThresholdElementSubdomainModifier came from the community

Most test cases for CoupledVarThresholdElementSubdomainModifier run with distributed mesh, except those with AMR. See https://github.com/idaholab/moose/blob/next/test/tests/userobjects/element_subdomain_modifier/tests

We did look at the scaling of individual functions in CoupledVarThresholdElementSubdomainModifier: with a replicated mesh, all timed sections look okay except the meshChanged() call, hence this discussion.