
inducer / meshmode

High-order unstructured mesh representation and discrete function spaces

Home Page: https://documen.tician.de/meshmode/

Python 99.73% Shell 0.27%
python finite-element-methods finite-elements discretization mesh meshes discontinuous-galerkin opencl scientific-computing

meshmode's Introduction

meshmode: High-Order Meshes and Discontinuous Function Spaces


Meshmode provides the "boring bits" of high-order unstructured discretization, for simplices (triangles, tetrahedra) and tensor products (quads, hexahedra). Features:

  • 1/2/3D, line/surface/volume discretizations in each, curvilinear supported.
  • "Everything is a (separate) discretization." (mesh boundaries are, element surfaces are, refined versions of the same mesh are) "Connections" transfer information between discretizations.
  • Periodic connectivity.
  • Mesh partitioning (not just) for distributed execution (e.g. via MPI).
  • Interpolatory, quadrature (overintegration), and modal element-local discretizations.
  • Independent of execution environment (GPU/CPU, numpy, ...) via array contexts.
  • Simple mesh refinement (via bisection). Adjacency currently only maintained if uniform.
  • Input from Gmsh, visualization to VTK (both high-order curvilinear).
  • Easy data exchange with Firedrake.
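
To give a flavor, here is a minimal sketch of the basic flow, using the API current at the time of this page (names like PolynomialWarpAndBlendGroupFactory may have changed since):

import pyopencl as cl
from arraycontext import PyOpenCLArrayContext, thaw

from meshmode.mesh.generation import generate_regular_rect_mesh
from meshmode.discretization import Discretization
from meshmode.discretization.poly_element import \
        PolynomialWarpAndBlendGroupFactory
from meshmode.discretization.connection import (
        FACE_RESTR_ALL, make_face_restriction)

cl_ctx = cl.create_some_context()
queue = cl.CommandQueue(cl_ctx)
actx = PyOpenCLArrayContext(queue)

# a volume mesh and a degree-3 nodal discretization on it
mesh = generate_regular_rect_mesh(a=(0, 0), b=(1, 1), n=(8, 8))
group_factory = PolynomialWarpAndBlendGroupFactory(3)
discr = Discretization(actx, mesh, group_factory)

# "everything is a discretization": the element faces form one too,
# reached through a connection
faces_conn = make_face_restriction(actx, discr, group_factory, FACE_RESTR_ALL)

x = thaw(discr.nodes()[0], actx)    # x coordinates as a DOFArray
face_x = faces_conn(x)              # restricted to the face discretization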

Meshmode emerged as the shared discretization layer for pytential (layer potentials) and grudge (discontinuous Galerkin).


meshmode's People

Contributors

a-alveyblanc, alexfikl, cmikida2, gaohao95, hirish99, inducer, isuruf, kaushikcfd, majosm, matthiasdiener, mattwala, mtcam, shivamgupta2, thomasgibson


meshmode's Issues

Better support moving geometry

Some problems that currently crop up:

  • Recreating meshes is awkward. @alexfikl proposes Discretization.copy(mesh=...) (or .replace(mesh=...)). (which?) That seems easy and useful, but it requires remembering the group factory (probably OK). A hypothetical sketch follows this list.
  • Meshes can't be created efficiently. Their data structure involves numpy at the moment. Maybe meshes should start using array contexts, too?
  • Group factories need to recompute discretization node arrays. If the mesh already uses the right unit nodes, this can probably be avoided. (#135 )
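
Hypothetical sketch of the first bullet's proposal (nothing here exists yet; attribute names like _setup_actx and _group_factory are invented placeholders for state the object would have to remember):

class Discretization:
    ...

    def copy(self, *, actx=None, mesh=None, group_factory=None):
        # rebuild the discretization, replacing only what the caller passes
        return type(self)(
                actx if actx is not None else self._setup_actx,
                mesh if mesh is not None else self.mesh,
                group_factory if group_factory is not None
                else self._group_factory)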

cc @alexfikl

ArrayContext.np.sqrt fails for device scalars

Doing a norm-like operation

actx.np.sqrt(actx.np.sum(x**2))

doesn't work after the transformations from #236. It seems like actx.np.sqrt doesn't like getting a scalar from sum and fails in this check

if not isinstance(assignee, Subscript):
    raise ValueError("assignees in "
            "ElementwiseMapKernelTag-tagged kernels must be "
            "subscripts")

The same probably holds for any other elementwise operation in actx.np.
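
In the meantime, a possible workaround (just a sketch; it pulls the scalar to the host, which defeats device-side fusion) is:

import numpy as np

def norm_2(actx, x):
    # move the device scalar to the host before the sqrt, so actx.np.sqrt
    # never sees a non-subscript assignee
    return np.sqrt(actx.to_numpy(actx.np.sum(x**2)))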

Accelerate resample_by_picking through better ordering

resample_by_picking routinely shows up at the top of our GPU profiles. Here's an example from a run (mirgecom wave-eager, nel_1d = 24, 3D, order 3):

 GPU activities:   15.95%  1.72898s     28160  61.398us  3.4240us  133.22us  resample_by_picking
                   14.57%  1.57934s     26499  59.599us  4.4480us  117.47us  multiply
                   14.41%  1.56223s      9698  161.09us  16.000us  543.23us  diff
                   12.79%  1.38640s     21601  64.182us  4.4150us  118.56us  axpbyz
                   11.66%  1.26342s      5283  239.15us  120.54us  529.85us  grudge_assign_0
                    8.81%  954.48ms     23428  40.741us  1.6640us  81.888us  axpb
                    7.93%  859.20ms     10560  81.363us  60.831us  135.04us  resample_by_mat
                    7.84%  849.96ms      1760  482.93us  481.79us  541.57us  face_mass
                    2.16%  233.96ms        62  3.7735ms  1.3440us  11.178ms  [CUDA memcpy DtoH]
                    1.58%  171.51ms      2235  76.738us  1.6640us  87.391us  [CUDA memcpy DtoD]
                    1.19%  128.99ms      3523  36.612us  19.328us  37.952us  [CUDA memset]
                    0.49%  53.375ms       440  121.31us  120.67us  136.42us  grudge_assign_1
                    0.49%  53.364ms       440  121.28us  120.67us  136.19us  grudge_assign_2
                    0.07%  7.2597ms       127  57.162us  1.1520us  847.90us  [CUDA memcpy HtoD]
                    0.03%  3.5939ms        12  299.49us  13.696us  529.66us  nodes_0
                    0.01%  1.4677ms         6  244.62us  14.112us  528.73us  actx_special_sqrt
                    0.00%  359.81us         6  59.967us  5.0880us  115.14us  divide
                    0.00%  136.06us         1  136.06us  136.06us  136.06us  actx_special_exp

It's especially striking that it's at the top of the list because it touches lower-dimensional (surface rather than volume) data. multiply has a similar number of calls, but it touches volume data and completes more quickly.

I think there are two opportunities here that we could try:

  • Currently, the kernel has an indirection on the read and the write end (see the (very simple) source). For surjective/onto connections, we can do away with the indirection on write by appropriately sorting the source indices.
  • Even for non-surjective connections, it's likely that we would benefit by sorting by the write index, to try to keep the writes as coalesced as possible.

IMO it's likely that this will have a benefit (but I obviously can't guarantee it). I think it's worth trying.
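
For illustration, a numpy sketch of the first idea (the real kernel is a 2D loopy kernel over elements and DOFs; this shows just the index logic, flattened to 1D, for a surjective connection):

import numpy as np

def resample_by_picking_presorted(ary, from_indices, to_indices):
    # Sort the gather by target index; when to_indices covers every target
    # exactly once, the write becomes contiguous and the indirection
    # survives only on the read side. In practice from_indices[order]
    # would be precomputed once at connection-setup time.
    order = np.argsort(to_indices)
    return ary[from_indices[order]]

For non-surjective connections, the same argsort-by-write-index trick would at least keep nearby writes adjacent, per the second bullet.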

cc @lukeolson

Updating how we flatten DOF arrays in anticipation for lazy evaluation

This issue is specifically addressing how to update the flatten routine for DOFArrays, found here:

def _flatten_dof_array(ary: Any, strict: bool = True):
    if not isinstance(ary, DOFArray):
        if strict:
            raise TypeError(f"non-DOFArray type '{type(ary).__name__}' cannot "
                    "be flattened; use 'strict=False' to allow other types")
        else:
            return ary

    actx = ary.array_context
    if actx is None:
        raise ValueError("cannot flatten frozen DOFArrays")

    @memoize_in(actx, (_flatten_dof_array, "flatten_prg"))
    def prg():
        return make_loopy_program(
            "{[iel,idof]: 0<=iel<nelements and 0<=idof<ndofs_per_element}",
            """result[grp_start + iel*ndofs_per_element + idof] \
                    = grp_ary[iel, idof]""",
            name="flatten")

    group_sizes = [grp_ary.shape[0] * grp_ary.shape[1] for grp_ary in ary]
    group_starts = np.cumsum([0] + group_sizes)

    result = actx.empty(group_starts[-1], dtype=ary.entry_dtype)
    for grp_start, grp_ary in zip(group_starts, ary):
        actx.call_loopy(prg(),
                grp_ary=grp_ary,
                result=result,
                grp_start=grp_start)

    return result

In short, we need to remove stateful array updates, since they break lazy evaluation. The intent of #192 is to hunt down and update all instances of stateful array updates in meshmode. For the flattening routines specifically, this bit of code is the main culprit we need to get rid of:

result = actx.empty(group_starts[-1], dtype=ary.entry_dtype)
for grp_start, grp_ary in zip(group_starts, ary):
    actx.call_loopy(prg(),
            grp_ary=grp_ary,
            result=result,
            grp_start=grp_start)

What was originally proposed in #192 (which I really liked 😞) was:

def _flatten_dof_array(ary: Any, strict: bool = True):
    if not isinstance(ary, DOFArray):
        if strict:
            raise TypeError(f"non-DOFArray type '{type(ary).__name__}' cannot "
                    "be flattened; use 'strict=False' to allow other types")
        else:
            return ary

    actx = ary.array_context
    if actx is None:
        raise ValueError("cannot flatten frozen DOFArrays")

    return actx.np.concatenate([actx.np.reshape(grp_ary, -1)
                                for grp_ary in ary])

But it turns out this isn't what we want to do, since we want to maintain flexibility with regard to the memory layout of the array (x-ref: #192 (comment)).

I would appreciate some guidance on how to proceed now that #196 is in. cc @alexfikl, @inducer

Multiple boundary tags in meshmode export

.. warning::
Currently, no custom boundary tags are exported along with the mesh.
:mod:`firedrake` seems to only allow one marker on each facet, whereas
:mod:`meshmode` allows many.

FWIW, since firedrakeproject/firedrake#2007, entities can be marked with multiple tags and it all works. After building the plex, create a "Face Sets" label (or, if you don't like that literal, use firedrake.cython.dmcommon.FACE_SETS_LABEL; CELL_SETS_LABEL for cells) and mark the points, roughly as sketched below.
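
Rough sketch of that suggestion (petsc4py DMPlex API; plex and the (point, marker) pairs are placeholders for whatever the export code has in hand):

from firedrake.cython.dmcommon import FACE_SETS_LABEL

plex.createLabel(FACE_SETS_LABEL)          # the "Face Sets" label
for facet_point, marker in facet_markers:  # assumed iterable of (point, tag)
    plex.setLabelValue(FACE_SETS_LABEL, facet_point, marker)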

Two `resample_by_picking` in simple-dg TranslationUnit

I was just playing with (and being amazed by!) https://github.com/inducer/meshmode/pull/216 (specifically https://github.com/inducer/meshmode/pull/216/commits/1de7f8f8fc3518ead4cd0191adf2238078636543, and inducer/pytato@585576a).

If I run `python simple-dg.py --lazy`, set a breakpoint at https://github.com/inducer/arraycontext/blob/82b03d20849d9aac0f54cb4e9c00ba0f5908911c/arraycontext/impl/pytato/compile.py#L306, and then run

print(self.pytato_program.program)

the resulting TranslationUnit comes out with two copies of resample_by_picking that aren't visibly different:

>>> print(self.pytato_program.program)
---------------------------------------------------------------------------
KERNEL: resample_by_picking
---------------------------------------------------------------------------
ARGUMENTS:
ary: type: np:dtype('float64'), shape: (nelements_vec, n_from_nodes), dim_tags: (N1:stride:n_from_nodes, N0:stride:1), offset:  aspace: global
from_element_indices: type: np:dtype('int64'), shape: (nelements), dim_tags: (N0:stride:1), offset:  aspace: global
n_from_nodes: ValueArg, type: np:dtype('int32')
n_to_nodes: ValueArg, type: np:dtype('int64')
nelements: ValueArg, type: np:dtype('int64')
nelements_vec: ValueArg, type: np:dtype('int32')
pick_list: type: np:dtype('int32'), shape: (n_to_nodes), dim_tags: (N0:stride:1), offset:  aspace: global
result: type: np:dtype('float64'), shape: (nelements, n_to_nodes), dim_tags: (N1:stride:n_to_nodes, N0:stride:1), offset:  aspace: global
---------------------------------------------------------------------------
DOMAINS:
[nelements] -> { [iel] : 0 <= iel < nelements }
[n_to_nodes] -> { [idof] : 0 <= idof < n_to_nodes }
---------------------------------------------------------------------------
INAME IMPLEMENTATION TAGS:
idof: ConcurrentDOFInameTag()
iel: ConcurrentElementInameTag()
---------------------------------------------------------------------------
INSTRUCTIONS:
for idof, iel
    result[iel, idof] = ary[from_element_indices[iel], pick_list[idof]] if from_element_indices[iel] != -1 else 0  {id=insn}
end idof, iel
---------------------------------------------------------------------------
---------------------------------------------------------------------------
KERNEL: resample_by_picking_0
---------------------------------------------------------------------------
ARGUMENTS:
ary: type: np:dtype('float64'), shape: (nelements_vec, n_from_nodes), dim_tags: (N1:stride:n_from_nodes, N0:stride:1), offset:  aspace: global
from_element_indices: type: np:dtype('int32'), shape: (nelements), dim_tags: (N0:stride:1), offset:  aspace: global
n_from_nodes: ValueArg, type: np:dtype('int32')
n_to_nodes: ValueArg, type: np:dtype('int64')
nelements: ValueArg, type: np:dtype('int64')
nelements_vec: ValueArg, type: np:dtype('int32')
pick_list: type: np:dtype('int32'), shape: (n_to_nodes), dim_tags: (N0:stride:1), offset:  aspace: global
result: type: np:dtype('float64'), shape: (nelements, n_to_nodes), dim_tags: (N1:stride:n_to_nodes, N0:stride:1), offset:  aspace: global
---------------------------------------------------------------------------
DOMAINS:
[nelements] -> { [iel] : 0 <= iel < nelements }
[n_to_nodes] -> { [idof] : 0 <= idof < n_to_nodes }
---------------------------------------------------------------------------
INAME IMPLEMENTATION TAGS:
idof: ConcurrentDOFInameTag()
iel: ConcurrentElementInameTag()
---------------------------------------------------------------------------
INSTRUCTIONS:
for idof, iel
    result[iel, idof] = ary[from_element_indices[iel], pick_list[idof]] if from_element_indices[iel] != -1 else 0  {id=insn}
end idof, iel
---------------------------------------------------------------------------
(...SNIP...)

Any thoughts on why that might be?

cc @kaushikcfd @matthiasdiener

Adapt Firedrake interop to Firedrake complex branch

# All firedrake functions are the same dtype
dtype = self.firedrake_fspace().mesh().coordinates.dat.data.dtype
self._validate_field(mm_field, "mm_field", dtype=dtype)

seems to require all fields to agree with the geometry dtype, which seems overly restrictive given that, as of the complex merge, Firedrake supports complex data. Florian Bruckner reported this on Slack, with a backtrace like the following:

doublelayer_sphere.py:4: DeprecationWarning: meshmode.array_context is deprecated. Import this functionality from the arraycontext top-level package instead. This shim will remain working until 2022.
  from meshmode.array_context import PyOpenCLArrayContext
/firedrake/src/pytential/pytential/symbolic/primitives.py:1584: UserWarning: specified the name of the direction vector
  return DirectionalSourceDerivative(
/firedrake/src/sumpy/sumpy/kernel.py:1246: UserWarning: specified the name of the direction vector
  return type(kernel)(
/firedrake/src/firedrake/firedrake/external_operators/potential_evaluation/potentials.py:196: UserWarning: Functions in continuous function space will be projected to/from a 'Discontinuous Lagrange' space. Make sure any operators evaluated are continuous. Pass warn_if_cg=False to suppress this warning.
  warn("Functions in continuous function space will be projected "
/firedrake/src/loopy/loopy/target/execution.py:185: ParameterFinderWarning: Unable to generate code to automatically find 'nunit_dofs' from the shape of 'dst':
division with remainder in linear solve for 'nunit_dofs'
  warn("Unable to generate code to automatically "

Traceback (most recent call last):
  File "doublelayer_sphere.py", line 20, in <module>
    K = DoubleLayerPotential(u1, LaplaceKernel(dim=3), places, actx=actx, function_space=V, op_kwargs={'qbx_forced_limit': None})
  File "/firedrake/src/firedrake/firedrake/external_operators/abstract_external_operators.py", line 133, in evaluate
    return self._evaluate_action(x, *args, **kwargs)
  File "/firedrake/src/firedrake/firedrake/external_operators/potential_evaluation/potentials.py", line 144, in _evaluate_action
    return self._evaluate()
  File "/firedrake/src/firedrake/firedrake/external_operators/potential_evaluation/potentials.py", line 105, in _evaluate
    return self._eval_potential_operator(self.density, out=self)
  File "/firedrake/src/firedrake/firedrake/external_operators/potential_evaluation/potentials.py", line 99, in _eval_potential_operator
    return self.connection.to_firedrake(potential, out=out)
  File "/firedrake/src/firedrake/firedrake/external_operators/potential_evaluation/pytential.py", line 190, in to_firedrake
    self.target_to_meshmode_connection.from_meshmode(evaluated_potential)
  File "/firedrake/src/meshmode/meshmode/interop/firedrake/connection.py", line 515, in from_meshmode
    self._validate_field(mm_field, "mm_field", dtype=dtype)
  File "/firedrake/src/meshmode/meshmode/interop/firedrake/connection.py", line 356, in _validate_field
    check_dof_array(field, field_name)
  File "/firedrake/src/meshmode/meshmode/interop/firedrake/connection.py", line 349, in check_dof_array
    raise ValueError(f"'{arr_name}.entry_dtype' must be {dtype},"
ValueError: 'mm_field.entry_dtype' must be float64, not 'complex128'
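
One possible relaxation (a sketch of the general shape, not a vetted fix) would compare only the real component dtypes, so that complex128 fields pass over float64 geometry:

import numpy as np

def _real_dtype(dtype):
    # complex128 -> float64; real dtypes map to themselves
    return np.empty(0, dtype=dtype).real.dtype

def check_field_dtype(field_dtype, geometry_dtype):
    # accept a complex field whose real component matches the geometry dtype
    if _real_dtype(field_dtype) != np.dtype(geometry_dtype):
        raise ValueError(
                f"field dtype '{field_dtype}' is incompatible with "
                f"geometry dtype '{geometry_dtype}'")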

cc @benSepanski

Grid generation fails with dimensionally disparate npoints

It would be good if we could make rectangular grids with uniform (or arbitrary) spacing. The example here is a rectangle [0, 10] x [0, 1] in which uniform spacing is attempted, i.e. npts = (101, 11). The grid generator appears to fail to produce this grid.

x0 = 0.0
y0 = 0.0
x1 = 10.0
y1 = 1.0
nx = 101
ny = 11

npts = (nx, ny)
a = (x0, y0)
b = (x1, y1)

from meshmode.mesh.generation import generate_regular_rect_mesh
mesh = generate_regular_rect_mesh(a=a, b=b, n=npts)

make_face_restriction/DG face mass matrix: Handling elements with different-shaped faces (e.g. pyramids)

Way out there, I know, but @thomasgibson, @lukeolson and I were thinking about it today. Currently, the DG face mass matrix code in grudge assumes that for each volume element, one element per face is generated, winding up with nfaces * nelements elements, and that the elements for each face are numbered contiguously. This is not feasible for a pyramid, or any other element type that has more than one shape of face. (Because element groups are numbered contiguously, and the different-shaped face element must be in a different group.) With time, we should probably remove this assumption, and instead move to "face-major" numbering (from "volume-element-major" numbering currently). Unfortunately, this has an efficiency impact in DG, as the face mass matrix must then find its face data from (for e.g. tetrahedra) four different places in memory rather than one contiguous one.
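
To pin down the two terms (purely illustrative index arithmetic, following the naming in the paragraph above):

def volume_element_major(iel, iface, nfaces):
    # current scheme: all faces of one volume element sit together
    return iel * nfaces + iface

def face_major(iel, iface, nelements):
    # proposed scheme: all face elements with the same face number sit together
    return iface * nelements + iel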

Relevant source:

def make_face_restriction(actx, discr, group_factory, boundary_tag,

Modal / nodal tracking

Right now, it's possible to feed modal data to things that expect nodal data and vice versa. Everything is just a bunch of numbers in a DOFArray. Especially if @thomasgibson's entropy-stable work manages to stick, this will be a persistent source of user error. We should perhaps have the DOFArray remember the discretization_key for each group and check those on input to make sure they're as expected.
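
A hypothetical sketch of what that might look like (every name here is invented for illustration):

from dataclasses import dataclass
from typing import Any, Tuple

@dataclass(frozen=True)
class KeyedDOFArray:
    array: Any                              # the underlying DOFArray
    discretization_keys: Tuple[Any, ...]    # one discretization_key per group

    def check_expected(self, expected_keys):
        # fail loudly instead of silently mixing modal and nodal data
        if self.discretization_keys != tuple(expected_keys):
            raise ValueError(
                    f"discretization mismatch: got "
                    f"{self.discretization_keys}, "
                    f"expected {tuple(expected_keys)}")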

Tolerance too strict in _test_node_vertex_consistency_resampling

We have seen lots of errors like these:

"/Users/mdiener/Work/efuse/meshmode/meshmode/mesh/__init__.py", line 1207, in _test_node_vertex_consistency
    assert _test_node_vertex_consistency_resampling(mesh, mgrp, tol)
  File "/Users/mdiener/Work/efuse/meshmode/meshmode/mesh/__init__.py", line 1195, in _test_node_vertex_consistency_resampling
    assert max_el_vertex_error < tol*size, max_el_vertex_error
AssertionError: 1.144170793882154e-15

This is the code:

def _test_node_vertex_consistency_resampling(mesh, mgrp, tol):

Firedrake connection: upcoming changes to the way we handle orientations

Just FYI, we have a (longish, perhaps by October) plan to change the way we handle orientation of meshes on the Firedrake side. Rather than globally orienting the mesh such that every physical element maps to the same reference element, each physical element will be mapped onto the reference element through some member of the relevant dihedral group. This will then entail changes to the way one grabs dofs in canonical reference element order.

We haven't worked out all the details yet. The rationale for this is (in the short term somewhat prosaically) wanting to support round-tripping checkpoint files from P to Q processes. Longer term it will enable mixed cell types and even just unstructured hex meshes (where the globally consistent orientation tricks we pull are not always possible).

Sampling at a point in space

@dshtey2 is looking for a way to evaluate DOFArray-based fields at given spatial positions. This issue is intended for discussion of that functionality: what code might already exist for this, what needs to be added, etc.
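
For discussion, the usual recipe once the containing element and its unit coordinates are known (a numpy/modepy sketch assuming an interpolatory group, not an existing meshmode API):

import numpy as np
import modepy as mp

def eval_at_unit_point(grp, el_dofs, unit_xyz):
    # el_dofs: (ndofs,) nodal values on one element; unit_xyz: (dim,)
    basis = grp.basis_obj()
    vdm = mp.vandermonde(basis.functions, grp.unit_nodes)
    coeffs = np.linalg.solve(vdm, el_dofs)          # nodal -> modal
    point_vdm = mp.vandermonde(basis.functions, unit_xyz.reshape(-1, 1))
    return (point_vdm @ coeffs)[0]

The missing (and harder) pieces are locating the containing element and inverting the reference-to-physical map; cf. the Newton/Gauss-Newton node-finding discussion elsewhere in this tracker.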

cc @inducer

L2ProjectionInverseDiscretizationConnection should allow modal output

This could avoid the forced (and possibly un-needed) modal-to-nodal conversion at the end of the connection:

return DOFArray(
    actx,
    data=tuple(
        actx.einsum("ij,ej->ei",
                    vandermonde_matrix(grp),
                    c_i,
                    arg_names=("vdm", "coeffs"),
                    tagged=(FirstAxisIsElementsTag(),))
        for grp, c_i in zip(self.to_discr.groups, coefficients)
    )
)

I think this is easy to do since it receives a connection as input. It just needs to check whether the output side of the connection is modal, and if so, skip the extra conversion to nodal.
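
Something like this, inserted before the quoted return and reusing its local names (assuming ModalElementGroupBase is the marker for modal groups, as elsewhere in meshmode):

from meshmode.discretization import ModalElementGroupBase

if all(isinstance(grp, ModalElementGroupBase)
        for grp in self.to_discr.groups):
    # output side is already modal: the coefficients are the result,
    # no Vandermonde application needed
    return DOFArray(actx, data=tuple(coefficients))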

I noticed this while reviewing #192.

cc @alexfikl @thomasgibson

Should node finding use Newton or Gauss-Newton?

Upsides of Gauss-Newton:

  • More robust?
  • Gives us "the closest" if the exact point can't be found

Downside:

  • Conditioning penalty via the normal equations (the Gauss-Newton step works with J^T J, whose condition number is the square of the Jacobian's)

def _find_src_unit_nodes_via_gauss_newton(
        tgt_bdry_nodes,
        src_bdry_nodes,
        src_grp, src_mesh_grp,
        tgt_bdry_discr, src_bdry_discr,
        tol):
    dim = src_grp.dim
    _, nelements, ntgt_unit_nodes = tgt_bdry_nodes.shape

    initial_guess = np.mean(src_mesh_grp.vertex_unit_coordinates(), axis=0)
    src_unit_nodes = np.empty((dim, nelements, ntgt_unit_nodes))
    src_unit_nodes[:] = initial_guess.reshape(-1, 1, 1)

    import modepy as mp
    src_grp_basis_fcts = src_grp.basis_obj().functions
    vdm = mp.vandermonde(src_grp_basis_fcts, src_grp.unit_nodes)
    inv_t_vdm = la.inv(vdm.T)
    nsrc_funcs = len(src_grp_basis_fcts)

cc @xywei @alexfikl @majosm

Missing transforms on compose_index_maps in DirectDiscretizationConnection

See this FIXME:

# FIXME: Current arraycontext (2021-06-17, 9e5fb5d) does not map
# the iel_init iname to a GPU axis, leading this kernel to likely
# be very slow. Fortunately, this should only run during
# problem setup.
#
# cf. https://github.com/inducer/arraycontext/pull/29
# for a strategy that could/should be used instead.

Left over from #220.

x-ref: inducer/arraycontext#29, 03d883a

cc @matthiasdiener @thomasgibson

Spurious, (seemingly) very occasional NaN in

Last seen in https://github.com/inducer/meshmode/pull/213/checks?check_run_id=2770076958:

=================================== FAILURES ===================================
_ test_flatten_unflatten[<array context factory for <pyopencl.Device 'pthread-Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz' on 'Portable Computing Language'>] _
[gw0] linux -- Python 3.9.4 /home/runner/work/meshmode/meshmode/.miniforge3/envs/testing/bin/python3
Traceback (most recent call last):
  File "/home/runner/work/meshmode/meshmode/test/test_array.py", line 93, in test_flatten_unflatten
    assert flat_norm(c - c_round_trip) < 1.0e-8
AssertionError: assert nan < 1e-08
 +  where nan = flat_norm((MyContainer(name='flatten', mass=DOFArray((cl.Array([[0.00000000e+000],\n       [            nan],\n       [4.65971693e-...  1.38196601e-01,\n               -1.11022302e-16]]),))                                          ],\n      dtype=object)) - MyContainer(name='flatten', mass=DOFArray((cl.Array([[0.00000000e+000],\n       [            nan],\n       [4.65971693e-...  1.38196601e-01,\n               -1.11022302e-16]]),))                                          ],\n      dtype=object))))
=============================== warnings summary ===============================

See also #210.

cc @alexfikl

Surface refinement node-vertex consistency error on Icelake-client

Continued from the end of #118.

test$  rm *vtu;pycl test_refinement.py 'test_refine_surfaces(_acf,  "icosphere", True)' 
Traceback (most recent call last):
  File "/home/andreas/src/meshmode/test/test_refinement.py", line 360, in <module>
    exec(sys.argv[1])
  File "<string>", line 1, in <module>
  File "/home/andreas/src/meshmode/test/test_refinement.py", line 349, in test_refine_surfaces
    refined_mesh = refine_uniformly(mesh, 1)
  File "/home/andreas/src/meshmode/meshmode/mesh/refinement/__init__.py", line 991, in refine_uniformly
    refiner.refine_uniformly()
  File "/home/andreas/src/meshmode/meshmode/mesh/refinement/no_adjacency.py", line 61, in refine_uniformly
    return self.refine(flags)
  File "/home/andreas/src/meshmode/meshmode/mesh/refinement/no_adjacency.py", line 227, in refine
    new_mesh = Mesh(new_vertices, new_el_groups, is_conforming=(
  File "/home/andreas/src/meshmode/meshmode/mesh/__init__.py", line 779, in __init__
    assert _test_node_vertex_consistency(
  File "/home/andreas/src/meshmode/meshmode/mesh/__init__.py", line 983, in _test_node_vertex_consistency
    assert _test_node_vertex_consistency_resampling(mesh, mgrp, tol)
  File "/home/andreas/src/meshmode/meshmode/mesh/__init__.py", line 971, in _test_node_vertex_consistency_resampling
    assert max_el_vertex_error < tol*size, max_el_vertex_error
AssertionError: 9.901371953343942

Quite likely a numpy behavior change.

  • numpy 1.20 wheel: fine
  • numpy 1.20.1 wheel: fine
  • pip install --no-binary ":all:" numpy cython: (resulting in 1.20.1) error (only on my laptop, not on dunkel)

cc @alexfikl because it feels less lonely that way. :)

Mesh distribution fails when number of ranks >= 29

When distributing a mesh generated by generate_regular_rect_mesh, I'm encountering the following error when running on 29 or more MPI ranks:

(dg) bash-4.2$ srun -n 29 python -m mpi4py example.py
Traceback (most recent call last):
  File "<removed>/.miniconda3/envs/dg/lib/python3.8/runpy.py", line 193, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "<removed>/.miniconda3/envs/dg/lib/python3.8/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "<removed>/.miniconda3/envs/dg/lib/python3.8/site-packages/mpi4py/__main__.py", line 7, in <module>
    main()
  File "<removed>/.miniconda3/envs/dg/lib/python3.8/site-packages/mpi4py/run.py", line 196, in main
    run_command_line(args)
  File "<removed>/.miniconda3/envs/dg/lib/python3.8/site-packages/mpi4py/run.py", line 47, in run_command_line
    run_path(sys.argv[0], run_name='__main__')
  File "<removed>/.miniconda3/envs/dg/lib/python3.8/runpy.py", line 263, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "<removed>/.miniconda3/envs/dg/lib/python3.8/runpy.py", line 96, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "<removed>/.miniconda3/envs/dg/lib/python3.8/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "example.py", line 21, in <module>
    local_mesh = mesh_dist.send_mesh_parts(mesh, part_per_element, num_parts)
  File "<removed>/code/meshmode/meshmode/distributed.py", line 85, in send_mesh_parts
    parts = [partition_mesh(mesh, part_per_element, i)[0]
  File "<removed>/code/meshmode/meshmode/distributed.py", line 85, in <listcomp>
    parts = [partition_mesh(mesh, part_per_element, i)[0]
  File "<removed>/code/meshmode/meshmode/mesh/processing.py", line 163, in partition_mesh
    part_mesh = Mesh(
  File "<removed>/code/meshmode/meshmode/mesh/__init__.py", line 718, in __init__
    raise ValueError("too few bits in element_id_dtype to represent all "
ValueError: too few bits in element_id_dtype to represent all boundary tags
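
For context, my reading of the failure (the encoding details here are assumptions, so verify against the source): boundary tags, including one BTAG_PARTITION per neighboring rank, are encoded as a bit field within the signed element_id_dtype, so the tag budget is fixed while the tag count grows with the rank count:

import numpy as np

element_id_dtype = np.dtype(np.int32)            # the default
usable_bits = element_id_dtype.itemsize * 8 - 1  # sign bit excluded: 31
ntags = (29 - 1) + 4   # ~one BTAG_PARTITION per neighbor rank, plus an
                       # assumed handful of builtin tags
print(ntags > usable_bits)   # True -> "too few bits" at 29 ranks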

Below is a simplified version of the code that can be used to reproduce the error:

from mpi4py import MPI


comm = MPI.COMM_WORLD

num_parts = comm.Get_size()

from meshmode.distributed import MPIMeshDistributor, get_partition_by_pymetis
mesh_dist = MPIMeshDistributor(comm)

dim = 2

if mesh_dist.is_mananger_rank():
    from meshmode.mesh.generation import generate_regular_rect_mesh
    mesh = generate_regular_rect_mesh(a=(-0.5,)*dim,
                                      b=(0.5,)*dim,
                                      n=(32,)*dim)

    part_per_element = get_partition_by_pymetis(mesh, num_parts)

    local_mesh = mesh_dist.send_mesh_parts(mesh, part_per_element, num_parts)
else:
    local_mesh = mesh_dist.receive_mesh_part()

I also have the serialized mesh output from an attempted run with 29 ranks (make_mesh.py).

Making `quadrature_rule` a public method on relevant element groups

This is based on a discussion in inducer/grudge#62.

Currently NodalElementGroupBase has a public user-facing method weights. This is used internally in grudge when integrating on these elements, using the unit_nodes as quadrature points.

Should we move _quadrature_rule (found here and in similar places) out and add quadrature_rule as a public method on NodalElementGroupBase? This would return a modepy.Quadrature object that can be used for reasoning about quadrature accuracy (and integration, obviously 😄). A sketch of what that might look like follows.
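
What the public method might look like (hypothetical body, not from meshmode; intended as a method on NodalElementGroupBase):

import modepy as mp

def quadrature_rule(self) -> mp.Quadrature:
    # a generic "bag of nodes and weights"; exact_to stays undefined,
    # which is exactly the Problem described below
    return mp.Quadrature(self.unit_nodes, self.weights)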

I am in favor of this. However, we have a problem that we need to address.

Problem:
As briefly described in this issue, some element groups' quadrature rules are not represented by a concrete subclass of modepy.Quadrature (e.g. JacobiGaussQuadrature, LegendreGaussQuadrature, etc.). As a concrete example, consider subclasses of _MassMatrixQuadratureElementGroup. Grudge uses the weights method via the mass matrix and the interpolation nodes (unit_nodes) to compute using "mass matrix quadrature" (using @inducer's terminology here). It's easy enough to create a modepy.Quadrature object with the relevant weights/nodes, BUT when we have code that wants to check the order of accuracy via quad.exact_to, we run into problems.

Quick summary:
Creating a generic Quadrature object as a bag of nodes and weights lacks sufficient information to determine what exact_to needs to be. Currently, you can create a generic quadrature rule, but you cannot ask quad.exact_to (because it's NOT defined!)

If we want quadrature_rule to be a public-facing method, we should resolve the problem.

Refiner.get_current_mesh() raises TypeError when called

E.g.:

>>> r = Refiner(m)
>>> r.refine(np.ones(m.nelements))
32
>>> r.get_current_mesh()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/matt/src/meshmode/meshmode/mesh/refinement/__init__.py", line 239, in get_current_mesh
    len(self.last_mesh.vertices[0])),
TypeError: generate_nodal_adjacency() missing 1 required positional argument: 'groups'

Distributed setup calls _make_cross_face_batches once per face

Here. It definitely shouldn't: as the name possibly suggests, _make_cross_face_batches is supposed to be a batched routine, called once per rank pair. @MTCam spotted this when doing some parallel runs.

This is actually pretty bad, since it creates lots of small interpolation batches rather than one big one.

Should store per-interpolation-batch surjectivity information

In the (relatively common) surjective case, we could skip emitting these global barriers:

result[iel_init, idof_init] = 0 {id=init}
... gbarrier {id=barrier, dep=init}

result[iel_init, idof_init] = 0 {id=init}
... gbarrier {id=barrier, dep=init}

I noticed this while reviewing #192.

cc @thomasgibson

Make refiner work in 1D

@guptashvm: The refiner should also work in 1D, i.e. just some line segments strung together. Fortunately, the connectivity computation there should be effectively trivial. @mattwala is currently trying to make that happen. Any comments on gotchas and things that may not be straightforward?
