pytreegrav

Introduction

pytreegrav is a package for computing the gravitational potential and/or field of a set of particles. It includes methods both for brute-force direct summation and for the fast, approximate Barnes-Hut treecode method. For the Barnes-Hut method we implement an oct-tree as a numba jitclass, achieving much higher performance than an equivalent pure Python implementation without writing a single line of C or Cython. Full documentation is available on Read the Docs.
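
To give a flavor of the jitclass technique, here is a minimal, self-contained toy example (an illustration only, not pytreegrav's actual octree): numba compiles the class's methods to machine code, so tight loops over arrays run at compiled speed.

import numpy as np
from numba import float64, int64
from numba.experimental import jitclass

spec = [("centers", float64[:, :]), ("n", int64)]

@jitclass(spec)
class ToyNodes:
    def __init__(self, centers):
        # a toy container holding 3D node centers
        self.centers = centers
        self.n = centers.shape[0]

    def nearest(self, point):
        # brute-force nearest-center search, running at compiled speed
        best = -1
        best_d2 = np.inf
        for i in range(self.n):
            d2 = 0.0
            for k in range(3):
                dx = self.centers[i, k] - point[k]
                d2 += dx * dx
            if d2 < best_d2:
                best = i
                best_d2 = d2
        return best

nodes = ToyNodes(np.random.rand(100, 3))
print(nodes.nearest(np.array([0.5, 0.5, 0.5])))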

Installation

pip install pytreegrav or clone the repo and run python setup.py install from the repo directory.
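
If you want to verify the install, a quick sanity check using only the functions shown in this walkthrough:

import numpy as np
from pytreegrav import Accel
print(Accel(np.random.rand(10, 3), np.ones(10))) # should print a 10x3 array of accelerations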

Walkthrough

First let's import the stuff we want and generate some particle positions and masses - these would be your particle data for whatever your problem is.

import numpy as np
from pytreegrav import Accel, Potential
N = 10**5 # number of particles
x = np.random.rand(N,3) # positions randomly sampled in the unit cube
m = np.repeat(1./N,N) # masses - let the system have unit mass
h = np.repeat(0.01,N) # softening radii - these are optional, assumed 0 if not provided to the frontend functions

Now we can use the Accel and Potential functions to compute the gravitational field and potential at each particle position:

print(Accel(x,m,h))
print(Potential(x,m,h))
[[-0.1521787   0.2958852  -0.30109005]
 [-0.50678204 -0.37489886 -1.0558666 ]
 [-0.24650087  0.95423467 -0.175074  ]
 ...
 [ 0.87868472 -1.28332176 -0.22718531]
 [-0.41962742  0.32372245 -1.31829084]
 [ 2.45127054  0.38292881  0.05820412]]
[-2.35518057 -2.19299372 -2.28494218 ... -2.11783337 -2.1653377
 -1.80464695]

By default, pytreegrav will try to make the optimal choice between brute-force and tree methods for speed, but we can also force it to use one method or another. Let's try both and compare their runtimes:

from time import time
t = time()
# tree gravitational acceleration
accel_tree = Accel(x,m,h,method='tree')
print("Tree accel runtime: %gs"%(time() - t)); t = time()

accel_bruteforce = Accel(x,m,h,method='bruteforce')
print("Brute force accel runtime: %gs"%(time() - t)); t = time()

phi_tree = Potential(x,m,h,method='tree')
print("Tree potential runtime: %gs"%(time() - t)); t = time()

phi_bruteforce = Potential(x,m,h,method='bruteforce')
print("Brute force potential runtime: %gs"%(time() - t)); t = time()
Tree accel runtime: 0.927745s
Brute force accel runtime: 44.1175s
Tree potential runtime: 0.802386s
Brute force potential runtime: 20.0234s

As you can see, the tree-based methods can be much faster than the brute-force methods, especially for particle counts exceeding 10^4. Here's an example of how much faster the treecode is when run on a Plummer sphere with a variable number of particles, on a single core of an Intel i9 9900K workstation (see the benchmark figure in the repository README).
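
If you want to reproduce a rough version of that scaling comparison on your own machine, here is a minimal sketch (your timings will differ, and the first call to each method includes numba's JIT compilation time):

from time import time
import numpy as np
from pytreegrav import Accel

for N in (10**3, 10**4, 10**5):
    xs = np.random.rand(N, 3)
    ms = np.repeat(1.0 / N, N)
    for method in ("tree", "bruteforce"):
        t = time()
        Accel(xs, ms, method=method)  # brute force gets slow at large N
        print("N=%d, %s: %gs" % (N, method, time() - t))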

But there's no free lunch here: the tree methods are approximate. Let's quantify the RMS errors of the stuff we just computed, compared to the exact brute-force solutions:

acc_error = np.sqrt(np.mean(np.sum((accel_tree-accel_bruteforce)**2,axis=1))) # RMS force error
print("RMS force error: ", acc_error)
phi_error = np.std(phi_tree - phi_bruteforce)
print("RMS potential error: ", phi_error)
RMS force error:  0.006739311224338851
RMS potential error:  0.0003888328578588027

The above errors are typical for default settings: ~1% force error and ~0.1% potential error. The error in the tree approximation is controlled by the Barnes-Hut opening angle theta, set to 0.7 by default. Smaller theta gives higher accuracy, but also runs slower:

thetas = 0.1,0.2,0.4,0.8 # different thetas to try
for theta in thetas:
    t = time()    
    accel_tree = Accel(x,m,h,method='tree',theta=theta)
    acc_error = np.sqrt(np.mean(np.sum((accel_tree-accel_bruteforce)**2,axis=1)))
    print("theta=%g Runtime: %gs RMS force error: %g"%(theta, time()-t, acc_error))
theta=0.1 Runtime: 63.1738s RMS force error: 3.78978e-05
theta=0.2 Runtime: 14.3356s RMS force error: 0.000258755
theta=0.4 Runtime: 2.91292s RMS force error: 0.00148698
theta=0.8 Runtime: 0.724668s RMS force error: 0.0105937
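
One practical pattern is to pick the largest (and hence fastest) theta that still meets an error budget, reusing the brute-force reference computed above. A sketch, where the 10^-3 budget is an arbitrary example value:

target_error = 1e-3 # example error budget - choose for your application
for theta in sorted(thetas, reverse=True):
    accel_tree = Accel(x, m, h, method='tree', theta=theta)
    acc_error = np.sqrt(np.mean(np.sum((accel_tree - accel_bruteforce)**2, axis=1)))
    if acc_error < target_error:
        print("theta=%g meets the %g error budget" % (theta, target_error))
        break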

Both brute-force and tree-based calculations can be parallelized across all available logical cores via OpenMP, by specifying parallel=True. This can speed things up considerably, with parallel scaling that will vary with your core count and particle number:

from time import time
t = time()
# tree gravitational acceleration
accel_tree = Accel(x,m,h,method='tree',parallel=True)
print("Tree accel runtime in parallel: %gs"%(time() - t)); t = time()

accel_bruteforce = Accel(x,m,h,method='bruteforce',parallel=True)
print("Brute force accel runtime in parallel: %gs"%(time() - t)); t = time()

phi_tree = Potential(x,m,h,method='tree',parallel=True)
print("Tree potential runtime in parallel: %gs"%(time() - t)); t = time()

phi_bruteforce = Potential(x,m,h,method='bruteforce',parallel=True)
print("Brute force potential runtime in parallel: %gs"%(time() - t)); t = time()
Tree accel runtime in parallel: 0.222271s
Brute force accel runtime in parallel: 7.25576s
Tree potential runtime in parallel: 0.181393s
Brute force potential runtime in parallel: 5.72611s
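
If you need to limit how many cores are used (e.g. on a shared node), one option is to cap numba's thread count before the call. This sketch assumes pytreegrav's parallel kernels inherit numba's threading settings:

import numba
numba.set_num_threads(4) # use at most 4 threads
accel_tree = Accel(x, m, h, method='tree', parallel=True)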

What if I want to evaluate the fields at different points than where the particles are?

We've got you covered. The Target functions do exactly this: you specify separate sets of points for the particle positions and the field evaluation, and everything otherwise works exactly the same (including optional parallelization and choice of solver):

from pytreegrav import AccelTarget, PotentialTarget

# generate a separate set of "target" positions where we want to know the potential and field
N_target = 10**4
x_target = np.random.rand(N_target,3)
h_target = np.repeat(0.01,N_target) # optional "target" softening: this sets a floor on the softening length of all forces/potentials computed

accel_tree = AccelTarget(x_target, x, m, h_target=h_target, h_source=h, method='tree') # we provide the points/masses/softenings we generated before as the "source" particles
accel_bruteforce = AccelTarget(x_target, x, m, h_source=h, method='bruteforce')

acc_error = np.sqrt(np.mean(np.sum((accel_tree - accel_bruteforce)**2, axis=1))) # RMS force error
print("RMS force error: ", acc_error)

phi_tree = PotentialTarget(x_target, x, m, h_target=h_target, h_source=h, method='tree')
phi_bruteforce = PotentialTarget(x_target, x, m, h_target=h_target, h_source=h, method='bruteforce')

phi_error = np.std(phi_tree - phi_bruteforce)
print("RMS potential error: ", phi_error)
RMS force error:  0.006719983300560105
RMS potential error:  0.0003873676304955059
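
The Target functions are handy for making maps. For example, a sketch of evaluating the potential on a regular grid through the z = 0.5 midplane of our unit cube (the 64x64 grid size is an arbitrary choice):

Ngrid = 64
grid = np.linspace(0, 1, Ngrid)
X, Y = np.meshgrid(grid, grid)
grid_points = np.column_stack([X.ravel(), Y.ravel(), np.full(X.size, 0.5)]) # z=0.5 slice
phi_map = PotentialTarget(grid_points, x, m, h_source=h, method='tree').reshape(Ngrid, Ngrid)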

Ray-tracing

pytreegrav's octree implementation can be used for efficient tree-based searches for ray-tracing of unstructured data. Currently implemented is the method ColumnDensity, which calculates the integral of the density field to infinity along a grid of rays originating at each particle (defaulting to 6 rays). For example:

from pytreegrav import ColumnDensity

columns = ColumnDensity(x, m, h, parallel=True) # shape (N,6) array of column densities in 6 angular bins - this is fastest but least accurate
columns_10 = ColumnDensity(x, m, h, rays=10, parallel=True) # shape (N,10) array of column densities along 10 rays
columns_random = ColumnDensity(x, m, h, randomize_rays=True, parallel=True) # randomize the ray grid for each particle so that there are no correlated errors due to the angular discretization
columns_custom = ColumnDensity(x, m, h, rays=np.random.normal(size=(100,3)), parallel=True) # can also pass an arbitrary set of rays for the ray grid; these need not be normalized
κ = 0.02 # example opacity, in code units
σ = m * κ # total cross-section of each particle is the product of its mass and opacity
𝛕 = ColumnDensity(x, σ, h, parallel=True) # passing cross-section instead of mass gives optical depth
𝛕_eff = -np.log(np.exp(-𝛕.clip(-300,300)).mean(axis=1)) # effective optical depth that would give the same radiation flux from a background; note the clipping because overflow is not uncommon here
Σ_eff = 𝛕_eff / κ # effective column density *for this opacity*, in code mass / code length^2
X_H = 0.71 # hydrogen mass fraction (example value)
m_p = 1.67262192e-24 # proton mass in grams - convert to your code mass unit as appropriate
NH_eff = Σ_eff * X_H / m_p # column density of H nuclei, in code length^-2

Community

This code is actively developed and maintained by Mike Grudic.

If you would like help using pytreegrav, please ask a question on our Discussions page.

If you have found a bug or an issue using pytreegrav, please open an issue.

pytreegrav's People

Contributors

agurvich, bwkeller, dfm, martinberoiz, mikegrudic, mikerijkenz

pytreegrav's Issues

Segmentation fault with `Potential()`

Hi, thank you very much for making pytreegrav!
I've come across an issue with pytreegrav and would appreciate some help.

Problem

I get a segmentation fault from Potential().

What is mysterious is that larger arrays don't lead to a segmentation fault, while smaller arrays do.

What I did & received

  • The code outputs
Kinetic complete.  
Segmentation fault (core dumped)  
  • To identify where the segmentation fault occurs, I put print("Kinetic complete.") and print("Potential complete.") in the code. I see the first but not the second, so I think Potential() causes the segmentation fault.

  • I added del and gc.collect(), but it made no difference.

  • I tried sys.settrace and gdb python, referring to https://stackoverflow.com/questions/10035541/what-causes-a-python-segmentation-fault, but I couldn't use them well.
    I just put sys.settrace(None) in the first block of my code, and (gdb) backtrace returned

#0  0x00007fffbd5e6bd0 in pytreegrav::octree::Octree::BuildTree$2415(instance::jitclass::Octree$237fffd6b84c10$3cSizes$3aarray$28float64$2c$201d$2c$20A$29$2cDeltas$3aarray$28float64$2c$201d$2c$20A$29$2cCoordinates$3aarray$28float64$2c$202d$2c$20A$29$2cMasses$3aarray$28float64$2c$201d$2c$20A$29$2cQuadrupoles$3aarray$28float64$2c$203d$2c$20A$29$2cHasQuads$3abool$2cNumParticles$3aint64$2cNumNodes$3aint64$2cSoftenings$3aarray$28float64$2c$201d$2c$20A$29$2cNextBranch$3aarray$28int64$2c$201d$2c$20A$29$2cFirstSubnode$3aarray$28int64$2c$201d$2c$20A$29$2cTreewalkIndices$3aarray$28int64$2c$201d$2c$20A$29$3e, Array<double, 2, C, mutable, aligned>, Array<double, 1, C, mutable, aligned>, Array<double, 1, C, mutable, aligned>) ()

#1  0x00007fffbd00a5de in pytreegrav::octree::Octree::__init__$2414(instance::jitclass::Octree$237fffd6b84c10$3cSizes$3aarray$28float64$2c$201d$2c$20A$29$2cDeltas$3aarray$28float64$2c$201d$2c$20A$29$2cCoordinates$3aarray$28float64$2c$202d$2c$20A$29$2cMasses$3aarray$28float64$2c$201d$2c$20A$29$2cQuadrupoles$3aarray$28float64$2c$203d$2c$20A$29$2cHasQuads$3abool$2cNumParticles$3aint64$2cNumNodes$3aint64$2cSoftenings$3aarray$28float64$2c$201d$2c$20A$29$2cNextBranch$3aarray$28int64$2c$201d$2c$20A$29$2cFirstSubnode$3aarray$28int64$2c$201d$2c$20A$29$2cTreewalkIndices$3aarray$28int64$2c$201d$2c$20A$29$3e, Array<double, 2, C, mutable, aligned>, Array<double, 1, C, mutable, aligned>, Array<double, 1, C, mutable, aligned>, bool, bool) ()

#2  0x00007fffbcfe8151 in $3cdynamic$3e::ctor$2413(Array<double, 2, C, mutable, aligned>, Array<double, 1, C, mutable, aligned>, Array<double, 1, C, mutable, aligned>, bool, bool) ()

#3  0x00007fffbcfe863c in cpython::$3cdynamic$3e::ctor$2413(Array<double, 2, C, mutable, aligned>, Array<double, 1, C, mutable, aligned>, Array<double, 1, C, mutable, aligned>, bool, bool) ()

... (long backtrace)  

#46 0x000055555573afe5 in _start () at ../sysdeps/x86_64/elf/start.S:103

(gdb) info locals returned No symbol table info available.

Script

It's long, so I've cut out some parts of the code.

from pytreegrav import Potential as ptgPotential

...(cut)

for j,i in enumerate(snaplist):
    with h5py.File(datafile, 'r') as f:
        pos_DM = np.asarray(f["PartType1/Coordinates"])
        vel_DM = np.asarray(f["PartType1/Velocities"], dtype="float64") * 1e-2  # kpc/Myr
        mas_DM = np.asarray(f["PartType1/Masses"]) * 1e8  # M_sun
        IDs_DM = np.asarray(f["PartType1/ParticleIDs"], dtype="int32")
    
    # Withdraw satellite particles having been bound in previous snapshot
#------------------------------#
#    pos_DM_sate_bnd = pos_DM
#    vel_DM_sate_bnd = vel_DM
#    mas_DM_sate_bnd = mas_DM
#    IDs_DM_sate_bnd = IDs_DM
#    num_DM_sate_bnd = len(IDs_DM_sate_bnd)
#------------------------------#
    idxs_DM_sate_bnd = np.empty_like(IDs_DM_sate_bnd)
    libc1.get_bound_idxs(IDs_DM, IDs_DM_sate_bnd, idxs_DM_sate_bnd)
    pos_DM_sate_bnd = pos_DM[idxs_DM_sate_bnd,:]
    vel_DM_sate_bnd = vel_DM[idxs_DM_sate_bnd,:]
    mas_DM_sate_bnd = mas_DM[idxs_DM_sate_bnd]
    IDs_DM_sate_bnd = IDs_DM[idxs_DM_sate_bnd]
    num_DM_sate_bnd = len(idxs_DM_sate_bnd)
#------------------------------#
    del pos_DM; del vel_DM; del mas_DM; del IDs_DM; gc.collect()


# Not important section from here...  
    # Calculate BcV
    BcV_DM_sate_bnd = calc_CoM(vel_DM_sate_bnd, num_DM_sate_bnd)

    # Calculate kinetic energies
    num_DM_sate_bnd_ctp = ctypes.c_int(num_DM_sate_bnd)
    mass_sate_ctp = ctypes.c_double(mas_DM_sate_bnd[0])
    kin_ene = np.empty(num_DM_sate_bnd, dtype="float64")
    libc1.calc_kin_ene( vel_DM_sate_bnd - BcV_DM_sate_bnd, mass_sate_ctp, num_DM_sate_bnd_ctp, kin_ene)
    del vel_DM_sate_bnd; gc.collect()
    print("Kinetic complete.")
# to here.
    

    # Calculate potential energies
    eps_DM = np.repeat(eps, num_DM_sate_bnd)
    pot_ene = ptgPotential(pos_DM_sate_bnd, mas_DM_sate_bnd, softening=eps_DM, G=4.493e-12, theta=.6, parallel=True, method="tree")
    print("Potential complete.")

...

Notes to help read the script

  • The function ptgPotential computes the potential energy, and is what produces the segmentation fault.
  • The flow is as follows:
    1. Read the particle data.
    2. Pick out a subset of the particles using idxs_DM_sate_bnd. If I use the commented-out part (switch the section #-----#), the code uses all particles.
      2.5. (Not important) Calculate the barycentric velocity (BcV).
    3. (Not important) Calculate the kinetic energy of each particle.
    4. Calculate the potential energy of each particle.

Context

This is an analysis code for an N-body gravitational simulation.
The number of DM particles is N = 23075000 (~2×10^7).

I want to calculate the binding energy (= kinetic energy - potential energy).

I use my own C library to calculate the kinetic energy, and it works with no problem.

For the potential energy I use pytreegrav's Potential(), which seems to cause the segmentation fault.

What is mysterious is that it doesn't raise a segmentation fault with all (N ~ 2×10^7) particles, while it does if I pick out and use N = 8075000 (~8×10^6) particles.

I suspect that it's not pytreegrav's fault, but I put this question here just in case.

Environment

The machine is an institutional cluster; I don't have access to details of the OS.

  • Linux
  • Python 3.8.8 (Anaconda3-2022.05-Linux-x86_64)
  • Numpy 1.20.1
  • Numba 0.53.1

Large Memory Use Resulting in Crash

Hi,

I've been running into problems with large memory use. The code works well for a test case I ran with 10 million particles in a unit cube and random masses:

[screenshot omitted]

Interestingly, the code crashes when I pass in particle positions and masses for approximately 8.6 million particles.

[screenshot omitted]

It produces the following error:

[screenshot omitted]

The server I am running on has approximately 129 GB of RAM. The memory use grows gradually until it has filled the RAM, at which point the code crashes. I am unsure why this happens when the problem does not occur with the 10-million-particle run. I initially thought the problem was that the particles did not lie within a unit cube and the particle masses were not between 0 and 1, but even after rescaling them so that this is the case, the error is still produced.

Any tips/suggestions would be greatly appreciated. If there is any important information I have missed don't hesitate to let me know!

Thanks in advance,
Geoff

JOSS review (adrn)

Ref: openjournals/joss-reviews#3675

Hi @mikegrudic 👋 ! Overall things look good here -- I'm looking forward to using this package myself! -- but I do have some suggestions and comments below.

Installation

I did not find any instructions for installation. I recommend that the authors add an "Installation" section to the README or documentation (see below), which could either contain the necessary information or link to an INSTALL file that describes how to install the package. For an example: https://github.com/adrn/gala#installation-and-dependencies

(I was able to pip install pytreegrav, which successfully installed a wheel)

Functionality

I successfully ran the walkthrough code on my machine after installing the package.

Performance

I have verified the scaling claims (Figure 1 in JOSS article) on my machine (MacBook Pro laptop).

Documentation

The documentation is in the form of a section of the repository README. My main comment would be to separate the documentation from the README. At maximum, switch to using a documentation engine like Sphinx or MkDocs built and served on a service like Readthedocs, and link to this documentation from your README. At minimum, I would recommend making a docs/ directory, moving your current README.ipynb to docs/walkthrough.ipynb, and link to this from the README.txt file. I would also recommend removing the code/walkthrough from the README.txt itself and instead link to the walkthrough IPython notebook (so you avoid duplicating that text/code).

Other comments:

  • I don't see any mention of what profile is used to soften the point mass potentials when h is provided. Is there a standard profile to use (i.e. Plummer?), or is this configurable? It would be good to explicitly state this in the documentation.
  • As mentioned above, at minimum, please add an "Installation" section to the README that links to a file that contains installation instructions (even if it just states explicitly that the way to install is with pip)
  • Some users might find it useful to have API documentation, so it's worth considering whether you want to write this yourself or build it automatically with an automatic documentation build system.
  • Please add some documentation of how a user can run tests of the package. It looks like there is a minimal test in tests/test.py, but I would recommend switching to a pytest or nosetests-compatible test layout.
  • It looks like there is an associated github pages build of the README: Please link to this page (maybe with a badge link?) at the top of your README, as this serves as your main documentation.
  • Please add some documentation of how other users can contribute or engage with the developers. In particular, how do users 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support. An example statement: http://gala.adrian.pw/en/latest/contributing.html

General comments

  • Many repositories on GitHub (including GitHub itself) are switching from a default branch named "master" to "main" (see, e.g., https://github.com/github/renaming). You might want to consider renaming the default branch (but it's up to you!).
  • My understanding from reading the README and walkthrough is that the primary use case for this code is for re-analyzing simulation output (i.e. not for users who actually want to run simulations, as the code operates at the Python level, using numba to accelerate things). It might be worth having a few scientific case studies in the walkthrough that put the utilities in this package in context of some "real world" examples.
  • Though the details of code style can be subjective, Python packages generally at least follow the convention (laid out in PEP 8) that class names follow the CapWords convention and function names are lower case. Please consider reformatting the functions in this package to follow PEP8 guidelines.

pykdgrav import potential fail

Hi,

Apparently, pykdgrav requires a non-standard module, numba, which is not automatically installed when installing pykdgrav via pip, e.g.:

pip install pykdgrav

Unfortunately,

pip install numba

fails.

Any hints on how to get a full, working installation of pykdgrav would be much appreciated.

Thor.
