If I remember correctly (caveat: I might not be remembering correctly!), I ran into a similar "problem" when I first wrote the ray tracer in CUDA. The issue was that each thread needed to be doing the same things simultaneously. When a ray reached its destination (e.g., the disk or the horizon), that thread would stop integrating (there was a per-thread condition on whether or not to take another step, e.g., theta < pi/2, r > r_h), but it would not quit until all the threads had finished. I can't remember what I used as a global termination condition, though (perhaps some ray reaching +"infinity"), or how I dealt with variable time steps. Hmm.
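The masking scheme described above can be sketched on the CPU as a toy model of a single warp. This assumes one integrator step per iteration and a made-up per-lane termination condition (`r > r_h`); `lockstep_integrate` and its arguments are hypothetical, not Gradus or CUDA API.

```julia
# Toy model of one GPU warp: all lanes advance in lockstep, one step per
# iteration; lanes that have finished are masked out (they stop updating but
# do not exit). The loop ends only when every lane is done.
#
# Hypothetical per-lane "integration": r decreases by a lane-specific amount
# each step until it crosses the horizon-like threshold r_h.
function lockstep_integrate(r0::Vector{Float64}, rate::Vector{Float64};
                            r_h = 1.0, maxiters = 10_000)
    r = copy(r0)
    active = trues(length(r))       # per-lane "take another step?" flag
    steps = zeros(Int, length(r))   # useful steps taken by each lane
    iters = 0                       # iterations executed by the whole warp
    while any(active) && iters < maxiters   # global termination condition
        iters += 1
        for i in eachindex(r)
            active[i] || continue   # masked lane: idles during this iteration
            r[i] -= rate[i]         # stand-in for one integrator step
            steps[i] += 1
            active[i] = r[i] > r_h  # per-lane termination condition
        end
    end
    return steps, iters
end
```

With three lanes starting at `r = 10` and shrinking at rates 1, 0.5 and 3, the lanes take 9, 18 and 3 steps, but the warp runs for 18 iterations, so lanes 1 and 3 sit idle for 9 and 15 of them.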
from gradus.jl.
I managed to get a sort of fix working with DiffEqGPU, enough to get it to run... As it currently stands, the bottlenecking is highly problematic, and would require sophisticated batching to group similar geodesics together (and even then I'm not convinced that would resolve the performance issue). I think I've also found an issue which means the DiffEqGPU methods aren't type stable...
In the time it takes for one thread to finish a particularly difficult geodesic, and block the warp from terminating, the CPU methods have already churned through a few thousand more.
CPU: 0.041316 seconds (193.04 k allocations: 22.450 MiB)
GPU: 0.755816 seconds (1.25 M allocations: 110.112 MiB)
This isn't a rigorous benchmark, but it is indicative of the problem we'd be facing. I think for now just chucking more CPU cores at the problem is probably fine, and we'll save the GPU for doing the analysis. With the render caches, we'll probably end up performing more analysis than rendering anyway, so I think this is still a bit of a dead end.
Shame really, as this is one of those embarrassingly parallel problems you'd really expect the GPU to excel at :/
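A rough cost model makes the warp-blocking and batching points concrete. It is a simplification that assumes each trajectory's step count is known and that a warp of `w` lanes (32 on NVIDIA hardware) always runs until its slowest lane finishes; `warp_cost`, `warp_efficiency` and `batch_order` are hypothetical helpers.

```julia
# Lockstep cost model (a simplification): a warp of `w` lanes runs for as many
# iterations as its slowest lane, so each warp costs w * maximum(steps).
function warp_cost(steps::AbstractVector{<:Integer}, w::Integer)
    return sum(w * maximum(chunk) for chunk in Iterators.partition(steps, w))
end

# Fraction of lane-iterations doing useful work (1.0 means no divergence).
warp_efficiency(steps, w) = sum(steps) / warp_cost(steps, w)

# Batching: order trajectories by (estimated) step count so that similar
# geodesics share a warp. Assumes the costs are known or can be estimated
# beforehand, e.g. from a cheaper preliminary pass (hypothetical).
batch_order(estimated_steps) = sortperm(estimated_steps)
```

For step counts `[10, 1000, 10, 1000]` and `w = 2`, the interleaved ordering costs 4000 lane-iterations for 2020 useful ones (efficiency 0.505), whereas sorting first brings the cost down to 2020. In practice the step counts are not known in advance, which is why the batching would need to be sophisticated.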
Yes, throwing more CPUs at the problem seems like the best way to go for now. I agree that it feels like GPUs should be fantastic for this, but perhaps it isn't quite simple enough given that the behaviour of different rays can be qualitatively different.
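On the CPU, a variable cost per ray matters much less, because each core works through its own trajectories independently. A minimal sketch with Julia's task scheduler, assuming a hypothetical `toy_trajectory` whose cost varies strongly with the ray index (in the actual benchmark, `EnsembleThreads` handles the threading itself):

```julia
# Hypothetical stand-in for integrating one geodesic: a few "hard" rays are
# 1000x more expensive than the rest, like rays that skim the horizon.
function toy_trajectory(i::Int)
    n = i % 7 == 0 ? 100_000 : 100
    acc = 0.0
    for k in 1:n
        acc += 1.0 / k
    end
    return acc
end

# One task per trajectory: the scheduler hands tasks to threads as they become
# free, so a slow ray occupies a single core rather than stalling its neighbours.
function solve_all(N::Int)
    tasks = [Threads.@spawn toy_trajectory(i) for i in 1:N]
    return fetch.(tasks)
end
```

Start Julia with `--threads=auto` (or set `JULIA_NUM_THREADS`) for the tasks to actually run in parallel; the results are identical to a serial loop either way.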
Spent a bit of time today checking out how the new GPUTsit5 integrator performs compared to the multi-threaded CPU method.
There are a number of caveats with the GPU solver, namely that the adaptive step size just seems to fall apart on this problem, so a fixed step size is required. These benchmarks are also run with all callbacks removed, and boil down to:
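```julia
using StaticArrays, OrdinaryDiffEq, DiffEqGPU

# `TestMetric` and `make_problem` are defined elsewhere (not shown here);
# `N` is the number of trajectories (value not given here).
m = TestMetric(Float32(1.0), Float32(0.0))
u = SVector{4,Float32}(0.0, 1000.0, π/2, 0.0)   # initial position (t, r, θ, φ)
t_domain = Float32.((0.0, 2000.0))
αs = range(6.0f0, 20.0f0, N)                     # one α per trajectory
ens_prob = make_problem(m, u, αs, t_domain)

# multi-threaded CPU, adaptive Tsit5
cpu = @timed solve(
    ens_prob,
    Tsit5(),
    EnsembleThreads(),
    trajectories = length(αs),
    save_everystep = false
);

# GPU kernel, fixed step size (adaptive stepping falls apart on this problem)
gpu = @timed solve(
    ens_prob,
    GPUTsit5(),
    EnsembleGPUKernel(),
    trajectories = length(αs),
    dt = 1.0f0,
    adaptive = false,
    save_everystep = false
);
```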
This is not quite a fair comparison, since they are using different algorithms, but there is no reason why we would want to use the fixed step size CPU solver, so this reflects the state of this issue as of today.
I obtain similar performance on the example problem in the DiffEqGPU.jl readme.
Should benchmark just the geodesic_equation function, rather than the full solver, to try to identify whether this is an algorithmic overhead or something more fundamental with our problem formulation.
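Timing the right-hand side on its own could look something like this. The signature of Gradus's `geodesic_equation` is not shown here, so `toy_rhs` is a hypothetical stand-in in the out-of-place `f(u, p, t)` form, and `profile_rhs` is a hypothetical helper; `Test.@inferred` doubles as a quick type-stability check. For proper numbers, BenchmarkTools.jl's `@btime toy_rhs($u, nothing, 0.0)` is the more rigorous option.

```julia
using Test: @inferred

# Hypothetical stand-in for a geodesic right-hand side, in out-of-place f(u, p, t)
# form: planar motion under an inverse-square force, u = (x, y, vx, vy).
# This is a toy, NOT Gradus's geodesic_equation. Tuples keep it allocation-free,
# much like SVectors would.
function toy_rhs(u::NTuple{4,T}, p, t) where {T<:Real}
    x, y, vx, vy = u
    r3 = sqrt(x^2 + y^2)^3
    return (vx, vy, -x / r3, -y / r3)
end

# Check and time a single RHS in isolation, without any solver overhead.
function profile_rhs(f, u, p, t; n = 10^6)
    @inferred f(u, p, t)          # throws if the actual return type differs from the inferred one
    f(u, p, t)                    # warm-up call so compilation is not timed
    bytes = @allocated f(u, p, t)
    elapsed = @elapsed for _ in 1:n
        u = f(u, p, t)            # feed the output back in
    end
    # `u` is returned so the compiler cannot discard the timed loop
    return (ns_per_call = 1e9 * elapsed / n, bytes_per_call = bytes, checksum = u)
end
```

If the solver is slow but the isolated RHS is fast, allocation-free and inferable, that points towards algorithmic overhead rather than the problem formulation.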
Using adaptive stepping is much, much faster on GPU (faster than CPU, even), but from my tests this only works if the geodesics stay sufficiently far from the central singularity; otherwise the time step drops below 10^-14 and the solver errors. But this is promising, and merits trying to get the adaptive GPU solver working on the full domain.
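Why an adaptive step can collapse near a singularity is easy to reproduce with a toy controller. The sketch below is a hypothetical scalar Heun–Euler embedded pair (not the Tsit5 or GPUTsit5 controller) with a minimum step `dtmin` mirroring the 10^-14 cutoff: on a smooth problem it reaches the end of the domain, while on dy/dt = y², which blows up at t = 1, the proposed step keeps shrinking until it falls below `dtmin`.

```julia
# Hypothetical toy integrator: a scalar Heun–Euler (1st/2nd order) embedded
# pair with a standard step-size controller and a minimum step `dtmin`.
# It is NOT the controller used by Tsit5/GPUTsit5; it only illustrates why a
# step-size floor gets hit near a singularity.
function integrate_adaptive(f, y0::Float64, tspan::Tuple{Float64,Float64};
                            rtol = 1e-6, atol = 1e-6, dt = 0.1,
                            dtmin = 1e-14, maxiters = 10^7)
    t0, tend = tspan
    t, y = t0, y0
    for _ in 1:maxiters
        t >= tend && return (:success, t, y)
        h = min(dt, tend - t)                 # never step past the end of the domain
        k1 = f(y)
        k2 = f(y + h * k1)
        y_low = y + h * k1                    # explicit Euler (1st order)
        y_high = y + h / 2 * (k1 + k2)        # Heun (2nd order)
        errn = abs(y_high - y_low) / (atol + rtol * max(abs(y), abs(y_high)))
        if errn <= 1                          # accept the step
            t += h
            y = y_high
        end
        # propose the next step: dt ∝ errn^(-1/2), with a safety factor and limits
        dt = h * clamp(0.9 * errn^(-1 / 2), 0.2, 5.0)
        dt < dtmin && t < tend && return (:dtmin, t, y)
    end
    return (:maxiters, t, y)
end
```

For example, `integrate_adaptive(y -> -y, 1.0, (0.0, 1.0))` returns `:success` with `y` close to `exp(-1)`, whereas `integrate_adaptive(y -> y^2, 1.0, (0.0, 2.0))` stalls just short of t = 1 and returns `:dtmin`.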