I'm wondering whether we should go back to the old design, i.e. keeping both a host vector and a device vector, but with a universal vector replacing the device vector. That way we keep the ability to run an algorithm on either the host or the device depending on the workload (since the device vector is now universal), without suffering page faults, because most host operations would go through the host vector.
I'll try implementing it and compare the performance of the two approaches.
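To make the proposed design concrete, here is a minimal C++ sketch of a host vector mirrored by a universal buffer with lazy synchronization. The class name `MirroredVec`, its methods, and the dirty-state tracking are hypothetical, and a `std::vector` stands in for managed memory so the logic runs anywhere; it is not the project's actual container.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch (names are illustrative, not Manifold's API): a host
// vector paired with a "universal" buffer, synchronized lazily. In a CUDA
// build `shared_` would live in managed memory (e.g. a universal_vector);
// a std::vector stand-in keeps the sync logic runnable on any machine.
template <typename T>
class MirroredVec {
 public:
  explicit MirroredVec(std::size_t n) : host_(n), shared_(n) {}

  // Host-only work: refresh the host copy only if the shared one is newer.
  T* hostData() {
    if (state_ == State::SharedNewer) {
      std::copy(shared_.begin(), shared_.end(), host_.begin());
      ++copies_;
    }
    state_ = State::HostNewer;  // the caller may write through this pointer
    return host_.data();
  }

  // Work that may run on either the host or the device.
  T* sharedData() {
    if (state_ == State::HostNewer) {
      std::copy(host_.begin(), host_.end(), shared_.begin());
      ++copies_;
    }
    state_ = State::SharedNewer;
    return shared_.data();
  }

  std::size_t size() const { return host_.size(); }
  int copies() const { return copies_; }

 private:
  enum class State { InSync, HostNewer, SharedNewer };
  std::vector<T> host_;
  std::vector<T> shared_;
  State state_ = State::InSync;
  int copies_ = 0;
};
```

Host-only phases touch just `host_`, which is ordinary memory and cannot take unified-memory faults; the cost moves to explicit copies at phase boundaries.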
from manifold.
Ooh, that does look promising! I had always thought of CUB as being CUDA-specific and so not usable with other backends, but from the description that looks like just what we need. Thrust implements its CUDA backend with CUB, so I trust it to be as performant as possible.
I made a simple patch to try the universal vector. Perhaps my implementation is incorrect, but the performance is not great (a lot slower for CUDA, in fact). You can have a look at it here: pca006132@933eade
For the CPP backend, performance improves slightly, probably due to less memory copying. For the OMP backend, performance is worse, and I don't know why. For the TBB backend, there is also a slight improvement (and it is faster than the OMP backend). Besides the speedup, it also uses less memory, since there is now only one vector.
I did some debugging: the slowness seems to be caused by a large number of GPU page faults (https://stackoverflow.com/questions/39782746/why-is-nvidia-pascal-gpus-slow-on-running-cuda-kernels-when-using-cudamallocmana/40011988#40011988). I tried prefetching, but it did not help; perhaps I'm passing the wrong pointer or something. I think I will leave this for later.
Filed an issue: NVIDIA/cccl#809
I tried to work around this problem with a small cache and partly succeeded: most of the tests pass on CUDA (all except the knot example), performance improves significantly, and perfTest no longer runs out of memory (OOM) on my machine. The CPP and OMP backends also get somewhat faster and use less memory. All tests pass on the other backends.
However, I have no idea what is wrong with the knot example. I can clean up the changes and open a PR if you are interested, or we can wait for Thrust to fix the performance issue so that no workaround is needed.
Definitely interested! And let's not wait for thrust; that could easily be a long wait.
This also enables a dynamic backend: we can choose at runtime whether to run an algorithm on the host or on the device, and fall back to the CPU when the target machine has no GPU (without recompiling with another backend; not yet tested). A prototype implementation is here: https://github.com/pca006132/manifold/blob/dynamic-backend/utilities/include/par.h
Running small operations on the host gave CUDA some performance improvement, though not much. The slowness of managed memory (`cudaMallocManaged`) is a huge problem.
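A rough sketch of the runtime choice described above, assuming a hypothetical `BackendConfig` with GPU availability detected once at startup and two size thresholds. `Policy`, `choosePolicy`, and the thresholds are illustrative only; the linked par.h prototype may decide differently.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sketch of a runtime ("dynamic") backend choice; the names and
// thresholds are illustrative and not taken from the prototype's par.h.
enum class Policy { Seq, ParCpu, Gpu };

struct BackendConfig {
  bool gpuAvailable;         // detected once at startup (e.g. a device query)
  std::size_t gpuThreshold;  // smaller inputs are not worth a kernel launch
  std::size_t parThreshold;  // smaller inputs are not worth spawning CPU tasks
};

// Pick where an algorithm over `n` elements should run. Without a GPU, large
// inputs fall back to the parallel CPU path instead of requiring a rebuild.
inline Policy choosePolicy(const BackendConfig& cfg, std::size_t n) {
  if (cfg.gpuAvailable && n >= cfg.gpuThreshold) return Policy::Gpu;
  if (n >= cfg.parThreshold) return Policy::ParCpu;
  return Policy::Seq;
}
```

Keeping the decision in one function means a binary built with GPU support can still run on a machine without one: `gpuAvailable` is simply false and large inputs take the CPU path.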
Wow, that's pretty slick! I really like the idea of one compiled version automatically working for both CPU and GPU. Can you show a few numbers regarding the performance hit you're seeing from managed memory?
====== CUDA, Host Device Vector =======
nTri = 512, time = 0.00396219 sec
nTri = 2048, time = 0.00530954 sec
nTri = 8192, time = 0.00963976 sec
nTri = 32768, time = 0.0254835 sec
====== CUDA, Unified Memory =======
nTri = 512, time = 0.0123526 sec
nTri = 2048, time = 0.0141026 sec
nTri = 8192, time = 0.0195818 sec
nTri = 32768, time = 0.0374303 sec
And the tests run significantly slower.
I tried profiling it: there seem to be a lot of page faults, while malloc itself is actually very quick. IIRC the suggested workaround is prefetching, but it might be hard to add prefetching without making the code very complicated.
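Putting the two CUDA runs above side by side makes the shape of the slowdown clearer. The snippet only does arithmetic on the posted timings: the extra time grows from about 8.4 ms to about 11.9 ms while nTri grows 64×, so the relative slowdown falls from about 3.1× to about 1.5×. That pattern suggests a largely fixed per-run cost, consistent with the page-fault explanation, though the numbers alone do not prove the cause.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Timings in seconds, copied from the two CUDA perfTest runs above
// (nTri = 512, 2048, 8192, 32768).
constexpr std::array<double, 4> kHostDevice = {0.00396219, 0.00530954,
                                               0.00963976, 0.0254835};
constexpr std::array<double, 4> kUnified = {0.0123526, 0.0141026, 0.0195818,
                                            0.0374303};

// Extra wall time per run when using unified memory.
inline std::array<double, 4> absoluteOverhead() {
  std::array<double, 4> out{};
  for (std::size_t i = 0; i < out.size(); ++i) out[i] = kUnified[i] - kHostDevice[i];
  return out;
}

// Unified-memory time divided by host/device-vector time.
inline std::array<double, 4> slowdown() {
  std::array<double, 4> out{};
  for (std::size_t i = 0; i < out.size(); ++i) out[i] = kUnified[i] / kHostDevice[i];
  return out;
}
```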
Using the old design (a host vector plus a universal vector in place of the device vector), the performance is something like this:
nTri = 512, time = 0.00869293 sec
nTri = 2048, time = 0.0102029 sec
nTri = 8192, time = 0.0150084 sec
nTri = 32768, time = 0.0343637 sec
nTri = 131072, time = 0.107823 sec
nTri = 524288, time = 0.367812 sec
nTri = 2097152, time = 1.34327 sec
nTri = 8388608, time = 5.8539 sec
Command being timed: "./perfTest"
User time (seconds): 8.11
System time (seconds): 2.70
Percent of CPU this job got: 99%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:10.83
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 4777648
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 24
Minor (reclaiming a frame) page faults: 2297348
Voluntary context switches: 195
Involuntary context switches: 33
Swaps: 0
File system inputs: 3488
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
The performance for CPP and OMP is similar: a bit worse than before.
There are far fewer page faults, but performance is slightly worse than the original version (although it no longer runs out of memory when GPU memory is insufficient). Maybe the best approach would be to decide how to read/write the universal vector based on the workload? E.g. act directly on the universal vector when there are few reads/writes, and copy the vector to the host when we need to do batch processing. But this might be a bit too complicated.
And if I'm not mistaken, it's slower for smaller problems and faster for larger ones? Seems like a good trade to avoid OOM. I wonder if some of this is due to how/where the vector gets initialized?
Indeed, I guess this is the typical trade-off of using a GPU. I think the page faults come from how CUDA migrates unified memory (at least in the default mode): memory migrates to a device when that device touches it. So the page faults I saw are probably caused by frequent switching between the GPU and the CPU on small buffers: once the memory has migrated to one processor, the next access from the other one faults again. I'll see whether running smaller problems on the CPU eliminates this issue.
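The migrate-on-touch behaviour described here can be illustrated with a toy model in which every page lives on exactly one processor and faults when the other one touches it. This is a deliberate simplification (real CUDA batches faults, may migrate larger regions, and may map memory remotely depending on hints), and `ToyUnifiedBuffer` and `Proc` are hypothetical names. It shows why alternating CPU and GPU work on the same small buffer costs faults on every switch, while keeping the buffer on one side costs only the first touch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model only, not CUDA's actual migration policy.
enum class Proc { None, Cpu, Gpu };

class ToyUnifiedBuffer {
 public:
  explicit ToyUnifiedBuffer(std::size_t pages) : owner_(pages, Proc::None) {}

  // Touch every page from processor `p`. A page not resident on `p` faults
  // and migrates there; in this model first touch also counts as a fault.
  void touchAll(Proc p) {
    for (Proc& owner : owner_) {
      if (owner != p) {
        ++faults_;
        owner = p;
      }
    }
  }

  std::size_t faults() const { return faults_; }

 private:
  std::vector<Proc> owner_;  // which processor each page currently lives on
  std::size_t faults_ = 0;
};
```

With 4 pages and 10 rounds of GPU-then-CPU access, every page faults on every switch (80 faults); the same 20 accesses from the CPU alone fault only on first touch (4 faults).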
After some profiling and performance tuning (running small operations on the CPU, and using `cudaMemAdvise` to ask CUDA to keep the unified memory of small buffers on the CPU), I got the following results:
nTri = 512, time = 0.00757743 sec
nTri = 2048, time = 0.0128708 sec
nTri = 8192, time = 0.0375996 sec
nTri = 32768, time = 0.0620162 sec
nTri = 131072, time = 0.266587 sec
nTri = 524288, time = 0.437117 sec
nTri = 2097152, time = 1.76358 sec
nTri = 8388608, time = 3.81932 sec
... which is still suboptimal. Profiling with `nvprof` shows that initializing a universal vector calls `uninitialized_fill_n` on the GPU. That causes page faults and synchronization overhead, and it migrates the universal vector to the GPU, after which I have to move it back to the CPU... so nTri = 512 took 7 ms.
GPU trace (filtered with grep for page faults):
282.80ms 172.93us - - - - - - - - - NVIDIA GeForce - - 24 0x7f5a72000000 [Unified Memory GPU page faults]
283.08ms - - - - - - - - - - - - - PC 0x47f393 0x7f5a72000000 [Unified Memory CPU page faults]
283.34ms 62.143us - - - - - - - - - NVIDIA GeForce - - 1 0x7f5a72000000 [Unified Memory GPU page faults]
283.41ms - - - - - - - - - - - - - PC 0x4975db 0x7f5a72000000 [Unified Memory CPU page faults]
283.61ms 52.448us - - - - - - - - - NVIDIA GeForce - - 1 0x7f5a72000000 [Unified Memory GPU page faults]
283.95ms - - - - - - - - - - - - - PC 0x540b69 0x7f5a72001000 [Unified Memory CPU page faults]
284.04ms 86.623us - - - - - - - - - NVIDIA GeForce - - 1 0x7f5a72001000 [Unified Memory GPU page faults]
284.15ms - - - - - - - - - - - - - PC 0xaac9b7db 0x7f5a72001000 [Unified Memory CPU page faults]
284.30ms 46.176us - - - - - - - - - NVIDIA GeForce - - 1 0x7f5a72001000 [Unified Memory GPU page faults]
284.55ms - - - - - - - - - - - - - PC 0x4f18a0 0x7f5a72002000 [Unified Memory CPU page faults]
284.63ms 71.008us - - - - - - - - - NVIDIA GeForce - - 21 0x7f5a72009000 [Unified Memory GPU page faults]
284.72ms - - - - - - - - - - - - - PC 0x4f1cea 0x7f5a72009000 [Unified Memory CPU page faults]
284.80ms 38.943us - - - - - - - - - NVIDIA GeForce - - 24 0x7f5a7200a000 [Unified Memory GPU page faults]
284.85ms 55.264us - - - - - - - - - NVIDIA GeForce - - 27 0x7f5a72010000 [Unified Memory GPU page faults]
284.96ms - - - - - - - - - - - - - PC 0x47f3bf 0x7f5a72010000 [Unified Memory CPU page faults]
285.02ms - - - - - - - - - - - - - PC 0x47f3f1 0x7f5a7200b000 [Unified Memory CPU page faults]
285.22ms 55.648us - - - - - - - - - NVIDIA GeForce - - 9 0x7f5a72009000 [Unified Memory GPU page faults]
285.28ms - - - - - - - - - - - - - PC 0x4c5dff 0x7f5a72009000 [Unified Memory CPU page faults]
285.36ms 46.368us - - - - - - - - - NVIDIA GeForce - - 9 0x7f5a72009000 [Unified Memory GPU page faults]
285.42ms - - - - - - - - - - - - - PC 0x4b3450 0x7f5a7200a000 [Unified Memory CPU page faults]
285.53ms - - - - - - - - - - - - - PC 0x4c6796 0x7f5a72002000 [Unified Memory CPU page faults]
285.58ms - - - - - - - - - - - - - PC 0x4c6937 0x7f5a72005000 [Unified Memory CPU page faults]
285.63ms - - - - - - - - - - - - - PC 0x4c67a8 0x7f5a72003000 [Unified Memory CPU page faults]
285.75ms 58.656us - - - - - - - - - NVIDIA GeForce - - 8 0x7f5a72010000 [Unified Memory GPU page faults]
285.87ms 48.800us - - - - - - - - - NVIDIA GeForce - - 2 0x7f5a7200a000 [Unified Memory GPU page faults]
285.94ms - - - - - - - - - - - - - PC 0x4c6fc4 0x7f5a72010000 [Unified Memory CPU page faults]
285.98ms - - - - - - - - - - - - - PC 0x4c703b 0x7f5a72011000 [Unified Memory CPU page faults]
286.00ms - - - - - - - - - - - - - PC 0x4c703b 0x7f5a72014000 [Unified Memory CPU page faults]
286.02ms - - - - - - - - - - - - - PC 0x4c6fc4 0x7f5a72016000 [Unified Memory CPU page faults]
286.15ms 59.295us - - - - - - - - - NVIDIA GeForce - - 2 0x7f5a7200b000 [Unified Memory GPU page faults]
286.25ms - - - - - - - - - - - - - PC 0x480d42 0x7f5a7200b000 [Unified Memory CPU page faults]
286.32ms 73.055us - - - - - - - - - NVIDIA GeForce - - 15 0x7f5a72017000 [Unified Memory GPU page faults]
286.41ms 56.928us - - - - - - - - - NVIDIA GeForce - - 11 0x7f5a7200b000 [Unified Memory GPU page faults]
286.49ms - - - - - - - - - - - - - PC 0x540b69 0x7f5a7200c000 [Unified Memory CPU page faults]
286.54ms - - - - - - - - - - - - - PC 0x5429a0 0x7f5a72016000 [Unified Memory CPU page faults]
286.72ms - - - - - - - - - - - - - PC 0x46c260 0x7f5a72006000 [Unified Memory CPU page faults]
286.82ms 81.759us - - - - - - - - - NVIDIA GeForce - - 16 0x7f5a72010000 [Unified Memory GPU page faults]
286.90ms 18.048us - - - - - - - - - NVIDIA GeForce - - 6 0x7f5a7201d000 [Unified Memory GPU page faults]
286.92ms 16.480us - - - - - - - - - NVIDIA GeForce - - 2 0x7f5a72016000 [Unified Memory GPU page faults]
286.97ms 47.744us - - - - - - - - - NVIDIA GeForce - - 2 0x7f5a7200d000 [Unified Memory GPU page faults]
287.01ms 14.464us - - - - - - - - - NVIDIA GeForce - - 1 0x7f5a7200e000 [Unified Memory GPU page faults]
287.06ms 48.768us - - - - - - - - - NVIDIA GeForce - - 3 0x7f5a7200b000 [Unified Memory GPU page faults]
287.14ms 56.576us - - - - - - - - - NVIDIA GeForce - - 34 0x7f5a72005000 [Unified Memory GPU page faults]
287.19ms 9.6000us - - - - - - - - - NVIDIA GeForce - - 2 0x7f5a72002000 [Unified Memory GPU page faults]
287.20ms 8.8960us - - - - - - - - - NVIDIA GeForce - - 1 0x7f5a72003000 [Unified Memory GPU page faults]
287.25ms 28.320us - - - - - - - - - NVIDIA GeForce - - 9 0x7f5a72017000 [Unified Memory GPU page faults]
287.27ms 28.576us - - - - - - - - - NVIDIA GeForce - - 14 0x7f5a72018000 [Unified Memory GPU page faults]
287.34ms 8.8640us - - - - - - - - - NVIDIA GeForce - - 9 0x7f5a7200c000 [Unified Memory GPU page faults]
287.39ms - - - - - - - - - - - - - PC 0x46b98d 0x7f5a72003000 [Unified Memory CPU page faults]
287.45ms - - - - - - - - - - - - - PC 0x46b719 0x7f5a72004000 [Unified Memory CPU page faults]
287.50ms - - - - - - - - - - - - - PC 0x47ff9d 0x7f5a7200e000 [Unified Memory CPU page faults]
287.55ms - - - - - - - - - - - - - PC 0x47ffa6 0x7f5a7200f000 [Unified Memory CPU page faults]
287.57ms - - - - - - - - - - - - - PC 0x542be5 0x7f5a72022000 [Unified Memory CPU page faults]
287.59ms - - - - - - - - - - - - - PC 0x542c06 0x7f5a72023000 [Unified Memory CPU page faults]
287.66ms - - - - - - - - - - - - - PC 0x48785b 0x7f5a72010000 [Unified Memory CPU page faults]
287.67ms - - - - - - - - - - - - - PC 0x487843 0x7f5a7202a000 [Unified Memory CPU page faults]
287.69ms - - - - - - - - - - - - - PC 0x48785b 0x7f5a72011000 [Unified Memory CPU page faults]
287.70ms - - - - - - - - - - - - - PC 0x487866 0x7f5a7202b000 [Unified Memory CPU page faults]
287.72ms - - - - - - - - - - - - - PC 0x48785b 0x7f5a72014000 [Unified Memory CPU page faults]
287.77ms 15.456us - - - - - - - - - NVIDIA GeForce - - 26 0x7f5a7202f000 [Unified Memory GPU page faults]
287.82ms - - - - - - - - - - - - - PC 0x481367 0x7f5a72030000 [Unified Memory CPU page faults]
287.86ms 10.400us - - - - - - - - - NVIDIA GeForce - - 8 0x7f5a72033000 [Unified Memory GPU page faults]
287.92ms - - - - - - - - - - - - - PC 0x5433eb 0x7f5a72029000 [Unified Memory CPU page faults]
287.94ms - - - - - - - - - - - - - PC 0x5437a8 0x7f5a72036000 [Unified Memory CPU page faults]
287.98ms - - - - - - - - - - - - - PC 0x5437f1 0x7f5a72037000 [Unified Memory CPU page faults]
288.05ms 14.720us - - - - - - - - - NVIDIA GeForce - - 8 0x7f5a72010000 [Unified Memory GPU page faults]
288.09ms 15.648us - - - - - - - - - NVIDIA GeForce - - 4 0x7f5a72036000 [Unified Memory GPU page faults]
288.11ms 7.1030us - - - - - - - - - NVIDIA GeForce - - 6 0x7f5a72037000 [Unified Memory GPU page faults]
288.13ms - - - - - - - - - - - - - PC 0x46e280 0x7f5a72010000 [Unified Memory CPU page faults]
288.30ms 9.9200us - - - - - - - - - NVIDIA GeForce - - 4 0x7f5a72010000 [Unified Memory GPU page faults]
288.42ms 49.887us - - - - - - - - - NVIDIA GeForce - - 20 0x7f5a7202b000 [Unified Memory GPU page faults]
288.47ms 12.384us - - - - - - - - - NVIDIA GeForce - - 2 0x7f5a7202a000 [Unified Memory GPU page faults]
288.50ms - - - - - - - - - - - - - PC 0x487843 0x7f5a7202a000 [Unified Memory CPU page faults]
288.52ms - - - - - - - - - - - - - PC 0x487866 0x7f5a7202b000 [Unified Memory CPU page faults]
288.57ms 54.912us - - - - - - - - - NVIDIA GeForce - - 24 0x7f5a72030000 [Unified Memory GPU page faults]
288.65ms - - - - - - - - - - - - - PC 0x481367 0x7f5a72030000 [Unified Memory CPU page faults]
288.72ms 56.352us - - - - - - - - - NVIDIA GeForce - - 30 0x7f5a72038000 [Unified Memory GPU page faults]
288.81ms - - - - - - - - - - - - - PC 0x543797 0x7f5a72034000 [Unified Memory CPU page faults]
288.91ms 47.200us - - - - - - - - - NVIDIA GeForce - - 8 0x7f5a72034000 [Unified Memory GPU page faults]
289.00ms - - - - - - - - - - - - - PC 0x46e280 0x7f5a7203a000 [Unified Memory CPU page faults]
289.16ms 46.624us - - - - - - - - - NVIDIA GeForce - - 4 0x7f5a7203a000 [Unified Memory GPU page faults]
289.36ms - - - - - - - - - - - - - PC 0x543c23 0x7f5a72035000 [Unified Memory CPU page faults]
289.41ms - - - - - - - - - - - - - PC 0x543c23 0x7f5a72036000 [Unified Memory CPU page faults]
289.66ms 51.871us - - - - - - - - - NVIDIA GeForce - - 4 0x7f5a72035000 [Unified Memory GPU page faults]
289.71ms 9.9210us - - - - - - - - - NVIDIA GeForce - - 1 0x7f5a72036000 [Unified Memory GPU page faults]
289.87ms - - - - - - - - - - - - - PC 0x543c23 0x7f5a72038000 [Unified Memory CPU page faults]
290.05ms 51.615us - - - - - - - - - NVIDIA GeForce - - 4 0x7f5a72038000 [Unified Memory GPU page faults]
290.19ms 57.823us - - - - - - - - - NVIDIA GeForce - - 18 0x7f5a7202b000 [Unified Memory GPU page faults]
290.25ms 10.144us - - - - - - - - - NVIDIA GeForce - - 4 0x7f5a7202a000 [Unified Memory GPU page faults]
290.29ms - - - - - - - - - - - - - PC 0x50e25e 0x7f5a7202a000 [Unified Memory CPU page faults]
290.35ms - - - - - - - - - - - - - PC 0x50e286 0x7f5a7202d000 [Unified Memory CPU page faults]
290.40ms - - - - - - - - - - - - - PC 0x50e2aa 0x7f5a7202e000 [Unified Memory CPU page faults]
290.51ms 44.287us - - - - - - - - - NVIDIA GeForce - - 2 0x7f5a7202a000 [Unified Memory GPU page faults]
290.84ms 47.808us - - - - - - - - - NVIDIA GeForce - - 8 0x7f5a72030000 [Unified Memory GPU page faults]
290.92ms 39.999us - - - - - - - - - NVIDIA GeForce - - 27 0x7f5a72040000 [Unified Memory GPU page faults]
290.97ms - - - - - - - - - - - - - PC 0x50e862 0x7f5a72030000 [Unified Memory CPU page faults]
291.00ms - - - - - - - - - - - - - PC 0x50eaeb 0x7f5a7203b000 [Unified Memory CPU page faults]
291.21ms 69.056us - - - - - - - - - NVIDIA GeForce - - 75 0x7f5a7203b000 [Unified Memory GPU page faults]
291.27ms 21.087us - - - - - - - - - NVIDIA GeForce - - 20 0x7f5a72030000 [Unified Memory GPU page faults]
292.38ms - - - - - - - - - - - - - PC 0x4fa31b 0x7f5a72032000 [Unified Memory CPU page faults]
292.43ms - - - - - - - - - - - - - PC 0x4fa31e 0x7f5a72039000 [Unified Memory CPU page faults]
292.64ms 39.551us - - - - - - - - - NVIDIA GeForce - - 18 0x7f5a72039000 [Unified Memory GPU page faults]
294.17ms - - - - - - - - - - - - - PC 0x5204df 0x7f5a72042000 [Unified Memory CPU page faults]
294.35ms 52.224us - - - - - - - - - NVIDIA GeForce - - 20 0x7f5a72047000 [Unified Memory GPU page faults]
294.49ms 50.176us - - - - - - - - - NVIDIA GeForce - - 9 0x7f5a72032000 [Unified Memory GPU page faults]
294.57ms - - - - - - - - - - - - - PC 0x531d6f 0x7f5a72006000 [Unified Memory CPU page faults]
294.62ms - - - - - - - - - - - - - PC 0x531e1e 0x7f5a72032000 [Unified Memory CPU page faults]
294.68ms - - - - - - - - - - - - - PC 0x5320fa 0x7f5a72048000 [Unified Memory CPU page faults]
294.72ms 67.648us - - - - - - - - - NVIDIA GeForce - - 10 0x7f5a72048000 [Unified Memory GPU page faults]
294.89ms 36.896us - - - - - - - - - NVIDIA GeForce - - 13 0x7f5a72032000 [Unified Memory GPU page faults]
294.98ms - - - - - - - - - - - - - PC 0x53b8b9 0x7f5a7204a000 [Unified Memory CPU page faults]
295.03ms - - - - - - - - - - - - - PC 0x53b824 0x7f5a72048000 [Unified Memory CPU page faults]
295.26ms 60.448us - - - - - - - - - NVIDIA GeForce - - 8 0x7f5a7204b000 [Unified Memory GPU page faults]
295.46ms - - - - - - - - - - - - - PC 0x47f43e 0x7f5a7204c000 [Unified Memory CPU page faults]
295.49ms - - - - - - - - - - - - - PC 0x47f3a0 0x7f5a72050000 [Unified Memory CPU page faults]
296.00ms 74.271us - - - - - - - - - NVIDIA GeForce - - 21 0x7f5a72054000 [Unified Memory GPU page faults]
296.10ms - - - - - - - - - - - - - PC 0x4c6796 0x7f5a72054000 [Unified Memory CPU page faults]
296.27ms 71.488us - - - - - - - - - NVIDIA GeForce - - 25 0x7f5a72059000 [Unified Memory GPU page faults]
296.60ms 51.072us - - - - - - - - - NVIDIA GeForce - - 32 0x7f5a72060000 [Unified Memory GPU page faults]
296.65ms - - - - - - - - - - - - - PC 0x4c6fc4 0x7f5a72058000 [Unified Memory CPU page faults]
296.71ms - - - - - - - - - - - - - PC 0x4c6f7d 0x7f5a72051000 [Unified Memory CPU page faults]
296.73ms - - - - - - - - - - - - - PC 0x4c6fc4 0x7f5a7205d000 [Unified Memory CPU page faults]
296.75ms - - - - - - - - - - - - - PC 0x4c6fc4 0x7f5a72060000 [Unified Memory CPU page faults]
296.96ms 78.880us - - - - - - - - - NVIDIA GeForce - - 14 0x7f5a7204c000 [Unified Memory GPU page faults]
297.09ms 47.039us - - - - - - - - - NVIDIA GeForce - - 9 0x7f5a72041000 [Unified Memory GPU page faults]
297.19ms - - - - - - - - - - - - - PC 0x540b69 0x7f5a72041000 [Unified Memory CPU page faults]
297.22ms - - - - - - - - - - - - - PC 0x542998 0x7f5a72054000 [Unified Memory CPU page faults]
297.27ms - - - - - - - - - - - - - PC 0x542998 0x7f5a72055000 [Unified Memory CPU page faults]
Actually, for our use case we just need a fixed-size universal memory buffer. I'm wondering whether we should roll our own vector for this.
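A fixed-size buffer could sidestep the constructor fill seen in the trace by allocating without value-initializing. Below is a minimal sketch under assumptions: `FixedBuffer` is hypothetical, it uses plain `new T[n]` (default-initialization, which leaves trivial types unwritten) where a CUDA build would presumably allocate with `cudaMallocManaged`, and it is restricted to trivially copyable element types. Because construction writes nothing, the first real write, rather than the constructor, decides where the pages are first touched.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <type_traits>

// Hypothetical fixed-size buffer: construction allocates but never writes,
// unlike a vector constructor that fills every element.
template <typename T>
class FixedBuffer {
  static_assert(std::is_trivially_copyable<T>::value &&
                    std::is_trivially_default_constructible<T>::value,
                "FixedBuffer assumes plain-old-data elements");

 public:
  // new T[n] default-initializes trivial types, i.e. leaves them unwritten.
  // A CUDA build would presumably allocate with cudaMallocManaged instead.
  explicit FixedBuffer(std::size_t n) : size_(n), data_(new T[n]) {}

  std::size_t size() const { return size_; }
  T* data() { return data_.get(); }
  T* begin() { return data_.get(); }
  T* end() { return data_.get() + size_; }

  T& operator[](std::size_t i) {
    assert(i < size_);
    return data_[i];
  }
  const T& operator[](std::size_t i) const {
    assert(i < size_);
    return data_[i];
  }

 private:
  std::size_t size_;
  std::unique_ptr<T[]> data_;  // no resize: the size is fixed at construction
};
```

Dropping resizing also removes reallocation copies, which a general-purpose vector has to support.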
Well, the longest run time is still coming down, so that's a good sign at least. You're welcome to roll your own; you seem pretty deep in the guts of CUDA already. Perhaps having `beginD` and `beginH` would be useful for indicating the prefetch direction? I trust you to find a good solution.
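One way to read the `beginD`/`beginH` suggestion: the accessor name tells the container which side is about to touch the data, so it can prefetch before the access instead of faulting during it. The sketch below is hypothetical (`PrefetchVec`, `Where`, and the prefetch counter are illustrative, and it assumes the data starts on the host); the comment marks where a CUDA build might call `cudaMemPrefetchAsync`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class Where { Host, Device };

// Hypothetical container whose accessor names announce the next access side.
// Only begin*() triggers a prefetch; end*() just returns the bound.
template <typename T>
class PrefetchVec {
 public:
  explicit PrefetchVec(std::size_t n) : data_(n) {}

  T* beginH() { prefetchTo(Where::Host); return data_.data(); }
  T* endH() { return data_.data() + data_.size(); }
  T* beginD() { prefetchTo(Where::Device); return data_.data(); }
  T* endD() { return data_.data() + data_.size(); }

  int prefetches() const { return prefetches_; }

 private:
  void prefetchTo(Where w) {
    if (w == where_) return;  // already resident on the requested side
    // A CUDA build might issue cudaMemPrefetchAsync here, targeting
    // cudaCpuDeviceId for Where::Host or the GPU's device id for Where::Device.
    where_ = w;
    ++prefetches_;
  }

  std::vector<T> data_;        // stand-in for managed memory
  Where where_ = Where::Host;  // assumption: data starts on the host
  int prefetches_ = 0;
};
```

Repeated accesses from the same side cost nothing; only a change of direction issues a prefetch.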
Yeah, two days ago I did not plan to dig into implementation details like these. Prefetching cannot help in this case, because the universal vector's constructor performs the uninitialized fill, which causes GPU page faults and then, due to memory migration, subsequent CPU page faults. I cannot insert a prefetch before the uninitialized fill, so I can't really deal with that.
I should probably file an issue with Thrust about this as well: universal vectors are meant to be used on both the CPU and the GPU, and this behavior clearly causes performance problems in that case (mainly for small vectors, I guess).
Sounds good. Anyway, you've averted a bunch of GPU OOM problems, while cutting down the run time of large problems. Even at the cost of some reduced performance on small problems, this still seems good to merge. What do you think?
Yes, it should be good to merge. I can work on the performance of small problems in later PRs.
FYI: with a custom vector implementation, the timings look something like this:
nTri = 512, time = 0.00285064 sec
nTri = 2048, time = 0.00649843 sec
nTri = 8192, time = 0.00892476 sec
nTri = 32768, time = 0.0603074 sec
nTri = 131072, time = 0.124639 sec
nTri = 524288, time = 0.383436 sec
nTri = 2097152, time = 1.26768 sec
nTri = 8388608, time = 5.03177 sec
I'm not sure why the more complicated problems are now slower; I guess it is related to how I do prefetching (I did not fine-tune it, and it seems pretty hard to fine-tune anyway).
However, there are quite a few test failures. They already failed with the universal vector, so I guess there is some problem in my implementation of the dynamic backend feature. I will try to fix that and submit a PR that includes both features (custom vector + dynamic backend).
That still looks great, thanks! Might it help to do just the custom vector first and then do the dynamic backend as a follow-on? Always nice to break up the PRs if we can, especially if the later one is triggering the debug.
That result is with the dynamic backend enabled; I have to try porting it back onto master and see whether it works and still performs this well. The problem is that the heuristic I wrote to decide where to put the vector depends on the dynamic backend parameters: if the vector size is less than X, put it on the CPU, otherwise on the GPU. I'll clean it up and open a PR tomorrow if everything works fine.
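The placement rule described here, as a minimal sketch. `BackendParams`, `cpuCutoff` (standing in for X), and `placeVector` are hypothetical names; the point is that the cutoff lives in the dynamic backend's parameters, so the placement decision cannot move to master without either that parameter or a separate constant of its own. The comment notes where `cudaMemAdvise` with `cudaMemAdviseSetPreferredLocation` could apply the decision, as in the tuning described earlier in the thread.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the dynamic backend's tuning parameters.
struct BackendParams {
  std::size_t cpuCutoff;  // "X": smaller vectors stay (and run) on the CPU
};

enum class Location { Cpu, Gpu };

// Small vectors stay on the CPU, larger ones go to the GPU. In a CUDA build
// the result could feed cudaMemAdvise(ptr, bytes,
// cudaMemAdviseSetPreferredLocation, device), passing cudaCpuDeviceId as the
// device for Location::Cpu.
inline Location placeVector(std::size_t n, const BackendParams& p) {
  return n < p.cpuCutoff ? Location::Cpu : Location::Gpu;
}
```

Because the algorithm dispatch reads the same cutoff, data and computation end up on the same side, which is exactly the coupling that makes splitting the two features into separate PRs awkward.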
Tuning prefetch is harder than I thought... the results are not very consistent. I guess I will just leave it as is and work on other items first.