Thank you for the links. I have started coding and may get it done soon. You are correct that it's only a little bit more complicated than SW. I found the 2007 JAP paper more accurate than the manual (your first link) in describing the dimensions of the parameters H, D, W, and gamma, so I will just follow the paper.
from gpumd.
Hi andeplane,
Thanks for suggesting this. I just noticed your comments today. The trick you mentioned is very important. I think I may need to use two neighbor lists for this potential (or at least a two-level neighbor list), one with a shorter cutoff (for the 3-body part) and one with a larger cutoff (for the 2-body part). This requires redesigning some parts of GPUMD and will take some time. What materials are you studying?
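The two-level neighbor list idea can be sketched on the CPU as follows. This is a hypothetical illustration, not GPUMD's actual implementation: since the short-cutoff (3-body) list is always a subset of the long-cutoff (2-body) list, it can be obtained by filtering instead of rebuilding.

```cpp
// Hypothetical sketch of a two-level neighbor list (not GPUMD's actual code):
// the short-cutoff 3-body list is derived by filtering the long-cutoff
// 2-body list, since it is always a subset of it.
#include <cassert>
#include <cmath>
#include <vector>

struct Vec3 { double x, y, z; };

static double dist2(const Vec3& a, const Vec3& b) {
    double dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}

// Brute-force O(N^2) list with the larger (2-body) cutoff.
std::vector<std::vector<int>> build_list(const std::vector<Vec3>& pos, double rc) {
    std::vector<std::vector<int>> nl(pos.size());
    for (size_t i = 0; i < pos.size(); ++i)
        for (size_t j = 0; j < pos.size(); ++j)
            if (i != j && dist2(pos[i], pos[j]) < rc * rc)
                nl[i].push_back(static_cast<int>(j));
    return nl;
}

// Derive the short (3-body) list by filtering instead of rebuilding.
std::vector<std::vector<int>> filter_list(const std::vector<Vec3>& pos,
                                          const std::vector<std::vector<int>>& nl,
                                          double rc_short) {
    std::vector<std::vector<int>> out(nl.size());
    for (size_t i = 0; i < nl.size(); ++i)
        for (int j : nl[i])
            if (dist2(pos[i], pos[j]) < rc_short * rc_short)
                out[i].push_back(j);
    return out;
}
```

Because the 3-body cutoff in the Vashishta potential is much shorter than the 2-body one, the filtered list is far smaller, which is what makes the 3-body part cheap relative to the 2-body part.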
Best,
Zheyong
from gpumd.
Thanks for your response. I'm currently working on silicon carbide, but I've also applied the potential to nanoporous amorphous silica with and without water. Even without that trick, I believe that GPUMD may be faster than LAMMPS for smaller systems.
Would you be able to help me with a simple implementation so I could test this? For you it could be a matter of an hour or two, since SW is already there.
from gpumd.
Yes, I am interested in adding this potential to GPUMD. I cannot promise when it will be done, as the holidays are coming, but I will do it as soon as possible. Do you think it is enough to read this paper:
P. Vashishta, R. K. Kalia, A. Nakano, J. P. Rino. J. Appl. Phys. 101, 103515 (2007)
from gpumd.
That should be sufficient. The modern formulation is the one described in the LAMMPS docs:
http://lammps.sandia.gov/doc/pair_vashishta.html (probably the same as in that paper),
along with our reference implementation, which may be good to have: https://github.com/lammps/lammps/blob/master/src/MANYBODY/pair_vashishta.cpp
Looking forward to it :) I'm happy to answer any questions you might have.
from gpumd.
Hi andeplane,
I have finished coding. How would you like to get the code? By email? I am not familiar with GitHub and have never tried to use the provided version control tools. Previously, when I updated the code, I simply uploaded the modified files from my computer. However, I have not fully tested this Vashishta potential and don't want to upload the files yet. Perhaps you know how to do this professionally?
from gpumd.
From my test, the 3-body part only takes about 10% of the whole computation, and the Vashishta potential is about 10 times more time-consuming than the SW potential. With 8000 atoms, the speed is only about 2x10^6 atom*step/second using a Tesla K40. What performance do you get from LAMMPS in your runs?
from gpumd.
After doing some tests, I feel that my implementation is correct, so I have updated the code. I also sent an email to your Gmail with an example attached.
from gpumd.
Fantastic! I will test this sometime over New Year's. What system did you test? I haven't tried running the system that is in the paper.
If 3-body is only 10%, that is great. Typically it should be even less I think, but that depends on the system of course.
I'll give you benchmarks from LAMMPS later. Fantastic work!
from gpumd.
I have tested SiC in the zincblende structure with 8000 atoms in total. With this number of atoms, the performance of the K40 is of course not saturated; it may reach 5x10^6 atom*step/second for larger systems. NPT ensemble at 300 K and 0 GPa. Neighbor list with a skin of 1 A, updated when needed. Double precision. For your convenience, here are the new materials related to this potential:
- src/
  - vashishta.cu and vashishta.h: the major source code for this potential
  - a function in potential.cu that reads in the parameters for this potential
  - a Vashishta structure defined in common.h
  - an added "case" in the "switch-case" construction in force.cu
- potentials/sw/
  - a potential file for SiC: sic_vashishta_2007.txt
- doc/
  - Section 4.7 of the manual contains the formulas and conventions I used.
From these, you can understand what needs to be done to add a new potential model to GPUMD. It took me two days of hard work (not two hours). I look forward to your testing results. When you have time, we can compare the forces in the same structure (with some randomness) as computed by GPUMD and LAMMPS. I hope the forces are the same.
from gpumd.
Ok, interesting. In LAMMPS with the GPU package on a P100, I get 13.5e6 on a SiC nanoparticle with 9000 atoms.
Thanks a lot for your effort, I'll get back to you within a week or so :)
from gpumd.
Oh, I have not tested with a P100. The LAMMPS version sounds very fast. Is that the version on your GitHub homepage?
from gpumd.
Hmm, your SW benchmarks etc. seem MUCH faster than LAMMPS, and I think there is an enormous overhead in kernel execution in LAMMPS. So I think it should be possible to achieve ~50 million atom*step/second on a P100 with Vashishta.
The implementation in LAMMPS is here: https://github.com/lammps/lammps/tree/master/lib/gpu
(see vashishta.cu).
from gpumd.
I don't know how much faster the P100 is than the K40, but it seems that 50 million atom*step/second is hard to achieve for this potential, even on a P100. I have just tested that using single precision, I can get 5 million atom*step/second on the K40. Not very impressive. Perhaps you can test GPUMD on a P100?
from gpumd.
Hmm interesting. P100 should be ~2x on single precision and 3x on double precision compared to K40. I will test GPUMD on P100 (I have for SW previously) when I'm back to work in January =)
from gpumd.
Single precision on P100:
INFO: Speed of this run = 1.50843e+07 atom*step/second.
With LAMMPS (also SP) I get 7.3e+06, so GPUMD is 2x as fast
from gpumd.
By using 30x30x30 unit cells (512k atoms) and 7.35Å cutoff, I get 2.37544e+07 atom*step/second with GPUMD.
from gpumd.
I was travelling during the last few days. Are you satisfied with the current performance of GPUMD for the Vashishta potential? Perhaps there is some room for improvement. As this potential is much more expensive than the SW potential (perhaps 10 times more), the overhead in LAMMPS that you mentioned might not be as important as in the case of the SW potential. Do you want to use GPUMD for simulations with this potential? If so, you should first validate the implementation by comparing forces directly with LAMMPS. Also note that the functionality of GPUMD is very limited; I am still experimenting with some new features, such as the MTK integrator for the NPT ensemble.
from gpumd.
I think there is room for improvement, but I'm not sure how to figure that out, hehe. How do you typically profile such applications?
I might use GPUMD with it and would of course compare forces and results carefully before any production run. Most of the GPU simulations I've done so far are just to reach long times in NVT, and GPUMD would be sufficient for this =)
One reason why I believe there is more to gain (but I'm not sure) is that the LAMMPS GPU implementation is still very sensitive to CPU speed. I get a rather large increase in speed when I simulate on a system with a powerful CPU.
from gpumd.
I usually first check whether or not the number of registers in the force evaluation kernels can be reduced below some critical values. I may have used too many registers, such that the occupancy is not optimal. The parameter BLOCK_SIZE_VASHISHTA was set to 64 (any multiple of 32 is allowed, as there is no binary reduction in the force evaluation kernels) and it may not be optimal. You can also try to switch on -DUSE_LDG (check line 75 in common.h) in the makefile. This might result in a few percent gain in performance in some cases (but sometimes it does not, in my experience). You may have noticed that I have already avoided using the expensive pow() function, as all the eta parameters in the papers I have read are integers.
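The integer-exponent trick mentioned above can be sketched as exponentiation by squaring. This is an illustration of the general technique, not the exact code used in GPUMD:

```cpp
// Sketch of avoiding the expensive pow(): when the Vashishta eta exponents
// are small non-negative integers, r^eta can be computed with a handful of
// multiplications using exponentiation by squaring.
#include <cassert>

double int_pow(double x, int n) {
    double result = 1.0;        // n is assumed to be >= 0
    while (n > 0) {
        if (n & 1) result *= x; // multiply in the current bit of n
        x *= x;                 // square the base for the next bit
        n >>= 1;
    }
    return result;
}
```

For the small exponents that appear in practice, even a plain unrolled chain of multiplications is fine; the point is simply to avoid the transcendental pow() call in the inner force loop.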
You can also time the individual force evaluation kernels in GPUMD and LAMMPS; the simple tool nvprof is enough for this. Based on your description, the performance difference between GPUMD and LAMMPS running with this potential might come from the CPU part. In GPUMD, nearly all the calculations are done on the GPU; only the initialization is done on the CPU.
from gpumd.
Yes, I did see that the pow is gone (I did the same trick in one of my implementations for LAMMPS), very nice! I will check both LAMMPS and GPUMD with nvprof.
By the way, with my i7-6950X CPU, I get 20e6 atom*step/second (14400 atoms) with pure silicon (diamond structure) using Stillinger-Weber on 10 cores. This is more than LAMMPS can do with a P100 GPU. So when a CPU is able to do that, I'm pretty sure GPUs can do more :D
from gpumd.
One way to further accelerate this potential is to tabulate the two-body part (using, e.g., cubic splines), because the two-body part is the hot spot (it perhaps takes up ~90% of the whole computation time). Recently, I have done some experiments with a spline-based EAM potential and gained some experience. I will try this when I have time. Have you checked whether or not the current implementation is consistent with LAMMPS?
from gpumd.
Hi, I was on the run when I saw this and totally forgot to answer, sorry about that.
That's very cool! I have tested a simple linear interpolation in the CPU version (vashishta/table in LAMMPS), which gives a nice speedup. I assume we'd also get nice speedups on the GPU, although memory reads could be more costly there?
I haven't compared with LAMMPS yet, but nothing I've seen so far scares me :)
from gpumd.
Hi, I have tried the linear interpolation as you did in pair_vashishta_table.cpp for LAMMPS. Unlike the 3-5X speedup for the CPU version (reported in the LAMMPS manual), I only got at most a 2X improvement in my GPU version. The performance depends on the table length N. From my tests, using __ldg() improves the performance a lot, so it's better to compile with -DUSE_LDG when using this potential. Perhaps you can try to optimize it further?
from gpumd.
I have uploaded the files related to the tabulated Vashishta potential. Using a table length of N=20000, there is about a 2X speedup. The relative difference of the force compared to the analytical case is on the order of 1.0e-5.
If you want to do some tests, you can start with examples/ex5, where this tabulated potential is used to calculate the phonon density of states of beta-SiC. It takes about 6 min on a K40.
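The tabulation scheme can be illustrated with a uniform grid and linear interpolation. This is a simplified stand-in (the real GPUMD table covers the full two-body Vashishta energy and force, not the toy function used here):

```cpp
// Simplified sketch of tabulating a pair function on a uniform grid in r
// and evaluating it by linear interpolation. Here f(r) = 1/r^7 stands in
// for the (much more complicated) Vashishta two-body term.
#include <cassert>
#include <cmath>
#include <vector>

struct Table {
    double r0, dr;          // grid origin and spacing
    std::vector<double> f;  // tabulated values
};

Table build_table(double r_min, double r_max, int n) {
    Table t{r_min, (r_max - r_min) / (n - 1), std::vector<double>(n)};
    for (int i = 0; i < n; ++i) {
        double r = r_min + i * t.dr;
        t.f[i] = 1.0 / std::pow(r, 7); // stand-in for the two-body term
    }
    return t;
}

double lookup(const Table& t, double r) {
    double x = (r - t.r0) / t.dr;
    int i = static_cast<int>(x); // grid interval containing r
    double w = x - i;            // fractional position in the interval
    return (1.0 - w) * t.f[i] + w * t.f[i + 1];
}
```

Each lookup replaces the full analytical evaluation with two table reads and a few multiplications, which is why the speedup on the GPU is limited by memory access rather than arithmetic.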
from gpumd.
I noticed that you made the linear interpolation in terms of r^2 instead of r. I guess the purpose was to avoid computing r = sqrt(r^2)? It seems that this extra computation does not cost much in the GPU implementation (where memory reads consume more resources). I changed GPUMD to work with r instead of r^2, and the accuracy of the linear interpolation improved a lot (the force errors are reduced by a factor of 5).
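The accuracy difference between interpolating in r and in r^2 can be checked with a short numerical experiment. A stand-in pair function is used here, so the exact error ratio will differ from the real Vashishta terms, but the trend is the same: a grid uniform in r^2 is coarse at small r, exactly where a steep pair function varies fastest.

```cpp
// Sketch comparing linear-interpolation error for a steep pair function
// tabulated uniformly in r versus uniformly in r^2 (rsq). The rsq grid is
// coarse at small r, where the function varies fastest, so its error is
// larger for the same table length.
#include <cassert>
#include <cmath>
#include <vector>

double f_of_r(double r) { return 1.0 / std::pow(r, 7); } // stand-in term

// Max relative error of linear interpolation when the table is uniform in
// s, where s = r (use_rsq = false) or s = r^2 (use_rsq = true).
double max_interp_error(double r_min, double r_max, int n, bool use_rsq) {
    auto to_s = [&](double r) { return use_rsq ? r * r : r; };
    double s0 = to_s(r_min), ds = (to_s(r_max) - s0) / (n - 1);
    std::vector<double> table(n);
    for (int i = 0; i < n; ++i) {
        double s = s0 + i * ds;
        table[i] = f_of_r(use_rsq ? std::sqrt(s) : s);
    }
    double err = 0.0;
    for (int i = 0; i + 1 < n; ++i) { // probe each interval midpoint
        double s = s0 + (i + 0.5) * ds;
        double approx = 0.5 * (table[i] + table[i + 1]);
        double exact = f_of_r(use_rsq ? std::sqrt(s) : s);
        err = std::max(err, std::abs(approx - exact) / exact);
    }
    return err;
}
```

For this stand-in function, the r-indexed table is roughly an order of magnitude more accurate at the inner end of the range, consistent with the factor-of-5 improvement observed in GPUMD.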
from gpumd.
Hi,
I have further improved the performance of the analytical and tabulated Vashishta potentials. I think I have tried my best to optimize them.
from gpumd.
Very cool! I'm currently busy finishing up some papers, so I won't have time to look at this yet, but i definitely will! I'll run the benchmarks soon anyway. Thanks, and awesome work :)
from gpumd.
And yeah, I made it linear in rsq because of the square root. On GPUs the square root is more or less free!
Btw, what memory layout did you use for the tabulation? I know that storing the table as a texture is much faster due to the random access pattern; this is how positions are stored in the GPU package and the KOKKOS package.
from gpumd.
There is a compile option -DUSE_LDG in GPUMD.
When it is on, some global memory accesses use the __ldg() intrinsic, which is similar to a texture fetch. I have used this for the tabulated data, and it is true that for this potential, using __ldg() is much faster.
However, using __ldg() makes some other short-ranged many-body potentials (such as Tersoff) somewhat slower. I may perform more thorough tests in the future and make the best choices for the users automatically.
BTW, I have checked (using some initial structures with some randomness in the positions) that the forces computed by GPUMD are consistent with those from LAMMPS. The agreement is up to 1.0e-4, and the difference should be caused by things like unit conversions with slightly different physical constants, very different computation orders, etc.
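A force consistency check of this kind can be expressed as a maximum relative difference over all force components. This is a hypothetical helper for illustration, not code from either GPUMD or LAMMPS:

```cpp
// Hypothetical helper for cross-checking forces from two codes: the maximum
// relative difference over all components, guarding against zero forces.
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

double max_rel_diff(const std::vector<double>& a, const std::vector<double>& b) {
    double m = 0.0;
    for (size_t i = 0; i < a.size(); ++i) {
        double denom = std::max(std::abs(a[i]), std::abs(b[i]));
        if (denom > 0.0)
            m = std::max(m, std::abs(a[i] - b[i]) / denom);
    }
    return m;
}
```

A validation run would then check that this quantity stays below the expected tolerance (here, ~1.0e-4) for randomized structures.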
from gpumd.
Ahh, you mentioned that above. I just tried single precision with USE_LDG and see no difference running with or without the table. Maybe this effect is more important for double precision?
Cool that the forces are consistent! Well done :)
from gpumd.
Yes, it seems that the performance depends on many factors. Even though single precision is faster, I have still used double precision in my published works, so I usually do not care much about the single-precision case :-)
from gpumd.
Now I have merged the analytical and the tabulated versions of this potential (without affecting the inputs and the outputs). I am satisfied with the implementation, so I close this issue now.
from gpumd.