Comments (4)
As the results show, although kpar can speed up the cg calculation, which needs less memory, cg is still much slower than dav. Increasing kpar from 1 to 2 speeds up cg by about 10%.
from abacus-develop.
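I also ran the test with QE on the same example, mp-1067451.
For the CG method, kpar slows the calculation down, which seems strange: scf_time grows from 27244.7 at nk=1 to 28781.4 at nk=2 (about 6% slower) and 40043.1 at nk=4 (about 47% slower).
For the Davidson method, the calculation runs normally with kpar=2, which indicates that QE uses less memory than ABACUS. This is because pw_diag_ndim is 4 in ABACUS, while the corresponding value in QE is 2.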
CG:
           kpar  ks_solver  scf_time  scf_steps  normal_end  ibzk  scf_time_each_step
qe/qe-nk1  None  None       27244.7   2          False       8     [21035.1, 6209.6]
qe/qe-nk2  None  None       28781.4   2          False       8     [22302.3, 6479.1]
qe/qe-nk4  None  None       40043.1   2          False       8     [31618.5, 8424.6]
DAV:
                             kpar  ks_solver  scf_time  scf_steps  normal_end  ibzk  scf_time_each_step
qe-dav-nk1/mp-1067451/00000  None  None       4854.9    2          False       8     [3558.7, 1296.2]
qe-dav-nk2/mp-1067451/00000  None  None       4368.9    2          False       8     [3181.4, 1187.5]
qe-dav-nk4/mp-1067451/00000  None  None       NaN       1          False       8     None
from abacus-develop.
I set pw_diag_ndim to 2 in ABACUS and ran the kpar test on mp-1067451 on a larger machine, c64_m520_cpu (MPI parallel with 32 cores).
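For reference, a kpar sweep like this one could be set up by generating one INPUT file per kpar value. The following is only a minimal sketch and not the script used for these tests: the helper names `make_input` and `write_sweep`, the directory layout, and every setting other than `kpar`, `ks_solver`, and `pw_diag_ndim` (for example the `cal_force` and `cal_stress` switches) are assumptions for illustration.

```python
from pathlib import Path


def make_input(kpar, ks_solver="dav", pw_diag_ndim=2):
    """Return the text of an ABACUS INPUT file for one point of the sweep.

    Only kpar, ks_solver and pw_diag_ndim come from the discussion;
    the remaining settings are illustrative assumptions.
    """
    params = {
        "calculation": "scf",
        "basis_type": "pw",
        "ks_solver": ks_solver,
        "pw_diag_ndim": pw_diag_ndim,
        "kpar": kpar,
        "cal_force": 1,   # assumed: FORCE was requested in the test
        "cal_stress": 1,  # assumed: STRESS was requested in the test
    }
    lines = ["INPUT_PARAMETERS"]
    lines += [f"{key:<16}{value}" for key, value in params.items()]
    return "\n".join(lines) + "\n"


def write_sweep(root, kpars=(1, 2, 4, 6, 8)):
    """Write one INPUT file per kpar value under root/kpar<N>/ (hypothetical layout)."""
    paths = []
    for kpar in kpars:
        run_dir = Path(root) / f"kpar{kpar}"
        run_dir.mkdir(parents=True, exist_ok=True)
        path = run_dir / "INPUT"
        path.write_text(make_input(kpar))
        paths.append(path)
    return paths
```

Calling `write_sweep("kpar_test")` would create five run directories; the structure, k-point, and pseudopotential files still have to be supplied separately.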
Compared with kpar=1, kpar=2 cuts the SCF time by about 24% (5704.52 s to 4321.94 s), while kpar=4 is slower than kpar=2.
So a larger kpar does not always mean higher efficiency.
Memory cost grows with kpar. In this case, kpar=1/2 can finish the SCF/FORCE/STRESS calculation, but kpar=4/6 can only finish the SCF/FORCE calculation, because the STRESS calculation needs more than 520 GB of memory.
The memory needed by STRESS seems to be about 1.5 times that of the SCF/FORCE calculation.
                      kpar  ks_solver  scf_time  scf_steps  normal_end  ibzk  scf_time_each_step
mp-1067451-new/00000  1     dav        5704.52   2.0        True        8     [4708.34, 996.18]
mp-1067451-new/00001  2     dav        4321.94   2.0        True        8     [3541.35, 780.59]
mp-1067451-new/00002  4     dav        4567.33   2.0        False       8     [3726.31, 841.02]
mp-1067451-new/00003  6     dav        4913.62   2.0        False       8     [3997.89, 915.73]
mp-1067451-new/00004  8     dav        NaN       NaN        False       8     None
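The relative gains can be checked directly from the scf_time column. This is a small sketch of that arithmetic; the function name `time_saved_vs_baseline` is made up for illustration, and the numbers are copied from the table above.

```python
# Total time (s) of the first two SCF steps, copied from the scf_time column.
scf_time = {1: 5704.52, 2: 4321.94, 4: 4567.33, 6: 4913.62}


def time_saved_vs_baseline(times, baseline=1):
    """Return {kpar: fraction of SCF time saved relative to the baseline kpar}."""
    t0 = times[baseline]
    return {kpar: 1.0 - t / t0 for kpar, t in times.items()}


for kpar, saved in time_saved_vs_baseline(scf_time).items():
    print(f"kpar={kpar}: {saved:.1%} less SCF time than kpar=1")
```

With these values the saving is roughly 24% for kpar=2, 20% for kpar=4, and 14% for kpar=6, so the benefit peaks at kpar=2 for this case.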
from abacus-develop.
A single example is not enough to judge performance, and the performance of c64_m520_cpu is unstable. I reran mp-1067451-new/00002, and this time the first two SCF steps took 3427.07 s and 764.52 s, which is faster than in the previous test.
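To put the rerun in context, the run-to-run change for kpar=4 can be compared with the kpar=2 versus kpar=4 gap seen in the first test. This is a sketch of that comparison using only the per-step times reported in this thread; the helper name `relative_change` is hypothetical.

```python
# Per-step SCF times (s) reported in this thread.
kpar4_first_run = [3726.31, 841.02]  # mp-1067451-new/00002, first test
kpar4_rerun = [3427.07, 764.52]      # same job, rerun
kpar2_first_run = [3541.35, 780.59]  # mp-1067451-new/00001


def relative_change(old, new):
    """Relative change of the summed step times going from old to new."""
    return (sum(new) - sum(old)) / sum(old)


# Spread between two runs of the same kpar=4 job.
noise = relative_change(kpar4_first_run, kpar4_rerun)
# Gap between kpar=2 and kpar=4 in the first test.
gap = relative_change(kpar2_first_run, kpar4_first_run)

print(f"kpar=4 rerun vs first run: {noise:+.1%}")
print(f"kpar=4 vs kpar=2 (first test): {gap:+.1%}")
```

The rerun is about 8% faster than the first kpar=4 run, while kpar=4 was only about 6% slower than kpar=2 in the first test, so the difference between kpar=2 and kpar=4 is within the machine's run-to-run variation.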
from abacus-develop.