Skip to content

Commit 3f7e7a4

Browse files
Flying-dragon-boxingzhangzh-pkuWHUweiqingzhoukirk0830haozhihan
authored
Fix: wrong name in cuda "lapack" interface (#6411)
* feat pexsi * fix : diag not completed * feat * feat: pexsi hsolver * CMake building implemented * Works * adapt to the new container * Turn off USE_PEXSI * Update LibRI to 553c91c * modify include files * namespace-ize * new inputs added * Configure Makefile Compiling, fix typos * Fix Makefile Intel toolchains compile errors * Fix even more PEXSI related Makefile compiling issues * Modify inputs and update to latest version (#2) * run INPUT.Default() in every process in InputParaTest (#3490) Co-authored-by: kirk0830 <[email protected]> * add blas support for FindLAPACK.cmake (#3497) * more unittest of QO: towards orbital selection (#3499) * Fix: fix bug in mulliken charge calculation (#3503) * fix phase * fix case test * Refactor: namespace Conv_Coulomb_Pot_K (#3446) * Refactor: namespace Conv_Coulomb_Pot_K * Refactor: namespace Conv_Coulomb_Pot_K --------- Co-authored-by: wqzhou <[email protected]> * enable the computation of all zeros in one function call (#3449) Co-authored-by: wqzhou <[email protected]> * replace ios.eof() by ios.good() to avoid meeting badbit and failbit in reading STRU (#3506) * Build: add ccache to accelerate the testing process (#3509) * Build: add ccache to accelerate the testing process * Update test.yml * Update test.yml * Update test.yml * Docs: to avoid the misunderstanding in docs (#3518) * to avoid the misunderstanding in docs * Update docs/quick_start/hands_on.md Co-authored-by: Chun Cai <[email protected]> --------- Co-authored-by: Chun Cai <[email protected]> * Docs: fix a missing depencency in conda build env (#3508) * Feature: Add ENABLE_RAPIDJSON option to control the output of abacus.json (#3519) Add ENABLE_RAPIDJSON option to control the output of abacus.json * Feature: add python wrapper for math sphbes (#3475) * recommit for review * add python wrapper * remove timer since performace tests add * Feature: support segment split in kline mode in KPT file and `out_band` band output precision control, `8` as default (#3493) * add precision control * correct serial version of nscf_band function * fix issue 3482 * update unit and integrated test * update document * correct unittest and make compatible with false and true * fix: bug in Autotest.sh when result.ref has no totaltimeref (#3523) * Fix : unit test of module_xc (#3524) * Fix: omit small magnetic moments to avoid numerical instability (#3530) * update deltalambda * avoid numerical error in orbMulP * add constrain on Mi * change case reference value * Fix: fix multiple compiler warnings (#3515) * Fix: add noreturn attribute to warning_quit * Add type conversion * fix string literal * fix small number trunctuation * Fix system call returned value not checked * fix missing braket * Refactor parameter_pool.cpp and parameter_pool.h * remove duplicated return statements * Change WARNING_QUIT occurances in tests * Add warning message to help debug UT * output the default precision flag (#3496) Co-authored-by: kirk0830 <[email protected]> * Build: Improving CMake performance for finding LibXC and ELPA (#3478) * Fix for finding LibXC and ELPA * For compatibility to previous routines * syntax fix for FindELPA.cmake * Update cmake/FindELPA.cmake Co-authored-by: Chun Cai <[email protected]> * Using CMake interface as default for finding LibXC * update docs * fix for FindLibxc: changing imcompatible if statement * fix for FindLibxc: changing imcompatible if statement * fix for FindLibxc: changing imcompatible if statement * update docs for installing pkg-config * Update FindLibxc.cmake * Update FindLibxc.cmake * remove previous LibXC routine in CMakeLists.txt Co-authored-by: Chun Cai <[email protected]> * Update easy_install.md with Makefile-built LibXC supported * Update easy_install.md to include different behavior in different version on finding ELPA --------- Co-authored-by: Chun Cai <[email protected]> * Docs: correct some docs about mp2 smearing method (#3533) * correct some docs about mp2 smearing method * add docs about mv method * Feature : printing band density (#3501) Co-authored-by: wenfei-li <[email protected]> Co-authored-by: wqzhou <[email protected]> * add some docs for PR#3501 (#3537) * Feature: enable restart charge density mixing during SCF (#3542) * add a new parameter mixing_restart * do not update rho if iter==mixing_restart * do not update rho if iter==mixing_restart-1 * reset mix and rho_mdata if iter==mixing_restart * fix SCF exit directly since drho=0 if iter=GlobalV::MIXING_RESTART * re-set_mixing in eachiterinit for PW and LCAO * enable SCF restarts in esolver_ks::RUN * add some UnitTests * add some Docs * new inputs added * Update input-main.md (#3551) Solve the format problem mentioned in issue 3543 * Build: fix compatibility issue against toolchain install (#3540) * Fix for finding LibXC and ELPA * For compatibility to previous routines * syntax fix for FindELPA.cmake * Update cmake/FindELPA.cmake Co-authored-by: Chun Cai <[email protected]> * Using CMake interface as default for finding LibXC * update docs * fix for FindLibxc: changing imcompatible if statement * fix for FindLibxc: changing imcompatible if statement * fix for FindLibxc: changing imcompatible if statement * update docs for installing pkg-config * Update FindLibxc.cmake * Update FindLibxc.cmake * remove previous LibXC routine in CMakeLists.txt Co-authored-by: Chun Cai <[email protected]> * Update easy_install.md with Makefile-built LibXC supported * Update easy_install.md to include different behavior in different version on finding ELPA * fix compatibility issue against toolchain * Change default ELPA install routine to old one --------- Co-authored-by: Chun Cai <[email protected]> * Test: Configure performance tests for math libraries (#3511) * add performace test of sphbes functions. * fix benchmark cmake errors * add dependencies for docker * update docs * add performance tests for sphbes * add google benchmark * rewrite benchmark tests in fixtures * disable internal testing in benchmark * merge benchmark into integration test --------- Co-authored-by: StarGrys <[email protected]> * Configure Makefile Compiling, fix typos * Fix Makefile Intel toolchains compile errors * Fix even more PEXSI related Makefile compiling issues * Update hsolver_pw.cpp (#3556) when use_uspp==false, overlap matrix should be E. * Fix: cuda build target (#3276) * Fix: cuda buid target * Update CMakeLists.txt --------- Co-authored-by: Denghui Lu <[email protected]> --------- Co-authored-by: wqzhou <[email protected]> Co-authored-by: kirk0830 <[email protected]> Co-authored-by: Haozhi Han <[email protected]> Co-authored-by: Zhao Tianqi <[email protected]> Co-authored-by: PeizeLin <[email protected]> Co-authored-by: jinzx10 <[email protected]> Co-authored-by: Chun Cai <[email protected]> Co-authored-by: Peng Xingliang <[email protected]> Co-authored-by: Jie Li <[email protected]> Co-authored-by: Wenfei Li <[email protected]> Co-authored-by: Denghui Lu <[email protected]> Co-authored-by: YI Zeping <[email protected]> Co-authored-by: wenfei-li <[email protected]> Co-authored-by: jingan-181 <[email protected]> Co-authored-by: StarGrys <[email protected]> Co-authored-by: Haozhi Han <[email protected]> * Revert "Modify inputs and update to latest version" * Update FindPEXSI.cmake to fix Comments * Fix CI errors * Fix CI Errors and Merge with Upstream * Resolve Pull Request Reviews * Fix parallel communication related issue * Fix vars in Makefile.vars, add input tests and comments for pexsi vars * Fix nspin > 1 cases * Improvement: take calculated mu as new initial guess, may slightly improve performance * Fix mistakes in the last commit * Fix: params and features - set default pexsi_temp - fix md in pexsi * fix empty lines * Fix: move params to pexsi_solver, rename USE_PEXSI to ENABLE_PEXSI * Docs: added docs for pexsi inputs * Fix unit test issues in input_conv * Change default pexsi_npole from 80 to 40 * Place pexsi_EDM in DensityMatrix, set size of pexsi_dm = 1 when GlobalV::NSPIN==4, and add comments for dmToRho * An unit test added for DiagoPexsi * modify for changed gint interface * correct nspin related behaviors * add efermi passthrough * Revert "add efermi passthrough" This reverts commit d7b402d. * commits to resolve conversations related to codes * DM and EDM pointers in pexsi now handled by diagopexsi, and copying h s matrices no longer needed * add pexsi examples * fix pexsi unit test (original version shouldn't run) * add building docs for pexsi * set cxx standard to c++14, which is required in make_unique * Fix: Fix typo related to pexsi * update to PPEXSIDFTDriver2 * default npoints to 1, so single core pexsi will work * Fix Compile errors * refactor to abandon `pdiagh` * Fix mu_buffer and nspin * Updates with latest * Refactor: in ESolver_KS_PW, calculate deband in iter_finish, not in hamilt2density * Fix: make files in consistent with upstream * Refactor * Refactor * Refactor * Refactor * Refactor * Refactor: fix unit test * Refactor: fix unit test * Refactor: fix unit test * Refactor: fix unit test * Refactor: Remove set kvec funcs in `K_Vectors` * Refactor: Remove final_scf * Refactor: Fix kvecc2d/d2c * Fix: Tests * Fix: Tests * Fix: Tests * Fix: Tests * Refactor: Final? * Fix * Fix * Fix * Fix * Fix: Compile Error on CUDA > 12.9 * Fix: Compile Error on CUDA > 12.9 * Feature: Support linear combination of coulomb_param for EXX PW * Fix: Fix compile issue * Uploading hybrid gauge tddft (#6369) * hybrid gague * update tests * update * update * update * update * update unit test * fix tests * update tests * fix read_wfc * fix catch_properties.sh * fix restart * update gpu test * update tests * fix * fix input_conv * Improve md calculation stress output in running log (#6366) * Improve md calculation stress output in running log * Module_IO Unittest modify * ModuleMD Unittests modify * modify code comment in fire_test.cpp * maintain setprecision(8) for md stress output * Refactor: Remove redundant Input_para from ESolver Class (#6370) * Refactor: Replace PARAM.inp with inp in ESolver classes for consistency * Refactor: Replace local input parameters with PARAM.inp in ESolver classes for consistency * Refactor: Use PARAM.inp.scf_ene_thr in ESolver_KS_LCAO iter_finish method * Revert "Refactor: Use PARAM.inp.scf_ene_thr in ESolver_KS_LCAO iter_finish method" This reverts commit b1bd0fd. * Revert "Refactor: Replace local input parameters with PARAM.inp in ESolver classes for consistency" This reverts commit f4f81e3. * Fix: Fix memory leak introduced by new gint module (#6375) * fix memory leak * delete copy assignment * refactor Exx_Opt_Orb (#6378) Co-authored-by: linpz <[email protected]> * Add use sw and fix Floating point exception (#6372) * remove float error in sunway * fix ig=0 * add the sw * change the make_dir * unify the gg use * fix compile bug * add init * temporarily remove the sunway define * add the pesduo * fix compile bug * fix bug in the betar * modify the test * Update the output formats of rt-TDDFT (#6381) * update the output formats of rt-TDDFT * update the output formats of rt-TDDFT * fix a bug * update initialized velocities * found some output information is still lacking in MD module * [Refactor] Rename grid to module_grid and genelpa to module_genelpa (#6386) * Rename grid to module_grid * Rename genelpa to module_genelpa * Fix cmake * Update the outputs of geometry relaxation (#6387) * update the output formats of rt-TDDFT * update the output formats of rt-TDDFT * fix a bug * update initialized velocities * found some output information is still lacking in MD module * update output information * remove some global variables in relax_driver * update outputs * update relaxation outputs * update relaxation output messages * update tests of print info * fix a test * fix cg outputs * udpate cg test * update relax tests * update LCAO output stress format * change update_cell.cpp algorithm, when the ion move is larger than the cell length, it is fine to proceed the relaxation calculations * fix tests for unitcells * update cell * Feature: support the output of matrix representation of symm_ops (#6390) * Feature: support output the matrix representation of symmetry operation * Feature: support the output of matrix representation of symm_ops * update the document * Feature: Output real space wavefunction and partial charge density when `device=gpu` (#6391) * Fix GPU output of out_pchg and out_wfc_norm, out_wfc_re_im * GPU integrate test is functional again * Optimize RT-TDDFT dipole output (#6393) * Perf: support GPU version of cal_force_cc with LCAO basis (#6392) * support GPU version of cal_force_cc with LCAO basis * fix a bug * [Refactor] Move module_lr to source_lcao and add a new folder module_external in source_base (#6388) * Move module_lr to source_lcao * Fix test build * Move blas_connector to module_external * Fix header use * Fix internal header use * A fierce battle with Makefile😡 * Move blacs_connector.h to module_external * Move lapack_connector.h and lapack_wrapper.h to module_external * Fix header usage * Move scalapack_connector.h to module_external * Fix a bug for the output information after relaxation (#6395) * update the output formats of rt-TDDFT * update the output formats of rt-TDDFT * fix a bug * update initialized velocities * found some output information is still lacking in MD module * update output information * remove some global variables in relax_driver * update outputs * update relaxation outputs * update relaxation output messages * update tests of print info * fix a test * fix cg outputs * udpate cg test * update relax tests * update LCAO output stress format * change update_cell.cpp algorithm, when the ion move is larger than the cell length, it is fine to proceed the relaxation calculations * fix tests for unitcells * update cell * update some function names, update output A to Angstrom * change eV/A to eV/Angstrom * bump version to 3.9.0.10 (#6397) Co-authored-by: Liang Sun <[email protected]> * Fix: fix exx_gamma_extrapolation error in MPI * Fix: fix exx_gamma_extrapolation error in MPI * Update lapack.cu * Fix: Integrate test --------- Co-authored-by: zhangzhihao <[email protected]> Co-authored-by: zhangzh-pku <[email protected]> Co-authored-by: wqzhou <[email protected]> Co-authored-by: kirk0830 <[email protected]> Co-authored-by: Haozhi Han <[email protected]> Co-authored-by: Zhao Tianqi <[email protected]> Co-authored-by: PeizeLin <[email protected]> Co-authored-by: jinzx10 <[email protected]> Co-authored-by: Chun Cai <[email protected]> Co-authored-by: Peng Xingliang <[email protected]> Co-authored-by: Jie Li <[email protected]> Co-authored-by: Wenfei Li <[email protected]> Co-authored-by: Denghui Lu <[email protected]> Co-authored-by: YI Zeping <[email protected]> Co-authored-by: wenfei-li <[email protected]> Co-authored-by: jingan-181 <[email protected]> Co-authored-by: StarGrys <[email protected]> Co-authored-by: Haozhi Han <[email protected]> Co-authored-by: Mohan Chen <[email protected]> Co-authored-by: HTZhao <[email protected]> Co-authored-by: lanshuyue <[email protected]> Co-authored-by: Liang Sun <[email protected]> Co-authored-by: dzzz2001 <[email protected]> Co-authored-by: linpeize <[email protected]> Co-authored-by: linpz <[email protected]> Co-authored-by: liiutao <[email protected]> Co-authored-by: Mohan Chen <[email protected]> Co-authored-by: Critsium <[email protected]> Co-authored-by: Taoni Bao <[email protected]> Co-authored-by: Chen Nuo <[email protected]>
1 parent 2fcc64e commit 3f7e7a4

File tree

2 files changed

+9
-7
lines changed
  • source/source_base/module_container/ATen/kernels/cuda
  • tests/16_SDFT_GPU/187_PW_SDFT_MALL_BPCG_GPU

2 files changed

+9
-7
lines changed

source/source_base/module_container/ATen/kernels/cuda/lapack.cu

Lines changed: 5 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -70,8 +70,10 @@ struct lapack_trtri<T, DEVICE_GPU> {
7070
{
7171
// TODO: trtri is not implemented in this method yet
7272
// Cause the trtri in cuSolver is not stable for ABACUS!
73-
//cuSolverConnector::trtri(cusolver_handle, uplo, diag, dim, Mat, lda);
74-
cuSolverConnector::potri(cusolver_handle, uplo, diag, dim, Mat, lda);
73+
// But why?! trtri and potri are different routines for different job!
74+
// How can BPCG work without using a proper routine?
75+
cuSolverConnector::trtri(cusolver_handle, uplo, diag, dim, Mat, lda);
76+
// cuSolverConnector::potri(cusolver_handle, uplo, diag, dim, Mat, lda);
7577
}
7678
};
7779

@@ -201,4 +203,4 @@ template struct lapack_getrs<std::complex<float>, DEVICE_GPU>;
201203
template struct lapack_getrs<std::complex<double>, DEVICE_GPU>;
202204

203205
} // namespace kernels
204-
} // namespace container
206+
} // namespace container
Lines changed: 4 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,5 @@
1-
etotref -96.9361190965003630
2-
etotperatomref -48.4680595483
3-
totalforceref 248.979444
4-
totalstressref 230453.604050
1+
etotref -96.9361125708192191
2+
etotperatomref -48.4680562854
3+
totalforceref 248.979476
4+
totalstressref 230453.662470
55
totaltimeref 6.44

0 commit comments

Comments
 (0)