deepmodeling / dpdata

Manipulating multiple atomic simulation data formats, including DeePMD-kit, VASP, LAMMPS, ABACUS, etc.

Home Page: https://docs.deepmodeling.com/projects/dpdata/

License: GNU Lesser General Public License v3.0

Python 79.27% C 0.02% Makefile 0.02% Roff 3.74% OCaml 16.96%
python atomic-data

dpdata's Introduction

dpdata

dpdata is a Python package for manipulating data formats of software in computational science, including DeePMD-kit, VASP, LAMMPS, GROMACS and Gaussian. dpdata requires Python 3.7 or above.

Installation

One can download the source code of dpdata with

git clone https://github.com/deepmodeling/dpdata.git dpdata

then use pip to install the module from source

cd dpdata
pip install .

dpdata can also be installed via pip without the source code

pip install dpdata

Quick start

This section gives some examples of how dpdata works. First, import the module in Python 3 code.

import dpdata

The typical workflow of dpdata is

  1. Load data from VASP, LAMMPS or DeePMD-kit data files.
  2. Manipulate the data.
  3. Dump the data in the desired format.

Load data

d_poscar = dpdata.System("POSCAR", fmt="vasp/poscar")

or let dpdata infer the format (vasp/poscar) of the file from the file name extension

d_poscar = dpdata.System("my.POSCAR")
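The traceback in one of the issues further down this page shows how the format key is guessed when fmt is 'auto': dpdata takes the part of the base file name after the last dot and lower-cases it. The sketch below reproduces only that one expression; the helper name infer_fmt_key is hypothetical, and the step from a key such as "poscar" to the full 'vasp/poscar' format is assumed to happen through aliases registered by dpdata's format plugins.

```python
import os


def infer_fmt_key(file_name: str) -> str:
    """Guess a format key from a file name (hypothetical helper).

    Mirrors the expression seen in dpdata's system.py traceback:
    the text after the last dot of the base name, lower-cased.
    A name without a dot (e.g. "OUTCAR") yields the whole name.
    """
    return os.path.basename(file_name).split(".")[-1].lower()


print(infer_fmt_key("my.POSCAR"))    # poscar
print(infer_fmt_key("run1/OUTCAR"))  # outcar
print(infer_fmt_key("conf.lmp"))     # lmp
```

This also explains why dpdata.LabeledSystem("OUTCAR") works without an explicit fmt.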

The number of atoms, the atom types and the coordinates are loaded from the POSCAR and stored in a data System called d_poscar. A data System (a concept used by deepmd-kit) contains frames that have the same number of atoms of the same types. The order of the atoms should be consistent among the frames in one System. Note that a POSCAR contains only one frame. If the multiple frames stored in, for example, an OUTCAR are needed, use

d_outcar = dpdata.LabeledSystem("OUTCAR")

The labels provided in the OUTCAR, i.e. energies, forces and virials (if any), are loaded by LabeledSystem. Note that the atomic forces are always assumed to exist. LabeledSystem is a derived class of System.
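The System rule described above (same number of atoms, same types, same atom order in every frame) can be written as a small check. This is an illustration only: representing a frame by the list of its element symbols, and the helper name can_form_one_system, are assumptions and not part of dpdata.

```python
from typing import List, Sequence


def can_form_one_system(frames: Sequence[List[str]]) -> bool:
    """Return True if all frames list the same atom types in the same order.

    Each frame is given as the element symbols of its atoms, in storage
    order (a simplified, hypothetical representation).
    """
    if not frames:
        return True
    reference = list(frames[0])
    # same atom count and the same type at every atom index
    return all(list(frame) == reference for frame in frames[1:])


water_a = ["O", "H", "H"]
water_b = ["O", "H", "H"]
water_reordered = ["H", "O", "H"]
print(can_form_one_system([water_a, water_b]))          # True
print(can_form_one_system([water_a, water_reordered]))  # False
```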

The System or LabeledSystem can be constructed from the following file formats by passing the format key from the table to the fmt argument:

Software | Format | Multi frames | Labeled | Class | Format key
vasp | poscar | False | False | System | 'vasp/poscar'
vasp | outcar | True | True | LabeledSystem | 'vasp/outcar'
vasp | xml | True | True | LabeledSystem | 'vasp/xml'
lammps | lmp | False | False | System | 'lammps/lmp'
lammps | dump | True | False | System | 'lammps/dump'
deepmd | raw | True | False | System | 'deepmd/raw'
deepmd | npy | True | False | System | 'deepmd/npy'
deepmd | raw | True | True | LabeledSystem | 'deepmd/raw'
deepmd | npy | True | True | LabeledSystem | 'deepmd/npy'
deepmd | npy | True | True | MultiSystems | 'deepmd/npy/mixed'
deepmd | npy | True | False | MultiSystems | 'deepmd/npy/mixed'
gaussian | log | False | True | LabeledSystem | 'gaussian/log'
gaussian | log | True | True | LabeledSystem | 'gaussian/md'
siesta | output | False | True | LabeledSystem | 'siesta/output'
siesta | aimd_output | True | True | LabeledSystem | 'siesta/aimd_output'
cp2k (deprecated in future) | output | False | True | LabeledSystem | 'cp2k/output'
cp2k (deprecated in future) | aimd_output | True | True | LabeledSystem | 'cp2k/aimd_output'
cp2k (plug-in) | stdout | False | True | LabeledSystem | 'cp2kdata/e_f'
cp2k (plug-in) | stdout | True | True | LabeledSystem | 'cp2kdata/md'
QE | log | False | True | LabeledSystem | 'qe/pw/scf'
QE | log | True | False | System | 'qe/cp/traj'
QE | log | True | True | LabeledSystem | 'qe/cp/traj'
Fhi-aims | output | True | True | LabeledSystem | 'fhi_aims/md'
Fhi-aims | output | False | True | LabeledSystem | 'fhi_aims/scf'
quip/gap | xyz | True | True | MultiSystems | 'quip/gap/xyz'
PWmat | atom.config | False | False | System | 'pwmat/atom.config'
PWmat | movement | True | True | LabeledSystem | 'pwmat/movement'
PWmat | OUT.MLMD | True | True | LabeledSystem | 'pwmat/out.mlmd'
Amber | multi | True | True | LabeledSystem | 'amber/md'
Amber/sqm | sqm.out | False | False | System | 'sqm/out'
Gromacs | gro | True | False | System | 'gromacs/gro'
ABACUS | STRU | False | False | System | 'abacus/stru'
ABACUS | STRU | False | True | LabeledSystem | 'abacus/scf'
ABACUS | cif | True | True | LabeledSystem | 'abacus/md'
ABACUS | STRU | True | True | LabeledSystem | 'abacus/relax'
ase | structure | True | True | MultiSystems | 'ase/structure'
DFTB+ | dftbplus | False | True | LabeledSystem | 'dftbplus'
n2p2 | n2p2 | True | True | LabeledSystem | 'n2p2'

The class dpdata.MultiSystems can read data from a directory that may contain many files of different systems, or from a single xyz file that contains different systems.

Use dpdata.MultiSystems.from_dir to read from a directory: dpdata.MultiSystems walks the directory recursively and finds all files with the given file_name. It supports all the file formats that dpdata.LabeledSystem supports.
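The recursive search done by from_dir can be pictured as follows. This is a sketch under assumptions, not dpdata's code: it walks the tree with os.walk and matches base names with shell-style wildcards (fnmatch), which is how a pattern such as "*OUTCAR" is assumed to behave; find_files is a hypothetical name.

```python
import fnmatch
import os
import tempfile
from typing import List


def find_files(dir_name: str, file_name: str) -> List[str]:
    """Recursively collect files under dir_name whose base name matches
    file_name; shell-style wildcards such as "*OUTCAR" are allowed."""
    matches = []
    for root, _dirs, files in os.walk(dir_name):
        for name in files:
            # case-sensitive match, as on Linux file systems
            if fnmatch.fnmatchcase(name, file_name):
                matches.append(os.path.join(root, name))
    return sorted(matches)


# demo on a throwaway directory tree
with tempfile.TemporaryDirectory() as root:
    for rel in ("a/OUTCAR", "a/b/relax.OUTCAR", "a/b/POSCAR"):
        path = os.path.join(root, *rel.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()
    for path in find_files(root, "*OUTCAR"):
        print(os.path.relpath(path, root))
```

Note that "*OUTCAR" also matches a plain "OUTCAR", because * can match an empty string.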

Use dpdata.MultiSystems.from_file to read from single file. Single-file support is available for the quip/gap/xyz and ase/structure formats.

For example, a single quip/gap .xyz file may contain many configurations with different numbers of atoms and different atom types.

The following commands relating to the class dpdata.MultiSystems may be useful.

import dpdata

# load data
xyz_multi_systems = dpdata.MultiSystems.from_file(
    file_name="tests/xyz/xyz_unittest.xyz", fmt="quip/gap/xyz"
)
vasp_multi_systems = dpdata.MultiSystems.from_dir(
    dir_name="./mgal_outcar", file_name="OUTCAR", fmt="vasp/outcar"
)

# use a wildcard to match file names
vasp_multi_systems = dpdata.MultiSystems.from_dir(
    dir_name="./mgal_outcar", file_name="*OUTCAR", fmt="vasp/outcar"
)

# print the MultiSystems information
print(xyz_multi_systems)
print(xyz_multi_systems.systems)  # a dict of systems keyed by formula

# print the data of one system
print(xyz_multi_systems.systems["B1C9"].data)

# dump one system's data to the ./my_work_dir/B1C9_raw folder
xyz_multi_systems.systems["B1C9"].to_deepmd_raw("./my_work_dir/B1C9_raw")

# dump all systems
xyz_multi_systems.to_deepmd_raw("./my_deepmd_data/")
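The systems dictionary is keyed by chemical formula, as with the "B1C9" key above. The sketch below shows how configurations with different atom orders but the same composition would fall under one key. It assumes, without confirmation from this page, that the key is built from alphabetically sorted element names each followed by its count; formula and group_by_formula are hypothetical helpers.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def formula(symbols: List[str]) -> str:
    """Formula key such as "B1C9": sorted element names, each with its count."""
    counts = Counter(symbols)
    return "".join(f"{name}{counts[name]}" for name in sorted(counts))


def group_by_formula(configs: List[List[str]]) -> Dict[str, List[int]]:
    """Map each formula key to the indices of the configurations sharing it."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for index, symbols in enumerate(configs):
        groups[formula(symbols)].append(index)
    return dict(groups)


configs = [["B"] + ["C"] * 9, ["C"] * 9 + ["B"], ["B", "B", "C"]]
print(group_by_formula(configs))  # {'B1C9': [0, 1], 'B2C1': [2]}
```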

You may also use the following code to parse multiple systems:

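from glob import glob

from dpdata import LabeledSystem, MultiSystems

# process multiple systems
fs = glob("./*/OUTCAR")  # remember to adapt this pattern to your directory layout
ms = MultiSystems()
for f in fs:
    try:
        ls = LabeledSystem(f)
    except Exception:
        # report the file that failed to parse and skip it; without `continue`,
        # `ls` would be undefined (or stale) in the check below
        print(f)
        continue
    if len(ls) > 0:
        ms.append(ls)

ms.to_deepmd_raw("deepmd")
ms.to_deepmd_npy("deepmd")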

Access data

The properties stored in System and LabeledSystem can be accessed with the [] operator by supplying the property key, for example

coords = d_outcar["coords"]

Available properties are (nframes: number of frames in the system, natoms: total number of atoms in the system, ntypes: number of atom types):

Key | Type | Dimension | Is label | Description
'atom_names' | list of str | ntypes | False | The name of each atom type
'atom_numbs' | list of int | ntypes | False | The number of atoms of each atom type
'atom_types' | np.ndarray | natoms | False | Array assigning a type to each atom
'cells' | np.ndarray | nframes x 3 x 3 | False | The cell tensor of each frame
'coords' | np.ndarray | nframes x natoms x 3 | False | The atom coordinates
'energies' | np.ndarray | nframes | True | The frame energies
'forces' | np.ndarray | nframes x natoms x 3 | True | The atom forces
'virials' | np.ndarray | nframes x 3 x 3 | True | The virial tensor of each frame
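The dimensions in the table can be written out as expected shapes. The pure-Python sketch below (hypothetical helper names, nested lists instead of NumPy arrays) checks an array against them; for a loaded system, numpy.shape(d_outcar["coords"]) would play the role of shape_of.

```python
from typing import Dict, Tuple


def expected_shapes(nframes: int, natoms: int, ntypes: int) -> Dict[str, Tuple[int, ...]]:
    """Expected shape of each property listed in the table."""
    return {
        "atom_names": (ntypes,),
        "atom_numbs": (ntypes,),
        "atom_types": (natoms,),
        "cells": (nframes, 3, 3),
        "coords": (nframes, natoms, 3),
        # labels, present only in a LabeledSystem ("virials" only if available)
        "energies": (nframes,),
        "forces": (nframes, natoms, 3),
        "virials": (nframes, 3, 3),
    }


def shape_of(value) -> Tuple[int, ...]:
    """Shape of a nested list, assuming it is rectangular (not ragged)."""
    shape = []
    while isinstance(value, (list, tuple)):
        shape.append(len(value))
        if not value:
            break
        value = value[0]
    return tuple(shape)


# a hypothetical two-frame water system: 3 atoms, 2 types (O, H)
coords = [[[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [-0.24, 0.93, 0.0]] for _ in range(2)]
print(shape_of(coords) == expected_shapes(2, 3, 2)["coords"])  # True
```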

Dump data

The data stored in System or LabeledSystem can be dumped in 'lammps/lmp' or 'vasp/poscar' format, for example:

d_outcar.to("lammps/lmp", "conf.lmp", frame_idx=0)

The first frame of d_outcar will be dumped to 'conf.lmp'.

d_outcar.to("vasp/poscar", "POSCAR", frame_idx=-1)

The last frame of d_outcar will be dumped to 'POSCAR'.

The data stored in LabeledSystem can be dumped to deepmd-kit raw format, for example

d_outcar.to("deepmd/raw", "dpmd_raw")

Or a simpler command:

dpdata.LabeledSystem("OUTCAR").to("deepmd/raw", "dpmd_raw")

Frame selection can be implemented by

dpdata.LabeledSystem("OUTCAR").sub_system([0, -1]).to("deepmd/raw", "dpmd_raw")

by which only the first and last frames are dumped to dpmd_raw.

replicate

replicate creates a supercell of the current atomic configuration.

dpdata.System("./POSCAR").replicate((1, 2, 3))

The tuple (1, 2, 3) keeps the configuration unreplicated in the x direction, makes 2 copies in the y direction and 3 copies in the z direction, i.e. a 1 x 2 x 3 supercell.
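A supercell is built by copying the atoms along integer combinations of the cell vectors and stretching each cell vector by its replication count. The pure-Python sketch below illustrates that idea with the (1, 2, 3) counts used above. It is not dpdata's implementation: replicate here is a hypothetical function, and the order in which it emits the copied atoms is an arbitrary choice that may differ from dpdata's.

```python
from typing import List, Tuple

Vec = List[float]


def replicate(coords: List[Vec], cell: List[Vec], ncopy: Tuple[int, int, int]):
    """Translate every atom by i*a + j*b + k*c for all image indices (i, j, k)."""
    na, nb, nc = ncopy
    a, b, c = cell
    new_coords = []
    for i in range(na):
        for j in range(nb):
            for k in range(nc):
                shift = [i * a[d] + j * b[d] + k * c[d] for d in range(3)]
                for atom in coords:
                    new_coords.append([atom[d] + shift[d] for d in range(3)])
    # each cell vector is stretched by its replication count
    new_cell = [[n * x for x in vec] for n, vec in zip(ncopy, cell)]
    return new_coords, new_cell


cell = [[4.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 4.0]]
coords = [[0.0, 0.0, 0.0], [2.0, 2.0, 2.0]]
new_coords, new_cell = replicate(coords, cell, (1, 2, 3))
print(len(new_coords))  # 12 atoms = 2 atoms x 1 x 2 x 3
print(new_cell)         # [[4.0, 0.0, 0.0], [0.0, 8.0, 0.0], [0.0, 0.0, 12.0]]
```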

perturb

In the following example, each frame of the original system (dpdata.System('./POSCAR')) is perturbed to generate three new frames. For each frame, the cell is perturbed by 5% and the atom positions are perturbed by 0.6 Angstrom. atom_pert_style="normal" indicates that the perturbation of the atom positions follows a normal distribution. The other available options for atom_pert_style are uniform (uniform in a ball) and const (uniform on a sphere).

perturbed_system = dpdata.System("./POSCAR").perturb(
    pert_num=3,
    cell_pert_fraction=0.05,
    atom_pert_distance=0.6,
    atom_pert_style="normal",
)
print(perturbed_system.data)
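The three atom_pert_style options differ only in how a displacement vector of size about atom_pert_distance is drawn. The sketch below is one plausible way to sample them with the standard library; it does not cover the cell perturbation, the Gaussian width used for "normal" is an assumption, and atom_displacement is a hypothetical helper rather than a dpdata function.

```python
import math
import random
from typing import List


def atom_displacement(distance: float, style: str, rng: random.Random) -> List[float]:
    """Draw one 3D displacement vector for the given style.

    normal:  isotropic Gaussian; per-axis sigma = distance / sqrt(3), so the
             root-mean-square length equals `distance` (assumption)
    uniform: uniformly distributed inside a ball of radius `distance`
    const:   uniformly distributed on a sphere of radius `distance`
    """
    if style == "normal":
        sigma = distance / math.sqrt(3.0)
        return [rng.gauss(0.0, sigma) for _ in range(3)]
    # a normalized Gaussian vector points in a uniformly random direction
    v = [rng.gauss(0.0, 1.0) for _ in range(3)]
    norm = math.sqrt(sum(x * x for x in v))
    direction = [x / norm for x in v]
    if style == "const":
        radius = distance
    elif style == "uniform":
        # the cube root makes the density uniform in volume, not in radius
        radius = distance * rng.random() ** (1.0 / 3.0)
    else:
        raise ValueError(f"unknown atom_pert_style: {style}")
    return [radius * x for x in direction]


rng = random.Random(42)
for style in ("normal", "uniform", "const"):
    d = atom_displacement(0.6, style, rng)
    print(style, [round(x, 3) for x in d])
```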

replace

In the following example, 8 randomly chosen Hf atoms in the system are replaced by Zr atoms, with the atomic positions unchanged.

s = dpdata.System("tests/poscars/POSCAR.P42nmc", fmt="vasp/poscar")
s.replace("Hf", "Zr", 8)
s.to_vasp_poscar("POSCAR.P42nmc.replace")
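Conceptually, replace only relabels atoms: it picks the requested number of atoms of the old element at random and changes their element, leaving the coordinates untouched. The sketch below shows that on a hypothetical composition of 16 Hf and 32 O atoms (the real composition of POSCAR.P42nmc is not given here). replace_random is a hypothetical helper, and dpdata additionally updates atom_names, atom_numbs and atom_types, which the sketch omits.

```python
import random
from typing import List


def replace_random(symbols: List[str], old: str, new: str, count: int,
                   rng: random.Random) -> List[str]:
    """Return a copy of `symbols` with `count` randomly chosen `old` atoms set to `new`.

    Only element labels change, so positions stay as they were.
    """
    candidates = [i for i, s in enumerate(symbols) if s == old]
    if count > len(candidates):
        raise ValueError(f"only {len(candidates)} {old} atoms available")
    result = list(symbols)
    for i in rng.sample(candidates, count):
        result[i] = new
    return result


symbols = ["Hf"] * 16 + ["O"] * 32
replaced = replace_random(symbols, "Hf", "Zr", 8, random.Random(0))
print(replaced.count("Hf"), replaced.count("Zr"), replaced.count("O"))  # 8 8 32
```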

BondOrderSystem

dpdata introduces a class BondOrderSystem, which inherits from System and additionally stores chemical bonds and formal charges (in BondOrderSystem.data['bonds'] and BondOrderSystem.data['formal_charges']). Because it depends on rdkit, BondOrderSystem can currently read only .mol/.sdf files, and rdkit must be installed to use it. Other formats, such as pdb, must first be converted to .mol/.sdf (for example with Open Babel).

import dpdata

system_1 = dpdata.BondOrderSystem(
    "tests/bond_order/CH3OH.mol", fmt="mol"
)  # read from .mol file
system_2 = dpdata.BondOrderSystem(
    "tests/bond_order/methane.sdf", fmt="sdf"
)  # read from .sdf file

In an sdf file, all molecules must have the same topology (i.e. be conformers of the same molecule). BondOrderSystem can also be initialized directly from an rdkit.Chem.rdchem.Mol object.

from rdkit import Chem
from rdkit.Chem import AllChem
import dpdata

mol = Chem.MolFromSmiles("CC")
mol = Chem.AddHs(mol)
AllChem.EmbedMultipleConfs(mol, 10)
system = dpdata.BondOrderSystem(rdkit_mol=mol)

Bond Order Assignment

BondOrderSystem implements a more robust sanitization procedure for rdkit Mol objects, defined in dpdata.rdkit.sanitize.Sanitizer. This class defines three levels of sanitization: low, medium and high (default: medium).

  • low: use the rdkit.Chem.SanitizeMol() function to sanitize the molecule.
  • medium: before calling rdkit, the program first assigns the formal charge of each atom to avoid inappropriate valence exceptions. However, this mode requires the bond order information in the given molecule to be correct.
  • high: the program tries to fix inappropriate bond orders in aromatic heterocycles and in phosphate, sulfate, carboxyl, nitro, nitrine and guanidine groups. If this procedure fails to sanitize the given molecule, the program then calls obabel to pre-process the molecule and repeats the sanitization procedure. So if you want to use this level of sanitization, make sure obabel is installed in the environment. In our tests, this sanitization procedure successfully read 4852 small molecules in the PDBBind refined set. Note that the number of explicit hydrogens in the molecule file (mol/sdf) has to be correct, so we recommend pre-processing the file with obabel xxx -O xxx -h. We do not implement this hydrogen-adding procedure in dpdata because we cannot guarantee its correctness.

import glob

import dpdata

for sdf_file in glob.glob("bond_order/refined-set-ligands/obabel/*sdf"):
    syst = dpdata.BondOrderSystem(sdf_file, sanitize_level="high", verbose=False)

Formal Charge Assignment

BondOrderSystem implements a method to assign a formal charge to each atom based on the 8-electron rule (see below). Note that it only supports elements common in biological systems: B, C, N, O, P, S, As.

import dpdata

syst = dpdata.BondOrderSystem("tests/bond_order/CH3NH3+.mol", fmt="mol")
print(syst.get_formal_charges())  # return the formal charge on each atom
print(syst.get_charge())  # return the total charge of the system

If a valence of 3 is detected on a carbon atom, its formal charge is assigned as -1, because in most cases (alkynyl anions, isonitriles, cyclopentadienyl anions) the formal charge on a 3-valent carbon is -1; this is also consistent with the 8-electron rule.
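For atoms that complete their octet with lone pairs, the 8-electron rule reduces to formal charge = (sum of bond orders) - (neutral valence). The simplified sketch below applies that to H, C, N and O only and reproduces both the CH3NH3+ example and the -1 charge on 3-valent carbon. dpdata's own routine covers more elements and special cases (this formula is wrong for, e.g., carbocations or boron), so treat it as an illustration; formal_charge and total_charge are hypothetical helpers.

```python
from typing import Dict, List, Tuple

# neutral valences (H follows the 2-electron duet); a deliberately small subset
NEUTRAL_VALENCE: Dict[str, int] = {"H": 1, "C": 4, "N": 3, "O": 2}


def formal_charge(element: str, bond_order_sum: int) -> int:
    """Each bond beyond the neutral valence adds +1, each missing bond adds -1."""
    return bond_order_sum - NEUTRAL_VALENCE[element]


def total_charge(atoms: List[Tuple[str, int]]) -> int:
    return sum(formal_charge(el, bo) for el, bo in atoms)


# methylammonium, CH3NH3+: (element, sum of bond orders) for every atom
ch3nh3 = [("C", 4), ("N", 4)] + [("H", 1)] * 6
print([formal_charge(el, bo) for el, bo in ch3nh3[:2]])  # [0, 1]
print(total_charge(ch3nh3))                               # 1
print(formal_charge("C", 3))                              # -1
```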

Mixed Type Format

The format deepmd/npy/mixed is the mixed type numpy format for DeePMD-kit, and can be loaded or dumped through class dpdata.MultiSystems.

Under this format, systems with the same number of atoms but different formulas can be combined into a larger system, which is especially useful when the individual systems contain only a few frames.

This also mixes the type information together, which helps model training with a type embedding network.
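The pooling idea behind the mixed type format can be sketched by grouping systems by atom count: systems whose formulas differ but whose atom counts match end up in one group, and their frame counts add up. The (formula, natoms, nframes) records and the helper pool_by_natoms are hypothetical, and how dpdata lays the pooled data out on disk is not shown here.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def pool_by_natoms(systems: List[Tuple[str, int, int]]) -> Dict[int, Tuple[List[str], int]]:
    """Pool (formula, natoms, nframes) records that share the same atom count."""
    formulas: Dict[int, List[str]] = defaultdict(list)
    frames: Dict[int, int] = defaultdict(int)
    for name, natoms, nframes in systems:
        formulas[natoms].append(name)
        frames[natoms] += nframes
    return {n: (sorted(formulas[n]), frames[n]) for n in sorted(formulas)}


systems = [("H2O1", 3, 2), ("C1O2", 3, 1), ("H1N1O1", 3, 4), ("C1H4", 5, 3)]
print(pool_by_natoms(systems))
# {3: (['C1O2', 'H1N1O1', 'H2O1'], 7), 5: (['C1H4'], 3)}
```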

Here are examples using deepmd/npy/mixed format:

  • Dump a MultiSystems into a mixed type numpy directory:

import dpdata

# `systems` is assumed to be a list of System or LabeledSystem objects loaded earlier
dpdata.MultiSystems(*systems).to_deepmd_npy_mixed("mixed_dir")

  • Load mixed type data into a MultiSystems:

import dpdata

dpdata.MultiSystems().load_systems_from_file("mixed_dir", fmt="deepmd/npy/mixed")

Plugins

One can follow a simple example to add their own format by creating and installing plugins. It's critical to add the Format class to entry_points['dpdata.plugins'] in pyproject.toml:

[project.entry-points.'dpdata.plugins']
random = "dpdata_random:RandomFormat"

dpdata's People

Contributors

amcadmus, angusezhang, chentao168, dependabot[bot], dmh1998dmh, ericwang6, felix5572, haidi-ustc, hongritianqi, huangjiameng, iprozd, liu-rx, liuliping0315, marian-code, njzjz, njzjz-bot, panxiang126, pkufjh, pre-commit-ci[bot], pxlxingliang, robinzyb, shigeandtomo, silvia-liu, starinthesky72, thangckt, tuoping, vibsteamer, wanghan-iapcm, yi-fanli, zezhong-zhang

dpdata's Issues

VASP virial data processing

Hello.

Considering the link below, virial.raw file should include virials in the unit of eV, not virial pressure in kBar.
deepmodeling/deepmd-kit#230

Since the virial in the VASP OUTCAR file is given as a 'pressure' in kBar,
isn't it right to multiply the volume of the box to make 'virial.raw' file?

But I couldn't find any part multiplying volume.
Please let me know if I'm understanding wrong.

thx!
Sincerely, YJ Choi
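For reference, the conversion asked about here is virial = pressure x cell volume, plus a unit factor. The traceback in a later issue on this page (dpdata/plugins/vasp.py, lines 75-76) shows dpdata multiplying data['virials'] by v_pref * vol, where vol is the determinant of the cell. The sketch below assumes v_pref is the plain kBar x Angstrom^3 to eV factor derived from SI units; that value and any sign convention are not shown on this page, and the helper names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]

KBAR_IN_PA = 1.0e8          # 1 kBar = 1e8 Pa
ANG3_IN_M3 = 1.0e-30        # 1 cubic angstrom in m^3
EV_IN_J = 1.602176634e-19   # 1 eV in J (exact SI value)
KBAR_ANG3_TO_EV = KBAR_IN_PA * ANG3_IN_M3 / EV_IN_J  # about 6.2415e-4


def det3(m: Matrix) -> float:
    """Determinant of a 3x3 matrix: the cell volume when rows are cell vectors."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def pressure_to_virial(pressure_kbar: Matrix, cell: Matrix) -> Matrix:
    """Virial (eV) = pressure tensor (kBar) x cell volume (A^3) x unit factor."""
    vol = det3(cell)
    return [[p * vol * KBAR_ANG3_TO_EV for p in row] for row in pressure_kbar]


cell = [[10.0, 0.0, 0.0], [0.0, 10.0, 0.0], [0.0, 0.0, 10.0]]
pressure = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(pressure_to_virial(pressure, cell)[0][0])  # about 0.6242 eV
```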

[BUG] Bug in reading lammps trajectory files with random type id

dpdata assumes that the element types in one file are the same across frames, which is not true in some cases. For example, the following trajectory file cannot be loaded correctly:

ITEM: TIMESTEP
0
ITEM: NUMBER OF ATOMS
3
ITEM: BOX BOUNDS xy xz yz pp pp pp
0.0000000000000000e+00 6.8043376809999998e+00 2.5385198599999999e-02
0.0000000000000000e+00 6.7821075796999999e+00 1.8630761460000000e-01
0.0000000000000000e+00 6.6801861338000004e+00 6.5204177000000002e-02
ITEM: ATOMS id type x y z
1 1 3.48873 0.0697213 6.67774
2 2 3.38621 0.033338 3.34239
3 3 1.79424 1.7281 5.01015
ITEM: TIMESTEP
10
ITEM: NUMBER OF ATOMS
3
ITEM: BOX BOUNDS xy xz yz pp pp pp
3.0951719137647604e-02 6.7713982144168243e+00 2.5146837349522749e-02
3.1535098850918430e-02 6.7499602284333751e+00 1.8455822840494820e-01
3.1362715442244227e-02 6.6488234183577575e+00 6.4591924584292706e-02
ITEM: ATOMS id type x y z
1 3 6.63593 3.49936 3.46086
2 2 3.44881 6.57204 3.4593
3 1 1.85117 5.11268 4.96295
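Using the two frames shown, the type attached to each atom id changes between timesteps (ids 1 and 3 swap types) while the overall composition stays the same. Sorting the atoms by type within each frame is one hypothetical workaround, not dpdata behavior: it gives every frame the same type order but gives up the per-id correspondence of atoms across frames. The helper names are hypothetical.

```python
from typing import List, Tuple

# "id type x y z" rows of the two frames in the dump above
FRAME_0 = """1 1 3.48873 0.0697213 6.67774
2 2 3.38621 0.033338 3.34239
3 3 1.79424 1.7281 5.01015"""
FRAME_10 = """1 3 6.63593 3.49936 3.46086
2 2 3.44881 6.57204 3.4593
3 1 1.85117 5.11268 4.96295"""

Row = Tuple[int, int, List[float]]


def parse_atoms(block: str) -> List[Row]:
    """Parse ATOMS rows into (id, type, [x, y, z]) tuples."""
    rows = []
    for line in block.strip().splitlines():
        fields = line.split()
        rows.append((int(fields[0]), int(fields[1]), [float(v) for v in fields[2:5]]))
    return rows


def types_in_id_order(rows: List[Row]) -> List[int]:
    return [t for _, t, _ in sorted(rows, key=lambda r: r[0])]


def reorder_by_type(rows: List[Row]) -> List[Row]:
    """Hypothetical workaround: sort atoms by (type, id) within a frame."""
    return sorted(rows, key=lambda r: (r[1], r[0]))


f0, f10 = parse_atoms(FRAME_0), parse_atoms(FRAME_10)
print(types_in_id_order(f0), types_in_id_order(f10))  # [1, 2, 3] [3, 2, 1]
print([t for _, t, _ in reorder_by_type(f10)])         # [1, 2, 3]
```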

Code format

Summary

Do we have a general Python formatting guideline (e.g. Black)? I noticed some formatting differences between files. If I use a formatting plugin in VS Code, the actual code changes get mixed with formatting changes, which can be confusing. Maybe we can fix the formatting in one commit and use a formatting workflow for all code submitted afterwards.

box.raw file not found with nopbc file

I installed dpdata 0.2.1 via conda.

When I tried to read the nopbc example data with dpdata, an error was raised.
Here is the command I used:
x = dpdata.LabeledSystem("mypath/deepmd-kit/examples/nopbc/data/C1H4O2", fmt='deepmd/npy')
Here is the error:
FileNotFoundError: [Errno 2] No such file or directory: '/mypath/deepmd-kit/examples/nopbc/data/C1H4O2/set.000/box.npy'
Here is what is written in the doc: "If one needs to train a non-periodic system, an empty nopbc file should be put under the system directory. box.raw is not necessary is a non-periodic system."
In fact, the example directory contains no box.raw or virial file.

[BUG] OUTCAR transformation

Bug summary

I encountered an error converting OUTCAR data using DPData

IndexError: list index out of range

DeePMD-kit Version

2.0.4

TensorFlow Version

2.8.2

How did you download the software?

conda

Input Files, Running Commands, Error Log, etc.

Running Commands:

import dpdata
dsys = dpdata.LabeledSystem('OUTCAR')

Error Log:
Traceback (most recent call last):
  File "", line 1, in
  File "/home/wangchenyang/anaconda3/envs/deepmd/lib/python3.9/site-packages/dpdata/system.py", line 227, in __init__
    self.from_fmt(file_name, fmt, type_map=type_map, begin= begin, step=step, **kwargs)
  File "/home/wangchenyang/anaconda3/envs/deepmd/lib/python3.9/site-packages/dpdata/system.py", line 253, in from_fmt
    return self.from_fmt_obj(load_format(fmt), file_name, **kwargs)
  File "/home/wangchenyang/anaconda3/envs/deepmd/lib/python3.9/site-packages/dpdata/system.py", line 1013, in from_fmt_obj
    data = fmtobj.from_labeled_system(file_name, **kwargs)
  File "/home/wangchenyang/anaconda3/envs/deepmd/lib/python3.9/site-packages/dpdata/plugins/vasp.py", line 68, in from_labeled_system
    = dpdata.vasp.outcar.get_frames(file_name, begin=begin, step=step, ml=ml)
  File "/home/wangchenyang/anaconda3/envs/deepmd/lib/python3.9/site-packages/dpdata/vasp/outcar.py", line 71, in get_frames
    coord, cell, energy, force, virial, is_converge = analyze_block(blk, ntot, nelm, ml)
  File "/home/wangchenyang/anaconda3/envs/deepmd/lib/python3.9/site-packages/dpdata/vasp/outcar.py", line 134, in analyze_block
    virial[0][2] = tmp_v[5]
IndexError: list index out of range

Steps to Reproduce

There are no steps

Further Information, Files, and Links

No response

[BUG] LabeledSystem from OUTCAR not working

Summary

The generation of a LabeledSystem from an OUTCAR file is not working (anymore, since version 0.2.6).

Running on CentOS 8 with python 3.7.9.

Steps to Reproduce

dsys = dpdata.LabeledSystem('OUTCAR.out', fmt='vasp/outcar')

Command works until version 0.2.5, with >=0.2.6 getting an error, here evaluated for version 0.2.8:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-2-d285c09cc64a> in <module>
----> 1 dsys = dpdata.LabeledSystem('OUTCAR.out', fmt='vasp/outcar')

/opt/python/python-3.7.9/lib/python3.7/site-packages/dpdata/system.py in __init__(self, file_name, fmt, type_map, begin, step, data, **kwargs)
    225         if file_name is None :
    226             return
--> 227         self.from_fmt(file_name, fmt, type_map=type_map, begin= begin, step=step, **kwargs)
    228 
    229         if type_map is not None:

/opt/python/python-3.7.9/lib/python3.7/site-packages/dpdata/system.py in from_fmt(self, file_name, fmt, **kwargs)
    251         if fmt == 'auto':
    252             fmt = os.path.basename(file_name).split('.')[-1].lower()
--> 253         return self.from_fmt_obj(load_format(fmt), file_name, **kwargs)
    254 
    255     def from_fmt_obj(self, fmtobj, file_name, **kwargs):

/opt/python/python-3.7.9/lib/python3.7/site-packages/dpdata/system.py in from_fmt_obj(self, fmtobj, file_name, **kwargs)
   1011 
   1012     def from_fmt_obj(self, fmtobj, file_name, **kwargs):
-> 1013         data = fmtobj.from_labeled_system(file_name, **kwargs)
   1014         if data:
   1015             if isinstance(data, (list, tuple)):

/opt/python/python-3.7.9/lib/python3.7/site-packages/dpdata/plugins/vasp.py in from_labeled_system(self, file_name, begin, step, **kwargs)
     75                 vol = np.linalg.det(np.reshape(data['cells'][ii], [3, 3]))
     76                 data['virials'][ii] *= v_pref * vol
---> 77         data = uniq_atom_names(data)
     78         return data
     79 

/opt/python/python-3.7.9/lib/python3.7/site-packages/dpdata/utils.py in uniq_atom_names(data)
     87     data['atom_names'] = unames
     88     tmp_type = list(data['atom_types']).copy()
---> 89     data['atom_types'] = np.array([uidxmap[jj] for jj in tmp_type], dtype=int)
     90     data['atom_numbs'] = [sum( ii == data['atom_types'] ) for ii in range(len(data['atom_names'])) ]
     91     return data

/opt/python/python-3.7.9/lib/python3.7/site-packages/dpdata/utils.py in <listcomp>(.0)
     87     data['atom_names'] = unames
     88     tmp_type = list(data['atom_types']).copy()
---> 89     data['atom_types'] = np.array([uidxmap[jj] for jj in tmp_type], dtype=int)
     90     data['atom_numbs'] = [sum( ii == data['atom_types'] ) for ii in range(len(data['atom_names'])) ]
     91     return data

IndexError: list index out of range

Problem reading multiple OUTCAR files

Summary
When reading multiple OUTCAR files having different numbers of frames, dpdata does not read all the files.

Detailed Description
I'm analyzing historical data and only OUTCAR files are available. When I try to read a group of seven OUTCAR files representing a single lengthy trajectory, only the first three files (each having the same number of frames) are read. The remaining OUTCAR files (having differing numbers of frames) appear to be ignored. Is there a fix for this in the current release, or would it be possible to generalize the current code to read multiple OUTCAR files that each have a different number of frames? Thanks much.

Further Information, Files, and Links
Here is a portion of the script I'm using:

from dpdata import LabeledSystem,MultiSystems
from glob import glob

fs=glob('../OUTCAR.swf0.*')
ms=MultiSystems()
for f in fs:
    try:
        ls=LabeledSystem(f,fmt = 'vasp/outcar')
    except:
        print(f)
    if len(ls)>0:
        ms.append(ls)

ms.to_deepmd_raw('deepmd')
ms.to_deepmd_npy('deepmd')

This script generates the following files & directories:
box.raw coord.raw energy.raw force.raw set.000 set.001 set.002 type_map.raw type.raw virial.raw

Apparently, only the first three OUTCAR files (each having the same number of frames) were read. The last four OUTCAR files (having different numbers of frames) were apparently not read.

Each 'set' directory looks like this:
box.npy coord.npy energy.npy force.npy virial.npy

dpdata does not support VASP 6.2.0 output when making a data set

Summary
When I use dpdata to collect the data set from an OUTCAR calculated with VASP 6.2, I get:
Traceback (most recent call last):
  File "outtdp.py", line 13, in
    if len(ls)>0:
NameError: name 'ls' is not defined

The script dp.py I used:

from dpdata import LabeledSystem,MultiSystems
from glob import glob
"""
process multi systems
"""
fs=glob('./OUTCAR') # remeber to change here !!!
ms=MultiSystems()
for f in fs:
    try:
        ls=LabeledSystem(f)
    except:
        print(f)
    if len(ls)>0:
        ms.append(ls)

ms.to_deepmd_raw('deepmd')
ms.to_deepmd_npy('deepmd')

Attached are OUTCARs calculated with VASP 5.4.4 and VASP 6.2.0. How can I fix this?

OUTCAR5.4.4.log
OUTCAR6.2.0.log

Failed to read cp2k aimd files - energy difference between xyz and log

Hello,
when reading cp2k aimd outputs ( dpdata.LabeledSystem('test_read_cp2k', fmt='cp2k/aimd_output')) I found this error:

assert log_info_dict['energies']==xyz_info_dict['energies'], (log_info_dict['energies'][0], xyz_info_dict['energies'][0],'There may be errors in the file')
AssertionError: (-10087.061, -10087.062, 'There may be errors in the file')

I checked the error, and it seems to be a numerical error due to the conversion from a.u. to eV and/or the use of floats with different precision: the .xyz file reports -370.6925172920 and the .log file reports -370.692517291966794.

I'm running dpdata 0.2.1 installed with pip, with Python 3.9.7 and numpy 1.21.2.

Thanks,
Lorenzo
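The assertion compares the energies with exact equality, so any difference in printed digits makes it fail. The sketch below converts the two values quoted above with an assumed Hartree-to-eV constant (CODATA 2018; dpdata's constant may differ) and shows that exact equality fails while a tolerance check passes. The energies in the AssertionError differ by about 1e-3 eV, roughly one float32 step at that magnitude; if reduced-precision storage is the cause (an assumption), a relative tolerance such as 1e-6 would be needed instead of an absolute one.

```python
import math

# Hartree -> eV (CODATA 2018); the constant dpdata uses may differ (assumption)
HARTREE_TO_EV = 27.211386245988

e_xyz = -370.6925172920 * HARTREE_TO_EV        # energy as printed in the .xyz file
e_log = -370.692517291966794 * HARTREE_TO_EV   # energy as printed in the .log file

# exact comparison fails because the two files print different numbers of digits
print(e_xyz == e_log)                                           # False
# an absolute tolerance (1e-6 eV, an arbitrary choice) accepts them
print(math.isclose(e_xyz, e_log, rel_tol=0.0, abs_tol=1e-6))    # True
# the values from the AssertionError pass a relative tolerance of 1e-6
print(math.isclose(-10087.061, -10087.062, rel_tol=1e-6))       # True
```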

The cp2k module gives incorrect result

The default unit for coordinates in the cp2k output is angstrom, but dpdata treats them as bohr and converts them.
Lines 193-195 in dpdata/cp2k/output.py:
coords_list.append([float(line_list[1])*AU_TO_ANG,
                    float(line_list[2])*AU_TO_ANG,
                    float(line_list[3])*AU_TO_ANG])

Failed to read cp2k aimd files from restarted files

When reading cp2k aimd outputs from restarted runs, dpdata fails with "AssertionError: (array([], dtype=float32), array([], dtype=float32), 'There may be errors in the file')".
In a fresh cp2k aimd run, the initial step (0th) is written to both the xyz and log files, but in a run restarted from a previous aimd run, the initial step is not written to the xyz file.
As a result, dpdata fails to match the first energy in the log and xyz files of a restarted aimd run.

Error with new version of CP2K package

Dear developers,

I'm having problems using dpdata to import files from the new version of the CP2K package. Is this an issue, or is there something I'm missing?

I made the calculation with the new CP2K version for the ~/dpdata/tests/cp2k/aimd directory and after that I got the problem.

Best regards,
Thank you
Filipe

type_map specified element order with MultiSystem does not work

โ€How to generate deepmd/raw in specified element order with MultiSystems๏ผŸโ€

--I use the code to generate the raw and npy files, but it generate the output data in order "C
F
H
Li
N
O
S".

from dpdata import LabeledSystem,MultiSystems
from glob import glob
"""
process multi systems
"""
fs=glob('./*/OUTCAR') # remeber to change here !!!
ms=MultiSystems()
for f in fs:
    try:
        ls=LabeledSystem(f,type_map=['N', 'S', 'O','C','F','Li','H'])
    except:
        print(f)
    if len(ls)>0:
        ms.append(ls)

ms.to_deepmd_raw('deepmd')
ms.to_deepmd_npy('deepmd')

[BUG] The newest dpdata cannot read the OUTCAR of VASP 6.3.0 machine-learning molecular dynamics to make a training set

Summary

Using the VASP 6.3.0 machine learning module changes the format of the OUTCAR file. The outcar.py module of dpdata cannot read this OUTCAR, and dpdata can only read the data in vasprun.xml that were not calculated by the machine learning module.
(We use VASP 6.3.0 because it greatly speeds up our calculations, cutting the generation of a training set from a few days to a few hours. Combined with DeePMD, we can quickly obtain dynamics results for large-scale systems.)

Deepmd-kit version=2.0.3
dpdata version =0.25

Installed via pip. This is a very common NVT room-temperature dynamics run. I think the new output file format is not supported.

Code:
(dpdatanew) [js_wangyl@login1 dpdata]$ python OUTCAR1.py
/fs08/home/js_wangyl/.conda/envs/dpdatanew/lib/python3.9/site-packages/dpdata/vasp/outcar.py:87: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
return atom_names, atom_numbs, atom_types, np.array(all_cells), np.array(all_coords), np.array(all_energies), np.array(all_forces), all_virials
OUTCAR
Traceback (most recent call last):
  File "/fs08/home/js_wangyl/work/test/dpdata/OUTCAR1.py", line 14, in
    if len(ls)>0:
NameError: name 'ls' is not defined
(dpdatanew) [js_wangyl@login1 dpdata]$

Platform:
CentOS 7

We attached an OUTCAR and vasprun.xml generated by VASP 6.3.0 (using machine learning).
OUTCAR and vasprunxml.zip

[Feature Request] Format plugin system

Summary

It's not a good idea to put all specific format functions into the main class. We should implement a plugin system like ASE's calculator or so on, where the main class can register the plugin class.

Detailed Description

Further Information, Files, and Links

[BUG] Failure in Gaussian's parser

After Gaussian has calculated all the cluster candidates, the run terminates at step 8.
run.log reports an error: ValueError: could not convert string to float: '219.870882556-1030.493229556'.

If other files are needed, I can provide them promptly.
run.log
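The failing string looks like two numbers printed without a separating space, so calling float() on the whole token fails. A hypothetical workaround, not dpdata's actual fix, is to split such fields with a regular expression; the pattern below assumes plain decimal numbers and does not handle exponent notation.

```python
import re
from typing import List

# optional sign, digits, a decimal point, digits: one match per fused number
FLOAT_RE = re.compile(r"[-+]?\d+\.\d+")


def split_fused_floats(field: str) -> List[float]:
    """Split numbers that were printed without a separating space."""
    return [float(x) for x in FLOAT_RE.findall(field)]


print(split_fused_floats("219.870882556-1030.493229556"))
# [219.870882556, -1030.493229556]
```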

The trouble encountered in extracting data from 'OUTCAR'

I've run into trouble extracting data from an OUTCAR (AIMD) using this script:
#script#
from dpdata import LabeledSystem,MultiSystems
from glob import glob
"""
process multi systems
"""
fs=glob('OUTCAR') # remeber to change here !!!
ms=MultiSystems()
for f in fs:
    try:
        ls=LabeledSystem(f)
    except:
        print(f)
    if len(ls)>0:
        ms.append(ls)

ms.to_deepmd_raw('deepmd')
ms.to_deepmd_npy('deepmd',set_size=2000)

My OUTCAR contains three kinds of atoms and the errors are reported as follows:
#Error Report#
Traceback (most recent call last):
  File "script.py", line 13, in
    if len(ls)>0:
NameError: name 'ls' is not defined

The above script does not report an error when dealing with a monatomic system.

[Feature Request] Converting OUTCAR not fully successful when NWRITE is specified in INCAR

Summary

When a user specifies NWRITE in the INCAR before starting a VASP task, VASP may not write enough information into the OUTCAR, i.e. some keywords are missing. As a result, the converted type_map.raw will be empty.

Detailed Description

In my case, for instance, as a beginner I manually set NWRITE=1, which prevented some vital information, including the keyword TITEL, from being written into the OUTCAR. This prevented dpdata from recognizing the elements (C, Zr and W) via the keyword TITEL.
So in the end I had to manually add lines like "TITEL = PAW_PBE C 08Apr2002" as a workaround...
An excerpt of my OUTCAR (a C-Zr-W system) is attached at the end. I suggest informing users about this in an improved version of dpdata or its documentation.

Further Information, Files, and Links
...
INCAR:
POTCAR: PAW_PBE C 08Apr2002
POTCAR: PAW_PBE Zr 08Apr2002
POTCAR: PAW_PBE W 08Apr2002
....
POTCAR: PAW_PBE C 08Apr2002
<Missing lots of keywords from POTCAR, including the most important TITEL>
local pseudopotential read in
partial core-charges read in
partial kinetic energy density read in
atomic valenz-charges read in
non local Contribution for L= 0 read in
real space projection operators read in
non local Contribution for L= 0 read in
real space projection operators read in
non local Contribution for L= 1 read in
real space projection operators read in
non local Contribution for L= 1 read in
real space projection operators read in
PAW grid and wavefunctions read in

number of l-projection operators is LMAX = 4
number of lm-projection operators is LMMAX = 8
...
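Until this case is handled, the element names can still be recovered from the "POTCAR:" lines that remain in the excerpt above. The sketch below is a hypothetical workaround, not dpdata behavior: it assumes lines of the form 'POTCAR: PAW_PBE <element>[_suffix] <date>' and keeps each element at its first appearance, which relies on every species having a different element.

```python
from typing import List

EXCERPT = """POTCAR: PAW_PBE C 08Apr2002
POTCAR: PAW_PBE Zr 08Apr2002
POTCAR: PAW_PBE W 08Apr2002
POTCAR: PAW_PBE C 08Apr2002
local pseudopotential read in"""


def elements_from_potcar_lines(outcar_text: str) -> List[str]:
    """Collect element symbols, in order of first appearance, from 'POTCAR:' lines."""
    names: List[str] = []
    for line in outcar_text.splitlines():
        tokens = line.split()
        if len(tokens) >= 3 and tokens[0] == "POTCAR:":
            element = tokens[2].split("_")[0]  # e.g. "Zr_sv" -> "Zr"
            if element not in names:
                names.append(element)
    return names


print(elements_from_potcar_lines(EXCERPT))  # ['C', 'Zr', 'W']
```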

[BUG] When I use a script to convert OUTCAR to npy format, I encounter the following error:

dp.zip

Traceback (most recent call last):
  File "/work/wq/zcx/dtwistNiN2/0/dp/cov-outcar2dp.py", line 4, in
    dsys.to('deepmd/npy', 'deepmd_data', set_size = dsys.get_nframes())
  File "/home/wqt/anaconda2/envs/deepmd/lib/python3.9/site-packages/dpdata/system.py", line 136, in to
    return self.to_fmt_obj(load_format(fmt), *args, **kwargs)
  File "/home/wqt/anaconda2/envs/deepmd/lib/python3.9/site-packages/dpdata/system.py", line 903, in to_fmt_obj
    return fmtobj.to_labeled_system(self.data, *args, **kwargs)
  File "/home/wqt/anaconda2/envs/deepmd/lib/python3.9/site-packages/dpdata/format.py", line 77, in to_labeled_system
    return self.to_system(data, *args, **kwargs)
  File "/home/wqt/anaconda2/envs/deepmd/lib/python3.9/site-packages/dpdata/plugins/deepmd.py", line 52, in to_system
    dpdata.deepmd.comp.dump(
  File "/home/wqt/anaconda2/envs/deepmd/lib/python3.9/site-packages/dpdata/deepmd/comp.py", line 85, in dump
    coords = np.reshape(data['coords'], [nframes, -1]).astype(comp_prec)
  File "<array_function internals>", line 5, in reshape
  File "/home/wqt/anaconda2/envs/deepmd/lib/python3.9/site-packages/numpy/core/fromnumeric.py", line 298, in reshape
    return _wrapfunc(a, 'reshape', newshape, order=order)
  File "/home/wqt/anaconda2/envs/deepmd/lib/python3.9/site-packages/numpy/core/fromnumeric.py", line 57, in _wrapfunc
    return bound(*args, **kwds)
ValueError: cannot reshape array of size 0 into shape (0,newaxis)
dp.zip
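The last line of the traceback is what numpy raises when `data['coords']` holds zero frames: with `nframes == 0`, the `-1` in `[nframes, -1]` cannot be inferred. A minimal sketch of a pre-dump guard that reports the underlying problem instead; `reshape_coords` is a hypothetical helper, and the sketch does not explain why no frames were parsed from this OUTCAR.

```python
import numpy as np


def reshape_coords(coords):
    """Flatten coordinates to (nframes, natoms * 3), failing clearly on zero frames.

    Hypothetical pre-dump guard; np.reshape(x, [0, -1]) cannot infer the -1
    axis for an empty array, which is the ValueError in the traceback.
    """
    coords = np.asarray(coords, dtype=float)
    nframes = coords.shape[0] if coords.ndim > 0 else 0
    if nframes == 0:
        raise ValueError("no frames were loaded; check that the input file was parsed")
    return np.reshape(coords, [nframes, -1])
```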

Converting gromacs data to raw files

Hello!

I am using dpdata for converting gromacs output files to raw files for deepmd-kit. I was able to get atom, atom types and coordinates from .gro file, but I was wondering how could I use dpdata to extract forces and energies? I have forces and energies in .trr & .xvg formats respectively from gromacs.

My questions are:

  1. do I need any specific format (from gromacs) to use dpdata and dump it to deepmd_raw files
  2. if not, how would I use dpdata for the same

Thanks,
Nisarg

After loading cp2k data, the result is empty

I used CP2K to generate the data, and I have some problems that I can't fix.

  1. I put the pos.xyz and .log files in the output folder and load them with out = dpdata.LabeledSystem('output', fmt='cp2k/aimd_output'). But the result is empty: for example, print(out['atom_numbs']) gives [].

Here is 1 of 4 frames of H2O-pos-1.xyz:

       3
 i =        0, time =        0.000, E =       -17.1635848163

  O        12.2353220000        1.3766420000       10.8698800000
  H        12.4151390000        2.2331250000       11.2576110000
  H        11.9224760000        1.5737990000        9.9869940000

I attached the log file and the .inp file (I added a .md suffix so they could be uploaded; rename the file to open it):
H2O.log
H2O.inp.md

What should I do to fix this problem and convert it to the raw format for next deepmd training?

  2. What's the difference between fmt='cp2k/output' and fmt='cp2k/aimd_output', and which one should I use?

[BUG] Read `qe/cp/traj` with unit `angstrom` without `.cel` file

Summary

When dpdata reads 'qe/cp/traj' with the unit angstrom and no '.cel' file, it generates a wrong cell.

dpdata 0.2.6

Steps to Reproduce

Files list without cp.cel

cp.in
cp.pos

cp.in

...
CELL_PARAMETERS { angstrom }
       19.7299995422         0.0000000000         0.0000000000
        0.0000000000        19.7299995422         0.0000000000
        0.0000000000         0.0000000000        19.7299995422
...

command

dp_sys = dpdata.System(
    file_name = 'cp',
    fmt = 'qe/cp/traj',
    )
print(dp_sys['cells'][0])

output

[[10.44066613  0.          0.        ]
 [ 0.         10.44066613  0.        ]
 [ 0.          0.         10.44066613]]

When I add the cp.cel file to the file list, the output is correct:

cp.in
cp.pos
cp.cel
[[19.72999963  0.          0.        ]
 [ 0.         19.72999963  0.        ]
 [ 0.          0.         19.72999963]]
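The wrong cell matches the angstrom values being read as bohr and converted: 19.7299995422 × 0.529177210903 ≈ 10.44066613, the value printed above. A minimal sketch of unit-aware conversion, assuming the `CELL_PARAMETERS { angstrom }` tag in `cp.in` should decide the factor when no `.cel` file is present; `cell_to_angstrom` is hypothetical, and the `alat` unit is not handled.

```python
import numpy as np

BOHR_TO_ANGSTROM = 0.529177210903  # CODATA 2018 Bohr radius in angstrom


def cell_to_angstrom(cell, unit):
    """Convert CELL_PARAMETERS rows to angstrom according to their unit tag.

    Sketch only: handles 'angstrom' and 'bohr'; 'alat' would need celldm(1).
    """
    cell = np.asarray(cell, dtype=float)
    unit = unit.strip("{}() ").lower()  # "{ angstrom }" -> "angstrom"
    if unit == "angstrom":
        return cell
    if unit == "bohr":
        return cell * BOHR_TO_ANGSTROM
    raise ValueError(f"unsupported CELL_PARAMETERS unit: {unit!r}")


cell = np.eye(3) * 19.7299995422
print(cell_to_angstrom(cell, "{ angstrom }")[0, 0])  # 19.7299995422
print(cell_to_angstrom(cell, "bohr")[0, 0])  # about 10.44066613, the wrong value in the report
```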

Further Information, Files, and Links

not working predictions with v2

Summary

The predict method does not work with DeePMD-kit v2: I get an error concerning a cell array reshape.

I am running version 0.2.1 installed with pip on linux

Steps to Reproduce

from dpdata import LabeledSystem
s = LabeledSystem(".", fmt="deepmd/raw")
s.predict()
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-5-9f4569986ead> in <module>
----> 1 s.predict("../../../../../selective_train3/gen5/train5_5/ge_all_s5_5.pb")

~/Raid/conda_envs/dpmd_gpu_v2.0/lib/python3.9/site-packages/dpdata/system.py in predict(self, dp)
    986             else:
    987                 cell = None
--> 988             e, f, v = dp.eval(coord, cell, atype)
    989             data = ss.data
    990             data['energies'] = e.reshape((1, 1))

~/Raid/conda_envs/dpmd_gpu_v2.0/lib/python3.9/site-packages/deepmd/infer/deep_pot.py in eval(self, coords, cells, atom_types, atomic, fparam, aparam, efield)
    244         else :
    245             if self.auto_batch_size is not None:
--> 246                 e, f, v = self.auto_batch_size.execute_all(self._eval_inner, numb_test, natoms,
    247                               coords, cells, atom_types, fparam = fparam, aparam = aparam, atomic = atomic, efield = efield)
    248             else:

~/Raid/conda_envs/dpmd_gpu_v2.0/lib/python3.9/site-packages/deepmd/utils/batch_size.py in execute_all(self, callable, total_size, natoms, *args, **kwargs)
    114         results = []
    115         while index < total_size:
--> 116             n_batch, result = self.execute(execute_with_batch_size, index, natoms)
    117             if not isinstance(result, tuple):
    118                 result = (result,)

~/Raid/conda_envs/dpmd_gpu_v2.0/lib/python3.9/site-packages/deepmd/utils/batch_size.py in execute(self, callable, start_index, natoms)
     64         """
     65         try:
---> 66             n_batch, result = callable(max(self.current_batch_size // natoms, 1), start_index)
     67         except OutOfMemoryError as e:
     68             # TODO: it's very slow to catch OOM error; I don't know what TF is doing here

~/Raid/conda_envs/dpmd_gpu_v2.0/lib/python3.9/site-packages/deepmd/utils/batch_size.py in execute_with_batch_size(batch_size, start_index)
    106             end_index = start_index + batch_size
    107             end_index = min(end_index, total_size)
--> 108             return (end_index - start_index), callable(
    109                 *[(vv[start_index:end_index] if isinstance(vv, np.ndarray) and vv.ndim > 1 else vv) for vv in args],
    110                 **{kk: (vv[start_index:end_index] if isinstance(vv, np.ndarray) and vv.ndim > 1 else vv) for kk, vv in kwargs.items()},

~/Raid/conda_envs/dpmd_gpu_v2.0/lib/python3.9/site-packages/deepmd/infer/deep_pot.py in _eval_inner(self, coords, cells, atom_types, fparam, aparam, atomic, efield)
    276         else:
    277             pbc = True
--> 278             cells = np.array(cells).reshape([nframes, 9])
    279 
    280         if self.has_fparam :

ValueError: cannot reshape array of size 1 into shape (1,9)

The problem is here I think:

cell = ss['cells'].reshape((-1,1))

DeePMD-kit v2 seems to require the cell of each frame as a (1, 9) row rather than a (9, 1) column. If I swap the dimensions, everything works fine:

cell = ss['cells'].reshape((1,-1))
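The reshape failure can be reproduced without DeePMD-kit. Per the traceback, `execute_with_batch_size` slices every argument with `ndim > 1` along axis 0, so a `(9, 1)` cell array is treated as nine frames and the one-frame batch keeps a single number. The sketch below copies only that slicing rule; `batch_slice` is a hypothetical name.

```python
import numpy as np

cells = np.eye(3).reshape(1, 3, 3) * 10.0  # one frame, shape (1, 3, 3)

column = cells.reshape((-1, 1))  # shape (9, 1): looks like nine "frames" of one value
row = cells.reshape((1, -1))     # shape (1, 9): one frame of nine values


def batch_slice(vv, start, end):
    """Mimic the traceback's slicing: arrays with ndim > 1 are cut along axis 0."""
    return vv[start:end] if isinstance(vv, np.ndarray) and vv.ndim > 1 else vv


# With one frame the batch is [0:1]; the column form keeps one of nine numbers.
print(batch_slice(column, 0, 1).size)  # 1, so reshape([1, 9]) fails
print(batch_slice(row, 0, 1).size)     # 9, so reshape([1, 9]) succeeds
```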

[BUG] fmt="cp2k/output" unable to read CP2K log file

Summary
Using dpdata with fmt="cp2k/output" returns the error:

Traceback (most recent call last):
  File "/scratch3/usr/felix/deepmd-wat-dcm/train-128w-36dcm-wif-interfacial-virial-240NN-manual-sel/test-lammps/36dcm/0/dcm-recal/50/generate-data.py", line 4, in <module>
    d_cp2klog.to("deepmd/raw", "dpmd_raw")
  File "/home/felix/.conda/envs/dpdata/lib/python3.10/site-packages/dpdata/system.py", line 281, in to
    return self.to_fmt_obj(load_format(fmt), *args, **kwargs)
  File "/home/felix/.conda/envs/dpdata/lib/python3.10/site-packages/dpdata/system.py", line 1026, in to_fmt_obj
    return fmtobj.to_labeled_system(self.data, *args, **kwargs)
  File "/home/felix/.conda/envs/dpdata/lib/python3.10/site-packages/dpdata/format.py", line 77, in to_labeled_system
    return self.to_system(data, *args, **kwargs)
  File "/home/felix/.conda/envs/dpdata/lib/python3.10/site-packages/dpdata/plugins/deepmd.py", line 22, in to_system
    dpdata.deepmd.raw.dump(file_name, data)
  File "/home/felix/.conda/envs/dpdata/lib/python3.10/site-packages/dpdata/deepmd/raw.py", line 64, in dump
    nframes = data['cells'].shape[0]
AttributeError: 'list' object has no attribute 'shape'

However, using fmt="cp2k/aimd_output" works well without any issue.

CP2K version: 7.1
dpdata version: 0.2.7

Steps to Reproduce
In the attached zip file:

  1. python generate-data.py uses dpdata with fmt="cp2k/output"
  2. python aimd-generate-data.py uses dpdata with fmt="cp2k/aimd_output"

Further Information, Files, and Links
50.zip
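Both this report and the CP2K 9.1 report below end with `data['cells'].shape` failing because the field is a Python list. A hedged workaround sketch, assuming each list holds one equally shaped entry per frame: convert list-valued fields to numpy arrays before dumping. `as_array_fields` and its default key list are hypothetical, and the proper fix inside the `cp2k/output` reader is not shown.

```python
import numpy as np


def as_array_fields(data, keys=("cells", "coords", "energies", "forces")):
    """Return a shallow copy of `data` with list-valued fields turned into arrays.

    Hypothetical helper: the dumpers in the traceback call data['cells'].shape,
    which only works on numpy arrays.
    """
    fixed = dict(data)
    for key in keys:
        if key in fixed and isinstance(fixed[key], list):
            fixed[key] = np.asarray(fixed[key])
    return fixed
```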

xyz data mistaken as containing 'virials'

When using dpdata, I found that MultiSystems kept assuming my data contained 'virials' when it actually didn't.

system.py line 1292: object of type 'NoneType' has no len()

I worked around this issue by adding a data['virials'] is not None check every time a related error was reported.

Could someone explain where (and why) 'virials' is added as a key in self.data by default?
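The check the reporter added by hand can be written once. A minimal sketch, assuming `data` is a plain dict like `System.data`; `has_virials` is a hypothetical helper and does not explain why dpdata inserts the key.

```python
def has_virials(data):
    """True only if data has a 'virials' entry that is not None and holds at least one frame.

    Hypothetical helper; `data` is assumed to be a dict like System.data.
    """
    virials = data.get("virials")
    return virials is not None and len(virials) > 0
```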

Converting cp2k data to dp data

Hello,

I have some simulations run with cp2k and was trying to train a DP model from the AIMD data. When I tried converting the pos.xyz and *.log files to dp files, I got this error:
  File "cp2k-to-dp.py", line 2, in <module>
    g_out = dpdata.LabeledSystem("*", fmt = 'cp2k/aimd_output')
  File "/ihome/kjohnson/ska31/.virtualenvs/tensorflow/lib/python3.6/site-packages/dpdata-0.1.15-py3.6.egg/dpdata/system.py", line 945, in __init__
    self.from_fmt(file_name, fmt, type_map=type_map, begin= begin, step=step)
  File "/ihome/kjohnson/ska31/.virtualenvs/tensorflow/lib/python3.6/site-packages/dpdata-0.1.15-py3.6.egg/dpdata/system.py", line 134, in from_fmt
    func(self, file_name, **kwargs)
  File "/ihome/kjohnson/ska31/.virtualenvs/tensorflow/lib/python3.6/site-packages/dpdata-0.1.15-py3.6.egg/dpdata/system.py", line 999, in from_cp2k_aimd_output
    xyz_file=glob.glob("{}/*pos*.xyz".format(file_dir))[0]

Is this a bug, or was this code written for a certain version of cp2k?
The version that I am using is 6.1.

I did end up changing the source code and managed to convert these files to dp format; however, no raw files were created for forces, so I am not sure whether my cp2k input scripts are correct.

It would be great if you could:

  1. Mention which version of cp2k to use (if there is no bug in the code)
  2. Possibly provide example inputs for cp2k (where the forces are accounted for)

Best,
Sid
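The last frame shown is `glob.glob("{}/*pos*.xyz".format(file_dir))[0]`, which raises an IndexError whenever nothing in `file_dir` matches (dpdata 0.1.15). A minimal sketch of the same lookup with a readable error instead; `find_pos_xyz` is hypothetical, and the `*pos*.xyz` pattern is copied from the traceback.

```python
import glob
import os


def find_pos_xyz(file_dir):
    """Return the first '*pos*.xyz' file in file_dir (hypothetical helper).

    Same pattern as the traceback's glob call, but a missing file produces a
    FileNotFoundError naming the directory instead of a bare IndexError.
    """
    # Sorted so the choice is deterministic when several files match.
    matches = sorted(glob.glob(os.path.join(file_dir, "*pos*.xyz")))
    if not matches:
        raise FileNotFoundError(f"no '*pos*.xyz' file found in {file_dir!r}")
    return matches[0]
```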

A more efficient way of reading MD trajectory

In the workflow, we do not need to read every frame of a trajectory, only the ones we want. So we should first build the following dict, which maps each trajectory to the indices of the frames to read from it:

frames_dict = {
  Trajectory0: [23, 56, 78],
  Trajectory1: [22],
  ...
}

Then, reading each trajectory:

for traj, f_idx in frames_dict.items():
    traj.read(f_idx)

For a LAMMPS trajectory or other plain-text files, the read method could be:

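import itertools
from typing import List

def read(self, f_idx: List[int]):
    # Every frame spans self.nlines lines; only the frames in f_idx are processed.
    wanted = set(f_idx)  # set membership is O(1), unlike a list
    with open(self.fname) as f:
        # nlines references to one file iterator: each zip step yields one frame
        for ii, lines in enumerate(itertools.zip_longest(*[f] * self.nlines)):
            if ii not in wanted:
                continue
            self.process_block(lines)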

where nlines is the number of lines in each block, which should be determined at the very beginning. Usually, every frame has the same number of lines.

The process_block method should convert one LAMMPS frame to dpdata.
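The proposal above can be tried end to end on a toy input. A minimal sketch, assuming every frame spans exactly `nlines` lines (for a standard LAMMPS dump with a fixed atom count this would be 9 + natoms); `read_frames` is a hypothetical standalone version of `read` that also stops once the last requested frame has been read.

```python
import io
import itertools


def read_frames(f, nlines, f_idx):
    """Yield (frame_index, lines) for the requested frames only (hypothetical helper).

    `f` is any iterable of lines; each frame spans exactly `nlines` lines.
    """
    wanted = set(f_idx)
    last = max(wanted, default=-1)
    # nlines references to one iterator: each zip step consumes one whole frame.
    for ii, lines in enumerate(itertools.zip_longest(*[f] * nlines)):
        if ii > last:
            break  # nothing requested beyond this point; skip the rest of the file
        if ii in wanted:
            # zip_longest pads a truncated last frame with None; drop the padding.
            yield ii, [line.rstrip("\n") for line in lines if line is not None]


toy = "ITEM: TIMESTEP\n0\nITEM: TIMESTEP\n10\nITEM: TIMESTEP\n20\n"
for ii, lines in read_frames(io.StringIO(toy), nlines=2, f_idx=[2, 0]):
    print(ii, lines[1])  # prints "0 0", then "2 20"
```

Frames come back in file order regardless of the order of `f_idx`, since the file is read only once, front to back.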

[Feature Request] Add unit tests

Summary

Add unit tests for current modules. Some of them do not have packages installed.

Detailed Description

According to Codecov, the following code has no unit tests.
system:

  • System.__str__
  • System.dump
  • System.extend
  • System.predict (#277)
  • LabeledSystem.__str__
  • MultiSystems.__len__ (fixed by #172)
  • MultiSystems.__repr__
  • MultiSystems.__str__ (fixed by #172)
  • MultiSystems.from_dir
  • MultiSystems.get_nframes (fixed by #172)
  • MultiSystems.predict
  • MultiSystems.pick_atom_idx

formats:

  • amber/md: from_system
  • ase/structure: to_system (fixed by #171)
  • ase/structure: to_labeled_system
  • pymatgen/structure: to_system (fixed by #171)
  • pymatgen/structure: to_labeled_system (fixed by #171)
  • mol: to_bond_order_system
  • sdf: to_bond_order_system (#188)
  • siesta/output: from_system
  • siesta/aimd_output: from_system

Difference in virial data from cp2k `.log` file and `.stress` file

Summary

CP2K writes different stress values to the .log file and the .stress file, and dpdata (fmt="cp2k/output") reads only the .log file. CP2K appears to print pv_virial in the .log file but pv_total in the .stress file. This might be a problem when generating virial data with dpdata.

Details

STRESS TENSOR FROM .log FILE OF ONE SINGLE FRAME CALCULATION

  STRESS TENSOR [GPa]

            X               Y               Z
  X      -3.22841220      0.03875193      0.35205443
  Y       0.03875193     -2.99491119     -0.26930668
  Z       0.35205443     -0.26930668     -3.16014758

  1/3 Trace(stress tensor):  -3.12782366E+00

  Det(stress tensor)      :  -2.99521219E+01


.stress FILE OF THE SAME CALCULATION

#   Step   Time [fs]            xx [bar]            xy [bar]            xz [bar]            yx [bar]            yy [bar]            yz [bar]            zx [bar]            zy [bar]            zz [bar]
       0       0.000   -27535.1132302270      252.2330873873     3489.2605728168      252.2330873873   -25741.2101685947    -2388.1540627668     3489.2605728168    -2388.1540627668   -26927.8388906513
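Converting the `.log` components from GPa to bar (1 GPa = 10^4 bar) shows that the mismatch is not a unit issue: the ratio between the two files changes from component to component (about 0.85 for the diagonal terms, about 0.65 for xy), so no single conversion factor reconciles them. The numbers below are copied from the two excerpts above.

```python
GPA_TO_BAR = 1.0e4  # 1 GPa = 10^4 bar

# Components copied from the .log excerpt (GPa) and the .stress line (bar).
log_gpa = {"xx": -3.22841220, "yy": -2.99491119, "zz": -3.16014758, "xy": 0.03875193}
stress_bar = {"xx": -27535.1132302270, "yy": -25741.2101685947,
              "zz": -26927.8388906513, "xy": 252.2330873873}

ratios = {}
for key, value_gpa in log_gpa.items():
    log_bar = value_gpa * GPA_TO_BAR
    ratios[key] = stress_bar[key] / log_bar
    print(f"{key}: .log {log_bar:12.1f} bar  .stress {stress_bar[key]:12.1f} bar  ratio {ratios[key]:.3f}")
```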

[BUG] fmt="cp2k/aimd_output" unable to read CP2K9.1 log file

Summary
Using dpdata with fmt="cp2k/aimd_output" returns the error:
  File "<stdin>", line 1, in <module>
  File "/home/jxzhang/deepmd-kit/lib/python3.10/site-packages/dpdata/system.py", line 136, in to
    return self.to_fmt_obj(load_format(fmt), *args, **kwargs)
  File "/home/jxzhang/deepmd-kit/lib/python3.10/site-packages/dpdata/system.py", line 903, in to_fmt_obj
    return fmtobj.to_labeled_system(self.data, *args, **kwargs)
  File "/home/jxzhang/deepmd-kit/lib/python3.10/site-packages/dpdata/format.py", line 77, in to_labeled_system
    return self.to_system(data, *args, **kwargs)
  File "/home/jxzhang/deepmd-kit/lib/python3.10/site-packages/dpdata/plugins/deepmd.py", line 52, in to_system
    dpdata.deepmd.comp.dump(
  File "/home/jxzhang/deepmd-kit/lib/python3.10/site-packages/dpdata/deepmd/comp.py", line 83, in dump
    nframes = data['cells'].shape[0]
AttributeError: 'list' object has no attribute 'shape'

Further Information, Files, and Links
cp2k.zip

[BUG] _Failed to post_fp

After Gaussian has calculated all the cluster candidates, the run terminates at step 8 (post_fp).
run.log reports the error ValueError: could not convert string to float: '219.870882556-1030.493229556'.

If other files are needed, I will provide them promptly.

run.log
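The string that failed to convert looks like two numbers printed with no space between them. A heuristic sketch for splitting such a field; `split_fused_floats` is hypothetical, and which part of the Gaussian output produced this field is not known from the report. Fortran can also drop the `E` for three-digit exponents (e.g. `0.1234-100`), which this regex would misread as two numbers.

```python
import re

# A signed decimal number with an optional E/D exponent.
_NUMBER = re.compile(r"[-+]?(?:\d+\.\d*|\.\d+|\d+)(?:[eEdD][-+]?\d+)?")


def split_fused_floats(field):
    """Split a field such as '219.870882556-1030.493229556' into floats.

    Heuristic: assumes fixed-width numbers were printed back to back; Fortran
    'D' exponents are rewritten to 'E' so float() accepts them.
    """
    return [float(tok.replace("D", "E").replace("d", "e")) for tok in _NUMBER.findall(field)]


print(split_fused_floats("219.870882556-1030.493229556"))  # [219.870882556, -1030.493229556]
```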

[BUG] `virials` issue in vasprun.xml

Summary

When using dpdata to process vasprun.xml, an error will occur regarding virials: IndexError: index 0 is out of bounds for axis 0 with size 0.

Suggested solution (it works)
Two files need to be modified:
(1) ./dpdata/vasp/xml.py: delete the last output variable, all_strs (around line 101).
(2) ./dpdata/system.py (around line 1284): comment out the following code, as shown below:

# for ii in range(self.get_nframes()):
#     vol = np.linalg.det(np.reshape(self.data['cells'][ii], [3,3]))
#     self.data['virials'][ii] *= v_pref * vol

I don't know whether this affects other calculations, but so far it works for me.

Best,
Zhengda
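The IndexError is consistent with the loop above indexing `self.data['virials'][ii]` when vasprun.xml provided no stress, leaving `virials` with zero frames. Rather than deleting the loop, a minimal sketch that skips the scaling when `virials` is empty; `scale_virials` is hypothetical, `v_pref` is passed in because its value in system.py is not shown here, and volumes come from `np.linalg.det` as in the original loop.

```python
import numpy as np


def scale_virials(cells, virials, v_pref):
    """Return virials multiplied frame by frame by v_pref * det(cell).

    Mirrors the commented-out loop, but returns an empty array unchanged
    instead of indexing it (hypothetical helper).
    """
    virials = np.array(virials, dtype=float)  # copy, so the input is not modified
    if virials.size == 0:
        return virials
    cells = np.asarray(cells, dtype=float).reshape(-1, 3, 3)
    volumes = np.linalg.det(cells)  # signed volume, as in the original loop
    # Broadcast one factor per frame over the remaining axes of `virials`.
    factor = (v_pref * volumes).reshape((-1,) + (1,) * (virials.ndim - 1))
    return virials * factor
```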

[BUG] The order of elements affects the judgment of molecular formula

Summary

dpdata changes the order of elements in the molecular formula it uses as the MultiSystems key, so looking up an equivalent formula written in a different order raises KeyError. Formulas that are essentially the same should be recognized.

Steps to Reproduce
script.py

import dpdata
import os
sys_entire = dpdata.MultiSystems().from_deepmd_npy(os.path.join("data.rest"), labeled = False)
print(i for i in sys_entire)
print(sys_entire.systems)
subsys = sys_entire['I12Pb4C4N4H24'][0]
(base) ➜ test_dpdata python3 script.py
<generator object <genexpr> at 0x7fb43b5ecb30>
{'C4H24I12N4Pb4': Data Summary
Unlabeled System
-------------------
Frame Numbers     : 21316
Atom Numbers      : 48
Element List      :
-------------------
C  H  I  N  Pb
4  24  12  4  4, 'C8H48I24N8Pb8': Data Summary
Unlabeled System
-------------------
Frame Numbers     : 1887
Atom Numbers      : 96
Element List      :
-------------------
C  H  I  N  Pb
8  48  24  8  8}
Traceback (most recent call last):
  File "/root/test_simplify/test_dpdata/script.py", line 6, in <module>
    subsys = sys_entire['I12Pb4C4N4H24'][0]
  File "/opt/anaconda3/lib/python3.9/site-packages/dpdata-0.2.8.dev12+g3968d3d.d20220706-py3.9.egg/dpdata/system.py", line 1158, in __getitem__
    return self.systems[key]
KeyError: 'I12Pb4C4N4H24'

data is too large to attach
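The keys printed above look like element symbols sorted alphabetically, each followed by its count. A minimal sketch that normalizes a formula to that form before lookup, assuming the count is always written (dpdata's behavior for a count of 1 is not shown here); `canonical_formula` is hypothetical.

```python
import re


def canonical_formula(formula):
    """Rewrite a formula with its elements sorted alphabetically (hypothetical helper).

    Counts of an element written twice are summed; a missing count means 1.
    """
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return "".join(f"{symbol}{counts[symbol]}" for symbol in sorted(counts))


print(canonical_formula("I12Pb4C4N4H24"))  # C4H24I12N4Pb4
```

With it, the failing line would become `sys_entire[canonical_formula('I12Pb4C4N4H24')]`.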

[BUG] __add__ function of system.py is not working

Summary

When I changed

self.system_1.append(self.system_2)

to

self.system_3=self.system_1+self.system_2

in tests/test_system_append.py, a pile of errors was reported:

..............[[[4.3485389  4.20903041 5.2       ]
  [2.30878039 6.27327007 1.13      ]
  [4.64061163 3.49272294 4.58      ]
  [3.97070725 3.80408719 6.03      ]
  [1.38402421 6.25106647 1.5       ]
  [2.30624337 5.88874931 0.21      ]]]
.EEEEEEEEEEEEEE
======================================================================
ERROR: test_add_func (__main__.TestVaspXmlAppend)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_system_append.py", line 35, in setUp
    self.system_1 = self.system_1.sub_system([0, 12, 4, 16, 8])
  File "/home/tuoping/dpdata/dpdata/system.py", line 966, in sub_system
    tmp_sys.data = System.sub_system(self, f_idx).data
  File "/home/tuoping/dpdata/dpdata/system.py", line 286, in sub_system
    tmp.data['cells'] = self.data['cells'][f_idx].reshape(-1, 3, 3)
IndexError: index 12 is out of bounds for axis 0 with size 10

======================================================================
ERROR: test_atom_names (__main__.TestVaspXmlAppend)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_system_append.py", line 35, in setUp
    self.system_1 = self.system_1.sub_system([0, 12, 4, 16, 8])
  File "/home/tuoping/dpdata/dpdata/system.py", line 966, in sub_system
    tmp_sys.data = System.sub_system(self, f_idx).data
  File "/home/tuoping/dpdata/dpdata/system.py", line 286, in sub_system
    tmp.data['cells'] = self.data['cells'][f_idx].reshape(-1, 3, 3)
IndexError: index 12 is out of bounds for axis 0 with size 10

======================================================================
ERROR: test_atom_numbs (__main__.TestVaspXmlAppend)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_system_append.py", line 35, in setUp
    self.system_1 = self.system_1.sub_system([0, 12, 4, 16, 8])
  File "/home/tuoping/dpdata/dpdata/system.py", line 966, in sub_system
    tmp_sys.data = System.sub_system(self, f_idx).data
  File "/home/tuoping/dpdata/dpdata/system.py", line 286, in sub_system
    tmp.data['cells'] = self.data['cells'][f_idx].reshape(-1, 3, 3)
IndexError: index 12 is out of bounds for axis 0 with size 10

======================================================================
ERROR: test_atom_types (__main__.TestVaspXmlAppend)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_system_append.py", line 35, in setUp
    self.system_1 = self.system_1.sub_system([0, 12, 4, 16, 8])
  File "/home/tuoping/dpdata/dpdata/system.py", line 966, in sub_system
    tmp_sys.data = System.sub_system(self, f_idx).data
  File "/home/tuoping/dpdata/dpdata/system.py", line 286, in sub_system
    tmp.data['cells'] = self.data['cells'][f_idx].reshape(-1, 3, 3)
IndexError: index 12 is out of bounds for axis 0 with size 10

======================================================================
ERROR: test_cell (__main__.TestVaspXmlAppend)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_system_append.py", line 35, in setUp
    self.system_1 = self.system_1.sub_system([0, 12, 4, 16, 8])
  File "/home/tuoping/dpdata/dpdata/system.py", line 966, in sub_system
    tmp_sys.data = System.sub_system(self, f_idx).data
  File "/home/tuoping/dpdata/dpdata/system.py", line 286, in sub_system
    tmp.data['cells'] = self.data['cells'][f_idx].reshape(-1, 3, 3)
IndexError: index 12 is out of bounds for axis 0 with size 10

======================================================================
ERROR: test_coord (__main__.TestVaspXmlAppend)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_system_append.py", line 35, in setUp
    self.system_1 = self.system_1.sub_system([0, 12, 4, 16, 8])
  File "/home/tuoping/dpdata/dpdata/system.py", line 966, in sub_system
    tmp_sys.data = System.sub_system(self, f_idx).data
  File "/home/tuoping/dpdata/dpdata/system.py", line 286, in sub_system
    tmp.data['cells'] = self.data['cells'][f_idx].reshape(-1, 3, 3)
IndexError: index 12 is out of bounds for axis 0 with size 10

======================================================================
ERROR: test_energy (__main__.TestVaspXmlAppend)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_system_append.py", line 35, in setUp
    self.system_1 = self.system_1.sub_system([0, 12, 4, 16, 8])
  File "/home/tuoping/dpdata/dpdata/system.py", line 966, in sub_system
    tmp_sys.data = System.sub_system(self, f_idx).data
  File "/home/tuoping/dpdata/dpdata/system.py", line 286, in sub_system
    tmp.data['cells'] = self.data['cells'][f_idx].reshape(-1, 3, 3)
IndexError: index 12 is out of bounds for axis 0 with size 10

======================================================================
ERROR: test_force (__main__.TestVaspXmlAppend)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_system_append.py", line 35, in setUp
    self.system_1 = self.system_1.sub_system([0, 12, 4, 16, 8])
  File "/home/tuoping/dpdata/dpdata/system.py", line 966, in sub_system
    tmp_sys.data = System.sub_system(self, f_idx).data
  File "/home/tuoping/dpdata/dpdata/system.py", line 286, in sub_system
    tmp.data['cells'] = self.data['cells'][f_idx].reshape(-1, 3, 3)
IndexError: index 12 is out of bounds for axis 0 with size 10

======================================================================
ERROR: test_is_pbc (__main__.TestVaspXmlAppend)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_system_append.py", line 35, in setUp
    self.system_1 = self.system_1.sub_system([0, 12, 4, 16, 8])
  File "/home/tuoping/dpdata/dpdata/system.py", line 966, in sub_system
    tmp_sys.data = System.sub_system(self, f_idx).data
  File "/home/tuoping/dpdata/dpdata/system.py", line 286, in sub_system
    tmp.data['cells'] = self.data['cells'][f_idx].reshape(-1, 3, 3)
IndexError: index 12 is out of bounds for axis 0 with size 10

======================================================================
ERROR: test_len_func (__main__.TestVaspXmlAppend)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_system_append.py", line 35, in setUp
    self.system_1 = self.system_1.sub_system([0, 12, 4, 16, 8])
  File "/home/tuoping/dpdata/dpdata/system.py", line 966, in sub_system
    tmp_sys.data = System.sub_system(self, f_idx).data
  File "/home/tuoping/dpdata/dpdata/system.py", line 286, in sub_system
    tmp.data['cells'] = self.data['cells'][f_idx].reshape(-1, 3, 3)
IndexError: index 12 is out of bounds for axis 0 with size 10

======================================================================
ERROR: test_nframs (__main__.TestVaspXmlAppend)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_system_append.py", line 35, in setUp
    self.system_1 = self.system_1.sub_system([0, 12, 4, 16, 8])
  File "/home/tuoping/dpdata/dpdata/system.py", line 966, in sub_system
    tmp_sys.data = System.sub_system(self, f_idx).data
  File "/home/tuoping/dpdata/dpdata/system.py", line 286, in sub_system
    tmp.data['cells'] = self.data['cells'][f_idx].reshape(-1, 3, 3)
IndexError: index 12 is out of bounds for axis 0 with size 10

======================================================================
ERROR: test_nopbc (__main__.TestVaspXmlAppend)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_system_append.py", line 35, in setUp
    self.system_1 = self.system_1.sub_system([0, 12, 4, 16, 8])
  File "/home/tuoping/dpdata/dpdata/system.py", line 966, in sub_system
    tmp_sys.data = System.sub_system(self, f_idx).data
  File "/home/tuoping/dpdata/dpdata/system.py", line 286, in sub_system
    tmp.data['cells'] = self.data['cells'][f_idx].reshape(-1, 3, 3)
IndexError: index 12 is out of bounds for axis 0 with size 10

======================================================================
ERROR: test_orig (__main__.TestVaspXmlAppend)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_system_append.py", line 35, in setUp
    self.system_1 = self.system_1.sub_system([0, 12, 4, 16, 8])
  File "/home/tuoping/dpdata/dpdata/system.py", line 966, in sub_system
    tmp_sys.data = System.sub_system(self, f_idx).data
  File "/home/tuoping/dpdata/dpdata/system.py", line 286, in sub_system
    tmp.data['cells'] = self.data['cells'][f_idx].reshape(-1, 3, 3)
IndexError: index 12 is out of bounds for axis 0 with size 10

======================================================================
ERROR: test_virial (__main__.TestVaspXmlAppend)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_system_append.py", line 35, in setUp
    self.system_1 = self.system_1.sub_system([0, 12, 4, 16, 8])
  File "/home/tuoping/dpdata/dpdata/system.py", line 966, in sub_system
    tmp_sys.data = System.sub_system(self, f_idx).data
  File "/home/tuoping/dpdata/dpdata/system.py", line 286, in sub_system
    tmp.data['cells'] = self.data['cells'][f_idx].reshape(-1, 3, 3)
IndexError: index 12 is out of bounds for axis 0 with size 10

----------------------------------------------------------------------
Ran 29 tests in 0.310s

FAILED (errors=14)
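Every failure above is raised in `setUp` by `self.system_1.sub_system([0, 12, 4, 16, 8])` on a 10-frame system, not inside `__add__`. One plausible reading, assuming the test relied on `append` growing `system_1` in place: `self.system_3 = self.system_1 + self.system_2` leaves `system_1` at 10 frames, so index 12 is out of range. The toy class below is hypothetical and only illustrates in-place versus out-of-place semantics; it is not dpdata's implementation.

```python
class ToySystem:
    """Hypothetical stand-in for a System that only tracks a list of frames."""

    def __init__(self, frames):
        self.frames = list(frames)

    def append(self, other):
        # In place: this object itself grows.
        self.frames.extend(other.frames)

    def __add__(self, other):
        # Out of place: a new object is returned and both operands are unchanged.
        return ToySystem(self.frames + other.frames)

    def sub_system(self, f_idx):
        # Raises IndexError for an index past the last frame, like the traceback.
        return ToySystem([self.frames[i] for i in f_idx])


system_1, system_2 = ToySystem(range(10)), ToySystem(range(10, 20))
system_3 = system_1 + system_2
print(len(system_1.frames), len(system_3.frames))  # 10 20
```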

Steps to Reproduce

Further Information, Files, and Links

[Feature Request] ASE Support

Dear DP Team,

I'm hoping to use ASE's xyz output format with dpdata.

Please let me know if this feature is already available.

Thank you,

Yulie
