Thanks for reporting. There is indeed a bug in the released version, which should be fixed by #4.
I would very much appreciate having your recommendations as separate issues, since I cannot come up with a single pull request addressing all of them. A few thoughts here:
1. 2D structures are of course not reasonable as input for a GFN-xTB calculation, but they could be used after a preprocessing/conversion step, done either externally or potentially on the xtb side.
2. I don't get this one: xtb doesn't use any bond topology information (it actually drops it while reading the SDF, which is part of the problem you reported). In principle we are producing bond order information in xtb and could amend it with the information from the SDF.
3. Same as for 1.: we need hydrogen atoms for the calculation, and unless we preprocess the structure explicitly, such input is not suitable for GFN-xTB calculations.
4. I always thought this was out of scope for xtb. I know that GFN-xTB is suited for this kind of high-throughput calculation, but the xtb program is not designed to work this way. Don't get me wrong, I like the idea, but I don't think it is possible to get this running in xtb in the near future.
5. This is related to 1. and 3.: we don't use this information per atom, but we need to sum it up into molecular information. So it is more likely a bug that we are not taking this into account properly.
6. This is actually the easiest one to address, since we are producing topological information. If we stop throwing away the SDF information, we could perform this check immediately.
Let's sum this up:
- We need to reimplement the SDF reader beyond the small fix in #4; R5 is essentially a bug in the current SDF reader and should be fixed along with this issue.
- R2 and R6 could be implemented with only minor changes (separate issue or PR appreciated).
- R1 and R3 are both larger feature requests; I will discuss those with my boss and our group, maybe we can come up with something useful here.
- R4 is out of scope in my opinion, which means I will not spend time implementing it in xtb.
IMHO, if a 2D structure or an SDF without explicit hydrogens is supplied, I'd suggest calling obabel to convert it. (Granted, I'm biased.)
@awvwgk thank you for the additional comments, I will open individual requests based on the new version. As you mentioned, not everything will be doable, but I have a test pipeline of a couple thousand structures I want to send through. I will see what problems remain open after the patch. In the meantime I will just use the coord (tmol) input.
I've run ~100k compounds through using XYZ files from Open Babel, and I have a held-out validation set on which I can run ~70k optimizations + Hessians.
@ghutchis will you be able to share the results and your methods?
I am using either bash shell scripts or some one-liner constructs that may add some penalty to the overall computational time. For example, using the C60 fullerene data from "Comprehensive theoretical study of all 1812 C60 isomers" [XYZ], one can calculate single-point energies in 0.7 seconds per molecule (a wonderful speedup factor of 80) on a node with 44 CPUs/88 threads. The total time comes down to 22 min for 1812 molecules, thanks to the really awesome scaling of xtb.
```
$ for d in ./*/ ; do (cd "$d" && echo -n "active dir: $d" && xtb geom.xyz | grep --line-buffered "TOTAL ENERGY"); done
active dir: ./1/ | TOTAL ENERGY -128.453291352107 Eh |
normal termination of xtb
active dir: ./10/ | TOTAL ENERGY -128.331358624594 Eh |
normal termination of xtb
active dir: ./100/ | TOTAL ENERGY -128.265600476068 Eh |
normal termination of xtb
active dir: ./1000/ | TOTAL ENERGY -128.166664861876 Eh |
normal termination of xtb
active dir: ./1001/ | TOTAL ENERGY -128.168084576528 Eh |
normal termination of xtb
active dir: ./1002/ | TOTAL ENERGY -128.169887114880 Eh |
normal termination of xtb
active dir: ./1003/ | TOTAL ENERGY -128.168197428449 Eh |
```
Of course, once we calculate opt-freq we need to store the data, but I guess even 100k molecules should be doable in a couple of hours on multiple nodes. I wonder if one could pipe the SDF conversions from Open Babel directly to xtb and then extract the energies or other values on the fly. It's probably best done with the Python API? I have not looked into that yet.
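One way to try the Open Babel to xtb pipe is a minimal Python sketch that uses `subprocess` instead of the Python API, whose interface is not covered here. It assumes `obabel` and `xtb` are on `PATH`. The helper names (`split_sdf`, `single_point`, `parse_total_energy`) are hypothetical. The energy is scraped from the same `TOTAL ENERGY` line the shell loop greps for. Each record goes through a temporary directory because xtb is given a file path here, as in the loop, and writes its scratch files to the working directory. The optional `-h --gen3d` flags ask Open Babel to add hydrogens and generate 3D coordinates, which follows the earlier suggestion of converting 2D or hydrogen-free input before it reaches xtb.

```python
from __future__ import annotations

import re
import subprocess
import tempfile
from pathlib import Path

# Matches lines such as "| TOTAL ENERGY  -128.453291352107 Eh |" in xtb's stdout.
ENERGY_RE = re.compile(r"TOTAL ENERGY\s+(-?\d+\.\d+)\s+Eh")


def parse_total_energy(stdout: str) -> float | None:
    """Return the last TOTAL ENERGY value (in Eh) found in xtb output, or None."""
    matches = ENERGY_RE.findall(stdout)
    return float(matches[-1]) if matches else None


def split_sdf(text: str) -> list[str]:
    """Split a multi-record SDF string on its '$$$$' terminator lines.

    Leading blank lines are kept, because the first (title) line of a
    MOL block is allowed to be empty.
    """
    records: list[str] = []
    current: list[str] = []
    for line in text.splitlines(keepends=True):
        if line.strip() == "$$$$":
            records.append("".join(current))
            current = []
        else:
            current.append(line)
    if "".join(current).strip():
        records.append("".join(current))
    return records


def single_point(record: str, add_h_and_3d: bool = False) -> float | None:
    """Convert one SDF record to XYZ with Open Babel, run xtb, return the energy."""
    with tempfile.TemporaryDirectory() as tmp:
        work = Path(tmp)
        (work / "in.sdf").write_text(record)
        cmd = ["obabel", "in.sdf", "-O", "geom.xyz"]
        if add_h_and_3d:
            cmd += ["-h", "--gen3d"]  # add hydrogens and generate 3D coordinates
        subprocess.run(cmd, cwd=work, check=True, capture_output=True)
        # Run xtb inside the scratch directory, like `cd "$d"` in the shell loop.
        result = subprocess.run(
            ["xtb", "geom.xyz"], cwd=work, capture_output=True, text=True
        )
        return parse_total_energy(result.stdout)
```

To use it, read the file, loop over `split_sdf(text)` and call `single_point` on each record. If xtb fails, no energy line is printed and the record yields `None` rather than raising an exception.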
@tobigithub - this seems outside the discussion on this issue. Feel free to contact me. At the moment, the data isn't public, but we should be making several GFN-related manuscripts and data sets available soon. We generally use Python, but your shell script is similar and won't add much overhead.
I guess we are almost there, see #30. There are a few open questions, though; I need a reliable way to determine:
- that a structure is 2D (e.g. benzene is 2D in reality, so it should not catch this one)
- if explicit hydrogen atoms are missing (e.g. S8 doesn't need hydrogen atoms)
If it is not easily possible, I will go with garbage in / garbage out here.
An SDF file is supposed to note whether it contains a 2D depiction or 3D coordinates (which might be flat):
https://gist.github.com/ghutchis/270df9db7c4f2f4a30519aabbaa3f73d
Note that the second line of a 2D depiction ends in '2D', and that of a 3D structure ends in '3D'.
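Reading that flag is a fixed-column lookup. The sketch below assumes the V2000 header layout, in which the second line holds the user initials (columns 1-2), the program name (3-10), the date/time (11-20) and the dimensional code (21-22). The function name is hypothetical. The flag does not depend on the coordinates, so a flat benzene saved as 3D keeps its '3D' marker. Headers with collapsed whitespace or a blank field return `None` instead of a guess.

```python
from typing import Optional


def mol_dimension_code(molblock: str) -> Optional[str]:
    """Return '2D' or '3D' from a V2000 MOL header, or None if the field is unusable."""
    lines = molblock.splitlines()
    if len(lines) < 2:
        return None
    # 0-based slice [20:22] corresponds to the 1-based columns 21-22.
    code = lines[1][20:22]
    return code if code in ("2D", "3D") else None
```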
As far as hydrogen atoms go, I'd probably go with 'we're going to run what you sent us', maybe with a warning if the charge/spin seems weird.
One request, though, would be to add up the formal charges in an SDF file from the M  CHG lines:
https://gist.github.com/ghutchis/29130f768fd85444333b2a0ec643a441
M  CHG  1   3  -1
(one charge: atom 3 is -1)
M  CHG  2   3  -1   5  -1
(two charges: atom 3 is -1 and atom 5 is -1)
I'd have to check on the radical options (M RAD), but there are a lot more cases with formal charges (e.g., amino acids, side chains, etc.)
M ISO might be useful for some purposes too.
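Summing those entries can be sketched in a few lines. Each `M  CHG` line starts with a pair count, followed by that many (atom index, charge) pairs. The sketch splits on whitespace, so it also copes with lines whose spacing has been collapsed. It deliberately ignores the charge column of the atom block. The function name is hypothetical.

```python
def total_formal_charge(molblock: str) -> int:
    """Sum the formal charges given on 'M  CHG' lines of a V2000 MOL block.

    Each line reads: M  CHG  n  atom1 charge1  atom2 charge2 ...
    The charge column of the atom block is not consulted in this sketch.
    """
    total = 0
    for line in molblock.splitlines():
        fields = line.split()
        if fields[:2] != ["M", "CHG"]:
            continue
        count = int(fields[2])  # number of (atom, charge) pairs on this line
        pairs = fields[3:3 + 2 * count]
        # Pairs alternate atom index, charge, so every second field is a charge.
        total += sum(int(q) for q in pairs[1::2])
    return total
```

For a zwitterionic amino acid, the +1 and -1 entries cancel and the molecular charge passed to xtb would be 0.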
@awvwgk I think that is something to check with tests. It is impossible to cover all MDL MOL definitions, so some room for errors will remain, but covering 90% of the correct cases should be fine.
In other words: if molecules pass Open Babel or molconvert, they are fine; if xtb accepts molecules that do not pass an Open Babel conversion, something is wrong. In addition, xtb should be able to calculate energies for a random selection of good molecules, such as those from the sources below.
Data sources of good molecules include PubChem, EBI, HMDB and NCI:
ftp://ftp.ncbi.nlm.nih.gov/pubchem/Compound_3D/01_conf_per_cmpd/SDF/
ftp://ftp.ncbi.nlm.nih.gov/pubchem/Compound/CURRENT-Full/SDF/
https://www.ebi.ac.uk/chebi/downloadsForward.do
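Drawing a random selection from dumps of this size is easiest in a single streaming pass, without loading the whole file. The sketch below uses reservoir sampling (Algorithm R) over SDF records. It is an illustration, not part of xtb or any of these databases, and the function names are hypothetical.

```python
import random
from typing import Iterable, Iterator, List


def iter_sdf_records(lines: Iterable[str]) -> Iterator[str]:
    """Yield SDF records one at a time, split on the '$$$$' terminator line."""
    current: List[str] = []
    for line in lines:
        if line.strip() == "$$$$":
            yield "".join(current)
            current = []
        else:
            current.append(line)
    if "".join(current).strip():
        yield "".join(current)


def sample_records(lines: Iterable[str], k: int, seed: int = 0) -> List[str]:
    """Uniformly sample k records in one pass (reservoir sampling, Algorithm R)."""
    rng = random.Random(seed)
    reservoir: List[str] = []
    for i, record in enumerate(iter_sdf_records(lines)):
        if i < k:
            reservoir.append(record)
        else:
            j = rng.randint(0, i)  # uniform over 0..i, both ends inclusive
            if j < k:
                reservoir[j] = record
    return reservoir
```

For a gzipped download, `with gzip.open(path, "rt") as fh: picks = sample_records(fh, 500)` streams the file line by line. Here `path` is whichever file was fetched.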
Quoting: "need a reliable way to determine that a structure is 2D (e.g. benzene is 2D in reality, so it should not catch this one)"
One way to determine whether a structure in a MOL or SDF file is 2D is to check whether the coordinates in a single column sum to zero. In the case below, the z coordinates are zero. It could also be x or y, but most packages just zero out the z axis. There could be other cases (e.g. a rotation about a specific bond), but in practice this almost never happens. I am sure there are matrix operations to test for all cases, but a zero sum of the z coordinates is a good test for 2D molecules, and it is what most software tools do.
Another observation is that xtb actually handles 2D molecules quite well, so it might not be a big issue, but additional tests are required. I am running the ChEBI molecules (https://www.ebi.ac.uk/chebi/downloadsForward.do) through single-point energy calculations to see if something goes wrong.
```

  Marvin  05071312412D

 12 13  0  0  0  0            999 V2000
   16.4064   -5.7650    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
   16.4064   -6.6172    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   17.0908   -7.0082    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
   17.8173   -6.6172    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   17.8173   -5.7650    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   17.0908   -5.3740    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   18.5856   -5.5415    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
   19.0745   -6.1981    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   18.5856   -6.8407    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
   17.0908   -4.5637    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
   19.8987   -6.1981    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
   15.7079   -7.0223    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
  5  4  2  0  0  0  0
  9  4  1  0  0  0  0
  3  4  1  0  0  0  0
  7  5  1  0  0  0  0
  6  5  1  0  0  0  0
  8  7  1  0  0  0  0
  8  9  1  0  0  0  0
 11  8  2  0  0  0  0
  2  3  2  0  0  0  0
  2  1  1  0  0  0  0
  6  1  1  0  0  0  0
 12  2  1  0  0  0  0
 10  6  2  0  0  0  0
M  END
> <ChEBI ID>
CHEBI:52617

> <ChEBI Name>
7,8-dihydro-8-oxoguanine

> <Star>
3

```
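The column check can be sketched as follows, assuming V2000 fixed columns (atom count in columns 1-3 of the counts line, coordinates in columns 1-10, 11-20 and 21-30 of each atom line). It tests that every value in a column is zero, a slightly stricter variant of the column-sum test described above. The reason is that a genuinely 3D structure centred at the origin also has coordinate sums of zero. Like any coordinate-based test, it will still flag benzene if it was saved flat in the xy plane. The function names and the tolerance are hypothetical choices.

```python
from typing import List, Tuple


def atom_coordinates(molblock: str) -> List[Tuple[float, float, float]]:
    """Read x, y, z from the atom block of a V2000 MOL block (fixed columns)."""
    lines = molblock.splitlines()
    n_atoms = int(lines[3][0:3])  # counts line: number of atoms in columns 1-3
    coords = []
    for line in lines[4:4 + n_atoms]:
        # x, y and z occupy columns 1-10, 11-20 and 21-30.
        coords.append((float(line[0:10]), float(line[10:20]), float(line[20:30])))
    return coords


def zero_axes(molblock: str, tol: float = 1e-4) -> List[str]:
    """Return the axes whose coordinate is zero (within tol) for every atom."""
    coords = atom_coordinates(molblock)
    return [axis for i, axis in enumerate("xyz")
            if all(abs(c[i]) <= tol for c in coords)]
```

For the ChEBI record above, `zero_axes` would return `['z']`.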
Quoting: "if explicit hydrogen atoms are missing (e.g. S8 doesn't need hydrogen atoms)"
The test would be for organic compounds, most commonly carbon bound to hydrogen, nitrogen, oxygen and sulfur, so I would not be concerned with exotic cases, including metals or single elements; the elements CHNSOP are enough. Elements such as S, N and P can have multiple valences, but here one could use the most common valence state for each element. The cheapest way would be to test whether there is any carbon at all, apart from fullerenes and other pure-carbon compounds.
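The "most common valence" idea can be sketched as an implicit-hydrogen count per atom. The sketch sums the bond orders around each atom and takes the smallest assumed valence that is at least that sum. Any remainder is reported as missing hydrogens. The assumptions are V2000 fixed columns (atom symbol in columns 32-34, bond atoms and order in columns 1-9), Kekulé bond orders 1-3 only (no aromatic type 4), and no formal charges. The valence table is an assumed choice, and the names are hypothetical. With it, an S8 ring needs no hydrogens, while a bare C-C-O skeleton is reported as missing six.

```python
from typing import Dict, List, Tuple

# Assumed "most common" valences; the smallest one that fits the explicit
# bond-order sum is used. Formal charges are ignored in this sketch.
VALENCES: Dict[str, Tuple[int, ...]] = {
    "H": (1,), "C": (4,), "N": (3,), "O": (2,), "P": (3, 5), "S": (2, 4, 6),
    "F": (1,), "Cl": (1,), "Br": (1,), "I": (1,),
}


def implicit_hydrogens(molblock: str) -> List[int]:
    """Estimate missing hydrogens on each atom of a V2000 MOL block."""
    lines = molblock.splitlines()
    n_atoms, n_bonds = int(lines[3][0:3]), int(lines[3][3:6])
    symbols = [lines[4 + i][31:34].strip() for i in range(n_atoms)]
    bond_sum = [0] * n_atoms
    for line in lines[4 + n_atoms:4 + n_atoms + n_bonds]:
        a, b, order = int(line[0:3]), int(line[3:6]), int(line[6:9])
        bond_sum[a - 1] += order
        bond_sum[b - 1] += order
    missing = []
    for symbol, used in zip(symbols, bond_sum):
        fitting = [v for v in VALENCES.get(symbol, ()) if v >= used]
        # Unknown elements (e.g. metals) or over-valent atoms: make no guess.
        missing.append(fitting[0] - used if fitting else 0)
    return missing
```

Applied to the ChEBI record above, this sketch gives one hydrogen on each of atoms 1, 7 and 9 and two on atom 12. That totals five, matching the C5H5N5O2 formula of 7,8-dihydro-8-oxoguanine.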
A way to test this would be via the Lewis and Senior rules, checked against the charge. See
ANALOGOUS ODD-EVEN PARITIES IN MATHEMATICS AND CHEMISTRY
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.665.1134&rep=rep1&type=pdf
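A cheap screen in the spirit of that odd-even argument is an electron-count parity check: the sum of atomic numbers minus the net charge should be even for a closed-shell molecule. This is only a parity test, not an implementation of the full Lewis/Senior rules. The element table covers just the organic subset discussed here, and the names are hypothetical. The check catches an odd number of missing hydrogens, but an even number slips through. Open-shell input (radicals) legitimately fails, so the result fits best as a warning, as suggested earlier for odd charge/spin. For the 7,8-dihydro-8-oxoguanine record without hydrogens (C5N5O2), the count is 81 and gets flagged; with its five hydrogens it is 86.

```python
from typing import Dict, Iterable

# Atomic numbers for the organic subset discussed above; extend as needed.
ATOMIC_NUMBER: Dict[str, int] = {
    "H": 1, "C": 6, "N": 7, "O": 8, "F": 9,
    "P": 15, "S": 16, "Cl": 17, "Br": 35, "I": 53,
}


def electron_count(symbols: Iterable[str], charge: int = 0) -> int:
    """Total number of electrons: sum of atomic numbers minus the net charge."""
    return sum(ATOMIC_NUMBER[s] for s in symbols) - charge


def parity_ok(symbols: Iterable[str], charge: int = 0) -> bool:
    """True if the electron count is even, as a closed-shell molecule requires.

    Passing is necessary but not sufficient: an even number of missing
    hydrogens leaves the parity unchanged and goes undetected.
    """
    return electron_count(symbols, charge) % 2 == 0
```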