xtb_step Phase H – File Drop Notes#
Two design changes from the citations / units discussion:
Citation levels: promote the three DFT-D4 (Caldeweyher 2017, 2019, 2020) papers from level 2 to level 1. Reasoning is below.
Thermochemistry units: convert ZPE, H(T), T*S, G(T), and the total free energy from xtb’s native E_h to kJ/mol at parse time. Add a derived
entropyquantity in J/mol/K so the chemist’s natural form of S (rather than just T*S) appears in the table.
Rationale: citation levels#
After looking at how MOPAC handles citations, my Phase G placement of the D4 papers at level 2 was wrong. MOPAC puts at level 1 the program citation, the Hamiltonian paper, every dispersion correction the user enabled (PM6-D3 cites Grimme 2010, PM6-DH+ cites Korth 2010, etc.), and even individual element-parameter papers when those elements are present. A typical MOPAC PM7 calculation produces a level-1 list of 5-10 references.
The SEAMM convention is “level 1 = anything that contributed to producing this result; let the user cull to fit the journal’s citation budget”. Level 2 is for component-of-component references that the user almost never wants in a primary list.
By that standard, the three D4 papers belong at level 1 whenever GFN2-xTB is the active method, because GFN2’s dispersion correction is part of the method. A user writing a paper with GFN2-xTB calculations might cite all three (or just one canonical one), but SEAMM’s job is to surface them in the primary list rather than hide them.
So Phase H promotes them. GFN1, GFN0, and GFN-FF still do not trigger D4 citations (those methods use older D3-style or different dispersion treatments).
The current level-1 list for a default GFN2-xTB + ALPB-water calculation is now:
SEAMM (from the framework, automatic)
RDKit (from from_smiles_step, automatic)
Bannwarth 2021 (xTB program)
Bannwarth 2019 (GFN2-xTB)
Ehlert 2021 (ALPB)
Caldeweyher 2017, 2019, 2020 (DFT-D4)
xtb_step itself (from the plug-in self-cite, automatic)
A chemist writing a paper would probably keep all of these for a methods section. No level-2 citations fall out of v1 yet – they will once we add MD (need metadynamics ref), TD-xTB, etc.
Rationale: thermochemistry units#
Looking at thermochemistry_step’s metadata, all of H, U, ZPE, G are declared in kJ/mol and S in J/mol/K. That matches what every modern chemistry textbook and every Gaussian / Q-Chem / ORCA paper reports. xtb’s native output is in E_h, which is fine for the electronic energy but unhelpful for thermo quantities – nobody publishes “G(T) = 0.002482 E_h”.
This drop converts thermo energies at parse time using
Q_(value, "E_h").to("kJ/mol"), the same idiom used in
thermochemistry_step, gaussian_step, and psi4_step. The
electronic energy and orbital energies stay in their existing units
(E_h for total/electronic energy, eV for HOMO/LUMO/gap) – those are
also the chemistry-paper conventions.
I also added a derived entropy quantity (S in J/mol/K)
alongside entropy_term (T*S in kJ/mol). xtb’s output gives only
T*S; the entropy itself is computed as T*S * 1000 / T. Both
appear in the results table; only entropy (J/mol/K, the form
used in standard chemistry tables) is stored as a database property
since T*S as a separate property is rarely useful.
Files in this drop#
Replace existing#
xtb_step/energy.py_cite_references: the three Caldeweyher D4 cite calls now uselevel=1instead oflevel=2. Docstring rewritten to describe SEAMM’s “level 1 = comprehensive contributing-papers list, user culls” convention. Method-specific citations (Bannwarth/Grimme/Pracht/Spicher) and the solvation citations (Ehlert/Stahn) keep their existing level=1.xtb_step/substep.pyparse_thermo_block: now converts E_h to kJ/mol on the fly usingureg.hartree.to("kJ/mol").magnitudeas the conversion factor. Also computes the newentropyfield (S in J/mol/K) from the T*S column viaS = T*S * 1000 / T. Returned dict keys are:temperature(K),zero_point_energy,enthalpy,entropy_term,gibbs_free_energy,total_free_energy(all kJ/mol),entropy(J/mol/K).xtb_step/metadata.pyAll thermo entries’
unitsfield changed from"E_h"to"kJ/mol";formatchanged from.6f(which is appropriate for E_h with 6 significant decimals) to.4f(appropriate for kJ/mol values that range from single digits to a few thousand). Newentropyentry in J/mol/K with a database property nameentropy#xTB#{model}.entropy_termis no longer a database property (it remains a results-table row for users who like the T*S form).xtb_step/frequencies.pyThe thermo block in
analyzenow includes"entropy"in the key tuple, ordered betweenentropy_termandgibbs_free_energyso the table reads naturally: ZPE, H(T), T*S, S, G(T), total free, T.
Not in this drop#
tk_*.py, optimization.py, __init__.py, setup.py,
energy_parameters.py, optimization_parameters.py,
frequencies_parameters.py, xtb.py, the data files, and the
installer are unchanged.
Sanity check on the conversion#
From the previous job.out (water, GFN2-xTB):
ZPE was 0.020101 E_h = 52.78 kJ/mol. Experimental water ZPE is ~55-56 kJ/mol; GFN2-xTB underestimates slightly, which is consistent with what the literature reports for tight-binding methods on hydrogen-bond systems.
G(T) (the G(RRHO) contribution) was 0.002482 E_h = 6.52 kJ/mol. Reasonable for water at 298 K.
T*S was 0.021400 E_h = 56.18 kJ/mol; entropy = 56.18 * 1000 / 298.15 = 188.4 J/mol/K. Literature water gas-phase entropy at 298.15 K is ~189 J/mol/K – agrees almost exactly. This is a nice sanity check on the parser and the conversion.
Test plan#
make lint && make install && make test– still pass.Re-run the water Energy/Optimization/Frequencies flowchart.
The Frequencies summary table should now show ZPE, H(T), T*S, S, G(T), total free energy in kJ/mol (with S in J/mol/K), with numbers that look like 52.78, 62.71, 56.18, 188.4, 6.52, -13311.78 (or thereabouts – water E in kJ/mol scaled from -5.07 E_h).
The references list should now include the three Caldeweyher papers in the primary references section, not the secondary one. The total primary count for GFN2-xTB+ALPB should be 6 xtb-related references plus the framework refs.
Code style#
All four files compile cleanly, all lines <= 88 characters. The
diffs are localized: one method body changed in substep.py, six
metadata entries updated in metadata.py, six lines changed in
energy.py (the level=2 -> level=1 and the docstring),
and one tuple item added in frequencies.py.