xtb_step Phase B – File Drop Notes#

This drop fills in the actual xTB-running functionality. After installing this on top of Phase A, the three substeps should produce real results when added to a flowchart and given a molecule.

Everything here is meant to overwrite the corresponding files in your local xtb_step working tree. __init__.py, setup.py, setup.cfg, the three *_step.py helper classes, and the four tk_*.py GUI classes are unchanged from Phase A and do not need to be touched. The data files (data/properties.csv, data/references.bib, data/seamm-xtb.yml, data/xtb.ini) are also unchanged from Phase A.

Files in this drop#

Replace existing#

xtb_step/substep.py: Big upgrade. Adds module-level constants for the method <-> CLI flag map (METHODS, METHOD_TO_CLI), solvation-model and solvent lists (SOLVATION_MODELS, SOLVENTS_GBSA_ALPB, SOLVENTS_CPCMX), and a CLI builder base_xtb_args(P, configuration) that all substeps share. Adds parse_thermo_block() for the Hessian thermochemistry table. Phase A’s check_periodicity(), write_coord_xyz(), run_xtb(), and read_xtbout_json() are unchanged.
xtb_step/metadata.py: Replaces the cookiecutter placeholder. Defines metadata["computational models"] (GFN0/1/2/FF, all flagged periodic: False) and metadata["results"] with 17 entries covering total/electronic energy, gap, HOMO/LUMO, dipole (vector + magnitude), partial charges, gradients, frequencies, IR intensities, reduced masses, and the thermochemistry quantities. Property names use the <name>#xTB#{model} convention; {model} is filled in at storage time by SEAMM’s store_results().
xtb_step/energy_parameters.py: Replaces the placeholder time parameter with the real ones: method (enum: GFN2-xTB default), charge, multiplicity, accuracy, solvation model, solvent, plus the standard results dictionary. Used as the base class for OptimizationParameters and (via OptimizationParameters) FrequenciesParameters.
xtb_step/optimization_parameters.py: Inherits from EnergyParameters. Adds optimization level (crude/sloppy/loose/lax/normal/tight/vtight/extreme), max iterations, and structure handling (overwrite / new config / new system / discard).
xtb_step/frequencies_parameters.py: Inherits from OptimizationParameters. Adds optimize first (yes/no – selects --ohess vs --hess), temperature, pressure.
xtb_step/energy.py: Now inherits from Substep. run() writes coord.xyz, calls check_periodicity(), builds the xtb command line via base_xtb_args(), calls run_xtb(), parses xtbout.json, populates a results dict matching metadata["results"], and calls store_results(). Citations are added to the references handler based on which method and solvation model were chosen. Defensive parsing – missing JSON keys are skipped, not errors.
xtb_step/optimization.py: Inherits from Energy. run() injects --opt LEVEL and delegates to Energy.run(). Post-run, picks up xtbopt.xyz and applies it according to the user’s structure handling choice.
xtb_step/frequencies.py: Inherits from Optimization. run() injects --ohess LEVEL or --hess (depending on optimize first), bypasses Optimization.run() to avoid double --opt insertion, and calls Energy.run() directly. Post-run, parses the :: THERMODYNAMIC :: block from xtb.out and the JSON / vibspectrum frequencies.

Not changed since Phase A#

xtb_step/__init__.py, xtb_step/xtb.py, xtb_step/xtb_step.py, xtb_step/energy_step.py, xtb_step/optimization_step.py, xtb_step/frequencies_step.py, xtb_step/tk_*.py, all of xtb_step/data/.

What to test after copying#

make lint && make install && make test – should still pass. Imports change but the public API (the four classes exported from __init__) does not.
Open SEAMM, build a flowchart with FromSMILES -> xTB. Inside the xTB step add an Energy substep. Edit Energy: you should see the real parameters (method, charge, multiplicity, accuracy, solvation model, solvent), with sensible defaults.
Run the flowchart on something simple (water, methane). On a working conda install -c conda-forge xtb, you should get xtb.out, xtbout.json, and a populated SEAMM properties database with at least total energy#xTB#GFN2-xTB and band gap#xTB#GFN2-xTB.
Optimization on the same molecule should additionally produce xtbopt.xyz and update the configuration’s coordinates.
Frequencies should additionally produce vibspectrum, hessian, g98.out, plus thermo quantities in the database.

Known unknowns and likely failure points#

I have NOT been able to test any of this against a running xtb or a running SEAMM, so the following are my best guesses based on the docs and the FHI-aims / MOPAC analogs. Things most likely to need fixing:

Executor invocation. substep.py:run_xtb() mirrors the FHI-aims pattern (self.parent.flowchart.executor, self.global_options, executor.run(cmd=..., config=..., ...)) but I haven’t run it. If it fails with an AttributeError on executor or global_options, that is the place to look. The likely culprit is whether the executor accepts files={} or wants the input files written by us into self.directory first (which the code does – write_coord_xyz() writes coord.xyz directly into self.directory before the call).
xtbout.json key names. The xTB docs show keys like "HOMO-LUMO gap / eV" (with spaces around the /), but I have seen alternative spellings in older versions ("HOMO-LUMO gap/eV", no spaces). The parser tries both. If your installed xtb uses yet another spelling for some quantity, expect that quantity to be missing from the results dict and add the alias to Energy._harvest_json.
Dipole-vector storage. I’m storing both the 3-component dipole_vector and the scalar dipole_moment (magnitude) in the data dict. metadata["results"] declares dipole_vector as [3] dimensional but does NOT give it a property name (so it is variable/table only, not stored as a database property). dipole_moment IS stored as a property. If you’d rather only one or the other, drop the unwanted entry from both metadata.py and Energy._harvest_json.
Configuration XYZ I/O. write_coord_xyz() checks for configuration.to_xyz_text() and falls back to a hand-built XYZ. Optimization._handle_optimized_structure() checks for configuration.from_xyz_text() and falls back to manual parsing. I’m not 100% certain those methods exist in the molsystem version you’re using; the fallback paths should work either way.
Multiple-method runs and ``self._model``. self._model is set inside Energy.run() from P["method"] before store_results() is called. SEAMM’s store_results() uses self.model (which is exposed by seamm.Node) to format property names like "total energy#xTB#{model}". I’m assuming the self._model = ... assignment is what self.model reads. The FHI-aims code does the same thing. If property names come out as "...#xTB#" (empty model) or "...#xTB#{model}" (literal {model}), the assignment isn’t being picked up and we’ll need to add a property override.
Inheritance trick in ``Optimization`` and ``Frequencies`` ``__init__``. Because Energy.__init__ sets self.parameters = xtb_step.EnergyParameters() and we want OptimizationParameters instead, the subclasses use super(Energy, self).__init__(...) to skip Energy and call Substep directly. This is correct Python but if it confuses seamm.Node (which expects to be initialized through a particular path), we may need to refactor. The cleanest alternative is to keep Energy.__init__ parameter-agnostic and have each subclass instantiate its own parameters after calling super().__init__().
Thermochemistry temperature. v1 only supports xtb’s default 298.15 K. The parameter is exposed (temperature) and a warning is printed if a different value is requested, but xcontrol’s $thermo block is not yet wired up. This is a known limitation to fix in v1.x.
Solvent list completeness. Pulled from the xTB docs at implementation time. May be missing solvents added in newer xtb releases; xtb itself will report unknown solvents at run time, and the user can also type a solvent name into the GUI as a free string (the parameter is an enumeration, but SEAMM enumerations don’t strictly enforce membership).
GFN-FF + solvation. GFN-FF is force-field-based; ALPB is parametrized for it (per Spicher & Grimme 2020), but GBSA may not be. base_xtb_args() does not currently refuse GFN-FF + GBSA; xtb will error out with a clear message if it is unsupported.
Where ``self.references`` is populated. Energy._cite_references() does self._bibliography["Bannwarth2021"] etc. The bibliography is loaded by seamm.Node from data/references.bib (Phase A), keyed by the BibTex citation key on the first line of each entry. The keys we use are Bannwarth2021, Bannwarth2019, Grimme2017, Pracht2019, Spicher2020, Ehlert2021, matching exactly what is in Phase A’s references.bib.

Code style#

All files compile cleanly. All lines are <= 88 characters (your setup.cfg’s flake8 max-line-length). I have NOT been able to run black --check here, so there may be small whitespace adjustments when you do make format for the first time – those should be one pass and then stable.