xtb_step Phase B – File Drop Notes
This drop fills in the actual xTB-running functionality. After installing this on top of Phase A, the three substeps should produce real results when added to a flowchart and given a molecule.
Everything here is meant to overwrite the corresponding files in your
local xtb_step working tree. __init__.py, setup.py,
setup.cfg, the three *_step.py helper classes, and the four
tk_*.py GUI classes are unchanged from Phase A and do not need to
be touched. The data files (data/properties.csv,
data/references.bib, data/seamm-xtb.yml, data/xtb.ini) are
also unchanged from Phase A.
Files in this drop
Replace existing
xtb_step/substep.py
    Big upgrade. Adds module-level constants for the method <-> CLI flag map (METHODS, METHOD_TO_CLI), solvation-model and solvent lists (SOLVATION_MODELS, SOLVENTS_GBSA_ALPB, SOLVENTS_CPCMX), and a CLI builder base_xtb_args(P, configuration) that all substeps share. Adds parse_thermo_block() for the Hessian thermochemistry table. Phase A’s check_periodicity(), write_coord_xyz(), run_xtb(), and read_xtbout_json() are unchanged.

xtb_step/metadata.py
    Replaces the cookiecutter placeholder. Defines metadata["computational models"] (GFN0/1/2/FF, all flagged periodic: False) and metadata["results"] with 17 entries covering total/electronic energy, gap, HOMO/LUMO, dipole (vector + magnitude), partial charges, gradients, frequencies, IR intensities, reduced masses, and the thermochemistry quantities. Property names use the <name>#xTB#{model} convention; {model} is filled in at storage time by SEAMM’s store_results().

xtb_step/energy_parameters.py
    Replaces the placeholder time parameter with the real ones: method (enum: GFN2-xTB default), charge, multiplicity, accuracy, solvation model, solvent, plus the standard results dictionary. Used as the base class for OptimizationParameters and (via OptimizationParameters) FrequenciesParameters.

xtb_step/optimization_parameters.py
    Inherits from EnergyParameters. Adds optimization level (crude/sloppy/loose/lax/normal/tight/vtight/extreme), max iterations, and structure handling (overwrite / new config / new system / discard).

xtb_step/frequencies_parameters.py
    Inherits from OptimizationParameters. Adds optimize first (yes/no – selects --ohess vs --hess), temperature, pressure.

xtb_step/energy.py
    Now inherits from Substep. run() writes coord.xyz, calls check_periodicity(), builds the xtb command line via base_xtb_args(), calls run_xtb(), parses xtbout.json, populates a results dict matching metadata["results"], and calls store_results(). Citations are added to the references handler based on which method and solvation model were chosen. Defensive parsing – missing JSON keys are skipped, not errors.

xtb_step/optimization.py
    Inherits from Energy. run() injects --opt LEVEL and delegates to Energy.run(). Post-run, picks up xtbopt.xyz and applies it according to the user’s structure handling choice.

xtb_step/frequencies.py
    Inherits from Optimization. run() injects --ohess LEVEL or --hess (depending on optimize first), bypasses Optimization.run() to avoid double --opt insertion, and calls Energy.run() directly. Post-run, parses the :: THERMODYNAMIC :: block from xtb.out and the JSON / vibspectrum frequencies.
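To make the shared command-line construction more concrete, here is a minimal sketch of how a parameter dictionary could be turned into an xtb argument list. It is not the contents of substep.py: the function name sketch_xtb_args, the shape of METHOD_TO_CLI, the SOLVATION_TO_CLI map, and the solvation-model labels are all assumptions made for illustration. The flags are xtb command-line options, but --cpcmx in particular only exists in newer releases, so check xtb --help for the installed version.

```python
# Hypothetical stand-in for the method <-> CLI flag map; the real
# METHOD_TO_CLI in substep.py may be shaped differently.
METHOD_TO_CLI = {
    "GFN2-xTB": ["--gfn", "2"],
    "GFN1-xTB": ["--gfn", "1"],
    "GFN0-xTB": ["--gfn", "0"],
    "GFN-FF": ["--gfnff"],
}

# Assumed solvation-model labels mapped to xtb flags.
SOLVATION_TO_CLI = {"ALPB": "--alpb", "GBSA": "--gbsa", "CPCM-X": "--cpcmx"}


def sketch_xtb_args(P, extra=()):
    """Build an xtb argument list from a plain parameter dict (illustration)."""
    args = ["xtb", "coord.xyz"]
    args += METHOD_TO_CLI[P["method"]]
    args += ["--chrg", str(P["charge"])]
    # xtb's --uhf takes the number of unpaired electrons, not the multiplicity.
    args += ["--uhf", str(P["multiplicity"] - 1)]
    args += ["--acc", str(P["accuracy"])]
    flag = SOLVATION_TO_CLI.get(P["solvation model"])
    if flag is not None:
        args += [flag, P["solvent"]]
    # Substep-specific flags (e.g. --opt LEVEL) are appended by the caller.
    args += list(extra)
    args.append("--json")
    return args


P = {
    "method": "GFN2-xTB",
    "charge": 0,
    "multiplicity": 1,
    "accuracy": 1.0,
    "solvation model": "ALPB",
    "solvent": "water",
}

energy_cmd = sketch_xtb_args(P)
opt_cmd = sketch_xtb_args(P, extra=["--opt", "tight"])
# Frequencies: --ohess LEVEL optimizes first, --hess does not.
freq_cmd = sketch_xtb_args(P, extra=["--ohess", "tight"])
print(" ".join(freq_cmd))
```

Passing the substep-specific flags through a single extra argument is one way to keep --opt and --ohess from both ending up on the same command line, which is the problem frequencies.py avoids by bypassing Optimization.run().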
Not changed since Phase A
xtb_step/__init__.py, xtb_step/xtb.py,
xtb_step/xtb_step.py, xtb_step/energy_step.py,
xtb_step/optimization_step.py, xtb_step/frequencies_step.py,
xtb_step/tk_*.py, all of xtb_step/data/.
What to test after copying
1. make lint && make install && make test – should still pass. Imports change but the public API (the four classes exported from __init__) does not.
2. Open SEAMM, build a flowchart with FromSMILES -> xTB. Inside the xTB step add an Energy substep. Edit Energy: you should see the real parameters (method, charge, multiplicity, accuracy, solvation model, solvent), with sensible defaults.
3. Run the flowchart on something simple (water, methane). On a working conda install -c conda-forge xtb, you should get xtb.out, xtbout.json, and a populated SEAMM properties database with at least total energy#xTB#GFN2-xTB and band gap#xTB#GFN2-xTB.
4. Optimization on the same molecule should additionally produce xtbopt.xyz and update the configuration’s coordinates.
5. Frequencies should additionally produce vibspectrum, hessian, g98.out, plus thermo quantities in the database.
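The file checks in the list above can be scripted. The following is a small sketch, assuming you know the directory a substep ran in; the helper name missing_outputs and the EXPECTED table are made up for illustration and are not part of xtb_step. The file names are the ones listed above.

```python
from pathlib import Path
import tempfile

# File names taken from the checklist above, keyed by substep.
# Frequencies only writes xtbopt.xyz when "optimize first" is on,
# so it is not listed there.
EXPECTED = {
    "energy": ["xtb.out", "xtbout.json"],
    "optimization": ["xtb.out", "xtbout.json", "xtbopt.xyz"],
    "frequencies": ["xtb.out", "xtbout.json", "vibspectrum", "hessian", "g98.out"],
}


def missing_outputs(directory, substep):
    """Return the expected output files that are absent from directory."""
    directory = Path(directory)
    return [n for n in EXPECTED[substep] if not (directory / n).is_file()]


# Self-contained demonstration with a fake run directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("xtb.out", "xtbout.json"):
        (Path(tmp) / name).write_text("")
    energy_missing = missing_outputs(tmp, "energy")
    opt_missing = missing_outputs(tmp, "optimization")

print("energy missing:", energy_missing)
print("optimization missing:", opt_missing)
```

In real use, point missing_outputs at the substep's working directory instead of the temporary one; an empty list means every expected file was written.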
Known unknowns and likely failure points
I have NOT been able to test any of this against a running xtb or a running SEAMM, so the following are my best guesses based on the docs and the FHI-aims / MOPAC analogs. Things most likely to need fixing:
1. Executor invocation. substep.py:run_xtb() mirrors the FHI-aims pattern (self.parent.flowchart.executor, self.global_options, executor.run(cmd=..., config=..., ...)) but I haven’t run it. If it fails with an AttributeError on executor or global_options, that is the place to look. The likely culprit is whether the executor accepts files={} or wants the input files written by us into self.directory first (which the code does – write_coord_xyz() writes coord.xyz directly into self.directory before the call).
2. xtbout.json key names. The xTB docs show keys like "HOMO-LUMO gap / eV" (with spaces around the /), but I have seen alternative spellings in older versions ("HOMO-LUMO gap/eV", no spaces). The parser tries both. If your installed xtb uses yet another spelling for some quantity, expect that quantity to be missing from the results dict and add the alias to Energy._harvest_json.
3. Dipole-vector storage. I’m storing both the 3-component dipole_vector and the scalar dipole_moment (magnitude) in the data dict. metadata["results"] declares dipole_vector as [3] dimensional but does NOT give it a property name (so it is variable/table only, not stored as a database property). dipole_moment IS stored as a property. If you’d rather only one or the other, drop the unwanted entry from both metadata.py and Energy._harvest_json.
4. Configuration XYZ I/O. write_coord_xyz() checks for configuration.to_xyz_text() and falls back to a hand-built XYZ. Optimization._handle_optimized_structure() checks for configuration.from_xyz_text() and falls back to manual parsing. I’m not 100% certain those methods exist in the molsystem version you’re using; the fallback paths should work either way.
5. Multiple-method runs and self._model. self._model is set inside Energy.run() from P["method"] before store_results() is called. SEAMM’s store_results() uses self.model (which is exposed by seamm.Node) to format property names like "total energy#xTB#{model}". I’m assuming the self._model = ... assignment is what self.model reads. The FHI-aims code does the same thing. If property names come out as "...#xTB#" (empty model) or "...#xTB#{model}" (literal {model}), the assignment isn’t being picked up and we’ll need to add a property override.
6. Inheritance trick in Optimization and Frequencies __init__. Because Energy.__init__ sets self.parameters = xtb_step.EnergyParameters() and we want OptimizationParameters instead, the subclasses use super(Energy, self).__init__(...) to skip Energy and call Substep directly. This is correct Python but if it confuses seamm.Node (which expects to be initialized through a particular path), we may need to refactor. The cleanest alternative is to keep Energy.__init__ parameter-agnostic and have each subclass instantiate its own parameters after calling super().__init__().
7. Thermochemistry temperature. v1 only supports xtb’s default 298.15 K. The parameter is exposed (temperature) and a warning is printed if a different value is requested, but xcontrol’s $thermo block is not yet wired up. This is a known limitation to fix in v1.x.
8. Solvent list completeness. Pulled from the xTB docs at implementation time. May be missing solvents added in newer xtb releases; xtb itself will report unknown solvents at run time, and the user can also type a solvent name into the GUI as a free string (the parameter is an enumeration, but SEAMM enumerations don’t strictly enforce membership).
9. GFN-FF + solvation. GFN-FF is force-field-based; ALPB is parametrized for it (per Spicher & Grimme 2020), but GBSA may not be. base_xtb_args() does not currently refuse GFN-FF + GBSA; xtb will error out with a clear message if it is unsupported.
10. Where self.references is populated. Energy._cite_references() does self._bibliography["Bannwarth2021"] etc. The bibliography is loaded by seamm.Node from data/references.bib (Phase A), keyed by the BibTeX citation key on the first line of each entry. The keys we use are Bannwarth2021, Bannwarth2019, Grimme2017, Pracht2019, Spicher2020, Ehlert2021, matching exactly what is in Phase A’s references.bib.
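The inheritance trick and its possible refactor are easier to judge side by side. The sketch below uses hypothetical stand-in classes (a bare Substep, empty parameter classes, and the EnergyAlt / OptimizationAlt names), not the real xtb_step or seamm.Node classes, so it only shows the Python mechanics. The parameters_class attribute is one assumed way of making the base __init__ parameter-agnostic, not necessarily how a refactor would be done.

```python
class Substep:
    """Hypothetical stand-in for the shared base class."""

    def __init__(self, title="substep"):
        self.title = title
        self.parameters = None


class EnergyParameters:
    pass


class OptimizationParameters(EnergyParameters):
    pass


# Current pattern: Energy.__init__ hard-codes its parameters class,
# so the subclass skips it with super(Energy, self).
class Energy(Substep):
    def __init__(self, title="Energy"):
        super().__init__(title=title)
        self.parameters = EnergyParameters()


class Optimization(Energy):
    def __init__(self, title="Optimization"):
        # Start the MRO lookup after Energy, i.e. call Substep.__init__.
        super(Energy, self).__init__(title=title)
        self.parameters = OptimizationParameters()


# Possible refactor: the base __init__ reads the parameters class from a
# class attribute, so subclasses use the ordinary super().__init__() chain.
class EnergyAlt(Substep):
    parameters_class = EnergyParameters

    def __init__(self, title="Energy"):
        super().__init__(title=title)
        self.parameters = self.parameters_class()


class OptimizationAlt(EnergyAlt):
    parameters_class = OptimizationParameters

    def __init__(self, title="Optimization"):
        super().__init__(title=title)


opt = Optimization()
alt = OptimizationAlt()
print(type(opt.parameters).__name__, type(alt.parameters).__name__)
```

Both variants end with the same parameters object on the instance. The difference is that in the first one Energy.__init__ never runs for Optimization, so anything else it sets up would be skipped as well.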
Code style
All files compile cleanly. All lines are <= 88 characters (your
setup.cfg’s flake8 max-line-length). I have NOT been able to run
black --check here, so there may be small whitespace adjustments
when you do make format for the first time – those should be one
pass and then stable.