===================================
xtb_step Phase B -- File Drop Notes
===================================

This drop fills in the actual xTB-running functionality. After installing
this on top of Phase A, the three substeps should produce real results
when added to a flowchart and given a molecule.

Everything here is meant to overwrite the corresponding files in your
local ``xtb_step`` working tree. Everything else is unchanged from Phase A
and does not need to be touched: ``setup.py``, ``setup.cfg``,
``__init__.py``, ``xtb.py``, ``xtb_step.py`` and the three substep
``*_step.py`` helper classes, the four ``tk_*.py`` GUI classes, and the
data files (``data/properties.csv``, ``data/references.bib``,
``data/seamm-xtb.yml``, ``data/xtb.ini``).


Files in this drop
==================

Replace existing
----------------

``xtb_step/substep.py``
    Big upgrade. Adds module-level constants for the method <-> CLI flag
    map (``METHODS``, ``METHOD_TO_CLI``), solvation-model and solvent
    lists (``SOLVATION_MODELS``, ``SOLVENTS_GBSA_ALPB``,
    ``SOLVENTS_CPCMX``), and a CLI builder,
    ``base_xtb_args(P, configuration)``, that all substeps share. Adds
    ``parse_thermo_block()`` for the Hessian thermochemistry table.
    Phase A's ``check_periodicity()``, ``write_coord_xyz()``,
    ``run_xtb()``, and ``read_xtbout_json()`` are unchanged.

``xtb_step/metadata.py``
    Replaces the cookiecutter placeholder. Defines
    ``metadata["computational models"]`` (GFN0/1/2/FF, all flagged
    ``periodic: False``) and ``metadata["results"]`` with 17 entries
    covering total/electronic energy, gap, HOMO/LUMO, dipole (vector +
    magnitude), partial charges, gradients, frequencies, IR
    intensities, reduced masses, and the thermochemistry quantities.
    Property names use the ``<name>#xTB#{model}`` convention; ``{model}``
    is filled in at storage time by SEAMM's ``store_results()``.

``xtb_step/energy_parameters.py``
    Replaces the placeholder ``time`` parameter with the real ones:
    ``method`` (enum: GFN2-xTB default), ``charge``, ``multiplicity``,
    ``accuracy``, ``solvation model``, ``solvent``, plus the standard
    ``results`` dictionary. Used as the base class for
    ``OptimizationParameters`` and (via ``OptimizationParameters``)
    ``FrequenciesParameters``.

``xtb_step/optimization_parameters.py``
    Inherits from ``EnergyParameters``. Adds ``optimization level``
    (crude/sloppy/loose/lax/normal/tight/vtight/extreme),
    ``max iterations``, and ``structure handling`` (overwrite / new
    config / new system / discard).

``xtb_step/frequencies_parameters.py``
    Inherits from ``OptimizationParameters``. Adds ``optimize first``
    (yes/no -- selects ``--ohess`` vs ``--hess``), ``temperature``,
    ``pressure``.

``xtb_step/energy.py``
    Now inherits from ``Substep``. ``run()`` writes ``coord.xyz``, calls
    ``check_periodicity()``, builds the xtb command line via
    ``base_xtb_args()``, calls ``run_xtb()``, parses ``xtbout.json``,
    populates a results dict matching ``metadata["results"]``, and calls
    ``store_results()``. Citations are added to the references handler
    based on which method and solvation model were chosen. Defensive
    parsing -- missing JSON keys are skipped, not errors.

``xtb_step/optimization.py``
    Inherits from ``Energy``. ``run()`` injects ``--opt LEVEL`` and
    delegates to ``Energy.run()``. Post-run, picks up ``xtbopt.xyz`` and
    applies it according to the user's ``structure handling`` choice.

``xtb_step/frequencies.py``
    Inherits from ``Optimization``. ``run()`` injects ``--ohess LEVEL``
    or ``--hess`` (depending on ``optimize first``), bypasses
    ``Optimization.run()`` so that ``--opt`` is not inserted twice, and
    calls ``Energy.run()`` directly. Post-run, it parses the
    ``:: THERMODYNAMIC ::`` block from ``xtb.out`` and reads the
    frequencies from ``xtbout.json`` / ``vibspectrum``.
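
The command line that ``base_xtb_args()``, ``Optimization.run()`` and
``Frequencies.run()`` assemble between them can be sketched as below. This
is a minimal, hypothetical stand-in (``build_xtb_args``), not the code in
``substep.py``: it takes only the parameter dict ``P`` instead of
``(P, configuration)``, and the method and solvation-model spellings used
as dict keys are assumptions. The flags themselves (``--gfn``,
``--gfnff``, ``--chrg``, ``--uhf``, ``--acc``, ``--alpb``, ``--gbsa``,
``--cpcmx``, ``--json``, ``--opt``, ``--hess``, ``--ohess``) are xtb
command-line options, although ``--cpcmx`` is only available in newer xtb
releases.

.. code-block:: python

    """Hypothetical sketch: xtb argument list from SEAMM-style parameters."""

    # Assumed method names -> xtb flags (the real METHOD_TO_CLI may differ).
    METHOD_TO_CLI = {
        "GFN2-xTB": ["--gfn", "2"],
        "GFN1-xTB": ["--gfn", "1"],
        "GFN0-xTB": ["--gfn", "0"],
        "GFN-FF": ["--gfnff"],
    }

    # Assumed solvation-model names -> the xtb flag that takes a solvent name.
    SOLVATION_TO_CLI = {"ALPB": "--alpb", "GBSA": "--gbsa", "CPCM-X": "--cpcmx"}


    def build_xtb_args(P, run_type="energy", geometry="coord.xyz"):
        """Return the xtb argument list (without the executable) for one run."""
        args = [geometry, *METHOD_TO_CLI[P["method"]]]
        # xtb's --uhf is the number of unpaired electrons, i.e. multiplicity - 1.
        args += ["--chrg", str(P["charge"]), "--uhf", str(P["multiplicity"] - 1)]
        args += ["--acc", str(P["accuracy"])]

        model = P.get("solvation model", "none")
        if model != "none":
            args += [SOLVATION_TO_CLI[model], P["solvent"]]

        # --json asks xtb to write xtbout.json alongside its normal output.
        args.append("--json")

        # Run-type flags, as Optimization.run() / Frequencies.run() inject them.
        if run_type == "optimization":
            args += ["--opt", P["optimization level"]]
        elif run_type == "frequencies":
            if P["optimize first"] == "yes":
                args += ["--ohess", P["optimization level"]]  # optimize, then Hessian
            else:
                args.append("--hess")  # Hessian at the input geometry
        return args


    if __name__ == "__main__":
        P = {
            "method": "GFN2-xTB",
            "charge": 0,
            "multiplicity": 1,
            "accuracy": 1.0,
            "solvation model": "ALPB",
            "solvent": "water",
        }
        print(" ".join(["xtb", *build_xtb_args(P)]))

Run as a script, it prints
``xtb coord.xyz --gfn 2 --chrg 0 --uhf 0 --acc 1.0 --alpb water --json``.
Keeping the run-type flags in one function is one way to make the
"no double ``--opt``" rule in ``Frequencies`` explicit rather than relying
on which ``run()`` gets bypassed.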
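
For checking what ``parse_thermo_block()`` should return, here is a
hypothetical, self-contained parser (``parse_thermo_text``) for the
``:: THERMODYNAMIC ::`` summary box in ``xtb.out``. It assumes each value
row looks like ``:: <label> <number> Eh ::`` and that the box ends with a
row made only of colons; both are assumptions about xtb's text layout that
should be checked against a real ``xtb.out``. Rows not reported in Eh are
ignored.

.. code-block:: python

    import re

    # Assumed row layout inside the box, e.g. ":: zero point energy  0.0205 Eh  ::".
    _ROW = re.compile(
        r"^\s*::\s*(?P<label>.+?)\s+"
        r"(?P<value>[-+]?\d+\.\d+(?:[eE][-+]?\d+)?)\s+Eh\s*::\s*$"
    )


    def parse_thermo_text(text):
        """Return {label: value in Eh} for the rows of the THERMODYNAMIC box."""
        values = {}
        in_block = False
        for line in text.splitlines():
            if not in_block:
                # Start collecting after the header row that names the block.
                in_block = "THERMODYNAMIC" in line
                continue
            match = _ROW.match(line)
            if match:
                values[match.group("label")] = float(match.group("value"))
            elif values and set(line.strip()) <= {":"}:
                # A blank line or a row of only colons after the values ends it.
                break
        return values

On a real run this would be called as
``parse_thermo_text(Path(directory, "xtb.out").read_text())``; comparing
its keys with what ``parse_thermo_block()`` returns is a quick way to see
whether the real parser picks up every row.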


Not changed since Phase A
-------------------------

``xtb_step/__init__.py``, ``xtb_step/xtb.py``,
``xtb_step/xtb_step.py``, ``xtb_step/energy_step.py``,
``xtb_step/optimization_step.py``, ``xtb_step/frequencies_step.py``,
``xtb_step/tk_*.py``, all of ``xtb_step/data/``.


What to test after copying
==========================

1. ``make lint && make install && make test`` -- should still pass.
   Imports change but the public API (the four classes exported from
   ``__init__``) does not.

2. Open SEAMM, build a flowchart with FromSMILES -> xTB. Inside the
   xTB step add an Energy substep. Edit Energy: you should see the
   real parameters (method, charge, multiplicity, accuracy, solvation
   model, solvent), with sensible defaults.

3. Run the flowchart on something simple (water, methane). With xtb
   installed via ``conda install -c conda-forge xtb``, you should get
   ``xtb.out``, ``xtbout.json``, and a populated SEAMM properties
   database with at least ``total energy#xTB#GFN2-xTB`` and
   ``band gap#xTB#GFN2-xTB``.

4. Optimization on the same molecule should additionally produce
   ``xtbopt.xyz`` and update the configuration's coordinates.

5. Frequencies should additionally produce ``vibspectrum``, ``hessian``,
   ``g98.out``, plus thermo quantities in the database.
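
The file checks in steps 3 to 5 can be scripted against a step's working
directory with a small helper like the one below. The function name and
the per-substep file lists are my reading of the checklist above, not part
of ``xtb_step``; the assumption that ``xtbopt.xyz`` also appears for
Frequencies only when ``optimize first`` is ``yes`` (i.e. ``--ohess``)
should be confirmed on a real run.

.. code-block:: python

    from pathlib import Path

    # Files each substep should leave behind, per the checklist (hypothetical).
    EXPECTED_OUTPUTS = {
        "energy": ["xtb.out", "xtbout.json"],
        "optimization": ["xtb.out", "xtbout.json", "xtbopt.xyz"],
        "frequencies": ["xtb.out", "xtbout.json", "vibspectrum", "hessian", "g98.out"],
    }


    def missing_outputs(directory, substep, optimize_first=False):
        """Return the expected output files that are absent from ``directory``."""
        expected = list(EXPECTED_OUTPUTS[substep])
        if substep == "frequencies" and optimize_first:
            # Assumed: --ohess also writes the optimized geometry.
            expected.append("xtbopt.xyz")
        folder = Path(directory)
        return [name for name in expected if not (folder / name).is_file()]

An empty list means every expected file is present; anything returned
points at the substep (and xtb flag) to look at first.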


Known unknowns and likely failure points
========================================

I have NOT been able to test any of this against a running xtb or a
running SEAMM, so the following are my best guesses based on the docs
and the FHI-aims / MOPAC analogs. Things most likely to need fixing:

1. **Executor invocation.** ``substep.py:run_xtb()`` mirrors the
   FHI-aims pattern (``self.parent.flowchart.executor``,
   ``self.global_options``, ``executor.run(cmd=..., config=..., ...)``),
   but I haven't run it. If it fails with an ``AttributeError`` on
   ``executor`` or ``global_options``, that is the place to look.
   The other likely culprit is whether the executor accepts ``files={}``
   and expects the input files to already be in ``self.directory``. The
   code assumes so: ``write_coord_xyz()`` writes ``coord.xyz`` directly
   into ``self.directory`` before the call.

2. **xtbout.json key names.** The xTB docs show keys like
   ``"HOMO-LUMO gap / eV"`` (with spaces around the ``/``), but I have
   seen alternative spellings in older versions (``"HOMO-LUMO gap/eV"``,
   no spaces). The parser tries both. If your installed xtb uses yet
   another spelling for some quantity, that quantity will be missing
   from the results dict; add the alias to ``Energy._harvest_json``.

3. **Dipole-vector storage.** I'm storing both the 3-component
   ``dipole_vector`` and the scalar ``dipole_moment`` (magnitude) in
   the data dict. ``metadata["results"]`` declares ``dipole_vector``
   as ``[3]`` dimensional but does NOT give it a property name (so it
   is variable/table only, not stored as a database property).
   ``dipole_moment`` IS stored as a property. If you'd rather only one
   or the other, drop the unwanted entry from both ``metadata.py`` and
   ``Energy._harvest_json``.

4. **Configuration XYZ I/O.** ``write_coord_xyz()`` checks for
   ``configuration.to_xyz_text()`` and falls back to a hand-built XYZ.
   ``Optimization._handle_optimized_structure()`` checks for
   ``configuration.from_xyz_text()`` and falls back to manual parsing.
   I'm not 100% certain those methods exist in the molsystem version
   you're using; the fallback paths should work either way.

5. **Multiple-method runs and ``self._model``.** ``self._model`` is set
   inside ``Energy.run()`` from ``P["method"]`` before
   ``store_results()`` is called. SEAMM's ``store_results()`` uses
   ``self.model`` (which is exposed by ``seamm.Node``) to format
   property names like ``"total energy#xTB#{model}"``. I'm assuming the
   ``self._model = ...`` assignment is what ``self.model`` reads. The
   FHI-aims code does the same thing. If property names come out as
   ``"...#xTB#"`` (empty model) or ``"...#xTB#{model}"`` (literal
   ``{model}``), the assignment isn't being picked up and we'll need
   to add a property override.

6. **Inheritance trick in ``Optimization`` and ``Frequencies``
   ``__init__``.** Because ``Energy.__init__`` sets
   ``self.parameters = xtb_step.EnergyParameters()`` and we want
   ``OptimizationParameters`` instead, the subclasses use ``super(Energy,
   self).__init__(...)`` to skip Energy and call Substep directly. This
   is correct Python but if it confuses ``seamm.Node`` (which expects
   to be initialized through a particular path), we may need to
   refactor. The cleanest alternative is to keep
   ``Energy.__init__`` parameter-agnostic and have each subclass
   instantiate its own parameters after calling ``super().__init__()``.

7. **Thermochemistry temperature.** v1 only supports xtb's default
   298.15 K. The parameter is exposed (``temperature``) and a warning
   is printed if a different value is requested, but xcontrol's
   ``$thermo`` block is not yet wired up. This is a known limitation
   to fix in v1.x.

8. **Solvent list completeness.** Pulled from the xTB docs at
   implementation time. May be missing solvents added in newer xtb
   releases; xtb itself will report unknown solvents at run time, and
   the user can also type a solvent name into the GUI as a free string
   (the parameter is an enumeration, but SEAMM enumerations don't
   strictly enforce membership).

9. **GFN-FF + solvation.** GFN-FF is force-field-based; ALPB is
   parametrized for it (per Spicher & Grimme 2020), but GBSA may not
   be. ``base_xtb_args()`` does not currently refuse GFN-FF + GBSA;
   xtb will error out with a clear message if it is unsupported.

10. **Where ``self.references`` is populated.** ``Energy._cite_references()``
    does ``self._bibliography["Bannwarth2021"]`` etc. The bibliography
    is loaded by ``seamm.Node`` from ``data/references.bib`` (Phase A),
    keyed by the BibTeX citation key on the first line of each entry.
    The keys we use are ``Bannwarth2021``, ``Bannwarth2019``,
    ``Grimme2017``, ``Pracht2019``, ``Spicher2020``, ``Ehlert2021``,
    matching exactly what is in Phase A's ``references.bib``.
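
The alias fallback described in item 2 can be sketched as a lookup table
mapping each results-dict key to the JSON spellings to try in order,
skipping quantities that are absent instead of raising. ``JSON_ALIASES``,
``harvest_json`` and ``harvest_file`` are hypothetical names; only the two
band-gap spellings come from these notes, and the ``"total energy"`` and
``"electronic energy"`` keys are assumptions to check against your xtb's
``xtbout.json``.

.. code-block:: python

    import json
    from pathlib import Path

    # results-dict key -> candidate xtbout.json keys, tried in order (partly assumed).
    JSON_ALIASES = {
        "band_gap": ["HOMO-LUMO gap / eV", "HOMO-LUMO gap/eV"],
        "total_energy": ["total energy"],
        "electronic_energy": ["electronic energy"],
    }


    def harvest_json(data, aliases=JSON_ALIASES):
        """Copy the first alias present for each quantity; skip missing ones."""
        results = {}
        for key, candidates in aliases.items():
            for candidate in candidates:
                if candidate in data:
                    results[key] = data[candidate]
                    break
        return results


    def harvest_file(path="xtbout.json"):
        """Load xtbout.json from disk and harvest it."""
        with Path(path).open() as handle:
            return harvest_json(json.load(handle))

Supporting a new spelling then means appending one string to the relevant
list, with no change to the harvesting logic.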
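
A variation on the alternative in item 6 is sketched below: instead of
overwriting ``self.parameters`` after ``super().__init__()``, each subclass
hands its own parameters object to ``Energy.__init__`` through a keyword,
so ``Energy`` never hard-codes the parameters type. Plain classes stand in
for ``Substep``/``seamm.Node`` and the parameter classes (all hypothetical,
with abbreviated contents). Every ``__init__`` goes through
``super().__init__()``, so the base initializer runs exactly once and
nothing in the MRO is skipped.

.. code-block:: python

    class Substep:
        """Stand-in for xtb_step's Substep (and seamm.Node behind it)."""

        def __init__(self, title):
            self.title = title
            # Counts how often the base initializer runs, to show it runs once.
            self.base_inits = getattr(self, "base_inits", 0) + 1


    class EnergyParameters(dict):
        def __init__(self):
            super().__init__(method="GFN2-xTB", charge=0, multiplicity=1)


    class OptimizationParameters(EnergyParameters):
        def __init__(self):
            super().__init__()
            self["optimization level"] = "normal"


    class FrequenciesParameters(OptimizationParameters):
        def __init__(self):
            super().__init__()
            self.update({"optimize first": "yes", "temperature": 298.15})


    class Energy(Substep):
        def __init__(self, title="Energy", parameters=None):
            super().__init__(title)  # normal MRO, nothing skipped
            # Parameter-agnostic: use what the subclass passed, else our own.
            self.parameters = EnergyParameters() if parameters is None else parameters


    class Optimization(Energy):
        def __init__(self, title="Optimization", parameters=None):
            if parameters is None:
                parameters = OptimizationParameters()
            super().__init__(title, parameters)


    class Frequencies(Optimization):
        def __init__(self, title="Frequencies", parameters=None):
            if parameters is None:
                parameters = FrequenciesParameters()
            super().__init__(title, parameters)

Because the most-derived class creates the parameters, only one
parameters object is built per step and ``seamm.Node`` sees the usual
initialization path.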


Code style
==========

All files compile cleanly. All lines are <= 88 characters (your
``setup.cfg``'s flake8 max-line-length). I have NOT been able to run
``black --check`` here, so there may be small whitespace adjustments
when you do ``make format`` for the first time -- those should be one
pass and then stable.