===================================
xtb_step Phase H -- File Drop Notes
===================================

Two design changes from the citations / units discussion:

1. Citation levels: promote the three DFT-D4 (Caldeweyher 2017, 2019,
   2020) papers from level 2 to level 1.  Reasoning is below.

2. Thermochemistry units: convert ZPE, H(T), T*S, G(T), and the total
   free energy from xtb's native E_h to kJ/mol at parse time.  Add a
   derived ``entropy`` quantity in J/mol/K so the chemist's natural
   form of S (rather than just T*S) appears in the table.


Rationale: citation levels
==========================

After looking at how MOPAC handles citations, my Phase G placement of
the D4 papers at level 2 was wrong.  MOPAC puts at level 1 the
program citation, the Hamiltonian paper, every dispersion correction
the user enabled (PM6-D3 cites Grimme 2010, PM6-DH+ cites Korth 2010,
etc.), and even individual element-parameter papers when those
elements are present.  A typical MOPAC PM7 calculation produces a
level-1 list of 5-10 references.

The SEAMM convention is "level 1 = anything that contributed to
producing this result; let the user cull to fit the journal's
citation budget".  Level 2 is for component-of-component references
that the user almost never wants in a primary list.

By that standard, the three D4 papers belong at level 1 whenever
GFN2-xTB is the active method, because GFN2's dispersion correction
is part of the method.  A user writing a paper with GFN2-xTB
calculations might cite all three (or just one canonical one), but
SEAMM's job is to surface them in the primary list rather than hide
them.

So Phase H promotes them.  GFN1, GFN0, and GFN-FF still do not
trigger D4 citations (those methods use older D3-style or different
dispersion treatments).
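
The selection rule can be sketched as a small standalone function.
This is a hypothetical illustration, not xtb_step code:
``citations_for`` and the alias strings are made up, and the real
``_cite_references`` issues one cite call per reference instead of
returning a list.  Every entry comes back at level 1 because v1 has
no level-2 references yet.

.. code-block:: python

    # Hypothetical alias strings standing in for the real reference keys.
    PROGRAM_PAPER = "Bannwarth2021"  # xtb program
    METHOD_PAPERS = {
        "GFN2-xTB": ["Bannwarth2019"],
        "GFN1-xTB": ["Grimme_GFN1"],
        "GFN0-xTB": ["Pracht_GFN0"],
        "GFN-FF": ["Spicher_GFNFF"],
    }
    D4_PAPERS = ["Caldeweyher2017", "Caldeweyher2019", "Caldeweyher2020"]
    SOLVATION_PAPERS = {"ALPB": ["Ehlert2021"]}


    def citations_for(method, solvation=None):
        """Return (alias, level) pairs for one xtb calculation."""
        aliases = [PROGRAM_PAPER]
        aliases += METHOD_PAPERS.get(method, [])
        # D4 dispersion is part of GFN2-xTB itself, so its papers are cited
        # at level 1; the other methods never trigger them.
        if method == "GFN2-xTB":
            aliases += D4_PAPERS
        if solvation is not None:
            aliases += SOLVATION_PAPERS.get(solvation, [])
        # "Level 1 = anything that contributed to producing this result."
        return [(alias, 1) for alias in aliases]

For a default GFN2-xTB + ALPB run, ``citations_for("GFN2-xTB", "ALPB")``
returns the six xtb-related entries from the list below, all at
level 1; ``citations_for("GFN1-xTB")`` returns just the program and
method papers.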

The current level-1 list for a default GFN2-xTB + ALPB-water
calculation is now:

* SEAMM (from the framework, automatic)
* RDKit (from from_smiles_step, automatic)
* Bannwarth 2021 (xTB program)
* Bannwarth 2019 (GFN2-xTB)
* Ehlert 2021 (ALPB)
* Caldeweyher 2017, 2019, 2020 (DFT-D4)
* xtb_step itself (from the plug-in self-cite, automatic)

A chemist writing a paper would probably keep all of these for a
methods section.  No level-2 citations fall out of v1 yet -- they
will once we add MD (need metadynamics ref), TD-xTB, etc.


Rationale: thermochemistry units
================================

Looking at thermochemistry_step's metadata, H, U, ZPE, and G are all
declared in kJ/mol and S in J/mol/K.  That matches the units used in
modern chemistry textbooks and in typical Gaussian / Q-Chem / ORCA
papers.
xtb's native output is in E_h, which is fine for the electronic
energy but unhelpful for thermo quantities -- nobody publishes
"G(T) = 0.002482 E_h".

This drop converts thermo energies at parse time using
``Q_(value, "E_h").to("kJ/mol")``, the same idiom used in
``thermochemistry_step``, ``gaussian_step``, and ``psi4_step``.  The
electronic energy and orbital energies stay in their existing units
(E_h for total/electronic energy, eV for HOMO/LUMO/gap) -- those are
also the chemistry-paper conventions.
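
As a cross-check on that idiom, the factor it applies is simply the
Hartree energy times Avogadro's constant.  A stdlib-only sketch, with
the constants hard-coded from CODATA 2018 (the plug-in itself goes
through ``Q_`` rather than these hand-written values, and
``eh_to_kj_per_mol`` is a hypothetical name):

.. code-block:: python

    # CODATA 2018 Hartree energy (J) and the exact 2019-SI Avogadro constant.
    HARTREE_J = 4.3597447222071e-18
    AVOGADRO = 6.02214076e23  # 1/mol

    # One E_h per molecule times N_A molecules per mole, expressed in kJ/mol.
    EH_TO_KJ_PER_MOL = HARTREE_J * AVOGADRO / 1000.0


    def eh_to_kj_per_mol(value_eh):
        """Convert a per-molecule energy in E_h to a molar energy in kJ/mol."""
        return value_eh * EH_TO_KJ_PER_MOL


    print(f"{EH_TO_KJ_PER_MOL:.4f} kJ/mol per E_h")

This prints ``2625.4996 kJ/mol per E_h``.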

I also added a derived ``entropy`` quantity (S in J/mol/K)
alongside ``entropy_term`` (T*S in kJ/mol).  The thermo block that
xtb_step parses reports only T*S, so the entropy itself is computed
as ``T*S * 1000 / T`` (the factor of 1000 converts kJ to J).  Both
appear in the results table; only ``entropy`` (J/mol/K, the form
used in standard chemistry tables) is stored as a database property,
since T*S as a separate property is rarely useful.
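
In code this is a one-line helper.  A minimal sketch
(``entropy_from_ts`` is a hypothetical name, and the temperature
guard is an addition of this sketch rather than something the parser
is known to do):

.. code-block:: python

    def entropy_from_ts(ts_kj_per_mol, temperature_k):
        """Return S in J/mol/K given T*S in kJ/mol and T in K."""
        if temperature_k <= 0:
            raise ValueError("temperature must be positive to recover S")
        # kJ/mol -> J/mol, then divide out the temperature.
        return ts_kj_per_mol * 1000.0 / temperature_k


    # Water at 298.15 K: T*S = 0.021400 E_h, about 56.186 kJ/mol.
    print(round(entropy_from_ts(56.186, 298.15), 1))

This prints ``188.4``.  Without the guard, T = 0 would raise a bare
``ZeroDivisionError``.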


Files in this drop
==================

Replace existing
----------------

``xtb_step/energy.py``
    ``_cite_references``: the three Caldeweyher D4 cite calls now use
    ``level=1`` instead of ``level=2``.  Docstring rewritten to
    describe SEAMM's "level 1 = comprehensive contributing-papers
    list, user culls" convention.  Method-specific citations
    (Bannwarth/Grimme/Pracht/Spicher) and the solvation citations
    (Ehlert/Stahn) keep their existing level=1.

``xtb_step/substep.py``
    ``parse_thermo_block``: now converts E_h to kJ/mol on the fly
    using ``ureg.hartree.to("kJ/mol").magnitude`` as the conversion
    factor.  Also computes the new ``entropy`` field (S in J/mol/K)
    from the T*S column via ``S = T*S * 1000 / T``.  Returned dict
    keys are: ``temperature`` (K), ``zero_point_energy``,
    ``enthalpy``, ``entropy_term``, ``gibbs_free_energy``,
    ``total_free_energy`` (all kJ/mol), ``entropy`` (J/mol/K).

``xtb_step/metadata.py``
    All thermo entries' ``units`` field changed from ``"E_h"`` to
    ``"kJ/mol"``; ``format`` changed from ``.6f`` (six decimal places,
    appropriate for E_h) to ``.4f`` (appropriate for kJ/mol values,
    which range from single digits for G(RRHO) to five or more digits
    for the total free energy).  New ``entropy`` entry in J/mol/K with a
    database property name ``entropy#xTB#{model}``.  ``entropy_term``
    is no longer a database property (it remains a results-table row
    for users who like the T*S form).

``xtb_step/frequencies.py``
    The thermo block in ``analyze`` now includes ``"entropy"`` in
    the key tuple, ordered between ``entropy_term`` and
    ``gibbs_free_energy`` so the table reads naturally:
    ZPE, H(T), T*S, S, G(T), total free, T.
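
The ``parse_thermo_block`` conversion and the ``analyze`` row order
can be sketched together.  This assumes the raw values have already
been read from xtb's output in E_h (the text parsing is omitted),
hard-codes CODATA 2018 constants instead of using ``ureg``, and
``convert_thermo``, ``ENERGY_KEYS``, and ``TABLE_ORDER`` are
hypothetical names:

.. code-block:: python

    HARTREE_J = 4.3597447222071e-18  # CODATA 2018, J
    AVOGADRO = 6.02214076e23  # exact, 1/mol
    FACTOR = HARTREE_J * AVOGADRO / 1000.0  # kJ/mol per E_h

    # Quantities xtb reports in E_h that are now returned in kJ/mol.
    ENERGY_KEYS = (
        "zero_point_energy",
        "enthalpy",
        "entropy_term",
        "gibbs_free_energy",
        "total_free_energy",
    )

    # Results-table row order: ZPE, H(T), T*S, S, G(T), total free, T.
    TABLE_ORDER = (
        "zero_point_energy",
        "enthalpy",
        "entropy_term",
        "entropy",
        "gibbs_free_energy",
        "total_free_energy",
        "temperature",
    )


    def convert_thermo(raw_eh, temperature):
        """Return thermo values in kJ/mol, plus S in J/mol/K and T in K."""
        result = {"temperature": temperature}
        for key in ENERGY_KEYS:
            result[key] = raw_eh[key] * FACTOR
        # S = T*S * 1000 / T, with T*S already converted to kJ/mol.
        result["entropy"] = result["entropy_term"] * 1000.0 / temperature
        return result

With the new ``.4f`` format, water's ZPE (0.020101 E_h) would display
as ``52.7752``; the old ``.6f`` format applied to the E_h value gave
``0.020101``.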


Not in this drop
================

``tk_*.py``, ``optimization.py``, ``__init__.py``, ``setup.py``,
``energy_parameters.py``, ``optimization_parameters.py``,
``frequencies_parameters.py``, ``xtb.py``, the data files, and the
installer are unchanged.


Sanity check on the conversion
==============================

From the previous job.out (water, GFN2-xTB):

* ZPE was 0.020101 E_h = 52.78 kJ/mol.  Experimental water ZPE is
  ~55-56 kJ/mol, so GFN2-xTB comes in 2-3 kJ/mol (roughly 5%) low,
  which is consistent with what the literature reports for
  tight-binding methods.
* G(T) (the G(RRHO) contribution) was 0.002482 E_h = 6.52 kJ/mol.
  Reasonable for water at 298 K.
* T*S was 0.021400 E_h = 56.19 kJ/mol; entropy = 56.186 * 1000 /
  298.15 = 188.4 J/mol/K.  The experimental gas-phase entropy of
  water at 298.15 K is about 188.8 J/mol/K, so the two agree to
  within 0.5 J/mol/K.  This is a nice sanity check on the parser and
  the conversion.
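
The arithmetic above can be re-run in a few lines.  This is a
stdlib-only sketch: the E_h inputs are the values quoted from the
water job.out, and the factor is built from CODATA 2018 constants
rather than the plug-in's unit registry.

.. code-block:: python

    HARTREE_J = 4.3597447222071e-18  # CODATA 2018, J
    AVOGADRO = 6.02214076e23  # exact, 1/mol
    FACTOR = HARTREE_J * AVOGADRO / 1000.0  # kJ/mol per E_h
    T = 298.15  # K

    zpe = 0.020101 * FACTOR  # kJ/mol
    g_rrho = 0.002482 * FACTOR  # kJ/mol
    ts = 0.021400 * FACTOR  # kJ/mol
    s = ts * 1000.0 / T  # J/mol/K, from the unrounded T*S

    print(f"ZPE  = {zpe:.2f} kJ/mol")
    print(f"G(T) = {g_rrho:.2f} kJ/mol")
    print(f"T*S  = {ts:.2f} kJ/mol")
    print(f"S    = {s:.1f} J/mol/K")

Computing S from the unrounded T*S avoids compounding two roundings
in the last printed digit.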


Test plan
=========

1. ``make lint && make install && make test`` -- should still pass.
2. Re-run the water Energy/Optimization/Frequencies flowchart.
3. The Frequencies summary table should now show ZPE, H(T), T*S, S,
   G(T), and total free energy in kJ/mol (with S in J/mol/K), with
   numbers that look like 52.78, 62.71, 56.19, 188.4, 6.52, and
   -13311.78 (or thereabouts -- the total free energy is roughly
   -5.07 E_h before conversion).
4. The references list should now include the three Caldeweyher
   papers in the *primary* references section, not the secondary
   one.  The primary list for GFN2-xTB+ALPB should hold 6 xtb-related
   references (program, method, ALPB, three D4) plus the three
   automatic ones (SEAMM, RDKit, xtb_step).

Code style
==========

All four files compile cleanly and all lines are <= 88 characters.  The
diffs are localized: one method body changed in ``substep.py``, six
metadata entries updated in ``metadata.py``, six lines changed in
``energy.py`` (the ``level=2`` -> ``level=1`` and the docstring),
and one tuple item added in ``frequencies.py``.