molsystem package#

Submodules#

molsystem.align module#

Alignment methods for configurations

class molsystem.align.AlignMixin[source]#

Bases: object

A mixin for handling the alignment of configurations.

RMSD(other, include_h=True, symmetry=False)[source]#

Compute the RMSD between configurations.

Parameters:
  • other (Configuration or iterable of Configurations) – A single configuration or e.g. list of configurations to match

  • include_h (bool = True) – Whether to include hydrogen atoms in the RMSD

  • symmetry (bool = False) – Whether to detect symmetric flips. Note this requires a lot of memory for larger systems.

Returns:

The RMSD between the current configuration and the target(s).

Return type:

float or [float]
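
For orientation, the quantity returned is a root-mean-square deviation over matched atoms. The sketch below is not the library's implementation: it assumes the atoms of the two structures are already paired in order, and it performs no superposition, hydrogen filtering, or symmetry detection. The function name `rmsd` and the sample coordinates are illustrative only.

```python
import math


def rmsd(xyz_a, xyz_b):
    """Root-mean-square deviation between two equally sized coordinate lists."""
    if len(xyz_a) != len(xyz_b):
        raise ValueError("Both structures must have the same number of atoms")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(xyz_a, xyz_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(xyz_a))


# Two 3-atom structures; the second is the first shifted by 1 along x.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
shifted = [(x + 1.0, y, z) for x, y, z in reference]
print(rmsd(reference, shifted))
```

Because no superposition is done here, a pure translation already gives a non-zero value; an alignment step would remove it.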

align(other, include_h=True, symmetry=False)[source]#

Align configurations.

Parameters:
  • other (Configuration or iterable of Configurations) – A single configuration or e.g. list of configurations to match

  • include_h (bool = True) – Whether to include hydrogen atoms in the alignment and RMSD

  • symmetry (bool = False) – Whether to detect symmetric flips. Note this requires a lot of memory for larger systems.

Returns:

The RMSD between the current configuration and the target(s).

Return type:

float or [float]

molsystem.atoms module#

A dictionary-like object for holding atoms.

molsystem.atoms.grouped(iterable, n)[source]#

s -> (s0,s1,s2,…sn-1), (sn,sn+1,sn+2,…s2n-1), (s2n,…s3n-1), …
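
The signature line above matches a well-known Python idiom for chunking an iterable. A minimal sketch of that idiom follows; it is an assumption that the module uses this exact form.

```python
def grouped(iterable, n):
    """Yield successive n-tuples: (s0, ..., sn-1), (sn, ..., s2n-1), ..."""
    # zip() pulls n items at a time from the *same* iterator object.
    return zip(*[iter(iterable)] * n)


pairs = list(grouped([1, 2, 3, 4, 5, 6], 2))
print(pairs)  # [(1, 2), (3, 4), (5, 6)]
```

With this idiom, trailing items that do not fill a complete group are silently dropped.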

molsystem.bonds module#

A dictionary-like object for holding bonds

Based on tables in an SQLite database.

molsystem.bonds.grouped(iterable, n)[source]#

s -> (s0,s1,s2,…sn-1), (sn,sn+1,sn+2,…s2n-1), (s2n,…s3n-1), …

molsystem.cell module#

class molsystem.cell.Cell(a, b, c, alpha, beta, gamma)[source]#

Bases: object

A class to handle cell parameters and their transformations.

property a#

The length of the first cell vector.

property alpha#

The angle between b and c.

property b#

The length of the second cell vector.

property beta#

The angle between a and c.

property c#

The length of the third cell vector.

equal(other, tol=1e-06)[source]#

Check if we are equal to another iterable to within a tolerance.

Parameters:
  • other (iterable) – The other object to check against

  • tol (float = 1.0e-06) – The tolerance for comparing floating point numbers.

Returns:

equals – Boolean indicating whether the two are equal.

Return type:

bool

from_vectors(vectors)[source]#

Set the cell parameters from the lattice vectors.

Parameters:

vectors ([[float*3]*3]) – The lattice vectors as a list [a, b, c]

property gamma#

The angle between a and b.

property parameters#

The cell parameters as a list.

reciprocal_lengths()[source]#

The lengths of the reciprocal-space lattice vectors, using the physics definition (with the factor of 2π).

Returns:

The 3 vector lengths

Return type:

[float*3]

reciprocal_vectors(as_array=False)[source]#

The reciprocal-space lattice vectors, using the physics definition (with the factor of 2π).

Parameters:

as_array (bool = False) – Whether to return a numpy array or Python lists

Returns:

transform – The reciprocal lattice vectors

Return type:

[N][float*3] or ndarray
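
As an illustration of the physics definition, the reciprocal vectors satisfy a_i · b_j = 2π δ_ij. The following pure-Python sketch computes them from three lattice vectors; the helper names are hypothetical and this is not the method's actual code.

```python
import math


def cross(u, v):
    """Cross product of two 3-vectors."""
    return [
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    ]


def dot(u, v):
    """Dot product of two 3-vectors."""
    return sum(x * y for x, y in zip(u, v))


def reciprocal_vectors(vectors):
    """Reciprocal lattice vectors including the factor of 2*pi."""
    a, b, c = vectors
    volume = dot(a, cross(b, c))
    factor = 2 * math.pi / volume
    return [
        [factor * x for x in cross(b, c)],
        [factor * x for x in cross(c, a)],
        [factor * x for x in cross(a, b)],
    ]


# A cubic cell with edge 2: each reciprocal vector has length 2*pi/2 = pi.
cubic = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 2.0]]
recip = reciprocal_vectors(cubic)
```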

strain(*args)[source]#

Strain the cell.

Parameters:

args (6 * [float] or 6 floats) – The strain in Voigt notation, either a 6-vector or six floats.

to_cartesians(uvw, as_array=False)[source]#

Convert fractional coordinates to Cartesian.

see https://en.wikipedia.org/wiki/Fractional_coordinates for a description.

Parameters:

uvw ([N][3*float] or ndarray) – The fractional coordinates.

Returns:

xyz – The Cartesian coordinates.

Return type:

[N][float*3] or ndarray
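
The conversion described in the linked article can be sketched in a few lines. This example assumes angles in degrees, the a vector along x, and the b vector in the xy plane; the orientation convention that Cell actually uses is not stated here, so treat the helper functions as hypothetical.

```python
import math


def lattice_vectors(a, b, c, alpha, beta, gamma):
    """Lattice vectors from cell parameters (angles in degrees)."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    sg = math.sin(math.radians(gamma))
    cy = c * (ca - cb * cg) / sg
    # The z component follows from requiring that |c| equals c.
    cz = math.sqrt(c**2 - (c * cb) ** 2 - cy**2)
    return [[a, 0.0, 0.0], [b * cg, b * sg, 0.0], [c * cb, cy, cz]]


def to_cartesians(uvw, vectors):
    """Cartesian coordinates as u*a + v*b + w*c for each fractional point."""
    va, vb, vc = vectors
    return [
        [u * va[i] + v * vb[i] + w * vc[i] for i in range(3)]
        for u, v, w in uvw
    ]


# The center of a 3 x 4 x 5 orthorhombic cell.
vectors = lattice_vectors(3.0, 4.0, 5.0, 90.0, 90.0, 90.0)
xyz = to_cartesians([[0.5, 0.5, 0.5]], vectors)
```

The inverse operation, to_fractionals, corresponds to applying the inverse of the 3x3 matrix formed by these vectors.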

to_cartesians_transform(as_array=False)[source]#

Matrix to convert fractional coordinates to Cartesian.

see https://en.wikipedia.org/wiki/Fractional_coordinates for a description.

Parameters:

as_array (bool = False) – Whether to return a numpy array or Python lists

Returns:

transform – The transformation matrix

Return type:

[N][float*3] or ndarray

to_fractionals(xyz, as_array=False)[source]#

Convert Cartesian coordinates to fractional.

see https://en.wikipedia.org/wiki/Fractional_coordinates for a description.

Parameters:

xyz ([N][3*float] or ndarray) – The Cartesian coordinates.

Returns:

uvw – The fractional coordinates.

Return type:

[N][float*3] or ndarray

to_fractionals_transform(as_array=False)[source]#

Matrix to convert Cartesian coordinates to fractional.

see https://en.wikipedia.org/wiki/Fractional_coordinates for a description.

Parameters:

as_array (bool = False) – Whether to return a numpy array or Python lists

Returns:

transform – The transformation matrix

Return type:

[N][float*3] or ndarray

vectors(as_array=False)[source]#

The cell or lattice vectors.

Parameters:

as_array (bool = False) – Whether to return a numpy array or Python lists

Returns:

transform – The lattice vectors

Return type:

[N][float*3] or ndarray

property volume#

The volume of the cell.
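
For reference, the volume of a general (triclinic) cell follows from the six cell parameters by a standard formula. The sketch below assumes angles in degrees and is not taken from the library source.

```python
import math


def cell_volume(a, b, c, alpha, beta, gamma):
    """Volume of a cell from its lengths and angles (angles in degrees)."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1 - ca**2 - cb**2 - cg**2 + 2 * ca * cb * cg)


orthorhombic = cell_volume(3.0, 4.0, 5.0, 90.0, 90.0, 90.0)
hexagonal = cell_volume(1.0, 1.0, 1.0, 90.0, 90.0, 120.0)
```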

molsystem.cell.cos(value)[source]#
molsystem.cell.dot(va, vb)[source]#
molsystem.cell.sin(value)[source]#

molsystem.cif module#

Functions for handling CIF files

Bond Orders#

1    sing    single bond
2    doub    double bond
3    trip    triple bond
4    quad    quadruple bond
5    arom    aromatic bond
6    delo    delocalized double bond
7    pi      pi bond
8    poly    polymeric bond
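
The codes listed above lend themselves to a simple lookup table. The dictionary names below are hypothetical and only illustrate the mapping; the module may store it differently.

```python
# Hypothetical lookup from CIF bond-type code to the integer bond order.
CIF_BOND_ORDERS = {
    "sing": 1,
    "doub": 2,
    "trip": 3,
    "quad": 4,
    "arom": 5,
    "delo": 6,
    "pi": 7,
    "poly": 8,
}

# Reverse lookup, e.g. for writing bonds back out to CIF text.
CIF_BOND_CODES = {order: code for code, order in CIF_BOND_ORDERS.items()}
```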

class molsystem.cif.CIFMixin[source]#

Bases: object

A mixin for handling CIF files.

from_cif_text(text)[source]#

Create this configuration from a CIF file.

Parameters:

text (str) – The text from the CIF file

Return type:

None

from_mmcif_text(text)[source]#

Create this configuration from an mmCIF file.

Parameters:

text (str) – The text from the mmCIF file

Return type:

None

Notes

This can be called from a SystemDB, _System or _Configuration object. The behavior and errors differ depending on what type of object is calling it:

SystemDB

When called from a SystemDB object, a new _System will be created for each datablock in the CIF data, and a configuration will be created to hold the structure, unless there is an NMR ensemble, in which case each structure in the ensemble will be placed in a different configuration.

_System

In this case it is an error if there is more than one datablock. A new configuration will be created with the structure in the CIF datablock, unless the CIF data contains an NMR ensemble, in which case a configuration will be added for each conformer.

_Configuration

It is an error if there is more than one datablock in the CIF data. The configuration will be cleared and the structure from CIF data inserted into it. If there is an NMR ensemble in the datablock, and the representative conformer is identified, it will be loaded into the configuration. Otherwise an error will be raised.

read_cif_file(path)[source]#

Create new systems from a CIF file.

Read a CIF file and create a new system from each datablock in the file.

If the datablock has an ensemble, as denoted by a section ‘_pdbx_nmr_ensemble’, a configuration will be created for each conformer. If there is a representative conformer, the current configuration will point to it; otherwise to the last conformer.

Parameters:

path (str or Path) – A string or Path object pointing to the file to be read.

Returns:

List of systems created.

Return type:

[_System]

to_cif_text()[source]#

Create the text of a CIF file from this configuration.

Returns:

text – The text of the file.

Return type:

str

to_mmcif_text()[source]#

Create the text of a mmCIF file from this configuration.

Returns:

text – The text of the file.

Return type:

str

molsystem.cms_schema module#

Interface to the CMS schema.

class molsystem.cms_schema.CMSSchemaMixin[source]#

Bases: object

A mixin for handling CMS Schema.

to_cms_schema(properties=None)[source]#

Create a dictionary compliant with CMS Schema.

molsystem.column module#

molsystem.configuration module#

molsystem.configuration_properties module#

Property methods for configurations.

molsystem.elements module#

Tabulated data about the elements.

molsystem.elements.masses(atno_or_symbols)[source]#

Get the atomic mass given atomic symbols or numbers.

Parameters:

atno_or_symbols ([int or str]) – The atomic numbers or symbols

Returns:

masses – The atomic masses

Return type:

[float]

molsystem.elements.to_atnos(symbols)[source]#

Convert element symbols to atomic numbers.

Parameters:

symbols ([str]) – The atomic symbols

Returns:

atnos – The corresponding atomic numbers (1..118)

Return type:

[int]

molsystem.elements.to_symbols(atnos)[source]#

Convert atomic numbers to element symbols.

Parameters:

atnos ([int]) – The atomic numbers (1..118)

Returns:

symbols – The corresponding atomic symbols

Return type:

[str]
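
The two conversions are inverses of each other. The following stand-alone sketch shows the idea with a truncated element table; the `_sketch` names are hypothetical stand-ins, and the real functions cover atomic numbers 1..118.

```python
# Only the first ten elements, to keep the illustration short.
SYMBOLS = ["H", "He", "Li", "Be", "B", "C", "N", "O", "F", "Ne"]
ATNOS = {symbol: atno for atno, symbol in enumerate(SYMBOLS, start=1)}


def to_atnos_sketch(symbols):
    """Element symbols -> atomic numbers."""
    return [ATNOS[symbol] for symbol in symbols]


def to_symbols_sketch(atnos):
    """Atomic numbers -> element symbols."""
    return [SYMBOLS[atno - 1] for atno in atnos]


water = to_atnos_sketch(["O", "H", "H"])
print(water)  # [8, 1, 1]
```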

molsystem.frozencolumn module#

molsystem.inchi module#

Functions for handling InChI

class molsystem.inchi.InChIMixin[source]#

Bases: object

A mixin for handling InChI.

from_inchi(inchi, name=None, reorient=True, openbabel=True)[source]#

Create the system from an InChI string.

Parameters:
  • inchi (str) – The InChI string

  • name (str = None) – The name of the molecule

  • reorient (bool = True) – Whether to reorient to the standard orientation

  • openbabel (bool = False) – Whether to use Openbabel rather than default of RDKit

Return type:

None

from_inchikey(inchikey, name=None, reorient=True)[source]#

Create the system from an InChIKey string.

Parameters:
  • inchikey (str) – The InChIKey string

  • name (str = None) – The name of the molecule

  • reorient (bool = True) – Whether to reorient to the standard orientation

Return type:

None

property inchi#

Return the InChI string for this object.

property inchikey#

Return the InChIKey string for this object.

to_inchi(key=False, openbabel=False)[source]#

Create the InChI string from the system.

Parameters:
  • key (bool = False) – Whether to create the InChIKey

  • openbabel (bool = False) – Whether to use OpenBabel rather than default of RDKit

Returns:

The InChI string, or (InChI, name) if the name is requested

Return type:

str

molsystem.molfile module#

Functions for handling MDL molfiles

class molsystem.molfile.MolFileMixin[source]#

Bases: object

A mixin for handling MDL Molfiles.

from_molfile_text(data)[source]#

Create the system from an MDL Molfile, version 3

Parameters:

data (str) – The complete text of the Molfile.

to_molfile_text(title=None, comment='Exported from SEAMM')[source]#

Create the text of the Molfile from the system.

Parameters:
  • title (str = None) – The title for the structure, by default the system name.

  • comment (str = 'Exported from SEAMM') – Comment line

Returns:

text – The text of the file.

Return type:

str

molsystem.openbabel module#

Interface to openbabel.

class molsystem.openbabel.OpenBabelMixin[source]#

Bases: object

A mixin for handling OpenBabel via its Python interface.

coordinates_from_OBMol(ob_mol)[source]#

Update the coordinates from an Open Babel molecule.

coordinates_to_OBMol(ob_mol)[source]#

Update the coordinates of an Open Babel molecule from the configuration.

find_substructures(template)[source]#

Find the substructures matching the template.

Parameters:

template (str, _Configuration, _Template, or _Subset) – The template, which may be a SMARTS string, or a molecular object.

Returns:

Lists of atom ids for matches.

Return type:

[[int]]

from_OBMol(ob_mol, properties='all', atoms=True, coordinates=True, bonds=True)[source]#

Transform an Open Babel molecule into the current object.

Parameters:
  • ob_mol (openbabel.OBMol) – The Open Babel molecule object

  • properties (str = "all") – Whether to include all properties or none

  • atoms (bool = True) – Recreate the atoms

  • coordinates (bool = True) – Update the coordinates

  • bonds (bool = True) – Recreate the bonds from the Open Babel molecule

Return type:

molsystem._Configuration

from_sdf(path)[source]#

Directly read an SDF file for the configuration.

Parameters:

path (pathlib.Path or str) – The path or name of the file to read.

Returns:

system name, configuration name

Return type:

(str, str)

from_sdf_text(text)[source]#

Create the configuration from the text of an SDF file.

Parameters:

text (str) – The text of an SDF file

Returns:

system name, configuration name

Return type:

(str, str)

to_OBMol(properties=None)[source]#

Return an OBMol object for the configuration, template, or subset.

to_sdf(path)[source]#

Directly write an SDF file for the configuration.

Parameters:

path (pathlib.Path or str) – The path or name of the file to write.

Returns:

The text of the SDF file

Return type:

str

to_sdf_text()[source]#

Get the text of an SDF file for the configuration.

Returns:

The text of the SDF file

Return type:

str

molsystem.openbabel.openbabel_version()[source]#

Return the version of openbabel.

molsystem.openeye module#

Interface to OpenEye OEChem.

class molsystem.openeye.OpenEyeMixin[source]#

Bases: object

A mixin for handling OpenEye’s software via its Python interface.

from_OEMol(oe_mol, properties='all', atoms=True, coordinates=True, bonds=True)[source]#

Transform an OpenEye molecule into the current object.

to_OEGraphMol(properties=None)[source]#

Return an OEGraphMol object for the configuration, template, or subset.

molsystem.openeye.check_openeye_license()[source]#
molsystem.openeye.openeye_version()[source]#

The version of the OpenEye OEChem toolkit.

molsystem.pdb module#

Functions for handling PDB files

To Do#

Need to understand more fully the PDB/mmCIF format and how to carry the information about residues, chains, hetero groups, waters, etc. At the moment this is ignoring much of the information, and putting residue, chain, etc. information directly on atoms.

I think we should use templates and subsets, but am not (yet) sure.

Presumably this metadata is most useful for setting up complicated simulations.

File Format#

For complete documentation, see http://www.wwpdb.org/documentation/file-format-content/format33/v3.3.html

Order of records:

RECORD TYPE             EXISTENCE           CONDITIONS IF  OPTIONAL
--------------------------------------------------------------------------------------
HEADER                  Mandatory
OBSLTE                  Optional            Mandatory in  entries that have been
                                            replaced by a newer entry.
TITLE                   Mandatory
SPLIT                   Optional            Mandatory when  large macromolecular
                                            complexes  are split into multiple PDB
                                            entries.
CAVEAT                  Optional            Mandatory when there are outstanding  errors
                                            such  as chirality.
COMPND                  Mandatory
SOURCE                  Mandatory
KEYWDS                  Mandatory
EXPDTA                  Mandatory
NUMMDL                  Optional            Mandatory for  NMR ensemble entries.
MDLTYP                  Optional            Mandatory for  NMR minimized average
                                            Structures or when the entire  polymer
                                            chain contains C alpha or P atoms only.
AUTHOR                  Mandatory
REVDAT                  Mandatory
SPRSDE                  Optional            Mandatory for a replacement entry.
JRNL                    Optional            Mandatory for a publication describes
                                            the experiment.
REMARK 0                Optional            Mandatory for a re-refined structure
REMARK 1                Optional
REMARK 2                Mandatory
REMARK 3                Mandatory
REMARK N                Optional            Mandatory under certain conditions.
DBREF                   Optional            Mandatory for all polymers.
DBREF1/DBREF2           Optional            Mandatory when certain sequence  database
                                            accession  and/or sequence numbering
                                            does  not fit preceding DBREF format.
SEQADV                  Optional            Mandatory if sequence  conflict exists.
SEQRES                  Mandatory           Mandatory if ATOM records exist.
MODRES                  Optional            Mandatory if modified group exists  in the
                                            coordinates.
HET                     Optional            Mandatory if a non-standard group other
                                            than water appears in the coordinates.
HETNAM                  Optional            Mandatory if a non-standard group other
                                            than  water appears in the coordinates.
HETSYN                  Optional
FORMUL                  Optional            Mandatory if a non-standard group or
                                            water appears in the coordinates.
HELIX                   Optional
SHEET                   Optional
SSBOND                  Optional            Mandatory if a  disulfide bond is present.
LINK                    Optional            Mandatory if  non-standard residues appear
                                            in a  polymer
CISPEP                  Optional
SITE                    Optional
CRYST1                  Mandatory
ORIGX1 ORIGX2 ORIGX3    Mandatory
SCALE1 SCALE2 SCALE3    Mandatory
MTRIX1 MTRIX2 MTRIX3    Optional            Mandatory if  the complete asymmetric unit
                                            must  be generated from the given coordinates
                                            using non-crystallographic symmetry.
MODEL                   Optional            Mandatory if more than one model
                                            is  present in the entry.
ATOM                    Optional            Mandatory if standard residues exist.
ANISOU                  Optional
TER                     Optional            Mandatory if ATOM records exist.
HETATM                  Optional            Mandatory if non-standard group exists.
ENDMDL                  Optional            Mandatory if MODEL appears.
CONECT                  Optional            Mandatory if non-standard group appears
                                            and  if LINK or SSBOND records exist.
MASTER                  Mandatory
END                     Mandatory

Description of HETATM records:

COLUMNS       DATA  TYPE     FIELD         DEFINITION
-----------------------------------------------------------------------
 1 - 6        Record name    "HETATM"
 7 - 11       Integer        serial        Atom serial number.
13 - 16       Atom           name          Atom name.
17            Character      altLoc        Alternate location indicator.
18 - 20       Residue name   resName       Residue name.
22            Character      chainID       Chain identifier.
23 - 26       Integer        resSeq        Residue sequence number.
27            AChar          iCode         Code for insertion of residues.
31 - 38       Real(8.3)      x             Orthogonal coordinates for X.
39 - 46       Real(8.3)      y             Orthogonal coordinates for Y.
47 - 54       Real(8.3)      z             Orthogonal coordinates for Z.
55 - 60       Real(6.2)      occupancy     Occupancy.
61 - 66       Real(6.2)      tempFactor    Temperature factor.
77 - 78       LString(2)     element       Element symbol; right-justified.
79 - 80       LString(2)     charge        Charge on the atom.
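
Because the record is fixed-width, it can be read by slicing on the column ranges in the table (1-based and inclusive, so columns M - N become the Python slice [M-1:N]). This is a minimal sketch, not necessarily how PDBMixin parses records; `parse_hetatm` and the sample line are illustrative.

```python
def parse_hetatm(line):
    """Split one HETATM record into its fields using the column table."""
    line = line.ljust(80)  # pad short lines so every slice is valid
    return {
        "record": line[0:6].strip(),
        "serial": int(line[6:11]),
        "name": line[12:16].strip(),
        "altLoc": line[16].strip(),
        "resName": line[17:20].strip(),
        "chainID": line[21].strip(),
        "resSeq": int(line[22:26]),
        "iCode": line[26].strip(),
        "x": float(line[30:38]),
        "y": float(line[38:46]),
        "z": float(line[46:54]),
        "occupancy": float(line[54:60]),
        "tempFactor": float(line[60:66]),
        "element": line[76:78].strip(),
        "charge": line[78:80].strip(),
    }


# Build a sample record field by field so the column widths are explicit.
sample = (
    f"{'HETATM':<6}{1:>5} {'O':^4}{'':1}{'HOH':>3} {'A':1}{101:>4}{'':1}   "
    f"{1.234:8.3f}{-5.678:8.3f}{9.012:8.3f}{1.0:6.2f}{20.5:6.2f}"
    f"{'':10}{'O':>2}{'':2}"
)
atom = parse_hetatm(sample)
```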

Description of CONECT records:

COLUMNS       DATA  TYPE      FIELD        DEFINITION
-------------------------------------------------------------------------
 1 -  6        Record name    "CONECT"
 7 - 11        Integer        serial       Atom  serial number
12 - 16        Integer        serial       Serial number of bonded atom
17 - 21        Integer        serial       Serial  number of bonded atom
22 - 26        Integer        serial       Serial number of bonded atom
27 - 31        Integer        serial       Serial number of bonded atom
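
A CONECT record can be read the same way: one central atom followed by up to four bonded atoms, each in a five-column field. The function below is a hypothetical sketch based only on the column table above.

```python
def parse_conect(line):
    """Return (atom serial, [bonded atom serials]) for one CONECT record."""
    if not line.startswith("CONECT"):
        raise ValueError("Not a CONECT record")
    # Columns 7-11, 12-16, 17-21, 22-26 and 27-31 as zero-based slices.
    fields = [line[start:start + 5] for start in range(6, 31, 5)]
    serials = [int(field) for field in fields if field.strip()]
    return serials[0], serials[1:]


sample = "CONECT" + "".join(f"{serial:>5}" for serial in (1, 2, 3))
central, bonded = parse_conect(sample)
print(central, bonded)  # 1 [2, 3]
```
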
class molsystem.pdb.PDBMixin[source]#

Bases: object

A mixin for handling PDB files.

from_pdb_text(data)[source]#

Create the system from a PDB file.

Parameters:

data (str) – The complete text of the PDB file.

to_pdb_text(title=None, comment='Exported from SEAMM')[source]#

Create the text of the PDB file from the system.

Parameters:
  • title (str = None) – The title for the structure, by default the system name.

  • comment (str = 'Exported from SEAMM') – Comment line

Returns:

text – The text of the file.

Return type:

str

molsystem.properties module#

molsystem.properties.add_properties_from_file(path)[source]#

The standard properties recognized by SEAMM.

These are officially defined properties that can be used anywhere in SEAMM, as long as the type and definition correspond to the standard.

Each property is defined by a string with up to three parts:

<property name>#<code or ‘experiment’>#<technique or model chemistry>

The property name is required. In most cases this is followed by either ‘experiment’ or the name of the code, e.g. ‘MOPAC’, ‘Gaussian’, or ‘VASP’. The final part, if present, is either the experimental technique used to measure the property, or the model chemistry, such as ‘MP2/6-31G**’, ‘PM7’, or a forcefield name such as ‘AMBER/ff19SB’.
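
The naming convention can be illustrated by splitting a full property name on the `#` separator. The helper `split_property_name` is hypothetical and is not part of the module's documented API.

```python
def split_property_name(full_name):
    """Split '<name>#<code>#<model>' into a 3-tuple, padding with None."""
    parts = full_name.split("#")
    if not parts[0] or len(parts) > 3:
        raise ValueError(f"Not a valid property name: {full_name!r}")
    name, code, model = (parts + [None, None])[:3]
    return name, code, model


print(split_property_name("enthalpy of formation#MOPAC#PM7"))
# ('enthalpy of formation', 'MOPAC', 'PM7')
```

A bare name such as "enthalpy of formation" yields None for the code and model chemistry.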

You can create other properties on the fly, but they should follow the above convention and have an appropriate code and, if necessary, model chemistry, so that their full name is unique and does not conflict with any other defined name.

For example, the standard property “enthalpy of formation” refers to the experimental heat of formation, or a calculated value comparable to experimental values. If you are not sure what the heat of formation in e.g. MOPAC is, you could create a new property “enthalpy of formation#MOPAC#<parameterization>”, which is clearly similar to the standard “enthalpy of formation”. If the community decides that it is indeed the same, it can be replaced by the standard form, and also aliased to it for backwards compatibility.

molsystem.pubchem module#

Functions for handling PubChem

class molsystem.pubchem.PubChemMixin[source]#

Bases: object

A mixin for handling the PubChem database.

property PC_cid#

Return the PubChem CID for this structure, or None.

PC_from_cid(cid, fallback=None)[source]#

Create the configuration from the PubChem 3-D structure, if available.

Parameters:
  • cid (int) – The PubChem CID.

  • fallback (str) – A fallback SMILES, InChI, etc. to use if PubChem fails

PC_from_identifier(identifier, namespace='detect', fallback=None)[source]#

Create the configuration from the PubChem 3-D structure, if available.

Parameters:
  • identifier (int or str) – The PubChem identifier

  • namespace (str) – The PubChem namespace: cid, name, smiles, inchi, inchikey

  • fallback (str) – A fallback SMILES, InChI, etc. to use if PubChem fails

PC_iupac_name(fallback=None)[source]#

Return the IUPAC name for this structure, or None.

Parameters:

fallback (str) – A name to return if PubChem doesn’t have a name

molsystem.qcschema module#

Interface to qcschema.

class molsystem.qcschema.QCSchemaMixin[source]#

Bases: object

A mixin for handling QCSchema.

from_qcschema_dict(data)[source]#

Reset the molecule from the QCSchema data.

from_qcschema_json(json_data)[source]#

Reset the molecule from the QCSchema JSON.

to_qcschema_dict(properties=None)[source]#

Create a dictionary compliant with QCSchema.

to_qcschema_json()[source]#

Create the QCSchema JSON for the molecule.

molsystem.rdkit_ module#

Interface to RDKit.

class molsystem.rdkit_.RDKitMixin[source]#

Bases: object

A mixin for handling RDKit via its Python interface.

debug_print()[source]#
from_RDKMol(rdk_mol, properties='all', atoms=True, coordinates=True, bonds=True)[source]#

Transform an RDKit molecule into the current object.

Parameters:
  • rdk_mol (rdkit.chem.molecule) – The RDKit molecule object

  • properties (str = "all") – Whether to include all properties or none

  • atoms (bool = True) – Recreate the atoms

  • coordinates (bool = True) – Update the coordinates

  • bonds (bool = True) – Recreate the bonds from the RDKit molecule

Return type:

molsystem._Configuration

to_RDKMol(properties=None)[source]#

Return an RDKMol object for the configuration, template, or subset.

molsystem.rdkit_.rdkit_version()[source]#

Return the RDKit version.

molsystem.smiles module#

Functions for handling SMILES

class molsystem.smiles.GenSMARTS(mol_object=None)[source]#

Bases: object

A class to generate SMARTS strings for an object.

Parameters:

mol_object (_Configuration, _Template, _Subset)

property mol_object#

The molecular object to work with.

class molsystem.smiles.SMILESMixin[source]#

Bases: object

A mixin for handling SMILES.

property canonical_smiles#

Return the canonical SMILES string for this object.

from_smiles(smiles, name=None, reorient=True, flavor='rdkit')[source]#

Create the system from a SMILES string.

Parameters:
  • smiles (str) – The SMILES string

  • name (str = None) – The name of the molecule

  • reorient (bool = True) – Whether to reorient to the standard orientation

  • rdkit (bool = False) – Whether to use RDKit rather than default of OpenBabel

Return type:

None

property smarts#

Return the SMARTS string for this object.

property smiles#

Return the SMILES string for this object.

to_smarts()[source]#

Generate a SMARTS string for this object.

Returns:

The SMARTS string.

Return type:

str

to_smiles(canonical=False, hydrogens=False, isomeric=True, flavor='rdkit')[source]#

Create the SMILES string from the system.

Parameters:
  • canonical (bool = False) – Whether to create canonical SMILES

  • hydrogens (bool = False) – Whether to keep H’s in the SMILES string.

  • isomeric (bool = True) – Whether to use isomeric SMILES

  • rdkit (bool = False) – Whether to use RDKit rather than default of OpenBabel

Returns:

The SMILES string, or (SMILES, name) if the name is requested

Return type:

str

molsystem.smiles.check_openeye_license()[source]#

molsystem.subset module#

molsystem.subsets module#

A class providing a convenient interface for subsets

molsystem.subsets.grouped(iterable, n)[source]#

s -> (s0,s1,s2,…sn-1), (sn,sn+1,sn+2,…s2n-1), (s2n,…s3n-1), …

molsystem.symmetry module#

molsystem.system module#

A dictionary-like object for holding a system

molsystem.system_db module#

A dictionary-like object for holding a system

class molsystem.system_db.SystemDB(parent=None, logger=<Logger molsystem.system_db (WARNING)>, **kwargs)[source]#

Bases: CIFMixin, MutableMapping

A database of systems for SEAMM.

A class based on a SQLite database for describing molecular and periodic systems.

See the documentation at https://molssi-seamm.github.io/molsystem for a complete description of the database schema, including a diagram.

The key concepts are:

System

The overall container for the configurations, atoms, bonds, etc. A system has one or more configurations (conformers), each of which details the atoms, bonds, etc.

Configuration

The configuration is a single instance of the system, with atoms, coordinates, bonds and subsets. It is close to what most programs consider the molecule or crystal.

There may be different sets of atoms in different configurations of a single system. This supports e.g. grand canonical ensembles.

The set of bonds may also differ between different configurations, whether or not the set of atoms changes. This supports e.g. reactive forcefields.

Each configuration also has its own set of subsets, which are a way of collecting atoms into groups, and can be thought of as a generalization of the chain and residue nomenclature for proteins. A configuration may have any number of subsets, and atoms may be in more than one subset.

Atoms

Each configuration contains a set of atoms, though different configurations may contain different atoms. Each atom is identified by its atomic number and unique name and has coordinates. It may also have other attributes, but that depends on the simulation.

Bonds

Each configuration may also have information on bonds between the atoms. A bond connects two atoms, and has a bond order (single, double, triple, aromatic, …). In periodic systems with an infinite network of bonds, each bond also has a cell offset to identify the relative cells of the two atoms.

Templates and Subsets

Subsets define groups of atoms in a general way. They are defined by a template, which may be nothing more than a type and name, e.g. ‘residue/ala’, or the template may be linked with a system which has atoms, bonds, etc.

Each subset is linked to its group of atoms. For example, the template ‘residue/ala’ mentioned above would have a subset for each alanine in the protein. If the template were linked to a system which was the alanine residue, then each atom would also be connected with the appropriate atom in the template system, so even if the order and names of atoms in the system were different, we could still identify each atom with the corresponding atom in the template system.

The tables that implement this are:

system

The list of all the systems in the database

configuration

The list of all configurations of all systems, labeled by the system they belong to.

atom

The list of atoms in all systems and configurations. The attributes of the atoms do not depend on the configuration, i.e. are unchanging.

bond

The list of all bonds in all systems and configurations, giving the two atoms that are bonded plus the bond order.

coordinates

The fractional or Cartesian coordinates of the atoms as well as any other atomic properties that vary by configuration.

subset

The instances of subsets.

element

The periodic table, used as a foreign key to identify atoms.

template

The templates – labels – for subsets, which may connect the subset to a template system.

configuration_subset

A joining table used to define which subsets are “in” a configuration.

subset_atom

A joining table to define which atoms are in a subset. There is an optional field for the template atom if the template has an associated system. In this case, the template atom identifies which atom in the template system is the same as the given atom.

symmetry

Information about point or space group symmetry.

cell

The information about the periodicity of a configuration, if needed.

atomset

A set of atoms, used with a joining table to connect atoms with configurations.

atomset_atom

The joining table connecting atomsets with their atoms.

bondset

A set of bonds, used with a joining table to connect bonds with configurations.

bondset_bond

The joining table connecting bondsets with their bonds.
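
To make the role of a joining table concrete, here is a small SQLite sketch in which two configurations share one atomset. The column names and the link from configuration to atomset are assumptions made for illustration; the real schema is described in the documentation linked above.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript(
    """
    -- Hypothetical, heavily simplified tables; not the real molsystem schema.
    CREATE TABLE atom (id INTEGER PRIMARY KEY, atno INTEGER);
    CREATE TABLE atomset (id INTEGER PRIMARY KEY);
    CREATE TABLE atomset_atom (
        atomset_id INTEGER REFERENCES atomset(id),
        atom_id INTEGER REFERENCES atom(id)
    );
    CREATE TABLE configuration (
        id INTEGER PRIMARY KEY,
        name TEXT,
        atomset_id INTEGER REFERENCES atomset(id)
    );
    """
)
db.executemany("INSERT INTO atom (id, atno) VALUES (?, ?)", [(1, 8), (2, 1), (3, 1)])
db.execute("INSERT INTO atomset (id) VALUES (1)")
db.executemany(
    "INSERT INTO atomset_atom (atomset_id, atom_id) VALUES (?, ?)",
    [(1, 1), (1, 2), (1, 3)],
)
db.executemany(
    "INSERT INTO configuration (id, name, atomset_id) VALUES (?, ?, ?)",
    [(1, "initial", 1), (2, "optimized", 1)],
)

# Atoms of one configuration, reached through the joining table.
rows = db.execute(
    """
    SELECT atom.id, atom.atno
      FROM configuration
      JOIN atomset_atom ON atomset_atom.atomset_id = configuration.atomset_id
      JOIN atom ON atom.id = atomset_atom.atom_id
     WHERE configuration.name = ?
     ORDER BY atom.id
    """,
    ("optimized",),
).fetchall()
print(rows)  # [(1, 8), (2, 1), (3, 1)]
```

Sharing one atomset between configurations avoids duplicating atom rows when only the coordinates change.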

attach(other)[source]#

Attach another system to this one’s database.

Parameters:

other (SystemDB) – The other SystemDB object containing the database

Returns:

name – The attachment name.

Return type:

str

attached_as(other)[source]#

The attachment name for another system.

Parameters:

other (SystemDB) – The other SystemDB object containing the attached database

Returns:

name – The attachment name.

Return type:

str

attributes(tablename: str)[source]#

The attributes – columns – of a given table.

Parameters:

tablename (str) – The name of the table, optionally including the schema followed by a dot.

Returns:

attributes – A dictionary of dictionaries for the attributes and their descriptors

Return type:

Dict[str, Any]

close()[source]#

Close the database.

property configuration_ids#

The list of configuration ids.

property configurations#

The list of configuration objects.

create_system(name='', make_current=True)[source]#

Add a new system.

Parameters:
  • name (str = None) – A user-friendly name for the system, defaults to no name.

  • make_current (bool = True) – If True, make this the current system.

Returns:

The newly created system.

Return type:

_System

create_table(name, cls=<class 'molsystem.table._Table'>, other=None)[source]#

Create a new table with the given name.

Parameters:
  • name (str) – The name of the new table.

  • cls (Table subclass) – The class of the new table, defaults to _Table

Returns:

table – The new table

Return type:

class Table

property cursor#

A database cursor.

property db#

The database connection.

property db_version#

The version string for the database.

delete_system(system)[source]#

Delete an existing system.

Parameters:

system (int or _System) – The system to delete.

Return type:

None

detach(other)[source]#

Detach an attached system.

Parameters:

other (SystemDB) – The other SystemDB object containing the database

diff(other)[source]#

Differences between this system and another.

property filename#

The name of the file (or URI) for the database.

find_configurations(atomset=None, bondset=None)[source]#

Return the configurations that have the given atomset or bondset.

Parameters:
  • atomset (int = None) – The id of the atomset.

  • bondset (int = None) – The id of the bondset.

Return type:

[_Configuration]

get_configuration(cid)[source]#

Return the specified configuration.

Parameters:

cid (int) – The id of the configuration.

Returns:

The requested configuration.

Return type:

_Configuration

get_configuration_ids(pattern='*')[source]#

Return the configuration ids matching the glob pattern.

Parameters:

pattern (list or str = "*") – The glob-style pattern for matching the configuration names

Returns:

The requested configuration ids

Return type:

[int]
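
Glob-style matching of names can be illustrated with the standard fnmatch module. This is a sketch only: matching_ids is a hypothetical helper, the library may match patterns differently (for example inside SQLite), and treating a list of patterns as "match any of them" is an assumption.

```python
from fnmatch import fnmatchcase


def matching_ids(names_by_id, pattern="*"):
    """Return the ids whose name matches the glob pattern (or any of a list)."""
    patterns = [pattern] if isinstance(pattern, str) else list(pattern)
    return [
        id_
        for id_, name in names_by_id.items()
        if any(fnmatchcase(name, p) for p in patterns)
    ]


# Hypothetical configuration names keyed by id.
configs = {1: "initial", 2: "optimized", 3: "md-step-1", 4: "md-step-2"}

md_ids = matching_ids(configs, "md-*")
start_ids = matching_ids(configs, ["init*", "opt*"])
all_ids = matching_ids(configs)
```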

get_configurations(pattern='*')[source]#

Return the configurations matching the glob pattern.

Parameters:

pattern (list or str = "*") – The glob-style pattern for matching the configuration names

Returns:

The requested configurations

Return type:

[_Configuration]

get_system(id_or_name)[source]#

Get the specified system object.

Parameters:

id_or_name (int or str) – The id (int) or name (str) of the system

Returns:

The requested system.

Return type:

_System

Raises:

ValueError – If the system does not exist, or more than one have the requested name.

get_system_ids(pattern='*')[source]#

Return the system ids matching the glob pattern.

Parameters:

pattern (list or str = "*") – The glob-style pattern for matching the system names

Returns:

The requested system ids

Return type:

[int]

get_systems(pattern='*')[source]#

Return the systems matching the glob pattern.

Parameters:

pattern (list or str = "*") – The glob-style pattern for matching the system names

Returns:

The requested systems

Return type:

[_System]

is_attached(other)[source]#

Return whether another system is attached to this one.

Parameters:

other (SystemDB) – The other SystemDB object containing the database

Returns:

Whether the database is attached.

Return type:

bool

list()[source]#

Return a list of all the tables in the system.

property n_configurations#

The number of configurations in the database.

property n_systems#

The number of systems in the database.

property n_templates#

The number of templates.

property names#

The names of the systems.

property parent#

The parent of this, i.e. a Systems object.

property properties#

The class to handle the properties.

property system#

The current system object.

system_exists(id_or_name)[source]#

See if the given system exists.

Parameters:

id_or_name (int or str) – The id (int) or name (str) of the system

Returns:

Whether it exists.

Return type:

bool

property system_ids#

The list of system ids.

property systems#

The list of system objects.

property templates#

The defined templates.

molsystem.system_properties module#

Property methods for systems.

molsystem.table module#

molsystem.table.grouped(iterable, n)[source]#

s -> (s0,s1,s2,…sn-1), (sn,sn+1,sn+2,…s2n-1), (s2n,…s3n-1), …
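
The grouping shown in the docstring can be reproduced with a common Python idiom. This sketch is not necessarily the library's own implementation; note that with this idiom a trailing group with fewer than n items is dropped.

```python
def grouped(iterable, n):
    """Yield consecutive, non-overlapping n-tuples from iterable."""
    # The same iterator repeated n times: zip pulls n successive items per tuple.
    return zip(*[iter(iterable)] * n)


pairs = list(grouped([1, 2, 3, 4, 5, 6], 2))
triples = list(grouped("abcdefg", 3))  # the trailing "g" is dropped
```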

molsystem.template module#

molsystem.templates module#

A dictionary-like object for holding templates

molsystem.topology module#

Topological methods for the system

class molsystem.topology.TopologyMixin[source]#

Bases: object

A mixin for handling topology in a configuration.

bonded_neighbors(as_indices=False, first_index=0)[source]#

The atoms bonded to each atom in the system.

Parameters:
  • as_indices (bool = False) – Whether to return 0-based indices (True) or atom ids (False)

  • first_index (int = 0) – The smallest index, e.g. 0 or 1

Returns:

neighbors – list of atom ids for each atom id

Return type:

{int: [int]} or [[int]] for indices
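
To make the two return shapes concrete, here is a standalone sketch that builds neighbor lists from a list of bonds. The function below is hypothetical and is not the TopologyMixin method; in particular, how first_index shifts the returned indices is an assumption.

```python
def bonded_neighbors(atom_ids, bonds, as_indices=False, first_index=0):
    """Neighbor lists from (i, j) bonds given as pairs of atom ids."""
    neighbors = {i: [] for i in atom_ids}
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)
    if not as_indices:
        # Dictionary keyed by atom id, with neighbor atom ids as values.
        return {i: sorted(js) for i, js in neighbors.items()}
    # Map atom ids to positions, shifted so the smallest index is first_index.
    index = {atom_id: k + first_index for k, atom_id in enumerate(atom_ids)}
    # List position k describes the atom whose index is k + first_index.
    return [sorted(index[j] for j in neighbors[i]) for i in atom_ids]


atom_ids = [10, 11, 12]          # e.g. O, H, H of a water molecule
bonds = [(10, 11), (10, 12)]

by_id = bonded_neighbors(atom_ids, bonds)
by_index = bonded_neighbors(atom_ids, bonds, as_indices=True)
one_based = bonded_neighbors(atom_ids, bonds, as_indices=True, first_index=1)
```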

create_molecule_subsets()[source]#

Create a subset for each molecule in a configuration.

Returns:

The ids of the subsets, one per molecule.

Return type:

[int]

create_molecule_templates(full_templates=True, create_subsets=True)[source]#

Create a template for each unique molecule in a configuration.

By default also create subsets linking each template to the atoms of the molecules in the system.

Parameters:
  • full_templates (bool = True) – If true, create full templates by creating systems for the molecules.

  • create_subsets (bool = True) – If true, create subsets linking the templates to the molecules.

Returns:

The ids of the templates, or if create_subsets is True a two-element list containing the list of templates and list of subsets.

Return type:

[int] or [[int], [int]]

find_molecules(as_indices=False)[source]#

Find the separate molecules.

Parameters:

as_indices (bool = False) – Whether to return 0-based indices (True) or atom ids (False)

Returns:

molecules – A list of lists of atom ids or indices for the molecules

Return type:

[[int]*n_molecules]
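
Finding separate molecules amounts to finding the connected components of the bond graph. The sketch below shows that general technique on a neighbor dictionary like the one bonded_neighbors returns; it is an illustration with a hypothetical standalone function, not the library's code.

```python
def find_molecules(neighbors):
    """Group atom ids into molecules, i.e. connected components of the bonds."""
    seen = set()
    molecules = []
    for start in sorted(neighbors):
        if start in seen:
            continue
        # Depth-first walk over the bonds, starting from an unvisited atom.
        seen.add(start)
        stack = [start]
        molecule = []
        while stack:
            atom = stack.pop()
            molecule.append(atom)
            for other in neighbors[atom]:
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
        molecules.append(sorted(molecule))
    return molecules


# A water molecule (1-3), a diatomic (4-5), and a lone atom (6).
neighbors = {1: [2, 3], 2: [1], 3: [1], 4: [5], 5: [4], 6: []}
molecules = find_molecules(neighbors)
```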

get_molecule_smiles()[source]#

Return a list of the canonical SMILES for each molecule.

Returns:

The canonical SMILES for each molecule, in order that they are found.

Return type:

[str]

reimage_bonded_atoms(reimage_molecules=True)[source]#

Ensure that the atoms in a molecule are “near” each other.

In a periodic system, atoms can be translated by a unit cell without changing anything. However, the bonds in a molecule need to account for such translations of atoms; otherwise a bond may appear to be very long.

It is often convenient to physically move atoms by the correct cell translations to bring the atoms of a molecule close to each other. That is what this method does, moving all the atoms close to the first. Optionally the molecule is also moved to bring its geometric center into the primary unit cell, i.e. with fractional coordinates in the range [0..1).

Parameters:

reimage_molecules (bool = True) – Whether to move molecules into the primary unit cell.

Returns:

True if the coordinates were changed.

Return type:

bool
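
The idea of moving bonded atoms by whole cell translations can be sketched in fractional coordinates. This is an assumption-laden illustration, not the method's actual algorithm: the hypothetical function reimage_bonded walks the bonds outward from the first atom and shifts each newly reached atom to the periodic image nearest to its bonded neighbor.

```python
def reimage_bonded(fractional, neighbors, first):
    """Shift atoms by whole cells so that bonded atoms sit next to each other."""
    coords = {i: list(xyz) for i, xyz in fractional.items()}
    seen = {first}
    stack = [first]
    while stack:
        i = stack.pop()
        for j in neighbors[i]:
            if j in seen:
                continue
            seen.add(j)
            # Remove the whole number of cells separating j from i on each axis.
            coords[j] = [
                xj - round(xj - xi) for xi, xj in zip(coords[i], coords[j])
            ]
            stack.append(j)
    return coords


# Two bonded atoms on opposite sides of the cell boundary along a.
fractional = {1: (0.95, 0.5, 0.5), 2: (0.05, 0.5, 0.5)}
neighbors = {1: [2], 2: [1]}
moved = reimage_bonded(fractional, neighbors, first=1)
```

After the shift, atom 2 sits at a fractional coordinate of about 1.05 along the first axis, so the bond length is about 0.1 of a cell instead of 0.9.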

reimage_molecules()[source]#

Reimage molecules into the primary unit cell.

The molecules are moved to bring their geometric center into the primary unit cell, i.e. with fractional coordinates in the range [0..1).

Returns:

True if the coordinates were changed.

Return type:

bool
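
Bringing the geometric center into the range [0..1) can be done by subtracting the floor of the center's fractional coordinates from every atom. A minimal sketch, assuming fractional coordinates for the atoms of one molecule; the function name and return shape are hypothetical.

```python
import math


def reimage_molecule(fractional):
    """Translate a molecule by whole cells so its center lies in [0, 1)."""
    n = len(fractional)
    center = [sum(xyz[k] for xyz in fractional) / n for k in range(3)]
    # Whole number of cells to remove along each axis.
    shift = [math.floor(c) for c in center]
    moved = [[x - s for x, s in zip(xyz, shift)] for xyz in fractional]
    changed = any(s != 0 for s in shift)
    return moved, changed


outside = [[1.2, 0.5, -0.3], [1.4, 0.5, -0.1]]   # center at (1.3, 0.5, -0.2)
moved, changed = reimage_molecule(outside)

inside = [[0.9, 0.5, 0.5], [1.05, 0.5, 0.5]]     # center already in the cell
unmoved, unchanged = reimage_molecule(inside)
```

Note that individual atoms may still lie outside the primary cell, as in the second example; only the geometric center is constrained.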

Module contents#

molsystem A general implementation for molecular and periodic systems.