Atom Naming Features¶
This chapter describes options in Grade2 that allow the names (also known as atom IDs) of individual atoms in a ligand molecule to be set.
Where possible Grade2 will reuse atom names from the input file, for
instance for PDB chemical components.
Otherwise, by default, Grade2 names atoms numerically in order, so
if the first two atoms are carbon and oxygen they will be called
C1 and O2.
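The default scheme can be illustrated with a minimal sketch. The function name `default_atom_names` is hypothetical and is not part of Grade2. The sketch only reproduces the rule stated above (element symbol followed by the atom's position in the list) and makes no claim about how Grade2 handles details such as two-letter element symbols.

```python
def default_atom_names(elements):
    """Name atoms by element symbol plus their 1-based position in the list.

    Illustrative only: this mirrors the rule described in the text,
    not Grade2's actual implementation.
    """
    return [f"{element}{index}" for index, element in enumerate(elements, start=1)]


# First two atoms are carbon and oxygen, as in the example in the text.
print(default_atom_names(["C", "O"]))
```

For the two-atom case in the text this gives the names C1 and O2.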
Setting atom IDs for amino acids¶
Typical alpha amino acids with an amino group and a single beta carbon atom¶
By default, Grade2 now recognizes typical amino acids when supplied with an input that lacks atom IDs (also known as atom names), for instance a SMILES string. The exact requirement is that the molecule matches the SMARTS pattern:
[$([NX3H2,NX4H3+])][CX4H]([#6])[CX3](=[OX1])[OX2H,OX1-]
The pattern specifies that the molecule must have either
a neutral NH2 or an NH3+ amino group, followed by
a 4-valent carbon atom with one hydrogen atom and one carbon atom attached,
and then a neutral or charged carboxylic acid. A wider range of amino acids
is recognized when the --aa_loose option is used (see next section).
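The pattern can be tried outside Grade2 with a cheminformatics toolkit. The following is a minimal sketch that assumes the open-source RDKit toolkit is installed. The helper name `matches_typical_amino_acid` is hypothetical, and the sketch does not claim to reproduce how Grade2 itself performs the match.

```python
from rdkit import Chem

# SMARTS pattern quoted above for a typical alpha amino acid.
TYPICAL_AA = Chem.MolFromSmarts(
    "[$([NX3H2,NX4H3+])][CX4H]([#6])[CX3](=[OX1])[OX2H,OX1-]"
)


def matches_typical_amino_acid(smiles):
    """Return True if the molecule contains the typical amino acid pattern."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    return mol.HasSubstructMatch(TYPICAL_AA)


print(matches_typical_amino_acid("C[C@@H](C(=O)O)N"))  # alanine
print(matches_typical_amino_acid("NCC(=O)O"))          # glycine
```

Alanine matches the pattern. Glycine does not, because its alpha carbon carries two hydrogen atoms and has no beta carbon attached.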
If a typical amino acid is recognized then PDB-standard
atom IDs will be set for the main chain and beta carbon
atoms (N CA C O OXT CB) and for the hydrogen
atoms bonded to them. In addition, the ligand's atoms will be reordered
so that the main chain atoms are first in the list. Currently,
side chain atoms are assigned atom IDs using their numerical order
(rather than PDB-style Greek letter remoteness codes CG CD CE etc.).
So using 4-fluoroglutamate from SMILES C(C(F)C(=O)O)[C@@H](C(=O)O)N
as an example, Grade2 will assign atom IDs:
If you prefer that the renaming does not happen, the Grade2 command-line option --no_aa_labels turns it off, leaving the standard atom IDs based on numerical order.
Note that, currently, no alterations are made if the input file specifies atom IDs (for example CIF restraint dictionaries and most MOL2 files).
In addition to setting main chain atom IDs, the output restraint dictionary
will have the CCP4-extension CIF item _chem_comp.group set to peptide.
This enables Grade2 CIF restraint dictionaries to be used in Coot to replace
protein residues with modified amino acids.
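To confirm that a dictionary carries this item, the CIF text can be scanned for `_chem_comp.group`. The sketch below is a deliberately small reader, not a full CIF parser. The function name `chem_comp_group` and the CIF fragment in the usage example are hypothetical, hand-written illustrations rather than actual Grade2 output. It handles the item written either as a tag-value pair or as a column of a `loop_`, with all values on one line.

```python
import shlex


def chem_comp_group(cif_text):
    """Return the value of _chem_comp.group in CIF text, or None if absent."""
    tags = []               # column tags of the most recent loop_
    in_loop_header = False  # True while reading the tag lines of a loop_
    for raw in cif_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("data_"):
            tags, in_loop_header = [], False
            continue
        if line == "loop_":
            tags, in_loop_header = [], True
            continue
        if line.startswith("_"):
            fields = shlex.split(line)
            if in_loop_header and len(fields) == 1:
                tags.append(fields[0])   # another column tag of the loop
                continue
            tags, in_loop_header = [], False
            if fields[0] == "_chem_comp.group" and len(fields) > 1:
                return fields[1]         # simple tag-value pair
            continue
        # Anything else is treated as a row of loop values.
        in_loop_header = False
        if "_chem_comp.group" in tags:
            values = shlex.split(line)
            if len(values) == len(tags):
                return values[tags.index("_chem_comp.group")]
    return None


# Hand-written fragment for illustration only.
example = """
data_comp_LIG
_chem_comp.id LIG
_chem_comp.group peptide
"""
print(chem_comp_group(example))
```

A production check would be better served by a real CIF library, since this reader ignores multi-line values and text fields.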
Please let us know if you would like this feature extended,
for instance to set PDB-style Greek letter remoteness IDs for side chain atoms
beyond CB.
Setting atom IDs for "exotic" amino acids with the --aa_loose option¶
Following a user request, the atom naming feature has been extended to
a wide range of "exotic" amino acids. The extension applies when the
command line option --aa_loose is used. If the option is not used but atom names
could be set, then a warning message is produced in the terminal output,
for instance:
WARNING: The molecule is an "GLY-like alpha amino acid with an amino group", so ....
WARNING: ---- could set conventional amino acid atom IDs. If you want ....
WARNING: ---- this done, then please rerun with the option: --aa_loose
WARNING:
If a molecule is recognized as an amino acid by the --aa_loose option,
the output restraint dictionary will have
the CCP4-extension CIF item _chem_comp.group set to peptide.
Please note that setup of restraints between an "exotic" amino acid
and adjacent monomers is dependent on the program using the restraint dictionary
and that setting atom IDs is not likely to be sufficient to ensure that
correct restraints are used.
The amino acid classes that are currently recognized by --aa_loose
are detailed below. If there is any need for recognition of any
other class of amino acid then please let us know.
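To see which class a given molecule would fall into, the SMARTS patterns quoted in this chapter can be tested in turn. The sketch below assumes the open-source RDKit toolkit is installed. The function name `classify_amino_acid` is hypothetical, and the order in which the patterns are tried (most specific first, first match wins) is an assumption of this sketch, since the text does not state how Grade2 chooses between overlapping patterns.

```python
from rdkit import Chem

# (class name, SMARTS) pairs taken from this chapter.
# Order is an assumption: more specific classes are tried first.
PATTERNS = [
    ("typical alpha amino acid",
     "[$([NX3H2,NX4H3+])][CX4H]([#6])[CX3](=[OX1])[OX2H,OX1-]"),
    ("alpha amino acid with CB and N-modification",
     "[$([NX3])]([#6])[CX4H]([#6])[CX3](=[OX1])[OX2H,OX1-]"),
    ("AIB-like alpha amino acid with an amino group",
     "[$([NX3H2,NX4H3+])][CX4]([#6])([#6])[CX3](=[OX1])[OX2H,OX1-]"),
    ("AIB-like alpha amino acid with N-modification",
     "[$([NX3])]([#6])[CX4]([#6])([#6])[CX3](=[OX1])[OX2H,OX1-]"),
    ("GLY-like alpha amino acid with an amino group",
     "[$([NX3H2,NX4H3+])][CX4][CX3](=[OX1])[OX2H,OX1-]"),
    ("GLY-like alpha amino acid with N-modification",
     "[$([NX3])]([#6])[CX4][CX3](=[OX1])[OX2H,OX1-]"),
    ("beta amino acid",
     "[$([NX3])][#6][#6][CX3](=[OX1])[OX2H,OX1-]"),
]
COMPILED = [(name, Chem.MolFromSmarts(smarts)) for name, smarts in PATTERNS]


def classify_amino_acid(smiles):
    """Return the name of the first matching class, or None if none match."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    for name, pattern in COMPILED:
        if mol.HasSubstructMatch(pattern):
            return name
    return None


# Example SMILES strings used in this chapter.
for smiles in ["C[C@@H](C(=O)O)NCC", "NC(C)(CO)C(O)=O", "FCC(CN)C(=O)O"]:
    print(smiles, "->", classify_amino_acid(smiles))
```

The ordering matters because the looser patterns also match molecules covered by stricter ones. For example, the GLY-like pattern places no restriction on the substituents of the alpha carbon, so it would also match a typical amino acid if it were tried first.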
alpha amino acid with CB and N-modification¶
This pattern allows modification of the nitrogen atom by a single carbon atom. The SMARTS used is:
[$([NX3])]([#6])[CX4H]([#6])[CX3](=[OX1])[OX2H,OX1-]
Atom IDs N CN CA C O OXT CB
will be set. Please note that for
PDB chemical components there is no standard atom name for the carbon
atom attached to the nitrogen, but CN
is used in N-methyl-L-serine
https://www.rcsb.org/ligand/5JP and seems sensible.
For an example, given the SMILES
input C[C@@H](C(=O)O)NCC
the following atom IDs will be set:
AIB-like alpha amino acid with an amino group¶
This pattern matches alpha amino acids with two C beta atoms and an unmodified amino group. The SMARTS used is:
[$([NX3H2,NX4H3+])][CX4]([#6])([#6])[CX3](=[OX1])[OX2H,OX1-]
Atom IDs N CA CB1 CB2 C O OXT
will be set. For an example, given the SMILES
input NC(C)(CO)C(O)=O
the following atom IDs will be set:
AIB-like alpha amino acid with N-modification¶
This pattern matches alpha amino acids with two C beta atoms and a nitrogen modified by a carbon atom. The SMARTS used is:
[$([NX3])]([#6])[CX4]([#6])([#6])[CX3](=[OX1])[OX2H,OX1-]
Atom IDs N CN CA CB1 CB2 C O OXT
will be set. For an example, given the SMILES
input CNC(C)(CO)C(O)=O
the following atom IDs will be set:
GLY-like alpha amino acid with an amino group¶
This pattern matches alpha amino acids that are similar to glycine in that no beta carbon atom is present and the amino nitrogen atom is either a neutral NH2 or an NH3+. The SMARTS used is:
[$([NX3H2,NX4H3+])][CX4][CX3](=[OX1])[OX2H,OX1-]
Atom IDs N CA C O OXT
will be set. For an example, given the SMILES
input F[C@@H](C(=O)O)N
the following atom IDs will be set:
GLY-like alpha amino acid with N-modification¶
This pattern matches alpha amino acids that are similar to glycine but have an N-modification involving a carbon atom. The SMARTS used is:
[$([NX3])]([#6])[CX4][CX3](=[OX1])[OX2H,OX1-]
Atom IDs N CN CA C O OXT
will be set. For an example, given the SMILES
input F[C@@H](C(=O)O)NC
the following atom IDs will be set:
beta amino acid¶
This pattern matches beta amino acids. Please note that, unlike the previous
patterns, the matching is promiscuous allowing matches with N-modification and
modification at both the CA
and CB
atoms.
The SMARTS used is:
[$([NX3])][#6][#6][CX3](=[OX1])[OX2H,OX1-]
Atom IDs N CB CA C O OXT
will be set.
Please note that for
PDB chemical components there is no standard atom name for the extra
main chain carbon atom, but CB
is used in both beta-alanine
https://www.rcsb.org/ligand/BAL and 62H https://www.rcsb.org/ligand/62H .
For an example, given the SMILES
input FCC(CN)C(=O)O
the following atom IDs will be set: