Atom Naming Features

This chapter describes the Grade2 options that control how the names (also known as atom IDs) of individual atoms in a ligand molecule are set.

Where possible, Grade2 reuses atom names from the input file, for instance for PDB chemical components. Otherwise, by default, Grade2 names atoms numerically in order (so if the first two atoms are carbon and oxygen they will be called C1 and O2).
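The default numbering rule can be illustrated with a few lines of Python. This is only a sketch of the rule as stated above, not Grade2's own code: the function name default_atom_names is hypothetical, and the sketch covers only the atoms listed in the input order (how hydrogen atoms are named is not described here).

```python
def default_atom_names(elements):
    """Name each atom as element symbol + 1-based position in the input order."""
    return [f"{symbol}{index}" for index, symbol in enumerate(elements, start=1)]


# First two atoms are carbon and oxygen, as in the example in the text.
print(default_atom_names(["C", "O"]))
```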

Setting atom IDs for amino acids

Typical alpha amino acids with an amino group and a single beta carbon atom

By default, Grade2 now recognizes typical amino acids when supplied with an input that lacks atom IDs (also known as atom names), for instance a SMILES string. The exact requirement is that the molecule matches the SMARTS pattern:

[$([NX3H2,NX4H3+])][CX4H]([#6])[CX3](=[OX1])[OX2H,OX1-]

The pattern specifies that the molecule must have either a neutral NH2 or an NH3+ amino group, followed by a 4-valent carbon atom with one hydrogen atom and one carbon atom attached, and then a neutral or charged carboxylic acid. A wider range of amino acids is recognized when the --aa_loose option is used (see next section).
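To make the pattern easier to read, the sketch below splits it into its six pieces and pairs each with a plain-language reading. The pairing of each piece with a PDB atom ID (in particular, O for the double-bonded oxygen and OXT for the hydroxyl or charged oxygen) is an assumption based on the usual PDB convention. The names PIECES and describe are hypothetical.

```python
# The default pattern, copied from the text above.
TYPICAL_AA_SMARTS = "[$([NX3H2,NX4H3+])][CX4H]([#6])[CX3](=[OX1])[OX2H,OX1-]"

# In SMARTS, X<n> is the total number of connections (implicit hydrogens
# included) and H<n> is the total hydrogen count (H alone means exactly one).
# ASSUMPTION: the atom ID attached to each piece follows PDB convention.
PIECES = [
    ("[$([NX3H2,NX4H3+])]", "N", "neutral NH2 or charged NH3+ amino nitrogen"),
    ("[CX4H]", "CA", "4-connected carbon carrying exactly one hydrogen"),
    ("([#6])", "CB", "branch: any carbon atom bonded to CA"),
    ("[CX3]", "C", "3-connected carboxyl carbon"),
    ("(=[OX1])", "O", "branch: double-bonded oxygen with one connection"),
    ("[OX2H,OX1-]", "OXT", "hydroxyl oxygen (OH) or charged O-"),
]


def describe(pieces):
    """Format one aligned line per SMARTS piece."""
    return [f"{atom_id:<3} {smarts:<20} {meaning}" for smarts, atom_id, meaning in pieces]


# The pieces must concatenate back to the full pattern.
assert "".join(smarts for smarts, _, _ in PIECES) == TYPICAL_AA_SMARTS

for line in describe(PIECES):
    print(line)
```

Testing an actual molecule against the pattern requires a cheminformatics toolkit with SMARTS support; the sketch only documents how the pattern is put together.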

If a typical amino acid is recognized then the PDB-standard atom IDs (N CA C O OXT CB) will be set for the main chain and beta carbon atoms and for the hydrogen atoms that they are bonded to. In addition, the ligand's atoms will be reordered so that the main chain atoms are first in the list. Currently, side chain atoms are assigned atom IDs using their numerical order (rather than PDB-style Greek letter remoteness codes CG CD CE etc). So using 4-fluoroglutamate from SMILES C(C(F)C(=O)O)[C@@H](C(=O)O)N as an example, Grade2 will assign atom IDs:

Grade2 atom labels for fluoroglutamate C(C(F)C(=O)O)[C@@H](C(=O)O)N
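The relabelling and reordering described above can be sketched as follows. This is an illustration under stated assumptions, not Grade2's implementation: the function relabel is hypothetical, the mapping of matched atoms is written by hand (in practice it would come from the SMARTS match), hydrogen atoms are left out, and the sketch assumes that each remaining side chain atom is numbered by its position in the reordered list. The labels it prints are therefore the sketch's output and may differ from the IDs Grade2 actually assigns.

```python
MAIN_CHAIN_ORDER = ["N", "CA", "C", "O", "OXT", "CB"]


def relabel(elements, matched):
    """Reorder and relabel heavy atoms.

    elements: element symbols in input order.
    matched:  dict mapping 0-based input index -> PDB-standard atom ID.
    Returns (atom ID, original index) pairs in the new order.
    """
    index_of = {atom_id: idx for idx, atom_id in matched.items()}
    # Matched main chain and beta carbon atoms go first.
    new_order = [index_of[atom_id] for atom_id in MAIN_CHAIN_ORDER]
    # Remaining atoms keep their relative input order.
    new_order += [i for i in range(len(elements)) if i not in matched]
    labelled = []
    for position, idx in enumerate(new_order, start=1):
        # ASSUMPTION: unmatched atoms are numbered by position in the new order.
        atom_id = matched.get(idx, f"{elements[idx]}{position}")
        labelled.append((atom_id, idx))
    return labelled


# Heavy atoms of C(C(F)C(=O)O)[C@@H](C(=O)O)N in SMILES order.
elements = ["C", "C", "F", "C", "O", "O", "C", "C", "O", "O", "N"]
# Hand-written match: input index -> atom ID.
matched = {10: "N", 6: "CA", 7: "C", 8: "O", 9: "OXT", 0: "CB"}

print([atom_id for atom_id, _ in relabel(elements, matched)])
```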

If you prefer that the renaming does not happen, the Grade2 command-line option --no_aa_labels turns it off, leaving the standard numerical-order atom IDs.

Note that, currently, no alterations are made if the input file specifies atom IDs (for example CIF restraint dictionaries and most MOL2 files).

In addition to setting main chain atom IDs, Grade2 writes the CCP4-extension CIF item _chem_comp.group with the value peptide in the output restraint dictionary. This enables Grade2 CIF restraint dictionaries to be used in Coot to replace protein residues with modified amino acids.
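One way to confirm that a dictionary carries this item is to read it back from the CIF text. The sketch below uses only the Python standard library and handles two simple layouts: a "tag value" pair on one line, and a loop whose first data row fits on one line. The function name chem_comp_group is hypothetical, and the sample text is a made-up fragment for illustration, not real Grade2 output. A full CIF parser would be more robust.

```python
import shlex


def chem_comp_group(cif_text):
    """Return the value of _chem_comp.group, or None if the item is absent."""
    lines = [ln.strip() for ln in cif_text.splitlines()]
    lines = [ln for ln in lines if ln and not ln.startswith("#")]
    for i, line in enumerate(lines):
        if line.split()[0] != "_chem_comp.group":
            continue
        tokens = shlex.split(line)
        if len(tokens) > 1:
            return tokens[1]  # "tag value" form on one line
        # Loop form: locate the block of tag lines this tag belongs to.
        start = i
        while start > 0 and lines[start - 1].startswith("_"):
            start -= 1
        end = i
        while end + 1 < len(lines) and lines[end + 1].startswith("_"):
            end += 1
        if end + 1 < len(lines):
            row = shlex.split(lines[end + 1])  # first data row only
            if i - start < len(row):
                return row[i - start]
    return None


# HYPOTHETICAL dictionary fragment, not real Grade2 output.
SAMPLE = """\
data_comp_list
loop_
_chem_comp.id
_chem_comp.name
_chem_comp.group
LIG "4-fluoroglutamate" peptide
"""

print(chem_comp_group(SAMPLE))
```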

Please let us know if you would like this feature extended, for instance to set PDB-style Greek letter remoteness IDs for side chain atoms beyond CB.

Setting atom IDs for "exotic" amino acids with the --aa_loose option

Following a user request, the atom naming feature has been extended to a wide range of "exotic" amino acids when the command-line option --aa_loose is used. If the option is not used but atom names could be set, then a warning message is produced in the terminal output, for instance:

WARNING: The molecule is an "GLY-like alpha amino acid with an amino group", so ....
WARNING: ---- could set conventional amino acid atom IDs. If you want ....
WARNING: ---- this done, then please rerun with the option: --aa_loose
WARNING:

If a molecule is recognized as an amino acid by the --aa_loose option, the output restraint dictionary will have the CCP4-extension CIF item _chem_comp.group set to peptide. Please note that the setup of restraints between an "exotic" amino acid and adjacent monomers depends on the program using the restraint dictionary, and that setting atom IDs is unlikely to be sufficient to ensure that correct restraints are used.

The amino acid classes that are currently recognized by --aa_loose are detailed below. If there is any need for recognition of any other class of amino acid then please let us know.

alpha amino acid with CB and N-modification

This pattern allows modification of the nitrogen atom by a single carbon atom. The SMARTS used is:

[$([NX3])]([#6])[CX4H]([#6])[CX3](=[OX1])[OX2H,OX1-]

Atom IDs N CN CA C O OXT CB will be set. Please note that PDB chemical components have no standard atom name for the carbon atom attached to the nitrogen, but CN is used in N-methyl-L-serine (https://www.rcsb.org/ligand/5JP) and seems sensible.

For example, given the SMILES input C[C@@H](C(=O)O)NCC the following atom IDs will be set:

Grade2 atom labels for n-ethyl-alanine C[C@@H](C(=O)O)NCC

AIB-like alpha amino acid with an amino group

This pattern matches alpha amino acids with two C beta atoms and an unmodified amino group. The SMARTS used is:

[$([NX3H2,NX4H3+])][CX4]([#6])([#6])[CX3](=[OX1])[OX2H,OX1-]

Atom IDs N CA CB1 CB2 C O OXT will be set. For example, given the SMILES input NC(C)(CO)C(O)=O the following atom IDs will be set:

Grade2 atom labels for alpha_methyl_serine NC(C)(CO)C(O)=O

AIB-like alpha amino acid with N-modification

This pattern matches alpha amino acids with two C beta atoms and a nitrogen modified by a carbon atom. The SMARTS used is:

[$([NX3])]([#6])[CX4]([#6])([#6])[CX3](=[OX1])[OX2H,OX1-]

Atom IDs N CN CA CB1 CB2 C O OXT will be set. For example, given the SMILES input CNC(C)(CO)C(O)=O the following atom IDs will be set:

Grade2 atom labels for n_methyl_alpha_methyl_serine CNC(C)(CO)C(O)=O

GLY-like alpha amino acid with an amino group

This pattern matches alpha amino acids that are similar to glycine in that no beta carbon atom is present and the amino nitrogen atom is either a neutral NH2 or an NH3+. The SMARTS used is:

[$([NX3H2,NX4H3+])][CX4][CX3](=[OX1])[OX2H,OX1-]

Atom IDs N CA C O OXT will be set. For example, given the SMILES input F[C@@H](C(=O)O)N the following atom IDs will be set:

Grade2 atom labels for fluoroglycine F[C@@H](C(=O)O)N

GLY-like alpha amino acid with N-modification

This pattern matches alpha amino acids that are similar to glycine but have an N-modification involving a carbon atom. The SMARTS used is:

[$([NX3])]([#6])[CX4][CX3](=[OX1])[OX2H,OX1-]

Atom IDs N CN CA C O OXT will be set. For example, given the SMILES input F[C@@H](C(=O)O)NC the following atom IDs will be set:

Grade2 atom labels for n_methyl_fluoroglycine F[C@@H](C(=O)O)NC

beta amino acid

This pattern matches beta amino acids. Please note that, unlike the previous patterns, the matching is promiscuous, allowing matches with N-modification and with modification at both the CA and CB atoms.

The SMARTS used is:

[$([NX3])][#6][#6][CX3](=[OX1])[OX2H,OX1-]

Atom IDs N CB CA C O OXT will be set. Please note that PDB chemical components have no standard atom name for the extra main chain carbon atom, but CB is used in both beta-alanine (https://www.rcsb.org/ligand/BAL) and 62H (https://www.rcsb.org/ligand/62H). For example, given the SMILES input FCC(CN)C(=O)O the following atom IDs will be set:

Grade2 atom labels for beta_fluoromethylalanine FCC(CN)C(=O)O
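Taken together, the classes described in this chapter and the options that control them can be summarised in a small lookup table. The sketch below copies the atom IDs from the text; the function atom_ids_to_set and its parameters are hypothetical and only model the behaviour as documented here. How --no_aa_labels and --aa_loose interact when both are given is not stated in this chapter, so the sketch assumes that --no_aa_labels wins.

```python
TYPICAL = "typical alpha amino acid"

# Atom IDs per class, copied from the text of this chapter.
ATOM_IDS = {
    TYPICAL: ["N", "CA", "C", "O", "OXT", "CB"],
    "alpha amino acid with CB and N-modification": ["N", "CN", "CA", "C", "O", "OXT", "CB"],
    "AIB-like alpha amino acid with an amino group": ["N", "CA", "CB1", "CB2", "C", "O", "OXT"],
    "AIB-like alpha amino acid with N-modification": ["N", "CN", "CA", "CB1", "CB2", "C", "O", "OXT"],
    "GLY-like alpha amino acid with an amino group": ["N", "CA", "C", "O", "OXT"],
    "GLY-like alpha amino acid with N-modification": ["N", "CN", "CA", "C", "O", "OXT"],
    "beta amino acid": ["N", "CB", "CA", "C", "O", "OXT"],
}


def atom_ids_to_set(aa_class, aa_loose=False, no_aa_labels=False, input_has_atom_ids=False):
    """Return the conventional IDs that would be set, or None if names are left alone."""
    # Inputs that already carry atom IDs are not altered.
    # ASSUMPTION: --no_aa_labels takes precedence over --aa_loose.
    if input_has_atom_ids or no_aa_labels or aa_class not in ATOM_IDS:
        return None
    if aa_class != TYPICAL and not aa_loose:
        # Grade2 prints a warning suggesting --aa_loose in this situation.
        return None
    return ATOM_IDS[aa_class]


print(atom_ids_to_set("beta amino acid", aa_loose=True))
```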