Usage

Before you run Grade2, make sure that you have followed the Configuration instructions and tested that Grade2 works properly.

Running the grade2 command

To run Grade2 you need to specify the molecule that you want to create a restraint dictionary for. There are currently 4 alternative input options:

1. Molecule input from SMILES string.

SMILES (Simplified Molecular-Input Line-Entry System) provides a way to describe a molecular structure as an ASCII string (see https://en.wikipedia.org/wiki/Simplified_molecular-input_line-entry_system). To generate a restraint dictionary for a given SMILES string, simply run grade2 on the command line followed by the SMILES string surrounded by single quotes: grade2 'SMILES', for example:

$ grade2 'CN1C=NC2=C1C(=O)N(C(=O)N2C)C'

Please note that the dollar symbol $ above represents the command prompt. This will run grade2 producing output like the following:

$ grade2 'CN1C=NC2=C1C(=O)N(C(=O)N2C)C'
 set CSDHOME=/home/software/xtal/CCDC/CSDS/2021.3/CSD_2022 from $BDG_TOOL_MOGUL=/home/software/xtal/CCDC/CSDS/2021.3/CSD_2022/bin/mogul
 ############################################################################
 ##   [grade2] ligand restraint dictionary generation
 ############################################################################

      Copyright (C) 2019-2022 by Global Phasing Limited

                All rights reserved.

                This software is proprietary to and embodies the confidential
                technology of Global Phasing Limited (GPhL). Possession, use,
                duplication or dissemination of the software is authorised
                only pursuant to a valid written licence from GPhL.

   Version:   1.1.0 <2022-02-01>
   Authors:   Smart OS, Sharff A, Holstein J, Womack TO,  Flensburg C,
              Keller P, Paciorek W, Vonrhein C and Bricogne G

 -----------------------------------------------------------------------------

 RDKit generated molecule and coordinates from input SMILES: CN1C=NC2=C1C(=O)N(C(=O)N2C)C
 CHECK: Check the molecule's InChiKey against known PDB components:
 CHECK: Exact match to PDB chemical component(s):
 CHECK:   CFF https://www.rcsb.org/ligand/CFF "caffeine"
 Minimization with MMFF94s reduces energy from -104.82 to -123.49 kcal/mol
 Using CCDC Mogul-like geometry analysis.
 Mogul version 2021.3.0, CSD version 543, csd-python-api 3.0.9
 Mogul Data Libraries: as543be_ASER
 Geometry Optimize coordinates against restraints using gelly ....
 ---- gelly: Took 42 steps, reducing the rms gradient to 0.04
 ---- gelly: and the rms bond deviation to 0.004 Angstroms.
 Have written CIF-format restraint dictionary to:   LIG.restraints.cif
 Have written ideal coordinates to PDB-format file: LIG.xyz.pdb
 Have written ideal coordinates to SDF-format file: LIG.xyz.sdf
 Have written ideal coordinates in  MOL2-format to: LIG.xyz.mol2
 Have written schematic 2D diagram SVG-format file: LIG.diagram.svg
 Have written 2D diagram & atom_id labels to file:  LIG.diagram.atom_labels.svg
 Suggestion: to view/edit the restraints, use one of the commands:
     coot -p LIG.xyz.pdb --dict LIG.restraints.cif
     EditREFMAC LIG.restraints.cif LIG.xyz.pdb LIG
 Normal termination (6 secs)
  • As you can see, before the restraint dictionary is produced a CHECK is made to see whether the ligand has already been defined in the wwPDB Chemical Component Dictionary https://www.wwpdb.org/data/ccd, which describes residues and small molecules in PDB entries. In this case the SMILES string is for caffeine, and it would be sensible to use a restraint dictionary for CFF so that the atom names agree with the existing definition https://www.rcsb.org/ligand/CFF. See the next subsection.

  • Note that the CIF-format restraint dictionary is written to file LIG.restraints.cif and has the default PDB chemical component id (aka residue name or 3-letter code) of LIG. To set the 3-letter code use the command-line option --resname.

  • As well as the CIF-format restraint dictionary, grade2 will write "ideal" coordinates based on the restraints to PDB, SDF and MOL2 formats. For more details see the coordinates files section.

  • Molecular diagrams are also produced; for more details see the schematic 2D molecular diagrams section.

  • Finally, suggestions are given on how to view the coordinates and restraints produced using Coot or EditREFMAC (supplied with BUSTER).

  • Note that if you do not want the coordinate or molecular diagram output files then the --just_cif option can be used.

2. PDB chemical component definition

To generate a restraint dictionary for an existing PDB ligand it is best to use the --PDB_ligand option. For instance, to generate a restraint dictionary for caffeine CFF run:

$ grade2 --PDB_ligand CFF

or using the equivalent short option -P

$ grade2 -P CFF

This will produce output using the wwPDB Chemical Component Dictionary (CCD) compound record for caffeine CFF (see https://www.rcsb.org/ligand/CFF for an overview). Grade2 will download the wwPDB CCD CIF file for the compound from either PDBeChem: https://www.ebi.ac.uk/pdbe-srv/pdbechem/ or from Ligand Expo: http://ligand-expo.rcsb.org/. The output restraint dictionary will be called CFF.restraints.cif and other files will be named CFF.*, see the Grade2 outputs chapter.

If the --PDB_ligand option is used then the atom names will agree with the wwPDB CCD definition for the compound. This has the advantage that if you deposit the final structure to the PDB the compound's atoms will not be renamed.

3. Input molecule file

The third input option is to use a file to specify the input molecule. The command-line option --in should be used to specify the input filename. For instance, to generate a restraint dictionary for the SDF file ligand_35.sdf with the 3-letter code L35 run:

$ grade2 --in ligand_35.sdf --resname L35

or using the equivalent short options -i and -r

$ grade2 -i ligand_35.sdf -r L35

The output restraint dictionary will be L35.restraints.cif and other files will be named L35.*, see the Grade2 outputs chapter.

Normally, the format of the input file is detected from the filename extension (for example .sdf). If necessary the command-line option --itype can be used to specify the input format.
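The extension-based part of this detection can be pictured with a small shell sketch. The function name guess_itype is hypothetical and not part of Grade2; the values it prints are simply the documented --itype choices. It looks only at lower-case filename extensions, whereas Grade2 itself also inspects the file contents.

```shell
# Hypothetical helper (not part of Grade2): map a filename extension to one
# of the documented --itype values. Returns non-zero for unknown extensions.
guess_itype() {
    case "$1" in
        *.sdf)  printf 'sdf\n' ;;
        *.mol)  printf 'mol\n' ;;
        *.mol2) printf 'mol2\n' ;;
        *.smi)  printf 'smi\n' ;;
        *.cif)  printf 'cif\n' ;;
        *)      return 1 ;;
    esac
}

# Report what the extension implies for two example filenames.
for f in ligand_35.sdf ligand_35.txt; do
    if t=$(guess_itype "$f"); then
        printf '%s: extension implies --itype %s\n' "$f" "$t"
    else
        printf '%s: unrecognised extension, specify --itype explicitly\n' "$f"
    fi
done
```

If the helper fails for a file that you know to be, say, in SDF format, pass --itype sdf together with --in.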

Currently, Grade2 supports the following input formats:

Grade2 Input Molecular File Formats

File format: mol/sdf
Normal extension: .mol or .sdf
Notes: The MDL Molfile and SDF file formats provide a good exchange format for molecules between applications and databases. As the format lacks atom names, these will be generated by Grade2. Please note that if an SDF file contains multiple molecules, only the first molecule will be processed by Grade2.

File format: Tripos MOL2
Normal extension: .mol2
Notes: The MOL2 format has the advantage of representing bond orders, atom IDs (names) and Cartesian coordinates. On the other hand, the MOL2 format definition is ambiguous and the format is not supported by RDKit. Grade2 uses the CSD Python API to read (and write) MOL2 files and so can handle MOL2 files produced by CSD programs. The CSD convention for MOL2 files is to use the partial charge field to store the formal charge of an atom. Other programs, such as Open Babel, use the MOL2 partial charge field to store partial charges, and the atomic formal charge information is lost. For MOL2 files with partial charges, Grade2 now attempts to reconstruct the atomic formal charges from valency considerations. If the reconstruction process fails, it is possible to manually edit in the correct formal charges; please see the FAQ Editing MOL2 file of a charged molecule with atomic partial charges.

File format: SMILES
Normal extension: .smi
Notes: Please note that if the SMILES file contains multiple molecules, only the first molecule will be processed. If the SMILES file has a name field then this will be used for the name of the ligand, unless the command-line option --name is specified. It is often easier to specify a SMILES input string directly on the command line rather than to use a SMILES file.

File format: restraint dictionary CIF
Normal extension: .cif
Notes: CIF stands for Crystallographic Information File. It should be noted that CIF format can be used for many types of data (for instance macromolecular coordinates or reflection data). Grade2 uses CIF format for its principal output, the restraint dictionary file (see Outputs chapter), and this can also be used as an input file. Grade2 can read CIF-format restraint dictionaries written by Grade2 itself, eLBOW, AceDrg and Grade.

As Grade CIF restraint dictionaries lack atom formal charge (_chem_comp_atom.charge) records, these are set to zero when the restraint dictionary is read, and care must be taken as this may cause the output molecule to be incorrect. Please see the FAQ How can I use Grade2 to generate a restraint dictionary with atom names consistent with an existing Grade dictionary? for more detail.

File format: wwPDB CCD CIF
Normal extension: .cif
Notes: CIF files for existing PDB ligands defined in the wwPDB Chemical Component Dictionary can be obtained either from PDBeChem: https://www.ebi.ac.uk/pdbe-srv/pdbechem/ or from Ligand Expo: http://ligand-expo.rcsb.org/. Note that it is normally easier to get Grade2 to retrieve the wwPDB CCD CIF information directly using the --PDB_ligand option. Downloading the CCD CIF file and using the --in option is useful if there are firewall issues preventing script downloads.
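Because only the first molecule of a multi-molecule SDF file is processed, it can be worth checking an input file before running Grade2. The following is a minimal sketch that relies on the SDF convention that each record ends with a line starting with $$$$. The function names count_sdf_molecules and warn_if_multi are hypothetical and not part of Grade2.

```shell
# Hypothetical helpers (not part of Grade2).
# Count SDF records: each record is terminated by a line starting with "$$$$".
# A plain single-molecule .mol file has no such line, so the count is 0.
count_sdf_molecules() {
    awk '/^\$\$\$\$/ { n++ } END { print n + 0 }' "$1"
}

# Print a warning on stderr if the file holds more than one record.
warn_if_multi() {
    n=$(count_sdf_molecules "$1")
    if [ "$n" -gt 1 ]; then
        printf '%s: %s molecules found, only the first will be processed\n' "$1" "$n" >&2
    fi
}

# Example use, only attempted if the example file is present.
if [ -f ligand_35.sdf ]; then
    warn_if_multi ligand_35.sdf
fi
```

To process the other molecules, split the file into one SDF file per record with a tool of your choice and run grade2 on each file separately.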

4. The --lookup option

The --lookup option provides a mechanism whereby an external script is invoked to look up details of a ligand from a database. To use your own script, set the environment variable BDG_GRADE2_LIGAND_LOOKUP to the location of the script. Please see https://gitlab.com/gphl/grade2_lookup_scripts for example scripts written in different languages and a description of what your script needs to do.

By default, if BDG_GRADE2_LIGAND_LOOKUP is not set, grade2 --lookup CID uses a script that downloads ligand details from PubChem https://pubchem.ncbi.nlm.nih.gov/ using the CID, the PubChem compound identifier. For example, running

$ grade2 --lookup 123

will download details of the drug Tiformin from PubChem using its CID 123 (see https://pubchem.ncbi.nlm.nih.gov/compound/123 for the Tiformin PubChem entry). Running the command with the --just_cif option added produces output like the following:

$ grade2 --lookup 123 --just_cif
 ############################################################################
 ##   [grade2] ligand restraint dictionary generation
 ############################################################################

      Copyright (C) 2019-2022 by Global Phasing Limited

                All rights reserved.

                This software is proprietary to and embodies the confidential
                technology of Global Phasing Limited (GPhL). Possession, use,
                duplication or dissemination of the software is authorised
                only pursuant to a valid written licence from GPhL.

   Version:   1.3.0 <2022-10-??>
   Authors:   Smart OS, Sharff A, Holstein J, Womack TO, Flensburg C,
              Keller P, Paciorek W, Vonrhein C and Bricogne G

 -----------------------------------------------------------------------------

 Lookup option --lookup "123"
 ---- Database: "PubChem"
 ---- Information: https://pubchem.ncbi.nlm.nih.gov/compound/123
 ---- Molecule name: "Tiformin"
 Systematic name set to "4-(diaminomethylideneamino)butanamide"
 RDKit generated molecule and coordinates from input SMILES: C(CC(=O)N)CN=C(N)N
 CHECK: Check the molecule's InChiKey against known PDB components:
 CHECK: The input molecule does not match any existing PDB chemical component (up to 2022-08-26).
 For help on checks against known PDB components, , see:  ....
 ---- https://gphl.gitlab.io/grade2_docs/faqs.html#checkpdbmatch
 Minimization with MMFF94s reduces energy from -118.68 to -162.18 kcal/mol
 Using CCDC Mogul-like geometry analysis.
 Mogul version 2021.2.0, CSD version 542, csd-python-api 3.0.8
 Mogul Data Libraries: as542be_ASER, Feb21_ASER, May21_ASER, Sep21_ASER
 Geometry Optimize coordinates against restraints using gelly ....
 ---- gelly: Took 249 steps, reducing the rms gradient to 0.05
 ---- gelly: and the rms bond deviation to 0.002 Angstroms.
 Have written CIF-format restraint dictionary to:   CID_123.restraints.cif
 Normal termination (4 secs)

Note that the SMILES string C(CC(=O)N)CN=C(N)N downloaded from PubChem is used as a starting point for the molecule. A CIF-format restraint dictionary is output to the file CID_123.restraints.cif, and this will include information about the molecule's name, its systematic (IUPAC) name and the PubChem information page.
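Before relying on --lookup it can help to confirm which lookup mechanism will be in effect. The sketch below only inspects the BDG_GRADE2_LIGAND_LOOKUP environment variable described above. The function name lookup_backend is hypothetical, and two points are assumptions rather than documented behaviour: that an empty value counts as unset, and that a custom script must be an executable file.

```shell
# Hypothetical helper (not part of Grade2): describe which lookup mechanism
# "grade2 --lookup ID" would be expected to use in the current environment.
lookup_backend() {
    if [ -z "${BDG_GRADE2_LIGAND_LOOKUP:-}" ]; then
        # Variable unset (or empty, by assumption): default PubChem script.
        printf 'default PubChem lookup\n'
    elif [ -x "$BDG_GRADE2_LIGAND_LOOKUP" ]; then
        printf 'custom script: %s\n' "$BDG_GRADE2_LIGAND_LOOKUP"
    else
        # Assumption: a script that is missing or not executable cannot run.
        printf 'custom script not executable: %s\n' "$BDG_GRADE2_LIGAND_LOOKUP"
    fi
}

lookup_backend
```

What a custom script has to read and write is described at https://gitlab.com/gphl/grade2_lookup_scripts and is not modelled here.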

Command-line arguments for grade2

Please note that most grade2 command-line arguments have a long version, for instance --just_cif, and a short version, for instance -j (see --just_cif). The long version can be abbreviated when this creates no ambiguity.
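What "no ambiguity" means can be illustrated with a minimal sketch of unique-prefix matching. This is not Grade2's own argument parser: the function resolve_abbrev is hypothetical, the option list is simply compiled from the long options described in this chapter, and treating an exact match as decisive is an assumption.

```shell
# Long options described in this chapter, without the leading "--".
LONG_OPTS="help checkdeps versions PDB_ligand in lookup resname out ocif
force_overwrite just_cif shelx no_charging ecloud chirality_both big_planes
4_atom_planes name systematic pubchem_names group database_id no_extra itype
rcsb no_aa_labels aa_loose debug"

# Hypothetical helper: print the full option for an abbreviation, or fail
# with a message on stderr if the abbreviation is unknown or ambiguous.
resolve_abbrev() {
    prefix=${1#--}
    matches=""
    count=0
    for opt in $LONG_OPTS; do
        case "$opt" in
            "$prefix")  printf '%s\n' "--$opt"; return 0 ;;  # exact match
            "$prefix"*) matches="$matches --$opt"; count=$((count + 1)) ;;
        esac
    done
    if [ "$count" -eq 1 ]; then
        printf '%s\n' "${matches# }"
        return 0
    fi
    if [ "$count" -eq 0 ]; then
        printf 'unknown option: --%s\n' "$prefix" >&2
    else
        printf 'ambiguous: --%s could be%s\n' "$prefix" "$matches" >&2
    fi
    return 1
}

resolve_abbrev --database    # prints --database_id
resolve_abbrev --no || true  # ambiguous: three options start with "no"
```

In scripts that need to keep working across Grade2 releases, spelling out the full option name avoids an abbreviation becoming ambiguous if new options are added.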

Help & setup command-line arguments

-h, --help

The --help option will write out a help message listing all the command-line arguments. Please note that help on each option is deliberately brief and more detail can be found in this chapter.

-checkdeps, --checkdeps

-checkdeps is a special option that checks that the external tool (CSD) that grade2 needs is accessible and works properly. It is useful for setting up grade2 and for a quick test that the program works on a particular host. Please see the Installation section of this document for more details.

-V, --versions

--versions writes out version numbers of the program and Python/Data libraries used. Please use this option when reporting bugs.

Molecule input arguments

You must specify exactly one molecular input argument, so if you provide a SMILES string you cannot also provide an input CIF file.
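A wrapper script that assembles grade2 commands can enforce this rule itself before calling the program. The following is a minimal sketch with a hypothetical function name, exactly_one_input. How Grade2 itself reports a violation of the rule is not shown here.

```shell
# Hypothetical helper (not part of Grade2): succeed only if exactly one of
# the four arguments is non-empty. The arguments stand for, in order: a
# SMILES string, a PDB component id (-P), an input filename (-i) and a
# lookup ID (-L).
exactly_one_input() {
    n=0
    for v in "${1:-}" "${2:-}" "${3:-}" "${4:-}"; do
        if [ -n "$v" ]; then
            n=$((n + 1))
        fi
    done
    [ "$n" -eq 1 ]
}

# A PDB component id alone is acceptable.
if exactly_one_input "" "CFF" "" ""; then
    printf 'one input given: OK\n'
fi

# A SMILES string together with an input file is not.
if ! exactly_one_input 'CN1C=NC2=C1C(=O)N(C(=O)N2C)C' "" "ligand_35.sdf" ""; then
    printf 'more than one input given: rejected\n'
fi
```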

'SMILES'

SMILES string input. The SMILES string should be given in single quotes to avoid shell mangling, for instance:

$ grade2 'C(=O)O'

Please see the section above for more details.

-P PDB_ID, --PDB_ligand PDB_ID

Downloads information for the given PDB chemical component id (also known as the residue name or 3-letter code) from PDBe or RCSB PDB. Please see the section above for more details.

-i IN_FILE, --in IN_FILE

Use the filename IN_FILE for the input molecule. Please see the section above for more details, including supported file formats.

-L ID, --lookup ID

Use an external script to look up the molecule with ID in an external database. Please see the section above and https://gitlab.com/gphl/grade2_lookup_scripts for more details.

Optional command-line arguments

-r PDB_ID, --resname PDB_ID

The --resname option sets the output PDB chemical component id (aka residue name or 3-letter code) to the string specified by PDB_ID. Note that using --resname will normally alter the output filenames. The default PDB_ID code is LIG unless the code is available from the input (for instance, if the -P PDB_ID, --PDB_ligand PDB_ID option has been used).

Please see the FAQ What are the Grade2/BUSTER restrictions on residue name? for more information.

-o OUT_ROOT, --out OUT_ROOT

Output files produced will have filenames starting with this string. The actual filenames will be formed of the specified OUT_ROOT with an appropriate extension (see the Grade2 outputs chapter for more details), for instance the restraint dictionary CIF file will be called OUT_ROOT.restraints.cif.

If --out is not specified, by default output filenames will start with LIG., where LIG is the PDB_ID that can be set by the --resname or --PDB_ligand options.

-ocif OUT_CIF, --ocif OUT_CIF

The --ocif OUT_CIF option sets the full filename for the CIF restraint dictionary to the user-specified string OUT_CIF. This option can be used to exactly control the filename for the restraint dictionary including its file type. For instance, using --ocif ../ligand_ABC.dic will result in the restraint dictionary being written to a file ligand_ABC.dic in the directory above the current working directory.

Please note that the --ocif option overrides the -o/--out option. Furthermore, the --ocif option has no effect on the filename for other output files (if any). Consequently, it is recommended that it is used with the --just_cif option.
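The precedence between --ocif, --out and the PDB_ID described above can be summarised in a short sketch. The function restraints_cif_name is hypothetical and covers only the rules stated in this section. Inputs that choose their own default root, such as --lookup (CID_123 in the example output earlier), are not modelled.

```shell
# Hypothetical helper (not part of Grade2): predict the restraint dictionary
# filename. Arguments (any may be empty):
#   $1  value given to --ocif
#   $2  value given to --out
#   $3  PDB_ID set by --resname or --PDB_ligand
restraints_cif_name() {
    ocif=${1:-}
    out_root=${2:-}
    pdb_id=${3:-}
    if [ -n "$ocif" ]; then
        # --ocif sets the full filename and overrides --out.
        printf '%s\n' "$ocif"
    elif [ -n "$out_root" ]; then
        printf '%s.restraints.cif\n' "$out_root"
    else
        # Without --out the root is the PDB_ID, which defaults to LIG.
        printf '%s.restraints.cif\n' "${pdb_id:-LIG}"
    fi
}

restraints_cif_name "" "" ""                         # LIG.restraints.cif
restraints_cif_name "" "" "L35"                      # L35.restraints.cif
restraints_cif_name "" "run1/lig35" "L35"            # run1/lig35.restraints.cif
restraints_cif_name "../ligand_ABC.dic" "run1" "L35" # ../ligand_ABC.dic
```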

-f, --force_overwrite

By default grade2 will not overwrite existing files, instead exiting with an error message. Use the --force_overwrite option (or the -f short option) to force overwriting existing files.
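To find out in advance whether a run would stop because of existing files, a script can look for the standard output files first. The sketch below uses the six filename suffixes seen in the example output at the start of this chapter and assumes that the same suffixes are appended to any other output root. The function existing_outputs is hypothetical, and the additional files written by --shelx are not covered.

```shell
# Hypothetical helper (not part of Grade2): list those standard output files
# for the given output root that already exist.
existing_outputs() {
    root=$1
    for ext in restraints.cif xyz.pdb xyz.sdf xyz.mol2 \
               diagram.svg diagram.atom_labels.svg; do
        if [ -e "$root.$ext" ]; then
            printf '%s\n' "$root.$ext"
        fi
    done
    return 0
}

# Report on the default output root LIG in the current directory.
found=$(existing_outputs LIG)
if [ -n "$found" ]; then
    printf 'These files exist, so --force_overwrite would be needed:\n%s\n' "$found"
else
    printf 'No existing LIG.* output files found.\n'
fi
```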

-j, --just_cif

By default grade2 writes a number of output files (see the Grade2 outputs chapter). The --just_cif option will cause grade2 to write only the CIF-format restraint dictionary. It turns off the production of all other (PDB, SDF, MOL2 & SVG) files.

-s, --shelx

Produce SHELX restraint .dfix format output files. If --shelx is specified, two additional output files will be created with the extensions .dfix and .with_hydrogen.dfix. The former file has restraints excluding those to hydrogen atoms.

-N, --no_charging

Use the --no_charging option to turn off the standard charging scheme that modifies groups likely to be charged at pH 7. For instance, the standard charging scheme alters a neutral carboxylic acid to a carboxylate ion and a neutral phosphoric acid to a phosphate ion; for more detail see the Charging chapter.

It should be noted that the --no_charging option leaves the input molecule unchanged. So if the input molecule has a charged group then this will NOT be altered by the --no_charging option. If you want to model a ligand with a protonation state that is distinct from the standard charging scheme then use manual editing with Mercury as demonstrated by the FAQ How can I produce restraints for a ligand with a different protonation state or tautomer?.

-e, --ecloud

The --ecloud option now specifies that the ideal xyz coordinates will use the electron-cloud distances for bonds to hydrogen atoms rather than nuclear distances.

It should be noted that in the first public release 1.0.0 of Grade2 the --ecloud option specified that bond restraints to hydrogen atoms be set to electron-cloud distances, which are adequate for X-ray refinement. From release 1.1.0, Grade2 produces CIF restraint dictionaries containing both electron-cloud and nucleus X-H bond restraints, avoiding the requirement of separate restraint dictionaries for the two use cases. The --ecloud option is retained with the narrower effect on just the ideal xyz coordinates.

-c, --chirality_both

Use the --chirality_both option if you are not certain of the chiral configuration of the input molecule. The --chirality_both option sets the volume of all chiral restraints identified to "both" to allow for cases of ambiguous stereochemistry.

Note that the --chirality_both flag is not needed if starting from a non-stereo SMILES as restraints will then automatically be set to "both".

-b, --big_planes

Produce large fused planes that overemphasize ring planarity.

For a full description please see the Treatment of Planar Groups chapter.

-4, --4_atom_planes

Instead of creating a single plane restraint for each flat 5/6-atom ring, produce 5 or 6 separate four-atom planes around that ring. In practice, using this option has little effect on refinement results. The --4_atom_planes option is included for testing, as separate four-atom plane restraints are used both by Grade and by the first Grade2 release 1.0.0.

-n, --name NAME

The full name of the ligand can be set using the --name option. Ideally, the full name should be human-readable, for example, "retinoic acid". The name will be shown in buster-report output. You should use quotation marks if the full name contains a space, for example:

$ grade2 'Ic1ccccc1C(=O)[O-]' --name '2-iodobenzoic acid'

By default, the full name will be set to the InChIKey for the molecule, unless a name is already known, for instance for PDB ligands.

--systematic IUPAC_NAME [PROGRAM] [PROGRAM_VERSION]

The --systematic option allows the systematic (IUPAC) name of the molecule to be specified. The systematic name provided will be included in the output CIF restraint dictionary using the _pdbx_chem_comp_identifier data category. It is optional to specify the name and version of the program used to find the systematic name.

For example, specifying --systematic "2-acetyloxy-4-iodobenzoic acid" specifies just the systematic name, without recording the program details. Note the use of the double quotation marks, as the systematic name contains a space. To record the program used and its version, simply add them after the systematic name. For example, --systematic "2-acetyloxy-4-iodobenzoic acid" ACD/Name v2021 will result in the following CIF records in the output restraint dictionary:

_pdbx_chem_comp_identifier.comp_id               LIG
_pdbx_chem_comp_identifier.type                  "SYSTEMATIC NAME"
_pdbx_chem_comp_identifier.program               ACD/Name
_pdbx_chem_comp_identifier.program_version       v2021
_pdbx_chem_comp_identifier.identifier            "2-acetyloxy-4-iodobenzoic acid"

--pubchem_names

The --pubchem_names option performs an online search for the ligand in the PubChem database https://pubchem.ncbi.nlm.nih.gov/. If the option is activated and the molecule is found then the PubChem title is used for the full name of the ligand and the systematic name is set to the PubChem IUPAC name. The PubChemPy package is used to make most of the lookups.

The online search involves uploading the SMILES string of the molecule to PubChem. For this reason, the --pubchem_names option should not be used for confidential ligands. To be extra careful, by default the --pubchem_names option is deactivated until the environment variable BDG_GRADE2_PUBCHEM_NAMES_ON_ACCEPT_SMILES_TO_WEB is set. If the option is specified without activation then Grade2 will terminate with an error message.

To activate the --pubchem_names option, if you are a bash, ksh or dash shell user:

$ export BDG_GRADE2_PUBCHEM_NAMES_ON_ACCEPT_SMILES_TO_WEB="yes"

But if you are a csh or tcsh shell user:

$ setenv BDG_GRADE2_PUBCHEM_NAMES_ON_ACCEPT_SMILES_TO_WEB "yes"

If you are happy for --pubchem_names to be permanently enabled, BUSTER provides a convenient configuration mechanism to achieve this for all users of an installation. Please see the BUSTER Configure section of the BUSTER installation documentation.
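A script that calls grade2 for many ligands can test for the activation variable and add --pubchem_names only when it is present, rather than having Grade2 terminate with an error. The following is a minimal sketch. The function pubchem_names_enabled is hypothetical, and it only checks that the variable is set, since the values that Grade2 accepts other than "yes" are not described here.

```shell
# Hypothetical helper (not part of Grade2): succeed if the activation
# variable is set in this shell. Note that the variable must also be
# exported (as in the export/setenv commands above) for grade2 to see it.
pubchem_names_enabled() {
    [ -n "${BDG_GRADE2_PUBCHEM_NAMES_ON_ACCEPT_SMILES_TO_WEB+set}" ]
}

# Only add the option when uploading SMILES strings has been accepted.
extra_opts=""
if pubchem_names_enabled; then
    extra_opts="--pubchem_names"
fi
printf 'extra grade2 options: %s\n' "${extra_opts:-none}"
```

The resulting $extra_opts can then be appended to the grade2 command line for ligands that are not confidential.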

--group GROUP

Set the CCP4-extension CIF item _chem_comp.group to GROUP. This item is used by CCP4 programs, like Coot, when producing restraints to link monomers together. Grade2 automatically sets the _chem_comp.group to peptide for amino acids both for PDB chemical components and while Setting atom IDs for amino acids. The item is also automatically set for PDB chemical components that are saccharides (to pyranose or furanose).

The --group option can be used to manually set _chem_comp.group to any value. If the option is used it overrides any automatically set value. Please note that for monomers to be connected properly it will also be necessary to set appropriate atom IDs.

-d, --database_id ID [DB_NAME] [URL] [DETAILS]

Set a corporate or database ID for the molecule and optionally other details for the molecule. The ID should be the database identifier for the molecule, for example: 2083 (for PubChem) or DB01001 (for DrugBank).

One or more additional optional arguments DB_NAME, URL and DETAILS can also be given (separated by spaces). DB_NAME should be the name of the database (for example, PubChem or DrugBank). The URL should be a URL of a page giving details of the ligand at the database (for instance, https://pubchem.ncbi.nlm.nih.gov/compound/2083). DETAILS can be used for any other information (for example, "Corporate Compound Database - internal access only").

The ID will be shown in buster-report output. Future reporting tools will display all the information.

As an example, when producing a restraint dictionary for the PDB component VIA, information about the DrugBank entry for Sildenafil from https://go.drugbank.com/drugs/DB00203 can be added:

$ grade2 --PDB_ligand VIA --database DB00203 DrugBank https://go.drugbank.com/drugs/DB00203

Note how grade2 options can be abbreviated when there is no ambiguity with other options. The information provided will be included in the output restraint CIF dictionary in the gphl_chem_comp_database CIF data category:

loop_
_gphl_chem_comp_database.comp_id
_gphl_chem_comp_database.id
_gphl_chem_comp_database.database
_gphl_chem_comp_database.url
_gphl_chem_comp_database.details
VIA     VIA      PDB                                   https://www.rcsb.org/ligand/VIA "RCSB PDB"
VIA     VIA      PDB https://www.ebi.ac.uk/pdbe-srv/pdbechem/chemicalCompound/show/VIA       PDBe
VIA DB00203 DrugBank                             https://go.drugbank.com/drugs/DB00203          .

For more information please see the section Database Information in output CIF Restraint Dictionary.

Please note that if you want to add information about more database entries then further --database_id options can be specified. For instance, to add information about the Wikipedia page:

$ grade2 --PDB_ligand VIA --database DB00203 DrugBank https://go.drugbank.com/drugs/DB00203 \
  --database . Wikipedia https://en.wikipedia.org/wiki/Sildenafil

-X, --no_extra

By default the output restraint dictionary CIF file will have many extra Grade2-specific items, for instance giving the source of restraint values. Use the --no_extra option to turn off the extra Grade2-specific items.

--itype {cif,sdf,mol,mol2,smi}

Format for the --in input file, selected from allowed list. By default, the format is detected from the filename extension and file contents (please see the section above for more details).

--rcsb

For the --PDB_ligand option, download first from the RCSB site https://files.rcsb.org/ligands/ rather than from PDBeChem.

--no_aa_labels

This option turns off recognizing amino acids and setting atom IDs to N CA C O OXT CB. Please see Setting atom IDs for amino acids for more details.

--aa_loose

Extends setting atom IDs to "exotic" amino acids, such as N-modified and beta amino acids. Please see Setting atom IDs for "exotic" amino acids for more details.

--debug

The --debug option turns on debug-level terminal output. The STDOUT output written by Grade2 will then include a large number of lines starting DEBUG:. These are not intended to be intelligible by end users but instead are useful to the program developers. You should only use the --debug option if reporting problems with Grade2.