Frequently Asked Questions¶

Please check the online version of this FAQs page: https://gphl.gitlab.io/grade2_docs/faqs.html as this is updated as new questions come in.

Please also see:

the BUSTER frequently-asked questions, and
the Grade2 Known issues page.

General FAQs¶

How can I check that Grade2 is correctly installed and works properly?¶

To check that Grade2 is correctly installed and configured, run:
grade2 -checkdeps
If this results in a final line starting with SUCCESS, then Grade2 has been successfully configured with access to a working CSD software installation. If there is a problem, please see the Configuration Instructions.

To test that all the components used by Grade2 work as expected on your system, run the command:
grade2_tests
grade2_tests will run over 300 unit, functional and integration tests written as part of the test-driven development used for coding Grade2. Please see the Testing section for more details.
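
The SUCCESS check described above can also be scripted. The following is a minimal sketch, not part of Grade2: the helper name last_line_is_success is hypothetical, and it assumes only what is stated above, namely that a working setup ends the grade2 -checkdeps output with a line starting with SUCCESS. Standard error is merged into the output as a precaution, since this page does not say which stream the message is written to.

```shell
# Hypothetical helper (not part of Grade2): succeed only if the final
# line of the text read from standard input starts with SUCCESS.
last_line_is_success() {
    tail -n 1 | grep -q '^SUCCESS'
}

# Only attempt the check when grade2 is actually on the PATH.
if command -v grade2 >/dev/null 2>&1; then
    if grade2 -checkdeps 2>&1 | last_line_is_success; then
        echo "Grade2 dependency check passed"
    else
        echo "Grade2 dependency check failed: see the Configuration Instructions"
    fi
fi
```

A check like this may be handy in a login script or a cluster job prologue, so that a broken CSD configuration is noticed before a long run is started.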

How can I run Grade2 without a CSD installation?¶

Grade2 uses the Mogul and CSD Python API tools from the CCDC. Because of this, Grade2 requires an installation of the CSD-Core package to work. For details on how to obtain CSD-Core please see:

https://www.ccdc.cam.ac.uk/solutions/csd-core/

If you cannot get access to CSD-Core then you can run Grade2 on your non-confidential ligand using the Grade Web Server:

http://grade.globalphasing.org/

Why must I enter my name and email address to use the Grade Web Server?¶

We ask you to enter your name and email address so that we can contact you if an issue arises when Grade2 is run with your molecule. We do not routinely contact people and do not store email addresses in the long term. For further details please see the Grade Web Server Conditions of Use and Privacy Policy: http://grade.globalphasing.org/grade_server/conditions.html .

Does Grade2 work with the latest update of CSD?¶

Because the CSD and BUSTER installations are separate, incompatibilities can arise. Before you update your CSD installation, please check https://gphl.gitlab.io/grade2_docs/csd_compatibility.html to make sure there is not a problem.

Grade2 terminates with a "CSDNotFoundException", what should I do?¶

This FAQ relates to a problem introduced in CSD release 2023.1. Grade2 may terminate with an error message ending with a line that contains CSDNotFoundException, for instance:
ccdc.utilities.CSDNotFoundException: CSD Data is not available
          in this installation. Cannot load CSD data from None
This indicates that the CSD Python API cannot find the CSD data directory. This can arise in two main ways:

The user has never run any CSD programs from the 2023.1 release. This can happen if CSD 2023.1 was installed by another user (e.g., software). The problem should be cleared by running a CSD program, for instance by starting the Mogul program (see below).

The user is working in a heterogeneous environment with multiple workstations accessing common directories, particularly one involving different operating systems, where the real path to the CSD data directory may differ between machines.

CSD release 2023.1 altered the mechanism that the CSD Python API uses to find the directory containing the CSD data. Previously the $CSDHOME environment variable was used. Now a configuration file CSD.ini is used to store the CSD data location. For Linux and macOS the file is normally located at ~/.config/CCDC/CSD.ini but it can also be provided in an OS-dependent "system-wide location for all users" (see section 12.7 of the "CSD Portfolio Release notes").

The user-space CSD.ini file is normally initialized during the CSD installation process or reinitialized by running a CSD-dependent program. In a heterogeneous environment with distinct installations of CSD on different machines, this can result in conflicts, with CSD.ini providing invalid information about the location of the CSD data.

If you get a CSDNotFoundException running Grade2, then it is worth seeing whether it clears by running Mogul. This can either be done interactively or on the command line by running Mogul with an empty instruction file:
$ touch empty
$ mogul -ins empty
If this does not work, then please contact us at buster-develop@GlobalPhasing.com and we will try to help. Each user of Grade2 is likely to have to clear the problem separately.
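
Before contacting us, it may help to confirm whether a user-space CSD.ini is present at all. The sketch below is only an illustration: the helper name csd_ini_status is hypothetical, the default path is the Linux/macOS location quoted above, and the check only tests that the file exists and is non-empty. It does not validate the data location stored inside it.

```shell
# Hypothetical helper: report whether a per-user CSD.ini exists and is
# non-empty. With no argument it checks the Linux/macOS default location.
csd_ini_status() {
    ini="${1:-$HOME/.config/CCDC/CSD.ini}"
    if [ -s "$ini" ]; then
        echo "found: $ini"
    else
        echo "missing: $ini"
        return 1
    fi
}

# Example (commented out, needs a CSD installation):
#   csd_ini_status || { touch empty; mogul -ins empty; }
```

If the file is reported as missing, running Mogul once as described above should normally recreate it for the current user.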

Grade2 terminates with a message about incompatible Python versions: what should I do?¶

Grade2 uses the CSD Python API to access CSD data and analysis routines. Grade2 loads the CSD Python API at runtime from a separately installed CSD. This procedure requires that the Python versions used by Grade2 and the CSD Python API are compatible. If incompatible versions are detected, Grade2 will terminate with a message like the following:
ERROR:
ERROR: Grade2 is running using Python version "3.9"
ERROR: but the CSD Python API miniconda has Python version is "3.7" from:
ERROR:
ERROR:     /Volumes/SmartDisk/CSD_2021/Python_API_2021/miniconda/lib/python3.7/
ERROR:
ERROR: These are incompatible and cannot work together.
ERROR:
ERROR: To solve this problem, either:
ERROR:
ERROR: 1. Update to the latest versions of CSD and BUSTER, or:
ERROR:
ERROR: 2. Set the environment variable BDG_GRADE2_PYTHON_VERSION to '3.7'
ERROR:     bash/ksh/zsh users should:
ERROR:        export BDG_GRADE2_PYTHON_VERSION="3.7"
ERROR:     csh users should:
ERROR:        setenv BDG_GRADE2_PYTHON_VERSION "3.7"
ERROR:    and rerun Grade2
This indicates that a recent release of Grade2 (version 1.4.1 or above) has been run with an older version of CSD, one that predates version 2023.2.0 and was released prior to July 2023. As described in the message, there are two ways to get around the problem, either:

Update CSD to the latest version, or:

Set the environment variable BDG_GRADE2_PYTHON_VERSION to 3.7 as described in the section "Using Grade2 release 1.4.1 (and following) with old CSD releases".

How do I make a suggestion for a new feature in Grade2?¶

We really like suggestions for improvements to Grade2. Please send an e-mail to buster-develop@GlobalPhasing.com saying what you would like.

I have a problem with Grade2: what should I do?¶

To help us keep track of user support requests, the issues they raise and the responses we supply, it would be really helpful if you could follow the guidelines below.

If you have a problem with Grade2 please:
First check the online Known issues page:

https://gphl.gitlab.io/grade2_docs/issues.html

as it may already be a known problem with a solution or workaround.

Then check the online version of this FAQs page:

https://gphl.gitlab.io/grade2_docs/faqs.html

as this is updated frequently as new questions come in.
Please send an e-mail to buster-develop@GlobalPhasing.com describing the problem with as much detail as possible.

Please make sure you include the following information in the e-mail:
A clear description of the issue, what kind of input was used and from where it originated (for instance, if a MOL2 file was used what program wrote it).

A descriptive subject for the e-mail. For instance, "Grade2 crashes for ligands containing boron" is much better than "Restraints Problem".

The terminal output of Grade2 where the problem is encountered.
The outputs of the commands:
grade2 -checkdeps
grade2_tests
The operating system you are running, and its version.
It would also help us greatly if you could:

Report separate problems one at a time rather than raising multiple issues in a single e-mail. Separate e-mails make it much easier to tackle and respond fully to each of the problems.

Avoid reporting a new, unrelated problem in a reply to an e-mail about a previous issue. Replies are automatically linked to the original report, which makes things confusing when a report arrives with the subject "Re: old issue with Grade2" but is actually about something entirely different. It is much better to send a new e-mail describing the new problem, with a relevant subject.

The article "9 Best Practices for Software Bug Reporting" gives some really useful further advice.

How to cite Grade2?¶

The main citation for Grade2 is:

Smart, O.S., Sharff, A., Holstein, J., Womack, T.O., Flensburg, C., Keller, P., Paciorek, W., Vonrhein, C. and Bricogne, G. (2021) Grade2 version 1.5.0. Cambridge, United Kingdom: Global Phasing Ltd.

The Grade2 version number is reported when Grade2 is run and can be found from any Grade2 output CIF restraint dictionary by using grep. For example:
$ grep grade2_version LIG.restraints.cif
_gphl_chem_comp_info.grade2_version                   1.0.0
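
When only the version number is wanted, for instance to paste into a citation, the value can be picked out on its own. This is a small sketch using a hypothetical helper name, grade2_version_of, and it assumes the tag and value sit on one line as in the grep output above.

```shell
# Hypothetical helper: print only the Grade2 version recorded in a
# restraint dictionary, taken from the line shown in the grep example.
grade2_version_of() {
    awk '$1 == "_gphl_chem_comp_info.grade2_version" { print $2; exit }' "$1"
}

# Example (commented out, needs a dictionary written by Grade2):
#   grade2_version_of LIG.restraints.cif
```

For the dictionary in the example above this would print 1.0.0.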

Security: does Grade2 upload any ligand information to public servers?¶

Grade2 does not upload any information about a ligand to public servers, unless you activate and then choose to use the --pubchem_names option.

The --PDB_ligand ID option retrieves information from wwPDB sites about the existing PDB chemical component specified by ID, its three-letter code. Similarly, the --lookup ID option uses a script to retrieve information for a molecule with ID from a public or internal chemical database, depending on the script used. Both of these options retrieve information about pre-existing molecules.

During a Grade2 run a check for related PDB components is made. This check is entirely local with no use of any external services. The procedure finds the input molecule's InChIKey using RDKit routines. This InChIKey is compared to a precalculated list of InChIKeys for all PDB components that is distributed as part of Grade2. The list of InChIKeys is up to date at the time of the Grade2 release and can be found in one of the following files (depending on the operating system):
$BDG_home/.mc/linux64/lib/python3.7/site-packages/pdbccdinchikeys/data/PDBCCD_id_status_date_inchikey_name.csv

$BDG_home/.mc/darwin/lib/python3.7/site-packages/pdbccdinchikeys/data/PDBCCD_id_status_date_inchikey_name.csv
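
Because the list is a plain CSV file, a known InChIKey can be looked up by hand without any network access. The following sketch is an illustration only and is not how Grade2 itself performs the check: the helper name lookup_inchikey is hypothetical, and it simply searches for the key as a fixed string so that no assumption is needed about the column layout of the file.

```shell
# Hypothetical helper: print the rows of the distributed InChIKey list
# that contain the given key. Returns non-zero when there is no match.
# Usage: lookup_inchikey INCHIKEY CSV_FILE
lookup_inchikey() {
    grep -F -- "$1" "$2"
}

# Example (commented out, needs a BUSTER installation; KEY is an InChIKey):
#   lookup_inchikey "$KEY" \
#     "$BDG_home/.mc/linux64/lib/python3.7/site-packages/pdbccdinchikeys/data/PDBCCD_id_status_date_inchikey_name.csv"
```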
The --pubchem_names option involves uploading the SMILES string of the molecule to PubChem and so it should not be used for confidential ligands. To be extra careful, by default this option is deactivated and will not work until it is activated. Please see the --pubchem_names documentation for details of the activation process.

The Grade Web Server http://grade.globalphasing.org/ provides a way to run Grade2 online. Clearly, this necessarily involves transmission of the molecule of interest to a public web server, so the Grade Web Server should not be used for confidential ligands.

Input FAQs¶

How can I use Grade2 to generate a restraint dictionary with atom names consistent with an existing Grade dictionary?¶

Suppose you are working on a project and have used Grade to generate a restraint dictionary for the ligand, used this for model building & refinement and now want to continue using a Grade2 restraint dictionary. To make the switch painless it is important that the atom naming for the ligand should not be altered. Thanks to Wei-Chun Kao for raising this question.

Grade2 can reliably use CIF restraint dictionaries from AceDRG, eLBOW and Grade2 as an input. However, Grade's CIF restraint dictionaries lack explicit atom charge records (_chem_comp_atom.charge). The first release of Grade2 (1.0.0) would terminate with an error message in such a case. In most cases, though, it is OK to assign all atoms a charge of 0. Hence, from release 1.1.0, when Grade2 reads such a file it assigns a charge of 0 to each atom and writes a WARNING message to the terminal output:
WARNING:
WARNING: Input restraint file CIF lacks explicit atom charge records _chem_comp_atom.charge
WARNING: ---- so will set all the atom charges to 0 and continue.
WARNING: ---- This should be fine for most neutral molecules.
WARNING: ---- But for charged atoms/molecules it will fail!
WARNING: ---- See FAQs https://gphl.gitlab.io/grade2_docs/faqs.html for more information
WARNING: ---- and instructions about a manual workaround to set atom charges.
WARNING: ----
WARNING: ---- Check InChi match messages below for problems!
WARNING:
For most neutral molecules assigning a charge of zero will be fine. However, for charged molecules or those containing groups like nitro the approach will produce incorrect chemistry. It is important to check the subsequent terminal output for messages about the checks made on the InChI read from the input file and that for the RDKit molecule used by Grade2. There should be a message:
RDKit molecule generated has the same InChI as that from the input.
---- This indicates that the stereochemistry matches so setup is successful.
indicating success. But if there is output like:
WARNING: RDKit molecule created has an InChI that does not match that read from the input.
WARNING: This means the stereochemistry of the molecules is likely to be different.
WARNING: You are advised to check molecules and restraints carefully.
then it will be necessary to manually edit the molecule's bonding and charge state. Please see the next FAQ for a guide on how to do so.
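
To find out in advance whether an input dictionary will trigger the charge WARNING, it is enough to look for the charge tag in the file. A minimal sketch, with a hypothetical helper name has_charge_records, assuming the tag is spelled exactly as in the message above:

```shell
# Hypothetical helper: succeed if the CIF file mentions the explicit
# atom charge tag, fail otherwise.
has_charge_records() {
    grep -q '_chem_comp_atom\.charge' "$1"
}

# Example (commented out; old_grade.cif is a placeholder file name):
#   has_charge_records old_grade.cif || echo "no charge records: check the InChI messages"
```

This only detects the presence of the tag. Whether setting all charges to 0 is chemically appropriate still has to be judged from the InChI match messages described above.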

How can I run Grade2 if I only have a PDB file for the ligand?¶

Grade2 does not allow PDB-format files to be used directly as an input. This is because the PDB format does not normally carry information about the order of the molecule's bonds. Assigning the bond order is necessary before restraints can be generated. If the PDB file for the ligand includes hydrogen atoms, then Open Babel can be used to assign bond orders and produce a MOL2-format file. For example, for grade-INH.pdb:
$ obabel grade-INH.pdb -O grade-INH.mol2
Use the resulting MOL2-format file as an input to Grade2, for example:
$ grade2 --in grade-INH.mol2 --resname INH
Carefully examine the results, checking that the chemistry of the resulting molecule matches that of the original Grade dictionary. If there is any difference then Mercury should be used for the conversion to MOL2, as Mercury allows manual editing of the chemical markup as described next.

In the case of an old macromolecular refinement result, ligands routinely lack explicit hydrogen atoms and this makes chemical markup particularly challenging and prone to error.

The CSD-core program Mercury (that you will have access to as it is distributed alongside Mogul) can be used to read in a PDB-format file of a ligand, assign bond orders and add hydrogen atoms if necessary. There is an "Auto Edit Structure..." option but the results of this should always be carefully checked. If there is a problem then Mercury has comprehensive manual editing options that can be used to alter bond orders, add/delete hydrogen atoms and set atom charges. Once you are happy the chemistry of the ligand molecule is correct, save it from Mercury in MOL2 format. The MOL2 file can then be used as an input to Grade2, using the --in option.

As an example to show this process, let's use the PQA ligand from PDB structure 2bal. Suppose that the only details of the ligand available were the conformation from the PDB file that lacks hydrogen atoms (in reality the grade2 option --PDB_ligand PQA should be used).

Extracting the coordinates of the ligand from the PDB file:
$ grep -E "HETATM.*PQA" 2bal/2bal.pdb > 2bal_pqa.pdb
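
The pattern above matches PQA anywhere on a HETATM line. Where that is too loose, a stricter extraction can test the residue name field itself, which in the fixed-column PDB format occupies columns 18 to 20. The sketch below uses a hypothetical helper name, extract_hetatm, and is offered as an alternative under that assumption about the file being standard PDB format.

```shell
# Hypothetical helper: print HETATM records whose residue name field
# (PDB columns 18-20) equals the given three-letter code.
# Usage: extract_hetatm RESNAME PDB_FILE
extract_hetatm() {
    awk -v res="$1" \
        'substr($0, 1, 6) == "HETATM" && substr($0, 18, 3) == res' "$2"
}

# Example (commented out, needs the PDB file from the text):
#   extract_hetatm PQA 2bal/2bal.pdb > 2bal_pqa.pdb
```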
Then run Mercury and load the file 2bal_pqa.pdb. Once loaded, select the Mercury option "Edit" -> "Auto Edit Structure..."

If your molecule lacks explicit hydrogen atoms tick the option "Add missing H atoms". Then click the "Apply" button and Mercury will analyze the structure, assign bond orders and add hydrogen atoms.

You should check the results carefully as ascribing chemistry to a molecular structure in the absence of bond orders and without hydrogen atoms is a difficult task. If the bond orders and/or hydrogen atoms added are wrong then Mercury has comprehensive manual editing options that can be used to alter bond orders, add/delete hydrogen atoms and set atom charges. In the PQA example test case, Mercury correctly assigns the bond orders and adds hydrogen atoms to the PQA molecule and no editing is required, despite the piperidine ring being in a 'mangled' conformation.

Once you are happy that the edited chemical markup of the molecule is correct, select the Mercury menu item "File" -> "Save As" and the option "Mol2 files". This will save the molecule as a MOL2 file that will preserve the atom names from the original PDB file as well as your edited chemical markup.

Then use the resulting MOL2 file as the input to grade2 using the --in option. Note that you will also need to specify the correct residue name (3-letter code) for the molecule as this is not preserved by Mercury. Doing this for the example test case:
$ grade2 --in 2bal_pqa_mercury_autoedit.mol2 --resname PQA
produces a restraint dictionary for the molecule in which the piperidine is charged. The non-hydrogen atom names are consistent with the input PDB file. The restraint dictionary can then be used for further fitting and/or refinement.

If you want to use a command line tool for this process rather than Mercury, then it can be done with Open Babel; please see the BUSTER wiki page Hydrogenate PDB with Open Babel. It is still essential to carefully check and correct the results if this is done.

Finally, it should be noted once again that in practice, for ligands in structures from the PDB, the --PDB_ligand option should be used as this downloads all information from the PDB chemical component dictionary.

How can I produce restraints for a ligand with a different protonation state or tautomer?¶

The CSD-core program Mercury (that you will have access to as it is distributed alongside Mogul) can be used to manually edit the bonding, hydrogen atom positions and atomic charges of a ligand. Mercury can be used to edit a ligand's tautomeric or protonation state, while preserving its atom names. For a demonstration of how to do this in practice please see:

Video: Using Mercury to edit charge/ tautomeric state for Grade2 restraint dictionary generation

Editing MOL2 file of a charged molecule with atomic partial charges¶

There is currently an issue whereby Grade2 cannot read a MOL2 file of a charged molecule when it has atomic partial charges. When this happens, it is possible to manually set the correct formal charges using Mercury. For a demonstration of how to do this in practice please see:

What can be done if there is an ERROR in generating a 3D conformation for the molecule?¶

Some molecular inputs to Grade2, such as SMILES strings or 2D SDF files, do not have 3D coordinates for the atoms. When given such an input, Grade2 will use RDKit routines to produce an initial 3D conformation.

Generating 3D conformations for a knotted molecule with many intersecting rings can be difficult. Indeed, it is possible to construct SMILES strings for molecules that cannot be constructed in 3D. For instance, c1c2ccc3cc2ccc13 is a naphthalene with an additional bond between carbon atoms on opposite sides of the double ring.

If Grade2 produces a message ERROR: Cannot generate a 3D conformation for the input molecule. then we would suggest:

Check the input SMILES string. Has it been corrupted? How reliable is its source?

Use an online 2D image generator, for instance http://hulab.rxnfinder.org/smi2img/ or https://cactus.nci.nih.gov/gifcreator/ . Does the image make sense?

If the molecule is already available in another format (for instance SDF) then use this.

If the SMILES string contains stereo atom specifiers @ try removing or altering these.

Try other 3D conformation generators. These could be restraint generation programs (for instance Grade or AceDRG) or an online tool such as https://cactus.nci.nih.gov/translate/ . Check any results carefully using molecular graphics (for instance Mercury). If a reasonable 3D conformation is produced then use this as an input to Grade2.

If you are still stuck, please contact us at buster-develop@GlobalPhasing.com and we will try to help.
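
For the suggestion above about stereo atom specifiers, the @ characters can be removed from a SMILES string with a one-line filter. This is a rough sketch with a hypothetical helper name, strip_stereo_at. It deletes every @ in the string, so all tetrahedral stereo specifications are lost and the resulting molecule must be checked carefully. The SMILES string used is the test job from elsewhere on this page.

```shell
# Hypothetical helper: remove all '@' stereo atom specifiers from a
# SMILES string. The stereochemistry becomes unspecified as a result.
strip_stereo_at() {
    printf '%s\n' "$1" | tr -d '@'
}

strip_stereo_at 'CC1CS[C@@H](N=C1C(=O)O)C'
# prints: CC1CS[CH](N=C1C(=O)O)C
```

If a 3D conformation can be generated once the specifiers are removed, this suggests that the stereo markup of the original string is the likely cause of the problem.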

Grade2 runs slowly for a ligand. What can I do about this?¶

The time-consuming part of a Grade2 run is the use of Mogul procedures (Bruno et al., 2004) to search the CSD through the CSD Python API. Mogul speeds up retrieval of geometric information by storing tables of data for the most common chemical features found in the CSD. Access to information from the precalculated tables is fast. However, if the molecule in question has unusual chemistry (in particular for rings) Mogul will perform searches on structures from the CSD. Such searches involve accessing many gigabytes of data. Consequently, the speed of the searches is dependent on the speed of programmatic access to the CSD data.

To demonstrate that installation of the CSD on low-performance disks can result in slow runs, a test job (using a cut-down SMILES string from cephalosporin C 0MU):
$ time grade2 'CC1CS[C@@H](N=C1C(=O)O)C'
was run on an old Linux workstation (Intel i7-3770S @ 3.10GHz, 8 GB memory) using Grade2 release 1.4.1 and CSD 2023.2.

CPU times for test job time grade2 'CC1CS[C@@H](N=C1C(=O)O)C'¶
ccdc-data installed	elapsed	user	sys
Internal Solid State Drive (SSD)	7.4 minutes	6.5 minutes	0.3 minutes
Internal Hard Disk Drive (SATA 7200 rpm)	24.1 minutes	7.2 minutes	0.3 minutes
Slow Network Drive (using sshfs to a remote datacenter, 36ms latency)	1060.5 minutes	9.0 minutes	1.3 minutes

It can be seen that using a CSD installation on a hard disk drive results in an elapsed run time that is more than three times that obtained with a CSD installation on an internal solid-state drive. Grade2 can be used with CSD installed on a network drive but performance is dependent on the network capabilities and latency. The table shows an extreme example where CSD is installed on a network drive from a remote datacenter and Grade2 runs over 100 times slower than when using the SSD. The user CPU time reported increases only slightly, showing that the Grade2 process spends most of the elapsed time waiting for the CSD data.
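
The slowdown factors can be reproduced from the elapsed times in the table. The snippet below is plain arithmetic on those published figures. The helper name ratio is hypothetical.

```shell
# Hypothetical helper: print the ratio of two timings to one decimal place.
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'
}

ratio 24.1 7.4      # hard disk elapsed / SSD elapsed, prints 3.3
ratio 1060.5 7.4    # network drive elapsed / SSD elapsed, prints 143.3
```

The same helper can be applied to timings from your own system to judge how far the CSD data location is limiting performance.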

In the past, particular issues have been found using NFS version 3 with Grade (the predecessor of Grade2), see https://www.globalphasing.com/buster/wiki/index.cgi?SoftwareMogulRelease2014NFSissues for details. NFS version 3 has been obsolete for many years and should not now be in use.

If Grade2 performance is an issue it is advisable to use a local SSD to store CSD data. Note that it is possible to use separate locations for CSD programs and data. How to do this has recently changed, and is now described in the "CSD Portfolio Release Notes" section 12.7 that describes the CSD.ini configuration file.


Output FAQs¶

Grade2 says that the ligand matches an existing PDB chemical component. What should I do?¶

As part of the Grade2 run a check is made whether the molecule matches any existing PDB chemical component (from the wwPDB Chemical Component Dictionary https://www.wwpdb.org/data/ccd ). The check uses the InChIKey of the ligand. The InChIKey is a shortened form of the International Chemical Identifier (InChI) that facilitates the comparison of molecules.

For example, if Grade2 is supplied the SMILES string Cn1cnc2c1C(=O)N(C(=O)N2C)C then the following terminal output will result:
CHECK: Check the molecule`s InChiKey against known PDB components:
CHECK: Exact match to PDB chemical component(s):
CHECK:   CFF https://www.rcsb.org/ligand/CFF "caffeine"
So the SMILES string is for caffeine, which is an existing PDB component https://www.rcsb.org/ligand/CFF. It is likely to be sensible to use the Grade2 dictionary for CFF for fitting and refinement.

Please note that tautomers normally have the same InChIKey (for an example see PDB components 2D3 and XQK).

If Grade2 reports an unexpected match to an existing PDB chemical component, then it may be a good idea to switch to using a Grade2 restraint dictionary for the matching chemical component. If you deposit the structure to the PDB with a ligand that matches an existing PDB chemical component then it will be renamed (including all the atom IDs). If a match is reported please check that the tautomeric/charge state is what you want before switching.

The Grade2 option --antecedent allows the atom IDs to be taken from a related ligand. If you are working on a tautomer of an existing PDB ligand component then it can be used to produce consistent atom IDs.

What are the Grade2/BUSTER restrictions on residue name?¶

There is no problem in Grade2 using any string as a residue name for a novel ligand (using the --resname option). It is common practice for companies to use LIG, INH or DRG. Although these three codes were previously issued in the PDB chemical component library, they have now been withdrawn and reserved for this purpose:
From Jasmine Young <jasmine.young@rcsb.org>
Date: Wed, 27 Oct 2021 11:12:43 -0400

Dear all,

The wwPDB OneDep team would like to inform you that we have reserved a set of
ligand identifier codes that will never be used by the PDB. This is to allow
depositors to use such codes for their new ligands during structure
determination processes.

These reserved ligand codes are LIG, INH, DRG, and 01-99 (two digits). The
OneDep deposition system will be ready for this change in December 2021.

We encourage you adopt these ligand codes in your software packages.

Regards,

Jasmine
Note that some groups use the working code UNL, but this has a specific meaning in the wwPDB database: "unknown ligand" (https://www.rcsb.org/ligand/UNL). This normally indicates that an unexpected ligand has been identified from a blob of electron density. So it is best to avoid UNL as a working residue name.

By default, Grade2 now uses residue name LIG.

Grade2 can produce CIF restraint dictionaries for residue names longer than three characters, but currently there are often compatibility problems with downstream programs, such as BUSTER. BUSTER uses PDB format for molecular input and currently can only handle residue names of three characters or fewer. This will need to be dealt with soon (PDB news: once all three-character alphanumeric codes are exhausted, four-character codes will be issued).

Please note that it is also necessary to avoid residue names for the common compounds (such as SO4) that can be found in $BDG_home/tnt/data/common-compounds.
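
The restrictions above that can be checked without a BUSTER installation are easy to script. The following is a sketch only, with a hypothetical helper name check_resname. It tests the three-character limit and the UNL code, but it does not consult $BDG_home/tnt/data/common-compounds, since the layout of that location is not described on this page.

```shell
# Hypothetical helper: apply the length and UNL checks described above
# to a proposed residue name. Common-compound names are NOT checked.
check_resname() {
    name="$1"
    if [ "${#name}" -lt 1 ] || [ "${#name}" -gt 3 ]; then
        echo "reject: '$name' is not 1 to 3 characters long"
        return 1
    fi
    if [ "$name" = "UNL" ]; then
        echo "reject: UNL means 'unknown ligand' in the wwPDB"
        return 1
    fi
    echo "ok: $name"
}

check_resname LIG    # prints: ok: LIG
```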
