Chemoinformatics - Tutorial Structures Flashcards

1
Q

What was the goal of this first tutorial?

A

Illustrate the generation of SMILES and InChI strings, and demonstrate their
use as keys for database access.

2
Q

Software used:

A

DataWarrior and RDKit (RDKit is required to run GUIConvMol)

3
Q

Which software can we use to convert SMILES into a molecular representation?

A

JSME or DataWarrior: just paste the SMILES into the app and the structure is displayed. In DataWarrior you may also add a text column and an image column.

4
Q

How do we save a structure created in DataWarrior as an MDL molfile?

A

With DataWarrior: draw the structure as in step 3, but now use Copy Structure As →
Molfile V2, paste it into Notepad or SimpleText, and save it as paracetamol.mol.
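
The saved molfile is plain text with a fixed column layout. A minimal Python sketch of reading its header and counts line, assuming "Molfile V2" refers to the MDL V2000 format; `read_counts` and the methanol record are hypothetical illustrations, not part of DataWarrior:

```python
from typing import Tuple

def read_counts(molfile_text: str) -> Tuple[str, int, int]:
    """Return (title, number of atoms, number of bonds) from a V2000 molfile."""
    lines = molfile_text.splitlines()
    title = lines[0].strip()      # line 1: molecule name
    counts = lines[3]             # line 4: counts line (lines 2-3: program/comment)
    if "V2000" not in counts:
        raise ValueError("not a V2000 molfile")
    n_atoms = int(counts[0:3])    # columns 1-3: number of atoms
    n_bonds = int(counts[3:6])    # columns 4-6: number of bonds
    return title, n_atoms, n_bonds

# Hypothetical hydrogen-suppressed methanol record (C and O joined by a single bond).
MOLFILE = """methanol
  hypothetical example

  2  1  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.4000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
M  END
"""

print(read_counts(MOLFILE))
```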

5
Q

What were the tasks in this tutorial?

A

To check a) whether paracetamol is included in the dataset, and b) whether there are duplicated molecules.
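
Task a) amounts to using a SMILES string as a database key. A minimal Python sketch of an exact-match lookup, assuming the query and all stored SMILES were written by the same canonicalizing tool; the three-molecule "database" and `build_index` are hypothetical:

```python
def build_index(smiles_list):
    """Map each SMILES string to the list of 1-based IDs where it occurs."""
    index = {}
    for mol_id, smi in enumerate(smiles_list, start=1):
        index.setdefault(smi, []).append(mol_id)
    return index

# Hypothetical mini-database (not the tutorial's 10,000 molecules).
database = [
    "CCO",                    # ethanol
    "CC(=O)Nc1ccc(O)cc1",     # paracetamol
    "CC(=O)Oc1ccccc1C(=O)O",  # aspirin
]
index = build_index(database)

paracetamol = "CC(=O)Nc1ccc(O)cc1"
print(index.get(paracetamol, []))  # → [2]

# The same molecule written in a different atom order is NOT found by string
# lookup, which is why both sides must come from the same canonicalization.
print(index.get("Oc1ccc(NC(C)=O)cc1", []))  # → []
```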

6
Q

How do we convert SDF files into SMILES?

A

Press the Windows key, type “cmd” to open the command-line
shell, and run python path_to_GUIConvMol.py. Press “SDF→SMILES”, browse to and select
the 10000.sdf file, and specify an output file named 10000.smiles.
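
An SD file is a series of molfile records, each terminated by a line containing `$$$$`. The conversion itself needs a cheminformatics toolkit (here, GUIConvMol with RDKit), but a minimal stdlib sketch of splitting the records is a handy sanity check that the number of records matches the number of SMILES lines written; `split_sdf`, `titles` and the two-record file are hypothetical:

```python
def split_sdf(sdf_text):
    """Return a list of record strings, one per molecule ($$$$-separated)."""
    records, current = [], []
    for line in sdf_text.splitlines():
        if line.strip() == "$$$$":
            records.append("\n".join(current))
            current = []
        else:
            current.append(line)
    if any(ln.strip() for ln in current):  # last record without a trailing $$$$
        records.append("\n".join(current))
    return records

def titles(records):
    """The first line of each molfile block holds the molecule name."""
    return [rec.splitlines()[0].strip() for rec in records]

# Hypothetical two-record SD file (one carbon atom, one oxygen atom).
SDF = """mol1


  1  0  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
M  END
$$$$
mol2


  1  0  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
M  END
$$$$
"""

records = split_sdf(SDF)
print(len(records), titles(records))
```

For 10000.sdf, the record count should match the number of lines in 10000.smiles.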

7
Q

What will be done with the 10,000 SMILES strings?

A

Copy the 10,000 SMILES strings into a worksheet. You will verify whether there are duplicated structures.

8
Q

How do we identify the duplicated structures?

A

Add a column with an ID number for each compound (e.g.,
from 1 to 10000). You can use the worksheet's COUNTIF function to count how many
times the first SMILES string occurs in the array of 10,000 strings, and then drag
the formula down. Alternatively, use the VLOOKUP function to search for the same
string in the rows below. Or sort the rows by SMILES string (so that any duplicated
strings end up in contiguous cells), write a formula that checks whether each cell
is equal to the cell below, and drag the formula down.
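
The two worksheet strategies can be sketched in Python as follows; this is a minimal illustration on a hypothetical four-molecule list, with a `Counter` playing the role of COUNTIF and a sort-then-compare-neighbours loop mirroring the second approach:

```python
from collections import Counter

def duplicates_by_count(smiles):
    """COUNTIF-style: IDs (1-based) of every SMILES that occurs more than once."""
    counts = Counter(smiles)
    return [i for i, s in enumerate(smiles, start=1) if counts[s] > 1]

def duplicates_by_sorting(smiles):
    """Sort (ID, SMILES) rows by SMILES, then compare each row with the one below."""
    rows = sorted(enumerate(smiles, start=1), key=lambda row: row[1])
    pairs = []
    for (id_a, smi_a), (id_b, smi_b) in zip(rows, rows[1:]):
        if smi_a == smi_b:  # identical strings in contiguous rows -> duplicate
            pairs.append((id_a, id_b))
    return pairs

smiles = ["CCO", "c1ccccc1", "CCO", "CC(=O)O"]  # hypothetical data
print(duplicates_by_count(smiles))    # → [1, 3]
print(duplicates_by_sorting(smiles))  # → [(1, 3)]
```

Like the worksheet formulas, both only detect identical strings, so they rely on the SMILES having been canonicalized by the same program.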

9
Q

Why do we remove salts and charges in GUIConvMol, and why can this be useful?

A

Now generate the 10,000 SMILES strings again but select “Remove Salts” and
“Uncharge” in GUIConvMol, and repeat step 10 with the new version of the 10,000
SMILES strings. You will detect an additional pair of duplicated structures: in
fact, molecule 8156 is a carboxylate salt of molecule 8651. In aqueous solution the
carboxylate and the carboxylic acid are interconverted (acid-base equilibrium) and
so, depending on your application, you may want to consider them as duplicates or
not.
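
A deliberately crude Python sketch of what "Remove Salts" and "Uncharge" achieve, for illustration only: it keeps the longest dot-separated fragment (string length as a rough proxy for size) and protonates `[O-]` by text replacement. This is NOT how GUIConvMol or RDKit standardize molecules; the text replacement would, for example, corrupt nitro groups (`[N+](=O)[O-]`), and real tools re-canonicalize the result:

```python
def largest_fragment(smiles):
    """Crude salt stripping: keep the longest dot-separated fragment."""
    return max(smiles.split("."), key=len)

def toy_uncharge(smiles):
    """Toy neutralization: [O-] -> O (e.g. carboxylate -> carboxylic acid).
    Assumption: only negatively charged oxygens; nitro groups etc. are mishandled."""
    return smiles.replace("[O-]", "O")

def toy_standardize(smiles):
    return toy_uncharge(largest_fragment(smiles))

sodium_acetate = "CC(=O)[O-].[Na+]"
acetic_acid = "CC(=O)O"
print(toy_standardize(sodium_acetate))                  # → CC(=O)O
print(toy_standardize(sodium_acetate) == acetic_acid)   # → True: now a duplicate
```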

10
Q

Which program can generate hashed fingerprints?

A

The GUIConvMol program can calculate binary hashed fingerprints (fp).

11
Q

How do we generate hashed fingerprints?

A

In the right panel, fill in Size of FP: 64, Bits Per Pattern: 2 and Max Path: 3.
Press “Generate Fingerprints”, select the file 10stru.smi and an output file
(10stru_a.fp).
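
To make the three parameters concrete, here is a minimal Python sketch of a path-based hashed fingerprint. It is a hypothetical reimplementation, not GUIConvMol's code; the assumptions are that a path's size is counted in atoms, hydrogens are implicit, only bond orders 1 to 3 are encoded, and SHA-256 stands in for whatever hash function the program really uses:

```python
import hashlib

BOND_SYMBOL = {1: "-", 2: "=", 3: "#"}  # assumption: no aromatic bonds

def path_patterns(atoms, bonds, max_path):
    """Set of direction-independent linear patterns with 1..max_path atoms."""
    neighbours = {i: [] for i in range(len(atoms))}
    for a, b, order in bonds:
        neighbours[a].append((b, order))
        neighbours[b].append((a, order))
    patterns = set()

    def extend(path, tokens):
        forward, backward = "".join(tokens), "".join(reversed(tokens))
        patterns.add(min(forward, backward))  # C-O and O-C are the same pattern
        if len(path) == max_path:
            return
        for nxt, order in neighbours[path[-1]]:
            if nxt not in path:  # simple paths only: never revisit an atom
                extend(path + [nxt], tokens + [BOND_SYMBOL[order], atoms[nxt]])

    for start in range(len(atoms)):
        extend([start], [atoms[start]])
    return patterns

def hashed_fingerprint(atoms, bonds, size=64, bits_per_pattern=2, max_path=3):
    """Binary fingerprint as a '0'/'1' string; each pattern sets bits_per_pattern bits."""
    bits = [0] * size
    for pattern in path_patterns(atoms, bonds, max_path):
        for k in range(bits_per_pattern):
            digest = hashlib.sha256(f"{pattern}|{k}".encode()).digest()
            bits[int.from_bytes(digest[:8], "big") % size] = 1
    return "".join(map(str, bits))

# Hypothetical input: ethanol as a hydrogen-suppressed graph C-C-O.
atoms, bonds = ["C", "C", "O"], [(0, 1, 1), (1, 2, 1)]
print(sorted(path_patterns(atoms, bonds, 3)))
print(hashed_fingerprint(atoms, bonds))
```

The fingerprint depends only on the set of patterns, so two molecules with identical pattern sets always get identical fingerprints, whatever the size or bits per pattern.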

12
Q

Why do some molecules in the file get the same hashed fingerprint?

A

Open the output file (10stru_a.fp) with a text editor or a worksheet and check
whether all 10 fingerprints are different. Three pairs of molecules got the
same fingerprint: 2/3, 5/6, and 7/8. Molecules 7 and 8 are stereoisomers, so they
will never be distinguished by these hashed fingerprints, because stereochemistry is
not considered. For the other two pairs, every sequence of size up to 3 occurring in
one molecule also occurs in the other, so the maximum path considered is not enough
to distinguish between them.
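
The effect of the maximum path can be sketched with two hypothetical molecules (not the tutorial's pairs): butane and hexane, as hydrogen-suppressed carbon chains, share every linear pattern of up to 3 atoms, and only a longer maximum path separates them. The sketch counts path size in atoms, which is an assumption. Stereoisomers fare even worse: their graphs, the only input here, are identical, so no path length can separate them.

```python
def path_patterns(atoms, bonds, max_path):
    """Set of direction-independent atom sequences along simple paths."""
    neighbours = {i: set() for i in range(len(atoms))}
    for a, b in bonds:
        neighbours[a].add(b)
        neighbours[b].add(a)
    patterns = set()

    def extend(path):
        seq = [atoms[i] for i in path]
        patterns.add(min("".join(seq), "".join(reversed(seq))))
        if len(path) < max_path:
            for nxt in neighbours[path[-1]]:
                if nxt not in path:
                    extend(path + [nxt])

    for start in range(len(atoms)):
        extend([start])
    return patterns

def chain(n):
    """Hypothetical example: an n-carbon linear alkane (atoms, bonds)."""
    return ["C"] * n, [(i, i + 1) for i in range(n - 1)]

butane, hexane = chain(4), chain(6)
print(path_patterns(*butane, 3) == path_patterns(*hexane, 3))  # → True
print(path_patterns(*butane, 5) == path_patterns(*hexane, 5))  # → False
```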

13
Q

Generate new fingerprints with maximum path 5 for higher discrimination ability. What happens now?

A

Inspect the new output file and observe that molecules 2 and 3 now get different fingerprints. Molecules 5 and 6, however, still get the same fingerprint, even though some patterns of size 5 occur in molecule 5 but not in molecule 6. This is due to collisions (different patterns setting the same bits), and it can be solved by increasing the size of the fingerprint.
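
Collisions can be estimated with the standard Bloom-filter approximation, assuming each of the $k$ bits per pattern lands on one of the $N$ fingerprint bits independently and uniformly (an idealization of the real hash function). For a molecule with $m$ distinct patterns, the expected fraction of bits set to 1 and the probability that an extra pattern's $k$ bits are all already set (so the pattern leaves no trace) are:

```latex
p_{\text{set}} = 1 - \left(1 - \frac{1}{N}\right)^{k m} \approx 1 - e^{-k m / N},
\qquad
p_{\text{collision}} \approx p_{\text{set}}^{\,k} = \left(1 - e^{-k m / N}\right)^{k}
```

Increasing $N$ lowers both quantities. Increasing $k$ helps only while the fingerprint is sparse: past $k \approx (N/m)\ln 2$ the fingerprint saturates and $p_{\text{collision}}$ rises again.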

14
Q

What happens when we increase the size of the fingerprint to 256?

A

Open the output file and see that the new parameters enabled the
discrimination of all molecules except the pair of stereoisomers.

15
Q

Now try to increase the number of bits activated by each pattern from 2 to 5, instead of increasing the fingerprint size from 64 to 256. What happens?

A

You can see that
increasing the number of activated bits (“1”) per pattern while keeping the fingerprint
size the same reduced the discrimination capability: the three pairs of molecules (2/3,
5/6, and 7/8) could again not be distinguished.
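
The trend in the last two cards can be checked numerically with the Bloom-filter estimate. A minimal Python sketch, assuming idealized uniform hashing and a hypothetical 30 distinct patterns per molecule (the tutorial does not report this number):

```python
def expected_fill(size, bits_per_pattern, n_patterns):
    """Expected fraction of fingerprint bits set to 1 (uniform hashing assumed)."""
    return 1 - (1 - 1 / size) ** (bits_per_pattern * n_patterns)

def collision_probability(size, bits_per_pattern, n_patterns):
    """Chance that an extra pattern's bits are all already set (Bloom false positive)."""
    return expected_fill(size, bits_per_pattern, n_patterns) ** bits_per_pattern

N_PATTERNS = 30  # hypothetical number of distinct path patterns per molecule
for size, k in [(64, 2), (256, 2), (64, 5)]:
    print(f"size={size:3d} bits/pattern={k}: "
          f"fill={expected_fill(size, k, N_PATTERNS):.2f}, "
          f"collision={collision_probability(size, k, N_PATTERNS):.2f}")
```

Under these assumptions, 256 bits with 2 bits per pattern gives the lowest collision probability, while 5 bits per pattern in 64 bits fills most of the fingerprint and gives the highest, matching the observed loss of discrimination.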
