Chemoinformatics - Tutorial: Trees / NN Flashcards

1
Q

What is the goal in this tutorial?

A

Goal: In this tutorial you will learn to use the Weka program to train and test decision trees and feed-forward (FF) neural networks that learn from experimental data to predict a property from the molecular structure.

2
Q

What programs are needed for this tutorial?

A

Excel and Weka.

3
Q

What dataset did we use?

A

The dataset of Jorissen and Gilson (J. Chem. Inf. Model. 2005, 45(3), 549-561), consisting of 250 compounds and their known protein targets, was retrieved from cheminformatics.org. The targets are CDK2 (cyclin-dependent kinase 2), COX2 (cyclooxygenase 2), FXa (coagulation factor Xa), PDE5 (phosphodiesterase 5), and A1A (alpha-1A adrenoceptor). The SMILES strings, protein targets, and 110 molecular descriptors calculated with the CDK Descriptor Calculator were stored in a file named jor.xls. Decision trees learn relationships between the molecular descriptors and the protein target, so that they can predict the target of new compounds.

4
Q

How do we prepare the training set from the jor.xls file?

A

Open the jor.xls file in a spreadsheet program and sort the rows by the third column (column C) to separate the training-set and test-set compounds. Starting at column D, copy the column labels in the first row, the descriptors, and the activity (the protein target, in the last column) for the training set. Paste them into a plain text file or a new spreadsheet and save it as a CSV file (e.g., jor_tr.csv).
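
As a scripted alternative to the manual spreadsheet steps, here is a minimal Python sketch that groups rows by the set label in the third column and keeps only columns D onward. It assumes the sheet has already been read into lists of strings; the column names, the set labels ("train"/"test") and all values below are hypothetical. Grouping by the label makes the sort step unnecessary.

```python
import csv
from collections import defaultdict


def split_by_set(header, rows, set_col=2, first_kept_col=3):
    """Group rows by the value in the set column and keep columns D onward.

    set_col=2 is the third column (column C); first_kept_col=3 is column D,
    so each kept row holds the descriptors plus the target in the last column.
    """
    kept_header = header[first_kept_col:]
    groups = defaultdict(list)
    for row in rows:
        groups[row[set_col]].append(row[first_kept_col:])
    return kept_header, dict(groups)


def write_csv(path, header, rows):
    """Write one subset (e.g. the training set) as a CSV file Weka can open."""
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(header)
        writer.writerows(rows)


# Toy data with hypothetical column names and values, mimicking the layout
# described for jor.xls: two leading columns, the set label in column C,
# descriptors from column D, and the protein target in the last column.
header = ["colA", "colB", "Set", "desc1", "desc2", "Target"]
rows = [
    ["c1", "x", "train", "0.5", "12", "CDK2"],
    ["c2", "x", "test", "0.7", "15", "COX2"],
    ["c3", "x", "train", "0.1", "9", "FXa"],
]
kept_header, groups = split_by_set(header, rows)
```

Calling write_csv("jor_tr.csv", kept_header, groups["train"]) would then produce the training file; the test file can be written the same way from the other group.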

5
Q

How do we select trees as classifiers in Weka?

A

Select the “Classify” tab. Under “Classifier”, click the “Choose” button and choose “classifiers → trees → J48”. Under “Test options”, choose “Use training set”. Click the text field to the right of the “Choose” button to open a panel for configuring the tree, then click “OK”. Click “Start” to build the tree.
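
J48 is Weka's implementation of the C4.5 algorithm, which chooses splits using the gain ratio (information gain divided by split information). The Python sketch below computes it for a binary threshold on one numeric descriptor. It is a simplification, not Weka's code: real C4.5 compares all attributes, applies further corrections for numeric splits, and prunes the tree. The descriptor values and labels are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels, threshold):
    """Gain ratio of splitting compounds into value <= threshold and value > threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0  # a split that separates nothing carries no information
    n = len(labels)
    p = len(left) / n
    remainder = p * entropy(left) + (1 - p) * entropy(right)
    gain = entropy(labels) - remainder
    split_info = -(p * log2(p) + (1 - p) * log2(1 - p))
    return gain / split_info


def best_threshold(values, labels):
    """Try midpoints between consecutive distinct values; None if the descriptor is constant."""
    xs = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    return max(candidates, key=lambda t: gain_ratio(values, labels, t), default=None)


# Hypothetical descriptor values for four compounds and their targets
values = [1.0, 2.0, 3.0, 4.0]
labels = ["CDK2", "CDK2", "COX2", "COX2"]
print(best_threshold(values, labels))  # 2.5
```

Here the threshold 2.5 separates the two targets perfectly, so its gain ratio (1.0) beats the other candidates, and a tree would place this test at a node.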

6
Q

How do we evaluate the model on the test set?

A

Now use the file with the test set. It contains the descriptors for 27 structures and their classification according to the protein target. This is the test set, i.e., the set used to assess the model’s ability to predict new molecules. Under “Test options”, choose “Supplied test set” and click “Set …”. Choose “Open file …” and select the file with the test set.

On the last row of the “Result list” panel, right-click and select “Re-evaluate model on current test set”. The results appear in the right panel. To copy the individual predictions, first click the “More options …” button, then click “Choose” next to “Output predictions” and choose “PlainText” or “CSV”. Re-evaluate the model and inspect the individual predictions in the right panel.
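
Weka summarises a test-set evaluation as the percentage of correctly classified instances and a confusion matrix whose rows are the actual classes and whose columns are the predicted classes. The Python sketch below recomputes both from lists of actual and predicted targets, such as those copied from the individual predictions. Parsing Weka's prediction output is not shown, and the six labels below are hypothetical.

```python
from collections import Counter

# The five protein targets of the Jorissen and Gilson dataset
CLASSES = ["CDK2", "COX2", "FXa", "PDE5", "A1A"]


def accuracy(actual, predicted):
    """Fraction of compounds whose predicted target equals the actual one."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


def confusion_matrix(actual, predicted, classes=CLASSES):
    """Rows are actual classes, columns are predicted classes."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in classes] for a in classes]


# Hypothetical predictions for six test compounds
actual = ["CDK2", "COX2", "FXa", "PDE5", "A1A", "CDK2"]
predicted = ["CDK2", "COX2", "FXa", "A1A", "A1A", "COX2"]
print(f"Correctly classified: {accuracy(actual, predicted):.1%}")
for name, row in zip(CLASSES, confusion_matrix(actual, predicted)):
    print(name, row)
```

Off-diagonal entries show which targets the model confuses; here one PDE5 compound is predicted as A1A and one CDK2 compound as COX2.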
