know Flashcards

1
Q

universe
attributes
Va

A

They all go into the universe U = {x1, x2, ...}: U is the set of all objects, A is the set of attributes, and Va is the set of values that attribute a can take.

2
Q

indiscernibility relation IND(A)

A

These are the equivalence classes, and they go in a partition such as {{x1, x2}, {x3}}.
Pay attention here to exactly what the question asks for.
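
The card above can be sketched in Python; the toy table, objects x1..x3, and attributes a, b are hypothetical examples, not from the cards:

```python
# A minimal sketch of computing the IND(A) equivalence classes
# for a toy information table (hypothetical data).

def ind_classes(table, attrs):
    """Group objects that agree on every attribute in attrs."""
    classes = {}
    for obj, values in table.items():
        key = tuple(values[a] for a in attrs)
        classes.setdefault(key, set()).add(obj)
    return list(classes.values())

table = {
    "x1": {"a": 1, "b": 0},
    "x2": {"a": 1, "b": 0},
    "x3": {"a": 0, "b": 1},
}

# x1 and x2 agree on both attributes, so they share one class
print(ind_classes(table, ["a", "b"]))
```

Passing a smaller attribute set (e.g. just `["a"]`) can merge classes, which is why the card says to check which attribute set the question asks about.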

3
Q

the B-lower approximation is…

A

The objects that are 100% certainly in the set: the union of all equivalence classes fully contained in it.

4
Q

the B-upper approximation is…

A

The objects that may or may not belong to the set ("yes and no"): the union of all equivalence classes that overlap it.

5
Q

the positive region is…

A

Everything that, based on the given attributes, can be classified with certainty. Careful: what is certainly false counts too. It is the union of the B-lower approximations of all decision classes.

6
Q

decision system

A

A = (U, A ∪ {d})

7
Q

the boundary region is…

A

B-upper minus B-lower (not the other way round): the objects that cannot be classified with certainty.

8
Q

accuracy of the approximation

A

|B-lower| / |B-upper| (the size of the lower approximation divided by the upper, so it is at most 1)
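
The approximation cards above can be sketched together; the partition and target set X below are hypothetical toy data:

```python
# Sketch: B-lower/upper approximations, boundary region, and accuracy
# of the approximation, on a hypothetical partition.

def approximations(classes, X):
    """Lower: classes fully inside X; upper: classes touching X."""
    lower = {x for c in classes if c <= X for x in c}
    upper = {x for c in classes if c & X for x in c}
    return lower, upper

classes = [{"x1", "x2"}, {"x3"}, {"x4"}]   # the IND(B) equivalence classes
X = {"x1", "x3"}                            # the target set

lower, upper = approximations(classes, X)
boundary = upper - lower                    # upper minus lower
alpha = len(lower) / len(upper)             # accuracy of the approximation
print(lower, upper, boundary, alpha)
```

Here only {x3} is certain, while {x1, x2} overlaps X without being contained in it, so it lands in the boundary region.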

9
Q

generalized decision

A

You list each IND(A) equivalence class together with its decision(s).
The table is not consistent if a class has several decision values.

10
Q

decision-relative discernibility matrix

A

You build the table over the equivalence classes and write down the attributes on which they differ. Because it is DECISION-RELATIVE, if two classes have the same decision you put the empty entry (∅).
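
A sketch of that matrix on a hypothetical three-object table (objects, attributes, and decisions are made up for illustration):

```python
# Sketch: a decision-relative discernibility matrix. Entries list the
# attributes that distinguish two objects; pairs with the same decision
# get the empty set (the ∅ entry on the card).

objects = {
    "x1": ({"a": 1, "b": 0}, "yes"),
    "x2": ({"a": 0, "b": 0}, "no"),
    "x3": ({"a": 1, "b": 1}, "yes"),
}

def discernibility_matrix(objects):
    names = sorted(objects)
    matrix = {}
    for i, p in enumerate(names):
        for q in names[i + 1:]:
            (vp, dp), (vq, dq) = objects[p], objects[q]
            if dp == dq:
                matrix[p, q] = set()          # same decision: empty entry
            else:
                matrix[p, q] = {a for a in vp if vp[a] != vq[a]}
    return matrix

m = discernibility_matrix(objects)
# m["x1","x2"] == {"a"}; m["x1","x3"] == set(); m["x2","x3"] == {"a","b"}
```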

11
Q

Boolean discernibility function

A

You take all the differences from the matrix, write each entry as a disjunction in parentheses, and conjoin them:
fA(a1, …, an) = (a ∨ b) ∧ (c ∨ d) ∧ …
Then you simplify the expression.

12
Q

watch out with the simplification

A

The result must still contain all the values you have to simplify with; because of this it is good to keep a small…

13
Q

support?

A

Number of objects that fulfill the rule
You will be given a rule; the support is the number of objects that fulfill it completely (conditions and decision).

14
Q

accuracy is…

A

The fraction of correctly classified objects among those matching the rule conditions:
support / all the objects that match the rule's conditions regardless of decision, i.e. everything in the same equivalence class.

15
Q

coverage is…

A

support / the number of objects in the rule's decision class

16
Q

strength is…

A

support / the total number of objects in the universe
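
The four rule-quality cards above can be sketched on one toy example (the objects and the rule "IF a = 1 THEN d = yes" are hypothetical):

```python
# Sketch of support, accuracy, coverage, and strength for one rule,
# on hypothetical data.

objects = [
    {"a": 1, "d": "yes"},
    {"a": 1, "d": "yes"},
    {"a": 1, "d": "no"},
    {"a": 0, "d": "yes"},
]

# Rule: IF a = 1 THEN d = yes
matches_lhs = [o for o in objects if o["a"] == 1]                # same conditions
support = sum(1 for o in matches_lhs if o["d"] == "yes")         # fulfill rule fully
accuracy = support / len(matches_lhs)                            # 2 / 3
coverage = support / sum(1 for o in objects if o["d"] == "yes")  # 2 / 3
strength = support / len(objects)                                # 2 / 4
```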

17
Q

what is the point of using Boolean reasoning in rough sets?

A

Boolean reasoning is used to obtain the reducts

18
Q

Why do we need to do discretization?

A

Because in rough sets we use Boolean reasoning, and for that we need discrete data.

19
Q

supervised learning

A

data with decision classes/labels
classification problems
case-control studies
algorithms: decision trees or rule-based learning

20
Q

unsupervised

A

unknown decision classes
looking for patterns in the data
e.g. hierarchical clustering

21
Q

performance or interpretability

A

performance for applications where lives are involved
interpretability for complex coding or data analysis with complex models

Interpretable ML techniques aim at giving legible explanations for predictions.

22
Q

the cutoff permutation value needs to be set to at least…

A

20, to have significant results (with 20 permutations the smallest attainable p-value is 1/20 = 0.05)

23
Q

when is undersampling necessary?

A

when the distribution of classes is unequal
e.g. 20 controls and 5 patients

24
Q

What is the classification accuracy?
what is the expected value?

A

Accuracy is the power/strength of our model. We want accuracy above the expected 0.5 (for two balanced classes), because that indicates the model is correct more often than random chance.

Accuracy = (TP +TN) / (TP + TN + FP + FN). The number of correct predictions divided by total.

25
Q

can we trust the AUC with few samples?

A

It is questionable whether we should trust the model performance with such small sample sizes.

26
Q

when is it appropriate to do k-fold cross-validation?

A

when we don't have an external test set to evaluate our model on

27
Q

sensitivity

A

TP/(TP+FN)
TRUE POSITIVE RATE

28
Q

specificity

A

TN/(TN+FP)
TRUE NEGATIVE RATE
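
The confusion-matrix cards above fit in a few lines; the counts here are hypothetical:

```python
# Sketch: sensitivity, specificity, and accuracy from hypothetical
# confusion-matrix counts.
TP, TN, FP, FN = 40, 30, 10, 20

sensitivity = TP / (TP + FN)               # true positive rate
specificity = TN / (TN + FP)               # true negative rate
accuracy = (TP + TN) / (TP + TN + FP + FN) # fraction of correct predictions
```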

29
Q

AUC meaning

A

It is the area under the ROC curve, obtained by varying the classification threshold (cut-off) and plotting the true positive rate against the false positive rate.

30
Q

How to improve a rule-based model (improving the accuracy and AUC)?

A
  • Increase the number of permutations in MCFS
  • Increase the number of samples in both groups
  • Change the reducer, e.g. Johnson to genetic
  • Decrease or increase the number of features
  • Detect and remove objects that are wrongly classified
31
Q

Explain how you interpret a VisuNet graph. What do the following parameters mean?
- node size
- lines between nodes
- border size

A

Node size tells you the decision and the coverage/support.

Intensity of node color = tells us the relative importance of the feature.

Lines between the nodes tell you how strongly connected the nodes are. Red, thick lines indicate stronger connections.

Border size tells you how many times that feature is included in a rule.

32
Q

Describe a strategy for constructing decision trees.

A

Use a TOP-DOWN approach and construct the tree RECURSIVELY, one split at a time. For each split, the one with the highest gain ratio is chosen. The tree is finished when there is no possible split that reduces the information value further.
* Top-down
* recursive
* one at a time

33
Q

Explain decision trees

A

The root of the tree represents the entire dataset, and the first split from the root is the most important because it gives the most informative division of the data. Further down the tree, other nodes represent smaller splits and divergences in the data. The leaves represent the final classifications, and we can follow the branches from root to leaf to get a sort of “rule” for the classification.

34
Q

What is feature selection and when should we use it?

A

Identify an ordered list of attributes that best discriminates between/among decision classes.
Good for identifying the most important features for classification in very large sets of features.

External feature selection, with for example MCFS, is necessary if the dataset has more features than objects in the universe; it is done to reduce the dimensionality to the features that most affect the classification.

35
Q

MCFS MAIN STEPS

A
  1. Create s SUBSETS of m attributes chosen at random from the original d attributes
  2. s is chosen so that the difference in ranking between ten iterations is small and stable
  3. Divide the subsets into training and test sets t times
  4. For each training set, build a tree classifier
  5. Evaluate on the test set
  6. Calculate the relative importance of each attribute
36
Q

DECISION TREE STEPS

A
  1. Calculate the info of the whole table
  2. Split by attribute and calculate the info of each branch from its class counts, e.g. info([2, 3])
  3. Take the weighted info over all the branches of, e.g., gene 1: a branch with counts 2 and 3 has weight 5 / total, times its info, summed with the other branches
  4. gain = original info minus the weighted info
  5. split_info = the info of the branch sizes (the sum of each [ ])
  6. gain ratio = gain / split_info
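
The steps above can be sketched with the standard entropy formulas; the [2, 3] class counts and the 5/5 table are hypothetical:

```python
# Sketch: info, weighted info, gain, split_info, and gain ratio
# for one candidate split, on hypothetical class counts.
import math

def info(counts):
    """Entropy of a class-count distribution, in bits."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)

# 1. info of the whole table (e.g. 5 yes / 5 no)
info_all = info([5, 5])

# 2-3. split into branches with class counts [2, 3] and [3, 2]
branches = [[2, 3], [3, 2]]
n = sum(sum(b) for b in branches)
weighted = sum(sum(b) / n * info(b) for b in branches)

# 4. gain = original info minus weighted branch info
gain = info_all - weighted

# 5. split_info = info of the branch sizes themselves
split_info = info([sum(b) for b in branches])

# 6. gain ratio
gain_ratio = gain / split_info
```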
37
Q

steps for creating a rule-based model

A
  1. Put aside an external validation set of subject samples
  2. Data preprocessing: remove incomplete data
  3. Feature selection: perform a feature selection to select the most important features and reduce noise
38
Q

advantages and disadvantages of MCFS

A

Advantages:
preservation of the features
ranking of the features and their statistical significance
little feature shadowing

Disadvantages:
not possible to explain variability in the data
computationally expensive

39
Q

odd distribution of objects, e.g. 20 cases and 2 controls

A

Undersampling

40
Q

discretization: when should it be performed?

A

before the split into test and training sets

41
Q

genetic algorithm

A

uses evolution-inspired search and keeps looking for better solutions

42
Q

Boolean expression law used in simplification

A

(A ∨ B) ∧ A = A (the absorption law)
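
The law can be checked exhaustively with plain Python booleans:

```python
# A quick truth-table check of the absorption law (A or B) and A == A.
from itertools import product

ok = all(((A or B) and A) == A for A, B in product([False, True], repeat=2))
print(ok)
```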

43
Q

technique used in MCFS to find a cutoff

A

permutation test

44
Q

resources to interpret gene expression levels

A

Ensembl, Gene Ontology (GO), KEGG