Final Exam Flashcards

1
Q

What are the 3 normalization schemes used in class?

A

min-max

z-score

decimal scaling
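A minimal NumPy sketch of all three schemes (the attribute values and the decimal-scaling exponent j are illustrative assumptions):

  import numpy as np

  x = np.array([120.0, 250.0, 380.0, 990.0])  # hypothetical attribute values

  # min-max: rescale to [0, 1]
  minmax = (x - x.min()) / (x.max() - x.min())

  # z-score: zero mean, unit standard deviation
  zscore = (x - x.mean()) / x.std()

  # decimal scaling: divide by 10^j, where j is the smallest integer
  # such that max(|x|) / 10^j < 1 (here j = 3)
  j = int(np.ceil(np.log10(np.abs(x).max())))
  decimal = x / 10 ** j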

2
Q

What are the 3 preprocessing techniques?

A
  1. Feature selection
  2. Dimensionality reduction
  3. Normalization
3
Q

What is spatial autocorrelation?

A

objects that are physically close tend to be similar

4
Q

What are the 3 types of data sets?

A

Record

  1. data matrix
  2. document data
  3. transaction data

Graph

  1. web data
  2. molecular structures

Ordered

  1. spatial
  2. temporal
  3. sequence
  4. sequential
5
Q

What are the two different types of mappings?

A

1 of n

n of m

6
Q

explain the 1 of n mapping:

A

create 1 new attribute for every ordinal value

7
Q

explain the n of m mapping

A

create 1 new attribute with a unique representation for each ordinal value
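A sketch of both mappings for a hypothetical ordinal attribute with values low/medium/high; the bit pattern used for the n of m mapping is one possible unique representation, not the only one:

  import numpy as np

  values = ["low", "medium", "high"]            # hypothetical ordinal values
  index = {v: i for i, v in enumerate(values)}

  # 1 of n: one new binary attribute per ordinal value (one-hot)
  def one_of_n(v):
      vec = np.zeros(len(values), dtype=int)
      vec[index[v]] = 1
      return vec

  # n of m: a unique m-bit pattern per ordinal value (plain binary codes)
  m = int(np.ceil(np.log2(len(values))))
  def n_of_m(v):
      return np.array([(index[v] >> b) & 1 for b in range(m)])

  print(one_of_n("medium"))  # [0 1 0]
  print(n_of_m("high"))      # [0 1]  (index 2 in binary, low bit first)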

8
Q

What are the 3 types of missing values?

A

Missing completely at random

  • scratch random items out
  • can substitute the mean, but this affects the variance

Missing at Random

  • value missing is due to value of another variable

Non-Ignorable Data

  • Value missing is due to limitations of measuring device
9
Q

What are 4 sampling schemes?

A
  1. Simple Random Sampling: each object is selected with equal probability
  2. Sampling with replacement: do not remove the object when sample is selected
  3. Sampling without replacement: remove the object when sample is selected
  4. Stratified sampling: split the data into several partitions and sample randomly from each one
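A NumPy sketch of the four schemes on toy data (the object ids and strata labels are invented for illustration):

  import numpy as np

  rng = np.random.default_rng(0)
  data = np.arange(100)     # hypothetical object ids
  strata = data % 4         # hypothetical partition labels

  # simple random sampling without replacement: each object selected
  # with equal probability and then removed
  srs = rng.choice(data, size=10, replace=False)

  # sampling with replacement: objects are not removed once selected
  with_repl = rng.choice(data, size=10, replace=True)

  # stratified sampling: sample randomly from each partition
  stratified = np.concatenate(
      [rng.choice(data[strata == s], size=3, replace=False)
       for s in np.unique(strata)])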
10
Q

What is the difference between a training and test set?

A

Training set is used to build a model.

Test set is used to test a model.

11
Q

Why is sampling important with respect to a training and test set?

A

We want to reduce the amount of bias.

12
Q

What is accuracy?

A

Used to compare the performance of models.

# of correct predictions / # of predictions

(TP + TN) / (TP + TN + FP + FN)

13
Q

What is True Positive Rate / sensitivity?

A

fraction of positive examples predicted correctly by the classifier

TPR = TP / (TP + FN)

14
Q

What is the True Negative Rate / specificity?

A

fraction of negative examples predicted correctly by the classifier

TNR = TN / (TN + FP)

15
Q

What is False Positive Rate?

A

fraction of negative examples predicted as positive class

FPR = FP / (FP + TN)

16
Q

What is False Negative Rate?

A

fraction of positive examples predicted as negative class

FNR = FN / (FN + TP)

17
Q

What is precision?

A

The fraction of examples declared positive that are truly positive

p = TP / (TP + FP)

18
Q

What is recall?

A

the fraction of positive examples correctly predicted by the classifier

r = TP / (TP + FN)

19
Q

What is F-measure?

A

summarizes precision and recall

F = 2TP / (2TP + FP + FN)
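All of the measures from cards 12–19 fall out of the four confusion-matrix counts; a quick sketch with invented counts:

  # hypothetical confusion-matrix counts
  TP, TN, FP, FN = 40, 45, 5, 10

  accuracy  = (TP + TN) / (TP + TN + FP + FN)   # 0.85
  tpr       = TP / (TP + FN)                    # sensitivity / recall
  tnr       = TN / (TN + FP)                    # specificity
  fpr       = FP / (FP + TN)
  fnr       = FN / (FN + TP)
  precision = TP / (TP + FP)
  f_measure = 2 * TP / (2 * TP + FP + FN)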

20
Q

What is the Apriori principle?

A

if an itemset is frequent, then all of its subsets are frequent. Conversely, if an itemset is infrequent, then all of its supersets must be infrequent too.

21
Q

Recite the Apriori algorithm.

A

Let k = 1

generate frequent itemsets of length 1

Repeat until no new frequent itemsets are identified:

  1. generate (k+1) candidate itemsets from the frequent k-itemsets
  2. prune candidate itemsets containing subsets of length k that are infrequent
  3. count the support of each candidate by scanning the DB
  4. eliminate candidates that are infrequent
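A compact sketch of that loop on invented transactions (a real implementation would count support with a hash tree, as card 27 notes, rather than rescanning with a Python loop):

  from itertools import combinations

  transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"},
                  {"b", "c"}, {"a", "b", "c"}]
  minsup = 3  # hypothetical absolute support threshold

  def support(itemset):
      return sum(itemset <= t for t in transactions)

  # k = 1: frequent 1-itemsets
  items = {i for t in transactions for i in t}
  freq = [{frozenset([i]) for i in items if support(frozenset([i])) >= minsup}]

  while freq[-1]:
      # generate (k+1)-candidates by joining frequent k-itemsets
      cands = {a | b for a in freq[-1] for b in freq[-1]
               if len(a | b) == len(a) + 1}
      # prune candidates that contain an infrequent k-subset
      cands = {c for c in cands
               if all(frozenset(s) in freq[-1]
                      for s in combinations(c, len(c) - 1))}
      # count support and eliminate infrequent candidates
      freq.append({c for c in cands if support(c) >= minsup})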
22
Q

What is a frequent itemset?

A

An itemset that meets the minsup threshold.

23
Q

What is a maximal frequent itemset?

A

An itemset is maximal frequent if none of its immediate supersets is frequent

24
Q

What is a closed itemset?

A

An itemset is closed if none of its immediate supersets has the same support as the itemset.

It’s a compressed representation of support

25
Q

How is a ROC curve depicted?

A

A graphical chart with TP rate on the y-axis and FP rate on the x-axis.

Each point on the curve corresponds to a model induced by the classifier.

26
Q

What 2 things is a ROC curve useful for?

A
  1. Picking the best model after repeatedly adjusting a specific parameter (the threshold t)
  2. Comparing the relative performance among classifiers
27
Q

In the Apriori frequent itemset generation algorithm, what is a hash tree used for?

A

Determining whether each enumerated k-itemset corresponds to an existing candidate itemset.

28
Q

Describe an FP-tree used by the FP-growth algorithm for frequent itemset generation

A

Reads the data set one transaction at a time.

Maps each transaction onto a path in the FP-tree.

Paths may overlap when transactions share items.

The more the paths overlap, the more compression is achieved.

This sometimes makes the tree small enough to fit into main memory.

29
Q

What are the two data structures used in the Apriori algorithm and the FP-growth algorithm?

A

FP tree

Hash tree

30
Q

How does the FP growth algorithm mine frequent itemsets?

A

It uses a recursive divide-and-conquer approach.

It uses pointers to assist frequent itemset generation.

It requires preprocessing.

31
Q

What is the Mahalanobis distance?

A
  • weights the difference between two points by the covariance of the attributes, computed from deviations around the mean
  • an alternative to normalization (scaling is built in)
  • takes into account the spread of the data in each direction
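A sketch of the usual formula, d(x, y) = sqrt((x - y)^T S^-1 (x - y)) with S the covariance matrix of the data (the correlated toy data is invented):

  import numpy as np

  rng = np.random.default_rng(0)
  X = rng.normal(size=(200, 2)) @ np.array([[2.0, 0.5],
                                            [0.0, 1.0]])  # correlated toy data

  cov_inv = np.linalg.inv(np.cov(X, rowvar=False))

  def mahalanobis(x, y):
      d = x - y
      return np.sqrt(d @ cov_inv @ d)  # scaling by the spread is built in

  print(mahalanobis(X[0], X.mean(axis=0)))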
32
Q

What is the Minkowski Distance?

A
  • a generalization of the Euclidean distance
  • it’s the same as the Euclidean distance, but with a parameter r in place of 2
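A one-function sketch; r = 1 gives the Manhattan distance and r = 2 recovers the Euclidean distance (the vectors are invented):

  import numpy as np

  def minkowski(x, y, r):
      return np.sum(np.abs(x - y) ** r) ** (1.0 / r)

  x, y = np.array([1.0, 2.0, 3.0]), np.array([4.0, 6.0, 3.0])
  print(minkowski(x, y, 1))  # 7.0 (Manhattan)
  print(minkowski(x, y, 2))  # 5.0 (Euclidean)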
33
Q

What measures can be used to deal with attributes that have different ranges?

A
  1. Normalization techniques
  • min-max
  • z-score
  • decimal scaling
  2. Choosing an algorithm that is not affected by different ranges (e.g. decision trees)
  3. Mahalanobis distance
  4. PCA
    - captures the largest variation and reduces dimensions
34
Q

What are the 3 types of outliers?

A
  1. Point: just a point on a 1D graph
  2. Contextual: depends on the context that you are looking at
  3. Collective: an outlier that exists as a sequence
35
Q

What are the 3 techniques for outlier detection?

A
  1. Statistical (Grubbs’ test & linear regression)
  2. Density based (DBscan)
  3. Proximity based (KNN)
36
Q

What are 2 methods to evaluate a model?

A

Out-of-bag estimation

Cross validation

37
Q

What is bagging with respect to data mining and ensemble methods?

A

Bagging (bootstrap aggregating) is what out-of-bag estimation builds on.

Sample repeatedly with replacement from the original data to create new training sets.

It reduces variance and helps to avoid overfitting.
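A sketch of one bagging round, using index arrays as a stand-in for training records; the out-of-bag set is simply whatever the bootstrap sample missed:

  import numpy as np

  rng = np.random.default_rng(0)
  n = 10
  data = np.arange(n)  # hypothetical training-record indices

  # one bootstrap sample: draw n records with replacement
  bootstrap = rng.choice(data, size=n, replace=True)

  # records never drawn form the out-of-bag set for this round
  oob = np.setdiff1d(data, bootstrap)
  print(bootstrap, oob)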

38
Q

What is boosting with respect to data mining and ensemble methods?

A
  • Place more emphasis on specific examples that are difficult to classify: assign them a higher weight, i.e. a greater probability of being selected.
  • Records that are wrongly classified will have their weights increased.
  • Records that are classified correctly will have their weights decreased.
39
Q

Describe the AdaBoost method used in ensembles.

A
  • AdaBoost creates many classifiers / models and repeatedly draws samples.
  • Samples that are easy to classify get a lower weight, and ones that are harder to classify get a higher weight.
  • If any intermediate round produces an error rate higher than 50%, the weights are reverted and the resampling procedure is repeated.
  • The classifier itself also gets a weight (see the sketch below).
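A sketch of a single AdaBoost weight update under those rules; the mask of misclassified records and the uniform starting weights are invented stand-ins for a real training round:

  import numpy as np

  # hypothetical: which records this round's classifier got wrong
  wrong = np.array([False, True, False, False, True, False])
  w = np.full(len(wrong), 1.0 / len(wrong))  # uniform starting weights

  eps = w[wrong].sum()                       # weighted error rate
  alpha = 0.5 * np.log((1 - eps) / eps)      # this classifier's weight

  # increase weights of wrongly classified records, decrease the rest
  w = w * np.where(wrong, np.exp(alpha), np.exp(-alpha))
  w /= w.sum()                               # renormalize to a distribution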
40
Q

What conditions must be satisfied for ensemble methods to improve the overall classification accuracy?

A

1) the different classifiers make different mistakes on the data
2) the different classifiers perform better than random guessing

41
Q

What are the 3 components of the curse of dimensionality?

A
  1. Runtime: if the algorithm does not scale at most linearly with the # of attributes, the runtime increases too quickly
  2. Amount of data: the number of samples needed to cover the space with equal density grows exponentially with the number of dimensions
  3. Distances: distances between the data points become meaningless. The maximum distance between data points does not grow linearly with the # of dimensions.
42
Q

What is the formula for the bias / variance tradeoff?

A

Mean Squared Error = bias² + variance

43
Q

How is a ROC curve constructed?

A

Sort the threshold values from lowest to highest over all of the samples.

Plot the TPR vs. the FPR at each threshold.
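A sketch of the construction on invented scores and labels; each threshold yields one (FPR, TPR) point on the curve:

  import numpy as np

  # hypothetical classifier scores and true labels (1 = positive)
  scores = np.array([0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.4, 0.3])
  labels = np.array([1,   1,   0,   1,   1,    0,   0,   0])

  P, N = labels.sum(), (1 - labels).sum()
  points = []
  for t in np.sort(scores):            # sweep the threshold t
      pred = scores >= t
      tpr = (pred & (labels == 1)).sum() / P
      fpr = (pred & (labels == 0)).sum() / N
      points.append((fpr, tpr))        # plot TPR (y) against FPR (x)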

44
Q

What are the 5 common clustering algorithms?

A

K-means

Hierarchical

DBscan

Expectation Maximization

Self Organizing Maps

45
Q

What is expectation maximization (EM)?

A

Represents the data as distributions (usually normal distributions).

Randomly initialize the parameters (mean and standard deviation).

  1. Expectation: calculate the likelihood of falling into each distribution based on the parameters & assign the data to distributions
  2. Maximization: find the parameters that best represent the assignments
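One off-the-shelf implementation is scikit-learn’s GaussianMixture, which alternates the E and M steps inside fit(); the two-component 1-D toy data is invented:

  import numpy as np
  from sklearn.mixture import GaussianMixture

  rng = np.random.default_rng(0)
  X = np.concatenate([rng.normal(0, 1, 100),
                      rng.normal(5, 1, 100)]).reshape(-1, 1)

  gm = GaussianMixture(n_components=2, random_state=0).fit(X)
  print(gm.means_.ravel())  # recovered means, roughly 0 and 5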
46
Q

Describe the characteristics of the K-means algorithm

A
  1. deterministic? no (different results each time)
  2. susceptible to falling into a local minimum? yes
  3. need to specify # of clusters? yes
  4. is noise removed? no
  5. partition-based? yes
  6. can handle varying densities? yes
47
Q

Describe the characteristics of hierarchical clustering

A

deterministic? yes

susceptible to falling into a local minimum? no

need to specify # of clusters? no

is noise removed? yes

partition-based? no

can handle varying densities? yes

48
Q

Describe the characteristics of the DBscan clustering algorithm

A

deterministic? yes

susceptible to falling into a local minimum? no

need to specify # of clusters? no

is noise removed? yes

partition-based? yes

can handle varying densities? no

49
Q

Describe the characteristics of expectation maximization

A

deterministic? no (random component at the start)

susceptible to falling into a local minimum? yes

need to specify # of clusters? no (run multiple times)

is noise removed? no

partition-based? yes

can handle varying densities? not sure…

50
Q

Describe the characteristics of self-organizing maps

A

deterministic? no (the map is randomly initialized)

susceptible to falling into a local minimum? yes

need to specify # of clusters?

is noise removed? no

partition-based? not sure…

can handle varying densities? not sure…

51
Q

What are self-organizing maps?

A

An effective clustering algorithm

An abstraction of the input data

52
Q

Describe how self-organizing maps work

A

Randomly initialize the map.

Repeat:

compare a sample to every prototype in the map

select the prototype that is most similar (the winner)

update the winner and its neighbourhood to be more similar to the sample
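A minimal sketch of that loop with a fixed learning rate and a one-cell neighbourhood; real SOMs shrink both over time:

  import numpy as np

  rng = np.random.default_rng(0)
  X = rng.random((100, 2))        # toy input data
  proto = rng.random((5, 5, 2))   # randomly initialized 5x5 map of prototypes
  lr = 0.5                        # illustrative learning rate

  for x in X:
      # select the prototype most similar to the sample (the winner)
      d = np.linalg.norm(proto - x, axis=2)
      wi, wj = np.unravel_index(d.argmin(), d.shape)
      # update the winner and its neighbourhood toward the sample
      for i in range(max(0, wi - 1), min(5, wi + 2)):
          for j in range(max(0, wj - 1), min(5, wj + 2)):
              proto[i, j] += lr * (x - proto[i, j])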

53
Q

What is Principal Component Analysis?

A

It’s a preprocessing technique that maps the data to a lower-dimensional space.

It captures the largest variation.

It reduces noise and dimensions.

“knee in curve” (used to choose how many components to keep)
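A scikit-learn sketch on invented data; the cumulative explained-variance curve is where you look for the “knee” when choosing how many components to keep:

  import numpy as np
  from sklearn.decomposition import PCA

  rng = np.random.default_rng(0)
  X = rng.normal(size=(200, 10))  # toy high-dimensional data

  pca = PCA().fit(X)
  # components are ordered by captured variance, largest first
  print(np.cumsum(pca.explained_variance_ratio_))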