Final Exam Flashcards
What are the 3 normalization schemes used in class?
min-max
z-score
decimal scaling
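A minimal Python sketch of the three schemes. It assumes z-score uses the population standard deviation (some courses divide by n - 1 instead). It also assumes the values are not all equal, because min-max and z-score would otherwise divide by zero. The function names are illustrative, not from the course.

```python
import math

def min_max(values, new_min=0.0, new_max=1.0):
    """Linearly rescale so the smallest value maps to new_min and the largest to new_max."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]

def z_score(values):
    """Center on the mean and divide by the (population) standard deviation."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / std for v in values]

def decimal_scaling(values):
    """Divide by 10^j, where j is the smallest integer with max(|v / 10^j|) < 1."""
    max_abs = max(abs(v) for v in values)
    j = 0
    while max_abs / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in values]

data = [200, 300, 400, 600, 1000]
print(min_max(data))          # [0.0, 0.125, 0.25, 0.5, 1.0]
print(decimal_scaling(data))  # [0.02, 0.03, 0.04, 0.06, 0.1]
```

Decimal scaling picks j = 4 here, because dividing 1000 by 10^3 gives exactly 1, which is not below 1.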
What are the 3 preprocessing techniques?
- Feature selection
- Dimensionality reduction
- Normalization
What is spatial autocorrelation?
objects that are physically close tend to be similar
What are the 3 types of data sets?
Record
- data matrix
- document data
- transaction data
Graph
- web data
- molecular structures
Ordered
- spatial
- temporal
- sequence
- sequential
What are the two different types of mappings for ordinal attributes?
1 of n
n of m
explain the 1 of n mapping:
create 1 new attribute for every ordinal value
explain the n of m mapping
create m new binary attributes and represent each ordinal value by a unique combination of them (n of the m set to 1)
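A sketch of both mappings in Python. It assumes the values arrive as an ordered list of distinct categories. For n of m, it assumes each value gets a distinct pattern of exactly n ones across m binary attributes, with the patterns taken from itertools.combinations. The helper names are hypothetical.

```python
from itertools import combinations

def one_of_n(categories):
    """1 of n: one binary attribute per value; only that value's attribute is 1."""
    return {c: [1 if c == other else 0 for other in categories] for c in categories}

def n_of_m(categories, n, m):
    """n of m (assumed form): m binary attributes; each value gets a distinct
    pattern with exactly n of them set to 1."""
    patterns = list(combinations(range(m), n))  # every way to pick n of the m positions
    if len(patterns) < len(categories):
        raise ValueError("C(m, n) must be at least the number of values")
    return {
        c: [1 if i in ones else 0 for i in range(m)]
        for c, ones in zip(categories, patterns)
    }

grades = ["F", "D", "C", "B", "A"]
print(one_of_n(grades)["C"])  # [0, 0, 1, 0, 0]
print(n_of_m(grades, n=2, m=4))
```

With 5 values, 1 of n needs 5 attributes, but 2 of 4 needs only 4, since C(4, 2) = 6 >= 5.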
What are the 3 types of missing values?
Missing completely at random
- as if random entries were scratched out; missingness is unrelated to any value
- can substitute the mean, but this shrinks the variance
Missing at Random
- whether a value is missing depends on the value of another variable
Non-Ignorable Data (Missing Not at Random)
- value is missing due to limitations of the measuring device (missingness depends on the unobserved value itself)
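A small simulation can check how mean substitution affects variance. This sketch uses made-up Gaussian data and an assumed 30% MCAR deletion rate. Filling the gaps with the observed mean leaves the mean unchanged. However, it scales the population variance down by the factor (observed count / total count), because the filled-in entries add zero squared deviation.

```python
import random
import statistics

random.seed(0)
values = [random.gauss(50, 10) for _ in range(1000)]  # hypothetical complete data

# MCAR: each entry is deleted with probability 0.3, independent of every value
observed = [v if random.random() >= 0.3 else None for v in values]

present = [v for v in observed if v is not None]
fill = statistics.mean(present)
imputed = [fill if v is None else v for v in observed]

print(len(present), "of", len(observed), "values observed")
print("variance of observed values:", statistics.pvariance(present))
print("variance after mean imputation:", statistics.pvariance(imputed))
```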
What are 4 sampling schemes?
- Simple Random Sampling: each object is selected with equal probability
- Sampling with replacement: do not remove the object from the population when it is selected
- Sampling without replacement: remove the object from the population when it is selected
- Stratified sampling: split the data into several partitions and sample randomly from each one
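The four schemes map onto Python's standard random module: random.sample draws without replacement and random.choices draws with replacement. The stratified helper below is a hypothetical sketch that draws the same number of objects from every partition. Allocating in proportion to partition size is a common alternative.

```python
import random

def simple_random_sample(data, k, replace=False):
    """Every object has the same chance of being picked on each draw."""
    if replace:
        return random.choices(data, k=k)  # picked object stays in the pool
    return random.sample(data, k)         # picked object leaves the pool

def stratified_sample(data, key, k_per_stratum):
    """Partition data by key(x), then draw a simple random sample from each partition."""
    strata = {}
    for x in data:
        strata.setdefault(key(x), []).append(x)
    sample = []
    for group in strata.values():
        sample.extend(random.sample(group, min(k_per_stratum, len(group))))
    return sample

random.seed(1)
print(simple_random_sample(list(range(10)), 5))
print(simple_random_sample(list(range(10)), 5, replace=True))  # repeats are possible
records = [("spam", i) for i in range(8)] + [("ham", i) for i in range(2)]
print(stratified_sample(records, key=lambda r: r[0], k_per_stratum=2))
```

Stratifying guarantees the rare "ham" class appears in the sample. A simple random sample of 4 from these 10 records could miss it entirely.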
What is the difference between a training and test set?
Training set is used to build a model.
Test set is used to test a model.
Why is sampling important with respect to a training and test set?
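We want to reduce the amount of bias: random sampling keeps both sets representative of the full data, so the model and its measured performance are not skewed by how the data happened to be ordered or collected.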
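A common way to get an unbiased split is a random holdout: shuffle, then cut. The helper name holdout_split and the 30% test fraction are assumptions for illustration. scikit-learn's train_test_split offers the same idea with more options.

```python
import random

def holdout_split(records, test_fraction=0.3, seed=None):
    """Randomly partition records into disjoint (train, test) lists."""
    rng = random.Random(seed)
    shuffled = list(records)  # copy so the caller's data is untouched
    rng.shuffle(shuffled)     # random order removes bias from the original ordering
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

train, test = holdout_split(range(100), test_fraction=0.3, seed=42)
print(len(train), len(test))  # 70 30
```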
What is accuracy?
Used to compare performance of models
# of correct predictions / # of predictions
(TP + TN) / (TP + TN + FP + FN)
What is True Positive Rate / sensitivity?
fraction of positive examples predicted correctly by the classifier
TPR = TP / (TP + FN)
What is the True Negative Rate / specificity?
fraction of negative examples predicted correctly by the classifier
TNR = TN / (TN + FP)
What is False Positive Rate?
fraction of negative examples predicted as positive class
FPR = FP / (FP + TN)
What is False Negative Rate?
fraction of positive examples predicted as negative class
FNR = FN / (FN + TP)
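Accuracy and the four rates all follow from the four confusion-matrix counts. This is a minimal sketch. It assumes binary labels with the positive class encoded as 1, which is a hypothetical convention.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP, FN for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return tp, tn, fp, fn

def rates(tp, tn, fp, fn):
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "TPR": tp / (tp + fn),  # sensitivity
        "TNR": tn / (tn + fp),  # specificity
        "FPR": fp / (fp + tn),  # = 1 - TNR
        "FNR": fn / (fn + tp),  # = 1 - TPR
    }

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
print(confusion_counts(y_true, y_pred))  # (3, 5, 1, 1)
print(rates(*confusion_counts(y_true, y_pred)))
```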
What is precision?
The fraction of examples predicted as positive that are actually positive
p = TP / (TP + FP)
What is recall?
the fraction of positive examples correctly predicted by the classifier
r = TP / (TP + FN)
What is F-measure?
summarizes precision and recall as their harmonic mean
F1 = 2rp / (r + p) = 2TP / (2TP + FP + FN)
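A short check that the harmonic-mean form of F1 agrees with the count form 2TP / (2TP + FP + FN). The counts are made up for illustration, and the sketch assumes TP > 0 so that no denominator is zero.

```python
def precision_recall_f1(tp, fp, fn):
    p = tp / (tp + fp)        # precision
    r = tp / (tp + fn)        # recall
    f1 = 2 * p * r / (p + r)  # harmonic mean of p and r
    return p, r, f1

p, r, f1 = precision_recall_f1(tp=40, fp=5, fn=10)
print(round(p, 3), round(r, 3), round(f1, 3))  # 0.889 0.8 0.842
print(round(2 * 40 / (2 * 40 + 5 + 10), 3))    # 0.842
```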
What is the Apriori principle?
If an itemset is frequent, then all of its subsets must also be frequent. Conversely, if an itemset is infrequent, then all of its supersets must be infrequent too.
Recite the Apriori algorithm.
Let k = 1
generate frequent itemsets of length 1
Repeat until no new frequent itemsets are identified:
- generate (k+1) candidate itemsets from k frequent itemsets
- prune candidate itemsets containing subsets of length k that are infrequent
- count support for each candidate by scanning the DB
- eliminate candidates that are infrequent
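The loop above can be sketched in Python on a toy market-basket data set. The transactions and the minimum support count of 3 are illustrative. This sketch builds candidates of size k + 1 by joining pairs of frequent k-itemsets, which is one of several valid generation strategies. The pruning step applies the Apriori principle directly.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support count} for every itemset with count >= min_support."""
    db = [frozenset(t) for t in transactions]

    def frequent_only(candidates):
        # one scan of the database counts support for every candidate
        counts = {c: 0 for c in candidates}
        for t in db:
            for c in counts:
                if c <= t:
                    counts[c] += 1
        # eliminate candidates that are infrequent
        return {c: n for c, n in counts.items() if n >= min_support}

    k = 1
    frequent = frequent_only({frozenset([item]) for t in db for item in t})
    result = dict(frequent)
    while frequent:  # repeat until no new frequent itemsets are found
        candidates = set()
        for a, b in combinations(frequent, 2):
            union = a | b
            # generate: keep unions of exactly k + 1 items
            # prune: drop any candidate that has an infrequent k-subset
            if len(union) == k + 1 and all(
                frozenset(s) in frequent for s in combinations(union, k)
            ):
                candidates.add(union)
        frequent = frequent_only(candidates)
        result.update(frequent)
        k += 1
    return result

transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]
for itemset, count in apriori(transactions, 3).items():
    print(sorted(itemset), count)
```

With these transactions, {Bread, Milk, Diaper} survives pruning because all its 2-subsets are frequent. It appears in only 2 transactions, though, so it is eliminated at the counting step and the loop stops.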