Final Exam Flashcards
What are the 3 normalization schemes used in class?
min-max
z-score
decimal scaling
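A minimal Python/NumPy sketch of the three schemes (the sample values and the [0, 1] target range for min-max are illustrative assumptions, not from class):

```python
import numpy as np

def min_max(x, new_min=0.0, new_max=1.0):
    # rescale values linearly into [new_min, new_max]
    return (x - x.min()) / (x.max() - x.min()) * (new_max - new_min) + new_min

def z_score(x):
    # center on the mean and scale by the standard deviation
    return (x - x.mean()) / x.std()

def decimal_scaling(x):
    # divide by 10^j, where j is the smallest power such that max |value| < 1
    j = int(np.ceil(np.log10(np.abs(x).max())))
    return x / (10 ** j)

x = np.array([120.0, 55.0, 760.0, 310.0])
print(min_max(x), z_score(x), decimal_scaling(x), sep="\n")
```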
What are the 3 preprocessing techniques?
- Feature selection
- Dimensionality reduction
- Normalization
What is spatial autocorrelation?
objects that are physically close tend to be similar
What are the 3 types of data sets?
Record
- data matrix
- document data
- transaction data
Graph
- web data
- molecular structures
Ordered
- spatial
- temporal
- sequence
- sequential
What are the two different types of mappings?
1 of n
n of m
explain the 1 of n mapping:
create 1 new binary attribute for every ordinal value
explain the n of m mapping:
create new attributes that together form a unique representation for each ordinal value
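A small sketch of the 1-of-n mapping (one-hot style); the ordinal attribute and its values are made up for illustration:

```python
def one_of_n(values):
    # 1-of-n: one new binary attribute per distinct ordinal value,
    # exactly one of them set to 1 for each record
    levels = sorted(set(values))
    return [[1 if v == level else 0 for level in levels] for v in values]

sizes = ["small", "medium", "large", "medium"]
print(one_of_n(sizes))   # 3 new attributes (large, medium, small), one 1 per row
```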
What are the 3 types of missing values?
Missing completely at random
- can simply remove (scratch out) the affected items
- can substitute the mean, but this affects the variance
Missing at Random
- the value is missing due to the value of another variable
Non-Ignorable Data
- the value is missing due to limitations of the measuring device
What are 4 sampling schemes?
- Simple Random Sampling: each object is selected with equal probability
- Sampling with replacement: do not remove the object when sample is selected
- Sampling without replacement: remove the object when sample is selected
- Stratified sampling: split the data into several partitions and sample randomly from each one
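A sketch of the four schemes using Python's standard `random` module; the toy data and the strata are assumptions:

```python
import random

data = list(range(20))   # toy data set
k = 6

# simple random sampling (here without replacement): every object equally likely
simple = random.sample(data, k)

# sampling with replacement: the same object can be drawn more than once
with_repl = [random.choice(data) for _ in range(k)]

# sampling without replacement: a drawn object is removed from the pool
without_repl = random.sample(data, k)

# stratified sampling: partition the data, then sample randomly from each partition
strata = {"low": data[:10], "high": data[10:]}
stratified = {name: random.sample(part, k // 2) for name, part in strata.items()}

print(simple, with_repl, without_repl, stratified, sep="\n")
```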
What is the difference between a training and test set?
Training set is used to build a model.
Test set is used to test a model.
Why is sampling important with respect to a training and test set?
We want to reduce the amount of bias.
What is accuracy?
Used to compare performance of models
# of correct predictions / # of predictions
(TP + TN) / (TP + TN + FP + FN)
What is True Positive Rate / sensitivity?
fraction of positive examples predicted correctly by the classifier
TPR = TP / (TP + FN)
What is the True Negative Rate / specificity?
fraction of negative examples predicted correctly by the classifier
TNR = TN / (TN + FP)
What is False Positive Rate?
fraction of negative examples predicted as positive class
FPR = FP / (FP + TN)
What is False Negative Rate?
fraction of positive examples predicted as negative class
FNR = FN / (FN + TP)
What is precision?
The fraction of positive examples out of examples declared as positive
p = TP / (TP + FP)
What is recall?
the fraction of positive examples correctly predicted by the classifier
r = TP / (TP + FN)
What is F-measure?
summarizes precision and recall
2TP / ( 2TP + FP + FN)
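All of the above metrics computed from one confusion matrix, as a quick sanity-check sketch (the counts are made up):

```python
def confusion_metrics(tp, tn, fp, fn):
    # every flashcard metric follows directly from the four confusion-matrix counts
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "TPR (sensitivity / recall)": tp / (tp + fn),
        "TNR (specificity)": tn / (tn + fp),
        "FPR": fp / (fp + tn),
        "FNR": fn / (fn + tp),
        "precision": tp / (tp + fp),
        "F-measure": 2 * tp / (2 * tp + fp + fn),
    }

print(confusion_metrics(tp=40, tn=45, fp=5, fn=10))
```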
What is the Apriori principle?
if an itemset is frequent, then all of its subsets are frequent. Conversely, if an itemset is infrequent, then all of its supersets must be infrequent too.
Recite the Apriori algorithm?
Let k = 1
generate frequent itemsets of length 1
Repeat:
- generate (k+1) candidate itemsets from k frequent itemsets
- prune candidate itemsets containing subsets of length k that are infrequent
- count support for each candidate by scanning the DB
- eliminate candidates that are infrequent
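A compact Python sketch of the loop above (not an efficient implementation: no hash tree, just a linear scan of the transactions; the toy database and minsup are assumptions):

```python
from itertools import combinations

def apriori(transactions, minsup):
    # frequent itemsets of length 1
    items = {i for t in transactions for i in t}
    freq = [frozenset([i]) for i in items
            if sum(i in t for t in transactions) >= minsup]
    all_frequent = list(freq)
    k = 1
    while freq:
        # generate (k+1)-candidates by joining frequent k-itemsets
        candidates = {a | b for a in freq for b in freq if len(a | b) == k + 1}
        # prune candidates containing an infrequent length-k subset
        candidates = [c for c in candidates
                      if all(frozenset(s) in freq for s in combinations(c, k))]
        # count support by scanning the DB, eliminate infrequent candidates
        freq = [c for c in candidates
                if sum(c <= set(t) for t in transactions) >= minsup]
        all_frequent.extend(freq)
        k += 1
    return all_frequent

db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
print(apriori(db, minsup=3))
```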
What is a frequent itemset?
An itemset that meets the minsup threshold.
What is a maximal frequent itemset?
An itemset is maximal frequent if none of its immediate supersets is frequent
What is a closed itemset?
An itemset is closed if none of its immediate supersets has the same support as the itemset.
It’s a compressed representation of support
How is a ROC curve depicted?
A graphical chart with TP rate on y axis and FP rate on x axis.
Each point corresponds to a model induced by the classifier.
What 2 things is a ROC curve useful for?
- Picking the best model after repeatedly adjusting a specific parameter (t)
- Comparing the relative performance among classifiers
In the Apriori frequent itemset generation algorithm, what is a hash tree used for?
Determining if each enumerated k-itemset corresponds to an existing candidate itemset.
Describe an FP tree used by the FP growth algorithm for frequent itemset generation
Reads data set one transaction at a time
Maps each transaction onto a path in the FP-tree
Paths may overlap as transactions are similar
The more paths overlap, the more compression
Sometimes makes tree small enough to fit into main memory
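A bare-bones sketch of the tree structure (insertion only; no header table or mining step, and the transactions are assumed to be already sorted by item support):

```python
class FPNode:
    # one node of an FP-tree: an item, a count, and child nodes
    def __init__(self, item=None):
        self.item, self.count, self.children = item, 0, {}

    def insert(self, transaction):
        # map one (already sorted) transaction onto a path in the tree;
        # shared prefixes reuse existing nodes, which gives the compression
        if not transaction:
            return
        first, rest = transaction[0], transaction[1:]
        child = self.children.setdefault(first, FPNode(first))
        child.count += 1
        child.insert(rest)

root = FPNode()
for t in [["a", "b"], ["a", "b", "c"], ["a", "c"]]:   # items pre-sorted by support
    root.insert(t)
print(root.children["a"].count)   # 3: all three paths share the "a" prefix
```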
What are the two data structures used in the Apriori Algorithm and the FP Growth Algorithm?
FP tree (FP Growth)
Hash tree (Apriori)
How does the FP growth algorithm mine frequent itemsets?
It uses a recursive divide-and-conquer approach
It uses pointers to assist frequent itemset generation.
It requires preprocessing.
What is the Mahalanobis distance?
- measures distance using the inverse covariance matrix of the attributes
- an alternative to normalization (scaling is built in)
- takes into account the spread of the data in each direction
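A sketch of the distance from a point to the mean of a data set, using NumPy and the inverse covariance matrix (the toy data is made up):

```python
import numpy as np

def mahalanobis(x, data):
    # distance from x to the data mean, scaled by the inverse covariance matrix,
    # so attributes with large spread (or strong correlation) count for less
    mu = data.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(data, rowvar=False))
    diff = x - mu
    return float(np.sqrt(diff @ cov_inv @ diff))

data = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0],
                 [4.0, 3.0], [5.0, 5.0], [2.0, 4.0]])
print(mahalanobis(np.array([3.0, 3.0]), data))
```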
What is the Minkowski Distance?
- a generalization of the Euclidean distance
- it’s the same as the Euclidean distance, but with a parameter r in place of the exponent 2
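A one-function sketch; r = 1 and r = 2 are shown as the familiar special cases:

```python
def minkowski(x, y, r):
    # r = 1 gives the Manhattan distance, r = 2 gives the Euclidean distance
    return sum(abs(a - b) ** r for a, b in zip(x, y)) ** (1 / r)

print(minkowski([0, 0], [3, 4], r=2))   # 5.0, the Euclidean case
print(minkowski([0, 0], [3, 4], r=1))   # 7.0, the Manhattan case
```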
What measures can be used to deal with attributes that have different ranges?
- Normalization techniques
- scaling
- min-max
- decimal scaling
- Can choose an algorithm that is not affected by different ranges (decision trees)
- Mahalanobis distance
- PCA
- captures largest variation and reduces dimensions
What are the 3 types of outliers?
- Point: a single point that is anomalous on its own
- Contextual: depends on the context that you are looking at
- Collective: a group of points (e.g., a sequence) that is anomalous together
What are the 3 techniques for outlier detection?
- Statistical (grubbs test & linear regression)
- Density based (DBscan)
- Proximity based (KNN)
What are 2 methods to evaluate a model?
Out of bag estimation
Cross validation
What is bagging with respect to data mining and ensemble methods?
Bagging (bootstrap aggregating) is what out-of-bag estimation is built on.
Sample repeatedly with replacement from the original data to create new training sets.
Reduces variance and helps to avoid overfitting.
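A sketch of only the resampling step of bagging (each bootstrap sample would then train one base model; model training and voting are left out):

```python
import random

def bootstrap_samples(data, n_models):
    # each base model gets a sample drawn with replacement, the same size as the data
    return [[random.choice(data) for _ in data] for _ in range(n_models)]

data = list(range(10))
for sample in bootstrap_samples(data, n_models=3):
    print(sorted(sample))   # repeated values show the "with replacement" part
```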
What is boosting with respect to data mining and ensemble methods?
- Put more emphasis on specific examples that are difficult to classify: assign them a higher weight, and thus a greater probability of being selected.
- Records that are wrongly classified will have their weights increased.
- Records that are classified correctly will have their weights decreased.
Describe the AdaBoost method used in ensembles.
- AdaBoost creates many classifiers / models by repeatedly drawing bootstrap samples.
- Samples that are easy to classify get a lower weight, and ones that are harder to classify get a higher weight.
- If any intermediate round produces an error rate higher than 50%, the weights are reverted and the resampling procedure is repeated.
- The classifier also gets a weight.
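A sketch of one round of the weight update described above, in one common AdaBoost form (assumes the error rate is strictly positive; the base classifier itself is not shown):

```python
import math

def adaboost_update(weights, misclassified):
    # misclassified[i] is True when record i was wrongly classified this round
    eps = sum(w for w, wrong in zip(weights, misclassified) if wrong)
    if eps > 0.5:
        # error rate above 50%: revert to uniform weights and resample
        n = len(weights)
        return [1.0 / n] * n, None
    alpha = 0.5 * math.log((1 - eps) / eps)   # the weight given to this classifier
    new = [w * math.exp(alpha if wrong else -alpha)
           for w, wrong in zip(weights, misclassified)]
    total = sum(new)                          # normalize so the weights sum to 1
    return [w / total for w in new], alpha

print(adaboost_update([0.25, 0.25, 0.25, 0.25], [True, False, False, False]))
```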
What conditions must be satisfied for ensemble methods to improve the overall classification accuracy?
1) the different classifiers make different mistakes in the data
2) the different classifiers perform better than random guessing
What are the 3 components of the curse of dimensionality?
- Runtime: if the algorithm does not scale at most linearly with the # of attributes, the runtime increases too quickly
- Amount of Data: the number of samples needed to cover the space with equal density grows exponentially with the number of dimensions
- Distances: distances between the data become meaningless. The maximum distance between data points does not grow linearly with the # of dimensions.
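A small NumPy demo of the distances point (uniform random data is an assumption, not a class example): as the number of dimensions grows, the ratio of the farthest to the nearest neighbour shrinks toward 1, so distances stop discriminating.

```python
import numpy as np

rng = np.random.default_rng(0)
for d in (2, 10, 100, 1000):
    points = rng.random((500, d))                            # 500 uniform random points
    dists = np.linalg.norm(points[1:] - points[0], axis=1)   # distances to the first point
    print(d, round(dists.max() / dists.min(), 2))            # ratio approaches 1 as d grows
```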
What is the formula for the bias / variance tradeoff?
Mean Squared Error = bias² + variance
How is a ROC curve constructed?
Sort the threshold values (classifier scores) from lowest to highest for all of the samples.
Plot the TPR vs the FPR at each threshold.
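A sketch of the construction from raw classifier scores (the scores and labels are made up; positives are labelled 1):

```python
def roc_points(scores, labels):
    # each distinct score is used as a threshold; predict positive when score >= threshold
    pos = sum(labels)
    neg = len(labels) - pos
    points = []
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))   # (FPR on x, TPR on y)
    return [(0.0, 0.0)] + points

scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.54]
labels = [1, 1, 0, 1, 0, 0]
print(roc_points(scores, labels))
```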
What are the 5 common clustering algorithms?
K-means
Hierarchical
DBscan
Expectation Maximization
Self Organizing Maps
What is expectation maximization (EM)?
Represents the data as distributions (usually normal distributions)
Randomly initialize the parameters (mean and standard deviation)
- Expectation: calculate the likelihood of each point falling into each distribution based on the parameters & assign the data to the distributions
- Maximization: find the parameters that best represent the assignments
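A compact NumPy sketch of EM for a two-component 1D Gaussian mixture (the synthetic data and the fixed iteration count are assumptions; no convergence check):

```python
import numpy as np

def em_1d(x, n_iter=50):
    # two-component 1D Gaussian mixture; parameters are initialized randomly
    rng = np.random.default_rng(0)
    mu = rng.choice(x, 2)                 # means
    sigma = np.array([x.std(), x.std()])  # standard deviations
    pi = np.array([0.5, 0.5])             # mixing weights
    for _ in range(n_iter):
        # E-step: likelihood of each point under each distribution -> soft assignment
        dens = np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
        resp = dens * pi
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate the parameters that best represent the assignments
        nk = resp.sum(axis=0)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
        pi = nk / len(x)
    return mu, sigma, pi

x = np.concatenate([np.random.default_rng(1).normal(0, 1, 200),
                    np.random.default_rng(2).normal(5, 1, 200)])
print(em_1d(x))
```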
Describe the characteristics of the K-means algorithm
- deterministic? no (different results each time)
- susceptible to falling into a local minimum? yes
- need to specify # of clusters? yes
- is noise removed? no
- partition based? yes
- can handle varying densities? yes
Describe the characteristics of hierarchical clustering
deterministic? yes
susceptible to falling into a local minimum? no
need to specify # of clusters? no
is noise removed? yes
partition based? no
can handle varying densities? yes
Describe the characteristics of the DBscan clustering algorithm
deterministic? yes
susceptible to falling into a local minimum? no
need to specify # of clusters? no
is noise removed? yes
partition based? yes
can handle varying densities? no
Describe the characteristics of expectation maximization
deterministic? no (random component at start)
susceptible to falling into a local minimum? yes
need to specify # of clusters? no (run multiple times)
is noise removed? no
partition based? yes
can handle varying densities? not sure…
Describe the characteristics of self organizing maps
deterministic? no (the map is randomly initialized)
susceptible to falling into a local minimum? yes
need to specify # of clusters?
is noise removed? no
partition based? not sure…
can handle varying densities? not sure…
What are self organizing maps?
An effective clustering algorithm
An abstraction of the input data
Describe how self organizing maps work
Randomly initialize the map
repeat:
compare a sample to every prototype in the map
select the prototype that is most similar (the winner)
update the winner and its neighbourhood to be more similar to the sample
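A minimal sketch of a 1-D map with a fixed learning rate and neighbourhood radius (real SOMs usually decay both over time; the data here is random and just for illustration):

```python
import numpy as np

def train_som(data, n_units=5, n_iter=200, lr=0.3, radius=1):
    # prototypes are randomly initialized; data samples are drawn one at a time
    rng = np.random.default_rng(0)
    protos = rng.random((n_units, data.shape[1]))
    for _ in range(n_iter):
        x = data[rng.integers(len(data))]
        # winner: the prototype most similar to the sample
        winner = np.argmin(np.linalg.norm(protos - x, axis=1))
        # move the winner and its map neighbours toward the sample
        for j in range(n_units):
            if abs(j - winner) <= radius:
                protos[j] += lr * (x - protos[j])
    return protos

data = np.random.default_rng(1).random((100, 2))
print(train_som(data))
```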
What is Principal Component Analysis?
It’s a preprocessing technique that maps the data to a lower dimensional space
It captures the directions of largest variation
reduces noise and dimensions
choose the number of components at the “knee in the curve” of explained variance
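A sketch of PCA with NumPy via the eigenvectors of the covariance matrix (the random data is just for illustration); the explained-variance values are where you would look for the knee:

```python
import numpy as np

def pca(data, n_components=2):
    # center the data, then project onto the directions of largest variance
    centered = data - data.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]           # largest variance first
    components = eigvecs[:, order[:n_components]]
    explained = eigvals[order] / eigvals.sum()  # look for the "knee" in this curve
    return centered @ components, explained

data = np.random.default_rng(0).random((50, 5))
projected, explained = pca(data, n_components=2)
print(projected.shape, explained)
```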