Data mining Flashcards
types of qualitative (categorical) attributes
nominal
ordinal
types of quantitative (numeric) attributes
interval
ratio
Data preprocessing
aggregation
sampling
dimensionality reduction
feature subset selection
feature creation
discretization and binarization
attribute transformation
aggregation
combining two or more attributes (or objects) into a single attribute (or object)
types of sampling
simple random sampling
sampling with replacement
sampling without replacement
stratified sampling
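The sampling types above can be sketched with Python's standard library. This is a minimal illustration, not a reference implementation: the function names, the toy `records` list, and the choice of an equal number of draws per stratum are all assumptions made for the example.

```python
import random
from collections import defaultdict

def sample_with_replacement(items, k, rng):
    # Each draw is independent, so the same item can be picked more than once.
    return [rng.choice(items) for _ in range(k)]

def sample_without_replacement(items, k, rng):
    # Each item can be picked at most once.
    return rng.sample(items, k)

def stratified_sample(items, key, k_per_group, rng):
    # Partition the items by key, then draw a simple random sample
    # (without replacement) from each partition.
    groups = defaultdict(list)
    for item in items:
        groups[key(item)].append(item)
    sample = []
    for label in sorted(groups):
        members = groups[label]
        sample.extend(rng.sample(members, min(k_per_group, len(members))))
    return sample

rng = random.Random(0)  # fixed seed so the run is repeatable
# Hypothetical imbalanced data: 8 records of class "a", 2 of class "b".
records = [("a", i) for i in range(8)] + [("b", i) for i in range(2)]

with_repl = sample_with_replacement(records, 5, rng)
without_repl = sample_without_replacement(records, 5, rng)
strat = stratified_sample(records, key=lambda r: r[0], k_per_group=2, rng=rng)
```

Drawing the same number of records from every stratum keeps the rare class "b" represented; drawing proportionally to stratum size is the other common variant.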
dimensionality reduction
principal component analysis (PCA)
singular value decomposition
feature subset selection
brute-force approach
embedded approach
filter approach
wrapper approach
attribute transformation
standardization
normalization
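A small sketch of the two transformations, assuming "standardization" means the z-score (subtract the mean, divide by the standard deviation) and "normalization" means min-max rescaling to [0, 1]. The cards do not define either term, and usage varies between textbooks, so treat both readings as assumptions. The function names and sample values are illustrative.

```python
from statistics import mean, pstdev

def standardize(values):
    # z-score: the result has mean 0 and (population) standard deviation 1.
    # Assumes the values are not all identical (sigma > 0).
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]

def min_max_normalize(values):
    # Rescale linearly so the smallest value maps to 0 and the largest to 1.
    # Assumes max(values) > min(values).
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
z = standardize(values)
scaled = min_max_normalize(values)
```

Min-max rescaling is driven entirely by the two extreme values, so a single outlier compresses everything else; the z-score is less affected but still uses the mean and standard deviation, which outliers also shift.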
advantage of MIN (single link)
can handle non-elliptical shapes
limitation of MIN (single link)
sensitive to noise and outliers
advantage of MAX (complete link), group average, Ward’s method
less susceptible to noise and outliers
limitation of group average, Ward’s method
biased towards globular clusters
limitation of MAX (complete link)
tends to break large clusters
biased towards globular clusters
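The cards on MIN, MAX, and group average refer to how hierarchical clustering measures the proximity between two clusters. The sketch below computes all three for two toy clusters, assuming Euclidean distance between points. The function names and the sample points are illustrative, and Ward's method is left out because it is based on the increase in squared error rather than on pairwise distances.

```python
from itertools import product
from math import dist

def pairwise(a, b):
    # All Euclidean distances between a point in cluster a and a point in cluster b.
    return [dist(p, q) for p, q in product(a, b)]

def single_link(a, b):
    # MIN: distance between the two closest points of the clusters.
    return min(pairwise(a, b))

def complete_link(a, b):
    # MAX: distance between the two farthest points of the clusters.
    return max(pairwise(a, b))

def group_average(a, b):
    # Mean of all pairwise distances between the clusters.
    d = pairwise(a, b)
    return sum(d) / len(d)

cluster_a = [(0.0, 0.0), (0.0, 1.0)]
cluster_b = [(3.0, 0.0), (4.0, 0.0)]

d_min = single_link(cluster_a, cluster_b)
d_max = complete_link(cluster_a, cluster_b)
d_avg = group_average(cluster_a, cluster_b)
```

Because MIN depends on a single closest pair, one noisy point lying between two clusters can make them merge early, which matches the sensitivity to noise and outliers listed above. MAX and group average take more of each cluster into account.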
4 advantages of using decision trees
inexpensive to construct
extremely fast for classifying unknown records
easy to interpret
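accuracy is comparable to other classification techniques for many simple data sets
4 disadvantages of using decision trees
do not generalize well to certain Boolean functions (e.g., the parity function)
the induction algorithm is greedy
not expressive enough for modeling continuous variables, particularly when each test condition involves only one attribute at a time
tree replication (the same subtree can appear in multiple branches)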
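To make the "greedy" card concrete, here is a sketch of one step of tree induction: picking the threshold on a single continuous attribute that gives the lowest weighted impurity, with no look-ahead to later splits. Gini impurity is an assumption here, since the cards do not name an impurity measure, and the function names and data are illustrative.

```python
from collections import Counter

def gini(labels):
    # Gini impurity: 1 minus the sum of squared class proportions.
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_threshold(values, labels):
    # Greedy step: try the midpoint between each pair of consecutive distinct
    # values and keep the one with the lowest weighted Gini impurity.
    # Returns None if all values are identical (no split is possible).
    pairs = list(zip(values, labels))
    distinct = sorted(set(values))
    best = None
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in pairs if x <= t]
        right = [y for x, y in pairs if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if best is None or score < best[1]:
            best = (t, score)
    return best

values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
labels = ["no", "no", "no", "yes", "yes", "yes"]
threshold, impurity = best_threshold(values, labels)
```

Each split is chosen only by its immediate impurity reduction, so the resulting tree is locally optimal at every node but not guaranteed to be the best tree overall.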