Week 11 - Data Mining (Clustering and Association) Flashcards
What are clustering and association used for?
finding patterns in data when there is no target variable we want to predict
What is clustering?
natural grouping of data points based on their similarities and differences
What is needed to interpret and label clustered data?
human judgment (a person has to interpret the clusters and name them)
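What is involved in the k-means algorithm?
assigns each data point to the cluster whose center is nearest, then recomputes each center as the mean of its points and repeats until the assignments stop changing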
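A minimal sketch of the k-means loop, assuming 2-D points, Euclidean distance, and hand-picked starting centers. The function name `kmeans`, the sample points, and the starting centers are illustrative and not from the course material.

```python
import math

def kmeans(points, centers, max_iters=100):
    """Cluster points around the given initial centers (k = len(centers))."""
    centers = list(centers)
    labels = []
    for _ in range(max_iters):
        # Assignment step: each point goes to the cluster whose center is nearest.
        labels = [
            min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        # Update step: move each center to the mean of its assigned points.
        new_centers = []
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centers.append(centers[c])  # empty cluster: leave its center in place
        if new_centers == centers:
            break  # centers stopped moving, so assignments are stable
        centers = new_centers
    return labels, centers

# Hypothetical data: two visibly separate groups, so k = 2.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.0), (8.0, 8.0), (9.0, 9.0), (8.0, 9.5)]
labels, centers = kmeans(points, centers=[(0.0, 0.0), (10.0, 10.0)])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

The number of starting centers passed in is the k of k-means. Real implementations usually pick the starting centers randomly and run several times, which this sketch leaves out.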
What does k stand for in k-means?
predetermined number of clusters
What are the weaknesses of k-means?
requires k to be chosen in advance and only works with numerical data
What is association?
finds items that frequently occur together (e.g., products often bought in the same transaction)
What is involved in the Apriori algorithm?
finds itemsets (subsets of items) that appear in at least a minimum number of transactions (the support threshold)
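A rough sketch of the Apriori idea, assuming support is counted as the number of transactions containing an itemset. The function name `apriori`, the basket data, and the threshold of 2 are made up for illustration.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: count} for itemsets found in at least min_support transactions."""
    transactions = [frozenset(t) for t in transactions]
    items = {item for t in transactions for item in t}
    candidates = {frozenset([item]) for item in items}
    frequent = {}
    k = 1
    while candidates:
        # Count how many transactions contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-item candidates, then prune any
        # candidate that has an infrequent k-item subset (the Apriori rule).
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent

# Hypothetical shopping baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
]
frequent = apriori(baskets, min_support=2)
print(frequent[frozenset({"bread", "milk"})])  # 2
```

With a threshold of 2, eggs is dropped at the first pass, so no larger itemset containing eggs is ever counted. That pruning is what keeps the search manageable.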
What are the two main data types?
-qualitative
-quantitative
What are the qualitative data types?
-nominal
-ordinal
What are the quantitative data types?
-discrete
-continuous
What is nominal data?
label data that does not have an order
What is ordinal data?
label data that has an order
What is discrete data?
numeric data that takes only integer (whole-number) values, such as counts
What is continuous data?
numeric data that can take any value in a range, including fractions, such as measurements