Intro. & Maths Flashcards
What are n and p?
n: the number of observations or cases in the dataset.
p: the number of variables (features) recorded on each observation.
Together, n and p give the dimensions of the dataset: an n × p data matrix, with one row per observation and one column per variable.
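A minimal sketch of the n × p layout in Python, assuming the data matrix is stored as a list of rows (one row per observation, one column per variable). The numbers are made up for illustration:

```python
# Data matrix stored as a list of rows: each row is one observation,
# each column is one variable. Values are made up for illustration.
X = [
    [5.1, 3.5, 1.4],
    [4.9, 3.0, 1.4],
    [6.2, 2.9, 4.3],
    [5.9, 3.0, 5.1],
]

n = len(X)      # number of observations (rows)
p = len(X[0])   # number of variables (columns)

# Every observation must record the same p variables.
assert all(len(row) == p for row in X)

print(f"n = {n}, p = {p}")  # n = 4, p = 3
```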
What topics are covered in the clustering section of the course?
Dissimilarities
Hierarchical clustering
Partitioning methods
Cluster validation
What is supervised learning?
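Techniques that learn from data whose structure is already known, e.g. the class label of each training observation is given. Classification is supervised; clustering, where no labels are given, is unsupervised.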
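As a hedged illustration of learning from labelled data, here is a minimal k-nearest-neighbours classifier, one of the classification methods listed for this course. It assumes Euclidean distance and a simple majority vote; the function name knn_classify and the toy data are hypothetical:

```python
import math
from collections import Counter

def knn_classify(train_X, train_y, x_new, k=3):
    """Predict the label of x_new by majority vote among its k nearest
    labelled training observations (Euclidean distance)."""
    # Pair each training point's distance to x_new with its known label,
    # then sort from nearest to farthest.
    dists = sorted(
        (math.dist(x, x_new), label) for x, label in zip(train_X, train_y)
    )
    # Majority vote over the labels of the k closest points.
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Labelled training data: the group of each observation is known in advance.
train_X = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]]
train_y = ["A", "A", "A", "B", "B", "B"]

print(knn_classify(train_X, train_y, [1.1, 0.9]))  # A
print(knn_classify(train_X, train_y, [4.9, 5.0]))  # B
```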
What topics are covered in the classification section of the course?
Multivariate normal distributions
Linear and quadratic discriminant analysis
K-nearest neighbors
Why do multivariate analyses not (usually) have response variables?
Many variables are recorded on each observation, and information is gleaned from the dataset as a whole. Usually no single variable is singled out as the response to be explained.
What topics are covered in the multidimensional scaling section of the course?
Classical MDS
Metric MDS
Non-metric MDS
Procrustes analysis
What are the main topics in the course?
Clustering
Classification
Multidimensional scaling
Model-based clustering
PCA and FA
What topics are covered in the model-based clustering section of the course?
Mixture models
Decomposition of covariance matrices
What is PCA?
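Principal components analysis replaces many correlated variables with a smaller number of uncorrelated linear combinations of them (the principal components), chosen to capture as much of the total variance as possible. Correlation between the original variables means they are not independent, so some of their information is redundant.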
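A minimal sketch of PCA for just two variables, assuming the components are the eigenvectors of the sample covariance matrix (divisor n - 1), computed here in closed form rather than with a linear-algebra library. The function names and data are hypothetical. The projected scores on PC1 and PC2 have (numerically) zero covariance, which is the sense in which PCA removes the correlation between variables:

```python
import math

def covariance(u, v):
    """Sample covariance (divisor n - 1) of two equal-length lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (n - 1)

def pca_2d(x1, x2):
    """PCA for two variables: eigen-decompose the 2 x 2 sample covariance
    matrix [[a, b], [b, c]] in closed form. Returns the component variances
    (eigenvalues) and loading vectors (unit eigenvectors)."""
    a, b, c = covariance(x1, x1), covariance(x1, x2), covariance(x2, x2)
    half_gap = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    lam1 = (a + c) / 2 + half_gap   # largest eigenvalue: variance of PC1
    lam2 = (a + c) / 2 - half_gap   # smallest eigenvalue: variance of PC2
    # Eigenvector for lam1 solves (a - lam1) v1 + b v2 = 0 (assumes b != 0).
    v = (b, lam1 - a)
    norm = math.hypot(*v)
    pc1 = (v[0] / norm, v[1] / norm)
    pc2 = (-pc1[1], pc1[0])         # unit vector orthogonal to pc1
    return (lam1, lam2), (pc1, pc2)

def scores(x1, x2, loading):
    """Project the mean-centred observations onto one loading vector."""
    m1, m2 = sum(x1) / len(x1), sum(x2) / len(x2)
    return [loading[0] * (a - m1) + loading[1] * (b - m2) for a, b in zip(x1, x2)]

# Two strongly correlated (made-up) variables.
x1 = [2.5, 0.5, 2.2, 1.9, 3.1, 2.3, 2.0, 1.0, 1.5, 1.1]
x2 = [2.4, 0.7, 2.9, 2.2, 3.0, 2.7, 1.6, 1.1, 1.6, 0.9]

(lam1, lam2), (pc1, pc2) = pca_2d(x1, x2)
z1, z2 = scores(x1, x2, pc1), scores(x1, x2, pc2)

print(f"share of variance on PC1: {lam1 / (lam1 + lam2):.3f}")
print(f"covariance of PC scores: {covariance(z1, z2):.2e}")
```

With strongly correlated inputs like these, PC1 carries most of the total variance, so the second component could be dropped with little loss of information.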
What is FA?
Factor analysis is similar to PCA in that it aims to reduce the number of meaningful features in the dataset. It describes the p observed variables in terms of a smaller number of unobserved (latent) factors: as many factors as features can initially be extracted, but most are discarded because they add no new information to the analysis.
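A small worked sketch, assuming the standard orthogonal factor model x = μ + Λf + ε with Cov(f) = I, a diagonal specific-variance matrix Ψ, and f uncorrelated with ε, which implies Σ = ΛΛᵀ + Ψ. The function name, loadings and uniquenesses below are hypothetical, chosen so that four variables are explained by a single common factor:

```python
def implied_covariance(loadings, uniquenesses):
    """Covariance matrix implied by the orthogonal factor model
    x = mu + L f + e, with Cov(f) = I and Cov(e) = diag(psi):
    Sigma = L L^T + diag(psi)."""
    p = len(loadings)
    sigma = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            # (L L^T)_ij: dot product of rows i and j of the loading matrix.
            sigma[i][j] = sum(a * b for a, b in zip(loadings[i], loadings[j]))
        # The specific variance psi_i only enters the diagonal.
        sigma[i][i] += uniquenesses[i]
    return sigma

# Hypothetical example: p = 4 variables, k = 1 common factor.
L = [[0.9], [0.8], [0.7], [0.6]]
psi = [0.19, 0.36, 0.51, 0.64]   # chosen so each variable has variance 1

Sigma = implied_covariance(L, psi)
for row in Sigma:
    print([round(v, 2) for v in row])
```

All of the off-diagonal covariance comes from the common factor; each diagonal entry splits into a communality (the sum of that variable's squared loadings) plus its uniqueness ψ_j.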
What topics are covered in Principal Components Analysis?
Issues
Interpretation
Mechanics
Solution Validity
What topics are covered by factor analysis?
Rotations
Interpretation
Factor Models
PCA vs FA
Column vectors are used…
For individual observations, x_i
The i-th row of a matrix X is also known as…
x_i^T, the transpose of x_i (a single observation written as a row)
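A minimal sketch of the row/column convention, assuming a matrix stored as a list of rows; the helper transpose and the data are hypothetical. Row i of X is the 1 × p row x_i^T, and transposing it gives the p × 1 column vector x_i:

```python
# Data matrix X (n = 3 observations, p = 2 variables), stored row by row.
X = [
    [1.0, 2.0],
    [3.0, 4.0],
    [5.0, 6.0],
]

def transpose(M):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*M)]

i = 1                   # 0-based index, i.e. the 2nd observation
row_i = [X[i]]          # 1 x p: the i-th row of X, i.e. x_i^T
x_i = transpose(row_i)  # p x 1: the same observation as a column vector x_i

print(x_i)                       # [[3.0], [4.0]]
print(transpose(x_i) == row_i)   # True
```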
Define variance
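For a random variable X with mean μ, Var(X) = E[(X − μ)²], the expected squared deviation from the mean; equivalently, the square of the standard deviation.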
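A short check in Python using the standard-library statistics module: the mean squared deviation from the mean equals the square of the standard deviation. Note that the population formulas divide by n, while the sample versions divide by n - 1; the data are made up:

```python
import math
import statistics

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# Population variance: mean squared deviation from the mean (divisor n).
mean = sum(data) / len(data)
var_pop = sum((x - mean) ** 2 for x in data) / len(data)

# The standard library agrees, and variance is the squared standard deviation.
assert math.isclose(var_pop, statistics.pvariance(data))
assert math.isclose(var_pop, statistics.pstdev(data) ** 2)

# The sample variance uses the divisor n - 1 instead.
var_sample = statistics.variance(data)

print(var_pop, round(var_sample, 4))  # 4.0 4.5714
```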