Intro. & Maths Flashcards

1
Q

What are n and p?

A

n: the number of observations or cases in the dataset.
p: the number of variables or features (parameters) in the dataset.
Together, n and p give the dimensions of the dataset: the data matrix X has n rows and p columns.
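As a small illustration (the numbers are made up), a dataset can be stored as a list of rows, one row per observation, and n and p read off from its shape:

```python
# Hypothetical toy dataset: each inner list is one observation (row),
# each position within a row is one variable (column).
X = [
    [5.1, 3.5, 1.4],
    [4.9, 3.0, 1.4],
    [6.2, 2.9, 4.3],
    [5.9, 3.0, 5.1],
]

n = len(X)      # number of observations (rows)
p = len(X[0])   # number of variables (columns)

print(n, p)  # 4 3
```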

2
Q

What topics are covered in the clustering section of the course?

A

Dissimilarities
Hierarchical clustering
Partitioning methods
Cluster validation

3
Q

What is supervised learning?

A

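Techniques which assume a known structure within the data: each observation comes with a known label or response (for example, its group), and the method learns to predict that label for new observations.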

4
Q

What topics are covered in the classification section of the course?

A

Multivariate normal distributions
Linear and quadratic discriminant analysis
K-nearest neighbors

5
Q

Why do multivariate analyses not (usually) have response variables?

A

Many variables are recorded and information is then gleaned from the dataset as a whole. Usually no single feature is singled out as the quantity being measured or predicted.

6
Q

What topics are covered in the multidimensional scaling section of the course?

A

Classical MDS
Metric MDS
Non-metric MDS
Procrustes analysis

7
Q

What are the main topics in the course?

A

Clustering
Classification
Multidimensional scaling
Model-based clustering
PCA and FA
8
Q

What topics are covered in the model-based clustering section of the course?

A

Mixture models
Decomposition of covariance matrices

9
Q

What is PCA?

A

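Principal components analysis (PCA) replaces many correlated variables with a smaller number of uncorrelated linear combinations (the principal components), ordered by how much of the variance they explain. Correlation between the original variables implies non-independence, which is what makes the reduction possible.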

10
Q

What is FA?

A

Factor analysis (FA) is similar to PCA in that it aims to describe the dataset with fewer meaningful features. It models the observed variables as linear combinations of unobserved (latent) factors plus noise. As many factors as features can be extracted, but most are discarded because they add no new detail to the analysis.

11
Q

What topics are covered in Principal Components Analysis?

A

Issues
Interpretation
Mechanics
Solution Validity

12
Q

What topics are covered by factor analysis?

A

Rotations
Interpretation
Factor Models
PCA vs FA

13
Q

Column vectors are used…

A

For individual observations, x_i

14
Q

The i-th row of a matrix X is also known as…

A

x_i^T, the transpose of x_i (it's a single observation written as a row)

15
Q

Define variance

A

The square of the standard deviation. Equivalently, the mean squared deviation from the mean: Var(X) = E[(X - E[X])^2].
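A minimal sketch of how the two quantities relate, using made-up values and the population form (dividing by N; the sample form divides by n - 1 instead). The standard library's `statistics.pvariance` is used only as a cross-check:

```python
import math
import statistics

# Hypothetical sample values, chosen only for illustration.
x = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

mean = sum(x) / len(x)
# Population variance: mean squared deviation from the mean (divide by N).
variance = sum((xi - mean) ** 2 for xi in x) / len(x)
# Standard deviation: square root of the variance.
std_dev = math.sqrt(variance)

print(variance, std_dev)  # 4.0 2.0
# Cross-check against the standard library's population variance.
print(math.isclose(variance, statistics.pvariance(x)))
```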

16
Q

Define standard deviation

A

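The square root of the variance, i.e. the square root of (the sum of the squared differences from the mean, divided by N). The sample version divides by n - 1 instead of N.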

17
Q

How is covariance calculated?

A

The expectation of the product of the two variables, minus the product of their expectations: Cov(X, Y) = E[XY] - E[X]E[Y].
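A small numerical check of this shortcut against the definition E[(X - E[X])(Y - E[Y])], treating expectations as plain averages over made-up paired values (the population form, an assumption of this sketch):

```python
import math

# Hypothetical paired observations, for illustration only.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 5.0, 9.0]

def mean(values):
    """Average of a list, standing in for the expectation."""
    return sum(values) / len(values)

# Shortcut form: E[XY] - E[X]E[Y]
cov_shortcut = mean([a * b for a, b in zip(x, y)]) - mean(x) * mean(y)

# Definition: E[(X - E[X])(Y - E[Y])]
mx, my = mean(x), mean(y)
cov_definition = mean([(a - mx) * (b - my) for a, b in zip(x, y)])

print(cov_shortcut, cov_definition)  # 2.75 2.75
```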

18
Q

How is correlation calculated?

A

The covariance of the two variables divided by the square root of the product of their variances: Corr(X, Y) = Cov(X, Y) / sqrt(Var(X) Var(Y)).
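The calculation can be sketched as follows. The data and the helper names `covariance` and `correlation` are hypothetical, and the population form (dividing by N) is assumed; the divisor cancels in the ratio, so the sample form gives the same correlation:

```python
import math

def covariance(x, y):
    """Population covariance of two equal-length lists (divide by N)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)

def correlation(x, y):
    """Covariance divided by the square root of the product of the variances."""
    return covariance(x, y) / math.sqrt(covariance(x, x) * covariance(y, y))

# Hypothetical paired observations.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 5.0, 9.0]

r = correlation(x, y)
# Rescaling a variable changes the covariance but not the correlation.
r_rescaled = correlation(x, [10.0 * v for v in y])
print(round(r, 3), round(r_rescaled, 3))
```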

19
Q

What is along the diagonal of a covariance matrix?

A

The variance of each variable.

20
Q

Is the covariance matrix symmetric?

A

Yes.
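Both facts about the covariance matrix (variances on the diagonal, symmetry) can be checked on a toy example. The data matrix and the helper `sample_covariance_matrix` are hypothetical, and the sample form with divisor n - 1 is assumed:

```python
import math
import statistics

# Hypothetical data matrix: n = 4 observations (rows), p = 3 variables (columns).
X = [
    [1.0, 2.0, 0.5],
    [2.0, 4.0, 1.5],
    [3.0, 5.0, 1.0],
    [4.0, 9.0, 3.0],
]

def sample_covariance_matrix(X):
    """Return the p x p sample covariance matrix (divisor n - 1)."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    return [
        [
            sum((row[j] - means[j]) * (row[k] - means[k]) for row in X) / (n - 1)
            for k in range(p)
        ]
        for j in range(p)
    ]

S = sample_covariance_matrix(X)
p = len(S)

# Symmetry: entry (j, k) equals entry (k, j).
is_symmetric = all(
    math.isclose(S[j][k], S[k][j]) for j in range(p) for k in range(p)
)
# Diagonal: entry (j, j) is the sample variance of variable j.
variances = [statistics.variance([row[j] for row in X]) for j in range(p)]
diagonal_is_variance = all(math.isclose(S[j][j], variances[j]) for j in range(p))

print(is_symmetric, diagonal_is_variance)
```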

21
Q

What is along the diagonal of the correlation matrix?

A

1s.

22
Q

How is the correlation matrix calculated quickly from the sample covariance matrix?

A

R = D^(-1/2) S D^(-1/2), where D is the diagonal matrix holding the diagonal of S (the sample variances).
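Because D^(-1/2) is diagonal, the matrix product reduces to dividing each entry of S by the two corresponding standard deviations. A minimal sketch with a made-up covariance matrix:

```python
import math

# Hypothetical 3 x 3 sample covariance matrix S (symmetric, positive diagonal).
S = [
    [4.0, 2.0, 0.6],
    [2.0, 9.0, -1.5],
    [0.6, -1.5, 1.0],
]
p = len(S)

# D^(-1/2) is diagonal with entries 1 / sqrt(s_jj), so
# R[j][k] = S[j][k] / (sqrt(S[j][j]) * sqrt(S[k][k])).
d_inv_sqrt = [1.0 / math.sqrt(S[j][j]) for j in range(p)]
R = [[d_inv_sqrt[j] * S[j][k] * d_inv_sqrt[k] for k in range(p)] for j in range(p)]

for row in R:
    print([round(v, 3) for v in row])
```

The diagonal of R comes out as 1s and the off-diagonal entries are the pairwise correlations.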

23
Q

E[a^T x] = …

A

a^T μ, where μ = E[x]

24
Q

Var[a^T x] = …

A

a^T Σ a, where Σ is the covariance matrix of x

25
Q

For U = a^T x and V = b^T x, Cov(U, V) = …

A

a^T Σ b
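The three linear-combination identities (mean, variance and covariance of a^T x and b^T x) also hold exactly for sample quantities, with the sample mean in place of the population mean and the sample covariance matrix S in place of the population covariance matrix. That makes them easy to check numerically. The data, the weight vectors `a` and `b`, and the helper names below are all hypothetical, and U = a^T x, V = b^T x is assumed:

```python
import math

# Hypothetical data matrix (rows are observations) and weight vectors.
X = [
    [1.0, 2.0, 0.5],
    [2.0, 4.0, 1.5],
    [3.0, 5.0, 1.0],
    [4.0, 9.0, 3.0],
]
a = [1.0, -1.0, 2.0]
b = [0.5, 0.0, 1.0]
n, p = len(X), len(X[0])

def dot(u, v):
    """Inner product u^T v."""
    return sum(ui * vi for ui, vi in zip(u, v))

def cov(u, v):
    """Sample covariance of two equal-length lists (divisor n - 1)."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v)) / (len(u) - 1)

def quad(u, M, v):
    """Bilinear form u^T M v."""
    return sum(u[j] * M[j][k] * v[k] for j in range(len(u)) for k in range(len(v)))

columns = [[row[j] for row in X] for j in range(p)]
x_bar = [sum(col) / n for col in columns]                      # sample mean vector
S = [[cov(columns[j], columns[k]) for k in range(p)] for j in range(p)]

# The linear combinations, evaluated observation by observation.
U = [dot(a, row) for row in X]
V = [dot(b, row) for row in X]

mean_ok = math.isclose(sum(U) / n, dot(a, x_bar))   # mean of a^T x equals a^T x_bar
var_ok = math.isclose(cov(U, U), quad(a, S, a))     # variance of a^T x equals a^T S a
cov_ok = math.isclose(cov(U, V), quad(a, S, b))     # covariance of U and V equals a^T S b
print(mean_ok, var_ok, cov_ok)
```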

26
Q

What is the notation for the covariance matrix?

A

Σ (capital sigma). The sample covariance matrix is written S.

27
Q

What are the three basic dissimilarity properties?

A

d(x,y) >= 0 (non-negativity)
d(x,y) = d(y,x) (symmetry)
d(x,y) <= d(x,z) + d(z,y) (triangle inequality)
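As a quick sanity check, the properties can be tested on a handful of points with Euclidean distance as the dissimilarity. This is a sketch on made-up points, not a proof, and the triangle inequality is checked in its standard form d(x,y) <= d(x,z) + d(z,y):

```python
import itertools
import math

# Hypothetical points in the plane.
points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0), (1.0, -2.0)]

def d(x, y):
    """Euclidean distance, used here as the dissimilarity."""
    return math.dist(x, y)

pairs = list(itertools.product(points, repeat=2))
triples = list(itertools.product(points, repeat=3))

non_negative = all(d(x, y) >= 0 for x, y in pairs)
symmetric = all(math.isclose(d(x, y), d(y, x)) for x, y in pairs)
# A tiny tolerance guards against floating-point rounding for collinear points.
triangle = all(d(x, y) <= d(x, z) + d(z, y) + 1e-12 for x, y, z in triples)

print(non_negative, symmetric, triangle)
```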