Chapter 5 Flashcards

1
Q

What three possibilities may arise for how the training data were sampled, and which analyses are appropriate in each case?(3)

A
  1. As a random sample from the joint distribution of Y and X. This might be the case, for example, in a medical study where we observe patients from some population (e.g. presenting with a particular complaint) and we record various clinical variables X and whether or not the patient has a particular disease Y. In this case we can learn both about the group membership probabilities π_k and about the distribution of X, so we can use either a regression- or a discriminant-based approach.
  2. As separate random samples chosen from each group, i.e. each value of Y. This might be the case, for example, in a clinical trial where we deliberately choose a sample of healthy patients and another sample of patients with a particular disease, taking measurements on various clinical variables X in each case. Under this sampling scenario we have no way of learning about the group membership probabilities Pr(Y = k) = π_k, only the conditional distribution of X given Y (the contrast with scenario 1 is illustrated in the sketch after this list). It is therefore not appropriate to use regression-based approaches, and we can only use the Bayes approach to discriminant analysis if estimates for the π_k can be provided by other means.
  3. As random samples of the group label Y for a chosen set of measurements X. This might be the case, for example, in a dose-ranging study where X represents the dose of a drug, chosen from a fixed set, and Y represents whether or not a patient suffers a particular side-effect. Under this sampling scenario we have no way of learning about the distribution of X, only the conditional distribution of Y given X. It is therefore only appropriate to use a regression-based approach.
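
A quick simulation contrasts scenarios 1 and 2. This is a minimal sketch assuming Python with numpy; the group probabilities and sample sizes are invented for illustration. Under joint sampling the observed group proportions estimate the π_k; under separate sampling they are fixed by design and carry no information about the π_k.

    import numpy as np

    rng = np.random.default_rng(0)
    pi = np.array([0.9, 0.1])  # true group membership probabilities (invented)
    n = 1000

    # Scenario 1: joint sampling -- draw Y from its marginal distribution
    y_joint = rng.choice(2, size=n, p=pi)
    print(np.bincount(y_joint) / n)  # close to pi, e.g. [0.9, 0.1]

    # Scenario 2: separate sampling -- per-group sample sizes chosen by design
    y_sep = np.repeat([0, 1], [500, 500])
    print(np.bincount(y_sep) / n)    # [0.5, 0.5]: says nothing about pi
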
2
Q

What are the mean, variance and covariance of the multivariate normal?(3)

A

E(X_i) = µ_i, Var(X_i) = σ_ii and Cov(X_i, X_j) = σ_ij.
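
These identities can be checked numerically. A minimal sketch assuming Python with numpy; the particular µ and Σ are invented:

    import numpy as np

    rng = np.random.default_rng(1)
    mu = np.array([1.0, -2.0])
    Sigma = np.array([[2.0, 0.6],
                      [0.6, 1.0]])

    X = rng.multivariate_normal(mu, Sigma, size=100_000)
    print(X.mean(axis=0))           # close to mu:    E(X_i) = mu_i
    print(np.cov(X, rowvar=False))  # close to Sigma: Var(X_i) = sigma_ii,
                                    #                 Cov(X_i, X_j) = sigma_ij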

3
Q

Suppose X ∼ N_p(µ, Σ), A is a q × p matrix and b is a q-vector.
Then the linear transformation Y = AX + b is also…

A

Multivariate normal, with Y ∼ N_q(Aµ + b, AΣA^T).
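
A small numerical check of this result (µ, Σ, A and b invented; assumes numpy): the sample mean of Y is close to Aµ + b and its sample covariance close to AΣA^T.

    import numpy as np

    rng = np.random.default_rng(2)
    mu = np.array([0.0, 1.0, 2.0])
    Sigma = np.array([[1.0, 0.3, 0.0],
                      [0.3, 2.0, 0.5],
                      [0.0, 0.5, 1.5]])
    A = np.array([[1.0, -1.0, 0.0],
                  [0.5,  0.5, 1.0]])  # q x p with q = 2, p = 3
    b = np.array([10.0, -5.0])

    X = rng.multivariate_normal(mu, Sigma, size=200_000)
    Y = X @ A.T + b                                  # each row is A x + b
    print(Y.mean(axis=0), A @ mu + b)                # approximately equal
    print(np.cov(Y, rowvar=False), A @ Sigma @ A.T)  # approximately equal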

4
Q

What are allocation regions?(1)

A

The discriminant functions Q_1, …, Q_K define a partition of the sample space S for X into K distinct regions R_1, R_2, …, R_K (R_k ∩ R_l = ∅ for k ≠ l, ∪_{k=1}^K R_k = S) such that, given X = x, if Q_k(x) > Q_l(x) for all l ≠ k, then we assign Y = k.
The subsets R_1, R_2, …, R_K are sometimes called allocation regions.
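
A minimal sketch on two invented univariate normal groups (assumes numpy and scipy): each point of a grid over S is allocated to the group whose discriminant function Q_k(x) is largest, which carves the grid into allocation regions.

    import numpy as np
    from scipy.stats import norm

    # Two invented groups with Q_k(x) = f_k(x) * pi_k
    pis = np.array([0.5, 0.5])
    x = np.linspace(-5, 5, 1001)
    Q = np.column_stack([norm.pdf(x, loc=-1) * pis[0],
                         norm.pdf(x, loc=2) * pis[1]])
    assign = Q.argmax(axis=1)  # k maximising Q_k(x) at each grid point
    R1, R2 = x[assign == 0], x[assign == 1]  # allocation regions on the grid
    print(R1.max(), R2.min())  # both near the boundary at 0.5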

5
Q

What is the Bayes classifier? What are the corresponding discriminant functions?(3)

A

Assigns an observation x to the class k for which the posterior probability p_k(x) = Pr(Y = k | X = x) is largest. In this case

p_k(x) > p_l(x) ∀ l ≠ k
⇐⇒ f_k(x)π_k / Σ_{m=1}^K f_m(x)π_m > f_l(x)π_l / Σ_{m=1}^K f_m(x)π_m ∀ l ≠ k
⇐⇒ f_k(x)π_k > f_l(x)π_l ∀ l ≠ k.

The corresponding discriminant functions are:
Q_k(x) = f_k(x)π_k, k = 1, …, K.
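
A minimal sketch of the rule for two invented Gaussian class-conditional densities (assumes numpy and scipy). Since the normalising sum cancels, it suffices to maximise f_k(x)π_k; with equal priors this reduces to the maximum likelihood rule of the next card.

    import numpy as np
    from scipy.stats import multivariate_normal

    # Invented class-conditional densities f_k and priors pi_k
    mus = [np.array([0.0, 0.0]), np.array([2.0, 1.0])]
    Sigmas = [np.eye(2), np.array([[1.0, 0.4], [0.4, 1.0]])]
    pis = [0.7, 0.3]

    def bayes_classify(x):
        # Q_k(x) = f_k(x) * pi_k; the denominator is common to all k
        Q = [multivariate_normal.pdf(x, mean=m, cov=S) * p
             for m, S, p in zip(mus, Sigmas, pis)]
        return int(np.argmax(Q))

    print(bayes_classify([0.1, -0.2]))  # 0
    print(bayes_classify([2.2, 1.1]))   # 1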

6
Q

What is the maximum likelihood discriminant rule?(1)

A

If the prior group membership probabilities π_k are all equal to 1/K, the Bayes rule reduces to assigning each observation to the group under which its likelihood f_k(x) is largest.

7
Q

What are decision boundaries?(1)

A

The set of points x for which Q_k(x) = Q_l(x): the boundary between the allocation regions R_k and R_l, at which the classification switches from k to l.
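
For two invented univariate normal groups with equal variances and equal priors (assumes scipy), the boundary Q_1(x) = Q_2(x) can be located numerically; in this special case it is the midpoint of the means.

    from scipy.optimize import brentq
    from scipy.stats import norm

    mu1, mu2, sigma = -1.0, 2.0, 1.0  # invented; priors are equal, so they cancel
    diff = lambda x: norm.pdf(x, mu1, sigma) - norm.pdf(x, mu2, sigma)
    boundary = brentq(diff, mu1, mu2)  # root of Q_1(x) - Q_2(x)
    print(boundary)                    # 0.5 = (mu1 + mu2) / 2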

8
Q

What is the difference between linear and quadratic discriminant analysis?(1)

A

In linear discriminant analysis (LDA) the groups are assumed to share a common covariance matrix, giving linear decision boundaries. Quadratic discriminant analysis (QDA) is similar except that the different groups are allowed to have different covariance matrices, giving quadratic decision boundaries.
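
A short sketch of the contrast on invented two-group data, assuming scikit-learn is available: LDA pools one covariance matrix across the groups, while QDA estimates one per group.

    import numpy as np
    from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                               QuadraticDiscriminantAnalysis)

    rng = np.random.default_rng(3)
    # Invented groups with genuinely different covariance structures
    X0 = rng.multivariate_normal([0, 0], [[1.0, 0.0], [0.0, 1.0]], size=200)
    X1 = rng.multivariate_normal([2, 2], [[2.0, 0.9], [0.9, 0.5]], size=200)
    X, y = np.vstack([X0, X1]), np.repeat([0, 1], 200)

    lda = LinearDiscriminantAnalysis().fit(X, y)     # common covariance matrix
    qda = QuadraticDiscriminantAnalysis().fit(X, y)  # per-group covariance matrices
    print(lda.predict([[1.0, 1.0]]), qda.predict([[1.0, 1.0]]))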

9
Q

What is the within-groups sample covariance matrix?(1)

A

S_W = (1/(n − K)) Σ_{k=1}^K (n_k − 1) S_k
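
A minimal numpy sketch of this formula on invented data, pooling the per-group sample covariance matrices S_k:

    import numpy as np

    rng = np.random.default_rng(4)
    groups = [rng.normal(size=(30, 2)),        # invented group data matrices
              rng.normal(size=(50, 2)) + 1.0]
    n, K = sum(len(Xk) for Xk in groups), len(groups)

    # S_W = (1 / (n - K)) * sum over k of (n_k - 1) * S_k
    S_W = sum((len(Xk) - 1) * np.cov(Xk, rowvar=False)
              for Xk in groups) / (n - K)
    print(S_W)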

10
Q

What is the sample covariance matrix within group k?(1)

A

S_k = (1/(n_k − 1)) X_k^T H_{n_k} X_k, where X_k is the n_k × p data matrix for group k and H_{n_k} is the n_k × n_k centering matrix.
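
The centering-matrix form can be verified directly on invented data (assumes numpy), taking H_{n_k} = I − (1/n_k)11^T; it agrees with the usual sample covariance:

    import numpy as np

    rng = np.random.default_rng(5)
    Xk = rng.normal(size=(40, 3))  # invented n_k x p data matrix for group k
    nk = Xk.shape[0]

    H = np.eye(nk) - np.ones((nk, nk)) / nk  # centering matrix H_{n_k}
    Sk = Xk.T @ H @ Xk / (nk - 1)            # S_k = X_k^T H_{n_k} X_k / (n_k - 1)
    print(np.allclose(Sk, np.cov(Xk, rowvar=False)))  # True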

11
Q

How do we measure misclassification? How do we know we have a good classification scheme based on this?(2)

A

Create a K × K matrix P containing p_ij = Pr(allocate to group j | observation from group i).
For a perfect classification scheme, P would be equal to the K × K identity matrix I_K. In practice, the best we can hope for is a matrix with diagonal elements close to 1 and off-diagonal elements close to 0.

12
Q

What are two types of in-sample validation methods used? What are the differences between these?(3)

A

- Plug-in method: calculates the p_ij from their analytic expressions, replacing parameters with values estimated from the data.
- Empirical method: like the plug-in method, estimates the p_ij using the same data that were used to derive the discriminant rules, but does so via the empirical proportion
  p̂_ij = n_ij / n_i,
  where n_ij = #(x ∈ R_j from group i) and n_i = Σ_{j=1}^K n_ij is the number of observations in group i.
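
A short sketch of the empirical estimate on invented labels (assumes numpy): tabulate the counts n_ij and row-normalise to obtain p̂_ij = n_ij / n_i.

    import numpy as np

    # Invented true and allocated group labels, n = 10 observations, K = 2
    true = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
    pred = np.array([0, 0, 0, 1, 0, 1, 1, 0, 1, 1])

    K = 2
    N = np.zeros((K, K), dtype=int)
    for i, j in zip(true, pred):
        N[i, j] += 1                          # n_ij = #(x in R_j from group i)
    P_hat = N / N.sum(axis=1, keepdims=True)  # p_hat_ij = n_ij / n_i
    print(P_hat)  # diagonal near 1 indicates a good scheme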

13
Q

Drawbacks of plug in method.(1)

A

It produces overestimates of the diagonal elements and underestimates of the off-diagonal elements, due to over-fitting and to ignoring the sampling variability in the parameter estimates.
Moreover, classification schemes without an underpinning model lack a probabilistic foundation, so for them the plug-in method cannot be used at all.

14
Q

Drawbacks of empirical method.(1)

A

Although the empirical method is simple and widely used, it also leads to optimistic estimates of the performance of the classification scheme due to overfitting. Again, this is essentially caused by the double use of the data for constructing and testing the classifier.

15
Q

How do you calculate the training error rate?(1)

A

1 − (1/n) Σ_{i=1}^K n_ii

Remember that this is generally an over-optimistic view of the classifier; if possible, out-of-sample testing is preferred.
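
Reusing the confusion counts n_ij from the sketch in card 12 (assumes numpy), the training error rate is one minus the proportion of correctly allocated training observations:

    import numpy as np

    N = np.array([[4, 1],
                  [1, 4]])            # confusion counts n_ij from card 12
    n = N.sum()
    train_err = 1 - np.trace(N) / n   # 1 - (1/n) * sum over i of n_ii
    print(train_err)                  # 0.2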

16
Q

Can estimating the prior group membership probabilities from the complete data, rather than only the training data, lead to overfitting or underfitting?(1)

A

Experience suggests that this can lead to overfitting and so it is preferable to estimate unknown group membership probabilities using only the training data.