Chapter 4 Quiz Flashcards

1
Q

number of predictors or input variables used by the model

A

dimensionality

2
Q

average, stdev, min, max, median, count

A

summary statistics

3
Q

gives info on scale, types of values, extremes, central values, skew, dispersion

A

summary statistics

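These summary statistics can be computed in one place with pandas; a minimal sketch (the `prices` column and its values are illustrative assumptions, not from the deck):

```python
import pandas as pd

# hypothetical data for illustration
df = pd.DataFrame({"prices": [12.0, 15.5, 9.8, 22.1, 18.3, 11.7]})

print(df["prices"].mean())    # average
print(df["prices"].std())     # stdev
print(df["prices"].min())     # min
print(df["prices"].max())     # max
print(df["prices"].median())  # median
print(df["prices"].count())   # count

# describe() reports most of these in a single call
print(df["prices"].describe())
```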
4
Q

presence of two or more predictors sharing the same linear relationship with the outcome variable

A

multicollinearity

5
Q

how to avoid multicollinearity?

A

drop one variable from each pair that shows a strong correlation in the correlation matrix

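A sketch of that check: build the correlation matrix and flag one variable from each highly correlated pair. The column names, data, and 0.9 cutoff are all assumptions for illustration:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
df = pd.DataFrame({
    "x1": x1,
    "x2": x1 * 2 + rng.normal(scale=0.01, size=100),  # nearly collinear with x1
    "x3": rng.normal(size=100),                       # unrelated predictor
})

corr = df.corr().abs()
# keep only the upper triangle so each pair is counted once
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [col for col in upper.columns if (upper[col] > 0.9).any()]
print(to_drop)  # "x2" is flagged for removal
```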
6
Q

combining close or similar categories is done through what?

A

expert knowledge and common sense

7
Q

a few new variables are created as weighted linear combinations of the original variables and retain the majority of the information of the full original set

A

principal component analysis

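A minimal PCA sketch with scikit-learn; the data (four variables driven by one common factor) and the choice of two components are illustrative assumptions:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
base = rng.normal(size=(200, 1))
# four correlated variables: one shared factor plus independent noise
X = np.hstack([base + rng.normal(scale=0.3, size=(200, 1)) for _ in range(4)])

pca = PCA(n_components=2)
scores = pca.fit_transform(X)           # data projected onto the components

print(pca.explained_variance_ratio_)    # share of variance per component
print(pca.components_)                  # weight of each original variable
```

Because the four variables share one underlying factor, the first component captures most of the variance; the weights in `components_` show how each original variable contributes, which is the "structure of the data" the later cards refer to.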
8
Q

what subsets is PCA most valuable with?

A

subsets of variables that are on the same scale and highly correlated with one another

9
Q

the line along which the variance of the projected data is maximal; equivalently, the line that minimizes the sum of squared perpendicular distances to the points

A

first principal component

10
Q

perpendicular to the first principal component (Z1), capturing the second-largest variability

A

second principal component

11
Q

what does PCA allow you to see?

A

structure of data and weight of each variable

12
Q

what models can be used to help combine or remove categorical variables?

A

classification and regression trees

13
Q

how can a binned categorical variable be made numerical?

A

replace each category with the midpoint of its range

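A sketch of the midpoint conversion; the income bins and their ranges are hypothetical examples:

```python
# map each bin label to the midpoint of its range (assumed bins)
midpoints = {
    "0-20k": 10_000,
    "20k-40k": 30_000,
    "40k-60k": 50_000,
}

incomes = ["0-20k", "40k-60k", "20k-40k"]
numeric = [midpoints[c] for c in incomes]
print(numeric)  # [10000, 50000, 30000]
```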
14
Q

what does having too many dimensions cause?

A

sparsity

15
Q

what does PCA want to do?

A

minimize the sum of squared perpendicular distances from the points to the line (equivalently, capture the most variance along it)

16
Q

L1 (lasso) penalty; encourages sparsity, so many coefficients become exactly zero

A

regularization
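A sketch of lasso sparsity with scikit-learn; the data (only two of ten predictors actually matter) and the `alpha` value are illustrative assumptions:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 10))
# only the first two predictors influence the outcome
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=200)

model = Lasso(alpha=0.5).fit(X, y)
print(model.coef_)  # most coefficients are driven to exactly 0.0
```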

17
Q

the tails of the data: how heavy they are

A

kurtosis

18
Q

how symmetric the data is

A

skewness