SOA PA Flashcards

1
Q

bar chart

A

geom_bar()

2
Q

box plot

A

geom_boxplot()

3
Q

histogram

A

geom_histogram()

4
Q

scatterplot

A

geom_point()

5
Q

smoothed line

A

geom_smooth()

6
Q

ggplot alpha

A

transparency parameter
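
A minimal sketch tying the geoms above together, using the built-in mtcars data purely as a stand-in dataset (assumes ggplot2 is loaded):

library(ggplot2)
# scatterplot with transparency plus a smoothed trend line
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(alpha = 0.5) +
  geom_smooth()
# bar chart of a categorical variable; histogram of a numeric one
ggplot(mtcars, aes(x = factor(cyl))) + geom_bar()
ggplot(mtcars, aes(x = mpg)) + geom_histogram(bins = 10)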

7
Q

display separate plots

A

facet_wrap(~ var, ncol = n)

8
Q

two-dimensional grid of plots

A

facet_grid(row_var ~ col_var)
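
A brief sketch of both faceting calls, again with mtcars as a placeholder dataset:

library(ggplot2)
p <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()
p + facet_wrap(~ cyl, ncol = 2)   # separate panels split by one variable
p + facet_grid(cyl ~ gear)        # two-dimensional grid of panels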

9
Q

adjust axes range

A

xlim() & ylim()

10
Q

convert axes to log scales

A

scale_x_log10() & scale_y_log10()

11
Q

edit titles, subtitles, and captions

A

labs(), xlab(), ylab(), ggtitle()

12
Q

display multiple graphs

A

grid.arrange() in gridExtra
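
A sketch combining the axis, label, and layout tools from the last few cards (gridExtra is assumed to be installed; mtcars is again a stand-in):

library(ggplot2)
library(gridExtra)
p1 <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point() +
  xlim(1, 6) + ylim(10, 35) +
  labs(title = "Fuel economy vs. weight", x = "Weight", y = "MPG")
p2 <- ggplot(mtcars, aes(x = hp, y = mpg)) + geom_point() +
  scale_x_log10()                  # log-scale x-axis
grid.arrange(p1, p2, ncol = 2)     # display the two plots side by side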

13
Q

numeric var descriptive stats code

A

summary()

14
Q

numeric var distribution displays

A

histograms, box plots

15
Q

correct for skewness

A

log transformation

16
Q

categorical var descriptive stats code

A

table()

17
Q

categorical var graphical displays

A

bar charts

18
Q

numeric v numeric descriptive stats code

A

cor()

19
Q

numeric v numeric graphical display

A

scatterplot

20
Q

numeric v categorical descriptive stats

A

???

21
Q

numeric v categorical graphical display

A

split boxplots, histograms

22
Q

categorical v categorical descriptive stats code

A

table()
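
A sketch of the exploration tools from the last several cards on a generic data frame df; num1, num2, cat1, and cat2 are placeholder column names:

summary(df$num1)                   # descriptive stats for a numeric variable
table(df$cat1)                     # frequency table for a categorical variable
cor(df$num1, df$num2)              # numeric vs. numeric
table(df$cat1, df$cat2)            # categorical vs. categorical (two-way table)
library(ggplot2)
ggplot(df, aes(x = cat1, y = num1)) + geom_boxplot()   # split box plots
ggplot(df, aes(x = log(num1))) + geom_histogram()      # log transform for right skew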

23
Q

discrete var

A

restricted to certain values

24
Q

continuous var

A

can assume any value in theory

25
Q

levels

A

predefined values of a categorical var

26
Q

supervised learning

A

understand relationships of predictors and target var

27
Q

unsupervised learning

A

no target var; solely var relationship extraction

28
Q

numeric target predictive model

A

regression model

29
Q

categorical target predictive model

A

classification model, classifier

30
Q

training/test split

A

70-80%/20-30%
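
One way to sketch the split in base R (the seed and the 70/30 proportions are arbitrary; caret::createDataPartition is a common alternative):

set.seed(123)                          # for reproducibility
n_train   <- floor(0.7 * nrow(df))     # 70% training, 30% test
train_idx <- sample(nrow(df), n_train)
train <- df[train_idx, ]
test  <- df[-train_idx, ]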

31
Q

root mean squared error

A

aggregated prediction errors to measure regression accuracy

32
Q

test classification error rate

A

measures classifier accuracy
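
Both accuracy measures reduce to one-liners; actual, pred, actual_class, and pred_class are placeholder vectors of test-set values:

rmse <- sqrt(mean((actual - pred)^2))        # test RMSE for a regression model
err  <- mean(pred_class != actual_class)     # test classification error rate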

33
Q

cross-validation

A

technique to select hyperparameters

34
Q

hyperparameters

A

parameters that have to be supplied in advance and are not optimized as part of the model training process

35
Q

bias-variance tradeoff

A

a more complex (more flexible) model has lower bias but higher variance than a less flexible model

36
Q

bias

A

difference between the expected value of the fitted model f̂ and the true value of the signal function

37
Q

variance

A

quantifies the amount by which the fitted model f̂(x) would change if a different training set were used

38
Q

irreducible error

A

variance of the noise
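
The three quantities above combine in the usual decomposition of the expected test error at a point x0 (f̂ is the fitted model, ε the noise):

expected test MSE at x0 = Var(f̂(x0)) + [Bias(f̂(x0))]^2 + Var(ε)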

39
Q

more complex model has

A

lower bias but higher variance

40
Q

overfitting

A

when a model is unnecessarily complex, resulting in the misinterpretation of noise as the underlying signal

41
Q

underfitting

A

when a model is too general/basic, resulting in little or no capturing of the signal

42
Q

feature

A

derivations from the original variables that provide an alternative, more useful view of the information contained in the dataset

43
Q

variables

A

raw measurements that are recorded and constitute the original dataset prior to any data transformation

44
Q

feature generation

A

the process of developing new features based on existing variables in the data

45
Q

feature selection

A

the procedure of dropping features with limited predictive power and therefore reducing the dimension of the data

46
Q

combining sparse categories with others

A

ensures that each level has a sufficient number of observations / preserves the differences in the behavior of the target variable among different factor levels

47
Q

simple linear regression

A

regression using one predictor

48
Q

multiple linear regression

A

regression using more than one predictor

49
Q

regression coefficient

A

coefficient of the predictor

50
Q

ordinary least squares

A

choosing the coefficient estimates so as to minimize the sum of the squared differences between the observed target values and the fitted values under the model

51
Q

design matrix

A

contains the values of predictors
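
Putting the last two cards together, a standard closed-form result (stated as a reminder, not derived): with design matrix X and target vector y, ordinary least squares minimizes RSS = sum of (y_i - ŷ_i)^2 and yields the estimates β̂ = (X'X)^(-1) X'y.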

52
Q

residual

A

the discrepancy between the observed target value and the corresponding predicted value on either the training set or test set

53
Q

t-statistic

A

the ratio of the corresponding least squares estimate to its estimated standard error / measures the partial effect of a var on the target var

54
Q

coefficient of determination R^2

A

proportion of the variation of the target var that can be explained by the fitted linear model
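
In symbols, with RSS the residual sum of squares and TSS the total sum of squares of the target: R^2 = 1 - RSS/TSS.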

55
Q

f-statistic

A

assesses the joint significance of the entire set of predictors (null hypothesis: all slope coefficients are zero) against the alternative that at least one of them is nonzero

56
Q

akaike information criterion

A

balances the goodness of fit of a model to the training data with the complexity of the model captured by the number of parameters, which acts as a penalty term penalizing an overfitted model / the smallest AIC provides the best model

57
Q

bayesian information criterion

A

balances the goodness of fit of a model to the training data with the complexity of the model captured by the number of parameters, which acts as a penalty term penalizing an overfitted model / the smallest BIC provides the best model
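
For a model with p parameters, maximized log-likelihood loglik, and n training observations, the usual forms are AIC = -2*loglik + 2p and BIC = -2*loglik + p*ln(n), so BIC applies the heavier per-parameter penalty whenever ln(n) > 2.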

58
Q

model diagnostics

A

quantitative and graphical tools that are used to identify evidence against the model assumptions and, if found, to refine the specification of the model in an effort to improve adequacy

59
Q

residuals vs fitted plot

A

plot of the residuals against the fitted values

60
Q

normal q-q plot

A

plot of the quantiles of the standardized residuals against the theoretical standard normal quantiles and can be used to check the normality of the random errors

61
Q

polynomial regression

A

regression that includes polynomial (power) terms of the predictors to capture a nonlinear relationship between the target and the predictors

62
Q

binarization

A

turns a given categorical predictor into a collection of binary variables, each of which serves as an indicator of one and only one level of the categorical predictor

63
Q

interaction term

A

βX1X2 (the product of two predictors included with its own coefficient)

64
Q

backward selection

A

start with all features and, one at a time, drop the feature whose removal produces the greatest improvement in the model according to a certain criterion, until no further improvement can be made

65
Q

forward selection

A

start the model with just the intercept and augment the model by progressively adding the feature that results in the greatest improvement in the model, until no features can be added to improve the model

66
Q

best subset selection

A

fitting all possible combinations of features and selecting the best model according to a certain criterion

67
Q

regularization/penalization/shrinkage

A

alternative to stepwise selection for feature selection and reducing model complexity

68
Q

lambda

A

regularization parameter

69
Q

regularization penalty

A

captures the size of the regression coefficients

70
Q

ridge regression

A

regularization whose penalty term is the sum of squares of the slope coefficients

71
Q

lasso regression

A

regularization whose penalty term is the sum of the absolute values of the slope coefficients

72
Q

elastic net

A

a combined regularization method of both ridge and lasso regression

73
Q

alpha

A

mixing coefficient
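
In the glmnet parameterization (assumed here), the elastic net penalty is lambda * Σ_j [ (1 - alpha)*beta_j^2/2 + alpha*|beta_j| ], so alpha = 0 gives ridge regression and alpha = 1 gives the lasso.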

74
Q

regularization is used to trade off these 2 desirable characteristics of the coefficient estimates

A

model fit and model complexity

75
Q

desirable model fit

A

we want coefficient estimates that match the training data well in the sense that the training RSS is reasonably small

76
Q

desirable model complexity

A

we want coefficient estimates that are small in absolute value so that the model is less prone to overfitting

77
Q

standardization

A

before performing regularization, it is judicious to standardize the predictors by dividing each by its standard deviation

78
Q

when lambda = 0

A

regularization penalty vanishes and the coefficient estimates are identical to the ordinary least squares estimates

79
Q

when lambda = infinity

A

regularization penalty dominates and the estimates of the slope coefficients have no choice but to all be zero

80
Q

when lambda increases

A

the effect of regularization becomes more severe

81
Q

lasso and elastic net feature selection

A

the coefficients can be forced to 0

82
Q

hyperparameters

A

alpha & lambda

83
Q

hyperparameter tuning

A

cross-validation
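
A hedged sketch of tuning lambda by cross-validation with glmnet; the target and predictor names are placeholders, and alpha = 0.5 is an arbitrary elastic net mix:

library(glmnet)
X <- model.matrix(target ~ ., data = df)[, -1]   # numeric predictor matrix, intercept dropped
y <- df$target
cv_fit <- cv.glmnet(X, y, alpha = 0.5)           # cross-validates over a grid of lambda
coef(cv_fit, s = "lambda.min")                   # coefficients at the best lambda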

84
Q

GLM distributions

A

continuous (positive) data, binary data, count data, aggregate loss data

85
Q

selecting link function

A

appropriateness of predictions, interpretability, canonical link

86
Q

log link

A

ensures positive predictions and is easy to interpret

87
Q

logit link

A

the log link applied to the odds; usually used for binary data so that predicted probabilities stay between 0 and 1

88
Q

weights

A

observations of the target var are averaged by exposure

89
Q

offsets

A

observations are values aggregated over all exposure units
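
A sketch of how family, link, weights, and offsets appear in a glm() call; the variable names are placeholders:

# count target, log link, exposure entering through an offset
freq_mod <- glm(claim_count ~ age + region, family = poisson(link = "log"),
                offset = log(exposure), data = df)
# averaged target (e.g. average severity), weighted by the number of claims
sev_mod <- glm(avg_severity ~ age + region, family = Gamma(link = "log"),
               weights = claim_count, data = df)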

90
Q

deviance

A

goodness-of-fit measure for GLMs which measures the extent to which the GLM departs from the saturated model (the model that fits the data perfectly)

91
Q

deviance residual

A

the signed square root of the contribution of the ith observation to the deviance

92
Q

deviance residual properties:

A

approximately normally distributed even if the target distribution is not normal; no systematic patterns; constant variance

93
Q

Penalized Likelihood measures

A

AIC and BIC

94
Q

confusion matrix

A

tabular display of how the predictions of a binary classifier line up with the observed classes

95
Q

sensitivity

A

the relative frequency of correctly predicting an event of interest when the event does take place, or equivalently, the ratio of TP to the total positive events

96
Q

specificity

A

the relative frequency of correctly predicting a non-event when there is indeed no event, or the ratio of TN to the total negative events

97
Q

classification error rate

A

(FN + FP) / n
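
A sketch computing these measures from a confusion matrix; pred_class and actual are placeholder factors with levels "0" and "1" (caret::confusionMatrix is a common shortcut):

cm <- table(predicted = pred_class, actual = actual)
TP <- cm["1", "1"]; TN <- cm["0", "0"]
FP <- cm["1", "0"]; FN <- cm["0", "1"]
sensitivity <- TP / (TP + FN)         # true positive rate
specificity <- TN / (TN + FP)         # true negative rate
error_rate  <- (FN + FP) / sum(cm)    # classification error rate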

98
Q

ROC curve

A

graphical tool plotting the sensitivity against the specificity of a given classifier for each cutoff in the range [0, 1]

99
Q

area under the curve (AUC)

A

the exact value of the AUC may not mean much for the quantitative assessment of a classifier and in real applications it is often the relative value of the AUC that matters; the higher the better

100
Q

AUC = 1

A

the highest possible value of the AUC, which is attained by a classifier with perfect discriminatory power

101
Q

AUC = 0.5

A

the naive classifier which classifies the observations purely randomly without using the information contained in the predictors
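
One way to sketch the ROC curve and AUC is with the pROC package (assumed installed); pred_prob is a placeholder vector of predicted probabilities:

library(pROC)
roc_obj <- roc(actual, pred_prob)   # sensitivity/specificity across all cutoffs
plot(roc_obj)                       # ROC curve
auc(roc_obj)                        # area under the curve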

102
Q

node

A

point on a decision tree that corresponds to a subset of the training data

103
Q

root node

A

the node at the top of the decision tree representing the full dataset

104
Q

terminal node (leaf)

A

nodes at the bottom of a tree which are not split further

105
Q

binary tree

A

each node has only two children

106
Q

depth

A

number of tree splits needed to go from the tree’s root node to the furthest terminal node

107
Q

every time we make a binary split, there are two inter-related decisions we have to make:

A

the predictor to split on / the corresponding cutoff level, given the split predictor

108
Q

regression tree

A

tree model for a numeric target variable; analogous to linear models, the variability of the target in a particular node is quantified by the residual sum of squares (RSS)

109
Q

classification tree

A

tree model where the target variable can take a discrete set of values

110
Q

entropy

A

value increases with the degree of impurity in the node

111
Q

gini

A

the higher the degree of node impurity, the higher the value of gini
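
For a node with class proportions p_1, ..., p_K: entropy = -Σ_k p_k*log2(p_k) and Gini = Σ_k p_k*(1 - p_k); both equal zero for a pure node and are largest when the classes are evenly mixed.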

112
Q

cost-complexity pruning

A

technique of controlling the complexity of a tree

113
Q

relative training error

A

training error of a tree scaled by the training error of the simplest tree, i.e., the tree with no splits
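
A hedged sketch of fitting and pruning a tree with rpart; the formula and cp values are placeholders, and cp plays the role of the cost-complexity parameter:

library(rpart)
tree <- rpart(target ~ ., data = train, method = "anova",   # "class" for a classification tree
              control = rpart.control(cp = 0.001))
plotcp(tree)                      # cross-validated error across values of cp
pruned <- prune(tree, cp = 0.01)  # cost-complexity pruning at a chosen cp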

114
Q

principal components analysis

A

advanced data analytic technique that transforms a high-dimensional dataset into a smaller, much more manageable set of representative variables that capture most of the information in the original dataset

115
Q

principal components

A

composite variables of the existing variables generated such that they are mutually uncorrelated and collectively simplify the dataset, reducing its dimension and making it more amenable to data exploration

116
Q

loadings

A

weights of the PCs

117
Q

PCA applications

A

data visualization and feature generation

118
Q

drawbacks of PCA

A

target variable is ignored / interpretability
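
A sketch of PCA on the numeric columns of a data frame (df_numeric is a placeholder; centering and scaling are the usual explicit choices):

pca <- prcomp(df_numeric, center = TRUE, scale. = TRUE)
summary(pca)      # proportion of variance explained by each PC
pca$rotation      # loadings
head(pca$x)       # scores: the PCs, usable as new features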

119
Q

k-means clustering

A

assigning each observation in a dataset into one and only one of k predefined clusters

120
Q

hierarchical clustering

A

building a hierarchy of nested clusters without prespecifying the number of clusters; the number of clusters is chosen afterward by cutting the dendrogram
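
A sketch of both clustering methods on standardized numeric data; k = 3 and complete linkage are arbitrary choices:

X  <- scale(df_numeric)                      # standardize the variables first
km <- kmeans(X, centers = 3, nstart = 20)    # k-means with k = 3 clusters
km$cluster                                   # cluster assignments
hc <- hclust(dist(X), method = "complete")   # hierarchical clustering
plot(hc)                                     # dendrogram
cutree(hc, k = 3)                            # cut into 3 clusters afterward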

121
Q

stepAIC()

A

function in the MASS package that automates stepwise model selection, in each step adding (forward selection) or dropping (backward selection) the feature to produce the greatest improvement in the model according to a certain criterion (AIC by default)
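
A sketch of the call; full_model is a placeholder fitted lm/glm object:

library(MASS)
reduced <- stepAIC(full_model, direction = "backward")   # AIC is the default criterion
# passing k = log(nrow(train)) in the call would select by BIC instead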

122
Q

AIC vs BIC

A

the only difference between the two is the size of the penalty applied per parameter (2 for AIC vs. ln(n) for BIC), so BIC penalizes model complexity more heavily