SOA PA Flashcards
bar chart
geom_bar()
box plot
geom_boxplot()
histogram
geom_histogram()
scatterplot
geom_point()
smoothed line
geom_smooth()
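The geoms above (and the alpha transparency parameter) can be sketched together. A minimal sketch assuming ggplot2 is installed and using the built-in mtcars data frame (an assumption; any data frame works):

```r
library(ggplot2)

ggplot(mtcars, aes(x = factor(cyl))) + geom_bar()               # bar chart
ggplot(mtcars, aes(x = factor(cyl), y = mpg)) + geom_boxplot()  # box plot
ggplot(mtcars, aes(x = mpg)) + geom_histogram(bins = 10)        # histogram
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(alpha = 0.5) +   # alpha = 0.5 makes overlapping points visible
  geom_smooth()               # smoothed line through the scatterplot
```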
ggplot alpha
transparency parameter
display separate plots
facet_wrap(~ var, ncol = n)
two-dimensional grid of plots
facet_grid(row_var ~ col_var)
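A sketch of both faceting functions, again assuming ggplot2 and mtcars:

```r
library(ggplot2)

# facet_wrap: one panel per level of a single variable, wrapped into columns
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  facet_wrap(~ cyl, ncol = 2)

# facet_grid: a two-dimensional grid of panels, rows ~ columns
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  facet_grid(am ~ cyl)
```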
adjust axes range
xlim() & ylim()
convert axes to log scales
scale_x_log10() & scale_y_log10()
edit titles, subtitles, and captions
labs(), xlab(), ylab(), ggtitle()
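The axis and labeling cards above combine in one plot call; a sketch assuming ggplot2 and mtcars:

```r
library(ggplot2)

ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  xlim(1, 6) + ylim(10, 35) +               # adjust axes range
  labs(title   = "Fuel economy vs. weight", # titles via labs()
       subtitle = "mtcars data",
       caption  = "illustrative example",
       x = "Weight (1000 lbs)", y = "Miles per gallon")

# For log scales, replace xlim()/ylim() with:
#   scale_x_log10() + scale_y_log10()
```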
display multiple graphs
grid.arrange() in gridExtra
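A sketch of arranging multiple graphs side by side, assuming the gridExtra package is installed:

```r
library(ggplot2)
library(gridExtra)

p1 <- ggplot(mtcars, aes(x = mpg)) + geom_histogram(bins = 10)
p2 <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) + geom_boxplot()

grid.arrange(p1, p2, ncol = 2)  # two plots in one row
```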
numeric var descriptive stats code
summary()
numeric var distribution displays
histograms, box plots
correct for skewness
log transformation
categorical var descriptive stats code
table()
categorical var graphical displays
bar charts
numeric v numeric descriptive stats code
cor()
numeric v numeric graphical display
scatterplot
numeric v categorical descriptive stats
summary statistics of the numeric var by factor level, e.g. tapply() or aggregate()
numeric v categorical graphical display
split boxplots, histograms
categorical v categorical descriptive stats code
table()
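The descriptive-stats cards above map onto base R one-liners; a sketch using mtcars (an assumption):

```r
summary(mtcars$mpg)                   # numeric var: five-number summary + mean
table(mtcars$cyl)                     # categorical var: frequency table
cor(mtcars$wt, mtcars$mpg)            # numeric v numeric: correlation
tapply(mtcars$mpg, mtcars$cyl, mean)  # numeric v categorical: mean by level
table(mtcars$cyl, mtcars$am)          # categorical v categorical: two-way table
```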
discrete var
restricted to certain values
continuous var
can assume any value in theory
levels
predefined values of a categorical var
supervised learning
understand relationships of predictors and target var
unsupervised learning
no target var; solely var relationship extraction
numeric target predictive model
regression model
categorical target predictive model
classification model, classifier
training/test split
70-80%/20-30%
root mean squared error
square root of the average squared prediction error; measures regression accuracy
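A hedged sketch of a 75/25 train/test split and test RMSE in base R (the exam modules often use caret::createDataPartition instead; mtcars and the model formula are illustrative assumptions):

```r
set.seed(42)                                        # reproducible split
idx   <- sample(nrow(mtcars), size = 0.75 * nrow(mtcars))
train <- mtcars[idx, ]                              # ~75% training
test  <- mtcars[-idx, ]                             # ~25% test

fit  <- lm(mpg ~ wt, data = train)                  # fit on training data only
pred <- predict(fit, newdata = test)                # predict on held-out data
rmse <- sqrt(mean((test$mpg - pred)^2))             # root mean squared error
rmse
```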
test classification error rate
proportion of misclassified test observations; measures classifier accuracy
cross-validation
technique that repeatedly splits the training data into folds to select hyperparameters without touching the test set
hyperparameters
parameters that have to be supplied in advance and are not optimized as part of the model training process
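A minimal manual k-fold cross-validation sketch (illustrative only; in practice caret::train automates this):

```r
set.seed(42)
k     <- 5
# Assign each row of mtcars to one of k folds at random
folds <- sample(rep(1:k, length.out = nrow(mtcars)))

cv_rmse <- sapply(1:k, function(i) {
  fit  <- lm(mpg ~ wt, data = mtcars[folds != i, ])   # train on k-1 folds
  pred <- predict(fit, newdata = mtcars[folds == i, ])# validate on held-out fold
  sqrt(mean((mtcars$mpg[folds == i] - pred)^2))
})
mean(cv_rmse)  # average validation RMSE across folds
```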
bias-variance tradeoff
a more complex model has lower bias but higher variance than a less flexible model
bias
difference between the expected value and the true value of the signal function
variance
quantifies the amount by which the fitted f̂(x) would change if it were estimated on a different training set
irreducible error
variance of the noise
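The bias, variance, and irreducible error cards above combine in the standard decomposition of expected test error at a point x_0:

```latex
\mathbb{E}\left[\left(y_0 - \hat{f}(x_0)\right)^2\right]
  = \underbrace{\operatorname{Var}\!\left(\hat{f}(x_0)\right)}_{\text{variance}}
  + \underbrace{\left[\operatorname{Bias}\!\left(\hat{f}(x_0)\right)\right]^2}_{\text{bias}^2}
  + \underbrace{\operatorname{Var}(\varepsilon)}_{\text{irreducible error}}
```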
more complex model has
lower bias but higher variance
overfitting
when a model is unnecessarily complex, resulting in the misinterpretation of noise as the underlying signal
underfitting
when a model is too general/basic, resulting in little or no capturing of the signal
feature
derivations from the original variables that provide an alternative, more useful view of the information contained in the dataset
variables
raw measurement that is recorded and constitutes the original dataset prior to any data transformation
feature generation
the process of developing new features based on existing variables in the data
feature selection
the procedure of dropping features with limited predictive power and therefore reducing the dimension of the data
combining sparse categories with others
ensures that each level has a sufficient number of observations / preserves the differences in the behavior of the target variable among different factor levels
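A sketch of collapsing sparse factor levels into an "OTHER" level (the cutoff of 10 observations and the data are illustrative assumptions):

```r
x <- factor(c(rep("A", 50), rep("B", 40), rep("C", 3), rep("D", 2)))

counts <- table(x)
sparse <- names(counts)[counts < 10]        # levels with too few observations

levels(x)[levels(x) %in% sparse] <- "OTHER" # merge sparse levels into one
table(x)
```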
simple linear regression
regression using one predictor
multiple linear regression
regression using more than one predictor
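The two regression cards differ only in the model formula; a sketch using mtcars (an assumption):

```r
fit_slr <- lm(mpg ~ wt, data = mtcars)                  # simple: one predictor
fit_mlr <- lm(mpg ~ wt + hp + factor(cyl), data = mtcars)  # multiple predictors

summary(fit_mlr)  # coefficient estimates and fit statistics
```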