Data Science Flashcards
What is precision?
TP/(TP+FP). Proportion of predicted positives that are actually positive.
What is recall?
TP/(TP+FN). Proportion of actual positives that are correctly classified as positive.
What is sensitivity?
Same as recall: TP/(TP+FN).
What is specificity?
TN/(TN+FP). Proportion of actual negatives that are correctly classified as negative; the “negative” counterpart of recall (sensitivity).
What is accuracy?
(TP+TN)/(TP+TN+FP+FN). Proportion of correctly classified samples over all samples.
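The four confusion-matrix metrics above can be computed from the same four counts. The sketch below assumes binary labels where 1 is the positive class; the labels are made up for illustration, and the function names are hypothetical. An undefined ratio (zero denominator) is returned as NaN rather than raising an error.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels (1 = positive, 0 = negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def safe_div(num, den):
    # Return NaN instead of raising when a metric is undefined (e.g. no predicted positives).
    return num / den if den else math.nan


def classification_metrics(y_true, y_pred):
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "precision": safe_div(tp, tp + fp),
        "recall": safe_div(tp, tp + fn),          # also called sensitivity
        "specificity": safe_div(tn, tn + fp),
        "accuracy": safe_div(tp + tn, tp + tn + fp + fn),
    }


# Hypothetical labels: 4 actual positives, 6 actual negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]
print(confusion_counts(y_true, y_pred))        # (3, 2, 4, 1)
print(classification_metrics(y_true, y_pred))
# precision 3/5 = 0.6, recall 3/4 = 0.75, specificity 4/6, accuracy 7/10 = 0.7
```

Here precision and recall differ because the model produces two false positives but only one false negative.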
What is the coefficient of determination (r-squared)?
The proportion of the variation in the dependent variable that is predictable (explained) from the independent variable(s). Computed as 1 - (residual sum of squares / total sum of squares).
What is a residual in regression analysis?
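The difference between an observed value and the value predicted by the regression model (observed minus predicted); graphically, the signed vertical distance between a data point and the regression line.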
What is the total sum of squares?
The sum of the squared differences between each observation and the overall mean of the observations.
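The regression cards above (residuals, total sum of squares, r-squared) fit together in one small calculation. This is a minimal sketch assuming a single predictor and an ordinary least squares line; the five data points are made up, and the function names are hypothetical.

```python
def fit_simple_linear_regression(x, y):
    """Ordinary least squares fit of y = a + b*x for a single predictor."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    b = sxy / sxx
    a = y_mean - b * x_mean
    return a, b


def r_squared(x, y):
    a, b = fit_simple_linear_regression(x, y)
    y_mean = sum(y) / len(y)
    # Residual: observed value minus the value predicted by the fitted line.
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    rss = sum(r ** 2 for r in residuals)             # residual sum of squares
    tss = sum((yi - y_mean) ** 2 for yi in y)        # total sum of squares
    return 1 - rss / tss, residuals, rss, tss


# Hypothetical data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r2, residuals, rss, tss = r_squared(x, y)
print(residuals)      # approximately [-0.8, 0.6, 1.0, -0.6, -0.2]
print(rss, tss, r2)   # approximately 2.4, 6.0, 0.6
```

With an intercept in the model, the OLS residuals sum to (numerically) zero, which is a handy sanity check.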
What is bootstrapping?
A resampling technique used to estimate the sampling distribution of an estimator by sampling with replacement from the original sample
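As an illustration, the sketch below bootstraps the sampling distribution of the sample mean using only the standard library. The data values, the number of resamples (2000), and the helper name `bootstrap_distribution` are assumptions for the example; the interval is the simple percentile interval, one of several bootstrap interval methods.

```python
import random
import statistics


def bootstrap_distribution(sample, estimator, n_resamples=2000, seed=0):
    """Apply `estimator` to many resamples drawn with replacement from `sample`."""
    rng = random.Random(seed)
    n = len(sample)
    # Each resample has the same size as the original sample.
    return [estimator(rng.choices(sample, k=n)) for _ in range(n_resamples)]


# Hypothetical observations.
data = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 4.9, 3.7, 2.5]
boot_means = bootstrap_distribution(data, statistics.mean)

# Spread of the bootstrap distribution estimates the standard error of the mean.
standard_error = statistics.stdev(boot_means)

# Simple 95% percentile interval from the sorted bootstrap estimates.
ordered = sorted(boot_means)
ci_low = ordered[int(0.025 * len(ordered))]
ci_high = ordered[int(0.975 * len(ordered)) - 1]
print(statistics.mean(data), standard_error, (ci_low, ci_high))
```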
What is sampling with replacement?
Sampling with replacement means that after a unit (e.g., a person or an observation) is selected, it is returned to the pool before the next unit is selected, so the same unit can be selected more than once.
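The difference between the two schemes comes down to whether a selected unit is removed from the pool. A minimal sketch, with hypothetical function names and a made-up pool of five units:

```python
import random


def sample_with_replacement(pool, k, rng):
    draws = []
    for _ in range(k):
        draws.append(rng.choice(pool))   # unit stays in the pool and can be drawn again
    return draws


def sample_without_replacement(pool, k, rng):
    remaining = list(pool)
    draws = []
    for _ in range(k):
        # Unit is removed after selection, so it can appear at most once.
        draws.append(remaining.pop(rng.randrange(len(remaining))))
    return draws


rng = random.Random(0)
population = ["a", "b", "c", "d", "e"]
print(sample_with_replacement(population, 10, rng))    # repeats are unavoidable: 10 draws, 5 units
print(sample_without_replacement(population, 5, rng))  # a permutation of the 5 units
```

Without replacement, `k` cannot exceed the pool size; with replacement it can, which is what lets a bootstrap resample have the same size as the original sample while still differing from it.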
What is bagging?
A machine learning ensemble method whose name stands for “bootstrap aggregating.” It improves the stability and accuracy of machine learning models by training several models on different bootstrap samples of the training data (drawn with replacement) and combining their predictions, typically by averaging for regression or by majority vote for classification.
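A small regression example can make the two steps of bagging (bootstrap, then aggregate) concrete. This sketch assumes a depth-1 regression tree (a "stump") on one feature as the base model and averages the stumps' predictions; the step-shaped data, the 50-model ensemble size, and all function names are assumptions for illustration, not a library API.

```python
import random
import statistics


def fit_stump(xs, ys):
    """Depth-1 regression tree: choose the split threshold with the lowest squared error."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates equal x values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        lm, rm = statistics.mean(left), statistics.mean(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, lm, rm)
    if best is None:  # degenerate sample (all x equal): predict the mean
        mean_y = statistics.mean(ys)
        return lambda x: mean_y
    _, threshold, lm, rm = best
    return lambda x: lm if x <= threshold else rm


def fit_bagged_stumps(xs, ys, n_models=50, seed=0):
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = rng.choices(range(n), k=n)  # bootstrap sample of row indices
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    # Aggregate: average the individual models' predictions (regression).
    return lambda x: statistics.mean(m(x) for m in models)


# Hypothetical noisy step function: about 0 for x < 10, about 1 for x >= 10.
noise = random.Random(42)
xs = list(range(20))
ys = [(0.0 if x < 10 else 1.0) + noise.gauss(0, 0.1) for x in xs]
predict = fit_bagged_stumps(xs, ys)
print(predict(2), predict(17))
```

For classification, the aggregation step would typically be a majority vote over the models' predicted labels instead of a mean.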
What is a greedy algorithm?
An algorithmic paradigm that follows the problem-solving heuristic of making the locally optimal choice at each stage with the hope of finding a global optimum
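Making change with as few coins as possible is a common illustration of both the strength and the limit of the greedy approach. The sketch below (function name hypothetical) always takes the largest coin that still fits. With denominations 1, 5, 10, 25 this happens to be optimal, but with the made-up denominations 1, 3, 4 it returns three coins for 6 when two (3 + 3) would do.

```python
def greedy_coin_change(amount, coins):
    """Repeatedly take the largest coin that still fits: the locally optimal choice."""
    result = []
    for coin in sorted(coins, reverse=True):
        while amount >= coin:
            amount -= coin
            result.append(coin)
    return result if amount == 0 else None  # None: the greedy choices got stuck


print(greedy_coin_change(63, [1, 5, 10, 25]))  # [25, 25, 10, 1, 1, 1]
print(greedy_coin_change(6, [1, 3, 4]))        # [4, 1, 1], although [3, 3] is better
```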
What is dynamic programming?
An algorithmic paradigm that solves a problem by breaking it down into overlapping subproblems, solving each subproblem once, storing its solution, and reusing the stored solutions to solve the larger problem
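The same coin-change problem can be solved exactly with dynamic programming. In this bottom-up sketch (function name hypothetical), `best[a]` stores the fewest coins that sum to `a`; every larger amount reuses those stored answers instead of recomputing them, which is how it finds [3, 3] for 6 with coins 1, 3, 4.

```python
def min_coins(amount, coins):
    """Fewest coins summing to `amount`, or None if it cannot be made."""
    INF = float("inf")
    best = [0] + [INF] * amount        # best[a]: fewest coins for amount a (subproblem table)
    choice = [None] * (amount + 1)     # last coin used in an optimal solution for a
    for a in range(1, amount + 1):
        for coin in coins:
            if coin <= a and best[a - coin] + 1 < best[a]:
                best[a] = best[a - coin] + 1
                choice[a] = coin
    if best[amount] == INF:
        return None
    # Walk back through the stored choices to list the coins.
    result = []
    while amount > 0:
        result.append(choice[amount])
        amount -= choice[amount]
    return result


print(min_coins(6, [1, 3, 4]))  # [3, 3]
print(min_coins(7, [2, 4]))     # None: odd amounts are unreachable
```

The table has `amount + 1` entries and each is filled by trying every coin, so the running time is proportional to `amount * len(coins)`.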
What is bias?
The error that is introduced by simplifying assumptions made by the model
What is variance?
The error that is introduced by the sensitivity of the model to small fluctuations in the training data
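Bias and variance can be estimated by simulation: draw many training sets, fit a model to each, and look at its predictions at one fixed point. The sketch below assumes a hypothetical true function f(x) = x^2 with Gaussian noise and compares two deliberately extreme models: predicting the training mean (a strong simplifying assumption, so high bias and low variance) and 1-nearest-neighbor (copies one noisy label, so low bias and high variance). All names and settings are illustrative.

```python
import random
import statistics


def true_f(x):
    return x * x  # hypothetical ground truth for this simulation


def draw_training_set(rng, n=20, noise_sd=0.3):
    xs = [rng.uniform(0, 1) for _ in range(n)]
    ys = [true_f(x) + rng.gauss(0, noise_sd) for x in xs]
    return xs, ys


def predict_mean(xs, ys, x0):
    return statistics.mean(ys)  # ignores x0 entirely


def predict_1nn(xs, ys, x0):
    i = min(range(len(xs)), key=lambda j: abs(xs[j] - x0))
    return ys[i]  # label of the closest training point


def bias_variance_at(predict, x0=0.9, n_trials=2000, seed=0):
    rng = random.Random(seed)
    preds = [predict(*draw_training_set(rng), x0) for _ in range(n_trials)]
    bias_sq = (statistics.mean(preds) - true_f(x0)) ** 2  # squared bias at x0
    variance = statistics.pvariance(preds)                # spread across training sets
    return bias_sq, variance


print("mean model:", bias_variance_at(predict_mean))
print("1-NN model:", bias_variance_at(predict_1nn))
```

The mean model should show a much larger squared bias and a much smaller variance than 1-nearest-neighbor, illustrating the trade-off between the two error sources.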