Terms Flashcards
explanatory variable
IV = predictor = regressor
response variable
DV = outcome = dependent variable
Gaussian distribution
normal distribution. It is a continuous probability distribution for a real-valued random variable.
stratum
a subset of the population that is being sampled
stratification
process of dividing members of the population into homogenous subgroups before sampling
Coefficient of determination
R squared (R²): the proportion of variance in the response variable that is explained by the model
pearson correlation coefficient
One of the most widely used correlation coefficients. Graphically, this can be understood as “how close is the data to the line of best fit?”
r = 1 is a perfect positive fit, r = 0 is no fit, r = -1 is a perfect negative fit
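As a sketch, the coefficient can be computed directly from its definition (covariance divided by the product of the standard deviations); the function name is illustrative:

```python
import math

def pearson_r(x, y):
    # Pearson correlation: covariance of x and y divided by
    # the product of their standard deviations.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```

For data lying exactly on an upward-sloping line this returns 1; on a downward-sloping line, -1.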
p-value
probability of the value of a test statistic being at least as extreme as the one observed in our data, under the null hypothesis
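A minimal sketch of this idea for a z statistic, assuming a standard-normal null distribution (function names are illustrative): the two-sided p-value is the probability mass at least as far from zero as the observed statistic.

```python
import math

def normal_cdf(z):
    # standard normal CDF, written in terms of the error function
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def two_sided_p(z):
    # probability, under the null, of a statistic at least
    # as extreme (in either direction) as the observed |z|
    return 2 * (1 - normal_cdf(abs(z)))
```

For example, z = 1.96 gives a p-value of about 0.05, the conventional significance threshold.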
How does beta behave if the estimates are consistent?
Betas converge to the true values with increasing sample size
heteroscedasticity
refers to the circumstances in which the variability of a variable is unequal across the range of a second variable that predicts it
homoscedasticity
= having the same variance.
logarithmic scale
= log scale
Exponential growth curves are often displayed on a log scale; otherwise they would increase too quickly to fit within a small graph
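A small numeric sketch of why this works: taking logs turns an exponential curve into a straight line (variable names are illustrative):

```python
import math

# exponential growth: y doubles at every step, y = 2 ** t
ts = range(6)
ys = [2 ** t for t in ts]

# on a log2 scale the same curve is a straight line:
# log2(2 ** t) = t, so the points climb by a constant step
logs = [math.log2(y) for y in ys]
```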
cross-entropy
- commonly used in ML as a loss function
- calculates the difference between two probability distributions for a given random variable
- equals the entropy of the true distribution plus the KL divergence between the two distributions
- Cross-entropy builds upon the idea of entropy from information theory and calculates the number of bits required to represent or transmit an average event from one distribution compared to another distribution.
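The bits-per-event idea above can be sketched directly from the definition H(p, q) = -Σ p(x) log₂ q(x) (function name is illustrative):

```python
import math

def cross_entropy(p, q):
    # H(p, q) = -sum over events of p(x) * log2(q(x)), in bits;
    # events with p(x) == 0 contribute nothing, so skip them
    # (note: q(x) == 0 where p(x) > 0 would make this infinite)
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)
```

When p and q are the same distribution, this reduces to the plain entropy of p; a mismatched q always costs at least as many bits.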
Sensitivity
= True Positive Rate
refers to the proportion of those who have the condition that received a positive result on the test
Specificity
= True Negative Rate
refers to the proportion of those who do not have the condition that received a negative result on this test
Sensitivity vs. Specificity
For all testing, both diagnostic and screening, there is usually a trade-off between sensitivity and specificity, such that higher sensitivities will mean lower specificities and vice versa.
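Both rates can be sketched from confusion-matrix counts (function names are illustrative):

```python
def sensitivity(tp, fn):
    # true positive rate: of those who have the condition
    # (TP + FN), the fraction the test correctly flags (TP)
    return tp / (tp + fn)

def specificity(tn, fp):
    # true negative rate: of those who do not have the condition
    # (TN + FP), the fraction the test correctly clears (TN)
    return tn / (tn + fp)
```

Lowering a test's decision threshold typically raises sensitivity (fewer missed cases) while lowering specificity (more false alarms), which is the trade-off described above.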
Standard Error
tells you how close the mean of any given sample from that population is likely to be to the true population mean. When the standard error increases, i.e. the sample means are more spread out, any given sample mean becomes a less accurate representation of the true population mean.
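A minimal sketch, assuming the usual formula SE = s / √n with s the sample standard deviation (function name is illustrative):

```python
import math

def standard_error(sample):
    # SE of the mean = sample standard deviation / sqrt(n),
    # using the n - 1 (Bessel-corrected) denominator for s
    n = len(sample)
    mean = sum(sample) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
    return s / math.sqrt(n)
```

Because of the √n in the denominator, larger samples give smaller standard errors: the sample mean pins down the population mean more tightly.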