terms Flashcards
b) Standard error
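A measure of how precisely the sample mean estimates the population mean. More generally, the standard error is the standard deviation of the sampling distribution of a statistic. For the sample mean, it is estimated as the sample standard deviation divided by the square root of the sample size.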
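As a minimal sketch of the calculation, the snippet below estimates the standard error of the mean from a single sample. The data values are made up for illustration.

```python
import math
import statistics

# Hypothetical sample of 8 measurements (illustrative values only).
sample = [4.0, 5.0, 6.0, 5.0, 4.0, 6.0, 5.0, 5.0]

n = len(sample)
sample_sd = statistics.stdev(sample)        # sample SD (n - 1 denominator)
standard_error = sample_sd / math.sqrt(n)   # estimated SD of the sample mean

print(f"mean = {statistics.mean(sample):.3f}, SE = {standard_error:.4f}")
```

With the same spread, a larger sample gives a smaller standard error, because the denominator grows with the square root of n.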
a) Normal distribution
A probability distribution that is symmetric about the mean, so that values near the mean occur more frequently than values far from the mean. The normal distribution appears as a “bell curve” when graphed.
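The symmetry and the concentration near the mean can be checked numerically. This is a small sketch using Python's standard library; the mean of 10 and standard deviation of 2 are arbitrary assumptions.

```python
from statistics import NormalDist

# Arbitrary illustrative parameters: mean 10, standard deviation 2.
dist = NormalDist(mu=10.0, sigma=2.0)

left = dist.pdf(10.0 - 3.0)    # density 3 units below the mean
right = dist.pdf(10.0 + 3.0)   # density 3 units above the mean
at_mean = dist.pdf(10.0)       # density at the mean (the peak)

# Probability of falling within one standard deviation of the mean.
within_one_sd = dist.cdf(12.0) - dist.cdf(8.0)

print(left, right, at_mean, round(within_one_sd, 4))
```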
c) Adjusted R-squared
Adjusted R-squared is a modified version of R-squared that has been adjusted for the number of predictors in the model. It increases only when a new term improves the model more than would be expected by chance, and it decreases when a predictor improves the model by less than would be expected by chance. Unlike R-squared, it can therefore fall when a predictor is added.
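A short sketch of the usual formula, 1 - (1 - R²)(n - 1)/(n - p - 1), where n is the number of observations and p the number of predictors. The function name and the example numbers are illustrative assumptions.

```python
def adjusted_r_squared(r_squared: float, n_obs: int, n_predictors: int) -> float:
    """Adjust R-squared for the number of predictors in the model."""
    if n_obs - n_predictors - 1 <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)

# Hypothetical model: 50 observations, 5 predictors, R-squared of 0.800.
base = adjusted_r_squared(0.800, n_obs=50, n_predictors=5)

# Adding a sixth predictor that barely raises R-squared (0.800 -> 0.801).
with_weak_predictor = adjusted_r_squared(0.801, n_obs=50, n_predictors=6)

print(round(base, 4), round(with_weak_predictor, 4))
```

In this example R-squared rises slightly, but adjusted R-squared falls, which is the penalty for a predictor that adds little.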
d) p-value
The p-value is the probability, under the assumption of no effect or no difference (the null hypothesis), of obtaining a result equal to or more extreme than what was actually observed. The P stands for probability. A small p-value means the observed data would be unusual if the null hypothesis were true; it is not the probability that the null hypothesis is true, nor the probability that the observed difference is due to chance.
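As one concrete case, the sketch below computes a two-sided p-value for a z test statistic, assuming the statistic follows a standard normal distribution under the null hypothesis. The function name is a hypothetical helper.

```python
from statistics import NormalDist

def two_sided_p_value(z: float) -> float:
    """Probability of a result at least as extreme as z, in either tail,
    assuming the test statistic is standard normal under the null."""
    standard_normal = NormalDist(0.0, 1.0)
    return 2.0 * (1.0 - standard_normal.cdf(abs(z)))

p = two_sided_p_value(1.96)
print(round(p, 4))
```

A z statistic of about 1.96 gives a two-sided p-value of about 0.05, which is why 1.96 is the familiar cutoff at the 5% level.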
heteroskedasticity
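A violation of the assumption that the residuals have homogeneous (constant) variance, which is known as homoskedasticity. Under heteroskedasticity, the spread of the residuals changes across the values of the predictors or the fitted values.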
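A crude way to see heteroskedasticity is to compare the residual variance in the lower and upper halves of a predictor. The residuals below are constructed by hand to grow with x; they are not from a fitted model, and this is an informal check rather than a formal test.

```python
import statistics

x = list(range(1, 21))

# Hypothetical residuals whose magnitude grows with x (alternating sign).
residuals = [((-1) ** i) * 0.5 * xi for i, xi in enumerate(x)]

# Split by predictor value: first half has small x, second half large x.
var_low = statistics.pvariance(residuals[:10])
var_high = statistics.pvariance(residuals[10:])
ratio = var_high / var_low

print(var_low, var_high, round(ratio, 2))
```

A ratio far from 1 suggests the variance is not constant; under homoskedasticity the two variances would be similar.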
standard normal distribution
The standard normal distribution is a specific type of normal distribution that has a mean of 0 and a standard deviation of 1. It is often referred to as the Z-distribution. This distribution is a special case of the general normal distribution, which can have any mean and standard deviation.
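Any normal variable can be mapped onto the standard normal distribution with z = (x - mean) / SD. The sketch below illustrates this with assumed values (mean 100, standard deviation 15).

```python
from statistics import NormalDist

def z_score(x: float, mean: float, sd: float) -> float:
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd

# Hypothetical scale with mean 100 and standard deviation 15.
z = z_score(130.0, mean=100.0, sd=15.0)

# The cumulative probability is the same on either scale.
p_original = NormalDist(100.0, 15.0).cdf(130.0)
p_standard = NormalDist(0.0, 1.0).cdf(z)

print(z, round(p_standard, 4))
```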
residuals
Residuals in data analysis are the differences between observed values and the values predicted by a statistical or machine learning model. They are a crucial diagnostic measure used to assess the quality and validity of a model.
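To make this concrete, the sketch below fits a straight line by ordinary least squares to a small made-up data set and computes each residual as observed minus predicted.

```python
# Hypothetical data (illustrative values only).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
x_mean = sum(x) / n
y_mean = sum(y) / n

# Ordinary least squares estimates for a one-predictor model.
slope = (sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
         / sum((xi - x_mean) ** 2 for xi in x))
intercept = y_mean - slope * x_mean

predicted = [intercept + slope * xi for xi in x]
residuals = [yi - pi for yi, pi in zip(y, predicted)]

print([round(r, 2) for r in residuals])
```

For a least squares fit with an intercept, the residuals sum to zero, so diagnostics usually look at their pattern and spread rather than their average.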
null hypothesis
A default or baseline assumption that there is no effect, no difference, or no relationship between the variables being studied. It serves as the starting point for hypothesis testing.
statistical significance
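Statistical significance is a concept used in hypothesis testing to judge whether the observed data in a study would be unlikely to occur by chance alone if the null hypothesis were true. A result is typically called statistically significant when its p-value falls below a preset significance level (such as 0.05). It helps researchers decide whether to reject the null hypothesis, which typically posits that there is no effect or no difference between groups.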
standard deviations
Standard deviation is a statistical measure that quantifies the amount of variation or dispersion in a set of data values. It is a crucial concept in statistics, used to understand how spread out the values in a data set are around the mean (average).
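A small worked example, using Python's standard library and made-up data, shows both the population form (divide by n) and the sample form (divide by n - 1).

```python
import statistics

# Hypothetical data set (illustrative values only).
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)
population_sd = statistics.pstdev(data)   # treats data as the whole population
sample_sd = statistics.stdev(data)        # treats data as a sample (n - 1)

print(mean, population_sd, round(sample_sd, 3))
```

The sample version is slightly larger because dividing by n - 1 corrects for estimating the mean from the same data.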
parameter
A parameter in statistics is a numerical value that describes a characteristic of an entire population. It is a fixed value that summarizes or describes an aspect of the population, such as the mean, standard deviation, or proportion.
leverage point
A leverage point is an observation with predictor values that are far from the mean of the predictor values for all observations. Such a point can pull the fitted regression line toward itself, so the fitted model tends to pass close to it.
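For a regression with one predictor and an intercept, the leverage of observation i can be written as 1/n + (x_i - mean)² / Sxx, where Sxx is the sum of squared deviations of x from its mean. The sketch below applies this to made-up predictor values, one of which sits far from the rest.

```python
# Hypothetical predictor values; the last one is far from the others.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 20.0]

n = len(x)
x_mean = sum(x) / n
sxx = sum((xi - x_mean) ** 2 for xi in x)

# Leverage (hat value) of each observation in simple linear regression.
leverage = [1.0 / n + (xi - x_mean) ** 2 / sxx for xi in x]

most_extreme = max(range(n), key=lambda i: leverage[i])
print([round(h, 3) for h in leverage], "highest at index", most_extreme)
```

The leverages sum to the number of fitted coefficients (2 here), so a single value close to 1 means that one observation dominates the fit.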
logistic regression
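Logistic regression is a technique that enables us to model a binary outcome, that is, a regression with a dummy variable on the left-hand side (as the dependent variable). It models the probability that the outcome equals 1 as a function of the predictors.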
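The sketch below shows how a logistic model turns a linear predictor (the log-odds) into a probability between 0 and 1. The coefficients are hypothetical and are not estimated from any data.

```python
import math

def predicted_probability(x: float, intercept: float, slope: float) -> float:
    """Probability that the binary outcome equals 1 for a given x."""
    log_odds = intercept + slope * x
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical coefficients chosen for illustration.
p_low = predicted_probability(0.0, intercept=-2.0, slope=1.0)
p_mid = predicted_probability(2.0, intercept=-2.0, slope=1.0)
p_high = predicted_probability(4.0, intercept=-2.0, slope=1.0)

print(round(p_low, 3), round(p_mid, 3), round(p_high, 3))
```

In practice the intercept and slope would be estimated from data, usually by maximum likelihood.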
interaction effect
An interaction effect describes a situation where the effect of one variable on the outcome variable changes depending on the value of another variable.
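In a linear model, an interaction is commonly represented by a product term: y = b0 + b1*x + b2*z + b3*x*z. The sketch below uses hypothetical coefficients to show that the effect of x then depends on z.

```python
def predicted_outcome(x: float, z: float,
                      b0: float = 1.0, b1: float = 2.0,
                      b2: float = 0.5, b3: float = 1.5) -> float:
    """Linear model with an x-by-z interaction term (hypothetical coefficients)."""
    return b0 + b1 * x + b2 * z + b3 * x * z

# Effect of raising x from 0 to 1, evaluated at two values of z.
effect_when_z_is_0 = predicted_outcome(1.0, 0.0) - predicted_outcome(0.0, 0.0)
effect_when_z_is_1 = predicted_outcome(1.0, 1.0) - predicted_outcome(0.0, 1.0)

print(effect_when_z_is_0, effect_when_z_is_1)
```

The effect of x equals b1 + b3*z, so with a nonzero b3 it differs between z = 0 and z = 1.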
normal distribution
A continuous probability distribution that is symmetric around its mean.