Week 1 Flashcards
Review of Statistical Concepts
Correlation coefficient
The correlation coefficient measures the strength and direction of the linear relationship between two variables.
The relationship can be positive, negative, or absent (a coefficient near zero).
The starting point for analyzing the relationship between two variables; regression builds on this and goes beyond it.
Sampling distributions
A probability distribution of a statistic that is obtained through repeated sampling of a specific population
Regression analysis
A technique to understand and quantitatively summarize relations between variables
A variable is both …
- An operationalized dimension of a concept
- An attribute of an observation
Linear fit (linear model)
Describes the relationship between a continuous response variable and one or more explanatory variables using a linear function.
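A minimal sketch of a linear fit with one explanatory variable. It assumes ordinary least squares as the fitting criterion, which the card itself does not name, and the x and y values are made up for illustration.

```python
# Hypothetical example data: x is the explanatory variable, y the response.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Least-squares slope: sum of products of deviations in x and y,
# divided by the sum of squared deviations in x.
sum_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sum_xx = sum((xi - mean_x) ** 2 for xi in x)
slope = sum_xy / sum_xx

# The fitted line always passes through the point (mean_x, mean_y).
intercept = mean_y - slope * mean_x

def predict(value):
    """Predicted response for a given value of the explanatory variable."""
    return intercept + slope * value

print(f"y = {intercept:.2f} + {slope:.2f} * x")
```

The slope says how much the predicted response changes when the explanatory variable increases by one unit.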
Mean
Center of a distribution (the average of a dataset)
Deviance
An observation’s deviance (more commonly called its deviation) is how far it lies from the mean
Variance
The spread between numbers in a data set
The variance is the sum of the squared deviances divided by the number of observations (n for a population, n - 1 for a sample)
Variance is a statistical measurement of how far, on average, the numbers in a set lie from the mean
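A small worked example of mean, deviances, and variance, computed step by step. The data values are hypothetical, and both the population (divide by n) and sample (divide by n - 1) versions are shown.

```python
from statistics import mean

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical example values

m = mean(data)                            # center of the data
deviances = [x - m for x in data]         # distance of each value from the mean
sum_sq = sum(d ** 2 for d in deviances)   # sum of squared deviances

pop_var = sum_sq / len(data)              # population variance: divide by n
samp_var = sum_sq / (len(data) - 1)       # sample variance: divide by n - 1

print(m, sum_sq, pop_var, samp_var)
```

Squaring the deviances keeps positive and negative distances from cancelling each other out; the raw deviances always sum to zero.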
Covariance
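A measure of the relationship between two random variables and the extent to which they change together.
A change in one variable is associated with a change in the other variable
- Covariance(X,Y) = sum of products of deviances in X and Y over all datapoints i, divided by the number of datapoints (n - 1 for a sample)
- Covariance and variance lead to correlation: correlation = Covariance(X,Y) divided by the product of the standard deviations of X and Y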
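To make the link between covariance, variance, and correlation concrete, here is a short sketch using the sample (n - 1) versions. The x and y values are invented for illustration.

```python
from math import sqrt

# Hypothetical paired observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

dev_x = [xi - mean_x for xi in x]   # deviances in X
dev_y = [yi - mean_y for yi in y]   # deviances in Y

# Sample covariance: sum of products of deviances, divided by n - 1.
cov_xy = sum(dx * dy for dx, dy in zip(dev_x, dev_y)) / (n - 1)

# Sample variances: sum of squared deviances, divided by n - 1.
var_x = sum(dx ** 2 for dx in dev_x) / (n - 1)
var_y = sum(dy ** 2 for dy in dev_y) / (n - 1)

# Correlation: covariance rescaled by the two standard deviations,
# which puts the result on a unit-free scale between -1 and 1.
r = cov_xy / (sqrt(var_x) * sqrt(var_y))

print(f"cov = {cov_xy:.3f}, r = {r:.3f}")
```

Covariance depends on the units of X and Y, so its size is hard to interpret on its own; dividing by the standard deviations removes the units.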
Central limit theorem
As your sample size gets larger, the sampling distribution of the sample mean gets closer and closer to a normal distribution, even when the population itself is not normally distributed
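A rough simulation sketch of the idea, which also illustrates a sampling distribution built by repeated sampling. The exponential population, the sample sizes (2 and 50), the number of repeats, and the simple skewness helper are all assumptions chosen for illustration.

```python
import random
from statistics import mean, stdev

random.seed(0)  # fixed seed so the run is repeatable

def sample_mean(n):
    """Mean of n draws from a skewed (exponential) population with mean 1."""
    return mean(random.expovariate(1.0) for _ in range(n))

def skewness(values):
    """Simple skewness estimate; close to 0 for a symmetric distribution."""
    m, s = mean(values), stdev(values)
    return mean(((v - m) / s) ** 3 for v in values)

REPEATS = 2000  # number of repeated samples per sample size

# Sampling distributions of the mean for a small and a larger sample size.
means_small = [sample_mean(2) for _ in range(REPEATS)]
means_large = [sample_mean(50) for _ in range(REPEATS)]

skew_small = skewness(means_small)
skew_large = skewness(means_large)

print(f"skewness with n=2:  {skew_small:.2f}")
print(f"skewness with n=50: {skew_large:.2f}")
```

With the larger sample size the skewness of the sample means should sit much closer to 0, meaning the sampling distribution looks more symmetric and bell-shaped even though the population is strongly skewed.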
Normal distribution
A symmetrical, bell-shaped distribution, with increasingly fewer observations the further from the mean
Parameter
A characteristic of a population (e.g. the population mean)
Statistic
A characteristic of a sample (e.g. the sample mean)