Final Flashcards
Residual
distance between the line and any given point
SSE
takes residuals, squares them and adds them up.
The regression shows us the best fitting line in terms of sum of squared errors
Homoskedasticity
When the random variable, X, has the same variance for all observations of X. This isn’t a problem.
Heteroskedasticity
When the random variable, X, does not have the same variance for all observations of X.
Multicollinearity
When an independent variable is highly related to other independent variables, the variance of the coefficient we estimate for that variable will be high.
Variables in a data set come in different forms:
- Dummy (AKA binary or dichotomous) variables • Discrete versus continuous variables
- Ordinal variables
- Nominal variables
Data sets themselves also come in different forms:
- Cross-sectional data
* Panel (time series) data
Logged variables
one version of a transformed variable
Binary (dummy) variables are very useful.
- They are regularly used to control for specific effects (Think about the South in Larry Bartels’ article)
- They are used in experiments to identify the treated (1) and control (0) units.
Binary variables make difference
in means across two groups, or ‘average treatment effects’, very easy to calculate.
Discrete data
Comes in ’bins’ or groups.
• Example: On a scale of 1 to 5, how much do you like this class? 1, 2, 3, 4, 5. (5! Obv.)
• Polity score is another example.
• Discrete data lacks precision, The bin’s may be clear but they may not. We may not know what going from 1-2 means.
Continuous data
Can take any value in a sequence. Examples:
• Annual income
• votes for each candidate • percentages.
Descriptive Data
it describes how the world is. Often, categorical data comes from qualitative research.
Categorical data can be ordinal or nominal.
• Ordinal can be ordered (low, medium high—Comparable to discrete data)
• nominal cannot be ordered (majors: political science, economics, sociology, states).
class(data$variable)
to find out if something is a factor
lev
to find out the levels of that factor and store it as an object