Stata Concepts Flashcards
Alternative Hypothesis
Ha. In statistical hypothesis testing, the alternative hypothesis is the position that something is happening: a new theory is true instead of the old one (the null hypothesis).[1] It is usually consistent with the research hypothesis, since it is constructed from the literature review, previous studies, etc. Sometimes, however, the research hypothesis is instead consistent with the null hypothesis.
Bivariate Regression
2 variable regression, Bivariate Regression Analysis involves analysing two variables to establish the strength of the relationship between them. The two variables are frequently denoted as X and Y, with one being an independent variable (or explanatory variable), while the other is a dependent variable (or outcome variable).
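As a minimal sketch, the slope and intercept of a bivariate regression can be computed from sample means; the data below are made up for illustration:

```python
# Hypothetical data: x = independent variable, y = dependent variable.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)

# Slope b1 = covariance(x, y) / variance(x); intercept b0 = mean(y) - b1 * mean(x).
b1 = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / \
     sum((xi - mean_x) ** 2 for xi in x)
b0 = mean_y - b1 * mean_x
print(b0, b1)  # fitted line: y = b0 + b1 * x
```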
Chi-Square Test
or χ2, looks at how the cases are dispersed across values of the dependent variable; computed from a cross-tabulation. Interpreting the χ2 statistic depends on the degrees of freedom and the level of significance/level of confidence chosen by the researcher. The degrees of freedom and the level of significance determine the critical value: the upper plausible boundary of random error. If the χ2 statistic is greater than the critical value, we reject the null hypothesis of no relationship between the variables in favor of the alternative that there is a relationship. Does not show the direction or magnitude of the relationship.
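A sketch of the comparison against the critical value, using a made-up 2x2 table (the critical value 3.841 is for 1 degree of freedom at the 0.05 significance level):

```python
# Hypothetical 2x2 cross-tabulation: rows = DV categories, columns = IV categories.
observed = [[30, 10],
            [20, 40]]

def chi_square(table):
    """Return the chi-square statistic and degrees of freedom for a table."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (obs - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

stat, df = chi_square(observed)
critical_value = 3.841  # df = 1, 0.05 significance level
print(round(stat, 3), df, stat > critical_value)  # True -> reject the null
```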
Confidence Intervals
a range of values that you are X percent confident captures the population parameter (the true value of the variable). The confidence applies to the procedure used to compute the interval, not to any single interval: we are not saying there is an X% probability that the true value falls within this particular interval. Formula: sample statistic ± (t-value × standard error of the sample statistic). For the 95 percent confidence interval, a common rule of thumb is: sample statistic ± (2 × standard error of the sample statistic).
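The "2 x standard error" rule of thumb can be sketched directly; the sample values here are hypothetical:

```python
import math
import statistics

# Hypothetical sample; any numeric sample works the same way.
sample = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(sample)
se = statistics.stdev(sample) / math.sqrt(len(sample))  # standard error of the mean

# Rough 95 percent interval using the 2 x standard error rule from the card.
lower, upper = mean - 2 * se, mean + 2 * se
print(round(lower, 3), round(upper, 3))
```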
Confidence Level
A significance level (p-value threshold) of 0.05 corresponds to the 95 percent confidence level: confidence level = 1 − significance level.
Correlation Coefficient
r, ranges from -1 to +1. The sign of r gives the direction of the relationship (positive or negative); 0 means no linear relationship; the further r is from 0, the stronger the linear correlation.
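Pearson's r can be sketched from its definition (covariance scaled by the two standard deviations); the paired observations are made up:

```python
import math

# Hypothetical paired observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
r = cov / math.sqrt(sum((a - mean_x) ** 2 for a in x) *
                    sum((b - mean_y) ** 2 for b in y))
print(round(r, 3))  # positive sign -> positive linear relationship
```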
Cross-Tabulation
shows the distribution of cases across the values of a DV for cases that have different values of the IV (when IV = value x, how often is it paired with each value of y?). IV: columns; DV: rows. Calculate percentages within each IV category (column percentages), then compare percentages across columns at the same level of the DV, noting where they change as the IV changes. The question is how the IV affects the DV; we are not comparing DV vs. DV.
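A sketch of column percentages on a hypothetical cross-tab (variable names and counts are invented): percentages are computed within each IV column, then compared across columns at the same DV level.

```python
# Hypothetical cross-tabulation: IV = income group (columns), DV = opinion (rows).
table = {
    "agree":    {"low": 30, "high": 10},
    "disagree": {"low": 20, "high": 40},
}

# Column totals, so each IV category sums to 100 percent.
col_totals = {}
for row in table.values():
    for col, n in row.items():
        col_totals[col] = col_totals.get(col, 0) + n

col_pct = {dv: {col: 100 * n / col_totals[col] for col, n in row.items()}
           for dv, row in table.items()}
print(col_pct["agree"])  # compare "agree" percentages across IV columns
```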
Dummy Variable
has exactly 2 possible values (0 or 1); 0 = base/excluded category; a one-unit change = a change in category
Interval Variable
numeric codes indicate precise quantities and communicate exact differences b/t units of analysis and differences in value, necessary for correlation coeff.
Multicollinearity
such a strong relationship b/t IVs that it's difficult to estimate the partial effect of each IV on the DV. Another way of thinking about multicollinearity: the independent variables aren't sufficiently independent of one another. If the correlation coefficient of two IVs is 0.8 or higher, including both in the multiple regression will lead to poor estimates. You can also look at the change in the adjusted R-squared when a variable that is correlated with one of the IVs is added: if the two IVs are strongly related, the adjusted R-squared will not change much.
Multiple Regression
The big innovation of multiple regression is that it lets us isolate the effect of one independent variable on the dependent variable while controlling for the effects of the other independent variables. What we are now calculating are the partial regression coefficients, which estimate the mean change in the DV for every one unit change in the IV, controlling for the other independent variables in the model.
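A minimal sketch of how partial coefficients could be estimated, solving the normal equations directly; the data and the generating equation below are invented for illustration, and real work would use a regression package:

```python
def ols(X, y):
    """Estimate the intercept and partial coefficients by solving the
    normal equations (X'X)b = X'y with Gaussian elimination."""
    rows = [[1.0] + list(x) for x in X]           # prepend intercept column
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    for col in range(k):                          # forward elimination
        pivot = max(range(col, k), key=lambda r: abs(xtx[r][col]))
        xtx[col], xtx[pivot] = xtx[pivot], xtx[col]
        xty[col], xty[pivot] = xty[pivot], xty[col]
        for r in range(col + 1, k):
            f = xtx[r][col] / xtx[col][col]
            for c in range(col, k):
                xtx[r][c] -= f * xtx[col][c]
            xty[r] -= f * xty[col]
    beta = [0.0] * k                              # back substitution
    for i in range(k - 1, -1, -1):
        beta[i] = (xty[i] - sum(xtx[i][j] * beta[j]
                                for j in range(i + 1, k))) / xtx[i][i]
    return beta

# Hypothetical data generated exactly from y = 2 + 3*x1 + 0.5*x2,
# so the partial coefficients should be recovered.
X = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6)]
y = [6, 8.5, 13, 15.5, 20]
beta = ols(X, y)
print([round(b, 3) for b in beta])  # intercept, then partial coefficients
```

Each coefficient estimates the mean change in the DV for a one-unit change in that IV, holding the other IV in the model constant.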
Nominal Variable
numeric codes that indicate categories, not actual quantities, e.g., 1 = North, 2 = East, 3 = South, 4 = West
Null Hypothesis
Ho. In inferential statistics, the null hypothesis is a general statement or default position that nothing significantly different is happening: there is no association among groups or variables, or no relationship between two measured phenomena.
Ordinal Variable
numeric codes indicate rank and relative differences b/t units of analysis, don’t know exact values
P-Value
< 0.05: reject the null hypothesis; there is a 5% or smaller probability that our sample statistic would be observed if the null hypothesis about the population parameter were true. More formally, the p-value (probability value) is the probability of obtaining test results at least as extreme as the results actually observed, assuming the null hypothesis is correct.