Stata Concepts Flashcards

1
Q

Alternative Hypothesis

A

Ha; in statistical hypothesis testing, the alternative hypothesis is the position that something is happening: a new theory is true instead of the old one (the null hypothesis). It is usually consistent with the research hypothesis, because it is constructed from the literature review, previous studies, etc. However, the research hypothesis is sometimes consistent with the null hypothesis instead.

2
Q

Bivariate Regression

A

Two-variable regression. Bivariate regression analysis involves analyzing two variables to establish the strength of the relationship between them. The two variables are frequently denoted X and Y, with one being the independent variable (or explanatory variable) and the other the dependent variable (or outcome variable).

3
Q

Chi-Square Test

A

χ2, looks at how the cases are dispersed across the values of the dependent variable; a form of cross-tabulation analysis. Interpreting the χ2 statistic depends on the degrees of freedom and the level of significance/level of confidence chosen by the researcher. The degrees of freedom and the level of significance determine the critical value: the upper plausible boundary of random error. If the χ2 statistic is greater than the critical value, then we reject the null hypothesis that there is no relationship between the variables in favor of the alternative that there is a relationship. Does not show the direction or magnitude of the relationship.
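A sketch of this decision rule in Python (hypothetical 2×2 table; SciPy assumed available):

```python
from scipy.stats import chi2_contingency

# Hypothetical cross-tabulation: rows = DV values, columns = IV values
observed = [[30, 10],
            [10, 30]]

# SciPy applies Yates' continuity correction for 2x2 tables by default
chi2, p, dof, expected = chi2_contingency(observed)

# With df = 1 and a 0.05 significance level, the critical value is 3.84.
# If chi2 > 3.84 (equivalently, p < 0.05), reject the null hypothesis
# of no relationship between the variables.
reject_null = chi2 > 3.84
```

In Stata, `tabulate dv iv, chi2` reports the same test alongside the cross-tabulation.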

4
Q

Confidence Intervals

A

A range of values that you are X percent confident captures the population parameter (the true value of the variable). We are confident about the interval itself (technically, about how we computed it), not about the true value of the variable; we are not saying we are X% confident that the true value falls within the interval. Formula: sample statistic ± (t-value × standard error of the sample statistic). For the 95 percent confidence interval, we use the formula: sample statistic ± (2 × standard error of the sample statistic).
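A small worked example of the formula in Python, with made-up numbers (mean 50, s = 10, n = 100):

```python
import math

mean, s, n = 50.0, 10.0, 100       # hypothetical sample statistic and spread
se = s / math.sqrt(n)              # standard error = s / sqrt(n) = 1.0

# 95% CI rule of thumb: sample statistic +/- (2 x standard error)
lower = mean - 2 * se              # 48.0
upper = mean + 2 * se              # 52.0
```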

5
Q

Confidence Level

A

A p-value cutoff (significance level) of 0.05 corresponds to the 95 percent confidence level.

6
Q

Correlation Coefficient

A

r; ranges from -1 to +1. The sign of r gives the direction of the relationship (positive or negative); 0 means no linear relationship; the further from 0, the stronger the linear correlation.

7
Q

Cross-Tabulation

A

Shows the distribution of cases across the values of a DV for cases that have different values of the IV (when the IV = value x, how often is it paired with each value of y?). IV: columns; DV: rows. Calculate percentages within each category of the IV, then compare percentages across columns at the same level of the DV, making comparisons where we see changes in the IV. The question is how the IV affects the DV; we are not comparing DV vs. DV.
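A sketch of the "percentage within the IV, compare across columns" procedure in Python with pandas (hypothetical party/vote data):

```python
import pandas as pd

# Hypothetical data: IV = party (columns), DV = vote (rows)
df = pd.DataFrame({
    "party": ["D", "D", "D", "D", "R", "R", "R", "R"],
    "vote":  ["yes", "yes", "no", "yes", "no", "no", "yes", "no"],
})

# Percentages computed within each IV column, then compared across columns
table = pd.crosstab(df["vote"], df["party"], normalize="columns") * 100
# 75% of D cases are "yes" vs. 25% of R cases: the IV appears related to the DV
```

In Stata, `tabulate vote party, column` produces the same column percentages.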

8
Q

Dummy Variable

A

Has only two possible values (0 or 1); 0 = base/excluded category, and a one-unit change = a change in category.

9
Q

Interval Variable

A

numeric codes indicate precise quantities and communicate exact differences b/t units of analysis and differences in value, necessary for correlation coeff.

10
Q

Multicollinearity

A

Such a strong relationship between IVs that it's difficult to estimate the partial effects of each IV on the DV. Another way of thinking about multicollinearity: the independent variables aren't sufficiently independent of one another. If the correlation coefficient of two variables is 0.8 or higher, then including both in the multiple regression will lead to poor estimates. You can also look at the change in the adjusted R-squared value when a variable that is correlated with one of the IVs is included: if the two IVs are strongly related, then there will not be much change in the adjusted R-squared.

11
Q

Multiple Regression

A

The big innovation of multiple regression is that it lets us isolate the effect of one independent variable on the dependent variable while controlling for the effects of the other independent variables. What we are now calculating are the partial regression coefficients, which estimate the mean change in the DV for every one unit change in the IV, controlling for the other independent variables in the model.
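A numerical sketch (NumPy assumed; data fabricated so that y = 1 + 2·x1 + 3·x2 exactly) showing that the partial coefficients of each IV are recovered while the other is controlled for:

```python
import numpy as np

x1 = np.array([0., 1., 2., 3., 4., 5.])
x2 = np.array([1., 0., 2., 1., 3., 2.])
y = 1 + 2 * x1 + 3 * x2            # noiseless, so coefficients are exact

X = np.column_stack([np.ones_like(x1), x1, x2])   # intercept column + IVs
coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, b1, b2 = coefs          # -> 1.0, 2.0, 3.0 (partial coefficients)
```

In Stata the equivalent would be `regress y x1 x2`.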

12
Q

Nominal Variable

A

numeric codes that indicate categories, not actual quantities i.e. 1-North, 2-East, 3-South, 4-West

13
Q

Null Hypothesis

A

Ho; in inferential statistics, the null hypothesis is a general statement or default position that nothing significantly different is happening: for example, that there is no association among groups or variables, or that there is no relationship between two measured phenomena.

14
Q

Ordinal Variable

A

numeric codes indicate rank and relative differences b/t units of analysis, don’t know exact values

15
Q

P-Value

A

<0.05: reject the null hypothesis, 5% or smaller probability that our sample statistic would be observed if the null hypothesis about the population parameter were true. In statistical hypothesis testing, the p-value or probability value is the probability of obtaining test results at least as extreme as the results actually observed during the test, assuming that the null hypothesis is correct.

16
Q

Population Parameter

A

characteristics of population (entire universe of cases a researcher wishes to study)

17
Q

Regression

A

a measure of the relation between the mean value of one variable (e.g. output) and corresponding values of other variables (e.g. time and cost).

18
Q

Regression Coefficient

A

b, the slope of the regression line, shows both direction and magnitude of relationship, interpret by looking at p-value

19
Q

Regression Line

A

y = a + b(x); Stata command: regress depvar indepvar

20
Q

R-Squared

A

Tells us the proportion of variation in the DV that we can account for with the IV; ranges from 0 to 1. Indicates the size of the contribution the IV makes to explaining the DV, i.e., the completeness of the relationship.

21
Q

Sample Statistic

A

estimate of a population parameter based on a sample drawn from a population

22
Q

Standard Error

A

s/√n, where s = standard deviation and n = sample size. A measure of the statistical accuracy of an estimate, equal to the standard deviation of the theoretical distribution of a large population of such estimates. Takes the size of the sample into account.
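The formula in Python, with made-up numbers illustrating that a larger sample shrinks the standard error:

```python
import math

def standard_error(s, n):
    """SE = s / sqrt(n): same spread, bigger sample -> smaller SE."""
    return s / math.sqrt(n)

se_small = standard_error(10.0, 25)     # 10 / 5  = 2.0
se_large = standard_error(10.0, 2500)   # 10 / 50 = 0.2
```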

23
Q

Statistical Significance

A

p-value must be ≤ 0.05 in order for the results of our hypothesis tests to reach statistical significance. Ex. If Stata produced a p-value of 0.23, then we would say that we don’t have statistically significant evidence to reject the null hypothesis.

24
Q

T-Tests

A

(One-sample and two-sample.) What we do in order to test our null hypotheses and perform descriptive inference are called t-tests. Is this t the same as the t-value that we use to calculate the 95 percent confidence interval? No. (Sorry.) The t in the t-test refers to the ratio of the difference between the sample statistic and the hypothesized population parameter to the standard error of the sample statistic. This is the Student's t-statistic.
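A hedged Python sketch (SciPy assumed; fabricated sample) showing that the t-statistic is exactly that ratio:

```python
import math
from scipy import stats

sample = [52, 48, 51, 50, 49, 53, 47, 50]    # hypothetical measurements
n = len(sample)
mean = sum(sample) / n                       # sample statistic (50.0 here)
s = math.sqrt(sum((v - mean) ** 2 for v in sample) / (n - 1))
se = s / math.sqrt(n)                        # standard error

# t = (sample statistic - hypothesized parameter) / standard error
t_manual = (mean - 45) / se                  # H0: population mean = 45

t_stat, p_value = stats.ttest_1samp(sample, popmean=45)
# t_manual matches t_stat; a small p-value means we reject H0
```

In Stata, the one-sample equivalent would be `ttest sample == 45`.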