Steury PreReqs Flashcards

1
Q

Full list

A
• Hypothesis testing and why we use it
• What a normal distribution is and why we use it
• What a p-value is and how it is used
• Null hypothesis and how it is used
• Regression: what it’s used for, how we do it
• Sum of squared error
• Sum of squares due to regression
• R squared
• Confidence interval
• t-test
• ANOVA: what it is, how to do it
2
Q

F-statistic

A

Analysis of variance (ANOVA) can determine whether the means of three or more groups are different. ANOVA uses F-tests to statistically test the equality of means.

Variance is the square of the standard deviation. For us humans, standard deviations are easier to understand than variances because they’re in the same units as the data rather than squared units.

However, many analyses actually use variances in the calculations. F-statistics are based on the ratio of mean squares. The term “mean squares” may sound confusing but it is simply an estimate of population variance that accounts for the degrees of freedom (DF) used to calculate that estimate.

F-statistic = variation between the sample means / variation within the samples
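
As a concrete illustration of that ratio (a minimal sketch with made-up sample data, assuming numpy and scipy are available), the F-statistic can be built from the two mean squares by hand and checked against scipy.stats.f_oneway:

    import numpy as np
    from scipy import stats

    # Hypothetical samples from three groups.
    groups = [np.array([4.2, 5.1, 3.9, 4.8]),
              np.array([5.9, 6.3, 5.5, 6.1]),
              np.array([4.0, 4.4, 3.8, 4.6])]

    k = len(groups)                      # number of groups
    n = sum(len(g) for g in groups)      # total observations
    grand_mean = np.concatenate(groups).mean()

    # Mean square between: variation of the group means around the grand
    # mean, divided by its degrees of freedom (k - 1).
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    ms_between = ss_between / (k - 1)

    # Mean square within: variation of observations around their own group
    # means, divided by its degrees of freedom (n - k).
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    ms_within = ss_within / (n - k)

    f_manual = ms_between / ms_within
    f_scipy, p = stats.f_oneway(*groups)
    print(f_manual, f_scipy, p)          # the two F-values should match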

3
Q

F-Statistic Continued

A

The F-statistic is the test statistic for F-tests. In general, an F-statistic is a ratio of two quantities that are expected to be roughly equal under the null hypothesis, which produces an F-statistic of approximately 1.

The F-statistic incorporates both measures of variability discussed above. To see how they combine to produce low and high F-values, compare the spread of the group means to the spread of the observations within each group.

In the low F-value case, the group means are close together (low between-group variability) relative to the variability within each group. In the high F-value case, the variability of the group means is large relative to the within-group variability. To reject the null hypothesis that the group means are equal, we need a high F-value.

In a worked plastic strength example (from a one-way ANOVA table), the Factor Adj MS (14.540) is the numerator and the Error Adj MS (4.402) is the denominator, which gives an F-value of 3.30.

Is our F-value high enough? A single F-value is hard to interpret on its own. We need to place our F-value into a larger context before we can interpret it. To do that, we’ll use the F-distribution to calculate probabilities.
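
As a sketch of that last step (the degrees of freedom below are assumed values for illustration, since the example’s ANOVA table is not shown here), the p-value is the right-tail area of the F-distribution beyond the observed F:

    from scipy import stats

    f_value = 3.30
    dfn, dfd = 3, 36   # assumed: k - 1 numerator df, n - k denominator df
    p_value = stats.f.sf(f_value, dfn, dfd)   # survival function = 1 - CDF
    print(p_value)     # a small p-value -> reject equal group means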

4
Q

T test

A

The t-test assesses whether the means of two groups are statistically different from each other. This analysis is appropriate whenever you want to compare the means of two groups, and it is especially appropriate as the analysis for the posttest-only two-group randomized experimental design.

What does it mean to say that the averages for two groups are statistically different? Consider three situations in which the difference between the means is the same but the variability of scores differs: one with moderate variability within each group, one with high variability, and one with low variability. We would conclude that the two groups appear most different or distinct in the low-variability case. Why? Because there is relatively little overlap between the two bell-shaped curves. In the high-variability case, the group difference appears least striking because the two bell-shaped distributions overlap so much.

This leads us to a very important conclusion: when we are looking at the differences between scores for two groups, we have to judge the difference between their means relative to the spread or variability of their scores. The t-test does just this. The formula for the t-test is a ratio. The top part of the ratio is just the difference between the two means or averages. The bottom part is a measure of the variability or dispersion of the scores. This formula is essentially another example of the signal-to-noise metaphor in research: the difference between the means is the signal that, in this case, we think our program or treatment introduced into the data; the bottom part of the formula is a measure of variability that is essentially noise that may make it harder to see the group difference.
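
As a minimal sketch of that ratio (the scores are invented, and numpy/scipy are assumed available), the t-statistic is the mean difference divided by the standard error of the difference, which can be checked against scipy.stats.ttest_ind:

    import numpy as np
    from scipy import stats

    group1 = np.array([5.0, 6.2, 5.8, 6.5, 5.5])   # hypothetical scores
    group2 = np.array([4.1, 4.8, 5.0, 4.4, 4.7])

    # Signal: the difference between the two means.
    signal = group1.mean() - group2.mean()

    # Noise: standard error of the difference (pooled-variance form).
    n1, n2 = len(group1), len(group2)
    pooled_var = ((n1 - 1) * group1.var(ddof=1) +
                  (n2 - 1) * group2.var(ddof=1)) / (n1 + n2 - 2)
    noise = np.sqrt(pooled_var * (1 / n1 + 1 / n2))

    t_manual = signal / noise
    t_scipy, p = stats.ttest_ind(group1, group2)   # pooled t-test by default
    print(t_manual, t_scipy, p)                    # the t values should match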

5
Q

ANOVA, what it is, how to do it

A

An ANOVA test is a way to find out if survey or experiment results from three or more groups are significant. In other words, it helps you figure out whether you need to reject the null hypothesis or accept the alternate hypothesis. Basically, you’re testing groups to see if there’s a difference between them. Examples of when you might want to test different groups (see the sketch after this card):

• A group of psychiatric patients try three different therapies: counseling, medication and biofeedback. You want to see if one therapy is better than the others.
• A manufacturer has two different processes to make light bulbs. They want to know if one process is better than the other.
• Students from several colleges take the same exam. You want to see if one college outperforms the others.

One Way/Two Way
One-way or two-way refers to the number of independent variables (IVs) in your Analysis of Variance test. One-way has one independent variable (with two or more levels) and two-way has two independent variables (each can have multiple levels). For example, a one-way Analysis of Variance could have one IV (brand of cereal), and a two-way Analysis of Variance has two IVs (brand of cereal, calories).

Groups/Levels
Groups or levels are the different groups within the same independent variable. In the above example, your levels for “brand of cereal” might be Lucky Charms, Raisin Bran and Cornflakes (a total of three levels). Your levels for “calories” might be sweetened and unsweetened (a total of two levels).
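
A minimal sketch of the first example above (the outcome scores are made up, and scipy is assumed available) using a one-way ANOVA:

    from scipy import stats

    # Hypothetical outcome scores for each therapy.
    counseling  = [24, 28, 22, 26, 25]
    medication  = [30, 33, 29, 31, 34]
    biofeedback = [23, 25, 24, 27, 22]

    f_value, p_value = stats.f_oneway(counseling, medication, biofeedback)

    # Reject the null hypothesis of equal means if p falls below alpha.
    print(f_value, p_value, p_value < 0.05)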

6
Q

Regression, what’s it used for, how do we do it

A

Regression analysis is used in stats to find trends in data. Regression analysis will provide you with an equation for a graph so that you can make predictions about your data. It will also give you a slew of statistics (including a p-value and a correlation coefficient) to tell you how accurate your model is.

Linear regression is the most widely used statistical technique; it is a way to model a relationship between two sets of variables. The result is a linear regression equation that can be used to make predictions about data.

Multiple regression analysis is used to see if there is a statistically significant relationship between sets of variables. It’s used to find trends in those sets of data. Multiple regression analysis is almost the same as simple linear regression; the only difference is in the number of predictors (“x” variables) used in the regression.

• Simple regression analysis uses a single x variable for each dependent “y” variable. For example: (x1, Y1).
• Multiple regression uses multiple “x” variables for each dependent variable: ((x1)1, (x2)1, (x3)1, Y1).
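
A minimal sketch of a simple linear regression (the (x, y) pairs are made up, and numpy/scipy are assumed available), using scipy.stats.linregress to get the prediction equation plus the accuracy statistics mentioned above:

    import numpy as np
    from scipy import stats

    x = np.array([1, 2, 3, 4, 5, 6])               # hypothetical predictor
    y = np.array([2.1, 3.9, 6.2, 8.0, 9.8, 12.1])  # hypothetical response

    result = stats.linregress(x, y)

    # The regression equation for making predictions: y = slope * x + intercept
    print(f"y = {result.slope:.2f}x + {result.intercept:.2f}")
    print(result.rvalue ** 2)   # R squared: share of variance in y explained by x
    print(result.pvalue)        # p-value for the slope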

7
Q

R squared

A

Regression gives you an R squared value. This number tells you how good your model is; that is, how much of the variance in y is explained by x. The values range from 0 to 1, with 0 being a terrible model and 1 being a perfect model.
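
In symbols, using the sum-of-squares quantities defined on later cards in this deck:

    R^2 = \frac{SS_{regression}}{SS_{total}} = 1 - \frac{SS_{error}}{SS_{total}}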

8
Q

What a normal distribution is and why we use it

A

A normal distribution, sometimes called the bell curve, is a distribution that occurs naturally in many situations. The empirical rule tells you what percentage of your data falls within a certain number of standard deviations from the mean:

• 68% of the data falls within one standard deviation of the mean.
• 95% of the data falls within two standard deviations of the mean.
• 99.7% of the data falls within three standard deviations of the mean.

Properties
• The mean, mode and median are all equal.
• The curve is symmetric at the center (i.e. around the mean, μ).
• Exactly half of the values are to the left of center and exactly half are to the right.
• The total area under the curve is 1.
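
The empirical-rule percentages can be checked numerically (assuming scipy is available) as areas under the standard normal curve:

    from scipy import stats

    # Area within k standard deviations of the mean for k = 1, 2, 3.
    for k in (1, 2, 3):
        area = stats.norm.cdf(k) - stats.norm.cdf(-k)
        print(k, round(area, 4))   # ~0.6827, ~0.9545, ~0.9973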

9
Q

What a p value is and how it is used

A

The p-value, or probability value, is, for a given statistical model, the probability that, when the null hypothesis is true, a statistical summary of the data (such as the difference between two sample means) would be at least as extreme as the actual observed result.
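
As a small illustrative sketch (the test statistic is a made-up number, and scipy is assumed available), for a two-sided z-test the p-value is the tail area beyond the observed statistic:

    from scipy import stats

    z = 2.1   # hypothetical observed test statistic
    # Probability, under the null hypothesis, of a result at least this extreme.
    p_value = 2 * stats.norm.sf(abs(z))
    print(p_value)   # ~0.036; below 0.05, so reject at the 5% level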

10
Q

Null hypothesis how is it used

A

The null hypothesis is the hypothesis that there is no significant difference between specified populations, any observed difference being due to sampling or experimental error. It is used as the default assumption in a hypothesis test: we calculate how likely the observed data would be if the null hypothesis were true (the p-value) and reject the null hypothesis when that probability falls below a chosen significance level (e.g. 0.05).

11
Q

Confidence interval

A

A confidence interval is a range of values, computed from sample data, that we are fairly sure contains the true population value. For example, a 95% confidence interval means that if we repeated the sampling procedure many times, about 95% of the resulting intervals would contain the true value.
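
A minimal sketch (the sample values are made up, and numpy/scipy are assumed available) of a 95% confidence interval for a mean, using the t-distribution:

    import numpy as np
    from scipy import stats

    sample = np.array([4.8, 5.1, 5.6, 4.9, 5.3, 5.0, 5.4])  # hypothetical data

    mean = sample.mean()
    sem = stats.sem(sample)   # standard error of the mean
    ci = stats.t.interval(0.95, df=len(sample) - 1, loc=mean, scale=sem)
    print(ci)                 # range we are 95% confident contains the true mean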

12
Q

Total sum of squares

A

In statistical data analysis, the total sum of squares (TSS or SST) is a quantity that appears as part of a standard way of presenting results of such analyses. It is defined as the sum, over all observations, of the squared differences of each observation from the overall mean.
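
In symbols, with observations y_i and overall mean \bar{y}:

    SST = \sum_i (y_i - \bar{y})^2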

13
Q

Sum of squares regression

A

It is the sum of the squared differences between each predicted value and the mean of the dependent variable. Another common notation is ESS, or explained sum of squares.
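
In symbols, with predicted values \hat{y}_i and the mean \bar{y} of the dependent variable:

    SSR = \sum_i (\hat{y}_i - \bar{y})^2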

14
Q

Sum of squares error

A

The error is the difference between the observed value and the predicted value. The sum of squares error is also known as RSS, or residual sum of squares (“residual” as in remaining or unexplained).
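
In symbols, and combined with the two previous cards, this gives the standard decomposition:

    SSE = \sum_i (y_i - \hat{y}_i)^2, \qquad SST = SSR + SSE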

15
Q

Chi square

A

The Chi Square statistic is commonly used for testing relationships between categorical variables. The null hypothesis of the Chi-Square test is that no relationship exists between the categorical variables in the population; they are independent. An example research question that could be answered using a Chi-Square analysis would be: Is there a significant relationship between voter intent and political party membership?
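
A minimal sketch of that example question (the contingency-table counts are invented, and numpy/scipy are assumed available):

    import numpy as np
    from scipy import stats

    # Hypothetical counts: rows = party membership, columns = voter intent.
    table = np.array([[120,  90, 40],
                      [ 85, 110, 45]])

    chi2, p, dof, expected = stats.chi2_contingency(table)
    # A small p-value suggests intent and party membership are not independent.
    print(chi2, p, dof)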

16
Q

Post-hoc tests

A

Post-hoc (“after the fact”) tests are follow-up comparisons run after a significant ANOVA result. The F-test tells you that at least one group mean differs from the others, but not which one; post-hoc tests such as Tukey’s HSD or Bonferroni-corrected pairwise comparisons identify which pairs of groups differ while controlling the inflated error rate that comes from making many comparisons.
17
Q

Standard error

A

The standard error (SE) is very similar to the standard deviation. Both are measures of spread: the higher the number, the more spread out your data is. To put it simply, the two terms are essentially equal, but there is one important difference: the standard error uses statistics (sample data), while standard deviations use parameters (population data).

In statistics, you’ll come across terms like “the standard error of the mean” or “the standard error of the median.” The SE tells you how far your sample statistic (like the sample mean) is likely to deviate from the actual population mean. The larger your sample size, the smaller the SE. In other words, the larger your sample size, the closer your sample mean is likely to be to the actual population mean.
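
A small sketch (the sample is made up, and numpy/scipy are assumed available) of the standard error of the mean, computed by hand as s / sqrt(n) and checked against scipy.stats.sem:

    import numpy as np
    from scipy import stats

    sample = np.array([12.0, 14.5, 11.8, 13.2, 12.9, 14.1])  # hypothetical data

    # SE of the mean: sample standard deviation over the square root of n.
    se_manual = sample.std(ddof=1) / np.sqrt(len(sample))
    se_scipy = stats.sem(sample)   # same calculation
    print(se_manual, se_scipy)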

18
Q

Variance

A

Variance measures how spread out a data set is: it is the average of the squared differences of each observation from the mean. Variance is the square of the standard deviation (as noted on the F-statistic card), so it is expressed in squared units of the data.