Key Terms part 2 Flashcards

1
Q

What is Kurtosis?

A

It is a measure of the degree to which a distribution is “peaked” or flat compared with a normal distribution, whose graph has a bell-shaped appearance. Kurtosis is part of the shape, one of the three characteristics that completely describe any distribution.

2
Q

What are the three most commonly used measures of central tendency?

A

Mode, median and mean. The goal of measuring central tendency is to describe a distribution of scores by determining a value (or values) that identifies the center of the distribution.

3
Q

What statistics do you use when hypothesis testing with the t statistic? What is included in the 4-part process?

A

1 - state the hypotheses (H0 and H1) and select the alpha level, for ex: .05
2 - set the critical region for t, using the t distribution chart with df = n - 1
3 - compute the statistics:
First, calculate the estimated standard error
Then, compute the test statistic
4 - compare the value you computed for t to your decision criteria. Make a decision regarding your hypotheses.
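
The four steps can be sketched in Python for a single-sample t test. This is a minimal illustration, not the course's worked example: the scores and the hypothesized mean of 10 are made up, and the critical value 2.306 is the two-tailed t table entry for alpha = .05 with df = 8.

```python
import math
import statistics

def one_sample_t(scores, mu0):
    """Return (t, df) for a one-sample t test of H0: mu = mu0."""
    n = len(scores)
    sample_mean = statistics.mean(scores)
    s = statistics.stdev(scores)            # sample SD, divides by n - 1
    standard_error = s / math.sqrt(n)       # estimated standard error
    t = (sample_mean - mu0) / standard_error
    return t, n - 1

# Step 1: H0: mu = 10, H1: mu != 10, alpha = .05 (hypothetical values)
scores = [12, 14, 11, 15, 13, 12, 16, 14, 10]
# Step 2: critical region from a t table, two-tailed, df = 8
t_critical = 2.306
# Step 3: compute the standard error and the test statistic
t, df = one_sample_t(scores, mu0=10)
# Step 4: compare t to the critical value and decide
reject_h0 = abs(t) > t_critical
print(round(t, 2), df, reject_h0)
```

With these made-up scores the sample mean is 13 and t is roughly 4.65, which falls in the critical region, so H0 would be rejected.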

4
Q

How do you report results following hypothesis testing? When are results significant?

A

If a result is significant, it is very unlikely to occur when the null hypothesis is true. You can reject the null hypothesis.

5
Q

What is the shape of the t distribution?

A

It is usually flatter and more variable than a normal z distribution, because the bottom of the formula (the estimated standard error, based on the sample variance s²) changes from one sample to the next. In a z distribution, the bottom of the formula does not change.

T statistics are more variable than z scores, so the t distribution is flatter and more spread out.

6
Q

What are the two assumptions regarding hypothesis testing with t statistic?

A
  1. The values in the sample must consist of independent observations, where two events (or observations) are independent if the occurrence of the first event has no effect on the probability of the second event.
  2. The population that is sampled must be normal. This is important with very small samples. This assumption can be violated with larger samples, without affecting the validity of the hypothesis test.
7
Q

When hypothesis testing with t statistic, when do you reject the null hypothesis?

A

You reject the null hypothesis when the difference between the data and the hypothesis (numerator) is much greater than expected (denominator) and we obtain a large value for t. We conclude the data is not consistent with the hypothesis and we reject H0.

8
Q

When hypothesis testing with the t statistic, when do you fail to reject H0?

A

When the difference between the data and the hypothesis is small relative to the standard error, we obtain a t statistic near zero and we fail to reject H0.

9
Q

What is degrees of freedom?

A

df = n - 1

It is defined as the number of observations in the data that are free to vary when estimating statistical parameters. (hat example).

10
Q

What is the independent measures design?

A

A research design that uses a separate group of participants for each treatment condition. Also called a between-subjects research design.

11
Q

What is the repeated measures design?

A

A research design, also called a within-subjects design, in which the dependent variable is measured two or more times for each individual in a single sample. The same group of subjects is used in all treatment conditions.

The main advantage is that it uses exactly the same individuals in all treatment conditions. Sometimes researchers will try to approximate this with a matched-subjects design, in which each individual from one sample is matched with an individual in another sample so that they are equivalent on the variable that the researcher is trying to control.

12
Q

What is the confidence interval?

A

It is the range of values that you expect your estimate to fall between a certain percentage of the time if you run your experiment again or resample the population in the same way.

13
Q

What does the confidence interval mean for hypothesis testing?

A

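A mean difference of zero is exactly what would be predicted by the null hypothesis if we did a hypothesis test. If the confidence interval for the mean difference includes zero, we fail to reject H0; if it does not include zero, we reject H0 at the corresponding alpha level.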

14
Q

What is the independent measures t test?

A

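It compares the means of two separate groups to see if there is any difference between them. Every part of the single-sample t formula is doubled: there are two sample means, two sample variances and two sets of degrees of freedom, and the estimated standard error combines the error from both samples.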
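
As a rough sketch of how the doubled formulas fit together, the Python below computes an independent-measures t using the pooled variance, which assumes equal population variances. The two groups of scores are hypothetical.

```python
import math

def sum_of_squares(scores):
    """Sum of squared deviations from the group mean."""
    m = sum(scores) / len(scores)
    return sum((x - m) ** 2 for x in scores)

def independent_t(group1, group2):
    """Return (t, df) for H0: mu1 - mu2 = 0, pooled-variance version."""
    n1, n2 = len(group1), len(group2)
    mean1 = sum(group1) / n1
    mean2 = sum(group2) / n2
    df = (n1 - 1) + (n2 - 1)                 # df for both samples combined
    pooled_variance = (sum_of_squares(group1) + sum_of_squares(group2)) / df
    standard_error = math.sqrt(pooled_variance / n1 + pooled_variance / n2)
    return (mean1 - mean2) / standard_error, df

# Hypothetical scores for two separate groups of participants
treatment_a = [8, 10, 9, 11, 12]
treatment_b = [5, 7, 6, 8, 4]
t, df = independent_t(treatment_a, treatment_b)
print(t, df)
```

Here the group means are 10 and 6, the pooled variance is 2.5 and the standard error is 1, giving t = 4 with df = 8.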

15
Q

What are the 3 assumptions regarding the independent-measures t-formula?

A
  1. The observations within each sample must be independent.
  2. The two populations from which samples are selected must be normal.
  3. The two populations from which the samples are selected must have equal variance. This is referred to as homogeneity of variance. It is most important when there is a large discrepancy between sample sizes. SPSS uses Levene’s Test for equality of variances.
16
Q

What is the Levene’s Test?

A

Levene’s Test tests the equality of variances between the two populations that you are comparing with your independent-measures t formula. If the test is significant (p < .05), the variances are not equal and you should use the t statistic computed with equal variances not assumed, which is the second row of the output.

17
Q

What is a constraint of a T-test?

A

You are just comparing two groups. If you have 3 or more groups or levels to test, you should use ANOVA, which gives you a bigger, broader picture.

18
Q

What is ANOVA?

A

Analysis of variance is a hypothesis testing procedure that is used to evaluate mean differences between two or more treatments/populations. The variable that defines the groups is qualitative, while the scores being compared are quantitative. The major advantage is that it can be used to compare two or more treatments.

ANOVA asks if there are differences between the groups and also looks at variance within each group.

You use the F distribution/statistic with ANOVA.

19
Q

What is sum of squares?

A

It is the sum of the squared deviations of each score from the sample mean. It gives you an idea of how much variability you have.

s² = SS/(n - 1)
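
A short Python sketch of the calculation, using four made-up scores: SS is the sum of squared deviations from the mean, and the sample variance divides SS by n - 1. The result is compared against `statistics.variance` from the standard library as a sanity check.

```python
import statistics

scores = [1, 3, 4, 8]                       # hypothetical sample
mean = sum(scores) / len(scores)            # 4.0
ss = sum((x - mean) ** 2 for x in scores)   # sum of squared deviations
variance = ss / (len(scores) - 1)           # s squared = SS / (n - 1)

print(ss, round(variance, 3))
print(statistics.variance(scores))          # same value from the stdlib
```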

20
Q

What is the difference between an independent variable and a quasi-independent variable?

A

Independent variable: a researcher manipulates a variable to create the treatment conditions (ex Treatment A, B, C)

Quasi-independent variable: a researcher uses a non-manipulated variable to designate groups (ex occupational status)

21
Q

What is the factor?

A

In ANOVA, the variable that designates the groups being compared is called a factor. The conditions or values that make up a factor are called the levels of the factor.

22
Q

What are the example statistical hypotheses for ANOVA?

A

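If you are comparing three different conditions, the hypotheses would be as follows:

H0: μ1 = μ2 = μ3 (the population means are all equal)
H1: at least one population mean differs from another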

23
Q

What is the test statistic for ANOVA?

A

The F ratio/ distribution.

If F is larger than the critical value from the F distribution table, we reject the null hypothesis. An F near 1 is what we expect when H0 is true.

24
Q

What is the goal of ANOVA and what are the steps?

A

The final goal is the F ratio.

Step 1 - calculate the sums of squares (between and within treatments)
2 - compute the mean squares (MS = SS/df)
3 - compute the F statistic and compare to the F distribution chart. If the number is higher than the number we find in the table, we reject the null hypothesis.
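
The steps can be sketched in Python for a one-way ANOVA. The three groups below are hypothetical, and the critical value 5.14 is the F table entry for alpha = .05 with df = 2, 6. H0 is rejected when the computed F exceeds the critical value.

```python
def mean(values):
    return sum(values) / len(values)

def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of scores."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    # Step 1: sums of squares
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum(sum((x - mean(g)) ** 2 for x in g) for g in groups)
    # Step 2: mean squares (MS = SS / df)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    # Step 3: F ratio
    return ms_between / ms_within, df_between, df_within

groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]   # hypothetical treatments
f_ratio, df_between, df_within = one_way_anova(groups)
f_critical = 5.14                            # F table, alpha = .05, df = 2, 6
reject_h0 = f_ratio > f_critical
print(f_ratio, df_between, df_within, reject_h0)
```

For these scores SS between is 54 and SS within is 6, so MS between is 27, MS within is 1 and F = 27.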

25
Q

What is the logic of ANOVA?

A

The total variability includes both the between treatment variance and the within-treatment variance.

26
Q

When looking at the F-ratio, what are you considering?

A
  • Numerator: variance (mean square) between treatments, with degrees of freedom between
  • Denominator: variance (mean square) within treatments, with degrees of freedom within

When there are no systematic treatment effects, the differences between treatments (in the numerator) are entirely caused by random, unsystematic factors.

If the numerator and denominator are roughly equal, the F ratio should be around 1, which indicates no treatment effect and is consistent with H0.

In ANOVA the denominator of the F ratio is called the error term.

When the treatment does have an effect, and there are systematic differences between samples, then the numerator should be noticeably larger.

F values are always positive numbers.

27
Q

What is degrees of freedom?

A
28
Q

Why do we not do multiple t-tests instead of ANOVA?

A

In a t-test, you set an alpha level that determines the risk of a Type I error. Because each test carries its own risk of a Type I error, the more tests you do, the greater the overall risk.

The advantage of ANOVA is that it performs all the comparisons (three, when there are three treatments) at the same time, using one test with one alpha level to evaluate the mean differences.
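
To make the accumulation of risk concrete, here is a small Python sketch. It uses the common approximation 1 - (1 - alpha)^c for the chance of at least one Type I error across c tests, which assumes the tests are independent; that assumption is a simplification, since pairwise t-tests on the same groups are not fully independent.

```python
def familywise_error(alpha, num_tests):
    """Chance of at least one Type I error across independent tests."""
    return 1 - (1 - alpha) ** num_tests

# Three treatments need three pairwise t-tests: A-B, A-C, B-C
risk = familywise_error(alpha=0.05, num_tests=3)
print(round(risk, 4))
```

With alpha = .05 and three tests the overall risk is about .14 rather than .05.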

29
Q

What post-hoc tests should you be aware of for ANOVA and when should they be done?

A

Post-hoc tests should be done after ANOVA when you reject H0 and there are three or more treatments.

They are done to determine exactly which mean differences are significant and which are not.

The three to know are:
Tukey’s
Bonferroni’s
Scheffe’s

30
Q

What are the 4 types of tests covered in the course and what are they used for?

A

Independent T Test - used for 2 qualitative categories with quantitative variables.

ANOVA - used for more than 2 qualitative categories with quantitative variables.

Correlation/Regression - used to compare quantitative variables.

Chi-Square - used to compare qualitative data.

31
Q

What is correlation?

A

A statistical technique used to measure and describe the relationship between two variables measured on an interval or ratio scale (quantitative variables).

Usually the variables are simply observed in the environment, not manipulated.

32
Q

What does the direction of the linear relationship show?

A

Positive Correlation: the two variables tend to change in the same direction. As the value of X increases, Y also increases, and when X decreases, Y decreases.

Negative Correlation: the two variables go in opposite directions. As X increases, Y decreases.

No Correlation: no relationship between variables.

33
Q

In linear relationship, what is the strength of the relationship?

A

The consistency of the relationship is measured by the numerical value of the correlation.

r is Pearson correlation.

34
Q

What is the Pearson correlation?

A

The Pearson correlation (r) measures the degree and the direction of the linear relationship between two variables. For this, we need sum of products deviation, or SP.

35
Q

What is the sum of products deviation, SP?

A

The sum of products deviation measures the amount of covariability between two variables. It is needed to calculate the Pearson correlation.
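
A minimal Python sketch of the calculation, using made-up X and Y scores. It assumes the standard definitions SP = Σ(X - Mx)(Y - My) and r = SP / √(SSx · SSy), where SSx and SSy are the sums of squares for each variable.

```python
import math

def pearson_r(x, y):
    """Return (SP, r) for paired scores x and y."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # SP: sum of products of deviations (covariability of X and Y)
    sp = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return sp, sp / math.sqrt(ss_x * ss_y)

x = [1, 2, 3, 4, 5]          # hypothetical paired scores
y = [2, 4, 5, 4, 5]
sp, r = pearson_r(x, y)
print(sp, round(r, 3), round(r ** 2, 2))
```

For these scores SP = 6, SSx = 10 and SSy = 6, so r is about .77 and r squared is .60.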

36
Q

In the Pearson’s correlation, what does the size of correlation mean?

A

A correlation coefficient of:
.10 is weak/small
.30 moderate correlation
.50 or larger is strong/large correlation

37
Q

What 4 things are kept in mind when interpreting correlation?

A
  1. Correlation simply describes a relationship between two variables; it does not explain why they are related. It should not be interpreted as cause and effect.
  2. When correlation is computed from scores that do not represent the full range of possibilities, you must be cautious in interpreting the correlation. Do not generalise.
  3. A single outlier can drastically alter the value of the correlation. Always check the scatter plot and remove outliers.
  4. Coefficient of determination: the value r² measures the proportion of variability in one variable that can be determined from the relationship with the other variable. It ranges from 0 to 1.
38
Q

What is the coefficient of determination, r2?

A

r² measures the proportion of variability in one variable that can be determined from the relationship to the other variable and ranges from 0 to 1. It is the squared correlation (r) between predicted Y scores and actual Y scores.

0 = shows no relationship
1 = shows a perfect relationship
Anything in between may show some relationship, but not strong.

39
Q

What are the critical values in Pearson’s correlation?

A

Using the table, to be significant the sample correlation, r, must be greater than or equal to the critical value in the table (ignoring the sign of r for a two-tailed test).

40
Q

What is an alternative to Pearson’s correlation?

A

Spearman correlation: used in two situations, when both x and y variables are measured on ordinal scales or the distribution is not normal, and when you would like to measure the consistency of the relationship, independent of its form.

Monotonicity is “less restrictive” than that of a linear relationship.
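
One way to see the difference is to compute both correlations on data that rise consistently but not in a straight line. The sketch below treats the Spearman correlation as the Pearson correlation applied to ranks. The `ranks` helper is a simplification that assumes no tied scores, and the practice/skill numbers are invented.

```python
import math

def pearson_r(x, y):
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sp = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return sp / math.sqrt(ss_x * ss_y)

def ranks(values):
    """Rank 1 = smallest score. Simplification: assumes no ties."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def spearman(x, y):
    """Pearson correlation computed on the ranks of x and y."""
    return pearson_r(ranks(x), ranks(y))

practice = [1, 2, 3, 4, 5]        # hypothetical hours of practice
skill = [1, 4, 9, 16, 25]         # always increasing, but curved
r_pearson = pearson_r(practice, skill)
r_spearman = spearman(practice, skill)
print(round(r_pearson, 3), r_spearman)
```

The relationship is perfectly monotonic, so the Spearman value is 1, while the Pearson value is slightly below 1 because the points do not fall on a straight line.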

41
Q

What is an example of when you would use Spearman correlation instead of Pearson’s correlation?

A

When one expects the data to show a consistently one-directional relationship, but not necessarily a linear relationship, for example in skill building and practice.

42
Q

What are the assumptions of ANOVA?

A
  1. Observations are independent.
  2. Populations are normal.
  3. Populations must have an equal variance.

If there are unequal sample sizes, ANOVA is still valid, as long as the samples are large.

43
Q

What is the linear equation?

A

Y = a + bX

44
Q

What is regression?

A

It is the statistical technique for finding the best fitting straight line for a set of data. The resulting straight line is called the “regression line.”

45
Q

How do you find the best fit in regression?

A

The first step is to define mathematically the distance between the line and each data point.

Distance = Y - Ŷ

Then you square each distance to obtain a uniformly positive measure of error. The best fitting line is the one that has the smallest total squared error.

Ŷ = a + bX
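
A minimal Python sketch of finding the best-fitting line, using the standard least-squares solution b = SP / SSx and a = My - b·Mx (these formulas are not stated on the card and are supplied here as the usual textbook result). The X and Y scores are hypothetical.

```python
def least_squares(x, y):
    """Return (a, b) for the line Y-hat = a + b*X with minimum squared error."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sp = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    b = sp / ss_x                  # slope
    a = mean_y - b * mean_x        # Y-intercept
    return a, b

x = [1, 2, 3, 4, 5]               # hypothetical scores
y = [2, 4, 5, 4, 5]
a, b = least_squares(x, y)

predicted = [a + b * xi for xi in x]
distances = [yi - yhat for yi, yhat in zip(y, predicted)]   # Y - Y-hat
total_squared_error = sum(d ** 2 for d in distances)
print(round(a, 2), round(b, 2), round(total_squared_error, 2))
```

For these scores the line is Ŷ = 2.2 + 0.6X and the total squared error is 2.4. Any other straight line through the same points gives a larger total.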

46
Q

What is the goal of the regression equation?

A

Ŷ = a + bX

The goal is to find the line that has the smallest total squared error.

47
Q

How do you analyse regression / test the significance?

A

The regression analysis uses an F ratio to determine whether the variance predicted by the regression equation is significantly greater than would be expected if there were no relationship between X and Y.

H0: the regression equation does not account for a significant proportion of the variance in the Y scores.

H1: the regression equation accounts for a significant proportion of the variance in the Y scores.

48
Q

What are the assumptions underlying correlation?

A
  1. The values in the sample must consist of independent observations.
  2. For any fixed value of X, Y is normally distributed (check skewness and kurtosis, check outliers, and use the Kolmogorov-Smirnov and Shapiro-Wilk tests and a Q-Q plot).
  3. Linearity: the relation between the dependent variable and the independent variable is linear when all other independent variables are held constant.
  4. Homoscedasticity: The variance of the residual is the same for any value of X.
  5. Absence of collinearity (in multiple regression). Condition in which some variables are highly correlated, and therefore the variance of an estimated regression coefficient is increased because of collinearity. This is because there is a redundancy between predictor variables.
49
Q

What is Homoscedasticity?

A

It is an assumption underlying regression that states the variance of the residual is the same for any value of X.

50
Q

What is collinearity?

A

In a multiple regression, it is a condition in which some variables are highly correlated, therefore the variance of an estimated regression coefficient is increased. This is because there is a redundancy between predictor variables. We are looking for absence of collinearity as part of our assumptions.

An example would be BMI correlating with Height.

51
Q

What is multiple regression analysis?

A

It allows us to use several variables at once to explain the variation in a continuous dependent variable. You are able to isolate the unique effect of one of the variables on the continuous dependent variable while taking into consideration that other variables are affecting it too. You can control for other variables to demonstrate whether bivariate relationships are spurious (wrong).

52
Q

What is a Beta coefficient?

A

It compares the strength of the effect of each individual independent variable on the dependent variable.

Unstandardized coefficients are used in making predictions. They are usually noted as B.

Standardization makes things comparable. We use standardized coefficients to compare variables. They are usually noted as Beta.

53
Q

What is multiple coefficient of determination/ adjusted R^2?

A

R² is the percentage of estimation error that we have been able to explain away by using the regression model.

It measures the proportion of variability in one variable that can be determined from the relationship with other variables.

Adjusted R² is the adjusted coefficient of determination that takes into consideration the number of predicting variables. Unlike R², it increases only when a new independent variable improves the model more than would be expected by chance, and it can decrease when a predictor adds little.
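
The adjustment can be illustrated with the usual formula, adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1), where n is the sample size and k is the number of predictors. The formula is the standard one rather than something stated on the card, and the R² values below are invented to show the effect of adding a weak predictor.

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjusted R^2 for n observations and k predictors."""
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

# Hypothetical model with 2 predictors, n = 21, R^2 = .50
two_predictors = adjusted_r_squared(0.50, n=21, k=2)
# Adding a weak third predictor raises R^2 only slightly, to .51
three_predictors = adjusted_r_squared(0.51, n=21, k=3)

print(round(two_predictors, 3), round(three_predictors, 3))
```

R² itself went up from .50 to .51, but the adjusted value drops from about .444 to about .424, signalling that the extra predictor did not earn its place.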

54
Q

What is a residual?

A

The difference between the observed value of the dependent variable (y) and the predicted value (y^) is called residual (e). Each data point has one residual.

You check the residuals of a regression to see if it has achieved its goal of explaining as much variation as possible in the dependent variable while respecting the underlying assumptions.

55
Q

What does homoscedasticity mean?

A

It means having the same scatter. You check it as an assumption of residuals and also regressions.

The opposite is heteroscedasticity or different scatter.

56
Q

What are dummy variables?

A

Dummy variables are categorical variables that represent only two groups, for example gender. The variable is coded as 0 and 1 and can then be used as an independent variable in regression analysis.

57
Q

When do you use dummy variables?

A

You use dummy variables in a linear regression model to examine differences between categories or groups, such as males and females, two age groups, etc.

You are looking for the mean difference between groups coded as 0 and 1.

58
Q

How do you code dummy variables?

A

As 0 and 1. The reference group is coded as 0.
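
A small Python sketch shows why the coefficient on a dummy variable can be read as a mean difference. In a simple regression with one 0/1 predictor, the intercept equals the mean of the reference group (coded 0) and the slope equals the difference between the two group means. The scores are hypothetical, and the slope and intercept use the standard least-squares formulas.

```python
def least_squares(x, y):
    """Return (intercept, slope) of the least-squares line."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sp = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    slope = sp / ss_x
    return mean_y - slope * mean_x, slope

reference_group = [4, 6, 5]        # coded 0, mean = 5
comparison_group = [8, 9, 10]      # coded 1, mean = 9

dummy = [0] * len(reference_group) + [1] * len(comparison_group)
scores = reference_group + comparison_group
intercept, slope = least_squares(dummy, scores)
print(intercept, slope)
```

The intercept comes out as 5 (the reference group mean) and the slope as 4 (the mean difference, 9 - 5).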

59
Q

What is a Chi Square test?

A

It is used for qualitative (categorical) variables and uses the frequency data from a sample to evaluate the relationship between two variables in the population. A contingency table is created by classifying each individual in the sample on both of the two variables. The frequency distribution for the sample is then used to test the hypotheses.

It can only compare categorical variables.

60
Q

What is the hypothesis testing in Chi Square?

A

The null hypothesis states that the two variables being measured are independent. The value obtained for one variable is not related to or influenced by the value for the second variable.

H1 says there is a relationship.

61
Q

What are the assumptions of a Chi Square test?

A
  1. Two categorical variables;
  2. Two or more categories (groups) for each variable;
  3. Independence of observations (no relationship between subjects in the group; they are not paired in any way, like pretest/post-test);
  4. Size of expected frequencies: a chi-square test should not be performed when the expected frequency of any cell is less than 5.
62
Q

What is the first step in a chi square test?

A

First step is to construct a hypothetical sample that represents how the sample distribution (expected frequencies) would look if it were in perfect agreement with the proportions stated in the null hypothesis.

Next, you compute the statistic to determine how well the data (observed frequencies) fit the null hypothesis (expected frequencies).

63
Q

What does the Chi-square formula measure?

A

It measures the discrepancy between the data (fo values = observed) and the hypothesis (fe values = expected). It simply measures how well the data fit the hypothesis.
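
A minimal Python sketch of the calculation for a test of independence, using the standard formula χ² = Σ (fo - fe)² / fe. Expected frequencies are computed as (row total × column total) / n, and df = (rows - 1)(columns - 1). The 2 × 2 table of observed frequencies is made up, and 3.84 is the chi-square table value for alpha = .05 with df = 1.

```python
def chi_square_independence(observed):
    """Return (chi_square, df) for a contingency table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi_square = 0.0
    for i, row in enumerate(observed):
        for j, f_observed in enumerate(row):
            f_expected = row_totals[i] * col_totals[j] / n
            chi_square += (f_observed - f_expected) ** 2 / f_expected
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi_square, df

observed = [[10, 20],             # hypothetical frequencies
            [30, 40]]
chi_square, df = chi_square_independence(observed)
critical_value = 3.84             # chi-square table, alpha = .05, df = 1
reject_h0 = chi_square > critical_value
print(round(chi_square, 3), df, reject_h0)
```

The expected frequencies here are 12, 18, 28 and 42, all above 5. The statistic is about 0.79, well below 3.84, so the data fit the null hypothesis and H0 is not rejected.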

64
Q

What does Chi-square distribution look like?

A

Typical distribution is positively skewed.
All chi-squares are 0 or larger (no negative values).
When H0 is true, you expect the data to be close to the hypothesis, meaning you expect the chi square values to be small when H0 is true.

The shape of the distribution changes based on the different values for df.

65
Q

How do you use the Chi Square Table?

A
66
Q

What is validity?

A

Validity means that a measurement measures what it is supposed to measure.

67
Q

What is reliability?

A

Reliability means that a measurement produces consistent results over and over again (assuming that what we are measuring isn’t changing).

68
Q

What is validity and reliability?

A

Validity is accuracy: the extent to which the instrument measures what it has to measure.

Reliability is the overall consistency of a measure. A measure has high reliability if it produces similar results under consistent conditions.

69
Q

What are the z-test statistic critical values?

A

0.05 – +/- 1.96
0.01 – +/- 2.58
0.001 – +/- 3.30
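
These two-tailed critical values can be reproduced from the standard normal distribution with Python's built-in `statistics.NormalDist`. This is a quick check rather than a replacement for the table: each tail holds alpha / 2, so the boundary is the z score with 1 - alpha / 2 of the distribution below it.

```python
from statistics import NormalDist

def z_critical(alpha):
    """Two-tailed critical z: alpha / 2 of the area lies in each tail."""
    return NormalDist().inv_cdf(1 - alpha / 2)

for alpha in (0.05, 0.01, 0.001):
    print(alpha, round(z_critical(alpha), 2))
```

The computed value for alpha = .001 is about 3.29, marginally below the 3.30 listed above.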