Research Methods II Flashcards
Frequency distribution
Organized tabulation of the number of individuals in each category on the scale of measurement.
f (frequency)
Frequency of a particular score.
Cumulative frequency
Accumulation of frequencies as one moves up the scale: for each score, the sum of its frequency and the frequencies of all scores below it, starting with the lowest score. The highest score should have a cumulative frequency equal to the total sample size.
Cumulative percentile rank
Accumulation of the percentage of all scores as one moves up the scale. Starting with the lowest score, divide the cumulative frequency of a particular score by the total sample size (and multiply by 100).
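A minimal sketch in Python (an assumed choice; the cards name no language) of building cumulative frequency and cumulative percentile rank from made-up scores:

```python
# Sketch: frequency table with cumulative frequency and cumulative
# percentile rank; the scores are made-up example data.
from collections import Counter

scores = [2, 3, 3, 4, 4, 4, 5, 5, 6, 7]
n = len(scores)

freq = Counter(scores)          # f: frequency of each score
cum_f = 0
for score in sorted(freq):      # move up the scale from the lowest score
    cum_f += freq[score]        # cumulative frequency
    pct = 100 * cum_f / n       # cumulative percentile rank
    print(f"score={score}  f={freq[score]}  cum f={cum_f}  c%={pct:.0f}")
# The highest score's cumulative frequency equals the sample size (10).
```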
Mean
Sum of scores divided by number of scores. Average.
Median
Score that divides the distribution in half; the 50th percentile.
Mode
Score in distribution with greatest frequency.
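The three measures of central tendency in a short sketch, using the same made-up scores and Python's standard library:

```python
# Sketch: mean, median, and mode of made-up example scores.
import statistics

scores = [2, 3, 3, 4, 4, 4, 5, 5, 6, 7]
print(statistics.mean(scores))    # sum of scores / number of scores -> 4.3
print(statistics.median(scores))  # the 50th percentile -> 4.0
print(statistics.mode(scores))    # the most frequent score -> 4
```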
Degrees of Freedom (v or df)
Number of values used to estimate a parameter minus the number of parameters to be estimated.
Why use df in sample SD?
All scores in a set of scores are free to vary EXCEPT for the last score, which is restricted if the mean (or the sum) and the number of scores are known. The correct way to get an UNBIASED estimator of the population SD is to divide the sum of squared deviations by N - 1.
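A sketch contrasting the biased (divide by N) and unbiased (divide by N - 1) estimates, assuming NumPy and made-up data:

```python
# Sketch: biased vs. unbiased estimate of the population SD.
import numpy as np

sample = np.array([4.0, 7.0, 8.0, 6.0, 5.0])
print(sample.std(ddof=0))  # biased: squared deviations divided by N
print(sample.std(ddof=1))  # unbiased: divided by N - 1 (one df lost
                           # because the sample mean is already fixed)
```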
Transformation of the Scale for the SD
- Adding or subtracting a constant to each score in a distribution will not change the standard deviation but will change the mean by the same constant.
- Multiplying or dividing each score by a constant causes the standard deviation and the mean to be multiplied or divided by the same constant.
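Both rules can be checked directly; a sketch with made-up scores:

```python
# Sketch: how adding vs. multiplying a constant changes the mean and SD.
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0])
print(x.mean(), x.std(ddof=1))        # mean 5.0, SD ~2.58

shifted = x + 10                      # add a constant to each score
print(shifted.mean(), shifted.std(ddof=1))  # mean 15.0, SD unchanged

scaled = x * 3                        # multiply each score by a constant
print(scaled.mean(), scaled.std(ddof=1))    # mean and SD both tripled
```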
Symmetrical distribution
Distribution in which the left side of the distribution “mirrors” the right side of the distribution.
Skewed distribution
A distribution is skewed if one tail is longer than the other. Long tail in the positive direction = positive skew; long tail in the negative direction = negative skew.
Order of Mode, Median, & Mean for Positive skew
Left to right:
- Mode
- Median
- Mean
Order of Mode, Median & Mean for negative skew
Left to right:
- Mean
- Median
- Mode
Kurtosis
The “peakedness” or “flatness” of a distribution: the degree to which a frequency distribution is flat (low kurtosis) or peaked (high kurtosis).
Mesokurtic
Distribution with zero kurtosis. Normal distribution.
Leptokurtic
Distribution with positive kurtosis. Acute peak around the mean, fat tails.
Platykurtic
Distribution with negative kurtosis. Small peak around mean, thin tails.
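A sketch of the three shapes using SciPy's sample skew and (Fisher/excess) kurtosis, with simulated data standing in for real scores:

```python
# Sketch: skew and excess kurtosis for three simulated distributions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
normal = rng.normal(size=10_000)           # mesokurtic: kurtosis ~ 0
heavy = rng.standard_t(df=5, size=10_000)  # leptokurtic: fat tails, > 0
flat = rng.uniform(size=10_000)            # platykurtic: thin tails, < 0

for name, data in [("normal", normal), ("t(5)", heavy), ("uniform", flat)]:
    print(name, stats.skew(data), stats.kurtosis(data, fisher=True))
```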
Bimodal distribution
2 modes.
Rectangular distribution
Has a mean and a median but no mode, because all scores have the same frequency.
Sampling error
The amount of error between a statistic calculated from a sample and the corresponding population parameter.
A sampling distribution
The distribution of a statistic obtained from all possible samples of a specific size from the population.
Distribution of sample means
Collection of sample means for all possible random samples of a particular size (n) that can be obtained from the population.
Central Limit Theorem
For any population with mean µ and standard deviation σ, the distribution of sample means for samples of size n will approach a normal distribution with a mean of µ and a standard deviation of σ/√n as n approaches infinity.
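A simulation sketch of the theorem, drawing samples from a skewed (exponential) population with made-up parameters:

```python
# Sketch: sample means from a skewed population still behave as the
# Central Limit Theorem predicts (mean mu, SD sigma / sqrt(n)).
import numpy as np

rng = np.random.default_rng(42)
mu = sigma = 1.0                    # an exponential(1) population
n = 50                              # sample size
means = rng.exponential(mu, size=(10_000, n)).mean(axis=1)

print(means.mean())       # close to mu = 1.0
print(means.std(ddof=1))  # close to sigma / sqrt(n) ~ 0.141
```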
The Standard Error of Xbar
The standard deviation of the distribution of sample means, σ/√n.
Law of Large Numbers
The larger the sample size, the more probable it is that the sample mean will be close to the population mean.
Confidence Intervals
Used to estimate, with a stated level of confidence, that the actual µ falls within a certain range around a sample mean.
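A sketch of a 95% confidence interval for µ built from one made-up sample, using the t distribution:

```python
# Sketch: 95% confidence interval for the population mean.
import numpy as np
from scipy import stats

sample = np.array([5.1, 4.8, 6.2, 5.5, 5.0, 5.9, 4.7, 5.3])
n = len(sample)
se = sample.std(ddof=1) / np.sqrt(n)   # standard error of the mean
t_crit = stats.t.ppf(0.975, df=n - 1)  # two-tailed 95% critical value
low = sample.mean() - t_crit * se
high = sample.mean() + t_crit * se
print(f"95% CI for mu: [{low:.2f}, {high:.2f}]")
```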
Statistical Model
Statistical representation of the real world.
A Simple Statistical Model
The mean is a hypothetical value (i.e., doesn’t have to be a value that actually exists in the data set). As such, the mean is a simple statistical model.
Measuring the “Fit” of the Model
The mean is a model of what happens in the real world: the typical score. It is not a perfect representation of the data.
Null Hypothesis
The predicted relationship does NOT exist. The symbol is H0.
Alternative Hypothesis
The predicted relationship does exist. The symbol is H1.
Type I error
The rejection of the null hypothesis when the null hypothesis is true. Saying there is a relationship when one does not exist.
Type II error
The acceptance of the null hypothesis when the null hypothesis is false. Saying there is no relationship when there is a relationship.
Alpha Level
The probability of making a Type I error that the researcher is willing to accept; setting a small alpha level minimizes the risk of a Type I error.
Power
The probability of correctly rejecting the null hypothesis when the null is false; the probability that a Type II error is not committed.
Factors Affecting Power
- The alpha level: increasing the alpha level increases the power of the statistical test.
- One-tailed vs. two-tailed tests: a one-tailed test has more power than a two-tailed test at the same alpha level.
- Sample size: as sample size increases, so does power.
- Reducing error variance increases power (e.g., test everyone in the same quiet room rather than in different rooms with different noises).
Increasing effect size of the independent variable…
Would increase power.
Greater subject variability will…
Decrease power.
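These factors can be checked with a simulation sketch: estimated power is the proportion of replications that correctly reject a false null (the group size, alpha, and effect size below are illustrative assumptions):

```python
# Sketch: simulated power of an independent-samples t-test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, alpha, effect = 50, 0.05, 0.5     # per-group n, alpha, effect size d

rejections = 0
for _ in range(10_000):
    a = rng.normal(0.0, 1.0, n)      # control group
    b = rng.normal(effect, 1.0, n)   # treatment group shifted by d
    if stats.ttest_ind(a, b).pvalue < alpha:
        rejections += 1
print(rejections / 10_000)           # ~.70; rises with n, alpha, or d
```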
Parametric tests based on the normal distribution require 4 basic assumptions
- Normally distributed sampling distribution.
- Homogeneity of variance/ homoscedasticity.
- Interval or ratio data.
- Independence of scores.
Normal distribution
A probability distribution of a random variable with perfect symmetry, a skew of 0, and a kurtosis of 0.
Non-parametric tests
A family of statistical tests that do not rely on the restrictive assumptions of parametric tests. They do not assume the sampling distribution is normally distributed.
Homogeneity of variance (HOV)
Assumption that the variance of one continuous variable is stable/consistent between treatment groups of a discrete variable. For t-tests & ANOVAs.
Homoscedasticity
Assumption that the variance of one continuous variable is stable/consistent across scores of another continuous variable. For regressions.
Independence of scores
One data point does not influence another data point.
Big advantage of Parametric Tests
More powerful (statistically speaking, in rejecting the null hypothesis when it is false) compared to non-parametric tests.
Big advantage of Nonparametric tests
More freedom! Not restricted by assumptions to do data analysis.
Kolmogorov-Smirnov Test
- Tests if data differ from a normal distribution.
- Significant = non-normal data.
- Non-significant = normal data.
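A sketch of the test with SciPy on made-up scores; note that standardizing with the sample mean and SD (as here) makes the p-value only approximate:

```python
# Sketch: Kolmogorov-Smirnov test against the standard normal.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
data = rng.normal(10, 2, size=200)           # made-up example scores
z = (data - data.mean()) / data.std(ddof=1)  # standardize first

stat, p = stats.kstest(z, "norm")
print(p)  # p > .05 (non-significant) -> consistent with normal data
```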
Histograms
Frequency distribution with bars drawn adjacent to one another. Gives a continuous figure that emphasizes the continuity of the variable. Good for continuous variables.
Q-Q Plots
Quantile-quantile plot. Plots the quantiles of the sample against the quantiles of a particular distribution. If the values fall on the diagonal line, the sample shares that distribution (e.g., the normal distribution).
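A sketch of a Q-Q plot against the normal distribution, assuming SciPy and matplotlib:

```python
# Sketch: Q-Q plot; points on the diagonal line suggest normality.
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
data = rng.normal(size=200)  # made-up example scores

stats.probplot(data, dist="norm", plot=plt)
plt.show()
```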
Quantile
The proportion of cases we find below a certain value.
A perfect normal distribution would…
Have a skewness of 0 and a kurtosis of 0.
What does a significant Levene’s test mean?
- Tests if variances in different groups are the same.
- Significant = variances not equal (bad)
- Non-significant = variances are equal (good)
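A sketch of Levene's test on two made-up groups whose variances clearly differ:

```python
# Sketch: Levene's test for homogeneity of variance.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
group1 = rng.normal(0, 1.0, 40)  # SD 1
group2 = rng.normal(0, 3.0, 40)  # SD 3: variances are unequal

stat, p = stats.levene(group1, group2)
print(p)  # p < .05 (significant) -> variances not equal (bad)
```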
Log transformation (log (xi)) to…
Reduce positive skew.
Square root transformation (square root Xi) to reduce…
Positive skew and to stabilize variance.
Reciprocal transformation (1/xi) can also reduce skewness.
Dividing 1 by each score also reduces the impact of large scores. This transformation reverses the order of the scores; you can avoid this by reversing the scores before the transformation: 1/(xhighest − xi).
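A sketch applying all three transformations to made-up, positively skewed scores (log and square root require positive values; the +1 in the last line is an added guard against dividing by zero):

```python
# Sketch: skew-reducing transformations on positively skewed scores.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
x = rng.exponential(1.0, 500) + 1.0  # made-up positively skewed scores

print(stats.skew(x))           # strong positive skew in the raw scores
print(stats.skew(np.log(x)))   # log transformation reduces it
print(stats.skew(np.sqrt(x)))  # square root reduces skew more gently
print(stats.skew(1 / x))       # reciprocal shrinks large scores but
                               # reverses the order of the scores
print(stats.skew(1 / (x.max() + 1 - x)))  # reverse first to keep order
```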
Potential problems with transforming data
Transforming the data helps as often as it hinders the accuracy of F.
Use Robust methods (e.g., Bootstrap) to…
account for violations of assumptions (e.g., normality).
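A minimal sketch of one robust method, a percentile bootstrap for the mean of a non-normal, made-up sample:

```python
# Sketch: percentile bootstrap 95% CI, no normality assumption needed.
import numpy as np

rng = np.random.default_rng(5)
data = rng.exponential(1.0, 60)  # made-up, clearly non-normal sample

boot_means = [rng.choice(data, size=len(data), replace=True).mean()
              for _ in range(10_000)]  # resample with replacement
low, high = np.percentile(boot_means, [2.5, 97.5])
print(f"bootstrap 95% CI for the mean: [{low:.2f}, {high:.2f}]")
```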
Correlations
Way of measuring the extent to which two continuous variables are related. Measures pattern of responses across variables.
Scatterplot
A graph of paired data points. A perfect linear relationship (r = +1.00 or −1.00) appears when all the data points lie on a straight line in the scatterplot.
Covariance
Average cross-product deviations.
- Calculate the error between the mean and each subject's score for the first variable (x).
- Calculate the error between the mean and their score for the second variable (y).
- Multiply these error values.
- Add these values to get the sum of the cross-product deviations; divide by N − 1 to average them.
Problems with Covariance
- Depends upon the units of measurement. e.g. the covariance of the two variables measured in miles might be 4.25, but if the same scores are converted to kilometers, the covariance is 11.
- Solution: standardize it. Divide by the standard deviations of both variables.
- The standardized version of covariance is known as the correlation coefficient. It is relatively unaffected by units of measurement.
The Correlation Coefficient
Measures the degree and direction of the linear relationship between two variables in terms of standardized scores (z-scores, with a mean of 0 and an SD of 1).
Sum of Products
To calculate the Pearson correlation, you need to first calculate the Sum of Products: SP = Σ(X − X̄)(Y − Ȳ).
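A sketch building r from the sum of products with made-up paired scores, checked against NumPy:

```python
# Sketch: Pearson r built from the sum of products (SP).
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 3.0, 5.0, 4.0, 6.0])

dx, dy = x - x.mean(), y - y.mean()  # deviations from each mean
sp = (dx * dy).sum()                 # sum of products of deviations
cov = sp / (len(x) - 1)              # covariance: average cross-product
r = sp / np.sqrt((dx**2).sum() * (dy**2).sum())

print(cov, r, r**2)             # r**2 is the coefficient of determination
print(np.corrcoef(x, y)[0, 1])  # matches NumPy's r (0.9 here)
```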
Correlation simply describes…
a relationship between 2 variables, not causality.
Third-variable Problem
Causality between two variables cannot be assumed because there may be other measured or unmeasured variables affecting the results.
Direction of causality
Correlation coefficients say nothing about which variable causes the other to change.
Measurement error affecting r
The more error there is in a measure, the smaller the observed correlation can be. Reliability measures the relationship (correlation) between two parallel measures of a variable of interest.
Coefficient of Determination
- r squared = coefficient of determination
- It measures the proportion of variability in one variable that can be determined from the relationship with the other variable.