Statistics and Data Analysis in Quantitative Study Designs Flashcards
What are the levels of measurement/types of data?
Nominal, ordinal, interval, ratio
Each level has an additional characteristic, and contains all of the characteristics of the previous level.
Nominal and ordinal data are usually analyzed with non-parametric tests.
Interval and ratio data can be analyzed with parametric tests.
Nominal
Provides information about difference, but not much more.
Used to name, identify, or classify into categories.
Ordinal
Shows direction of difference, but we don’t know amount of difference.
Numbers indicate rank or order.
Allows for greater than or less than.
Interval
Intervals or distances between numbers are known, but it’s not known how far any of the numbers are from zero.
Equality of units, but no true zero.
Ratio
Each number can be thought of as a distance measured from zero.
There is an absolute zero point, which represents the absence of the variable being measured.
Statistics
An objective means of interpreting data.
Inform people about the reliability and meaningfulness of results.
Measures of central tendency and variability are the most fundamental components of most statistical techniques.
Central Tendency
Attempts to describe a set of values by placing focus on the central position in the data set.
A single score.
Measures of central tendency include mean, median, mode.
Mean
The sum of all scores in a data set, divided by the number of scores in the data set.
The average.
The most common measure of central tendency.
The mean can be affected by outliers (an outlier pulls the mean up or down, depending on whether it is higher or lower than the rest of the data).
Median
The number occurring at the midpoint of the data set.
If there is an even number of scores in a data set, take the mean of the two mid-point scores.
Much less affected by outliers and skewed data than the mean.
Drawback is that the median may not represent all of the data.
Mode
The most frequently occurring number in a data set.
The mode is the most popular option (look at how often each number occurs).
Drawback is that a data set can have two or more modes, or no mode at all when no two responses are the same.
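The three measures of central tendency can be sketched with Python's standard `statistics` module. The scores below are hypothetical, chosen so that one outlier pulls the mean away from the median.

```python
from statistics import mean, median, multimode

# Hypothetical scores; 40 is an outlier.
scores = [2, 3, 3, 4, 5, 6, 40]

print(mean(scores))       # 9   (sum of scores / number of scores)
print(median(scores))     # 4   (middle score of the sorted data)
print(multimode(scores))  # [3] (most frequently occurring score)

# Even number of scores: the median is the mean of the two middle scores.
even_scores = [1, 2, 3, 4]
print(median(even_scores))  # 2.5

# Two modes, and a data set where no score repeats (every score ties).
print(multimode([1, 1, 2, 2, 3]))  # [1, 2]
print(multimode([1, 2, 3]))        # [1, 2, 3]
```

The outlier raises the mean to 9 while the median stays at 4, which is why the median is often preferred for skewed data.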
Variability
Best estimate of the spread of scores.
Indicates how spread out the scores are or how close they are to the mean.
If variance = 0, all values in the data set are the same.
If variance is low, the values are close to the mean and the range is small.
If variance is high, the range of values is vast and some of the values are far from the mean.
Standard Deviation
Is the square root of the variance.
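A minimal sketch of variance and standard deviation, assuming the population formula (dividing by the number of scores). Many textbooks divide by n - 1 for a sample instead; the data sets are hypothetical.

```python
from math import sqrt

def variance(scores):
    """Population variance: mean squared distance of each score from the mean."""
    m = sum(scores) / len(scores)
    return sum((x - m) ** 2 for x in scores) / len(scores)

def standard_deviation(scores):
    """Standard deviation: the square root of the variance."""
    return sqrt(variance(scores))

same = [5, 5, 5, 5]   # all values identical
tight = [4, 5, 5, 6]  # values close to the mean
wide = [1, 3, 7, 9]   # values far from the mean

for data in (same, tight, wide):
    print(data, variance(data), standard_deviation(data))
```

All three data sets have a mean of 5, but their variances are 0, 0.5, and 10.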
Normal Distribution
Many statistical tests are based on the assumption of a normal distribution.
Distributions provide a visual way to see how scores disperse from the mean.
Normal distributions are symmetrical around the mean.
Graph of distributions found in notes on statistics.
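The symmetry of a normal distribution can be checked numerically with `statistics.NormalDist`. The mean of 100 and standard deviation of 15 are assumptions chosen only for illustration.

```python
from statistics import NormalDist

# Hypothetical normal distribution (mean and SD are illustrative assumptions).
dist = NormalDist(mu=100, sigma=15)

def share_within(k):
    """Proportion of scores within k standard deviations of the mean."""
    upper = dist.cdf(dist.mean + k * dist.stdev)
    lower = dist.cdf(dist.mean - k * dist.stdev)
    return upper - lower

# Symmetry: the curve is equally high the same distance above and below the mean.
print(dist.pdf(85), dist.pdf(115))

# Roughly 68%, 95%, and 99.7% of scores fall within 1, 2, and 3 SDs.
for k in (1, 2, 3):
    print(k, round(share_within(k), 4))
```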
t-Test
Comparing means.
If the comparison is focused on two groups/samples, the appropriate statistic is a t-test.
Aims to understand the difference between the means of those two groups.
Groups may be different people or the same people measured twice.
Includes independent t-test and dependent t-test.
Independent t-test
Used when the study has two groups of different participants.
Can be used for experimental and quasi-experimental designs.
Compare two independent samples.
Dependent t-test
Used when two data points are collected over time from the same participants, or when the same participants are exposed to both experimental conditions.
Compares scores from the same or a matched sample.
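A rough sketch of how the two t statistics are computed. It assumes equal variances in the independent case (a pooled-variance Student's t, which these cards do not specify), and all scores are hypothetical. Turning t into a p-value requires the t distribution, which this sketch leaves out.

```python
from math import sqrt
from statistics import mean, stdev, variance

def independent_t(group_a, group_b):
    """t for two independent samples, assuming equal variances (pooled)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled * (1 / n_a + 1 / n_b))

def dependent_t(first, second):
    """t for paired scores: mean difference divided by its standard error."""
    diffs = [s - f for f, s in zip(first, second)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))

# Independent: two groups of different participants (hypothetical scores).
control = [4, 5, 6, 5, 5]
treatment = [6, 7, 8, 7, 7]
print(independent_t(treatment, control))  # about 4.47

# Dependent: the same participants measured twice (hypothetical scores).
before = [10, 12, 11, 13]
after = [12, 13, 13, 16]
print(dependent_t(before, after))  # about 4.90
```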
What happens when there are more than two groups/samples that are different?
One-way ANOVA (extension of the t-test).
Can include repeated measures (a repeated-measures ANOVA is used when the same participants are measured at two or more time points).
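A minimal sketch of the F statistic for a one-way ANOVA with independent groups (not the repeated-measures version). The function name and the scores are hypothetical.

```python
from statistics import mean

def one_way_anova_f(*groups):
    """F = variability between group means / variability within groups."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    k, n = len(groups), len(all_scores)

    # Between-groups sum of squares: how far each group mean is from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups sum of squares: how far each score is from its own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within

# Three hypothetical groups of different participants.
print(one_way_anova_f([1, 2, 3], [2, 3, 4], [5, 6, 7]))  # about 13
```

With only two groups, F equals the square of the pooled-variance independent t statistic, which is the sense in which ANOVA extends the t-test.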
What happens after the statistical analysis is finished?
We revisit the hypothesis.
P-values
The p-value is the probability of obtaining results at least as extreme as those observed if the null hypothesis were true.
Researchers aim to reduce the chance of error as much as possible.
A p-value of 0.05 means that results this extreme would occur by chance about 5% of the time if there were no real effect.
Written in research as p < 0.05, which indicates statistical significance and the ability to reject the null hypothesis.
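One way to make a p-value concrete is an exact permutation test. This is a swapped-in technique, not one named in these cards: it asks how often a difference between group means at least as large as the observed one would arise if group labels were assigned by chance alone. The function name and scores are hypothetical.

```python
from itertools import combinations
from statistics import mean

def permutation_p_value(group_a, group_b):
    """Two-sided exact permutation p-value for a difference in means."""
    pooled = list(group_a) + list(group_b)
    observed = abs(mean(group_a) - mean(group_b))
    extreme = total = 0
    # Try every way of relabelling the pooled scores into two groups.
    for idx in combinations(range(len(pooled)), len(group_a)):
        chosen = set(idx)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Small tolerance guards against floating-point rounding.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total

control = [4, 5, 6, 5, 5]
treatment = [6, 7, 8, 7, 7]
p = permutation_p_value(treatment, control)
print(round(p, 4), p < 0.05)  # 0.0159 True
```

Here only 4 of the 252 possible relabellings produce a difference as large as the observed one, so p is below 0.05 and the null hypothesis would be rejected.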
Type I Error
Researchers decide that a manipulation or treatment has been successful when in fact it has not (a false positive).
Type II Error
Researchers decide that the manipulation has failed when in fact it worked (a false negative).
What are some other types of distribution?
Skewness: scores spread out more at one end of the distribution (positive skew = long tail toward high scores; negative skew = long tail toward low scores).
Kurtosis: peakedness of the distribution.
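Skewness and kurtosis can be sketched from central moments. This uses the simple population (moment-based) formulas, and the excess-kurtosis convention where a normal distribution scores 0; other conventions exist. The data are hypothetical.

```python
def central_moment(scores, k):
    """Average of (score - mean) raised to the power k."""
    m = sum(scores) / len(scores)
    return sum((x - m) ** k for x in scores) / len(scores)

def skewness(scores):
    """Positive: long tail toward high scores. Negative: toward low scores."""
    return central_moment(scores, 3) / central_moment(scores, 2) ** 1.5

def excess_kurtosis(scores):
    """Peakedness relative to a normal distribution (which scores 0)."""
    return central_moment(scores, 4) / central_moment(scores, 2) ** 2 - 3

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 1, 2, 10]
left_skewed = [-x for x in right_skewed]

print(skewness(symmetric))         # 0.0
print(skewness(right_skewed) > 0)  # True
print(skewness(left_skewed) < 0)   # True
print(excess_kurtosis(symmetric))  # negative: flatter than a normal curve
```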
Correlation
A measure of the strength of the relationship between two variables (x and y).
Pearson product-moment correlation
Assumes x and y are normally distributed and are interval or ratio scale scores.
Denoted Pearson r.
Spearman rank-order correlation
Used for ordinal scale data or interval or ratio scale data that deviate substantially from normality.
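A minimal sketch of the rank-order idea: convert each variable to ranks, then compute an ordinary correlation on the ranks. The helper names and data are hypothetical, and tied scores are given the average of their ranks.

```python
from math import sqrt

def ranks(values):
    """Rank scores from 1 (smallest); tied scores share the average rank."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]

def linear_correlation(x, y):
    """Product-moment correlation of two equal-length lists of scores."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def spearman_rho(x, y):
    """Rank-order correlation: the correlation between the two sets of ranks."""
    return linear_correlation(ranks(x), ranks(y))

# y always rises with x, but not in a straight line, and has an extreme score.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 100]
print(spearman_rho(x, y))        # 1.0 (the rank order agrees perfectly)
print(linear_correlation(x, y))  # below 1 (the raw scores are not linear)
```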
Pearson r
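Pearson r is a standardized measure of the relationship, independent of the number, size, and dispersion of the scores.
Pearson r is derived from covariance (the covariance of x and y divided by the product of their standard deviations).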
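A minimal sketch of Pearson r computed from covariance: the covariance of x and y divided by the product of their standard deviations. The function names and scores are hypothetical.

```python
from math import sqrt

def covariance(x, y):
    """Sample covariance: how x and y vary together around their means."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

def pearson_r(x, y):
    """Covariance standardized by the two standard deviations."""
    sd_x = sqrt(covariance(x, x))  # the covariance of x with itself is its variance
    sd_y = sqrt(covariance(y, y))
    return covariance(x, y) / (sd_x * sd_y)

x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

print(pearson_r(x, y))                    # about 0.8 (positive relationship)
print(pearson_r(x, [-v for v in y]))      # about -0.8 (negative relationship)
print(pearson_r(x, [100 * v for v in y])) # about 0.8 (size of scores does not matter)
```

Rescaling y by 100 changes the covariance but not r, because the standard deviation of y grows by the same factor.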
What information can you get from Pearson r?
Direction of the relationship: a positive correlation means y increases as x increases; a negative correlation means y decreases as x increases, or vice versa.
Strength of the relationship: the closer the data points fall to the line of best fit, the stronger the relationship.
Correlation Coefficient
A measure of the direction and strength of a linear relationship (the relationship shown by Pearson r).
Can be positive (closer to 1.0) or negative (closer to -1.0).
Can also show no linear relationship (closer to 0.0).
What is the significance of r in a correlation?
Significant correlations do NOT provide the basis for drawing causal conclusions.
Causation depends on methodology, not analysis.
Null Hypothesis
There is no true correlation between x and y; the observed correlation is due to chance (sampling error).
Alternative Hypothesis
There will be a significant correlation between x and y.