Research Basics pt. 2 Flashcards
Parameter
-descriptive value for a population
Statistic
-descriptive value for a sample
Mean
-average
-most commonly used
-only used with interval/ratio
-influenced by outliers
-pulled toward the tail (opposite side from the mode)
-symbol: μ (mu)
Variance
-SD^2, or sum of (distance from mean)^2 / (n − 1)
-symbol: σ^2 (sigma squared)
Standard Deviation
-average distance between a score and the mean
-square root of [sum of (distance from mean)^2 / (n − 1)]
-symbol: σ (sigma)
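The variance and SD formulas above can be sketched in Python (a minimal example using the stdlib `statistics` module; the score list is made up for illustration):

```python
import math
import statistics

scores = [1, 2, 3, 4, 5]  # hypothetical sample data
mean = sum(scores) / len(scores)

# sample variance: sum of squared distances from the mean / (n - 1)
variance = sum((x - mean) ** 2 for x in scores) / (len(scores) - 1)

# standard deviation: square root of the variance
sd = math.sqrt(variance)

print(variance, statistics.variance(scores))  # hand formula matches stdlib
print(sd, statistics.stdev(scores))
```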
Frequency Distribution
-organized picture of an entire set of scores
-histogram, smooth curve, stem and leaf
Smooth Curve
-indicates that exact frequencies are not being shown
-want it to be symmetrical (normal curve: mean, median, and mode are equal)
Histogram
-shows all the frequencies of the distribution
Skew
-non symmetrical distribution
-named for tail
Positive: scores pile up at low values, tail points toward high values
Negative: scores pile up at high values, tail points toward low values
Kurtosis
-peakedness of the distribution
Leptokurtic
-skyscraper
-higher and thinner peak
-low variability
-easier to get significance
Platykurtic
-hill
-lower peak
-higher variability
-harder to get significance
Stem-And-Leaf Display
-each score divided into a stem (first digit) and a leaf (last digit)
Mode
-most frequent
-can be used with all data types
-located at the peak; in a skewed distribution, farthest from the tail and from the mean
Median
-middle
-used for ordinal, interval, or ratio data
-unaffected by outliers
-cannot show significant differences
-falls between mean and mode in a skewed distribution
Variability
-how spread out the data is
-descriptive (how spread out) and inferential stats (how accurate to pop)
-measured by range or SD
More variability: less significance (platykurtic)
Range
-total distance between highest and lowest score
SD in Normal Distribution
-~68% of scores within 1 SD of mean (~34% on each side)
-~95% of scores within 2 SD of mean
-~99.7% of scores within 3 SD of mean
-when standardized, mean is 0 and SD is 1
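The 68–95–99.7 rule above can be checked with Python's stdlib `statistics.NormalDist` (a standard normal curve: mean 0, SD 1):

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, SD 1
for k in (1, 2, 3):
    # proportion of scores falling within k SDs of the mean
    within = z.cdf(k) - z.cdf(-k)
    print(f"within {k} SD: {within:.4f}")
```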
Z Score
-where a score is located relative to other scores
-# of SD above or below mean
-descriptive (where in curve) and inferential stats (reference to population)
z = (score − mean) / SD
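The z formula above as a one-line sketch (the exam mean and SD are hypothetical numbers for illustration):

```python
def z_score(score, mean, sd):
    """Number of SDs a score sits above (+) or below (-) the mean."""
    return (score - mean) / sd

# hypothetical exam: mean 50, SD 10
print(z_score(70, 50, 10))  # 2.0 -> two SDs above the mean
print(z_score(45, 50, 10))  # -0.5 -> half an SD below the mean
```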
Inferential Statistics
-infer things about the population based on sample
Probability
-proportion under the curve
-a z score splits the curve into body and tail proportions (%)
Central Limit Theorem
-with samples of n ≥ 30, the sampling distribution of the mean closely approximates the real population
T-Test
-compare 2 groups
-used for smaller samples
-flatter curve than the normal distribution
-1-tail: considers all 0.05 in one tail = higher chance of significance
-2-tail: considers 0.05 split across both tails (0.025 in each) = lower chance of significance
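The 2-group comparison can be sketched as a pooled-variance independent t statistic (stdlib only; the group scores are made up, and a real analysis would get the p value from the t distribution via a stats package):

```python
import math
import statistics

def independent_t(a, b):
    """Pooled-variance t statistic comparing two independent groups."""
    n1, n2 = len(a), len(b)
    # pooled variance weights each group's variance by its df (n - 1)
    pooled = ((n1 - 1) * statistics.variance(a) +
              (n2 - 1) * statistics.variance(b)) / (n1 + n2 - 2)
    se = math.sqrt(pooled * (1 / n1 + 1 / n2))
    return (statistics.mean(a) - statistics.mean(b)) / se

group1 = [1, 2, 3]  # hypothetical scores
group2 = [4, 5, 6]
print(round(independent_t(group1, group2), 4))  # -3.6742
```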
F-Distribution
-ANOVA
-used for more than 2 groups or factorial research designs
Chi-Square Distribution
-comparing proportions of people in different groups
-compares observed frequencies to expected frequencies
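The observed-vs-expected comparison is just a sum over categories (a sketch; the 60-person, 3-group counts are hypothetical):

```python
def chi_square(observed, expected):
    """Chi-square statistic: sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# hypothetical: 60 people expected to split evenly across 3 groups
observed = [10, 20, 30]
expected = [20, 20, 20]
print(chi_square(observed, expected))  # 10.0
```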
Standard Error of Mean (SEM)
-value that describes the difference between the sample mean and the true pop mean
-always smaller than SD
-smaller=less sampling error
-sample SD/Square root (n)
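The SEM formula on the card, sketched with the stdlib (sample data made up; note the result is smaller than the sample SD, as the card says):

```python
import math
import statistics

def sem(sample):
    """Standard error of the mean: sample SD / sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

scores = [1, 2, 3, 4, 5]  # hypothetical sample
print(round(sem(scores), 4))          # 0.7071
print(sem(scores) < statistics.stdev(scores))  # True: always smaller than SD
```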
Point Estimate
-mean of sample, estimates pop
-border of box-plot
Interval Estimate (CI)
-confidence interval
-range of values around the sample mean likely to include the real pop mean
-span of box plot
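A 95% confidence interval can be sketched as mean ± critical value × SEM (here using the z approximation 1.96; small samples would use a t critical value instead, and the data list is hypothetical):

```python
import math
import statistics

def ci_95(sample):
    """95% CI for the mean using the z approximation (1.96 * SEM)."""
    m = statistics.mean(sample)
    sem = statistics.stdev(sample) / math.sqrt(len(sample))
    margin = 1.96 * sem
    return (m - margin, m + margin)

low, high = ci_95([1, 2, 3, 4, 5])  # hypothetical sample
print(round(low, 4), round(high, 4))  # 1.6141 4.3859
```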
Box-Whisker Diagram (boxplot)
-Whiskers: range of scores
-Box: median (line), upper and lower quartiles (25% and 75%)
Bar Graph
-nominal or ordinal
-similar to a histogram but with spaces between bars
Error Bars Charts
-bar shows mean score
Can show
-CI, SD, or standard error of mean
Scatterplots
-correlation
-can be grouped (R is important)
Parametric Statistics
Analyzes quantitative data
-t test, anova, regression
-must meet assumptions
-based on distributions, so data must be normally distributed
Non-Parametric Statistics
Analyze qualitative data
-Spearman, Mann-Whitney U (independent t-test equivalent, uses mean rankings), Friedman's ANOVA, Wilcoxon signed-rank (dependent t-test equivalent, uses mean rankings)
-used when data violates the assumptions or is nominal/ordinal
Linear Regression
-show relationships
-make predictions
Parametric Assumptions of T-Test
I/R Data
Normality
Homogeneity of Variance
Free of Extreme Outliers
Independence of Observations
Normality
-a concern with smaller studies (n < 30)
-check skewness (less than −2 or greater than 2 is a problem)
-check histograms
Non-Parametric:
-Shapiro-Wilk Test: p > 0.05 means normality is not violated
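The skewness check above can be sketched by hand with the moment formula (a simple population-SD version; software such as SPSS uses a slightly adjusted sample formula, so exact values can differ, and the data lists are made up):

```python
import statistics

def skewness(sample):
    """Moment-based skewness: mean cubed deviation / SD^3 (population form)."""
    m = statistics.mean(sample)
    sd = statistics.pstdev(sample)
    n = len(sample)
    return sum((x - m) ** 3 for x in sample) / (n * sd ** 3)

print(skewness([1, 2, 3, 4, 5]))   # 0.0 -> symmetric
print(skewness([1, 1, 1, 2, 10]))  # positive -> right (positive) skew
```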
Homogeneity of Variance
Variances between groups should be equal
Non-Parametric:
-Levene’s test: want it to be not significant (p > 0.05)
Free of Influential Outliers
Regression: Cook’s distance (>1 is bad)
Independence of Observations
-scores must not follow a pattern over time
-scores from one participant can’t influence another person’s scores
Non-Parametric:
-Durbin-Watson test
Regression Assumptions
Linearity
Homoscedasticity
Outlier testing in regression
Homoscedasticity
-relationship statistics
-seen in a scatterplot of residual scores
-variance must be the same at all levels
-how close are all points to the line
-r^2
-heteroscedasticity is opposite
Linearity
-data points arranged in a linear pattern
-seen in a scatterplot
Residual Score
-distance of score from regression line on y axis
-outliers have large residuals
Standardized Residual
-distance from line in terms of SD
-negative = under the line
-positive = over the line
-0 = on the line
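Residual and standardized residual scores can be sketched for a simple least-squares line (stdlib only; the x/y data are hypothetical):

```python
import statistics

x = [1, 2, 3, 4, 5]            # hypothetical predictor
y = [2.1, 3.9, 6.0, 8.2, 9.8]  # hypothetical outcome
mx, my = statistics.mean(x), statistics.mean(y)

# least-squares slope = covariance(x, y) / variance(x)
cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (len(x) - 1)
slope = cov / statistics.variance(x)
intercept = my - slope * mx

# residual: vertical (y-axis) distance of each score from the line
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# standardized residual: distance from the line in SD units
# (negative = under the line, positive = over it)
sd_res = statistics.stdev(residuals)
standardized = [round(r / sd_res, 2) for r in residuals]
print(round(slope, 2), round(intercept, 2))  # 1.97 0.09
```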
Solutions to Violated Assumptions
Trim the data
Winsorizing: substitute outlier with the highest non-outlier score
Transform the data: take the log of the data
Bootstrapping in SPSS
Use non-parametric statistics
Critical Region
-in the tails
-outcomes unlikely caused by chance
How To Increase Power
-increase effect size
-decrease variability
-increase sample size
-increase alpha
-use a 1 tail test
Independent T-Test
-compares 2 means of independent data
-different groups
-1 IV and 1 DV
-non-parametric equivalent: Mann-Whitney U
Repeated Measures T-Test (Dependent)
-compares matched pairs
-same participant twice
-more likely to be significant
-non-parametric equivalent: Wilcoxon signed-ranks
-does not need HOV
Bonferroni Correction
-limits alpha inflation (making a type 1 error) when testing the same data set multiple times
- divides alpha by number of tests run
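The correction itself is a one-line division (the five-test scenario is hypothetical):

```python
def bonferroni_alpha(alpha, n_tests):
    """Per-test alpha after Bonferroni correction: alpha / number of tests."""
    return alpha / n_tests

# hypothetical: five t-tests on the same data set at overall alpha .05
print(bonferroni_alpha(0.05, 5))  # each test must now reach p < .01
```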