module 5 Flashcards
data fishiness assumptions
- assumption of normality
- assumption of homogeneity of variance
- independence of observations
assumption of normality general definition
scores on the dependent variable within each group are assumed to be sampled from a normal distribution
NHST for evaluating normality general definition
- tests whether the sample distribution is significantly different from a normal distribution with the same mean and SD
what tests are used for NHST of the assumption of normality
- Shapiro-Wilk test
- Kolmogorov-Smirnov test
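skew and kurtosis definition and cutoffs
- skew: asymmetry of the distribution (0 = normal); descriptive approach cutoff: |skew| > 2
- kurtosis: how heavy or light the distribution's tails are (heavy = high kurtosis/many outliers, light = low kurtosis/few outliers; 0 = normal when reported as excess kurtosis); descriptive approach cutoff: kurtosis > 7
- for both (NHST approach), a z-score (statistic divided by its standard error) of 1.96 or above in absolute value indicates significant non-normality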
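A minimal Python sketch of both cutoff approaches, assuming the bias-adjusted skew and excess-kurtosis formulas and standard errors that packages such as SPSS report (other packages may use slightly different formulas). The function name `normality_check` is hypothetical; the cutoffs are the ones on this card.

```python
import math

def normality_check(scores):
    """Skew, excess kurtosis, and cutoff flags (hypothetical helper; needs n >= 4)."""
    n = len(scores)
    mean = sum(scores) / n
    devs = [x - mean for x in scores]
    # Population central moments
    m2 = sum(d ** 2 for d in devs) / n
    m3 = sum(d ** 3 for d in devs) / n
    m4 = sum(d ** 4 for d in devs) / n
    g1 = m3 / m2 ** 1.5
    g2 = m4 / m2 ** 2 - 3
    # Bias-adjusted sample skew (G1) and excess kurtosis (G2); 0 = normal for both
    skew = math.sqrt(n * (n - 1)) / (n - 2) * g1
    kurt = ((n + 1) * g2 + 6) * (n - 1) / ((n - 2) * (n - 3))
    # Standard errors used to turn each statistic into a z-score
    se_skew = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    se_kurt = 2 * se_skew * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))
    z_skew, z_kurt = skew / se_skew, kurt / se_kurt
    return {
        "skew": skew,
        "kurtosis": kurt,
        # Descriptive approach: magnitude cutoffs from the card
        "descriptive_flag": abs(skew) > 2 or kurt > 7,
        # NHST approach: |z| >= 1.96 (p < .05, two-tailed)
        "nhst_flag": abs(z_skew) >= 1.96 or abs(z_kurt) >= 1.96,
    }

print(normality_check([1, 2, 3, 4, 5]))
```

A symmetric sample has a skew of exactly 0; a single extreme value in an otherwise flat sample pushes the skew past the descriptive cutoff.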
limitations of stat tests of normality
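- a large departure from normality is needed to reach significance in small samples, but even a small departure is significant in large samples
- yet non-normality is more of a concern in small samples and less of a concern in large samples, so the tests are least sensitive where it matters most
- does not take the type of non-normality into account (e.g., skew vs. kurtosis)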
descriptive approach for evaluating normality definition
- looks at descriptive statistics and/or graphical displays to quantify the magnitude and nature of non-normality
____ kurtosis is more problematic than ____ kurtosis in t tests, ANOVAs, correlations, and regressions
positive, negative
which approach makes more sense for evaluating normality: NHST or descriptive?
descriptive, because it combines threshold values (skew and kurtosis cutoffs) with Q-Q plots
thin vs fat tails in a distribution
- thin: fewer extreme observations than a normal distribution
- fat: more extreme observations than a normal distribution
if data are normal, the normal Q-Q plot (observed vs. expected normal quantiles) should resemble a _____
straight line (as opposed to a cloud shape)
if the middle of the Q-Q plot line is straight and the ends flatten, it _____
indicates thin tails and is not problematic
if the middle of the Q-Q plot line is straight and the ends have a steep slope, it _____
indicates fat tails and is problematic
assumption of homogeneity of variance definition
the variance of scores on the dependent variable within each group (condition) is the same across all groups (conditions)
evaluating homogeneity of variance: NHST approach definition
- tests whether the variances of the groups are significantly different from one another
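evaluating homogeneity of variance: descriptive approach definition
- looks only at the size of the imperfection (how far the variances depart from equality), not its statistical significance
- looks at descriptive statistics and/or graphical displays to quantify the magnitude of differences in variance (largest vs. smallest SD)
- compares the ratio of the largest to the smallest variance against a threshold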
tests for homogeneity of variance
- Levene's test
- Hartley's variance ratio test (F-max test)
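A minimal Python sketch of both statistics, assuming raw score lists per group. `fmax_ratio` is the largest-to-smallest variance ratio; `levene_w` is Levene's original statistic (a one-way ANOVA F on absolute deviations from each group mean). The p-value is omitted because it needs the F distribution; note that SciPy's `scipy.stats.levene` centers on the median by default (the Brown-Forsythe variant), so its statistic can differ from this one. Both function names are hypothetical.

```python
from statistics import mean, variance

def fmax_ratio(groups):
    """Hartley's F-max: largest group variance / smallest group variance."""
    variances = [variance(g) for g in groups]  # sample (n - 1) variances
    return max(variances) / min(variances)

def levene_w(groups):
    """Levene's W: one-way ANOVA F computed on |score - group mean|."""
    devs = [[abs(x - mean(g)) for x in g] for g in groups]
    k = len(groups)
    n_total = sum(len(d) for d in devs)
    grand = mean([z for d in devs for z in d])
    between = sum(len(d) * (mean(d) - grand) ** 2 for d in devs)
    within = sum((z - mean(d)) ** 2 for d in devs for z in d)
    return (n_total - k) / (k - 1) * between / within

groups = [[1, 2, 3], [2, 4, 6]]
print(fmax_ratio(groups), levene_w(groups))
```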
limitations of NHST approach for homogeneity of variance
- role of sample size: differences in variance are more of a concern in small samples and less of a concern in larger samples
- the tests are insensitive to differences in variance in small samples and sensitive to them in large samples
- unequal variance is a problem of magnitude, not of statistical significance
if variances are equal, a Q-Q plot of one group's scores against another group's should resemble a straight line with a slope of ___ and an intercept equal to ____, whereas when the variances are not equal, the points will not cluster around that line and the slope will differ from __
1; the difference between the group means; 1
independence of observations definition
- each observation (between subjects) or set of observations (repeated measures) in the dataset is independent of all other observations/sets
- example of non-independence: roommates or partners, whose responses are likely to be related
positive associations among observations inflate ___ and negative associations inflate ___
alpha (Type I error rate), beta (Type II error rate)
evaluating independence of observations
- examine structural properties of the data to see whether there is a basis for questioning the validity of the assumption
- if there is no evident basis, it is okay to carry on
- if a basis exists, independence can be assessed by computing the intraclass correlation for the part of the data assumed to lack independence
- if the correlation is very small (e.g., < .10), it is fine to use a t test/ANOVA; thresholds are up for debate
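A minimal Python sketch of the intraclass correlation check, assuming equal-sized clusters (e.g., roommate dyads) and the one-way random-effects ICC(1) computed from ANOVA mean squares; other ICC forms exist. Applying the .10 cutoff to the absolute value (so strong negative dependence is also flagged) is an assumption, as are the helper names.

```python
from statistics import mean

def icc1(clusters):
    """One-way random-effects ICC(1) for equal-sized clusters (e.g. dyads)."""
    k = len(clusters[0])          # observations per cluster
    n = len(clusters)             # number of clusters
    grand = mean([x for c in clusters for x in c])
    ss_between = k * sum((mean(c) - grand) ** 2 for c in clusters)
    ss_within = sum((x - mean(c)) ** 2 for c in clusters for x in c)
    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

def independence_ok(clusters, cutoff=0.10):
    """Hypothetical rule: treat |ICC| below the cutoff as negligible dependence."""
    return abs(icc1(clusters)) < cutoff

roommates = [(4, 5), (2, 2), (6, 5), (3, 4)]  # hypothetical dyad scores
print(icc1(roommates), independence_ok(roommates))
```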
addressing violations of normality
- use alternative statistical procedures that do not assume normality
- evaluate level of measurement assumptions
- identify and remove outliers
- transform the data to normalize the distribution
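A minimal Python sketch of one common transformation, the natural log, which is often used for positively skewed data such as response latencies; it assumes every value is greater than 0. The latency values are hypothetical, chosen so the logged values are evenly spaced.

```python
import math

def skewness(scores):
    """Simple moment skewness g1 = m3 / m2**1.5 (0 for a symmetric sample)."""
    n = len(scores)
    mean = sum(scores) / n
    m2 = sum((x - mean) ** 2 for x in scores) / n
    m3 = sum((x - mean) ** 3 for x in scores) / n
    return m3 / m2 ** 1.5

latencies = [1, 2, 4, 8, 16, 32, 64]          # hypothetical positively skewed data
logged = [math.log(x) for x in latencies]     # natural log; requires all values > 0
print(skewness(latencies), skewness(logged))
```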
addressing violations of homogeneity of variance
- use alternative procedures that do not assume equal variances (e.g., Welch's t test)
- evaluate level of measurement assumptions
- identify and remove outliers
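One alternative procedure that does not assume equal variances is Welch's t test. Below is a minimal Python sketch of its statistic and Welch-Satterthwaite degrees of freedom; the p-value is omitted because it needs the t distribution (SciPy's `scipy.stats.ttest_ind(a, b, equal_var=False)` runs the full test). `welch_t` is a hypothetical name.

```python
import math
from statistics import mean, variance

def welch_t(group_a, group_b):
    """Welch's t and Welch-Satterthwaite df (no equal-variance assumption)."""
    va = variance(group_a) / len(group_a)   # squared standard error, group A
    vb = variance(group_b) / len(group_b)   # squared standard error, group B
    t = (mean(group_a) - mean(group_b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(group_a) - 1) + vb ** 2 / (len(group_b) - 1))
    return t, df

print(welch_t([1, 2, 3], [2, 4, 6]))
```

Each group keeps its own variance estimate, and the df shrink toward the smaller or more variable group's df as the variances diverge.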
addressing violations of independence of observations
- alternative statistical procedures
- e.g., multilevel modeling (MLM) or hierarchical linear modeling (HLM)
outliers definition
- extreme values that differ greatly from the other observations in the dataset, suggesting they may have been drawn from a different population
examples of common outliers
- data entry/encoding errors (less common now that data are rarely entered manually)
- response latency data (unusually long response times due to errors, distraction, etc.)
- open-ended estimate data
problems with outliers
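- often responsible for violations of normality and homogeneity of variance
- conceptual validity: if outliers come from a different population, they may not represent the population being studied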
- disproportionate influence on stat results
identifying outliers
- impossible values in frequency tables/histogram
- steep tails in normal Q-Q plots
- standardized residuals for observations
- studentized deleted residuals
standardized residuals for observations
- index of deviation from the mean
- follows z distribution
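- if the data are normally distributed with N = 100, about 1 value is expected to exceed |2.6| by chance
- if the data are normally distributed with N = 1000, about 3 values are expected to exceed |3.0| by chance
- a general cutoff of |4| or |5| is suggested for flagging outliers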
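A minimal Python sketch of standardized residuals as z-scores (deviation from the mean divided by the sample SD), plus the number of |z| values expected beyond a cutoff by chance in normal data. Treating the cutoffs as two-tailed (|z|) is an assumption; under it, N = 100 gives about 0.93 values beyond |2.6| and N = 1000 about 2.7 beyond |3.0|. Helper names are hypothetical; the default cutoff of 4 comes from this card.

```python
from statistics import NormalDist, mean, stdev

def z_scores(scores):
    """Standardized residual of each score from the sample mean."""
    m, s = mean(scores), stdev(scores)
    return [(x - m) / s for x in scores]

def expected_beyond(n, cutoff):
    """Expected count of |z| > cutoff among n draws from a normal distribution."""
    return n * 2 * (1 - NormalDist().cdf(cutoff))

def flag_outliers(scores, cutoff=4.0):
    """Indices of scores whose |z| exceeds the cutoff."""
    return [i for i, z in enumerate(z_scores(scores)) if abs(z) > cutoff]

print(expected_beyond(100, 2.6), expected_beyond(1000, 3.0))
print(flag_outliers(list(range(1, 30)) + [200]))
```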
studentized deleted residuals
- index of deviation from mean (not including target observation in mean and SD calculation)
- follows t distribution of df=n-2
- with a sample of 100, a value > 3.6 (in absolute value) indicates an outlier
- with a sample of 1000, a value > 4.07 (in absolute value) indicates an outlier
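A minimal Python sketch of studentized deleted residuals for the simple mean-only case described here. The cutoffs on this card (3.6 for n = 100, 4.07 for n = 1000) appear consistent with Bonferroni-corrected, two-tailed α = .05 critical values of t with df = n - 2; if SciPy is available, `scipy.stats.t.ppf(1 - 0.05 / (2 * n), n - 2)` should give approximately these values. That interpretation and the helper name are assumptions.

```python
import math
from statistics import mean, stdev

def studentized_deleted(scores):
    """Deviation of each score from the mean of the other scores, scaled by their SD."""
    n = len(scores)
    out = []
    for i, x in enumerate(scores):
        rest = scores[:i] + scores[i + 1:]   # leave the target observation out
        m, s = mean(rest), stdev(rest)       # s has n - 2 degrees of freedom
        # sqrt(1 + 1/(n-1)) accounts for uncertainty in the leave-one-out mean,
        # making the result t-distributed with df = n - 2 under normality
        out.append((x - m) / (s * math.sqrt(1 + 1 / (n - 1))))
    return out

print(studentized_deleted([1, 2, 3, 4, 100]))
```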
responses to outliers
- correct impossible values or treat them as missing data
- possible but highly discrepant values can be trimmed or capped at the most extreme acceptable value or at a specified value
- alternatively, highly discrepant values can be treated as missing
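A minimal Python sketch of these responses, assuming the valid range and the cap values are chosen in advance by the researcher; `clean_scores` and all example values are hypothetical.

```python
def clean_scores(scores, valid_min, valid_max, cap_low=None, cap_high=None):
    """Hypothetical helper: impossible values -> None (missing); optional capping."""
    cleaned = []
    for x in scores:
        if x < valid_min or x > valid_max:
            cleaned.append(None)        # impossible value: treat as missing
        elif cap_high is not None and x > cap_high:
            cleaned.append(cap_high)    # possible but extreme: cap at upper value
        elif cap_low is not None and x < cap_low:
            cleaned.append(cap_low)     # possible but extreme: cap at lower value
        else:
            cleaned.append(x)
    return cleaned

ratings = [3, 5, 77, 7]            # 1-7 scale; 77 looks like an entry error
latencies_ms = [250, 400, 3000, 80]
print(clean_scores(ratings, 1, 7))
print(clean_scores(latencies_ms, 0, 10000, cap_low=100, cap_high=2000))
```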
philosophical issues with outliers
- minimalist perspective: never touch the data; a strong rationale is needed to delete or alter data (due to potential for abuse)
- maximalist perspective: routinely alter/delete outlying values, since outliers violate assumptions and are hard to interpret; clear rules/procedures must be set to avoid abuse
- intermediate perspective: altering/deleting is justifiable with clear rules/procedures and high thresholds for outliers
levels of measurement
- nominal: numbers indicate group membership/category only (e.g., nationality)
- ordinal: numbers reflect rank order on a scale but not the magnitude of differences (e.g., ranking favorites: the difference between 1st and 2nd may not equal the difference between 4th and 5th)
- interval: numbers reflect rank order and magnitude of differences but not ratios (e.g., Celsius: 0 is the freezing point of water and 100 the boiling point, not an absence of temperature)
- ratio: numbers reflect rank order, magnitude of differences, and ratios (e.g., mass, length)
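A small Python illustration of why interval scales do not support ratios, using Celsius (interval) and Kelvin (ratio, with an absolute zero). Differences carry over between the two scales, but ratios do not.

```python
def celsius_to_kelvin(c):
    return c + 273.15   # Kelvin starts at absolute zero, so it is a ratio scale

hot, mild = 20.0, 10.0
naive_ratio = hot / mild                                        # 2.0, but not meaningful
true_ratio = celsius_to_kelvin(hot) / celsius_to_kelvin(mild)   # about 1.035
diff_c = hot - mild
diff_k = celsius_to_kelvin(hot) - celsius_to_kelvin(mild)       # same 10-degree difference
print(naive_ratio, true_ratio, diff_c, diff_k)
```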
what level of measurement has an absolute, meaningful zero (0) point
ratio
conducting an analysis (t test/ANOVA) and computing descriptive stats such as means is only meaningful if the dependent variable has at least _______ properties
interval