Test 2 Flashcards
Making a prediction steps
-compose hypothesis
-generate predictions
-test predictions
-evaluate hypotheses
MUST MAKE TESTABLE PREDICTIONS
Deductive reasoning
- starts with a theory, then tests and revises it
- top-down approach
- general → specific
Inductive reasoning
- starts with observations, then forms a theory
- specific → general
- conclusions can be falsified by contradictory evidence
Lakatos (1978)
- individual tests are risky and arbitrary
- should have multiple competing hypotheses
Kuhn paradigm (1970)
- not linear discovery, but series of paradigm shifts
- scientists aren’t objective but rather come to a consensus
Manipulative data
-data gathered after you deliberately change (manipulate) something in the system
Observational data
-when you observe what’s happening in a system
A priori
Ahead of time, before collection of data
measures of central tendency
mean
median
t test equation
t = (x̄ - µ) / SEM, i.e. t = (sample mean - comparison mean) / (standard error of the mean)
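For reference, a minimal Python sketch (scipy/numpy assumed; the sample values and comparison mean are made up) computing t by hand and checking it against scipy:

```python
import numpy as np
from scipy import stats

x = np.array([5.1, 4.8, 5.6, 5.0, 5.3, 4.9])  # hypothetical sample
mu = 5.0                                       # comparison (null) mean

sem = x.std(ddof=1) / np.sqrt(len(x))          # SEM = s / sqrt(n)
t = (x.mean() - mu) / sem
print(t, stats.ttest_1samp(x, mu).statistic)   # the two should match
```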
standard error of the mean
s / √n, i.e. the square root of (variance / n)
Confidence interval
- a desired confidence interval (margin of error) can be used to calculate the required sample size
- also need the variance, alpha, t, and df
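A hedged sketch of that sample-size calculation (scipy assumed; the variance, CI half-width E, and alpha are made-up example values). Since t depends on df = n - 1, iterate:

```python
import math
from scipy import stats

s2, E, alpha = 4.0, 0.5, 0.05   # assumed variance, CI half-width, alpha
n = math.ceil(stats.norm.ppf(1 - alpha / 2) ** 2 * s2 / E ** 2)  # z-based start
while True:
    t = stats.t.ppf(1 - alpha / 2, df=n - 1)  # t for the current df
    n_new = math.ceil(t ** 2 * s2 / E ** 2)   # n = t^2 * variance / E^2
    if n_new <= n:
        break
    n = n_new
print(n)                                      # required sample size
```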
t test assumptions
- Independent
- random sample
- normally distributed
- equal variances (homogeneity)
- must test these before any stats can be done!
How can we test for normality?
Shapiro-Wilk
Kolmogorov-Smirnov
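A minimal sketch of both tests in scipy (made-up data). Note: feeding K-S parameters estimated from the same data makes its p value conservative (the Lilliefors caveat):

```python
import numpy as np
from scipy import stats

x = np.random.default_rng(0).normal(10, 2, size=30)   # hypothetical sample

print(stats.shapiro(x))                               # Shapiro-Wilk: W and p
print(stats.kstest(x, "norm", args=(x.mean(), x.std(ddof=1))))  # K-S vs fitted normal
```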
Testing for variance?
Levene's test for equality of variances
- checks whether the groups have similar spread (similar bell-curve shape)
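Sketch of Levene's test in scipy (two made-up groups, one with wider spread):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
a = rng.normal(5, 1, 25)        # hypothetical group A
b = rng.normal(5, 3, 25)        # hypothetical group B, wider spread

stat, p = stats.levene(a, b)
print(stat, p)                  # small p -> variances likely unequal
```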
What if normal distribution, but unequal variances
independent t test with equal variances not assumed (Welch's t test)
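In scipy this is just equal_var=False on the independent t test (a sketch; data made up):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
a = rng.normal(5, 1, 25)
b = rng.normal(6, 3, 25)                        # unequal spread
print(stats.ttest_ind(a, b, equal_var=False))   # Welch's correction to df
```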
not normally distributed, but similar variances
- non-parametric Mann-Whitney U test
- doesn't rely on a calculated mean parameter
- ranks the data and calculates a U statistic based on the difference in rankings
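Sketch of the U test in scipy on made-up skewed (non-normal) samples:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
a = rng.exponential(1.0, 30)      # hypothetical skewed sample
b = rng.exponential(1.5, 30)
print(stats.mannwhitneyu(a, b))   # U statistic from the rank comparison
```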
data not independent
paired t test
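Sketch of the paired test in scipy (hypothetical before/after measurements):

```python
import numpy as np
from scipy import stats

before = np.array([12.1, 11.4, 13.0, 12.7, 11.9])
after = np.array([12.8, 11.9, 13.5, 13.1, 12.2])
print(stats.ttest_rel(before, after))   # tests mean paired difference = 0
```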
Steps for t tests
- identify question
- state H0 and Ha with respect to your samples
- set the alpha level and direction of the relationship
- choose the test after exploring the data to check it meets the assumptions
Statistical tests vary in:
- number of IVs and DVs
- levels of measurements (ordinal, continuous, category)
- form of the variables: scalars in univariate tests; vectors and matrices in multivariate tests
- role of variables: DVs, IVs, Covariates?
Univariate
single dependent variable
Multivariate
employ two or more dependent variables
Vectors and matrices
vectors: variables with magnitude and direction
matrices: 2D arrays of vectors
Power
-important to have high enough power to detect an effect
need to know:
-effect size
-alpha
-sample size
-data dispersion
power = the % chance you can detect an effect
OR the probability of not committing a type II error (a false negative)
Effect size
- power
- alpha
- n
- s (data dispersion)
all of the above must be known to solve for effect size
G*Power
-allows you to calculate the sample size needed for univariate and multivariate tests
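The same kind of calculation G*Power does can be sketched with statsmodels (effect size, alpha, and power below are assumed example values):

```python
from statsmodels.stats.power import TTestIndPower

# solve for n per group given Cohen's d = 0.5, alpha = 0.05, power = 0.8
n = TTestIndPower().solve_power(effect_size=0.5, alpha=0.05, power=0.8)
print(n)   # roughly 64 per group
```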
post hoc power calc
- usually when your results were almost significant
- often in poor taste
Linear relationship
-predictor and response
-bivariate = x and y
positive, negative, or no relationship (zero)
scatterplot
-scatter diagram is graphical method to display relationship between two variables
Fitting a line
-least squares method.
-distances from the candidate line (residuals) are squared and summed over all points; the fit minimizes this sum
-the fitted line always passes through the mean of x and the mean of y
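Sketch of a least-squares fit with scipy (made-up x, y roughly following y = 2x):

```python
import numpy as np
from scipy import stats

x = np.array([1, 2, 3, 4, 5, 6], dtype=float)
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 12.2])

fit = stats.linregress(x, y)
print(fit.slope, fit.intercept, fit.rvalue ** 2)   # slope, intercept, R^2
# the fitted line passes through (mean of x, mean of y):
print(np.isclose(fit.slope * x.mean() + fit.intercept, y.mean()))
```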
WHY fit a line?
to convert one value to another
to standardize: a calibration curve!
regression significance
can we distinguish a line with slope from a line with zero slope?
zero slope (no relationship) is our null
R^2
coefficient of determination
- how much of the variation in y is explained by x
- ranges from 0 to 1; want close to 1
Assumptions of regression
- each x and y pair is independent and random
- normal distribution of y values (residuals) at each x
- homogeneity of variance
- linear relationship
- measurements of x are free of error, or the error is small compared to y's (error in x makes the relationship hard to estimate)
Applications of Regression line
-can be used to predict
-r is a measure of the strength of the linear association between x and y
-|r| = √(R²), but √(R²) alone loses the sign (direction) of the relationship
-want values near 1 or -1
r > 0: direct (positive) linear relationship
r < 0: inverse (negative) linear relationship
Spearman Rank Correlation
- used when data don't meet normality or homogeneity-of-variance assumptions
- rank correlation is also used when one or both variables consist of ranks
- can also handle multiple values of y for each x
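Sketch of Spearman's rho in scipy on made-up data that is monotonic but not linear:

```python
import numpy as np
from scipy import stats

x = np.arange(1, 11, dtype=float)
y = x ** 3 + np.random.default_rng(4).normal(0, 5, 10)   # curved but monotonic

rho, p = stats.spearmanr(x, y)
print(rho, p)   # rho near 1: ranks agree even though the relation is curved
```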
Parametric tests
-independent and paired t tests
-correlation analysis
-linear regression
-ANOVAs
Non Parametric tests
(also have their own assumptions!)
- Mann Whitney u
- Spearman Rho
Transformation
-transforms a non-normal distribution toward normality
-there are a number of ways to do this depending on the original distribution
-WON'T MAKE UP FOR POOR SAMPLING, specifically non-random sampling; very sensitive to outliers
-KNOW YOUR LIT/FIELD
prepare to defend your choice
Log Transformation
used for heterogeneity of variance (base 10 or natural log)
Square Root Transformation
used for heteroscedastic data (non-constant variance); commonly used on count data
Arcsine transformation
-for binomial distributions (yes/no data)
-proportions or percentages
-takes the arcsine of the square root of the proportion
-input proportions range from 0 to 1; the result is in radians
Back transform
-transformed values mean little to readers, so back-transform the results to the original scale when writing up
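A small numpy sketch of the transform/back-transform round trip (data made up; note the back-transformed mean of logs is the geometric mean, not the arithmetic mean):

```python
import numpy as np

x = np.array([1.2, 3.5, 10.0, 31.0, 95.0])   # hypothetical skewed data
logx = np.log10(x)                           # transform toward normality
mean_log = logx.mean()                       # analyze on the transformed scale
print(10 ** mean_log)                        # back-transform for the write-up

# the other transforms named above, for reference:
# np.sqrt(counts)          square root, for count data
# np.arcsin(np.sqrt(p))    arcsine, for proportions p in [0, 1]
```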
Outliers
- data value different from majority
- need to report and justify why you threw them out if you trim your data set
- Need to think about them
- can’t discard due to inconvenience
- rerun the analysis without the outlier to see if the result is the same
- consider a rank-based test, or binning into categories
- transforming may help
Lost, corrupted, removed data
- reduces sample size
- a small sample size decreases power and increases the chance of extreme values distorting results
quantitative data
discrete data
3 of something
Continuous data
3.14579 of something
categories
I am a human
convert data into bins
Types of data can be divided into groups
race, age, sex
-put into contingency table
-categorical variables
-chi square analysis
must always use frequencies (counts) and compare observed to expected
expected values can come from models! e.g., Mendelian genetics uses Hardy-Weinberg
Chi square things
Odds ratio
ratio of the odds in one group to the odds in the other (each odds = successes / failures)
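Sketch of a chi-square test plus odds ratio on an assumed 2x2 frequency table (counts made up; rows = treatment, columns = success/failure):

```python
import numpy as np
from scipy import stats

table = np.array([[30, 10],    # hypothetical treatment group
                  [18, 22]])   # hypothetical control group

chi2, p, dof, expected = stats.chi2_contingency(table)
print(chi2, p)                 # observed vs expected frequencies

odds_ratio = (table[0, 0] / table[0, 1]) / (table[1, 0] / table[1, 1])
print(odds_ratio)              # odds of success, treatment vs control
```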
Mosaic plot
graphical way to look at frequencies
- column = “treatment”
- row variable = “response”
ANOVA
- statistical test that exploits variance (s^2)
- uses normally distributed data sets to compare differences between groups
Basic one way ANOVA
Two variables: categorical and quantitative
Question: do the means of the quantitative variable depend on which category the individual is in?
- if there are only 2 categories, use a 2-sample t test; ANOVA handles 3 or more :)
- determines the p value from the F statistic
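Sketch of a one-way ANOVA in scipy (three made-up groups):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
g1 = rng.normal(10, 2, 20)
g2 = rng.normal(11, 2, 20)
g3 = rng.normal(13, 2, 20)

f, p = stats.f_oneway(g1, g2, g3)
print(f, p)   # small p: at least one group mean differs (not which one)
```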
What does ANOVA do?
Tests these hypotheses:
- means of the groups are equal (H0)
- not all means are equal (Ha)
* doesn’t tell us which differ, have to follow up with post hoc testing
ANOVA assumptions
-each group is approx. normal
check graphically or with normality tests; ANOVA can withstand some deviation but not extreme outliers
-standard deviations are approx. equal between groups
ratio of the largest to smallest group's st dev should be less than 2:1
Levene's test checks this
ANOVA notation
n = number of total individuals
I = number of groups
x = an individual value
x̄ (X bar) = mean of the entire data set
How does one way ANOVA work?
measures variation
- between groups (difference between each group mean and the overall mean)
- within groups (difference between each value and its group mean)
ANOVA f statistic
F = mean square variation between groups / mean square variation within groups
= between / within
= MSG / MSE
R^2 statistic
sum of squares between/ sum of squares total
SSB/SST
If ANOVA groups don’t have the same means
- compare in twos: pairwise two-sample t tests
- need to adjust the p value threshold because the multiple tests use the same data
Tukey's Pairwise comparisons
- if family error rate is 0.05 then
- individual alpha = 0.0199 w/ 95 % CI
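Sketch of Tukey's procedure via statsmodels (group labels and values made up; alpha is the family error rate):

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(6)
values = np.concatenate([rng.normal(10, 2, 20),
                         rng.normal(11, 2, 20),
                         rng.normal(13, 2, 20)])
groups = np.repeat(["A", "B", "C"], 20)

print(pairwise_tukeyhsd(values, groups, alpha=0.05))   # which pairs differ
```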
ANOVA data not normal?
Kruskal-Wallis Test
nonparametric procedure used to test the claim that 3+ indep samples come from pops with the same distribution
Kruskal-Wallis Test
- tests a STRONGER hypothesis than ANOVA, which only compares means
- samples are simple random samples from 3+ pops
- data can be ranked
- the principle is pooling all the data, ranking it, and checking whether the ranks disperse evenly across the samples
- large values of H indicate the Ri (sums of ranks of the samples) differ from what's expected
- If H is too large then we reject the null
- K-W is always right tailed
K-W test critical values
- with 3 populations and every sample size 5 or less: critical value from the K-W table
- with 4 or more populations, or any sample size greater than 5: critical value from chi^2
K-W hypothesis test steps
step 0: samples are indep random, data can be ranked
step 1: box plots to compare data
step 2: hypotheses.
H0: the data distributions are the same
H1: the data distributions are not the same
step 3: rank observations smallest to largest
step 4: level of significance; critical value from either the K-W table or chi^2
step 5: compute test stat
step 6: compare to the critical value
reject H0 if the test stat is bigger than the critical value
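Sketch of steps 5-6 in scipy (three made-up skewed samples; kruskal returns H and its chi²-based p value):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
g1 = rng.exponential(1.0, 12)
g2 = rng.exponential(1.5, 12)
g3 = rng.exponential(3.0, 12)

h, p = stats.kruskal(g1, g2, g3)
print(h, p)   # large H (small p): reject H0 that the distributions are the same
```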