exam 2 Flashcards
types of projects: descriptive, differences, correlation/regression, association
descriptive - no hypothesis test is carried out, the goal is to describe the situation (various statistical measures may be important), histograms, density plots and boxplots
differences - compares two or more sets of data (hypothesis will relate to differences you believe may exist), bar charts, side by side boxplots
correlation/regression - attempts to link variables (looking for strength and direction of links between variables), scatterplots, line plots
association - emphasis on links between variables that are categorical, bar charts, pie charts
null and alternative hypothesis
we test the null hypothesis, data are gathered to test the null
we do not prove the alternative hypothesis, the most we can do is find support for it
possible outcomes from hypothesis testing
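reject and fail to reject null - reject = the data are unlikely if the null were true, so we have evidence against it; fail to reject = the data are consistent with the null (this does not prove the null is true)
p-value - the probability of getting data at least as extreme as those observed, assuming the null hypothesis is true (it is not the probability that the null is correct)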
histogram
descriptive
boxplots
descriptive, difference (side by side)
bar charts
differences, association
scatterplots
correlation and regression
line plots
correlation and regression
pie charts
association
histogram in r
hist(object)
boxplot from object in r
boxplot(object)
set scale of axis in r
ylim=c(min, max) xlim=c(min, max), for example ylim=c(0, 10)
add axis label and graph title in r
xlab="Title"
ylab="Title"
main="Title"
change colors of bars or boxes in r
col="Color"
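The plotting options on the cards above can be combined in one call. A minimal sketch, assuming made-up data: the object name heights, the labels, the colors, and the axis ranges are all illustrative choices, not values from the course.

```r
# Hypothetical example data: 50 simulated plant heights (cm).
set.seed(1)
heights <- rnorm(50, mean = 20, sd = 3)

# Histogram with axis labels, a title, a bar color, and a fixed x-axis range.
h <- hist(heights,
          xlab = "Height (cm)",
          ylab = "Frequency",
          main = "Plant heights",
          col  = "lightblue",
          xlim = c(10, 30))

# Boxplot of the same object with a y-axis label, a title, a box color,
# and a fixed y-axis range.
boxplot(heights,
        ylab = "Height (cm)",
        main = "Plant heights",
        col  = "orange",
        ylim = c(10, 30))
```

Each argument is optional, so any of them can be dropped and R will fall back to its default.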
popperian philosophy
we learn by being wrong, no amount of evidence can prove something is true (empirical falsification)
testing a null/ reshuffling
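to determine what "no change" would look like, generate data that fit the null: reshuffle the observed values among the groups many times (or create data that would be reasonable for the system, after plenty of research about what is realistic) and compare the real result with what the reshuffled data produce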
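Reshuffling can be sketched in a few lines of R. This is only an illustration under stated assumptions: the two groups and their values are made up, and the number of reshuffles (5000) is an arbitrary choice.

```r
set.seed(42)

# Hypothetical measurements for two groups.
control   <- c(4.1, 5.0, 4.7, 5.3, 4.9, 4.4)
treatment <- c(5.6, 5.9, 5.1, 6.2, 5.8, 5.4)

values        <- c(control, treatment)
n_control     <- length(control)
observed_diff <- mean(treatment) - mean(control)

# Reshuffle: randomly reassign the values to the two groups and
# recompute the difference in means each time.
n_shuffles <- 5000
shuffled_diffs <- replicate(n_shuffles, {
  shuffled <- sample(values)
  mean(shuffled[-(1:n_control)]) - mean(shuffled[1:n_control])
})

# Proportion of reshuffled differences at least as extreme as the
# observed difference (two-sided).
p_value <- mean(abs(shuffled_diffs) >= abs(observed_diff))
observed_diff
p_value
```

The reshuffled differences show what "no difference between groups" looks like for these data, and p_value says how often chance alone produces a difference as large as the real one.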
level of probability that scientists use as a threshold for deciding how to interpret hypothesis
p = .05 (a p-value below .05 is treated as significant and the null is rejected)
basic study set up for a t-test
create hypotheses (null and alternative), collect data, data must be approximately normally distributed, each data point must be independent
what happens to t when variables are changed
t increases when the mean difference increases, t decreases when the standard deviation increases, t increases when n increases
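These relationships follow from the formula for a one-sample (or paired) t statistic, t = mean difference / (standard deviation / sqrt(n)). A minimal sketch, where t_stat is a hypothetical helper written for illustration and not a built-in R function:

```r
# Hypothetical helper: t statistic for a one-sample or paired design.
t_stat <- function(mean_diff, s, n) {
  mean_diff / (s / sqrt(n))
}

baseline    <- t_stat(mean_diff = 2, s = 4, n = 16)  # starting point
bigger_diff <- t_stat(mean_diff = 4, s = 4, n = 16)  # larger mean difference
bigger_sd   <- t_stat(mean_diff = 2, s = 8, n = 16)  # larger standard deviation
bigger_n    <- t_stat(mean_diff = 2, s = 4, n = 64)  # larger sample size

c(baseline = baseline, bigger_diff = bigger_diff,
  bigger_sd = bigger_sd, bigger_n = bigger_n)
```

Doubling the mean difference doubles t, doubling the standard deviation halves t, and quadrupling n doubles t.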
what test to do to determine if data are appropriate for t-test, how to interpret
find if the data are normal (boxplot or shapiro-wilk test)
p greater than .05 = no evidence that the data depart from normal, so a t-test can be done
t-test in r
t.test(object)
one tailed vs two tailed t-test
one tailed - more power to detect directional effect (greater than or less than)
two tailed - tests for a difference between means in either direction (the means differ, without saying which one is larger)
paired t-test
repeated observations collected for a single variable with 2 levels (differences between sample point 1 and sample point 2 are compared for the same sample unit)
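In R, the same t.test function covers the one-tailed, two-tailed, and paired cases through its alternative and paired arguments. A minimal sketch, assuming made-up before/after measurements on the same 12 sample units:

```r
set.seed(7)

# Hypothetical paired data: the same 12 units measured at two sample points.
before <- rnorm(12, mean = 50, sd = 5)
after  <- before + rnorm(12, mean = 2, sd = 3)

# Two-tailed paired t-test (alternative = "two.sided" is the default).
two_tailed <- t.test(after, before, paired = TRUE)

# One-tailed paired t-test: is "after" greater than "before"?
one_tailed <- t.test(after, before, paired = TRUE, alternative = "greater")

# Unpaired two-sample t-test on the same numbers, for comparison only.
# It ignores the pairing, so it is not the right test for this design.
unpaired <- t.test(after, before)

two_tailed$p.value
one_tailed$p.value
unpaired$p.value
```

When the observed difference is in the predicted direction, the one-tailed p-value is half the two-tailed one, which is where the extra power comes from.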
non-parametric test
rank the data from smallest to largest and compare the ranks instead of the raw values
mann-whitney (two sample) and wilcoxon (paired) tests
import data as .csv into r
data<-read.csv(file.choose())
how to take one column and create an object in r
object<- dataset$column
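file.choose() opens a file picker, so it only works in an interactive session. A minimal sketch of the same import and column steps that runs without clicking, assuming a made-up dataset written to a temporary file (the column names site and height are hypothetical):

```r
# Write a small hypothetical dataset to a temporary .csv file.
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(site   = c("A", "B", "C"),
                     height = c(12.5, 14.1, 13.3)),
          csv_path, row.names = FALSE)

# Import the .csv by path instead of file.choose().
dataset <- read.csv(csv_path)

# Take one column and store it as its own object.
heights <- dataset$height
mean(heights)
```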
plotting boxplots and histograms in r
boxplot(object)
hist(object)
checking for normality in r
shapiro.test(object)
(wilcox.test(object) does not check normality, it is the non-parametric test to use when the data are not normal)
basic code for t-test in r
t.test(object)
how to tell if data are normal/not normal
do a shapiro-wilk test: if the data are normal (p greater than .05), use a parametric test (t-test)
if the data are not normal, use a non-parametric test (mann-whitney or wilcoxon)
how to deal with not normal data
can be log transformed OR use non-parametric tests
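A minimal sketch of checking normality before and after a log transformation. The data are simulated to be right-skewed, so the object names and numbers are assumptions for illustration only.

```r
set.seed(10)

# Hypothetical right-skewed data (simulated from a log-normal distribution).
skewed <- rlnorm(40, meanlog = 2, sdlog = 1)

# Shapiro-Wilk test on the raw data.
raw_check <- shapiro.test(skewed)

# Log transform, then test again.
logged    <- log(skewed)
log_check <- shapiro.test(logged)

raw_check$p.value
log_check$p.value
```

If the p-value for the logged data is above .05, a t-test on the logged values is reasonable. If it is still below .05, a non-parametric test is the safer choice.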
how to use a non-parametric test
mann-whitney test - equivalent of a two sample (unpaired) t test, compares the observed difference in mean of ranks to the maximum possible difference in the mean of ranks
wilcoxon test - equivalent of a paired t test (matched pairs), compares the ranks of differences
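In R, both rank-based tests are run with wilcox.test. Without paired = TRUE it does the two-sample (Mann-Whitney) test, and with paired = TRUE it does the Wilcoxon matched-pairs test. A minimal sketch with made-up numbers:

```r
# Hypothetical data for two independent groups (Mann-Whitney).
group_a <- c(1.2, 3.4, 2.2, 5.1, 2.9, 4.0)
group_b <- c(6.3, 7.7, 5.8, 9.4, 8.2, 7.0)
mann_whitney <- wilcox.test(group_a, group_b)

# Hypothetical paired data: the same six units measured twice (Wilcoxon).
before <- c(10.1, 12.3, 9.8, 11.5, 13.0, 10.7)
after  <- c(11.0, 13.1, 10.1, 12.9, 13.4, 12.2)
wilcoxon_paired <- wilcox.test(before, after, paired = TRUE)

mann_whitney$p.value
wilcoxon_paired$p.value
```

The p-values are read the same way as for a t-test: below .05, reject the null.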
how to decide how much data is needed to collect
preliminary sampling - small study to refine and evaluate sample size, acceptability, feasibility, and cost of larger study (determines problems and best methods)
dummy data - learn everything and then make up what you think are plausible data
primary literature investigation - learn everything you can about your system and others like it (what are other people doing)
update analysis as you collect data
when do we have independent replicates and when do we not
pseudoreplication - if replicates are “tied” to each other in some way
all independent data points must have no connection to other data points
simple pseudoreplication
only a single replicate per treatment and subsamples are collected from each area
sacrificial pseudoreplication
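experimental units are replicated, but the data from the replicates are pooled before analysis or the subsamples are treated as if they were independent replicates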
temporal pseudoreplication
only a single replicate per treatment and subsamples are collected from it repeatedly over time
phylogenetic pseudoreplication
closely related individuals are the units being sampled (seeds, tadpoles, insect larvae)
technical replication
different observers or instruments are used for different parts of the experiment
converting continuous variables to categories
throws away information (values within a category are treated as the same), and the results can change depending on where the category cutoffs are placed
true positive (p-value)
when Ho is false and we reject it
true negative (p-value)
when Ho is true and we fail to reject it
false positive (p-value)
when Ho is true and we reject it
false negative (p-value)
when Ho is false and we fail to reject it
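A false positive can be seen by simulation. In this sketch both samples are drawn from the same population, so Ho is true every time and any rejection is a false positive. The sample size, mean, standard deviation, and number of runs are assumptions chosen for illustration.

```r
set.seed(123)

# Repeat a two-sample t-test many times when Ho is true
# (both groups come from the same population).
n_sims <- 2000
p_values <- replicate(n_sims, {
  a <- rnorm(20, mean = 10, sd = 2)
  b <- rnorm(20, mean = 10, sd = 2)
  t.test(a, b)$p.value
})

# Proportion of runs where Ho was rejected even though it is true.
false_positive_rate <- mean(p_values < 0.05)
false_positive_rate
```

With a .05 threshold, the false positive rate should land close to .05, which is what the threshold means.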