Midterm Review Flashcards
How are variables classified?
Numerical (quantitative) or categorical.
Continuous variables: give an example
Can take infinitely many values, usually fractions or decimals; uncountable. Ex: cow weight, core body temperature in dogs
Discrete variables
Finite, usually integers; countable. Ex: # of eggs in a nest, # of stars around a planet
What are categorical variables?
Not numeric; the data fit into categories
How can quantitative variables be broken down?
As either continuous or discrete
Nominal variables, give an example
Have values that are named categories, ex: coat colors, biological sex
How are categorical variables broken down?
Nominal or ordinal
Ordinal variables, give an example
Ordered, named categories. ex: stages of disease (cancer), levels of pain, BMI category
Independent variable
- Also called the predictor or explanatory variable
- Exerts an influence on the outcome you wish to measure
- Can be actively manipulated
Dependent variable
- Outcome or response variable
- What you measure or record
Frequency
how often a data point shows up
What does a histogram show you?
Center, spread, shape
Taxonomy of frequency histogram shapes (6)
a. symmetric, bell-shaped
b. symmetric, not bell-shaped
c. skewed to the right (positively skewed)
d. skewed to the left (negatively skewed)
e. negative exponential
f. bimodal
Why look at frequency distributions?
- insight into sample
- detect outliers
- check assumptions of statistical tests
What does a bivariate scatterplot show?
The relationship between two quantitative variables; shows strength and direction
What are the three measures of central tendency?
Mean, median, mode
4 Measures of Dispersion
- Range
- Mean deviation
- Standard deviation
- Variance
Define mean
average of the data set
Median
Middle measurement in set of observations
Draw and label a box plot
What are the advantages of a box plot (make 4 points)
- visual representation
- comparison
- identify central tendency and spread
- identify outliers
What is the standard deviation (s)
The data spread, measures how far from the mean the observations typically are. Large = observations farther from mean.
Variance = s^2
The square of the standard deviation; equivalently, the SD is the square root of the variance
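A quick sketch of both measures using Python's standard library (the data values are made up for illustration):

```python
import statistics

# Hypothetical data set, e.g. five cow weights in arbitrary units
data = [4.0, 6.0, 8.0, 10.0, 12.0]

s2 = statistics.variance(data)  # sample variance: sum of squared deviations / (n - 1)
s = statistics.stdev(data)      # standard deviation: square root of that variance

print(s2)  # 10.0
print(s)   # ~3.162
```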
Statistical population
Aggregate of all units under study, has the actual mean, SD, population parameters
Sample population
The specific group you will collect data from
Define blocking in experiments, examples
Grouping experimental units into similar subsets, ex: location, family, genotype
Describe the two-step blocking procedure
- Divide experimental units into homogeneous subsets (blocks)
- Randomly assign treatments within each block
What are poor sampling designs?
- Haphazard sampling
- Convenience or opportunity sampling
- Pseudoreplication
Discuss pseudoreplication
When observations are not statistically independent but are treated as if they are
Artificially inflates the sample size (n)
ex: treating multiple cells from the same animal as independent
2 benefits of random sampling
- unbiased
- high precision
Discuss high bias
Repeated samples give estimates that systematically diverge from the population parameter in the same direction; like aiming at the wrong place
Frequency distribution
how often each specific value shows up in a data set
What is a probability distribution
All possible values a random variable can take in a given range, together with the probability of each
Normal distribution (make 4 points)
1. Most common distribution
2. Symmetric around the mean
3. Bell-shaped
4. Follows the 68-95-99.7 rule
IQR
interquartile range, range of middle 50% of sample
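A minimal illustration with made-up values (note that different quantile conventions give slightly different quartiles; `statistics.quantiles` defaults to the "exclusive" method):

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8]  # hypothetical values
q1, q2, q3 = statistics.quantiles(data, n=4)  # quartile cut points
iqr = q3 - q1  # range spanned by the middle 50% of the sample

print(iqr)  # 4.5
```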
What does variance measure?
Variability of the data around the mean
Standard deviation
Measure of how dispersed the data is about the mean
Coefficient of variation
Measure of the dispersion of data points around the mean, expressed as a percentage (SD / mean × 100)
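A one-line sketch of the calculation (hypothetical data):

```python
import statistics

data = [4.0, 6.0, 8.0, 10.0, 12.0]  # hypothetical measurements
cv = statistics.stdev(data) / statistics.mean(data) * 100  # CV = (s / mean) x 100%

print(round(cv, 1))  # 39.5
```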
Confounding variable
Unmeasured third variable affecting both the independent and dependent variable
Spurious Association
When two variables are correlated but don’t have a causal relationship
Extraneous variables
Not measured; affects the dependent variable
Estimation
using sample data to make inferences about the population
Point estimate
A single value used as the estimate of a population parameter
Interval estimate
A range of values used as the estimate of a parameter
Confidence Interval
An interval estimate that, with a stated level of confidence, contains the true population parameter being estimated
Central Limit Theorem
The distribution of the sample means approaches normal the larger the sample gets, regardless of the population’s distribution
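A small simulation sketch of this idea, using an exponential (strongly skewed) population with mean 1.0; the values are randomly generated for illustration:

```python
import random
import statistics

random.seed(1)  # for reproducibility

# 2000 sample means, each from a sample of n = 30 drawn from a skewed population
sample_means = [
    statistics.mean(random.expovariate(1.0) for _ in range(30))
    for _ in range(2000)
]

# The sample means pile up symmetrically around the population mean (1.0),
# even though the population itself is skewed
print(round(statistics.mean(sample_means), 2))
```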
68-95-99.7 rule
68% within 1 SD of the mean
95% within 2 SD
99.7% within 3 SD
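The rule can be checked with simulated normal data (a sketch; the counts only approximate the rule for a finite sample):

```python
import random
import statistics

random.seed(0)
xs = [random.gauss(0, 1) for _ in range(10000)]  # simulated normal data
mean, sd = statistics.mean(xs), statistics.stdev(xs)

within_1sd = sum(abs(x - mean) <= sd for x in xs) / len(xs)
within_2sd = sum(abs(x - mean) <= 2 * sd for x in xs) / len(xs)

print(within_1sd, within_2sd)  # close to 0.68 and 0.95
```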
Steps for Hypothesis Testing (for a t-test with pooled variances)
- State formal statistical hypotheses
a) Biological question
b) Null hypothesis (H0)
c) Alternate hypothesis (HA)
- Choose an appropriate statistical test, justify your choice
a) 1-sample t-test
H0: u = specified mean value
HA: u ≠ specified mean value
b) independent samples t-test
H0: u1 = u2
HA: u1 ≠ u2
c) paired dependent t-test
H0: udiff = 0
HA: udiff ≠ 0
- Check the normality assumption
Test of normality (Shapiro-Wilk, Kolmogorov-Smirnov):
H0: sample data come from a normal population distribution (p > 0.05)
HA: sample data do not come from a normal population distribution
- Check homogeneity of variances (for the 2-sample independent t-test)
a) box plot
b) variance ratio
c) Levene’s test
H0: variance of the 2 groups is equal
HA: variance of the 2 groups is not equal
- Run the analysis, compare to the reference set
- Evaluate the evidence against the null
p > 0.05: fail to reject H0
p < 0.05: reject H0
- Write a summary statement
Type 1 error
Incorrectly reject true null hypothesis
false positive
Type 2 error
Fail to reject H0 when it is false; a false negative
How to limit type 1 errors
Only reject H0 if p < alpha (typically 0.05)
How to limit type 2 errors
Maximize statistical power
When do you use a one-sample t-test?
when you want to compare the mean of a sample to a known or hypothesized population mean, and you only have data from a single sample
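A sketch of the test statistic with hypothetical numbers (the data and hypothesized mean are made up):

```python
import math
import statistics

sample = [4.0, 6.0, 8.0, 10.0, 12.0]  # hypothetical sample
mu0 = 5.0                             # hypothesized population mean

n = len(sample)
# t = (sample mean - hypothesized mean) / standard error of the mean
t = (statistics.mean(sample) - mu0) / (statistics.stdev(sample) / math.sqrt(n))

print(round(t, 3))  # 2.121, compared against a t distribution with df = n - 1 = 4
```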
What does the Independent samples t-test compare?
Compare means between two unrelated samples
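A sketch of the pooled-variance version of this test, with two hypothetical groups:

```python
import math
import statistics

# Two hypothetical unrelated groups
a = [1.0, 2.0, 3.0, 4.0, 5.0]
b = [3.0, 4.0, 5.0, 6.0, 7.0]
na, nb = len(a), len(b)

# Pooled variance: each group's variance weighted by its degrees of freedom
sp2 = ((na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))

print(t)  # -2.0, with df = na + nb - 2 = 8
```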
What does the paired sample dependent t-tests compare?
the means of two variables for a single group
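Because the pairing reduces this to a one-sample t-test on the differences, the statistic can be sketched as (hypothetical before/after data):

```python
import math
import statistics

# Hypothetical before/after measurements on the same five subjects
before = [10.0, 12.0, 9.0, 11.0, 13.0]
after = [12.0, 13.0, 11.0, 12.0, 15.0]

diffs = [a - b for a, b in zip(after, before)]  # per-subject differences
# One-sample t-test on the differences against 0 (df = n - 1)
t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))

print(round(t, 2))  # 6.53
```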
What do 2-tailed tests allow you to detect?
Allow you to detect differences in either direction
Discuss 1-tailed tests
not common, must be specified before data is collected, detect difference in only one direction
Name two tests of normality and discuss what they tell you
Test how well sample data fits a normal distribution
Shapiro-Wilk
Kolmogorov-Smirnov
How to check the homogeneity of variances?
- side by side box plots
- calculate the variance ratio (largest variance / smallest variance; available in SPSS)
- levene’s test
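A sketch of the variance-ratio check with hypothetical data (one common rule of thumb flags ratios much larger than about 4, though cutoffs vary):

```python
import statistics

# Hypothetical groups with different spreads
a = [1.0, 2.0, 3.0, 4.0, 5.0]  # variance 2.5
b = [1.0, 3.0, 5.0, 7.0, 9.0]  # variance 10.0

variances = [statistics.variance(g) for g in (a, b)]
ratio = max(variances) / min(variances)  # largest variance / smallest variance

print(ratio)  # 4.0
```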
What does Levene’s test compare? What is the H0 and HA for levene’s test? When do accept and reject the null hypothesis?
Checks whether the samples to be compared come from populations with the same variance
H0 - the variance (the spread) of the two groups is the same
HA - the variance (the spread) of the two groups is not the same
p < 0.05: reject the null (the two samples do not have equal variances)
p > 0.05: fail to reject the null (treat the two samples as having equal variances)
how are degrees of freedom calculated?
sample size (n) minus the number of parameters estimated
Why not do lots of t-tests?
Inflate type 1 error
What is the statistical hypothesis for ANOVA?
H0: all means are equal
HA: not all means are equal
2 steps of ANOVA
1) Global F-Test
2) Post-hoc tests
What is the ANOVA test statistic ratio?
between-group variation : within-group variation
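The F ratio can be computed by hand; a sketch with three hypothetical groups:

```python
import statistics

# Three hypothetical groups
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
all_vals = [x for g in groups for x in g]
grand_mean = statistics.mean(all_vals)
k, n = len(groups), len(all_vals)

# Between-group SS: how far each group mean sits from the grand mean
ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
# Within-group SS: spread of observations around their own group mean
ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)

f = (ss_between / (k - 1)) / (ss_within / (n - k))  # MS between / MS within
print(round(f, 3))  # 13.0
```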
ANOVA summary statement
Name of test, degrees of freedom, F statistic, p value
The nonparametric twin of the independent 2-sample t-test
Wilcoxon Mann-Whitney Rank Test
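A bare-bones sketch of the rank-sum idea behind this test (hypothetical data with no ties; a real implementation needs tie handling and a p-value lookup):

```python
a = [1.2, 3.4, 2.1]  # hypothetical group 1
b = [5.6, 4.8, 7.9]  # hypothetical group 2

pooled = sorted(a + b)
ranks = {v: i + 1 for i, v in enumerate(pooled)}  # rank 1 = smallest value

r1 = sum(ranks[v] for v in a)          # rank sum for group a
u1 = r1 - len(a) * (len(a) + 1) / 2    # U statistic for group a

print(u1)  # 0.0: every value in a ranks below every value in b
```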
What does Tukey’s HSD compare?
compares all possible pairs of means, tells which specific groups means are different (from each other)
Transformation only changes the what?
Distribution of the values
What tests to do if your data passes assumptions
1) t-test
2) ANOVA
4 Qualities of Non-Parametric Test
1) no mean
2) no or fewer assumptions
3) not sensitive to outliers
4) based on ranks of data value
Which has more statistical power: parametric or nonparametric
Parametric
What type of error is a nonparametric test more likely to have?
Type II error: fail to reject a false H0; less likely to detect a true effect
The nonparametric equivalent of the Dependent t-test
Wilcoxon-Signed Ranks Test
The non-parametric equivalent of the ANOVA
Kruskal-Wallis test, with Dunn’s test as the post-hoc
For ANOVA, what does it mean when
F = 0
F = 1
F is large
F = 0: group means are identical
F = 1: only small differences among group means (between-group ≈ within-group variation)
F is large: big differences among group means relative to within-group variation
Advantages of Non-parametric tests (make 3 points)
1) more widely applicable
2) not sensitive to outliers
3) generally any sample distribution OK
Disadvantages of non-parametric tests
1) lower statistical power
2) if the assumptions of a parametric test are met, the parametric test is more powerful
State H0 and HA for a 1-sample t-test
H0: population mean = specified value
HA: population mean ≠ specified value
State hypothesis for independent samples t-test
H0: u1 = u2
HA: u1 ≠ u2
State hypothesis for paired dependent t-tests
H0: udiff = 0
HA: udiff ≠ 0
H0 and HA for ANOVA
H0: there is no difference between the means of the populations being studied
HA: there is a difference between the means of the populations being studied