Midterm Flashcards
what is a population
the set of all “subjects” relevant to a scientific hypothesis
what are parameters
quantities describing a population
what are variables
characteristics that differ among individuals
describe two types of categorical variables
nominal and ordinal
describe two types of numerical variables
interval and ratio scaled
what is a nominal scale
categories with no inherent order (species, sex, diet)
what is an ordinal scale
categories with a natural order (small, medium, large)
what is interval scaled
equal intervals between values, but an arbitrary zero point (Celsius, years BC)
what is ratio scaled
a natural zero point, so ratios between values are meaningful (mass, abundance, duration)
what is a sample
the subset of “subjects” selected from the statistical population that are actually examined during a particular study
what are sample statistics
quantities calculated from the collected sample to estimate population parameters
what is random sampling
every member of the population has an equal chance of being selected, and each selection is independent of the others
what is haphazard sampling
collecting samples without a systematic (random) procedure; not equivalent to random sampling and may be biased
what are descriptive statistics
quantities that capture important features of a frequency distribution
what is a frequency distribution
describes the number of times (frequency) each value of a variable occurs in a sample
what are 3 measures of location
mean (average), median (middle value), mode (most frequently occurring value)
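A quick sketch of the three measures using Python's standard-library `statistics` module. The data values are hypothetical, chosen only for illustration.

```python
import statistics

# Hypothetical sample (not from any real study)
data = [2, 3, 3, 5, 7, 10]

mean = statistics.mean(data)      # sum / n = 30 / 6
median = statistics.median(data)  # average of the two middle values (3 and 5)
mode = statistics.mode(data)      # most frequently occurring value

print(mean, median, mode)
```

In this made-up sample the mean (5) is larger than the median (4) because the single large value (10) pulls the mean toward the right tail, which is typical of positively skewed data.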
what are measures of spread
descriptions of how much individuals vary around the typical individual (e.g., range, variance, standard deviation)
what is a residual
the difference between an observation and the sample mean
what are degrees of freedom
the number of values in a calculation that are free to vary (e.g., n - 1 for the sample variance)
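A minimal sketch tying residuals, variance, and degrees of freedom together (hypothetical data). Because the residuals always sum to zero, once n - 1 of them are known the last one is fixed, which is why the sample variance divides by n - 1.

```python
import math
import statistics

# Hypothetical sample
data = [4, 7, 5, 8, 6]
n = len(data)
mean = sum(data) / n  # 6.0

# Residuals: each observation minus the sample mean; they always sum to zero
residuals = [y - mean for y in data]

# Sum of squared residuals divided by the degrees of freedom (n - 1)
variance = sum(r ** 2 for r in residuals) / (n - 1)
sd = math.sqrt(variance)

# statistics.variance also uses n - 1 in the denominator
assert math.isclose(variance, statistics.variance(data))
print(residuals, variance, sd)
```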
what is a positive skew
the longer tail is on the right
what is a negative skew
the longer tail is on the left
what is estimation
the process of inferring a population parameter from sample data
what is a sampling distribution
the probability distribution of all values of an estimate that we might have obtained when we sampled the population (e.g., the distribution of means calculated from repeatedly sampling the population)
what is bias
the difference between the mean of the sampling distribution of a sample statistic and the true population value
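A small simulation sketch of bias, under assumed population values (a normal population with σ = 2, so the true variance is 4). It approximates the mean of the sampling distribution of two variance estimators: dividing the sum of squared residuals by n is biased low, while dividing by n - 1 is not.

```python
import random

random.seed(1)  # fixed seed so the run is reproducible

SIGMA = 2.0     # assumed population standard deviation (true variance = 4)
N = 5           # sample size
REPS = 20_000   # number of simulated samples

biased, unbiased = [], []
for _ in range(REPS):
    sample = [random.gauss(0.0, SIGMA) for _ in range(N)]
    m = sum(sample) / N
    ss = sum((y - m) ** 2 for y in sample)  # sum of squared residuals
    biased.append(ss / N)                   # divides by n
    unbiased.append(ss / (N - 1))           # divides by degrees of freedom

# Mean of each approximate sampling distribution, to compare with 4.0
biased_mean = sum(biased) / REPS
unbiased_mean = sum(unbiased) / REPS
print(biased_mean, unbiased_mean)
```

With n = 5 the divide-by-n estimator averages about 4 × 4/5 = 3.2, a bias of roughly -0.8.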
what is standard error
the standard deviation of the sampling distribution of a sample statistic; it measures sampling error (for the mean, SE = s/√n: the sample standard deviation divided by the square root of the sample size)
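A simulation sketch of the standard error of the mean, assuming a normal population with μ = 50 and σ = 10 (made-up values). The standard deviation of many simulated sample means should be close to σ/√n = 2; with only one real sample, s/√n is the usual estimate.

```python
import math
import random
import statistics

random.seed(2)

MU, SIGMA = 50.0, 10.0  # assumed population mean and standard deviation
N = 25                  # sample size
REPS = 10_000

# Approximate the sampling distribution of the mean by repeated sampling
means = []
for _ in range(REPS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    means.append(sum(sample) / N)

# The SD of the sampling distribution is the standard error (here sigma / sqrt(n) = 2)
grand_mean = sum(means) / REPS
sd_of_means = math.sqrt(sum((m - grand_mean) ** 2 for m in means) / REPS)

# In practice we have one sample, so we estimate the SE as s / sqrt(n)
one_sample = [random.gauss(MU, SIGMA) for _ in range(N)]
se_estimate = statistics.stdev(one_sample) / math.sqrt(N)
print(sd_of_means, se_estimate)
```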
what is a 95% confidence interval
if you sampled repeatedly and calculated an interval each time, 95% of the resulting intervals would contain the true population parameter
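A coverage simulation sketch of what "95%" means. To stay within Python's standard library it assumes σ is known and uses the normal (z) interval, mean ± 1.96σ/√n; with σ estimated by s, the interval would use a t critical value instead. The population values are hypothetical.

```python
import math
import random
from statistics import NormalDist

random.seed(3)

MU, SIGMA = 50.0, 10.0  # assumed "true" population values
N = 20
REPS = 10_000

# Simplifying assumption: sigma is known, so the interval uses z instead of t
z = NormalDist().inv_cdf(0.975)  # about 1.96
half_width = z * SIGMA / math.sqrt(N)

hits = 0
for _ in range(REPS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    m = sum(sample) / N
    if m - half_width <= MU <= m + half_width:
        hits += 1  # this interval captured the true mean

coverage = hits / REPS
print(f"coverage = {coverage:.3f}")  # should be near 0.95
```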
what is probability
the proportion of times an event would occur if a random trial were repeated over and over under the same conditions
what is hypothesis testing
using sample data to infer whether statistical claims about population parameters (statistical hypotheses) should be rejected
what is the assumption of the null hypothesis
any variation we see is due to sampling error alone
what is the null hypothesis
a specific statement about a population parameter made for the purpose of argument
what is the alternative hypothesis
all other possible parameter values except the one stated in the null hypothesis (the two are mutually exclusive and exhaustive)
what is the null distribution
probability distribution of test statistic values when a random sample is taken from a hypothetical population for which the null hypothesis is true
how to determine a P-value
compare the observed test statistic to the null distribution and determine the probability of obtaining data as extreme as, or more extreme than, the observed data if the null hypothesis were true
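A sketch of an exact binomial test that builds the null distribution directly from binomial probabilities. The data (16 wins in 20 matches) are hypothetical, not from the wrestling study, and the two-sided P-value doubles the smaller tail, which is exact when the null proportion is 0.5.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n Bernoulli trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binomial_test_two_sided(x, n, p0=0.5):
    """Two-sided P-value: double the probability of the more extreme tail
    (capped at 1). The doubling rule is exact for p0 = 0.5, where the
    null distribution is symmetric."""
    upper = sum(binom_pmf(k, n, p0) for k in range(x, n + 1))  # P(X >= x)
    lower = sum(binom_pmf(k, n, p0) for k in range(0, x + 1))  # P(X <= x)
    return min(1.0, 2 * min(upper, lower))


# Hypothetical data: 16 wins in 20 matches; H0: p = 0.5
p_value = binomial_test_two_sided(16, 20)
print(round(p_value, 4))  # → 0.0118
```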
what is a significance level (α)
the probability used as a criterion for rejecting the null hypothesis (commonly α = 0.05)
P-value ≤ significance level
reject the null hypothesis
P-value > significance level
fail to reject the null hypothesis
example of a good biological conclusion
The probability of winning a wrestling match differs significantly from 0.50 when the athlete is wearing a red shirt (report the name of the test, the test statistic, n (or df), and P)
what is a critical value
the boundary between test statistic values that are consistent with the null hypothesis and those that lead to rejecting it (if the test statistic is more extreme than the critical value, reject the null)
at the critical values, area under the tails (α = 0.05)
one-tailed: 5% in the single tail
two-tailed: 2.5% in each tail
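A sketch of how critical values follow from α, using the standard normal distribution as the null distribution purely for illustration (the tests in this course may instead use χ², t, or binomial null distributions). `NormalDist.inv_cdf` is part of Python's standard `statistics` module; the observed statistic is hypothetical.

```python
from statistics import NormalDist

alpha = 0.05
std_normal = NormalDist()  # mean 0, standard deviation 1

# One-tailed test: all of alpha (5%) sits in a single tail
z_one_tailed = std_normal.inv_cdf(1 - alpha)      # about 1.645

# Two-tailed test: alpha is split, alpha / 2 (2.5%) in each tail
z_two_tailed = std_normal.inv_cdf(1 - alpha / 2)  # about 1.960

# Decision rule: reject H0 if the test statistic is more extreme than the critical value
z_observed = 2.10  # hypothetical test statistic
print(abs(z_observed) > z_two_tailed)  # True: reject H0 in a two-tailed test
```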
what is a Type I error
rejecting a true null hypothesis (false positive)
what is a Type II error
failing to reject a false null hypothesis (false negative)
what does power depend on
- which alternative hypothesis is true (how far the true parameter value is from the null value)
- the Type I error rate (α)
- sample size (precision)
what is power
the probability that we will reject a false null hypothesis
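An exact power calculation sketch for a two-sided binomial test of H0: p = 0.5 at α = 0.05, using the "double the smaller tail" P-value rule. The true proportions (0.7, 0.9) and sample sizes (20, 60) are hypothetical; the point is that power rises with sample size and with effect size, while the rejection probability when H0 is true is the Type I error rate.

```python
from math import comb


def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def rejection_region(n, p0=0.5, alpha=0.05):
    """Values of X (number of successes) for which a two-sided binomial
    test of H0: p = p0 rejects at level alpha (doubling the smaller tail)."""
    region = []
    for x in range(n + 1):
        upper = sum(binom_pmf(k, n, p0) for k in range(x, n + 1))
        lower = sum(binom_pmf(k, n, p0) for k in range(0, x + 1))
        if min(1.0, 2 * min(upper, lower)) <= alpha:
            region.append(x)
    return region


def rejection_probability(n, p_true, p0=0.5, alpha=0.05):
    """P(reject H0) when the true probability of success is p_true."""
    return sum(binom_pmf(x, n, p_true) for x in rejection_region(n, p0, alpha))


type1 = rejection_probability(20, 0.5)       # H0 true: Type I error rate
power20 = rejection_probability(20, 0.7)     # H0 false, n = 20
power60 = rejection_probability(60, 0.7)     # same effect, larger sample
power20_big = rejection_probability(20, 0.9) # larger effect, n = 20
print(type1, power20, power60, power20_big)
```

Because the binomial distribution is discrete, the achieved Type I error rate here (about 0.041) is slightly below the nominal α of 0.05.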
what is a binomial distribution
a discrete probability distribution for the number of successes in a fixed number of independent “Bernoulli trials”
what are 3 characteristics of Bernoulli trials
- only 2 possible outcomes (success or failure)
- the outcome of each Bernoulli trial is independent of the others
- the probability of success is identical for all trials
what are 3 characteristics of the binomial distribution
- the distribution is completely determined by the parameters p (probability of success) and n (number of trials)
- the mean of the binomial distribution is np
- the variance of the binomial distribution is np(1 - p)
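A sketch that builds a binomial distribution from its formula, P(X = k) = C(n, k) p^k (1 - p)^(n - k), and checks the mean and variance numerically. The values n = 10 and p = 0.3 are arbitrary.

```python
from math import comb

n, p = 10, 0.3  # arbitrary illustrative parameters

# Binomial probabilities P(X = k) for k = 0..n
pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

mean = sum(k * pk for k, pk in enumerate(pmf))
variance = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))

print(mean, n * p)                  # numerical mean vs. np
print(variance, n * p * (1 - p))    # numerical variance vs. np(1 - p)
```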
what is a Poisson distribution
describes the frequency distribution of the number of events that occur rarely and randomly; successes are counted in blocks of time or space (rather than as a probability for a given trial)
what are the 4 conditions of a Poisson distribution
- the probability of 2 or more occurrences in a single sample subdivision is negligibly small
- the probability of 1 occurrence in a sample subdivision is proportional to the size of the subdivision (in time or space)
- the outcome in 1 subdivision of the sample unit is independent of the outcome in all other subdivisions
- the probability of an occurrence is identical for all sample subdivisions
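A sketch of the Poisson distribution, P(X = k) = e^(-μ) μ^k / k!, where μ is the mean number of events per block of time or space (μ = 2.5 here is arbitrary). It checks numerically that the mean and the variance are both equal to μ.

```python
import math

mu = 2.5  # arbitrary mean number of events per block of time or space


def poisson_pmf(k, mu):
    """P(X = k) = e^(-mu) * mu^k / k!"""
    return math.exp(-mu) * mu**k / math.factorial(k)


# Probabilities beyond k = 40 are negligible for mu = 2.5
probs = [poisson_pmf(k, mu) for k in range(41)]

mean = sum(k * pk for k, pk in enumerate(probs))
variance = sum((k - mean) ** 2 * pk for k, pk in enumerate(probs))
print(mean, variance)  # both should equal mu
```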
what are two tests for the analysis of frequencies and what do they analyze
goodness of fit
contingency analysis
both analyze the frequency of observations in different categories: they compare the observed frequencies to the frequencies expected under the null hypothesis, and if the two are sufficiently different, reject the null
what are two tests for the analysis of categorical data
chi-squared (χ²) test
log-likelihood ratio test (G-test)
for chi-squared, what do Oi, Ei, and k mean
Oi: observed frequency (number of observations) in category i
Ei: expected frequency in category i under the null hypothesis
k: total number of categories
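For reference, the standard forms of the two test statistics, with sums over the k categories:

```latex
\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}
\qquad
G = 2 \sum_{i=1}^{k} O_i \ln\!\left(\frac{O_i}{E_i}\right)
```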
what steps do both tests involve
- calculate the expected frequency for each category
- calculate a test statistic based on the dissimilarity between observed and expected frequencies
- compare the test statistic to the χ² distribution with the appropriate df
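A sketch of the three steps for a goodness-of-fit test with an extrinsic hypothesis. The counts and the 1:2:1 hypothesized ratio are hypothetical. To avoid third-party libraries it uses the closed form P(χ² ≥ x) = e^(-x/2), which holds only for df = 2; for other df, use a χ² table or a library function such as SciPy's `scipy.stats.chi2.sf`.

```python
import math

# Hypothetical counts in k = 3 categories, tested against an assumed 1:2:1 ratio
observed = [30, 50, 20]
expected_props = [0.25, 0.50, 0.25]  # extrinsic hypothesis (set before seeing data)

# Step 1: expected frequency for each category = total * hypothesized proportion
total = sum(observed)
expected = [total * p for p in expected_props]

# Step 2: test statistics measuring observed-vs-expected dissimilarity
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
# Categories with O = 0 contribute 0 to G (limit of O * ln(O / E)), so skip them
g = 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)

# Step 3: compare with the chi-squared distribution, df = k - 1
df = len(observed) - 1  # no parameters estimated from the data here
if df != 2:
    raise ValueError("the closed-form P-value below is valid only for df = 2")
p_chi2 = math.exp(-chi2 / 2)
p_g = math.exp(-g / 2)
print(chi2, p_chi2, g, p_g)
```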
what is goodness of fit
compares frequencies (counts) to a discrete probability distribution
tests whether the observed distribution of counts across classes is consistent with what you’d expect based on a hypothesized probability distribution
asks whether the hypothesized probability distribution is a good “fit” to the observed data
what is the null hypothesis for goodness of fit
the proportions of observations in each category are equal to the expected proportions
how does the contingency test differ from goodness of fit
asks whether two categorical variables are associated with each other (whether one variable is “contingent” on, or “dependent” on, the other)
what are extrinsic expectations/hypotheses
when expected frequencies are derived from information other than the data being analyzed
what are intrinsic expectations/hypotheses
when expected frequencies are derived from the data being analyzed (no information assumed prior to the study; parameters are estimated from the sample)
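When expectations are intrinsic, each parameter estimated from the data costs an additional degree of freedom in a goodness-of-fit test (for example, fitting a Poisson distribution with μ estimated from the sample gives df = k - 2):

```latex
\mathrm{df} = k - 1 - (\text{number of parameters estimated from the data})
```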
what is a contingency table
a table of counts for two categorical variables, used to analyze whether the row an observation falls into is contingent on the column it falls into, and vice versa
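A sketch of a χ² contingency test on a hypothetical 2 × 2 table (no continuity correction). Expected counts come from the row and column totals, df = (rows - 1)(columns - 1), and for df = 1 the upper-tail probability can be computed with the standard-library identity P(χ² ≥ x) = erfc(√(x/2)).

```python
import math

# Hypothetical 2 x 2 table: rows = group (A, B), columns = outcome (yes, no)
observed = [[30, 10],
            [20, 40]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count under H0 (no association): row total * column total / grand total
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

df = (len(observed) - 1) * (len(observed[0]) - 1)  # (rows - 1) * (columns - 1)

# For df = 1 only: P(chi-squared >= x) = erfc(sqrt(x / 2))
p_value = math.erfc(math.sqrt(chi2 / 2))
print(chi2, df, p_value)
```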