Midterm 1: Ch 1-10 Flashcards
What is statistics?
quantitative technology for empirical science – logic and methodology for the measurement of uncertainty, and for an examination of that uncertainty
What are the goals of statistics (2)?
- estimate the values of important parameters
- test hypotheses about those parameters
What is data?
measurements of one or more variables made on a collection of individuals
What is a variable?
characteristic measured on individuals drawn from a population under study
What are the two types of variables?
- response variable (dependent variable)
- explanatory variable (independent variable)
What is a response variable?
(dependent variable – y-axis) variable that we try to predict or explain from the explanatory variable
What is an explanatory variable?
(independent variable – x-axis) variable used to predict or explain the response variable
What are parameters?
descriptive measures of an entire population
- population parameters are constants
- e.g. mean length of salmon
What are estimates?
descriptive measures of a sample
- random variables – change from one random sample to the next, from the same population
- e.g. mean length of some sample of salmon
Do samples look exactly like the population?
no
What is a sample of convenience?
collection of individuals that happen to be available at the time – biased
What is bias?
systematic discrepancy between estimates and the true population characteristic
What are the goals of estimation? (2)
- accuracy
- precision
What is accuracy?
on average gets the correct answer
- accurate = unbiased
- inaccurate = biased
What is precision?
gives a similar answer repeatedly
What are some determinants of precision (when unbiased)?
- sample size
- precision of instrument
Unbiased and Precise
- on average, answer is correct
- repeated samples/estimates have very similar results
Unbiased and Imprecise
on average, answer is correct, BUT each individual estimate may be far off
Biased and Precise
- most dangerous – may not even realize there’s a problem, and may have a lot of false confidence in the answer
- repeated samples/estimates have very similar results, BUT average value of estimates is off
Biased and Imprecise
- on average, answer is incorrect
- unconfident in the answer, but best guess would be wrong anyways – not as deadly as being confident and wrong
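The four combinations above can be illustrated with a small simulation. This is only a sketch under made-up assumptions: the true mean of 50, the offset of 4, and the noise levels are hypothetical numbers chosen for illustration. Bias is measured as the average error of the estimates, and precision as the standard deviation of the estimates.

```python
import random
import statistics

def summarize(estimates, true_value):
    """Return (bias, spread) for a list of repeated estimates."""
    bias = statistics.mean(estimates) - true_value   # accuracy: average error
    spread = statistics.stdev(estimates)             # precision: SD of the estimates
    return bias, spread

rng = random.Random(1)
TRUE_MEAN = 50.0  # hypothetical population mean

# Hypothetical estimators: (systematic offset, noise SD)
scenarios = {
    "unbiased and precise":   (0.0, 0.5),
    "unbiased and imprecise": (0.0, 5.0),
    "biased and precise":     (4.0, 0.5),
    "biased and imprecise":   (4.0, 5.0),
}

results = {}
for name, (offset, noise) in scenarios.items():
    # Each estimate = truth + systematic offset + random noise
    estimates = [TRUE_MEAN + offset + rng.gauss(0, noise) for _ in range(2000)]
    results[name] = summarize(estimates, TRUE_MEAN)
    bias, spread = results[name]
    print(f"{name:24s} bias={bias:6.2f}  spread={spread:5.2f}")
```

In the "biased and precise" case the spread is small, so the estimates look trustworthy even though they are all shifted away from the true value.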
What are properties of a good sample? (3)
- independent selection of individuals
- random selection of individuals
- sufficiently large
What is a random sample?
each member of a population has an equal and independent chance of being selected
What is independent sampling?
chance of an individual being included in the sample does NOT depend on who else is sampled
What is sampling error?
difference between the estimate and average value of the estimate
measurement of precision
Do smaller or larger samples have smaller sampling error?
larger samples → smaller sampling error
on average
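A rough simulation of this relationship, assuming a hypothetical population of 10,000 salmon lengths (the mean of 70 cm and SD of 10 cm are invented for illustration). It draws many random samples of two sizes and compares how much the sample means vary.

```python
import random
import statistics

rng = random.Random(42)
# Hypothetical population of salmon lengths (cm)
population = [rng.gauss(70, 10) for _ in range(10_000)]

def spread_of_sample_means(n, repeats=1000):
    """SD of the sample mean across repeated random samples of size n."""
    means = [statistics.mean(rng.sample(population, n)) for _ in range(repeats)]
    return statistics.stdev(means)

spread_small = spread_of_sample_means(5)     # small samples
spread_large = spread_of_sample_means(100)   # large samples

print(f"n = 5:   SD of sample means = {spread_small:.2f}")
print(f"n = 100: SD of sample means = {spread_large:.2f}")
```

The sample means from the larger samples cluster more tightly around the population mean, which is what smaller sampling error looks like in practice.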
What is high sampling error?
estimate changes a lot each time we repeat the study
low precision – large difference
What is low sampling error?
- higher precision – small differences
- low variance between different estimates (each time we do a study)
What are the two types of data?
- categorical variables (class or nominal variables)
- numerical variables (quantitative variables)
What are categorical variables?
fall into categories
What are the 2 types of numerical variables?
continuous: can be measured – e.g. arm length, height, weight, age*
discrete: can be counted – e.g. number of limbs, number of offspring, number of petals
What is a frequency table?
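table showing the number of times each value or category occurs in the data
- frequency is NOT a variable – it is a count of observations, not a measurement made on individuals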
What graph do you use for graphing categorical variables?
bar graph
What graph do you use for graphing numerical variables?
- histogram
- cumulative frequency distribution (CDF)
What data do histograms graph?
continuous numerical variable
- no gaps between bars – conveys that these are continuous variables running together
- widths are the same
What is cumulative frequency of a value?
proportion of individuals equal to or less than that value
- 0 = none of the individuals are equal to or less than that value
- 1 = all individuals are equal to or less than that value
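As a small worked example, the cumulative frequency of a value can be computed directly from its definition. The function name and the height data below are made up for illustration.

```python
def cumulative_frequency(data, value):
    """Proportion of observations equal to or less than `value`."""
    return sum(1 for x in data if x <= value) / len(data)

heights = [150, 155, 160, 160, 165, 170, 175, 180]  # made-up heights (cm)

at_160 = cumulative_frequency(heights, 160)   # 4 of the 8 values are <= 160
below_all = cumulative_frequency(heights, 140)
above_all = cumulative_frequency(heights, 180)

print(at_160, below_all, above_all)  # 0.5 0.0 1.0
```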
What is a contingency table?
describes association between two (or more) categorical variables by displaying frequencies of all combinations of categories
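A minimal sketch of building a contingency table by counting every combination of two categorical variables. The treatment and outcome labels and the observations are invented for illustration.

```python
from collections import Counter

# Made-up observations: (treatment, outcome) for each individual
observations = [
    ("control", "survived"), ("control", "died"), ("control", "survived"),
    ("treated", "survived"), ("treated", "survived"), ("treated", "survived"),
    ("control", "died"), ("treated", "died"),
]

counts = Counter(observations)  # frequency of every combination of categories
treatments = sorted({t for t, _ in observations})
outcomes = sorted({o for _, o in observations})

# Print the table: one column per treatment, one row per outcome
print(f"{'':10s}" + "".join(f"{t:>10s}" for t in treatments))
for o in outcomes:
    print(f"{o:10s}" + "".join(f"{counts[(t, o)]:>10d}" for t in treatments))
```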
What graphs are used for graphing the association between two categorical variables?
- contingency table
- grouped bar graph
- mosaic plot
What data do mosaic plots use?
relative frequencies scaled to 1 – does NOT use discrete numbers
width of bars indicates number of individuals in the treatment
What data do stacked bar plots use?
discrete numbers or frequency
What graphs are used for graphing the association between a categorical (x-axis) and numerical (y-axis) variable?
- multiple histogram
- cumulative frequency distribution (CDF)
- box plot
What graphs are used for graphing the association between two numerical variables?
scatter plot
What are two common descriptions of data?
- location: central tendency
- width: spread – how variable the data is
What are 3 measures of location?
- mean
- median
- mode
What is the mean (or average)?
add all numbers together and divide by the number of data points – centre of gravity
What is the median?
odd number: middle measurement in a set of ordered data
even number: average of two middle numbers in a set of ordered data
What is the mode?
most frequent measurement
Why might the mean and median be different?
skewed data – lot of the weight is on one side of the distribution
Why might the mean and median be the same or similar?
symmetrical distribution of data – bell-shaped
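A quick illustration of the three measures of location using Python's standard `statistics` module. The two data sets are made up: they are identical except that the largest value is replaced by an extreme one to create a right skew.

```python
import statistics

symmetric = [2, 3, 4, 4, 4, 5, 6]
right_skewed = [2, 3, 4, 4, 4, 5, 34]  # one large value stretches the right tail

for name, data in [("symmetric", symmetric), ("right-skewed", right_skewed)]:
    print(name,
          "mean:", statistics.mean(data),
          "median:", statistics.median(data),
          "mode:", statistics.mode(data))

# With an even number of observations, the median is the average
# of the two middle values in the ordered data
even_median = statistics.median([1, 2, 3, 4])
print(even_median)  # 2.5
```

In the symmetric data the mean, median, and mode are all 4. In the skewed data the median and mode stay at 4, but the mean is pulled up to 8 by the single extreme value.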
Mean vs. Median
- mean has nice statistical properties, can be quantified easily using theories
- mean has good predictive behaviours
What are the 4 measures of width?
- range
- variance
- standard deviation
- coefficient of variation
What is the range?
maximum minus minimum
- poor measure of distribution width – useless in statistics
Is sample range a biased estimator of the true population range?
yes, smaller sample → lower estimates of range
- sample range is not expected to match population range
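The bias of the sample range can be seen in a simulation. This is a sketch that assumes a hypothetical population of 10,000 values drawn uniformly between 0 and 100. A sample can never contain values outside the population, so its range can only match or fall short of the population range.

```python
import random
import statistics

rng = random.Random(7)
population = [rng.uniform(0, 100) for _ in range(10_000)]  # hypothetical population
population_range = max(population) - min(population)

def mean_sample_range(n, repeats=500):
    """Average range across repeated random samples of size n."""
    ranges = []
    for _ in range(repeats):
        sample = rng.sample(population, n)
        ranges.append(max(sample) - min(sample))
    return statistics.mean(ranges)

small = mean_sample_range(5)
large = mean_sample_range(50)

print(f"population range:        {population_range:.1f}")
print(f"mean range, n = 5:       {small:.1f}")
print(f"mean range, n = 50:      {large:.1f}")
```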
In the equation for variance, why do we square the deviations?
if we summed the unsquared deviations from the mean, negative and positive deviations would cancel out
What is sample variance?
unbiased estimator of population variance – used to try to learn about population variance
What is standard deviation?
positive square root of the variance
σ: true standard deviation
s: sample standard deviation – estimator of population standard deviation (s^2 is unbiased for σ^2, but s itself tends to slightly underestimate σ)
What is the coefficient of variation (CV)?
standard deviation expressed as a percentage of the mean: CV = 100% × s / Ȳ
- good for comparing distributions of different magnitudes
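A short worked example of the measures of width, using made-up measurements. It assumes the usual definitions: sample variance divides the sum of squared deviations by n - 1, standard deviation is its square root, and CV is the standard deviation as a percentage of the mean.

```python
import math
import statistics

data = [4.0, 6.0, 8.0, 10.0, 12.0]  # made-up measurements
mean = sum(data) / len(data)
deviations = [x - mean for x in data]

sum_dev = sum(deviations)                  # positive and negative deviations cancel
sum_sq = sum(d ** 2 for d in deviations)   # squaring keeps every term non-negative
variance = sum_sq / (len(data) - 1)        # sample variance s^2 (divide by n - 1)
sd = math.sqrt(variance)                   # sample standard deviation s
cv = 100 * sd / mean                       # coefficient of variation, in percent

print("sum of deviations:", sum_dev)       # 0.0
print("sample variance:", variance)        # 10.0
print(f"standard deviation: {sd:.3f}")
print(f"CV: {cv:.1f}%")

# The standard library computes the same sample variance
print(statistics.variance(data))
```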
What is skew?
measurement of asymmetry – refers to pointy tail of distribution
right-skewed: pointy tail is on the right
left-skewed: pointy tail is on the left
Mean – Nomenclature
population parameter: µ
sample statistic: Ȳ
Variance – Nomenclature
population parameter: σ^2
sample statistic: s^2
Standard Deviation – Nomenclature
population parameter: σ
sample statistic: s
Manipulating Means
Mean of Sum of Two Variables
E[X + Y] = E[X] + E[Y]
Manipulating Means
Mean of Sum of Variable and Constant
E[X + c] = E[X] + c
e.g. temperature conversions
Manipulating Means
Mean of Product of Variable and Constant
E[c X] = c E[X]
e.g. measurement conversions
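The three rules for manipulating means can be checked numerically on sample means. The values below are made up: `x` is treated as temperatures in Celsius, and the constants 1.8 and 32 are the Celsius to Fahrenheit conversion factors, which combine the product rule and the sum rule.

```python
import statistics

x = [10.0, 20.0, 30.0]   # made-up temperatures in Celsius
y = [1.0, 2.0, 6.0]      # a second made-up variable

mean_x = statistics.mean(x)
mean_y = statistics.mean(y)

sum_xy = statistics.mean([a + b for a, b in zip(x, y)])    # E[X + Y]
shifted = statistics.mean([a + 32.0 for a in x])           # E[X + c], c = 32
scaled = statistics.mean([1.8 * a for a in x])             # E[c X], c = 1.8
fahrenheit = statistics.mean([1.8 * a + 32.0 for a in x])  # both rules together

print(sum_xy, mean_x + mean_y)           # E[X + Y] = E[X] + E[Y]
print(shifted, mean_x + 32.0)            # E[X + c] = E[X] + c
print(scaled, 1.8 * mean_x)              # E[c X] = c E[X]
print(fahrenheit, 1.8 * mean_x + 32.0)   # mean in Fahrenheit
```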