Module 1 Flashcards
Chapters 1,3,4
Define “Sample”
Subset of individuals from a population of interest
Define “Estimation”
The ability to approximate an unknown quantity of a target population using sample data
-All estimates have a sampling distribution.
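Define “Parameter”
Why is its estimate subject to error?
A quantity describing a population (e.g. the population mean). Its true value is usually unknown and has to be estimated from sample measurements.
The estimate is subject to error because it is calculated from incomplete data (a sample).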
Define “Random Sampling”
A sampling method in which every individual in the population has an equal and independent chance of being chosen.
-Minimizes bias and allows for standard error calculations.
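A minimal sketch of drawing a random sample, assuming a made-up population of 1,000 ID numbers (the population, seed, and sample size are illustrative only):

```python
import random

# Hypothetical population: ID numbers for 1,000 individuals.
population = list(range(1, 1001))

# Fixed seed only so the example is repeatable.
rng = random.Random(42)

# Draw 20 individuals without replacement; each ID is equally likely to be picked.
sample = rng.sample(population, k=20)
print(sorted(sample))
```

Because selection does not depend on any property of the individuals, no subgroup is favoured.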
Define “Sampling error”, in terms of bias and independence.
What is its relationship to precision?
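The chance difference between a sample estimate and the true population parameter. It arises from sampling alone, not from bias.
Precision is inversely related to sampling error: the lower the sampling error, the higher the precision.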
Define “Convenience Sampling”, in terms of bias and independence.
A sampling method that chooses the sample group from individuals/groups that are easily available.
-Introduces bias.
-The sample being unbiased & independent is not guaranteed.
Define “Volunteer Sampling”, in terms of bias and independence.
A sampling method in which members of the population of interest volunteer themselves for sampling.
-Introduces bias and can’t guarantee independence.
Why are larger samples better?
They are more precise and have lower sampling error
Define “Bias”
A systematic discrepancy between estimates and the true population parameter, typically arising from improper sampling of the population.
What are the 2 major goals of sampling?
To minimize sampling error and bias, and to allow precision to be measured.
Define “precision”
How close repeated estimates are to one another (clumped together). High precision means low sampling error.
Define “accuracy”
How close estimates are to the true population value (on the mark). High accuracy means low bias.
Define “Census”
The measurement of every individual in a population (rare).
Define “variables”
Characteristics that differ amongst individuals
Define “Categorical variables”
Qualitative measurement that can be sorted into groups
Define “Numerical variables”
Quantitative measurements.
Two types: discrete (integers) and continuous (any real #).
Define “Nominal variable”
Categorical variables that have no inherent order (ex. colour of fur)
Define “Ordinal variable”
Categorical variable that has an order, despite no quantification (ex. small, medium, large).
Define “Interval variable”
A numerical variable that has an order on a numerical scale, with defined differences between points. No true 0. ex. year.
Define “Ratio Variable”
A numerical variable with defined ratios. True 0 (physically meaningful). ex. Mass.
Define “Observational study”
Nature assigns values; the researcher only observes and points to associations. No control of treatment assignment.
Define “Experimental study”
Researcher assigns treatment values randomly to individual units of study (reminder: a unit of study can be a group).
Define “Explanatory variable”
Independent. Treatment being applied.
Define “Response variable”. How is it determined?
The dependent variable, whose values are expected to depend on the explanatory variable. Determined by examining associations between variables in test groups.
Define “distributions”
How measurements vary across the different individuals in a sample.
Define “Frequency distributions”
How often each value occurs in a sample
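A short sketch of tallying a frequency distribution, assuming a made-up sample of fur colours for 10 animals:

```python
from collections import Counter

# Hypothetical sample: fur colour recorded for 10 animals.
fur_colours = [
    "brown", "black", "brown", "white", "brown",
    "black", "brown", "white", "black", "brown",
]

# Count how often each value occurs in the sample.
frequency = Counter(fur_colours)

# Print each colour with its count and its share of the sample.
for colour, count in frequency.most_common():
    print(f"{colour}: {count} ({count / len(fur_colours):.0%})")
```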
Define “Probability distribution”
How often each value occurs in the whole population
Define “Descriptive statistics”. How are they described?
Quantities that capture key features of frequency distributions.
-Described by location, spread, proportion.
Define “location”
Indicates where observations are centred in numerical data. ex. mean, median, mode
Define “Spread”
Indicates how dispersed observations are from the centre. ex. variance, SD, interquartile range.
Define “mean”
The average of the numerical data
Define “Median”
Middle value in a set of data ordered from smallest to largest.
-Not sensitive to extreme values (unlike the mean)
Define “mode”
The most frequently occurring observation
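A small sketch comparing the three measures of location on made-up data, before and after one extreme value is added (the numbers are illustrative only):

```python
import statistics

# Hypothetical measurements.
data = [2, 3, 3, 4, 5, 6, 7]

# Same data with one extreme observation appended.
with_outlier = data + [100]

# Report mean, median, and mode for both versions of the data.
for label, values in [("original", data), ("with outlier", with_outlier)]:
    print(
        label,
        "mean:", round(statistics.mean(values), 2),
        "median:", statistics.median(values),
        "mode:", statistics.mode(values),
    )
```

Comparing the two printed lines shows which measures move when an extreme value enters the sample.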
Define “variance”
Sum of the squared residuals divided by the degrees of freedom (n-1).
Remember it's s^2.
-Produces results in squared units!!!
*Look at equation
Define “Standard Deviation”
Measures how far observations deviate from the mean.
- Remember it's s.
- NEVER negative.
- Same units as the observation being analyzed.
- Look at equation
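A worked sketch of the variance and standard deviation calculations on made-up data, written out step by step and then checked against Python's `statistics` module:

```python
import math
import statistics

# Hypothetical sample of 5 measurements.
data = [4.0, 7.0, 9.0, 10.0, 15.0]

n = len(data)
mean = sum(data) / n

# Sum of squares: each residual (observation minus mean) squared, then added up.
sum_of_squares = sum((y - mean) ** 2 for y in data)

# Sample variance s^2: sum of squares divided by the degrees of freedom (n - 1).
variance = sum_of_squares / (n - 1)

# Standard deviation s: square root of the variance, back in the original units.
sd = math.sqrt(variance)

print("s^2 =", variance, " s =", round(sd, 3))
print("library check:", statistics.variance(data), round(statistics.stdev(data), 3))
```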
Define “Interquartile range”
The spread of the middle 50% of the data: the 3rd quartile minus the 1st quartile.
-Good alternative to SD as a measure of spread when data are skewed or contain extreme values.
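A sketch of the interquartile range on made-up data containing one extreme value. Note that software packages use slightly different quartile conventions; this example uses the default method of Python's `statistics.quantiles`:

```python
import statistics

# Hypothetical data with one extreme observation.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1000]

# n=4 splits the data into quarters and returns the three quartiles.
q1, q2, q3 = statistics.quantiles(data, n=4)

# IQR: distance between the 3rd and 1st quartiles.
iqr = q3 - q1

print("Q1:", q1, "median:", q2, "Q3:", q3, "IQR:", iqr)
print("SD for comparison:", round(statistics.stdev(data), 1))
```

The extreme value inflates the SD but leaves the quartiles where they were.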
What does a normal distribution on a histogram look like?
Bell-shaped and symmetric about the mean.
Define “residual”. What does the sum of all residuals produce?
Difference between an observation and the mean (e). The sum of all residuals always equals 0.
*look at calculation formula
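A quick check, on made-up data, that residuals from the sample mean add up to zero (with floating-point numbers the sum may be off by a tiny rounding amount, so the comparison uses a tolerance):

```python
import math

# Hypothetical sample.
data = [4.0, 7.0, 9.0, 10.0, 15.0]

mean = sum(data) / len(data)

# Residual: each observation minus the sample mean.
residuals = [y - mean for y in data]

print("residuals:", residuals)
print("sum is zero:", math.isclose(sum(residuals), 0.0, abs_tol=1e-9))
```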
Define “Degrees of Freedom”
The number of values in a calculation that are free to vary (n-1 for the sample variance)
Define “Skew”
A measure of asymmetry
Define “Positive skew”
Right skew, tail faces right (positive side of graph)
Define “Negative skew”
Left skew, tail faces left (negative side of graph).
Define “Sampling distribution”
The probability distribution of all values of an estimate that might be obtained when a population is sampled
How does sample size affect sampling distribution?
Larger sample sizes produce narrower sampling distributions, so estimates tend to fall closer to the true population parameter.
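A simulation sketch of this effect, assuming a made-up population of 10,000 values (mean 50, SD 10). The function name `sampling_distribution_sd` and all numbers are illustrative, not from the course material:

```python
import random
import statistics

# Fixed seed only so the example is repeatable.
rng = random.Random(1)

# Hypothetical population: 10,000 values drawn from a normal distribution.
population = [rng.gauss(50, 10) for _ in range(10_000)]


def sampling_distribution_sd(sample_size, n_samples=2000):
    """Draw many random samples and return the SD of their sample means."""
    means = [
        statistics.fmean(rng.sample(population, sample_size))
        for _ in range(n_samples)
    ]
    return statistics.stdev(means)


# Spread of the sampling distribution of the mean for three sample sizes.
spread_by_n = {n: sampling_distribution_sd(n) for n in (5, 20, 80)}

for n, spread in spread_by_n.items():
    print(f"n = {n:>2}: SD of sample means = {spread:.2f}")
```

The printed spread shrinks as n grows, which is the narrowing described above.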
Define “Standard error”. What is it used for?
Standard deviation of an estimate’s sampling distribution.
- Measures the precision of a sample estimate (e.g. the sample mean as an estimate of the population mean).
- Look at formula
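A sketch of the standard error calculation, assuming the card refers to the usual formula for the standard error of the mean, SE = s / sqrt(n) (the data are made up):

```python
import math
import statistics

# Hypothetical sample.
data = [4.0, 7.0, 9.0, 10.0, 15.0]

n = len(data)

# Sample standard deviation s (uses n - 1 degrees of freedom).
s = statistics.stdev(data)

# Standard error of the mean: s divided by the square root of the sample size.
se = s / math.sqrt(n)

print("s =", round(s, 3), " SE =", round(se, 3))
```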
What is the relationship of sample size and standard error?
As sample size increases, standard error decreases.
Define “Confidence Interval”
A range of values surrounding the sample estimate that likely contains the true population parameter.
What does 95% CI mean?
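We are 95% confident that the true population mean lies within the upper and lower limits of this interval (NOT that there’s a 95% probability that it does: the parameter is fixed, and about 95% of intervals built this way from repeated samples would contain it)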
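A rough sketch of an approximate 95% CI for the mean using the "2 SE" rule of thumb (mean plus or minus 2 standard errors). This is an approximation on made-up data; an exact interval would use a critical value from the t distribution instead of 2:

```python
import math
import statistics

# Hypothetical sample.
data = [4.0, 7.0, 9.0, 10.0, 15.0]

mean = statistics.fmean(data)

# Standard error of the mean: s / sqrt(n).
se = statistics.stdev(data) / math.sqrt(len(data))

# Approximate 95% confidence limits via the 2 SE rule of thumb.
lower = mean - 2 * se
upper = mean + 2 * se

print(f"mean = {mean:.2f}, approx. 95% CI = ({lower:.2f}, {upper:.2f})")
```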
What is the purpose of error bars?
Demonstrate the precision of estimates (typically SE, not always).
Define “Coefficient of variation”. What does it mean, in terms of variability?
Expresses SD as a percentage of the mean (CV = 100% × SD / mean)
-Low CV = less variability relative to the mean
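A sketch of the CV calculation on two made-up samples that have the same SD but different means. The helper name `coefficient_of_variation` is illustrative:

```python
import statistics


def coefficient_of_variation(values):
    """Return the sample SD expressed as a percentage of the mean."""
    return 100 * statistics.stdev(values) / statistics.fmean(values)


# Two hypothetical samples with identical spread but different means.
small_mean = [4.0, 7.0, 9.0, 10.0, 15.0]
large_mean = [104.0, 107.0, 109.0, 110.0, 115.0]

print(f"CV (small mean): {coefficient_of_variation(small_mean):.1f}%")
print(f"CV (large mean): {coefficient_of_variation(large_mean):.1f}%")
```

The same absolute spread counts as less variability when the mean is larger, so the second sample has the lower CV.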