Statistics and Samples Flashcards
statistics (2)
- study of methods to describe and measure aspects of nature from samples
- allows us to determine the likely magnitude of an estimate’s distance from the truth, i.e., to quantify uncertainty
estimation
- process of inferring an unknown quantity of a population using sample data
parameter
- quantity describing a population
estimate/statistic
- approximation of the truth (the true population parameter), subject to error, calculated from a sample
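A minimal Python sketch of the parameter/estimate distinction, assuming a simulated population of 10,000 heights (the population, its mean of 170 cm, its SD of 10 cm, and the sample size of 50 are all invented for illustration). The parameter is computed once from the whole population; each random sample yields its own estimate, and the estimates generally differ from each other and from the parameter.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical population: 10,000 heights (cm), simulated for illustration.
population = [rng.gauss(170, 10) for _ in range(10_000)]

# Parameter: a constant describing the whole population.
mu = statistics.mean(population)

# Estimates: the same statistic computed from two independent random samples.
estimate_1 = statistics.mean(rng.sample(population, 50))
estimate_2 = statistics.mean(rng.sample(population, 50))

print(f"parameter mu = {mu:.2f}")
print(f"estimate 1   = {estimate_1:.2f}")
print(f"estimate 2   = {estimate_2:.2f}")
```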
population (2)
- entire collection of individual units that a researcher is interested in
- usually too large to directly measure
sample
- smaller set of individuals selected from the population of interest
sampling error (2)
- the chance difference between an estimate and the population parameter being estimated, caused by sampling
- larger samples, less affected by chance, have lower sampling error
bias (2)
- systematic discrepancy between the estimates we obtain from our samples and the true population characteristic
- occurs when the sampling process favours some outcomes over others and systematically under/overestimates the population parameter
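To illustrate bias, this sketch compares random sampling with a hypothetical biased procedure in which heavier individuals are proportionally more likely to be chosen. The population of masses, the weighting scheme, and the helper names `random_sample_mean` and `biased_sample_mean` are assumptions for this example (the biased draws are also made with replacement, for simplicity). Averaged over many samples, the random-sample estimates sit on the parameter, while the biased estimates are consistently too high.

```python
import itertools
import random
import statistics

rng = random.Random(1)

# Hypothetical population of 5,000 body masses (kg), simulated for illustration.
population = [rng.uniform(40, 100) for _ in range(5_000)]
mu = statistics.mean(population)  # the true parameter

# Assumed source of bias: selection probability proportional to mass.
cum_weights = list(itertools.accumulate(population))

def random_sample_mean(n):
    # Random sampling: every individual has an equal chance of selection.
    return statistics.mean(rng.sample(population, n))

def biased_sample_mean(n):
    # Biased sampling (with replacement): the procedure favours large values.
    return statistics.mean(rng.choices(population, cum_weights=cum_weights, k=n))

reps = 1_000
avg_random = statistics.mean(random_sample_mean(30) for _ in range(reps))
avg_biased = statistics.mean(biased_sample_mean(30) for _ in range(reps))

print(f"parameter:                   {mu:.2f}")
print(f"average of random estimates: {avg_random:.2f}")  # centred on mu
print(f"average of biased estimates: {avg_biased:.2f}")  # systematically too high
```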
precision (2)
- the spread of estimates resulting from sampling error
- larger samples are less affected by chance and will have higher precision
accurate
- unbiased: the average of all estimates that could be obtained equals the true population value
precision and sampling error
- the lower the sampling error, the higher the precision
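The link between sample size, sampling error, and precision can be checked by simulation. This sketch assumes a simulated normal population (mean 50, SD 10) and a hypothetical helper `summarize`; it draws many random samples at each sample size and reports the average distance of the estimates from the parameter (sampling error) and the standard deviation of the estimates (their spread: a smaller SD means higher precision).

```python
import random
import statistics

rng = random.Random(7)

# Hypothetical population of 20,000 values, simulated for illustration.
population = [rng.gauss(50, 10) for _ in range(20_000)]
mu = statistics.mean(population)

def summarize(n, reps=2_000):
    """Return (mean absolute sampling error, SD of estimates) for sample size n."""
    estimates = [statistics.mean(rng.sample(population, n)) for _ in range(reps)]
    mean_abs_error = statistics.mean(abs(e - mu) for e in estimates)
    spread = statistics.stdev(estimates)
    return mean_abs_error, spread

results = {n: summarize(n) for n in (5, 20, 80)}
for n, (err, sd) in results.items():
    print(f"n={n:3d}  mean |estimate - mu| = {err:.2f}  SD of estimates = {sd:.2f}")
```

Both columns shrink as n grows: larger samples have lower sampling error and therefore higher precision.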
random sampling (2)
- each member of the population has an equal and independent chance of being selected
- minimizes bias and makes it possible to measure the amount of sampling error
random sampling procedure (4 steps)
- create a list of every unit, or group of non-independent units, in the population of interest and number them
- decide on number of units in each sample (n)
- use a random number generator to generate n random integers between 1 and the total number of units
- sample units whose numbers match those produced by the generator
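The four-step procedure above translates directly into code. A minimal sketch, assuming the units are given as a Python list and that the n integers must be distinct (drawn without replacement, so no unit is sampled twice); `random_sample` and the plot labels are hypothetical names.

```python
import random

def random_sample(units, n, seed=None):
    """Sketch of the 4-step random sampling procedure."""
    # Step 1: list every unit and number it 1..N.
    numbered = {i: unit for i, unit in enumerate(units, start=1)}
    # Step 2: the sample size n is chosen by the caller.
    if not 0 < n <= len(numbered):
        raise ValueError("n must be between 1 and the population size")
    # Step 3: generate n distinct random integers in the range 1..N.
    rng = random.Random(seed)
    numbers = rng.sample(range(1, len(numbered) + 1), n)
    # Step 4: sample the units whose numbers were generated.
    return [numbered[k] for k in numbers]

plots = [f"plot-{letter}" for letter in "ABCDEFGHIJ"]  # hypothetical units
print(random_sample(plots, 3, seed=2024))
```

Passing a seed makes the draw reproducible, which is useful when the sampling step must be documented or repeated.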
sample of convenience (2)
- collection of individuals that are easily available to the researcher
- researcher must assume the sample of convenience is unbiased/independent, but there is no way to guarantee it
volunteer bias
- bias resulting from systematic differences between the pool of volunteers (the volunteer sample) and the population they belong to
how might volunteers differ from others (5)
- more health conscious/proactive
- low-income (if volunteers are paid)
- more ill (may die anyway, so willing to take the chance)
- more likely to have free time (retirees, unemployed)
- more angry, less prudish (aren’t afraid to speak up)
variables (2)
- characteristics that differ among individuals or other sampling units
- estimates are variables
data
- measurements of one or more variables made on a sample of individuals
categorical variables (2)
- describe membership in a category or group
- describe qualitative characteristics of individuals that do not correspond to a degree of difference on a numerical scale/magnitude
nominal
- describes a categorical variable with categories that have no inherent order
ordinal (2)
- describes a categorical variable that can be ordered
- the magnitude of difference between each consecutive value is not known
numerical variables
- measurements of individuals are quantitative and have magnitude on a numerical scale
- values are numbers, such as counts, dimensions, angles, rates, and percentages
continuous (3)
- numerical data that can take on any real-number value within some range
- between any two values of a continuous variable, an infinite number of other values are possible
- can be measured (arm length, height, weight)
discrete (2)
- numerical data that comes in indivisible units
- can be counted (# of limbs, offspring or petals)
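One way to keep the variable types straight when recording data is to tag each variable explicitly. The variables below are invented examples from a hypothetical plant survey, and the `VarType` enum and `is_later_stage` helper are illustrative only. The ordinal helper compares ranks but deliberately does no arithmetic, since the distance between consecutive ordinal categories is unknown.

```python
from enum import Enum

class VarType(Enum):
    NOMINAL = "categorical, unordered"
    ORDINAL = "categorical, ordered"
    DISCRETE = "numerical, counted in indivisible units"
    CONTINUOUS = "numerical, measured on a continuous scale"

# Hypothetical field-survey variables, each tagged with its type.
VARIABLES = {
    "flower_colour": VarType.NOMINAL,      # red / white / purple: no inherent order
    "life_stage": VarType.ORDINAL,         # seedling < juvenile < adult
    "petal_count": VarType.DISCRETE,       # 4, 5, 6 petals: counted
    "stem_height_cm": VarType.CONTINUOUS,  # e.g. 12.37 cm: any value in a range
}

# Ordinal categories support ordering comparisons, not differences.
LIFE_STAGE_ORDER = ["seedling", "juvenile", "adult"]

def is_later_stage(a, b):
    """True if life stage a comes after life stage b."""
    return LIFE_STAGE_ORDER.index(a) > LIFE_STAGE_ORDER.index(b)

for name, vtype in VARIABLES.items():
    print(f"{name:15s} {vtype.name:10s} ({vtype.value})")
```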
explanatory variable
- the variable that predicts or affects the other variable
- the variable that is manipulated during an experiment (treatment variable)
response variable (2)
- the variable that is affected by the explanatory variable
- the measured effect of the treatment variable
properties of a good sample (3)
- random selection of individuals (each individual has equal probability of being selected)
- independent selection of individuals
- sufficiently large
population parameters vs estimates
- population parameters: constants
- estimates: random variables that change from one random sample to the next from the same population
bias vs error
- bias is a systematic discrepancy (TENDING toward a certain difference) between an estimate and the true population characteristic
- error is a RANDOM difference (NOT TENDING toward any direction) between an estimate and the true population characteristic
reasons why estimates differ from parameters (4)
- measurement bias (property of individuals)
- sampling bias (property of sample)
- measurement error (property of individuals, results from imprecise measuring)
- sampling error (property of sample)
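Measurement bias and measurement error can be contrasted with two hypothetical instruments: one that always reads 2 mm too long, and one that is unbiased but reads with random noise (SD 2 mm). Both instruments and the simulated true lengths are assumptions for illustration. The biased instrument's differences all point the same way; the noisy instrument's differences scatter around zero with no tendency in either direction.

```python
import random
import statistics

rng = random.Random(3)

# Hypothetical true lengths (mm) of 5,000 individuals.
true_lengths = [rng.uniform(10, 20) for _ in range(5_000)]

# Hypothetical biased instrument: always reads 2 mm too long (systematic).
biased = [x + 2.0 for x in true_lengths]
# Hypothetical imprecise instrument: random noise with SD 2 mm, no direction.
noisy = [x + rng.gauss(0, 2.0) for x in true_lengths]

bias_diffs = [m - t for m, t in zip(biased, true_lengths)]
error_diffs = [m - t for m, t in zip(noisy, true_lengths)]

print(f"biased: mean diff = {statistics.mean(bias_diffs):+.2f}, "
      f"SD = {statistics.pstdev(bias_diffs):.2f}")
print(f"noisy:  mean diff = {statistics.mean(error_diffs):+.2f}, "
      f"SD = {statistics.pstdev(error_diffs):.2f}")
```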
frequency (2)
- the frequency of a specific measurement in a sample is the number of observations having that particular value
- frequency IS NOT a variable, it is not a property of the individuals
frequency distribution
- number of times each value of a variable occurs in a sample
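A frequency distribution can be tabulated with Python's `collections.Counter`. The offspring counts below are a made-up sample; note that the frequencies describe the sample as a whole, not any individual in it.

```python
from collections import Counter

# Hypothetical sample: number of offspring recorded for 12 individuals.
offspring = [2, 3, 2, 0, 1, 2, 3, 4, 2, 1, 0, 2]

# Count how many times each value occurs in the sample.
freq = Counter(offspring)
for value in sorted(freq):
    print(f"{value}: {freq[value]}")
```

For this sample the output is `0: 2`, `1: 2`, `2: 5`, `3: 2`, `4: 1`, and the frequencies sum to the sample size of 12.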
probability distribution (2)
- distribution of a variable in the whole population
- the true probability distribution of a population in nature is almost never known and is usually approximated by a theoretical distribution
normal distribution (2)
- familiar “bell-curve” shape
- a theoretical probability distribution used to approximate the true distribution of a continuous variable in a population
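In practice a normal distribution is fitted using estimates from a sample. A sketch using the standard library's `statistics.NormalDist.from_samples` (Python 3.8+), assuming the ten heights below are a random sample of a continuous variable; the data are invented, and whether a normal curve is a reasonable approximation has to be checked for real data.

```python
import statistics

# Hypothetical sample of a continuous variable (heights in cm).
sample = [162.1, 175.4, 168.0, 171.3, 180.2, 158.7, 169.9, 173.5, 165.2, 177.8]

# Approximate the unknown population distribution with a normal distribution
# whose mean and SD are estimated from the sample.
approx = statistics.NormalDist.from_samples(sample)

print(f"mean = {approx.mean:.1f}, SD = {approx.stdev:.1f}")
# Proportion of the population predicted to exceed 180 cm under this model.
print(f"P(height > 180) ~ {1 - approx.cdf(180):.3f}")
```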
experimental studies
- researcher assigns treatments randomly to individuals
observational study
- assignment of treatments is NOT made by the researcher
confounding variable
- variable that masks or distorts the causal relationship between measured variables in a study
- can limit influence by assigning treatments randomly to subjects
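Random assignment of treatments can be sketched by shuffling the subjects and dealing them into treatment groups in turn, so that any confounding variable is, on average, spread evenly across groups. The function `assign_treatments`, the subject labels, and the equal-group-size design are assumptions for illustration.

```python
import random

def assign_treatments(subjects, treatments, seed=None):
    """Randomly assign treatments in (as near as possible) equal numbers."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    # Deal the shuffled subjects into the treatment groups in turn.
    return {s: treatments[i % len(treatments)] for i, s in enumerate(shuffled)}

mice = [f"mouse-{i}" for i in range(1, 9)]  # hypothetical subjects
groups = assign_treatments(mice, ["drug", "placebo"], seed=11)
for subject in mice:
    print(subject, groups[subject])
```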