Statistics and Samples Flashcards
1
Q
statistics (2)
A
- study of methods to describe and measure aspects of nature from samples
- allows us to determine likely magnitude of a measure’s distance from the truth or to quantify uncertainty
2
Q
estimation
A
- process of inferring an unknown quantity of a population using sample data
3
Q
parameter
A
- quantity describing a population
4
Q
estimate/statistic
A
- approximation of the truth (the true population parameter), subject to error, calculated from a sample
5
Q
population (2)
A
- entire collection of individual units that a researcher is interested in
- usually too large to directly measure
6
Q
sample
A
- smaller set of individuals selected from the population of interest
7
Q
sampling error (2)
A
- the chance difference between an estimate and the population parameter being estimated, caused by sampling
- larger samples, less affected by chance, have lower sampling error
8
Q
bias (2)
A
- systematic discrepancy between the estimates we obtain from our samples and the true population characteristic
- occurs when the sampling process favours some outcomes over others and systematically under/overestimates the population parameter
9
Q
precision (2)
A
- the spread of estimates resulting from sampling error
- larger samples are less affected by chance and will have higher precision
10
Q
accurate
A
- unbiased: the average of all estimates that may be obtained is centred on the true population value
11
Q
precision and sampling error
A
- the lower the sampling error, the higher the precision
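The link between sample size, sampling error, and precision can be illustrated with a small simulation. This is only a sketch: the simulated population (normal, mean 50, SD 10), the sample sizes, and the helper name `spread_of_estimates` are assumptions made for illustration, not part of the cards.

```python
import random
import statistics

def spread_of_estimates(population, n, repeats=500, seed=0):
    """Standard deviation of sample means across repeated random samples of size n."""
    rng = random.Random(seed)
    # Each sample mean is one estimate of the population mean.
    means = [statistics.mean(rng.sample(population, n)) for _ in range(repeats)]
    # A smaller spread of estimates means lower sampling error, i.e. higher precision.
    return statistics.stdev(means)

# Hypothetical population of 10,000 measurements.
rng = random.Random(42)
population = [rng.gauss(50, 10) for _ in range(10_000)]

small = spread_of_estimates(population, n=10)    # small samples
large = spread_of_estimates(population, n=100)   # larger samples

print(f"spread of estimates, n=10:  {small:.2f}")
print(f"spread of estimates, n=100: {large:.2f}")
```

The spread for n=100 should come out clearly smaller than for n=10, matching the card: lower sampling error, higher precision.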
12
Q
random sampling (2)
A
- each member of the population has an equal and independent chance of being selected
- minimizes bias and makes it possible to measure the amount of sampling error
13
Q
random sampling procedure (4 steps)
A
- create a list of every unit, or group of non-independent units, in the population of interest and number them
- decide on number of units in each sample (n)
- use a random number generator to generate n random integers in population range
- sample units whose numbers match those produced by the generator
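The four steps above can be sketched in Python. This is a minimal sketch, assuming sampling without replacement and a hypothetical population of 100 labelled units; the function name `draw_random_sample` is made up for illustration.

```python
import random

def draw_random_sample(units, n, seed=None):
    """Randomly select n units from a listed population, without replacement."""
    rng = random.Random(seed)
    # Step 1: list every unit in the population and number them (0 .. N-1).
    numbered = list(enumerate(units))
    # Step 2: n, the number of units in the sample, is chosen by the caller.
    # Step 3: generate n distinct random integers in the population range.
    chosen_numbers = rng.sample(range(len(numbered)), n)
    # Step 4: sample the units whose numbers match those generated.
    return [numbered[i][1] for i in chosen_numbers]

# Hypothetical population of 100 units.
population = [f"plant_{i}" for i in range(1, 101)]
sample = draw_random_sample(population, n=10, seed=1)
print(sample)
```

Passing a `seed` only makes the draw repeatable for checking; leaving it as `None` gives a fresh random sample each run.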
14
Q
sample of convenience (2)
A
- collection of individuals that are easily available to the researcher
- researcher must assume sample of convenience is unbiased/independent, but there is no way to guarantee it
15
Q
volunteer bias
A
- bias resulting from systematic differences between the pool of volunteers (the volunteer sample) and the population they belong to
16
Q
how might volunteers differ from others (5)
A
- more health conscious/proactive
- low-income (if volunteers are paid)
- more ill (may die anyway, so willing to take the chance)
- more likely to have free time (retirees, unemployed)
- more angry, less prudish (aren’t afraid to speak up)
17
Q
variables (2)
A
- characteristics that differ among individuals or other sampling units
- estimates are variables
18
Q
data
A
- measurements of one or more variables made on a sample of individuals
19
Q
categorical variables (2)
A
- describe membership in a category or group
- describe qualitative characteristics of individuals that do not correspond to a degree of difference on a numerical scale/magnitude
20
Q
nominal
A
- describes a categorical variable with categories that have no inherent order
21
Q
ordinal (2)
A
- describes a categorical variable that can be ordered
- the magnitude of difference between each consecutive value is not known
22
Q
numerical variables
A
- measurements of individuals are quantitative and have magnitude on a numerical scale
- values are numbers, such as counts, dimensions, angles, rates, and percentages
23
Q
continuous (3)
A
- numerical data that can take on any real-number value within some range
- between any two values of a continuous variable, an infinite number of other values are possible
- can be measured (arm length, height, weight)
24
Q
discrete (2)
A
- numerical data that comes in indivisible units
- can be counted (# of limbs, offspring or petals)
25
explanatory variable
- the variable that predicts or affects the other variable
- the variable that is manipulated during an experiment (treatment variable)
26
response variable (2)
- the variable that is affected by the explanatory variable
- the measured effect of the treatment variable
27
properties of a good sample (3)
- random selection of individuals (each individual has equal probability of being selected)
- independent selection of individuals
- sufficiently large
28
population parameters vs estimates
- population parameters: constants
- estimates: random variables that change from one random sample to the next from the same population
29
bias vs error
- bias is a systematic discrepancy (TENDING toward a certain difference) between an estimate and the true population characteristic
- error is a RANDOM difference (NOT TENDING toward any direction) between an estimate and the true population characteristic
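The difference between bias and error can be seen by averaging many estimates. The sketch below is illustrative only: the population (uniform values from 1 to 100), the size-weighted "biased" sampler, and the helper `average_estimate` are all assumptions, chosen so that one sampling process favours larger individuals.

```python
import random
import statistics

rng = random.Random(7)
# Hypothetical population of 5,000 measurements.
population = [rng.uniform(1, 100) for _ in range(5000)]
true_mean = statistics.mean(population)

def average_estimate(sampler, repeats=500):
    """Average of the sample means from many repeated samples."""
    return statistics.mean(statistics.mean(sampler()) for _ in range(repeats))

# Random sampling: every individual has an equal chance of selection.
random_avg = average_estimate(lambda: rng.sample(population, 20))

# Biased sampling (assumed): larger individuals are more likely to be picked.
biased_avg = average_estimate(
    lambda: rng.choices(population, weights=population, k=20)
)

print(f"true mean:               {true_mean:.1f}")
print(f"average random estimate: {random_avg:.1f}")  # error averages out
print(f"average biased estimate: {biased_avg:.1f}")  # stays too high
```

Individual random samples still miss the true mean in either direction (error), but their average sits close to it. The biased sampler overestimates on average no matter how many samples are taken.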
30
reasons why estimates differ from parameters (4)
- measurement bias (property of individuals)
- sampling bias (property of sample)
- measurement error (property of individuals, results from imprecise measuring)
- sampling error (property of sample)
31
frequency (2)
- the # of observations in a sample that have a particular value of a measurement
- frequency IS NOT a variable, it is not a property of the individuals
32
frequency distribution
- number of times each value of a variable occurs in a sample
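A frequency distribution can be tabulated directly from sample data. This is a small sketch using made-up petal counts for ten flowers; the data and variable names are hypothetical.

```python
from collections import Counter

# Hypothetical sample: number of petals on each of 10 flowers.
petal_counts = [5, 4, 5, 6, 5, 4, 5, 5, 6, 4]

# Count how many times each value of the variable occurs in the sample.
frequency_distribution = Counter(petal_counts)

for value in sorted(frequency_distribution):
    print(f"{value} petals: {frequency_distribution[value]} flowers")
```

Here the variable is the petal count of each flower. The frequencies describe the sample as a whole and are not a property of any one individual.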
33
probability distribution (2)
- distribution of a variable in the whole population
- the true probability distribution of a population in nature is almost never known and is usually theoretically approximated
34
normal distribution (2)
- familiar "bell-curve" shape
- a theoretical probability distribution used to approximate the true distribution of a continuous variable in a population
35
experimental studies
- researcher assigns treatments randomly to individuals
36
observational study
- assignment of treatments is NOT made by the researcher
37
confounding variable
- variable that masks or distorts the causal relationship between measured variables in a study
- can limit influence by assigning treatments randomly to subjects
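Random assignment of treatments can be sketched as a shuffle followed by dealing subjects into groups. This is one simple way to do it, assuming equal group sizes are wanted; the subject labels, treatment names, and the function `assign_treatments` are hypothetical.

```python
import random

def assign_treatments(subjects, treatments, seed=None):
    """Randomly assign each subject to one treatment, keeping group sizes balanced."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    # Shuffling means no subject characteristic can influence which group it joins.
    rng.shuffle(shuffled)
    # Deal the shuffled subjects into the treatment groups in turn.
    return {
        subject: treatments[i % len(treatments)]
        for i, subject in enumerate(shuffled)
    }

# Hypothetical experiment with 12 subjects and two treatments.
subjects = [f"subject_{i}" for i in range(1, 13)]
assignment = assign_treatments(subjects, ["control", "treatment"], seed=3)

for subject in subjects:
    print(subject, "->", assignment[subject])
```

Because group membership is decided by chance alone, a confounding variable is expected to be spread roughly evenly across the groups rather than lined up with one treatment.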