Samples and Sampling Flashcards
why are statistics important
- to analyze data and draw conclusions
- quantify uncertainty
- making predictions
- assessing evidence
- sampling populations
what are two goals of statistics
estimation and hypothesis testing
parameters
quantities describing populations being studied
estimates relate to
samples
how are estimates and parameters linked
inferring a parameter is done through the use of estimates
examples of parameters
- averages
- numbers (size of pop)
- variances (spread of data)
- proportions (percent of the time something is true)
how is a null hypothesis used in hypothesis testing
start with a null hypothesis that assumes there is no difference or effect in the population quantity being tested, then use the test to either reject or fail to reject it
how are estimates and hypothesis testing related
your estimate is what is used for the hypothesis testing
what are reliable population estimates dependent on
a good sampling practice
what kind of samples are most desirable for science/ stats
random samples
why are random samples wanted for stats
limits possibility of bias
what is a population
the entire group of individual units being studied, usually too large to measure every unit
examples of populations
- all cats falling from buildings in a city
- all fish in a lake
- all genes in a genome
what is a sample
a subset selected from a population, used to draw conclusions that ideally apply to the whole population
are samples smaller or larger than the population
smaller
examples of samples
- cats taken to the vet (after falling from buildings)
- random selection of fish in a lake
are sampling errors mistakes
NO - just differences between the estimate and the true value seen in the population
how will estimates differ from population characteristics
by random chance
is sampling error related to precision or accuracy
precision
high vs low sampling error
high
- estimates are more spread out = imprecise = high error
low
- estimates are close together = precise = low error
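A minimal sketch of sampling error as spread among estimates. The population values, the function name spread_of_estimates, and the sample sizes are made up for illustration. It assumes the estimate is a sample mean and uses the well-known result that larger random samples give estimates that sit closer together.

```python
import random
import statistics

def spread_of_estimates(population, n, repeats=1000, seed=0):
    """Draw `repeats` random samples of size n and return the standard
    deviation of the sample means (how spread out the estimates are)."""
    rng = random.Random(seed)
    means = [statistics.mean(rng.sample(population, n)) for _ in range(repeats)]
    return statistics.stdev(means)

# Hypothetical population: 10,000 made-up measurements
pop_rng = random.Random(42)
population = [pop_rng.gauss(50, 10) for _ in range(10_000)]

small_n_spread = spread_of_estimates(population, n=5)    # estimates far apart
large_n_spread = spread_of_estimates(population, n=100)  # estimates close together
print(small_n_spread, large_n_spread)
```

The first number should come out larger than the second: the small samples are less precise, so they have higher sampling error.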
what defines an unbiased sample
when the average of estimates MATCHES the true population value
what is bias a symptom of
sampling problem
is bias related to precision or accuracy
accuracy
high vs low bias
high
- estimates may be close together but FAR from the true value = inaccurate = biased
low
- estimates may be close together or far apart, but on average they are centred on the true population value = accurate = unbiased
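A rough sketch of high vs low bias, using made-up numbers. The "reachable" half of the population is a hypothetical stand-in for any sampling procedure that favours some individuals. The average of many estimates is compared with the true population mean.

```python
import random
import statistics

rng = random.Random(1)
population = [rng.gauss(50, 10) for _ in range(10_000)]  # hypothetical values
true_mean = statistics.mean(population)

# Hypothetical biased procedure: only the larger half can be sampled
reachable = sorted(population)[len(population) // 2:]

def average_estimate(pool, n=20, repeats=1000):
    """Average of many sample means, each from a random sample of size n."""
    estimates = [statistics.mean(rng.sample(pool, n)) for _ in range(repeats)]
    return statistics.mean(estimates)

unbiased_avg = average_estimate(population)  # centred on the true value
biased_avg = average_estimate(reachable)     # consistently too high
print(true_mean, unbiased_avg, biased_avg)
```

Under these assumptions the unbiased average lands close to the true mean, while the biased average stays well above it no matter how many samples are taken.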
what makes a sample random
unbiased collection of a sample
- equal and independent chance of being selected for a sample
what are some difficulties for a random sample being equal chance
- environmental effects (some units/individuals are easier to select than others)
- samples of convenience
what are some difficulties for a random sample being independent
samples of convenience
(samples taken from one location
samples taken close together)
how to take random samples
- assign every individual a unique number between 1 and N (N = population size)
- select n random integers between 1 and N, where n is the sample size
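The two steps above can be sketched as follows. This is one possible illustration, not a prescribed procedure: the function name draw_random_sample and the fish labels are hypothetical, and Python's built-in random module stands in for the random number generator.

```python
import random

def draw_random_sample(units, n, seed=None):
    """Number every unit 1..N, then pick n distinct random integers."""
    rng = random.Random(seed)
    N = len(units)
    numbered = {i + 1: unit for i, unit in enumerate(units)}  # step 1: numbers 1..N
    chosen_numbers = rng.sample(range(1, N + 1), n)           # step 2: n random integers
    return [numbered[k] for k in chosen_numbers]

# Hypothetical population of 200 fish in a lake
fish = [f"fish_{i}" for i in range(1, 201)]
sample = draw_random_sample(fish, n=10, seed=3)
print(sample)
```

Here rng.sample draws without replacement, so no individual is picked twice, and every individual has the same chance of being chosen.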
methods for getting random numbers
- roll dice
- flip coin
- random number generator
how can you NOT get random numbers
by thinking of the numbers yourself (people follow subconscious patterns)
what is a sample of convenience
collection of easily available individuals
why are samples of convenience NOT desirable
- leads to bias
- might not truthfully reflect the whole population
what is volunteer bias
bias that arises because some people are more likely than others to volunteer for a study
(those that need money, those that are closer, those with time)
variables vs data
variables - characteristics that differ among individuals
data - measurements of one or more variables made on a sample of individuals
two types of variables
- categorical
- numerical
categorical variables
describe membership in a group (sort samples into different groups) based on qualitative analysis of individuals
categorical variable examples
eye colour
height (short, medium, tall)
age group (young, old)
blood type
morphological traits (spots, stripes)
two types of categorical variables
nominal and ordinal
compare the two types of categorical variables
nominal
- no ranking needed for the groups
(blood type, eye colour, morphological)
ordinal
- DO have a ranking for the groups
(height - short to tall NOT short, tall, medium
age- young, adult, old NOT old, young, adult)
numerical variables
measurements that are quantitative (have magnitude)
examples of numerical variables
height (cm)
age (years)
weight (g or kg)
number of trichomes per leaf
two types of numerical variables
continuous and discrete
compare the two types of numerical variables
continuous
variable can take on any value in a range
(height, age, weight)
discrete
variable can only take whole-number values (counts) - they are integers
(number of trichomes in a leaf, petals on a flower, number of amino acids in a protein)
explanatory vs response variables
explanatory
the variable that is manipulated by the researcher
response
the measured effect or outcome of the experiment
explanatory or response variable:
independent variable
explanatory
explanatory or response variable:
dependent variable
response
estimates
quantities calculated from a sample that approximate the related population parameters
can the selection of one member of the population affect another in random sampling
NO
how is bias shown in a set of samples
sampling process would favour some outcomes over others which means the measurements on these samples would NOT be an accurate representation of the population
benefit of random sampling
minimizes bias and makes it possible to measure the amount of sampling error
frequency
the number of observations having a particular value of the measurement
frequency distribution
how often each value of the variable occurs in the sample
what is frequency distribution used for
to inform about the distribution of the variable in the population it came from
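A small sketch of frequency and frequency distribution, using a made-up sample of ten blood types. collections.Counter from the standard library does the counting; the variable names are illustrative only.

```python
from collections import Counter

# Hypothetical sample of 10 individuals (made-up data)
blood_types = ["A", "O", "B", "O", "A", "O", "AB", "O", "A", "B"]

# Frequency: number of observations with each value
frequency = Counter(blood_types)

# Relative frequency: proportion of the sample with each value
relative = {value: count / len(blood_types) for value, count in frequency.items()}

for value, count in frequency.most_common():
    print(value, count, relative[value])
```

The full set of counts is the frequency distribution of the sample, which is then used to inform about the distribution in the population it came from.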
probability distribution
distribution of a variable in the whole population
normal distribution
a bell-shaped theoretical probability distribution that approximates the distribution of many variables in the population a sample came from
confounding variables
variable that masks or distorts the causal relationship between measured variables in a study
do two events being associated together raise or lower the probability of one being the cause of the other
raises it
how can variables correlate WITHOUT one being the cause of the other
both can result from a common cause
how do confounding variables affect studies
by giving misleading or false relationships between the measured variables
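A minimal simulation of a misleading relationship, under made-up assumptions: a hypothetical common cause drives both x and y, and neither affects the other. The helper pearson is written out by hand here for illustration.

```python
import random
import statistics

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = sum((a - mx) ** 2 for a in xs) ** 0.5
    sy = sum((b - my) ** 2 for b in ys) ** 0.5
    return cov / (sx * sy)

rng = random.Random(7)
common_cause = [rng.gauss(0, 1) for _ in range(2000)]  # the confounding variable

# x and y each depend on the common cause plus independent noise, not on each other
x = [c + rng.gauss(0, 0.5) for c in common_cause]
y = [c + rng.gauss(0, 0.5) for c in common_cause]

r = pearson(x, y)
print(r)
```

In this setup x and y come out strongly correlated even though neither causes the other, which is the kind of false relationship a confounding variable can produce.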
experimental artifacts
when bias in a measurement is produced unintentionally through experimental procedures
experimental studies vs observational studies
experimental studies:
- the researcher randomly assigns subjects to treatments
observational studies
- assigning subjects to treatments is NOT done by the researcher (like the cats falling from buildings in NYC)