vocab definitions Flashcards
sample of convenience
a collection of individuals that happen to be available at the time
variable
a measured characteristic on individuals from a population under study
data
measurements of one or more variables made on a collection of individuals
explanatory variable
a variable we use to predict or explain a response variable
response variable
a variable that is predicted or explained from an explanatory variable
populations
the entire group of individuals or units that you want to study
sample
a subset, ideally randomly chosen, of the population you wish to study
parameters
quantities describing the population that we want to know, such as a population mean or proportion
estimates
quantities calculated from a sample to approximate population parameters
bias
a systematic discrepancy between estimates and the true population characteristic
volunteer bias
volunteers for a study are likely to differ, on average, from the population
sampling error
the chance difference between an estimate and the true population value
precision
the spread of estimates resulting from sampling error
-gives a similar answer repeatedly
accurate or unbiased
the average of the estimates obtained equals the true population value
-accuracy (on average, gets the correct answer)
random sample
in a random sample, each member of a population has an equal and independent chance of being selected
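A minimal sketch contrasting a random sample with a sample of convenience, using Python's standard random module. The population of 100 numbered individuals and the sample size of 10 are assumptions made up for illustration.

```python
import random

# Hypothetical population: 100 individuals labelled 0..99.
population = list(range(100))

# Random sample: random.sample draws without replacement, and every
# possible set of 10 individuals is equally likely to be chosen.
random_sample = random.sample(population, 10)

# Sample of convenience: simply the first 10 individuals that happen
# to be available, which need not resemble the population.
convenience_sample = population[:10]

print(sorted(random_sample))
print(convenience_sample)
```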
categorical variables (attribute or qualitative variables)
describe membership in a category or group
numerical variable
a variable whose measurements on individuals are quantitative and have magnitude on a numerical scale
continuous
numerical data that can take on any real-number value within some range. Between any two values of a continuous variable, an infinite number of other values are possible.
discrete
numerical data that come in indivisible units. Examples: the number of amino acids in a protein and the numerical rating of a statistics professor in a student evaluation are both discrete numerical measurements.
frequency
the number of observations having a particular value of the measurement
frequency distribution
describes the number of times each value of a variable occurs in a sample
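As a small illustration of frequency and frequency distribution, the sketch below tallies a made-up sample. The data (eggs counted in 10 hypothetical nests) are invented for the example.

```python
from collections import Counter

# Hypothetical sample: number of eggs counted in each of 10 nests.
clutch_sizes = [3, 4, 4, 2, 3, 4, 5, 3, 4, 2]

# Frequency of a value = number of observations having that value.
frequency = Counter(clutch_sizes)

# Relative frequency = frequency divided by the sample size.
n = len(clutch_sizes)
relative_frequency = {value: count / n for value, count in frequency.items()}

# Print the frequency distribution, one value per line.
for value in sorted(frequency):
    print(value, frequency[value], relative_frequency[value])
```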
independence
two events are independent if the occurrence of one gives no information about whether the second will occur
multiplication principle
if two events A and B are independent, then Pr[A and B] = Pr[A] x Pr[B]
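The multiplication principle can be checked by enumeration. This sketch assumes one fair coin toss and one fair six-sided die roll, an example chosen for illustration; the helper pr is hypothetical.

```python
from fractions import Fraction
from itertools import product

# Assumed example: one fair coin toss and one fair six-sided die roll.
coin = ["H", "T"]
die = [1, 2, 3, 4, 5, 6]
outcomes = list(product(coin, die))  # 12 equally likely outcomes

def pr(event):
    """Proportion of the equally likely outcomes for which the event occurs."""
    favourable = [o for o in outcomes if event(o)]
    return Fraction(len(favourable), len(outcomes))

pr_heads = pr(lambda o: o[0] == "H")                # Pr[A]
pr_six = pr(lambda o: o[1] == 6)                    # Pr[B]
pr_both = pr(lambda o: o[0] == "H" and o[1] == 6)   # Pr[A and B]

print(pr_heads, pr_six, pr_both)
print(pr_both == pr_heads * pr_six)  # the coin and the die are independent
```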
The addition principle
If two events A and B are mutually exclusive, then Pr[A or B] = Pr[A] + Pr[B]
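A short check of the addition principle, assuming a single roll of a fair six-sided die. The events and the helper pr are made up for illustration. The last lines show that simply adding the probabilities overcounts when the events overlap.

```python
from fractions import Fraction

die = [1, 2, 3, 4, 5, 6]  # assumed: one roll of a fair six-sided die

def pr(event):
    """Proportion of the equally likely outcomes that belong to the event."""
    return Fraction(sum(1 for o in die if o in event), len(die))

a = {1, 2}  # the roll is 1 or 2
b = {6}     # the roll is 6

# A and B share no outcomes, so they are mutually exclusive.
exclusive = not (a & b)
sum_holds = pr(a | b) == pr(a) + pr(b)
print(exclusive, sum_holds)

# C overlaps with B, so adding the two probabilities overcounts.
c = {2, 4, 6}  # the roll is even
overlap_sum_holds = pr(b | c) == pr(b) + pr(c)
print(overlap_sum_holds)
```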
Probability distribution
A probability distribution describes the true relative frequency of all possible values of a random variable
Mutually exclusive
two events are mutually exclusive if they cannot both occur
Pr[A and B] = 0
probability
The probability of an event is its true relative frequency: the proportion of times the event would occur if we repeated the same process over and over
pseudoreplication
the error that occurs when samples are not independent but are treated as though they are
standard error
the standard error of an estimate is the standard deviation of its sampling distribution. It predicts the sampling error of the estimate
standard error of an estimate
the standard deviation of its sampling distribution
It predicts the sampling error of the estimate
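The definition can be illustrated with a rough simulation. This sketch assumes a made-up population of 10,000 uniform values, a sample size of 25, and 5,000 repeated samples. It also computes the usual single-sample estimate of the standard error of a mean, s / sqrt(n), a standard formula that is not stated in these cards.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 values drawn uniformly between 0 and 1.
population = [random.random() for _ in range(10_000)]
n = 25  # sample size (assumed)

# Approximate the sampling distribution of the mean by drawing many
# random samples and recording the mean of each one.
sample_means = [
    statistics.mean(random.sample(population, n)) for _ in range(5_000)
]

# Standard error = standard deviation of the sampling distribution.
se_from_sampling_distribution = statistics.stdev(sample_means)

# Single-sample estimate of the SE of the mean: s / sqrt(n).
one_sample = random.sample(population, n)
se_from_one_sample = statistics.stdev(one_sample) / n ** 0.5

print(se_from_sampling_distribution, se_from_one_sample)
```

The two numbers should be of similar size, which is why a single sample can be used to predict the sampling error of its own estimate.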
conditional probability
the conditional probability of an event is the probability of that event occurring given that a condition is met.
Pr[X|Y]
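A conditional probability can be computed by keeping only the outcomes for which the condition is met. The sketch assumes two rolls of a fair six-sided die; the events and helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Assumed example: two rolls of a fair die, 36 equally likely outcomes.
rolls = list(product(range(1, 7), repeat=2))

def pr(event, given=lambda r: True):
    """Pr[event | given]: proportion of outcomes meeting the condition
    for which the event also occurs."""
    condition_met = [r for r in rolls if given(r)]
    favourable = [r for r in condition_met if event(r)]
    return Fraction(len(favourable), len(condition_met))

def sum_is_8(r):
    return r[0] + r[1] == 8

def first_is_6(r):
    return r[0] == 6

pr_x = pr(sum_is_8)                            # Pr[X]
pr_x_given_y = pr(sum_is_8, given=first_is_6)  # Pr[X|Y]

print(pr_x, pr_x_given_y)
```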
confidence interval
the 95% confidence interval provides a plausible range for a parameter. All values for the parameter lying within the interval are plausible, given the data, whereas those outside are unlikely
The 2SE rule of thumb
the interval from Ȳ - 2SE to Ȳ + 2SE, where Ȳ is the sample mean and SE is its standard error, provides a rough estimate of the 95% confidence interval for the mean
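A minimal sketch of the 2SE rule of thumb on made-up data. The eight measurements are hypothetical, and the standard error of the mean is estimated as s / sqrt(n), a standard formula that is not stated in these cards.

```python
import statistics

# Hypothetical sample of 8 measurements.
y = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(y)
mean = statistics.mean(y)
s = statistics.stdev(y)  # sample standard deviation
se = s / n ** 0.5        # estimated standard error of the mean

# Rough 95% confidence interval: mean plus and minus two standard errors.
lower = mean - 2 * se
upper = mean + 2 * se

print(mean, round(se, 3), (round(lower, 2), round(upper, 2)))
```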
What does a χ² goodness-of-fit test do?
compares count data to a model of the expected frequencies of a set of categories
-it is an approximation (do not use it when there is little data, i.e. when expected frequencies are small)
H0: the data come from a specified probability distribution
χ² = sum over all categories of (observed - expected)² / expected
Degrees of freedom
the number of degrees of freedom of a test specifies which of a family of distributions to use
for χ², df = (number of categories) - (number of parameters estimated from the data) - 1
Critical value
the value of the test statistic at which P = α (the significance level)
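The goodness-of-fit statistic, its degrees of freedom, and the comparison with a critical value can be sketched together. The die-roll counts below are invented for illustration. The critical value 11.07 is the standard table value for df = 5 and α = 0.05, entered by hand rather than computed.

```python
# Hypothetical data: 60 rolls of a die, counts for faces 1..6.
observed = [8, 12, 9, 11, 6, 14]

# H0: the die is fair, so every face has the same expected frequency.
total = sum(observed)
k = len(observed)  # number of categories
expected = [total / k] * k

# Test statistic: sum over categories of (observed - expected)^2 / expected.
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# df = categories - parameters estimated from the data - 1.
# No parameters were estimated here (H0 fully specifies the proportions).
parameters_estimated = 0
df = k - parameters_estimated - 1

# Critical value from a standard chi-squared table (df = 5, alpha = 0.05).
critical_value = 11.07
reject_h0 = chi2 > critical_value

print(chi2, df, reject_h0)
```

With these made-up counts the statistic falls below the critical value, so H0 would not be rejected at α = 0.05.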