final exam review Flashcards
multiple choice review
probability
likelihood of something occurring
outcome
possible result
experimental trial
one iteration/attempt at an experiment
experimental/empirical probability
based on the results of experimental trials
empirical probability formula
P(A) = n(A)/n(T), where n(A) is the number of trials in which A occurred and n(T) is the total number of trials
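A minimal sketch of the empirical formula, assuming a simulated experiment: the hypothetical `roll_die` trial rolls one fair die, and the event A is "roll a 6". The estimate n(A)/n(T) changes from run to run and only approaches the theoretical 1/6 as the number of trials grows.

```python
import random

def empirical_probability(event, trial, n_trials, seed=None):
    """Estimate P(A) = n(A)/n(T) by running `trial` n_trials times."""
    rng = random.Random(seed)
    # n(A): count the trials whose outcome satisfies the event
    n_a = sum(1 for _ in range(n_trials) if event(trial(rng)))
    return n_a / n_trials  # divide by n(T)

# Hypothetical experiment: roll one fair six-sided die.
def roll_die(rng):
    return rng.randint(1, 6)

# Hypothetical event A: the roll is a 6.
def is_six(outcome):
    return outcome == 6

estimate = empirical_probability(is_six, roll_die, 10_000, seed=1)
print(estimate)  # close to 1/6, but depends on the trials
```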
sum of probabilities
the probabilities of all outcomes in the sample space must add up to 1
subjective probability
based on one's own opinion or judgment
theoretical probability
based on analyzing all possible outcomes, assuming they are equally likely
theoretical probability formula
P(A) = n(A)/n(S), where n(S) is the number of outcomes in the sample space
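As an illustration, assuming two fair dice are rolled and A is the hypothetical event "the sum is 7", the sketch below lists the sample space, counts n(A) and n(S), and also applies the complement rule P(A’) = 1 - P(A). `Fraction` keeps the answer exact.

```python
from fractions import Fraction
from itertools import product

# Sample space S: all 36 equally likely ordered pairs from two fair dice.
sample_space = list(product(range(1, 7), repeat=2))

# Event A: the two dice add up to 7.
event_a = [outcome for outcome in sample_space if sum(outcome) == 7]

p_a = Fraction(len(event_a), len(sample_space))  # P(A) = n(A)/n(S)
p_not_a = 1 - p_a                                # complement: P(A') = 1 - P(A)

print(p_a)      # 1/6
print(p_not_a)  # 5/6
```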
sample space
the set of all possible outcomes
complement
the set of outcomes in the sample space not included in event A, written A’; P(A’) = 1 - P(A)
odds in favour
P(A) : P(A’)
odds against
P(A’) : P(A)
comparing probability and odds
if the odds in favour of A are h : k, then P(A) = h/(h+k)
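A small sketch of converting between probability and odds, assuming probabilities are given as fractions. The helper names are hypothetical; the odds are returned as reduced whole-number pairs (h, k).

```python
from fractions import Fraction

def odds_in_favour(p):
    """Odds in favour of A, P(A) : P(A'), reduced to whole numbers h : k."""
    p = Fraction(p)
    ratio = p / (1 - p)  # undefined when P(A) = 1
    return ratio.numerator, ratio.denominator

def odds_against(p):
    """Odds against A are P(A') : P(A), the odds in favour reversed."""
    h, k = odds_in_favour(p)
    return k, h

def probability_from_odds(h, k):
    """If the odds in favour of A are h : k, then P(A) = h/(h+k)."""
    return Fraction(h, h + k)

print(odds_in_favour(Fraction(1, 6)))  # (1, 5)
print(odds_against(Fraction(1, 6)))    # (5, 1)
print(probability_from_odds(3, 2))     # 3/5
```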
mutually exclusive
events that cannot happen at the same time
non-mutually exclusive
events that can happen at the same time
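The cards above do not state it, but the usual way to use this distinction is the addition rule: for mutually exclusive events P(A or B) = P(A) + P(B), and for non-mutually exclusive events the overlap is subtracted so it is not counted twice. A sketch with one fair die and hypothetical events:

```python
from fractions import Fraction

S = set(range(1, 7))  # sample space for one fair die

def p(event):
    """Theoretical probability n(A)/n(S) for a set of outcomes."""
    return Fraction(len(event), len(S))

# Mutually exclusive: "roll a 1" and "roll a 2" share no outcomes.
a, b = {1}, {2}
p_exclusive = p(a) + p(b)
print(p_exclusive)  # 1/3

# Non-mutually exclusive: "even" and "greater than 3" share 4 and 6.
even, over_3 = {2, 4, 6}, {4, 5, 6}
p_union = p(even) + p(over_3) - p(even & over_3)
print(p_union)      # 2/3
```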
independent events
the outcome of the first event has no influence on the second
dependent events
the outcome of the second event is influenced by the first
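A sketch of the multiplication rule for both cases, using a hypothetical example of drawing two aces from a standard 52-card deck. With replacement the draws are independent, so P(A and B) = P(A) × P(B); without replacement the second probability depends on the first draw.

```python
from fractions import Fraction

# Hypothetical example: draw two cards from a standard 52-card deck; want two aces.
p_first_ace = Fraction(4, 52)

# Independent: the first card is replaced, so the second draw is unchanged.
p_independent = p_first_ace * Fraction(4, 52)

# Dependent: the first ace is not replaced, so 3 aces remain among 51 cards.
p_dependent = p_first_ace * Fraction(3, 51)

print(p_independent)  # 1/169
print(p_dependent)    # 1/221
```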
box method
using a box for each position, filling in the number of choices for that position, then multiplying
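The box method is the fundamental counting principle: multiply the number of choices in each box. A minimal sketch, assuming a hypothetical code made of 3 letters followed by 3 digits:

```python
from math import prod

def box_method(choices_per_box):
    """Multiply the number of choices available for each box (position)."""
    return prod(choices_per_box)

# Hypothetical code: 3 letters followed by 3 digits.
with_repeats = box_method([26, 26, 26, 10, 10, 10])  # repeats allowed
no_repeats = box_method([26, 25, 24, 10, 9, 8])      # each box has one fewer choice

print(with_repeats)  # 17576000
print(no_repeats)    # 11232000
```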
arrangement
ordered list of items
factorial
n! = n × (n - 1) × … × 2 × 1, multiplying consecutive natural numbers counting down from n (0! = 1)
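A direct sketch of the factorial definition. Python's standard library already provides `math.factorial`; the loop below just spells out the multiplication.

```python
def factorial(n):
    """n! = n × (n - 1) × ... × 2 × 1, with 0! = 1."""
    if n < 0:
        raise ValueError("factorial is only defined for natural numbers")
    result = 1
    for k in range(2, n + 1):  # multiply 2, 3, ..., n (multiplying by 1 changes nothing)
        result *= k
    return result

print(factorial(5))  # 120
print(factorial(0))  # 1
```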
permutations
an arrangement of n distinct items in a definite order
permutation notation
nPr = n!/(n-r)!
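A sketch of nPr checked against a brute-force list of ordered arrangements, assuming 2 letters chosen from the hypothetical set ABCD. The standard library also offers `math.perm(n, r)` for the same count.

```python
from itertools import permutations
from math import factorial

def n_p_r(n, r):
    """nPr = n!/(n - r)!: ordered arrangements of r items chosen from n."""
    return factorial(n) // factorial(n - r)

# Check by listing every ordered arrangement of 2 letters from "ABCD".
arrangements = list(permutations("ABCD", 2))
print(n_p_r(4, 2), len(arrangements))  # 12 12
```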
combinations
selection from a group with no regard to order
combination notation
nCr = n!/((n-r)!r!)
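The same check for combinations: since order no longer matters, each selection of r items is counted once instead of r! times. A sketch with a brute-force list; `math.comb(n, r)` in the standard library gives the same count.

```python
from itertools import combinations
from math import factorial

def n_c_r(n, r):
    """nCr = n!/((n - r)! r!): selections of r items from n, order ignored."""
    return factorial(n) // (factorial(n - r) * factorial(r))

selections = list(combinations("ABCD", 2))
print(n_c_r(4, 2), len(selections))  # 6 6
print(n_c_r(52, 5))                  # 2598960 five-card hands
```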
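sum of nth row of Pascal's triangle
2^n (rows are numbered starting at row 0)
t(n, r), the term in position r of row n of Pascal's triangle
t(n, r) = t(n-1, r-1) + t(n-1, r)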
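A sketch that builds Pascal's triangle from the recursive rule t(n, r) = t(n-1, r-1) + t(n-1, r), numbering rows and positions from 0, and checks that each row sums to 2^n.

```python
def pascal_rows(num_rows):
    """Build rows 0 .. num_rows-1 of Pascal's triangle from t(n, r) = t(n-1, r-1) + t(n-1, r)."""
    rows = [[1]]  # row 0
    for n in range(1, num_rows):
        prev = rows[-1]
        # Each inner term is the sum of the two terms above it; the ends are 1.
        row = [1] + [prev[r - 1] + prev[r] for r in range(1, n)] + [1]
        rows.append(row)
    return rows

rows = pascal_rows(6)
print(rows[4])  # [1, 4, 6, 4, 1]
for n, row in enumerate(rows):
    assert sum(row) == 2 ** n  # the sum of row n is 2^n
```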
unique items in subsets
count the cases directly, or use 2^n - 1 for the number of non-empty subsets of n distinct items
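A sketch comparing the two approaches for distinct items: counting the cases (subsets of size 1, 2, ..., n) and the shortcut 2^n - 1. The pizza toppings are a hypothetical example.

```python
from itertools import combinations

def count_nonempty_subsets(items):
    """Count every non-empty subset by listing the cases of size 1, 2, ..., n."""
    return sum(1 for size in range(1, len(items) + 1)
               for _ in combinations(items, size))

toppings = ["cheese", "ham", "olives", "peppers"]  # hypothetical distinct items
print(count_nonempty_subsets(toppings), 2 ** len(toppings) - 1)  # 15 15
```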
identical items in subsets
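(p+1)(q+1)(r+1)… - 1, for p identical items of one kind, q of a second kind, r of a third kind, and so on (counts non-empty selections)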
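The identical-items formula works because each kind can be taken 0, 1, ..., p times, giving p + 1 choices per kind; subtracting 1 removes the selection where nothing is taken. A sketch assuming a hypothetical bag of 3 identical red, 2 identical blue, and 1 green ball, checked by brute force:

```python
from itertools import product
from math import prod

def count_selections(counts):
    """(p + 1)(q + 1)(r + 1)... - 1 for the given counts of identical items."""
    return prod(c + 1 for c in counts) - 1

def brute_force(counts):
    """List every choice of how many of each kind to take, skipping 'take nothing'."""
    choices = product(*(range(c + 1) for c in counts))
    return sum(1 for choice in choices if any(choice))

counts = [3, 2, 1]  # hypothetical: 3 red, 2 blue, 1 green
print(count_selections(counts), brute_force(counts))  # 23 23
```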
probability distribution
a representation of all possible outcomes of an experiment (the sample space) and their probabilities
expected value
the predicted average outcome over many trials; E(X) = Σ x·P(x)
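A sketch of E(X) = Σ x·P(x), assuming the distribution is stored as a dictionary from outcome to probability. A fair die (a uniform distribution) is used as the example; note the expected value 3.5 is not itself a possible roll.

```python
from fractions import Fraction

def expected_value(distribution):
    """E(X) = sum of x * P(x) over every outcome in the distribution."""
    assert sum(distribution.values()) == 1, "probabilities must sum to 1"
    return sum(x * p for x, p in distribution.items())

fair_die = {face: Fraction(1, 6) for face in range(1, 7)}  # uniform distribution
print(expected_value(fair_die))  # 7/2, i.e. 3.5
```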
uniform distribution
all outcomes are equally likely
binomial distribution
each trial is a success or failure; trials are independent (with replacement), so the probability of success stays the same on every trial
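The card lists the conditions but not the formula; the standard binomial probability is P(X = x) = nCx · p^x · (1 - p)^(n - x). A sketch, assuming a hypothetical experiment of 5 rolls of a fair die with "roll a 6" as success:

```python
from fractions import Fraction
from math import comb

def binomial_probability(n, x, p):
    """P(X = x) = nCx * p^x * (1 - p)^(n - x) for n independent trials."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

# Hypothetical: exactly 2 sixes in 5 rolls of a fair die.
p_two_sixes = binomial_probability(5, 2, Fraction(1, 6))
print(p_two_sixes)                   # 625/3888
print(round(float(p_two_sixes), 4))  # 0.1608
```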
hypergeometric distribution
uses combinations; the probability of success changes after every trial because there is no replacement, so trials are dependent
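A sketch of the hypergeometric probability, written with combinations as the card describes: choose x successes from the successes available, choose the rest from the failures, and divide by all possible selections. The parameter names and the card-drawing example (exactly 2 hearts in 5 cards, no replacement) are hypothetical.

```python
from fractions import Fraction
from math import comb

def hypergeometric_probability(population, successes, draws, x):
    """C(successes, x) * C(population - successes, draws - x) / C(population, draws)."""
    favourable = comb(successes, x) * comb(population - successes, draws - x)
    return Fraction(favourable, comb(population, draws))

# Hypothetical: draw 5 cards without replacement; exactly 2 hearts.
p_two_hearts = hypergeometric_probability(52, 13, 5, 2)
print(p_two_hearts)  # 9139/33320
```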
discrete data
can only take certain countable values, like 1, 2, 3…
continuous data
can take any value within a range (an unlimited number of values), like time or money
categorical/qualitative data
data sorted into distinct groups; can be represented as counts for each group or as percentages
nominal data
categories with no natural order, like colours
ordinal data
categories that make sense to order or rank
binary data
only two possible values, like yes or no answers
sample
a smaller set chosen from a population; may not represent the whole population, more prone to bias
population
the entire group being studied; every member's data is included, but collecting it takes time and money and the data can become outdated
primary source
data collected directly by the researcher; not yet manipulated or organized
microdata
data about each individual respondent
secondary source
data collected by someone other than the researcher who uses it
aggregate data
data combined or summarized so that individual microdata cannot be identified
bias
any factor that influences or favours a certain outcome or response
sampling bias
the sample chosen is not a good representation of the population
non-response bias
certain groups choose not to participate
measurement bias
collection method skews results
response bias
participants change their answers out of fear or embarrassment, or to give what they think the question wants
simple random sample
randomly choose a specific number of people so that every member of the population has an equal chance of being chosen
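A minimal sketch of a simple random sample, assuming the population is available as a list; the list of 100 students is hypothetical. `random.Random(seed).sample` picks distinct members, each equally likely.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Choose n distinct members at random; every member is equally likely."""
    return random.Random(seed).sample(population, n)

students = [f"student_{i}" for i in range(1, 101)]  # hypothetical population of 100
chosen = simple_random_sample(students, 10, seed=42)
print(chosen)
```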
systematic sample
put the population in an ordered list and choose people at regular intervals (every kth person) from a random starting point; easy if you already have an ordered list
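A sketch of systematic sampling, assuming the interval is k = N // n (population size divided by sample size) and the start is chosen at random within the first interval. It also assumes n is no larger than the population.

```python
import random

def systematic_sample(ordered_population, n, seed=None):
    """Pick a random start, then every k-th member, where k = N // n."""
    k = len(ordered_population) // n          # sampling interval
    start = random.Random(seed).randrange(k)  # random start within the first interval
    return ordered_population[start::k][:n]   # every k-th member, trimmed to n

picked = systematic_sample(list(range(100)), 10, seed=3)
print(picked)  # 10 members, each 10 apart
```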
stratified sample
divide the population into groups (strata) and randomly sample from each group in the same proportion as that group in the population; works well if there are distinct groups
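A sketch of proportional stratified sampling, assuming the strata are given as a dictionary of group name to member list. The school and grade sizes are hypothetical and were picked so every share divides evenly; with other sizes, rounding each share can make the total differ slightly from n.

```python
import random

def stratified_sample(strata, n, seed=None):
    """Randomly sample from each group in proportion to its share of the population."""
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        size = round(n * len(members) / total)  # proportional allocation
        sample[name] = rng.sample(members, size)
    return sample

# Hypothetical school of 1000 students, sampled down to 50.
school = {
    "grade 9": [f"g9_{i}" for i in range(300)],
    "grade 10": [f"g10_{i}" for i in range(300)],
    "grade 11": [f"g11_{i}" for i in range(200)],
    "grade 12": [f"g12_{i}" for i in range(200)],
}
result = stratified_sample(school, 50, seed=7)
print({grade: len(chosen) for grade, chosen in result.items()})
# {'grade 9': 15, 'grade 10': 15, 'grade 11': 10, 'grade 12': 10}
```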
cluster sample
divide the population into groups, randomly choose some of the groups, and sample every member of the chosen groups; the chosen groups may not represent the population
multistage sample
divide the population into a hierarchy and choose a random sample at each level
convenience sample
choose people who are easy to access; cheap, but can be unreliable
voluntary sample
let participants choose whether or not to participate; can produce heavily biased results depending on the question
mean
the average: add up all terms and divide by the number of terms
median
the middle value when the data is in order (the mean of the two middle values if there is an even number of terms)
mode
the most frequently occurring value
range
largest value - smallest value
measures of central tendency
values around which a set of data tends to cluster (mean, median, mode)
outliers
values that lie far from the rest of the data; they affect the mean more than the median or mode
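A sketch of the measures above using Python's standard `statistics` module on a hypothetical data set, followed by the same data with one outlier added to show that the mean moves much more than the median.

```python
from statistics import mean, median, multimode

data = [2, 3, 3, 5, 7, 10]    # hypothetical data set
print(mean(data))             # 5
print(median(data))           # 4.0, the mean of the middle values 3 and 5
print(multimode(data))        # [3]
print(max(data) - min(data))  # 8 (range)

# Add one outlier: the mean jumps, the median barely moves.
with_outlier = data + [100]
print(round(mean(with_outlier), 2))  # 18.57
print(median(with_outlier))          # 5
```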
normal distribution graph
symmetric bell-shaped curve; mean, median, and mode are equal
bimodal distribution graph
two peaks, with a mode at each peak; if the graph is symmetric, the mean and median are at the lowest point between the peaks
left skewed/negative graph
peak on the right side with a long tail to the left; the mean is the lowest of the three measures
right skewed/positive graph
peak on the left side with a long tail to the right; the mean is the highest of the three measures
uniform distribution graph
flat horizontal line; all values are equally likely
exponential distribution graph
curve that decreases exponentially
continuity correction
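used when a continuous distribution (such as the normal) approximates discrete data; each discrete value is widened by 0.5 on each side to make up for the difference, e.g. P(X = 5) becomes P(4.5 < X < 5.5)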
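A sketch of the continuity correction in a normal approximation to a binomial distribution, assuming a hypothetical experiment of 100 fair coin flips and the question P(at most 45 heads). The normal CDF is built from `math.erf`; the exact binomial sum is included only for comparison, and the corrected approximation lands much closer to it.

```python
from math import comb, erf, sqrt

def normal_cdf(z):
    """Standard normal cumulative probability, written with math.erf."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def approx_at_most(x, n, p, correction=True):
    """Normal approximation to P(X <= x) for a binomial X."""
    mu = n * p
    sigma = sqrt(n * p * (1 - p))
    edge = x + 0.5 if correction else x  # P(X <= 45) becomes P(Y < 45.5)
    return normal_cdf((edge - mu) / sigma)

def exact_at_most(x, n, p):
    """Exact binomial P(X <= x), for comparison."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(x + 1))

exact = exact_at_most(45, 100, 0.5)
with_cc = approx_at_most(45, 100, 0.5)
without_cc = approx_at_most(45, 100, 0.5, correction=False)
print(round(with_cc, 4))     # 0.1841
print(round(without_cc, 4))  # 0.1587
print(round(exact, 4))
```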
cause and effect relationship
a change in one variable (independent) directly causes a change in the other variable (dependent)
common cause relationship
an external variable causes two variables to change in the same way
accidental relationship
based purely on coincidence
reverse cause and effect
a relationship where the presumed independent and dependent variables are actually reversed
presumed relationships
a relationship that seems logical but does not show a clear causal connection