exam 1 Flashcards
population
entire collection of all elements in which we are interested
sample
portion of the population collected under homogeneous conditions
simple random sample
-every member of the population has the same chance of being included in the sample
-members of the sample are chosen independently of each other
t/f? categorical/ qualitative data can only be ordinal
false. categorical data can be ordinal or non-ordinal (nominal)
nominal variables are the (lowest/highest) level qualitative variable and the (lowest/highest) level of measurement
lowest, lowest
nominal measures
simply name, group, type, classify or categorize values of a variable
-only categorized
ordinal variables
-second level of measurement and highest level of qualitative variables
-typically used to order/ rank values of variables in addition to naming values
-categorized and ranked
ordinal scales
have all characteristics of nominal variables but also order/ rank data
example of ordinal
-agreement: strongly agree, agree, neither agree nor disagree, disagree, strongly disagree
-taste: scrumptious, okay, bland, a dog wouldn’t eat it
Discrete data
numerical type of data that includes whole concrete numbers with specific and fixed data values determined by counting
-concrete fixed numbers
continuous data
numerical type of data that can take any value within an interval (including fractions and decimals), determined by measuring rather than counting
-always varying
example discrete data
- number of boys in a family
-number of deer killed on I-79
examples continuous data
-time
-weight
-height
variables of interest denoted by
capital letters
actual values denoted by
lower case letters/ subscript characters
ungrouped frequency
normally used for categorical data
grouped frequency
quantitative data combined in 5-15 classes depending on the amount of data
how should frequency distributions be graphed
as a histogram for quantitative data, or a bar chart for categorical data
frequency distribution
number of occurrences of each value in data set
relative frequency distributions
-frequency divided by the sample size
-tells you percentage in each class
cumulative frequency distribution
-counts the number of values at or below the upper class limit of each grouping
cumulative relative frequency
-percentage of values at or below the upper class limit
5 columns of complete frequency table
-group
-frequency
-relative frequency
-cumulative frequency
-cumulative relative frequency
how to find relative frequency
frequency/total
how to find cumulative frequency
running total of the frequencies up to and including each class
how to find cumulative relative frequency
running total of the relative frequencies (same as cumulative frequency/total)
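The five columns can be built in a few lines. This is only a minimal sketch: the function name `frequency_table` and the sample data are made up for illustration, and it assumes an ungrouped table (one row per distinct value).

```python
from collections import Counter
from itertools import accumulate

def frequency_table(values):
    """Return rows of (group, freq, rel_freq, cum_freq, cum_rel_freq)."""
    counts = Counter(values)
    groups = sorted(counts)               # groups in ascending order
    freqs = [counts[g] for g in groups]
    total = sum(freqs)
    cum_freqs = list(accumulate(freqs))   # running total of frequencies
    return [
        (g, f, f / total, cf, cf / total)
        for g, f, cf in zip(groups, freqs, cum_freqs)
    ]

# Hypothetical data: number of boys in each of 10 families
boys = [0, 1, 1, 2, 2, 2, 3, 1, 0, 2]
table = frequency_table(boys)
for row in table:
    print(row)
```

The last row's cumulative frequency equals the sample size and its cumulative relative frequency equals 1, which is a quick check that the table is complete.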
symmetric frequency distribution shape
left half of graph is a mirror image of the right half of graph
positive skew
tail of graph to right
negative skew
tail of graph to left
bimodal
two peaks on graph
where is mean median and mode on skewed graph
mode at peak on graph
-median after mode towards tail
-mean after median towards tail
statistically unethical diversion
-changing scales on one or both axes
-truncating frequency axis and starting frequency axis at number greater than zero
mean
average, point of balance on data
median
-middle value of sorted data (lowest to highest)
median is (greater than/ less than) mean when there is right skewed data
less than
symbol for mean
y with straight line above
symbol for median
y with tilde above
how to get trimmed mean
-delete the largest 10 percent and the smallest 10 percent of the sorted observations and recalculate the mean
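A small sketch comparing mean, median, and a 10 percent trimmed mean on right-skewed data. The helper `trimmed_mean` and the sample `y` are hypothetical; the sketch assumes the number of observations dropped from each end is rounded down.

```python
import statistics

def trimmed_mean(values, proportion=0.10):
    """Mean after dropping `proportion` of the sorted observations from each end."""
    data = sorted(values)
    k = int(len(data) * proportion)       # observations dropped per tail (rounded down)
    kept = data[k:len(data) - k]
    return statistics.mean(kept)

# Hypothetical right-skewed sample with one large value
y = [2, 3, 3, 4, 4, 5, 5, 6, 7, 41]
print(statistics.mean(y))     # 8
print(statistics.median(y))   # 4.5
print(trimmed_mean(y))        # 4.625
```

The single large value pulls the mean toward the tail, while the median and trimmed mean stay near the bulk of the data.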
measures of dispersion
-range
-Q0
-Q1
-Q2
-Q3
-Q4
-IQR
how to find range
largest observation - smallest observation
Q1
median of first half of data
Q2
median of data
Q3
median of second half of data
Q0
minimum
Q4
maximum
IQR
Q3-Q1
how to find upper and lower fence
upper: Q3 +1.5(IQR)
lower: Q1 - 1.5(IQR)
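The quartile and fence cards above can be sketched as follows. The function names and the sample are hypothetical, and the sketch assumes that when n is odd the middle value is left out of both halves (textbooks differ on this).

```python
import statistics

def five_number_summary(values):
    """Return (Q0, Q1, Q2, Q3, Q4) using the median-of-halves rule.

    Assumption: for odd n, the middle value belongs to neither half.
    """
    data = sorted(values)
    n = len(data)
    lower = data[: n // 2]            # first half
    upper = data[(n + 1) // 2 :]      # second half
    return (data[0], statistics.median(lower), statistics.median(data),
            statistics.median(upper), data[-1])

def fences(q1, q3):
    """Lower and upper fences: Q1 - 1.5(IQR) and Q3 + 1.5(IQR)."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

y = [2, 3, 3, 4, 4, 5, 5, 6, 7, 41]   # hypothetical sample
q0, q1, q2, q3, q4 = five_number_summary(y)
low, high = fences(q1, q3)
outliers = [v for v in y if v < low or v > high]
print(q4 - q0, q3 - q1, (low, high), outliers)
```

Any observation outside the fences is flagged as an outlier; here only 41 falls above the upper fence.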
how does one indicate an outlier
with an asterisk
what is standard deviation used to measure
typical distance of observations from the mean
which form of standard deviation is most accurate
computational standard deviation
what is the formula for coefficient of variation
(standard deviation/ mean ) x 100%
what does the coefficient of variation measure
-the amount of variability relative to the value of the mean
-usually used for comparisons of two data sets measured on different scales
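A minimal sketch of the coefficient of variation for two data sets on different scales. The data values are invented for illustration, and the sketch assumes the sample standard deviation (n - 1 in the denominator).

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation expressed as a percentage of the mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100

# Hypothetical measurements on different scales
heights_cm = [160, 165, 170, 175, 180]
weights_kg = [55, 60, 70, 80, 85]

cv_height = coefficient_of_variation(heights_cm)
cv_weight = coefficient_of_variation(weights_kg)
print(f"height CV: {cv_height:.2f}%")
print(f"weight CV: {cv_weight:.2f}%")
```

Because the units cancel, the two percentages can be compared directly even though one set is in centimeters and the other in kilograms.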
what is the empirical rule
for unimodal unskewed distributions:
-68% of observations within 1 sd of the mean
-95% of observations within 2 sd of the mean
-greater than 99% (about 99.7%) of observations within 3 sd of the mean
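These percentages can be checked against a normal curve. The sketch below assumes the standard normal distribution as the model for a unimodal, unskewed distribution; the helper name `share_within` is made up for illustration.

```python
from statistics import NormalDist

z = NormalDist()   # standard normal: mean 0, sd 1

def share_within(k):
    """Proportion of a normal distribution within k sd of the mean."""
    return z.cdf(k) - z.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} sd: {share_within(k):.1%}")
```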
what is statistical inference used for
-drawing conclusions about a population based on observations in a sample
density curves
-a smooth curve representing a frequency distribution
what is the total area of a density curve
1
statistic and parameter for a proportion
p hat (sample proportion) estimates p (population proportion)
statistic and parameter for a mean
y with bar above (sample mean) estimates μ (mu, population mean)
statistic and parameter for a standard deviation
s (sample standard deviation) estimates σ (sigma, population standard deviation)
three methods of counting techniques
-multiplication rule of counting
-permutations
-combinations
Multiplication rule of counting
-if event A can occur n ways and event B can occur m ways, then in sequence they can occur in mn ways
-ex: A can occur 5 ways, B 3 ways, C 4 ways; number of ways A, B, C can occur in sequence = 5 x 3 x 4 = 60
permutations
-number of ways to arrange in order n distinct objects, taking them r at a time
ex: 5 people in 3 chairs
-5 ppl could sit in chair 1, 4 in 2, 3 in 3
-2 people are left without a chair, so formula = 5!/(5-3)! = 5 x 4 x 3 = 60
combinations
-the number of ways to choose r objects from n distinct objects when order does not matter
-formula: n!/(r!(n-r)!)
-ex: 6 ppl and 3 chairs, how many ways can three people be chosen when order does not matter:
-permutations/ # of ways the 3 chosen people can be arranged:
-(6x5x4)/(3x2x1)= 120/6= 20
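The three counting examples above can be checked with Python's standard library (`math.perm` and `math.comb` need Python 3.8 or later). The variable names are only illustrative.

```python
import math

# Multiplication rule: 5 ways for A, 3 ways for B, 4 ways for C
ways_abc = 5 * 3 * 4                  # 60

# Permutations: 5 people into 3 ordered chairs, 5!/(5-3)!
seatings = math.perm(5, 3)            # 60

# Combinations: choose 3 of 6 people, order ignored, 6!/(3!(6-3)!)
groups = math.comb(6, 3)              # 20

print(ways_abc, seatings, groups)
```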
probability
numerical quantity that expresses likelihood/ chance that particular event will occur
P(E)
-number of ways event E can happen/ total possibilities (when all outcomes are equally likely)
Notation of P(E)
-P(E)-probability of event E happening
-E={O1, O2, O3 etc} where Oi= outcome i
-outcomes mutually exclusive (can't occur at same time)
-P(E)= P(O1)+P(O2)+…
-probability of an event is always between 0 and 1
-S denotes sample space: all possible outcomes
-P(E^c)= 1-P(E)
-5% of the population has a certain disease
-test for the disease has an 80% chance of detecting that a person actually has it
-and a 90% chance of correctly indicating that a person does not have it
-Q: given that a person tests positive, what is the probability they have the disease?
probability tree
-have disease: Yes (.05), No (.95)
-if Yes: test positive .8, test negative .2
-if No: test positive .1, test negative .9
-have disease and test positive: .05 x .8= .04
-test positive= Yes(+) + No(+)= (.05 x .8) + (.95 x .1)= .04 + .095= .135
-probability= .04/.135= 29.6%
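The tree calculation can be reproduced in a few lines. This is a sketch only: the function name and the parameter names `prevalence`, `sensitivity`, and `specificity` are labels chosen here for the 5%, 80%, and 90% figures, not terms from the cards.

```python
def prob_disease_given_positive(prevalence, sensitivity, specificity):
    """P(disease | positive test), following the two positive branches of the tree."""
    true_pos = prevalence * sensitivity                # Yes branch, test positive
    false_pos = (1 - prevalence) * (1 - specificity)   # No branch, test positive
    return true_pos / (true_pos + false_pos)

p = prob_disease_given_positive(0.05, 0.80, 0.90)
print(round(p, 3))   # 0.296
```

Even with a fairly accurate test, most positives come from the much larger disease-free group, which is why the answer is under 30%.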
density curve x probability
for any two numbers a and b, Area under density curve between a and b is equal to the proportion of Y values between a and b
random variable
variable whose value depends on the outcome of a chance operation
-probability distribution: list of the possible values of a random variable and the probabilities associated with them
requirements for binomial model
-series of n trials
-each trial is identical and can result in success or failure
-probability of success remains constant trial to trial
-each outcome is independent of other outcomes
Carrier of TB has a 10% chance of passing the disease to anyone in close contact. Suppose the carrier comes in close contact with 10 people. What is the probability that exactly 4 get TB?
(10!)/(4! (10-4)!) x (.10)^4 x (.90)^6= .01116
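The same binomial calculation as a short sketch. The function name `binomial_pmf` is made up for illustration, and the sketch assumes the binomial requirements hold (independent contacts, constant 10% chance).

```python
import math

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

prob = binomial_pmf(4, 10, 0.10)
print(round(prob, 5))   # 0.01116
```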