Exam 3 Flashcards
- EDA For categorical variables - 2 charts - 1st:
- BAR CHARTS
- -represent categories by ARBITRARY positions on horizontal line
- -construct bar over each category such that HEIGHT is proportional to #/% in category
- -shape, center, and spread DO NOT APPLY TO BAR CHARTS
- EDA For categorical variables - 2 charts - 2nd:
- PIE CHART
- -represent categories by ARBITRARY positions in pie
- -construct pie section such that AREA of section is proportional to #/% in category
which graph is better??
BAR CHART always better than pie
- -bc comparing bar’s heights is easier than comparing pie slice areas
- -bar charts are easier to label than pie charts
- -pie charts req. lots of colors, textures
Pictogram
picture enhanced bar chart
- -can be misleading
- -intended visual element is HEIGHT…but perceived visual element is area
For categorical variables:
p = population proportion (parameter)
phat = sample proportion (statistic)
phat = # of indiv. in category of interest / # of indiv. in sample
ex. p = proportion of all BYU students who are married
phat = proportion of students in a random sample of 300 BYU students who are married
proportion sampling variability
- parameters typically UNKNOWN
- -bc usually impossible to know exactly what values a var. takes for every member of pop.
- statistics are computed from the sample
- -vary from sample to sample due to sampling variability
we want to understand how statistics behave relative to the parameter
sampling distribution of phat
–theoretical probability distribution
describes distribution of: ALL sample proportions from ALL possible random samples of the same size taken from a population
CENTER: Mean (phat) = p
SPREAD = st. dev. of sampling distribution of phat
= SD(phat) = sqrt(p(1 - p) / n)
SHAPE: approx. normal if n is large, but how large depends on how close p is to .5
check: np > 10, n(1-p) > 10
- -need larger n for normality when p is close to zero or one
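The center, spread, and shape rules above can be sketched in a few lines of Python. This is only an illustration: the function name `phat_sampling_distribution` and the `threshold` parameter are made up here, and the cutoff of 10 is the one used on this card.

```python
from math import sqrt

def phat_sampling_distribution(p, n, threshold=10):
    """Center, spread, and shape check for the sampling distribution of phat."""
    mean = p                    # CENTER: Mean(phat) = p
    sd = sqrt(p * (1 - p) / n)  # SPREAD: SD(phat) = sqrt(p(1 - p) / n)
    # SHAPE: approx. normal only if both expected counts exceed the threshold
    normal_ok = n * p > threshold and n * (1 - p) > threshold
    return mean, sd, normal_ok

# p near .5: n = 100 is large enough
print(phat_sampling_distribution(0.5, 100))
# p near 0: the same n fails the check, so a larger n is needed
print(phat_sampling_distribution(0.02, 100))
```

The second call shows why "large" depends on p: with p = .02 and n = 100, np is only 2.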
- one sample z confidence interval for proportions
- -C.I. estimate for the pop. proportion “p”
1. investigate sampling distribution of phat for SRS from pop. of interest
2. use sampling distribution to develop CI for p
SPREAD: SE(phat) = sqrt(phat(1 - phat) / n)
SHAPE: check np > 10, n(1 - p) > 10 (use phat in place of p, since p is unknown)
C.I. formula for proportion
phat +/- z* sqrt(phat(1 - phat) / n)
phat = point estimate of p (pop. proportion)
z* = multiplier
st. dev. part = standard error of phat = estimate using sample data, of st. dev. of sampling distribution of phat
everything after +/- = m (margin of error) - measures max. diff. that could exist btw phat and p at a specified level of confidence
= table value multiplier * standard error
4 steps for C.I. proportions
- STATE - specific parameter of interest
- PLAN - choose procedure, level of confidence
- SOLVE - collect data, check conditions, and calc. interval
- CONCLUDE - interpret C.I.
CI proportions example
US senators voted 54-46 against plan to expand background checks for gun buyers - NYT news poll taken 2013 asked 965 randomly selected adults whether they favor/oppose federal law req. background checks on all potential gun buyers
–87% favored
STATE: what % of U.S. adults favor a federal law req. background checks for all potential gun buyers?
PLAN: Construct a 95% large-sample z confidence interval for p, proportion of all U.S. adults who favor background checks for potential gun buyers
phat = 87%, sample size = 965, confidence level = 95%
SOLVE: conditions:
1. SRS = yes! 965 randomly selected adults
2. sampling distribution approx. normal?
965(.87) = 839.55 > 10 YES, 965(.13) = 125.45 > 10 YES!
CI = phat +/- z* sqrt(phat(1 - phat) / n)
= .87 +/- 1.96 sqrt((.87)(.13) / 965) = (0.849, 0.891)
CONCLUDE: we are 95% confident that the true proportion of US adults who favor background checks for buyers is btw. .849 and .891 in April 2013
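As a check on the arithmetic in this example, here is a minimal Python sketch of the one-sample z interval. The function name `one_prop_z_interval` is made up for illustration, and z* is taken from the standard library's `statistics.NormalDist` instead of a table, so it is about 1.95996 rather than the rounded 1.96.

```python
from math import sqrt
from statistics import NormalDist

def one_prop_z_interval(phat, n, conf_level=0.95):
    """Large-sample z confidence interval for a population proportion."""
    # z* = multiplier that leaves (1 - conf_level) / 2 in each tail
    z_star = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    se = sqrt(phat * (1 - phat) / n)  # standard error of phat
    m = z_star * se                   # margin of error
    return phat - m, phat + m

# Numbers from the background-check poll: phat = .87, n = 965
low, high = one_prop_z_interval(0.87, 965)
print(round(low, 3), round(high, 3))
```

Rounded to three decimals, this gives the same interval as the card: 0.849 to 0.891.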
sample size determination in proportions
margin of error:
m = z* sqrt(p*(1 - p*) / n)
solve for n ->
n = (z*/m)^2 * p*(1 - p*)
p* = best guess for p (bc not p hat bc haven’t taken sample yet and not p bc don’t know pop. parameter)
setting p* = .5 always produces sample size that, if anything, is a little too large (so no harm)
ex. with finding sample size with margin of error
want to estimate p with 95% confidence and margin of error of 3% - what size sample do you need?
n = (1.96 / .03)^2 * .5(1 - .5) = 1067.11 -> n = 1068 (ALWAYS round UP)
p* look at prior info. if possible, otherwise use p* = .5 and 95% CI
if n INC. then m DEC. (n is in the denominator of the margin of error)
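The sample size formula can be sketched as a small Python helper. The name `sample_size_for_proportion` and its parameters are made up for illustration, and z* comes from `statistics.NormalDist` instead of a table.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_proportion(m, conf_level=0.95, p_star=0.5):
    """Smallest n giving margin of error m, using best guess p_star for p."""
    z_star = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    n = (z_star / m) ** 2 * p_star * (1 - p_star)
    return ceil(n)  # ALWAYS round UP

# 95% confidence, margin of error 3%, no prior info so p* = .5
print(sample_size_for_proportion(0.03))
# With a prior guess far from .5, the required n is smaller
print(sample_size_for_proportion(0.03, p_star=0.87))
```

The first call reproduces the 1068 from the example. The second shows why p* = .5 is the safe choice: any other guess asks for a smaller sample.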
One sample z test for pop. proportion
- beg. with claim about value parameter
- -take SRS and compute statistic (s) value
- -use sampling distribution of stat —> compute prob. of getting stat. value if claim about parameter value is TRUE
- -if prob. unlikely, conclude that claim about parameter value is incorrect —> reject H0
STATE - specify claim about parameter of interest
PLAN - choose procedure, specify H0, Ha, alpha
SOLVE - check conditions, test stat. and p-value
CONCLUDE - compare p-value to alpha, interpret test results
conditions and test stat. formula in one sample z test for pop. proportion
conditions:
- SRS?
- Normality? np0 > 10, n(1 - p0) > 10 (use p0, the value claimed in H0)
test stat.
z = (phat - p0) / sqrt(p0(1 - p0) / n)
pval < alpha = reject H0 = statistically significant
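A minimal Python sketch of the test statistic and p-value follows. The function name `one_prop_z_test`, the `alternative` argument, and the numbers in the usage lines are all hypothetical; they are not from the flashcards.

```python
from math import sqrt
from statistics import NormalDist

def one_prop_z_test(phat, p0, n, alternative="two-sided"):
    """One-sample z test for a population proportion."""
    # The spread uses p0 because the calculation assumes H0 is true
    z = (phat - p0) / sqrt(p0 * (1 - p0) / n)
    cdf = NormalDist().cdf(z)
    if alternative == "greater":   # Ha: p > p0
        p_value = 1 - cdf
    elif alternative == "less":    # Ha: p < p0
        p_value = cdf
    else:                          # Ha: p != p0
        p_value = 2 * min(cdf, 1 - cdf)
    return z, p_value

# Hypothetical data: phat = .56 from n = 100, testing H0: p = .50
z, p_value = one_prop_z_test(phat=0.56, p0=0.50, n=100)
alpha = 0.05
print(round(z, 2), round(p_value, 4))
print("reject H0" if p_value < alpha else "fail to reject H0")
```

Here z is about 1.2 and the two-sided p-value is about 0.23, so with alpha = .05 the result is not statistically significant.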
- Role-type classifications; EDA for C to Q data
# of variables = 1: pattern of interest = distribution
# of variables = 2 (for each indiv.): pattern of interest = relationship (want to study relationship btw variables using visual displays and numerical summaries)
relationships
goals: characterize relationship
- -predict one from other
- -investigate cause-effect relationship
if prediction or cause-effect analysis is the goal, one variable is the RESPONSE and one is the EXPLANATORY
Y - response = outcome of the study
X - explanatory = used to predict or explain changes in response variable
response and explanatory variables chart
                        RESPONSE
                        categorical    quantitative
EXPLANATORY    cat.     C - C          C - Q
               quant.   Q - C          Q - Q
C-Q and Q - Q important in this class
whether women more talkative than men?
–explanatory = gender (categorical) and response = level of talkativeness (quantitative)
= C - Q
C - Q
categorical explanatory variable and quantitative response variable
–visual display tool: side by side box plots
–numerical summary tool: 5 # summary or 2 # summary (mean and SD) for each category
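The numerical summaries for a C - Q relationship can be sketched in Python as below. The data are made up purely for illustration, and the helper name `five_number_summary` is hypothetical. Quartile conventions differ between textbooks and software; this sketch uses the standard library's "inclusive" method, which may not match hand calculations exactly.

```python
from statistics import mean, stdev, quantiles

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for one category."""
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, med, q3, max(values)

# Made-up quantitative responses for two categories of the explanatory variable
groups = {
    "group_a": [12, 15, 16, 18, 21],
    "group_b": [10, 14, 15, 17, 25],
}

for label, values in groups.items():
    # 5 # summary, then the 2 # summary (mean and SD), for each category
    print(label, five_number_summary(values),
          round(mean(values), 2), round(stdev(values), 2))
```

Each printed row corresponds to one box in a side-by-side box plot.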
- Matched Pairs t-procedures for means
observational data:
- -Individuals grouped in sets of 2
- -1 indiv. in each set has 1 of 2 conditions to be compared
experimental data
- -units come in sets of 2 (twins, pairs of arms)
- -1 unit in each set randomly assigned to each of 2 treatments