Exam 3 Flashcards

Question 1

Q

EDA For categorical variables - 2 charts - 1st:

Answer

A

BAR CHARTS
- represent categories by ARBITRARY positions on a horizontal line
- construct a bar over each category so that its HEIGHT is proportional to the #/% in that category
- shape, center, and spread DO NOT APPLY to bar charts (the category order is arbitrary)

Question 2

Q

EDA For categorical variables - 2 charts - 2nd:

Answer

A

PIE CHART
- represent categories by ARBITRARY positions in the pie
- construct a pie section so that the AREA of the section is proportional to the #/% in that category

Question 3

Q

Which graph is better?

Answer

A

BAR CHART is always better than a pie chart

-bc comparing bar heights is easier than comparing pie-slice areas
-bar charts are easier to label than pie charts
-pie charts req. lots of colors, textures

Question 4

Q

Pictogram

Answer

A

picture-enhanced bar chart

-can be misleading
-intended visual element is HEIGHT, but the perceived visual element is AREA

Question 5

Q

For categorical variables:

Answer

A

p = population proportion (parameter)
phat = sample proportion (Statistic)

phat = # of indiv. in category of interest / # of indiv. in sample

ex. p = proportion of all BYU students who are married
p hat = proportion of students in a random sample of 300 BYU students who are married

Question 6

Q

proportion sampling variability

Answer

A

parameters are typically UNKNOWN
- bc it is usually impossible to know exactly what value a var. takes for every member of the pop.
statistics are computed from the sample
- they vary from sample to sample due to sampling variability

we want to understand how statistics behave relative to the parameter

Question 7

Q

sampling distribution of phat

Answer

A

–theoretical probability distribution
describes distribution of: ALL sample proportions from ALL possible random samples of the same size taken from a population

CENTER: mean of phat = p

SPREAD = st. dev. of sampling distribution of phat
$SD(\hat{p}) = \sqrt{p(1-p)/n}$

SHAPE: approx. normal if n is large, but how large depends on how close p is to .5

check: np > 10, n(1-p) > 10
- need larger n for normality when p is close to zero or one
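To see what this theoretical distribution means, one can approximate it by simulation. The sketch below is an illustration only, assuming a hypothetical population with p = 0.3 and samples of size n = 100 (both arbitrary choices); it draws many random samples, records each phat, and compares the simulated mean and SD with p and sqrt(p(1-p)/n). It uses only Python's standard library, and the helper names are illustrative.

```python
import math
import random
import statistics


def sd_phat(p, n):
    """Theoretical SD of the sampling distribution of phat: sqrt(p(1-p)/n)."""
    return math.sqrt(p * (1 - p) / n)


def normal_ok(p, n):
    """Rule-of-thumb check from the card: np > 10 and n(1-p) > 10."""
    return n * p > 10 and n * (1 - p) > 10


def simulate_phats(p, n, reps, seed=1):
    """Draw `reps` random samples of size n and return each sample proportion."""
    rng = random.Random(seed)
    phats = []
    for _ in range(reps):
        # each sampled individual is "in the category" with probability p
        successes = sum(rng.random() < p for _ in range(n))
        phats.append(successes / n)
    return phats


if __name__ == "__main__":
    p, n = 0.3, 100  # hypothetical population proportion and sample size
    phats = simulate_phats(p, n, reps=5000)
    print("mean of phats:", round(statistics.mean(phats), 4), "vs p =", p)
    print("SD of phats:  ", round(statistics.stdev(phats), 4),
          "vs formula =", round(sd_phat(p, n), 4))
    print("normal approx. OK?", normal_ok(p, n))
```

With a few thousand repetitions the simulated mean should land very close to 0.3 and the simulated SD close to 0.0458; the exact digits depend on the seed.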

Question 8

Q

one sample z confidence interval for proportions

Answer

A

-C.I. estimate for the pop. proportion “p”
1. investigate sampling distribution of phat for SRS from pop. of interest
2. use sampling distribution to develop CI for p

SPREAD (estimated) = standard error = $\sqrt{\hat{p}(1-\hat{p})/n}$
SHAPE: approx. normal if $n\hat{p} > 10$ and $n(1-\hat{p}) > 10$

Question 9

Q

C.I. formula for proportion

Answer

A

$\hat{p} \pm z^* \sqrt{\hat{p}(1-\hat{p})/n}$

phat = point estimate of p (pop. proportion)
z* = multiplier (table value for the chosen confidence level)
square-root part = standard error of phat = estimate, from the sample data, of the st. dev. of the sampling distribution of phat

everything after +/- = m (margin of error): the largest likely difference btw phat and p at the specified level of confidence
= table value multiplier * standard error

Question 10

Q

4 steps for C.I. proportions

Answer

A

STATE - specific parameter of interest
PLAN - choose procedure, level of confidence
SOLVE - collect data, check conditions, and calc. interval
CONCLUDE - interpret C.I.

Question 11

Q

CI proportions example

Answer

A

The U.S. Senate blocked (54-46 vote) a plan to expand background checks for gun buyers; a NYT news poll taken in 2013 asked 965 randomly selected adults whether they favor or oppose a federal law req. background checks on all potential gun buyers
–87% favored

STATE: what % of U.S. adults favor a federal law req. background checks for all potential gun buyers?

PLAN: Construct a 95% large-sample z confidence interval for p, proportion of all U.S. adults who favor background checks for potential gun buyers

phat = 87%, sample size = 965, confidence level = 95%

SOLVE: conditions:
1. SRS = yes! 965 randomly selected adults
2. sampling distribution approx. normal?
965(.87) = 839.55 > 10 YES; 965(.13) = 125.45 > 10 YES!

CI = $\hat{p} \pm z^* \sqrt{\hat{p}(1-\hat{p})/n}$
$= .87 \pm 1.96\sqrt{(.87)(.13)/965} = .87 \pm .021 = (0.849, 0.891)$

CONCLUDE: we are 95% confident that the true proportion of US adults who favor background checks for buyers is btw. .849 and .891 in April 2013
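The interval above can be reproduced with a short Python sketch. It assumes the large-sample conditions are the ones on this card (it re-checks n*phat > 10 and n(1 - phat) > 10) and takes z* from the standard normal inverse CDF in Python's `statistics.NormalDist` rather than from a table; the function names are illustrative.

```python
import math
from statistics import NormalDist


def z_star(confidence):
    """Two-sided multiplier z* for a confidence level (e.g. 0.95 -> about 1.96)."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def one_prop_z_ci(phat, n, confidence=0.95):
    """Large-sample z CI for p: phat +/- z* sqrt(phat(1-phat)/n)."""
    # condition check from the card: n*phat > 10 and n*(1-phat) > 10
    if not (n * phat > 10 and n * (1 - phat) > 10):
        raise ValueError("sample too small for the normal approximation")
    se = math.sqrt(phat * (1 - phat) / n)  # standard error of phat
    m = z_star(confidence) * se            # margin of error
    return phat - m, phat + m


# Background-check poll from this card: phat = .87, n = 965, 95% confidence
low, high = one_prop_z_ci(0.87, 965, 0.95)
print(f"({low:.3f}, {high:.3f})")
```

Because z* here is 1.95996... instead of the rounded table value 1.96, the interval differs from the hand calculation only past the third decimal.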

Question 12

Q

sample size determination in proportions

Answer

A

margin of error:

$m = z^* \sqrt{p^*(1-p^*)/n}$
solving for n:
$n = (z^*/m)^2 \, p^*(1-p^*)$

p* = best guess for p (not phat bc we haven’t taken the sample yet, and not p bc we don’t know the pop. parameter)

setting p* = .5 always produces a sample size that, if anything, is a little too large (so no harm), because p*(1 - p*) is largest at p* = .5

Question 13

Q

ex. with finding sample size with margin of error

Answer

A

want to estimate p with 95% confidence and margin of error of 3% - what size sample do you need?

n = (1.96 / .03)^2 * .5(1 - .5) = 1067.11 -> 1068 (ALWAYS round UP)

for p*, look at prior info. if possible; otherwise use p* = .5 (here with 95% confidence, so z* = 1.96)

if n INC., then m DEC. (m is proportional to 1/√n)
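A minimal sketch of the sample-size formula, assuming z* is taken from `statistics.NormalDist` instead of a table (1.95996... rather than 1.96) and p* defaults to .5; `sample_size_for_proportion` is a hypothetical helper name.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(m, confidence=0.95, p_star=0.5):
    """n = (z*/m)^2 * p*(1 - p*), rounded UP so the margin m is not exceeded."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n = (z / m) ** 2 * p_star * (1 - p_star)
    return math.ceil(n)


print(sample_size_for_proportion(0.03))               # 95%, 3% margin, p* = .5
print(sample_size_for_proportion(0.015))              # half the margin
print(sample_size_for_proportion(0.03, p_star=0.2))   # prior guess p* = .2
```

The first call gives 1068, matching the card. Halving the margin to .015 gives 4269, roughly four times as many people, because n grows like 1/m^2; a prior guess away from .5 lowers the required n.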

Question 14

Q

One sample z test for pop. proportion

Answer

A

begin with a claim about the value of a parameter
-take an SRS and compute the value of the statistic
-use the sampling distribution of the statistic to compute the probability of getting a value at least as extreme as the observed one if the claim about the parameter value is TRUE
-if that probability is small, conclude that the claim about the parameter value is incorrect -> reject H0

STATE - specify claim about parameter of interest
PLAN - choose procedure, specify H0, Ha, alpha
SOLVE - check conditions, test stat. and p-value
CONCLUDE - compare p-value to alpha, interpret test results

Question 15

Q

conditions and test stat. formula in one sample z test for pop. proportion

Answer

A

conditions:

SRS?
Normality? $np_0 > 10$, $n(1-p_0) > 10$ (check with the hypothesized value p0)

test stat.
$z = \dfrac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}}$

p-value < alpha -> reject H0 -> statistically significant
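Here is a sketch of the one-sample z test in Python, assuming the normality check uses p0 and that the p-value comes from `statistics.NormalDist`. The numbers in the demo (56 of 100 in the category, H0: p = .5) are hypothetical, not from a real study.

```python
import math
from statistics import NormalDist


def one_prop_z_test(phat, p0, n, alternative="greater"):
    """z = (phat - p0) / sqrt(p0(1-p0)/n) and its p-value under H0: p = p0."""
    if not (n * p0 > 10 and n * (1 - p0) > 10):
        raise ValueError("n*p0 and n*(1-p0) must both exceed 10")
    z = (phat - p0) / math.sqrt(p0 * (1 - p0) / n)
    upper = 1 - NormalDist().cdf(z)  # P(Z >= z)
    if alternative == "greater":
        p_value = upper
    elif alternative == "less":
        p_value = NormalDist().cdf(z)
    else:  # "two-sided"
        p_value = 2 * min(upper, 1 - upper)
    return z, p_value


# Hypothetical data: phat = .56 from n = 100, H0: p = .5 vs Ha: p > .5
z, p_value = one_prop_z_test(0.56, 0.5, 100, "greater")
alpha = 0.05
print(f"z = {z:.2f}, p-value = {p_value:.4f}, reject H0? {p_value < alpha}")
```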

Question 16

Q

Role-type classifications; EDA or C to Q data

Answer

A

# of variables = 1 (for each indiv.): pattern of interest = distribution
# of variables = 2 (for each indiv.): pattern of interest = relationship (study the relationship btw the variables using visual displays and numerical summaries)

Question 17

Q

relationships

Answer

A

goals: characterize the relationship
- predict one variable from the other
- investigate a cause-effect relationship

if prediction or cause-effect analysis is the goal, one variable is the RESPONSE and one is the EXPLANATORY

Y - response = outcome of the study
X - explanatory = used to predict or explain changes in response variable

Question 18

Q

response and explanatory variables chart

Answer

A

                            RESPONSE
                            categorical    quantitative
EXPLANATORY  categorical    C - C          C - Q
             quantitative   Q - C          Q - Q

C - Q and Q - Q are the important ones in this class

ex. are women more talkative than men?
- explanatory = gender (categorical), response = level of talkativeness (quantitative)
= C - Q

Question 19

Q

C - Q

Answer

A

categorical explanatory variable and quantitative response variable
–visual display tool: side by side box plots

–numerical summary tool: 5 # summary or 2 # summary (mean and SD) for each category

Question 20

Q

Matched Pairs t-procedures for means

Answer

A

observational data:

-individuals grouped in sets of 2
-each individual in a set has 1 of the 2 conditions to be compared

experimental data

-units come in sets of 2 (twins, pairs of arms)
-1 unit in each set randomly assigned to each of 2 treatments

Question 21

Q

one sample t-procedures for MU (in matched pairs t-procedures)

Answer

A

C.I.
$\bar{x} \pm t^* \, s/\sqrt{n}$, df = n - 1

test of significance
Ho: Mu = Mu0
Ha: Mu > Mu0 (or < or not equal to)
$t = (\bar{x} - \mu_0)/(s/\sqrt{n})$, df = n - 1

Question 22

Q

randomized block design with 2 treatments or 2 measurements

Answer

A

blocks (pairs)

-2 matched individuals
-one individual and 2 treatments
-one individual: pre and post measurements

randomization

-randomly assign treatments to individuals within each pair
-randomly assign order of treatments
-randomly select individuals

matched pairs: 2 subjects
or matched pairs: one subject, 2 treatments

mean and st. dev. are computed from the differences
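Randomization within pairs is easy to automate. These are hypothetical helpers, assuming two treatments labeled "A" and "B" and made-up subject labels: for 2 matched subjects, a fair random shuffle decides which member of each pair gets which treatment; for one subject with 2 treatments, a random sample decides the order.

```python
import random


def randomize_within_pairs(pairs, treatments=("A", "B"), seed=None):
    """For each matched pair, randomly decide which member gets which treatment.

    `pairs` is a list of 2-tuples of subject labels; returns a list of dicts
    mapping subject -> treatment. Treatment labels are placeholders.
    """
    rng = random.Random(seed)
    plan = []
    for first, second in pairs:
        order = list(treatments)
        rng.shuffle(order)  # fair coin flip: which member receives treatment A
        plan.append({first: order[0], second: order[1]})
    return plan


def randomize_order(subjects, treatments=("A", "B"), seed=None):
    """One subject, 2 treatments: randomly choose the order each subject gets them."""
    rng = random.Random(seed)
    return {s: rng.sample(list(treatments), k=len(treatments)) for s in subjects}


# Hypothetical twin pairs and subjects
twins = [("twin1a", "twin1b"), ("twin2a", "twin2b"), ("twin3a", "twin3b")]
for assignment in randomize_within_pairs(twins, seed=42):
    print(assignment)
print(randomize_order(["s1", "s2", "s3"], seed=42))
```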

Question 23

Q

procedures for mean difference: (Md)

Answer

A

C.I.
$\bar{d} \pm t^* \, s_d/\sqrt{n}$, df = n - 1

test
Ho: Md = 0
Ha: Md > 0 (or < or not equal to)
$t = \bar{d}/(s_d/\sqrt{n})$, df = n - 1

state - plan - solve - conclude

Question 24

Q

C.I. example for Md (left vs. right)

Answer

A

two identical knobs: one turns right (clockwise), the other turns left
- 25 right-handed students turn each knob a specified distance with their right hand
(order of knobs random)
- time for each turn is the response variable
- difference (left - right) computed and analyzed

STATE:
- what is the mean difference in time required for right-handed students to turn a knob to the left vs. to the right?
PLAN:
estimate the Md with a 95% confidence interval

SOLVE:
data collected
dbar = 13.32 seconds, n = 25, level = 95%
Sd = 22.94 seconds
- plot the differences with a dot plot

conditions? SRS? YES! Normal? YES! The dot plot had no OUTLIERS

interval: $\bar{d} \pm t^* s_d/\sqrt{n}$, df = 25 - 1 = 24, so t* = 2.064

$= 13.32 \pm 2.064\,(22.94/\sqrt{25}) = 13.32 \pm 9.47 = (3.85, 22.79)$

CONCLUDE:
We are 95% confident that the true mean difference btw left and right times is btwn 3.85 and 22.79 seconds
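The knob interval can be checked with a few lines of Python. The standard library has no t distribution, so this sketch assumes t* is looked up in a table (2.064 for df = 24, as on this card) and passed in; `summarize_differences` is an illustrative helper showing how dbar and Sd would be computed if the raw left - right differences were available.

```python
import math
import statistics


def paired_t_ci(dbar, sd, n, t_star):
    """dbar +/- t* (Sd / sqrt(n)); t* must match df = n - 1 and the confidence level."""
    m = t_star * sd / math.sqrt(n)  # margin of error
    return dbar - m, dbar + m


def summarize_differences(diffs):
    """dbar, Sd, and n from a list of paired differences (e.g. left - right)."""
    return statistics.mean(diffs), statistics.stdev(diffs), len(diffs)


# Knob example from this card: dbar = 13.32 s, Sd = 22.94 s, n = 25, t* = 2.064
low, high = paired_t_ci(13.32, 22.94, 25, t_star=2.064)
print(f"({low:.2f}, {high:.2f}) seconds")
```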

Question 25

Q

ex. matched pairs t-test

Answer

A

Cola batches are rated for sweetness 1. right after production and 2. after one month of storage
–diff. = fresh - stored, n = 10

STATE: is there evidence that cola lost sweetness during storage?
PLAN: two measurements on each batch = fresh and stored
- perform a matched pairs t test on Md

parameter: Md = mean difference (fresh - stored) in sweetness for all cola after one month of storage
di = fresh - stored
H0: Md = 0
Ha: Md > 0
alpha = .05

SOLVE: conditions = SRS yes, plot data: no outliers YES
-dbar = .30, Sd = 1.16
$t = (\bar{d} - 0)/(s_d/\sqrt{n})$, where 0 is the value of Md under H0
$t = (.30 - 0)/(1.16/\sqrt{10}) = 0.818$, df = 10 - 1 = 9

p-value: .200 < p-value < .250

CONCLUDE: p-value > alpha, so fail to reject Ho; the evidence is not strong enough to say the cola lost sweetness after one month of storage
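A sketch of the matched pairs t statistic using the card's summary statistics. Because the standard library has no t distribution, the p-value is bracketed with the df = 9 row of a standard t table, hard-coded here as an assumption; statistical software would report an exact p-value instead.

```python
import math

# One row of a standard t table (df = 9): (upper-tail probability, critical t)
T_TABLE_DF9 = [(0.25, 0.703), (0.20, 0.883), (0.15, 1.100), (0.10, 1.383),
               (0.05, 1.833), (0.025, 2.262), (0.01, 2.821), (0.005, 3.250)]


def paired_t_stat(dbar, sd, n, md0=0.0):
    """t = (dbar - Md0) / (Sd / sqrt(n)) with df = n - 1; Md0 is the value under H0."""
    t = (dbar - md0) / (sd / math.sqrt(n))
    return t, n - 1


def bracket_p_value(t, row):
    """Bound the one-sided (upper-tail) p-value for t >= 0 using one table row."""
    upper = 0.5
    for tail, crit in row:
        if t < crit:
            return tail, upper  # t falls below this critical value: tail < p < upper
        upper = tail
    return 0.0, upper  # t is beyond the last value in the row


# Cola example from this card: dbar = .30, Sd = 1.16, n = 10, H0: Md = 0
t, df = paired_t_stat(0.30, 1.16, 10)
low, high = bracket_p_value(t, T_TABLE_DF9)
print(f"t = {t:.3f}, df = {df}, {low} < p-value < {high}")
```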

Question 26

Q

2 sample t-procedures for means

Answer

A

One sample inference: intervals and tests for a mean (Mu)
–application: matched pairs intervals and tests for a mean diff. (Md)

two-sample inference: intervals and tests for a difference btw two means (M1 - M2)

matched pairs
- 1 SRS of pairs (one individual of each pair for each condition), or an experiment using paired units (1 unit of each pair randomly assigned to each treatment)

two sample inference

-2 SRS - 1 from each population or
-experiment using unpaired units - half randomly assigned to each treatment

Question 27

Q

population symbols for two sample inferences

Answer

A

                    Pop. 1    Pop. 2
pop. mean           Mu1       Mu2
COMMON pop. SD      sigma     sigma    (the only one that is the same)
sample size         n1        n2
sample mean         xbar1     xbar2
sample SD           s1        s2

Mu 1 - Mu 2 = diff. btw 2 population means
xbar 1 - xbar 2 = dif. btw sample means

investigate the sampling distribution of xbar1 - xbar2 for SRSs from the pops. of interest
use the sampling distribution to develop a C.I. for Mu1 - Mu2
use the sampling distribution to develop a test of significance for Mu1 - Mu2

Question 28

Q

sampling distribution of xbar1 - xbar2

Answer

A

take an SRS of size n1 from pop. 1
take a separate SRS of size n2 from pop. 2
check that both pops. are normally distributed (no outliers)
find xbar1 - xbar2

center = mean of the sampling distribution of xbar1 - xbar2 = Mu1 - Mu2

spread = SD = $\sqrt{\sigma^2/n_1 + \sigma^2/n_2}$
OR $\sigma\sqrt{1/n_1 + 1/n_2}$

shape = normal if both pops. are normal; approx. normal if both n1 and n2 are at least 30

how do we estimate sigma?
$s_p = \sqrt{\dfrac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}}$ (pooled SD)

Question 29

Q

xbar1 - xbar2 C.I. and test formulas

Answer

A

C.I.
$(\bar{x}_1 - \bar{x}_2) \pm t^* s_p\sqrt{1/n_1 + 1/n_2}$
df = n1 + n2 - 2

test!
Ho: Mu1 = Mu2 or Mu1 - Mu2 = 0
Ha: Mu1 > Mu2 (or <, or not equal to)

$t = \dfrac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{1/n_1 + 1/n_2}}$, df = n1 + n2 - 2

conditions:

randomness of data collection? SRSs, or random assignment of treatments
normality of pops. or large sample sizes? check that there are no outliers, or that both sample sizes are > 30
equal pop. st. dev. (sigma)?
- check that (larger s) / (smaller s) < 2

Question 30

Q

Example of xbar1 - xbar2 test

Answer

A

STATE: does antidepressant cause an INC. in water consumption? use alpha = .05
PLAN: Use a two-sample t test for means
–let Mud = mean water intake for rats in drug group
-Mup = mean water intake for rats in placebo group
(rats randomly assigned to the two treatments)

parameter: Mud - Mup
Ho: Mud - Mup = 0, or Mud = Mup
Ha: Mud - Mup > 0, or Mud > Mup
alpha = .05

SOLVE
Check: random (SRS or random assignment)? normal, no outliers? same pop. st. dev.? (larger s) / (smaller s) = .750 / .564 = 1.33 < 2, so good

             drug          placebo
xbar         8.48 ml       7.93 ml
s            .750 ml       .564 ml
n            10            10

pooled SD: $s_p = \sqrt{\dfrac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}}$
$= \sqrt{\dfrac{(10-1)(.750)^2 + (10-1)(.564)^2}{10+10-2}}$
= .664

$t = \dfrac{\bar{x}_d - \bar{x}_p - 0}{s_p\sqrt{1/n_d + 1/n_p}}$
$t = \dfrac{8.48 - 7.93 - 0}{.664\sqrt{1/10 + 1/10}} = 1.852$
df = 10 + 10 - 2 = 18

p-value: .025 < p-value < .05

CONCLUDE: p-value < .05, so reject Ho; there is evidence that the antidepressant increases water consumption
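The same calculation in Python, as a sketch: `pooled_sd` and `pooled_t_test` are illustrative names, the equal-SD condition is checked with the card's rule (larger s / smaller s < 2), and the p-value is still read from a t table with df = 18, since the standard library has no t distribution.

```python
import math


def pooled_sd(s1, n1, s2, n2):
    """Sp = sqrt(((n1-1)s1^2 + (n2-1)s2^2) / (n1 + n2 - 2))."""
    return math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))


def pooled_t_test(x1, s1, n1, x2, s2, n2):
    """Pooled two-sample t statistic for H0: Mu1 - Mu2 = 0, with df = n1 + n2 - 2."""
    if max(s1, s2) / min(s1, s2) >= 2:
        raise ValueError("larger s / smaller s >= 2: equal-SD condition fails")
    sp = pooled_sd(s1, n1, s2, n2)
    t = (x1 - x2 - 0) / (sp * math.sqrt(1 / n1 + 1 / n2))
    return sp, t, n1 + n2 - 2


# Rat water-intake example from this card (drug vs. placebo)
sp, t, df = pooled_t_test(8.48, 0.750, 10, 7.93, 0.564, 10)
print(f"Sp = {sp:.4f}, t = {t:.3f}, df = {df}")
```

Carrying Sp at full precision gives t of about 1.853 rather than the card's 1.852 (which uses Sp rounded to .664); the p-value bracket .025 < p-value < .05 is unchanged.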

Question 31

Q

Example of xbar1 - xbar2 C.I.

Answer

A

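Sp = .6635, t* = 1.734 (df = 18; 90% confidence, which matches the one-sided alpha = .05 test)
$(\bar{x}_d - \bar{x}_p) \pm t^* s_p\sqrt{1/n_d + 1/n_p}$
$= (8.48 - 7.93) \pm 1.734(.6635)\sqrt{1/10 + 1/10} = .55 \pm .515$

the CI (.035, 1.065) does not include 0, so Mud does not equal Mup (the drug mean is higher)
- this confirms the significance test's rejection of Ho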
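A sketch of the pooled two-sample t interval, assuming t* = 1.734 (df = 18) is supplied from a table as on this card; `pooled_t_ci` is an illustrative name.

```python
import math


def pooled_t_ci(x1, s1, n1, x2, s2, n2, t_star):
    """(xbar1 - xbar2) +/- t* Sp sqrt(1/n1 + 1/n2); t* must match df = n1 + n2 - 2."""
    sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    m = t_star * sp * math.sqrt(1 / n1 + 1 / n2)  # margin of error
    diff = x1 - x2
    return diff - m, diff + m


# Rat example: drug vs. placebo, t* = 1.734
low, high = pooled_t_ci(8.48, 0.750, 10, 7.93, 0.564, 10, t_star=1.734)
print(f"({low:.3f}, {high:.3f})")
print("0 inside the interval?", low <= 0 <= high)
```

Without intermediate rounding the interval is about (0.035, 1.065); either way it excludes 0.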

Question 32

Q

One way ANOVA - comparing several means

Answer

A

remember the chart with C - C, C - Q, Q - C, Q - Q

One-sample inference - intervals and tests for a mean (Mu)

two-sample inference: intervals and tests for a DIFFERENCE btwn 2 means (Mu1 - Mu2)

multi-sample inference: intervals and tests for comparisons of 3 or more means (Mu1 - mu3, Mu1 - Mu2, Mu2 - Mu3, 1/2(Mu1 + Mu2))

Question 33

Q

diff. btw 2 sample inference and multi-sample inference

Answer

A

2 sample inference - 2 separate SRSs (1 from each population), OR
- experiment using unpaired units (half randomly assigned to each treatment)

multi-sample inference

-3 or more separate SRSs (1 from each population), OR
-experiment using unblocked units (randomly assigned to 3 or more treatments)

most scientific studies involve 3 or more groups; however, inference and related issues are much more complicated for multi-sample studies

-complete discussion beyond scope of the course
-we will discuss just 1 useful test of significance

Question 34

Q

three two-sample t-tests of significance

Answer

A

Ho: M1 = M2 –> (xbar1 - xbar2) / sp*radical(1/n1 + 1/n2) gives p-value 1

Ho: M1 = M3 –> (xbar1 - xbar3) / sp*radical(1/n1 + 1/n3) gives p-value 2

Ho: M2 = M3 –> (xbar2 - xbar3) / sp*radical(1/n2 + 1/n3) gives p-value 3

3 null hypotheses and 3 p-values: no single p-value answers the overall question

-multiple tests: the more tests performed, the
1. greater the probability of observing an extreme statistic due to chance
2. greater the probability of declaring significance for at least one test when all differences are really due to chance alone

needed: one overall test (one null hypothesis, one test stat, one p-value) to TEST EQUALITY OF 3+ MEANS
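A rough way to see why multiple tests inflate false positives: if every null hypothesis is true and the k tests were independent, the chance that at least one comes out significant at level alpha is 1 - (1 - alpha)^k. The three pairwise t-tests above share data, so they are not truly independent; treat this sketch as an approximation of the trend, not an exact error rate.

```python
def familywise_error(alpha, k):
    """P(at least one 'significant' result among k tests when every H0 is true),
    ASSUMING the k tests are independent: 1 - (1 - alpha)^k."""
    return 1 - (1 - alpha) ** k


for k in (1, 3, 10):
    print(k, "tests:", round(familywise_error(0.05, k), 3))
```

With alpha = .05, three tests already give roughly a 14% chance of at least one false "significant" result, which is why one overall test is needed.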

Question 35

Q

overall test and follow-up analysis for comparing 3 or more means

Answer

A

overall test
- test procedure: one-way analysis of variance (ANOVA)
- test stat: F (a ratio of variances)
- hypotheses: Ho: Mu1 = Mu2 = Mu3 vs. Ha: at least one Mui is diff. from the others
follow-up analysis
- if the overall test is significant, comparing CIs for the individual means can shed some light on the general question of which Mus differ

Question 36

Q

ANOVA test of significance

Answer

A

conditions:

random: SRS or random allocation
pop. normally distributed or large sample sizes: no outliers in plots of the data, or sample sizes > 30
st. dev. of pops. approx. equal
- check that (largest s) / (smallest s) < 3

test stat called “F” or “ANOVA F”

-calculating F is called analysis of variance (ANOVA)
-basic idea of F: compare the variation among the xbars to the variation expected due to randomness
-formula for F and associated p-value: use one-way ANOVA software

IF

-p-value > alpha: done!! (can’t reject the hypothesis that the pop. means are equal)
-p-value < alpha: we only know that at least one mean differs from the others; look at the CIs or draw box plots to see which one is off
-HINT: F is always in the box on the top right, and you never have to solve for it
-you will know you have to reject Ho, but to see which mean is off, compare the CIs for the individual means (or side-by-side box plots): if two CIs do not overlap, those means differ significantly; overlapping CIs alone do not show that the difference is not significant
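Software does this step for you, but the standard one-way ANOVA statistic is F = MSG / MSE, comparing the variation among the group means to the variation within groups. This sketch computes F for hypothetical data along with the SD-ratio check; it does not compute the p-value, which needs an F table or software with df = (I - 1, N - I). The helper names are illustrative.

```python
import statistics


def one_way_anova_f(groups):
    """ANOVA F = MSG / MSE for a list of groups (each a list of numbers).

    MSG = sum n_i (xbar_i - xbar)^2 / (I - 1)   (variation among group means)
    MSE = sum (n_i - 1) s_i^2 / (N - I)         (variation within groups)
    """
    I = len(groups)
    N = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / N
    ssg = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    sse = sum((len(g) - 1) * statistics.variance(g) for g in groups)
    msg, mse = ssg / (I - 1), sse / (N - I)
    return msg / mse, (I - 1, N - I)


def sd_ratio(groups):
    """Largest sample SD / smallest sample SD, for the equal-SD condition check."""
    sds = [statistics.stdev(g) for g in groups]
    return max(sds) / min(sds)


# Hypothetical data for three groups
data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
f, (df1, df2) = one_way_anova_f(data)
print(f"F = {f:.1f} with df = ({df1}, {df2}); SD ratio = {sd_ratio(data):.2f}")
```

A large F (here 27) means the group means are far apart relative to the within-group spread; identical groups give F = 0.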

Question 37

Q

2 way tables and conditional distributions, C - C

Answer

A

2 categorical variables for each individual (ex. handedness and birth type [single vs. twins])
–investigate relationship btwn variables using visual displays and numerical summaries

two way table of counts
- summarizes the C-C relationship
the explanatory variable is usually the row variable (gender) and the response variable is the column (opinion on beards)
2-way rectangular table of combined categories
count individuals in each combined category
sum across rows and over columns to get marginal totals
roles of row and column variables can be switched

ex. the row total for females is a marginal total

-numerical summary tool: conditional distributions for rows and columns
-visual display tool: grouped bar charts, stacked bar charts, others

Question 38

Q

conditional distributions

Answer

A

-divide cell counts by the row total to get conditional distributions
-evaluate the C-C relationship by comparing the conditional distributions
-if the conditional distributions differ, there is a potential relationship (association)

for visual display: grouped/stacked bar charts
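Conditional distributions are just row proportions. The counts below are hypothetical (rows = explanatory groups, columns = response categories), and the helper names are illustrative.

```python
def row_conditional_distributions(table):
    """Divide each cell count by its row total (one distribution per row)."""
    result = []
    for row in table:
        total = sum(row)
        result.append([count / total for count in row])
    return result


def marginal_totals(table):
    """Row totals and column totals of a two-way table of counts."""
    rows = [sum(row) for row in table]
    cols = [sum(col) for col in zip(*table)]
    return rows, cols


# Hypothetical counts
counts = [[30, 10],
          [20, 20]]
print(marginal_totals(counts))                # ([40, 40], [50, 30])
print(row_conditional_distributions(counts))  # [[0.75, 0.25], [0.5, 0.5]]
```

The two rows have different conditional distributions (75/25 vs. 50/50), which is the kind of difference that suggests a possible association.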

Question 39

Q

C-C summary

Answer

A

-summarize in 2-way table
-calculate conditional distribution of response variable for each value of explanatory variable
-if the conditional distributions are diff., there is a potential connection btw the categorical variables

Question 40

Q

two sample z procedures for proportions

Answer

A

investigate sampling distribution of phat1 - phat2 for SRS from 2 populations of interest or randomized controlled experiment with 2 treatments
use sampling distribution to develop a CI for p1 - p2
use sampling distribution to develop a test of significance for p1 - p2

ex. diff. btw the proportion of doctors taking aspirin who had heart attacks and the proportion of doctors receiving placebo who had heart attacks
phat1 - phat2 = .009 - .017 = -.008

Question 41

Q

sampling distribution of phat1 - phat2

Answer

A

take an SRS of size n1 from pop. 1 and observe a categorical variable
take separate SRS of size n2 from pop. 2 and observe categorical variable
compute phat1 - phat2

center = mean is p1 - p2
spread = SD = $\sqrt{\dfrac{p_1(1-p_1)}{n_1} + \dfrac{p_2(1-p_2)}{n_2}}$

shape = approx. normal if n1 and n2 are large
- check: $n_1p_1 > 5$, $n_1(1-p_1) > 5$, $n_2p_2 > 5$, $n_2(1-p_2) > 5$

for the “approx.” sampling distribution of phat1 - phat2 (p1, p2 unknown):

center = same (p1 - p2)
SD = same, but use phat1, phat2 instead of p1, p2 under the radical

shape = normal if $n_1\hat{p}_1 > 5$ and the other three counts are > 5 (same checks, but with phat instead of p)

Question 42

Q

CI two sample z procedures for proportions

Answer

A

CI
estimate +/- margin of error
$= (\hat{p}_1 - \hat{p}_2) \pm z^* \sqrt{\dfrac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$

phat1 - phat2 = estimate
z* = table value
square-root term = standard error (SE) of phat1 - phat2
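A sketch of the two-sample z interval, assuming counts x1 of n1 and x2 of n2 are known and z* comes from `statistics.NormalDist`; the demo counts are hypothetical and the function name is illustrative.

```python
import math
from statistics import NormalDist


def two_prop_z_ci(x1, n1, x2, n2, confidence=0.95):
    """(phat1 - phat2) +/- z* sqrt(phat1(1-phat1)/n1 + phat2(1-phat2)/n2)."""
    p1, p2 = x1 / n1, x2 / n2
    # large-sample check from the card: all four counts > 5
    if min(n1 * p1, n1 * (1 - p1), n2 * p2, n2 * (1 - p2)) <= 5:
        raise ValueError("counts too small for the normal approximation")
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)  # unpooled SE
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    diff = p1 - p2
    return diff - z * se, diff + z * se


# Hypothetical data: 60 of 100 in group 1, 50 of 100 in group 2
low, high = two_prop_z_ci(60, 100, 50, 100)
print(f"({low:.3f}, {high:.3f})")
```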

Question 43

Q

test of significance for two sample z procedures for proportions

Answer

A

Ho: p1 = p2, or Ho: p1 - p2 = 0
test statistic = (estimate - hypothesized value of p1 - p2) / SD expected under Ho

$z = \dfrac{\hat{p}_1 - \hat{p}_2 - 0}{\sqrt{p_1(1-p_1)/n_1 + p_2(1-p_2)/n_2}}$

problem?? we don’t know p1 and p2
=> use the pooled sample proportion $\hat{p} = (x_1 + x_2)/(n_1 + n_2)$ (x = count in the category of interest) to estimate both p1 and p2, since we assume Ho: p1 = p2 is true

standard error for phat1 - phat2:

-use the unpooled formula under the radical (phat1 and phat2 separately) when finding a CI (used many times in these cards)
-use $\sqrt{\hat{p}(1-\hat{p})(1/n_1 + 1/n_2)}$ with the pooled phat when calc. a test statistic assuming the null hypothesis is true
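A sketch of the pooled two-sample z test, assuming counts x1 of n1 and x2 of n2; the pooled phat goes into the standard error because H0: p1 = p2 is assumed true. The demo counts are hypothetical, the p-value is two-sided, and the function name is illustrative.

```python
import math
from statistics import NormalDist


def two_prop_z_test(x1, n1, x2, n2):
    """z test of H0: p1 = p2 using the pooled proportion in the standard error."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)  # pooled estimate of the common p under H0
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2 - 0) / se
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_sided


# Hypothetical data: 60 of 100 vs. 50 of 100
z, p = two_prop_z_test(60, 100, 50, 100)
print(f"z = {z:.3f}, two-sided p-value = {p:.3f}")
```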

Question 44

Q

Chi-square test for independence

Answer

A

multi sample inference for proportions: chi-squared for tables of counts
C-C

1 sample inference
–intervals and tests for a proportion (p)

2 sample inference
–intervals and tests for a diff. btwn 2 proportions (p1 - p2)

multi-sample inference
–intervals and tests for comparisons of 3 or more proportions

Question 45

Q

multiple separate SRS.

Answer

A

-multiple separate SRSs (1 from each population) with 1 categorical variable, or
-an experiment using unblocked units randomly assigned to several treatments, with a categorical response variable, or
-1 SRS with 2 categorical variables for each individual

Question 46

Q

multi sample test of significance proportions

Answer

A

Ho: there is NO association btw the 2 categorical variables (they are independent)
Ha: there is an association btw the 2 categorical variables (they are not independent)

conditions:

randomness: 1 SRS with 2 variables or multiple SRSs with 1 variable or randomized experiment with multiple treatments
large sample size: all expected counts > 5

Question 47

Q

chi-squared method

Answer

A

o = observed
e = expected (row total * column total) / grand total

expected refers to the values that would be expected if the null hypothesis were true (NO association)

chi-squared method

calculate expected counts assuming Ho is true
calculate a test statistic to measure the difference btw what we observe and what we expect if Ho were true

test statistic: $\chi^2 = \sum_{\text{all cells}} \dfrac{(O - E)^2}{E}$

use a chi-square table with (r - 1)(c - 1) degrees of freedom to get a p-value
–how likely is it to get such a big discrepancy btw observed and expected?
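The expected counts and chi-square statistic are mechanical to compute. This sketch uses a hypothetical 2 x 2 table and returns the expected counts, the statistic, and df = (r - 1)(c - 1); the p-value still comes from a chi-square table or software, since Python's standard library has no chi-square distribution. The function name is illustrative.

```python
def chi_square_independence(observed):
    """Expected counts, chi-square statistic, and df for an r x c table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand = sum(row_totals)
    # E = (row total * column total) / grand total, computed assuming H0 is true
    expected = [[r * c / grand for c in col_totals] for r in row_totals]
    x2 = sum((o - e) ** 2 / e
             for obs_row, exp_row in zip(observed, expected)
             for o, e in zip(obs_row, exp_row))
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return expected, x2, df


# Hypothetical 2 x 2 table of counts
expected, x2, df = chi_square_independence([[30, 10], [20, 20]])
print(expected)          # [[25.0, 15.0], [25.0, 15.0]]
print(round(x2, 3), df)  # 5.333 1
print("all expected counts > 5?", all(e > 5 for row in expected for e in row))
```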

Question 48

Q

chi-squared method example

Answer

A

STATE: Is there an association btw type of religion and religious knowledge?
PLAN: use a chi squared test with
Ho: there is no association
Ha: there is an association
alpha = .05
SOLVE: check conditions
- random? 4 pops. and 1 categorical variable (religion and answer to JS question)
- large? all expected counts > 5

$\chi^2 = \sum (O - E)^2/E$
- df = (4 - 1)(2 - 1) = 3

$\chi^2$ = 40 and df = 3
- p-value < .0005

CONCLUDE: reject Ho; there is evidence of an association btwn religion and religious knowledge

Question 49

Q

If chi squared answer is SMALL…

Answer

A

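the observed counts are close to the expected counts, so there is little evidence against Ho (fail to reject Ho; this does not prove Ho)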

Question 50

Q

What does this margin of error account for?

Answer

A

sampling variability only (not bias from undercoverage, nonresponse, or question wording)