Exam 3 Flashcards

1
Q
  1. EDA For categorical variables - 2 charts - 1st:
A
  1. BAR CHARTS
    - -represent categories by ARBITRARY positions on horizontal line
    - -construct bar over each category such that HEIGHT is proportional to #/% in category
    - -shape, center, and spread DO NOT APPLY TO BAR CHARTS
2
Q
  1. EDA For categorical variables - 2 charts - 2nd:
A
  1. PIE CHART
    - -represent categories by ARBITRARY positions in pie
    - -construct pie section such that AREA of section is proportional to #/% in category
3
Q

which graph is better??

A

BAR CHART always better than pie

  • -bc comparing bar’s heights is easier than comparing pie slice areas
  • -bar charts are easier to label than pie charts
  • -pie charts req. lots of colors, textures
4
Q

Pictogram

A

picture enhanced bar chart

  • -can be misleading
  • -intended visual element is HEIGHT…but perceived visual element is area
5
Q

For categorical variables:

A
p = population proportion (parameter)
phat = sample proportion (Statistic)

phat = # of indiv. in category of interest / # of indiv. in sample

ex. p = proportion of all BYU students who are married
p hat = proportion of students in a random sample of 300 BYU students who are married

6
Q

proportion sampling variability

A
  1. parameters typically UNKNOWN
    - -bc usually impossible to know exactly what values a var. takes for every member of pop.
  2. statistics are computed from the sample
    - -vary from sample to sample due to sampling variability

we want to understand how statistics behave relative to the parameter

7
Q

sampling distribution of phat

A

–theoretical probability distribution
describes distribution of: ALL sample proportions from ALL possible random samples of the same size taken from a population

CENTER: Mean (phat) = p

SPREAD = st. dev. of sampling distribution of phat
= SD(phat) = radical(p(1 - p) / n)

SHAPE: approx. normal if n is large, but how large depends on how close p is to .5

check: np > 10, n(1-p) > 10
- -need larger n for normality when p is close to zero or one
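A minimal sketch of the center, spread, and shape check above. The function name and the example values of p and n are hypothetical, chosen only to show how the np > 10 check fails when p is near zero.

```python
import math

def phat_sampling_distribution(p, n):
    """Return (mean, SD, normality check) for the sampling distribution of phat."""
    mean = p                                   # CENTER: Mean(phat) = p
    sd = math.sqrt(p * (1 - p) / n)            # SPREAD: radical(p(1 - p) / n)
    approx_normal = n * p > 10 and n * (1 - p) > 10  # SHAPE check from the card
    return mean, sd, approx_normal

# Hypothetical values: same n, but p near .5 vs. p near zero
print(phat_sampling_distribution(0.5, 300))    # check passes
print(phat_sampling_distribution(0.02, 300))   # check fails: np = 6
```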

8
Q
  1. one sample z confidence interval for proportions
A
  • -C.I. estimate for the pop. proportion “p”
    1. investigate sampling distribution of phat for SRS from pop. of interest
    2. use sampling distribution to develop CI for p
SPREAD = radical(phat(1 - phat) / n)
SHAPE = np > 10, n(1 - p) > 10
9
Q

C.I. formula for proportion

A

phat +/- z* radical(phat(1 - phat) / n)

phat = point estimate of p (pop. proportion)
z* = multiplier
st. dev. part = standard error of phat = estimate using sample data, of st. dev. of sampling distribution of phat

everything after +/- = m (margin of error) - measures max. diff. that could exist btw phat and p at a specified level of confidence
= table value multiplier * standard error

10
Q

4 steps for C.I. proportions

A
  1. STATE - specific parameter of interest
  2. PLAN - choose procedure, level of confidence
  3. SOLVE - collect data, check conditions, and calc. interval
  4. CONCLUDE - interpret C.I.
11
Q

CI proportions example

A

US senators voted 54-46 against plan to expand background checks for gun buyers - NYT news poll taken 2013 asked 965 randomly selected adults whether they favor/oppose federal law req. background checks on all potential gun buyers
–87% favored

STATE: what % of U.S. adults favor a federal law req. background checks for all potential gun buyers?

PLAN: Construct a 95% large-sample z confidence interval for p, proportion of all U.S. adults who favor background checks for potential gun buyers

phat = 87%, sample size = 965, confidence level = 95%

SOLVE: conditions:
1. SRS = yes! 965 randomly selected adults
2. sampling distribution approx. normal?
965(.87) = 839.55 > 10 YES, 965(.13) = 125.45 > 10 YES!

CI = phat +/- z* radical(phat(1 - phat) / n)
= .87 +/- 1.96 radical((.87)(.13) / 965) = (0.849 , 0.891)

CONCLUDE: we are 95% confident that the true proportion of US adults who favor background checks for buyers is btw. .849 and .891 in April 2013
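The SOLVE step above can be checked with a short sketch. The function name is hypothetical; the inputs (phat = .87, n = 965, z* = 1.96) come from the card.

```python
import math

def one_prop_z_interval(phat, n, z_star):
    """One-sample z confidence interval for a proportion."""
    se = math.sqrt(phat * (1 - phat) / n)   # standard error of phat
    m = z_star * se                         # margin of error
    return phat - m, phat + m

low, high = one_prop_z_interval(0.87, 965, 1.96)
print(round(low, 3), round(high, 3))  # 0.849 0.891
```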

12
Q

sample size determination in proportions

A

margin of error:

m = z* radical(p*(1 - p*) / n)
-->
n = (z*/m)^2 * p*(1 - p*)

p* = best guess for p (bc not p hat bc haven’t taken sample yet and not p bc don’t know pop. parameter)

setting p* = .5 always produces sample size that, if anything, is a little too large (so no harm)

13
Q

ex. with finding sample size with margin of error

A

want to estimate p with 95% confidence and margin of error of 3% - what size sample do you need?

n = (1.96 / .03)^2 * .5(1 - .5) = 1067.11 = (1068) —> ALWAYS round UP

p* look at prior info. if possible, otherwise use p* = .5 and 95% CI

if n INC. then m DEC.
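A small sketch of the sample size calculation, assuming z* = 1.96 and p* = .5 as in the example. The function names are hypothetical. The second function shows how the margin of error shrinks as n grows.

```python
import math

def sample_size_for_proportion(m, z_star=1.96, p_guess=0.5):
    """n = (z*/m)^2 * p*(1 - p*), always rounded UP."""
    n = (z_star / m) ** 2 * p_guess * (1 - p_guess)
    return math.ceil(n)

def margin_of_error(n, z_star=1.96, p_guess=0.5):
    """m = z* radical(p*(1 - p*) / n)."""
    return z_star * math.sqrt(p_guess * (1 - p_guess) / n)

print(sample_size_for_proportion(0.03))                # 1068
print(margin_of_error(1000) > margin_of_error(2000))   # True: larger n, smaller m
```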

14
Q

One sample z test for pop. proportion

A
  • -begin with claim about parameter value
  • -take SRS and compute statistic (s) value
  • -use sampling distribution of stat —> compute prob. of getting stat. value if claim about parameter value is TRUE
  • -if prob. unlikely, conclude that claim about parameter value is incorrect —> reject H0

STATE - specify claim about parameter of interest
PLAN - choose procedure, specify H0, Ha, alpha
SOLVE - check conditions, test stat. and p-value
CONCLUDE - compare p-value to alpha, interpret test results

15
Q

conditions and test stat. formula in one sample z test for pop. proportion

A

conditions:

  1. SRS?
  2. Normality? np0 > 10, n(1 - p0) > 10

test stat.
z = (phat - p0) / radical(p0(1 - p0) / n)

pval < alpha = reject H0 = statistically significant
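A sketch of the test statistic and a one-sided p-value (Ha: p > p0). The numbers (phat = .56, p0 = .5, n = 400) are made up for illustration, and the function names are hypothetical. The normal tail area uses the standard library's erfc.

```python
import math

def normal_upper_tail(z):
    """P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def one_prop_z_test(phat, p0, n):
    """z statistic and upper-tail p-value for Ho: p = p0 vs. Ha: p > p0."""
    z = (phat - p0) / math.sqrt(p0 * (1 - p0) / n)  # SD uses p0, not phat
    return z, normal_upper_tail(z)

# Hypothetical data
z, p_value = one_prop_z_test(0.56, 0.5, 400)
alpha = 0.05
print(round(z, 2), round(p_value, 4), p_value < alpha)
```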

16
Q
  1. Role-type classifications; EDA or C to Q data
A
# of variables 1 = pattern of interest: distribution
# of variables 2 (for each indiv.) = pattern of interest: relationship (want to study relationship btw variables using visual displays and numerical summaries)
17
Q

relationships

A

goals: characterize relationship
- -predict one from other
- -investigate cause-effect relationship

if prediction or cause-effect analysis is the goal, one variable is the RESPONSE and one is the EXPLANATORY

Y - response = outcome of the study
X - explanatory = used to predict or explain changes in response variable

18
Q

response and explanatory variables chart

A

                        RESPONSE
                        categorical   quantitative
EXPLANATORY   cat.      C - C         C - Q
              quant.    Q - C         Q - Q

C-Q and Q - Q important in this class

whether women more talkative than men?
–explanatory = gender (categorical) and response = level of talkativeness (quantitative)
= C - Q

19
Q

C - Q

A

categorical explanatory variable and quantitative response variable
–visual display tool: side by side box plots

–numerical summary tool: 5 # summary or 2 # summary (mean and SD) for each category

20
Q
  1. Matched Pairs t-procedures for means
A

observational data:

  • -Individuals grouped in sets of 2
  • -1 indiv. in each set has 1 of 2 conditions to be compared

experimental data

  • -units come in sets of 2 (twins, pairs of arms)
  • -1 unit in each set randomly assigned to each of 2 treatments
21
Q

one sample t-procedures for MU (in matched pairs t-procedures)

A

C.I.
= xbar +/- t* (s / radical n)

test of significance
Ho: Mu = Mo
Ha: Mu > Mo (or

22
Q

randomized block design with 2 treatments or 2 measurements

A

blocks (pairs)

  • -2 matched individuals
  • -one individual and 2 treatments
  • -one individual: pre and post measurements

randomization

  • -randomly assign treatments to individuals within each pair
  • -randomly assign order of treatments
  • -randomly select individuals

matched pairs: 2 subjects
or matched pairs: one subject, 2 treatments

mean and st. dev. are computed from the differences

23
Q

procedures for mean difference: (Md)

A

C.I
dbar +/- t* (Sd / radical n)

test
Ho: Md = 0
Ha: Md > 0 (or < or not equal to)
t = dbar / (Sd / radical n)

state - plan - solve - conclude

24
Q

C.I. example for Md (left vs. right)

A

have two identical knobs - one right (clockwise) turn and one left
–25 right handed students turn knob specified distance with right hand
(order of knobs random)
–time for each turn is the response variable
–diff. of left-right computed and analyzed

  1. STATE:
    - -what is the mean difference in time required for right handed students to turn a knob to the left vs. to the right
  2. PLAN:
    estimate the Md with a 95% confidence interval
3. SOLVE:
data collected
dbar = 13.32 seconds, n = 25, level = 95%
Sd = 22.94 seconds
--plot data with dot plot

conditions? SRS, YES!, Normal? Yes! - dot plot had no OUTLIERS

interval: dbar +/- t* Sd / radical n, df = 25-1 = 24

= 13.32 +/- (2.064)*(22.94 / radical 25) = 13.32 +/- 9.47

  4. CONCLUDE:
    We are 95% confident that the true mean difference btw left and right times is btwn 3.85 and 22.79 seconds
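The interval above can be reproduced with a short sketch. The function name is hypothetical; dbar, Sd, n, and t* = 2.064 (df = 24) are the values given on the card.

```python
import math

def mean_diff_interval(dbar, sd, n, t_star):
    """Matched pairs t interval: dbar +/- t* (Sd / radical n)."""
    m = t_star * sd / math.sqrt(n)   # margin of error
    return dbar - m, dbar + m

low, high = mean_diff_interval(13.32, 22.94, 25, 2.064)
print(round(low, 2), round(high, 2))  # 3.85 22.79
```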
25
Q

ex. matched pairs t-test

A

Make cola - 1. right after produced, or 2. one month later
–diff. = fresh - stored, n = 10

  1. STATE: is there evidence that cola lost sweetness during storage?
  2. PLAN: two measurements on each batch = fresh and stored
    - -perform matched pairs t test on Md
parameter: Md = mean difference in sweetness of all cola after one month
di = fresh - stored
Ho: Md = 0
Ha: Md > 0
alpha = .05
  3. SOLVE: conditions = SRS yes, plot data: no outliers YES
    -dbar = .30, Sd = 1.16
    t = (dbar - Md) / (Sd/radical n)
    t = (.30 - 0) / (1.16 / radical 10) = 0.818

p-value = .200 < p-value < .250

  4. CONCLUDE: p-value > alpha, so fail to reject Ho and conclude that evidence is not strong enough to say cola lost sweetness after one month of storage
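A sketch of the test statistic from the SOLVE step. The function name is hypothetical. The critical value 1.833 is the standard t-table entry for a one-sided test at alpha = .05 with df = 9; it is not given on the card and is included here only for comparison.

```python
import math

def matched_pairs_t(dbar, sd, n, md0=0.0):
    """t = (dbar - Md) / (Sd / radical n), with Md = md0 under Ho."""
    return (dbar - md0) / (sd / math.sqrt(n))

t = matched_pairs_t(0.30, 1.16, 10)
T_CRIT = 1.833  # assumed from a standard t table: one-sided alpha = .05, df = 9

print(round(t, 3), t > T_CRIT)  # 0.818 False -> fail to reject Ho
```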
26
Q
  1. 2 sample t-procedures for means
A

One sample inference: intervals and tests for mean (Mu)
–application: matched pairs intervals and tests for a mean diff. (Md)

two-sample inference: intervals and tests for a difference btw two means (M1 - M2)

matched pairs
–2 SRS of pairs, one individual for each condition, or experiment using paired units - 1 unit randomly assigned to each treatment

two sample inference

  • -2 SRS - 1 from each population or
  • -experiment using unpaired units - half randomly assigned to each treatment
27
Q

population symbols for two sample inferences

A

                   Pop. 1    Pop. 2
pop. mean          Mu 1      Mu 2
COMMON pop. SD     sigma     sigma    (only one that's the same)
sample size        n1        n2
sample mean        xbar 1    xbar 2
sample SD          s1        s2

Mu 1 - Mu 2 = diff. btw 2 population means
xbar 1 - xbar 2 = dif. btw sample means

  1. investigate sampling distribution of xbar 1 - xbar 2 for SRS from pop. of interest
  2. use sampling distribution to develop C.I. for Mu 1 - Mu 2
  3. use sampling distribution to develop test of significance for Mu 1 - Mu 2
28
Q

sampling distribution of xbar1 - xbar2

A
  1. take SRS of size n1 from pop. 1
  2. same from pop. 2 (n2)
  3. both pop. normally distributed with no outliers check
  4. find xbar1 - xbar2

center = mean distribution of xbar1 - xbar2 = Mu1 - Mu2

spread = SD = radical((sigma^2/n1)+(sigma^2/n2))
OR sigma*radical((1/n1) + (1/n2))

shape = approx. normal if both n1 and n2 are at least 30

how do we estimate sigma?
Sp = radical(((n1 - 1)s1^2 + (n2 - 1)s2^2) / (n1 + n2 - 2))

29
Q

xbar1 - xbar2 C.I. and test formulas

A

C.I.
= (xbar1 - xbar2) +/- t* Sp*radical(1/n1 + 1/n2)
df = n1 + n2 - 2

test!
Ho: Mu1 = Mu2 or Mu1 - Mu2 = 0
Ha: Mu1 > Mu2 or Mu1 does not = Mu2

t = (xbar1 - xbar2) / (Sp*radical(1/n1 + 1/n2))

conditions:

  1. randomness of data collection? - SRS or treatment SRS
  2. normality of pop. or large sample size - check by making sure there are no outliers or both sample sizes > 30
  3. equal pop. st. dev. (Sigma) -
    - –check by (larger s) /( smaller s) < 2
30
Q

Example of xbar1 - xbar2 test

A
  1. STATE: does antidepressant cause an INC. in water consumption? use alpha = .05
  2. PLAN: Use a two-sample t test for means
    –let Mud = mean water intake for rats in drug group
    –Mup = mean water intake for rats in placebo group
    (so this was SRS of two treatments)

parameter: Mud - Mup
Ho: Mud - Mup = 0, or Mud = Mup
Ha: Mud - Mup > 0 , or Mud > Mup
alpha = .05

  3. SOLVE
    Check: SRS?, Normal and no outliers, and same pop. st. dev. (check by large s / small s = .750 / .564 = 1.33 < 2, so good)

          drug       placebo
xbar      8.48 ml    7.93 ml
s         .750 ml    .564 ml
n         10         10

pooled SD: Sp = radical(((n1 - 1)s1^2 + (n2 - 1)s2^2) / (n1 + n2 - 2))
= radical(((10 - 1).75^2 + (10 - 1).564^2) / (10 + 10 - 2))
= .664

t = (xbard - xbarp - 0) / (Sp*radical(1/nd + 1/np))
t = (8.48 - 7.93 - 0) / (.664*radical(1/10 + 1/10)) = 1.852
df = 10 + 10 - 2 = 18

p-value = .025 < p-value < .05

  4. CONCLUDE: p-value < .05 so reject Ho - evidence that the antidepressant increases water consumption
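A sketch of the pooled SD and t statistic for this example. The function names are hypothetical; the summary statistics are the ones on the card. Carrying full precision for Sp gives t of about 1.853 rather than the card's 1.852, which uses the rounded Sp = .664.

```python
import math

def pooled_sd(s1, n1, s2, n2):
    """Sp = radical(((n1 - 1)s1^2 + (n2 - 1)s2^2) / (n1 + n2 - 2))."""
    return math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))

def two_sample_t(xbar1, s1, n1, xbar2, s2, n2):
    """Pooled two-sample t statistic for Ho: Mu1 - Mu2 = 0."""
    sp = pooled_sd(s1, n1, s2, n2)
    return (xbar1 - xbar2) / (sp * math.sqrt(1 / n1 + 1 / n2))

sp = pooled_sd(0.750, 10, 0.564, 10)
t = two_sample_t(8.48, 0.750, 10, 7.93, 0.564, 10)
df = 10 + 10 - 2
print(round(sp, 3), round(t, 2), df)  # 0.664 1.85 18
```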
31
Q

Example of xbar1 - xbar2 C.I.

A

Sp = .6635, t* = 1.734
Xbard - Xbarp +/- t* Sp*radical(1/nd + 1/np)
= (8.48 - 7.93) +/- (1.734)(.6635)*radical(1/10 + 1/10)

CI (.036, 1.065) does not include 0, so Mud does not equal Mup
–this confirms significance test of rejecting Ho
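A sketch of the interval arithmetic, using Sp = .6635 and t* = 1.734 as given on the card. The function name is hypothetical. The endpoints agree with the card's interval up to rounding in the third decimal.

```python
import math

def two_sample_interval(xbar1, xbar2, sp, n1, n2, t_star):
    """(xbar1 - xbar2) +/- t* Sp radical(1/n1 + 1/n2)."""
    m = t_star * sp * math.sqrt(1 / n1 + 1 / n2)   # margin of error
    diff = xbar1 - xbar2
    return diff - m, diff + m

low, high = two_sample_interval(8.48, 7.93, 0.6635, 10, 10, 1.734)
print(low, high)
print(low > 0)  # interval lies entirely above 0
```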

32
Q
  1. One way ANOVA - comparing several means
A

remember the chart with C - C, C - Q, Q - C, Q - Q

One-sample inference - intervals and tests for a mean (Mu)

two-sample inference: intervals and tests for a DIFFERENCE btwn 2 means (Mu1 - Mu2)

multi-sample inference: intervals and tests for comparisons of 3 or more means (Mu1 - mu3, Mu1 - Mu2, Mu2 - Mu3, 1/2(Mu1 + Mu2))

33
Q

diff. btw 2 sample inference and multi-sample inference

A

2 sample inference - 2 separate SRSs - 1 from each population
–OR experiment using unpaired units (half randomly assigned to each treatment)

multi-sample inference

  • -3 or more separate SRSs (1 from each population)
  • -OR experiment using unblocked units (randomly assigned to 3 or more treatments)

most scientific studies involve 3 or more groups - However: inferences and related issues are much more complicated for multi-sample studies

  • -complete discussion beyond scope of the course
  • -we will discuss just 1 useful test of significance
34
Q

three two-sample t-tests of significance

A

Ho: M1 = M2 –> (xbar1 - xbar2) / sp*radical(1/n1 + 1/n2) gives p-value 1

Ho: M1 = M3 –> (xbar1 - xbar3) / sp*radical(1/n1 + 1/n3) gives p-value 2

Ho: M2 = M3 –> (xbar2 - xbar3) / sp*radical(1/n2 + 1/n3) gives p-value 3

3 Ho's and 3 p-values: don’t know which p-value to use

  • -multiple tests - the more tests performed…the
    1. greater probability of observing an extreme statistic due to chance
    2. the greater probability of declaring significance for at least one test when all diff. are really due to chance alone

needed: one overall test (one null hypothesis, one test stat, one p-value) to TEST EQUALITY OF 3+ MEANS
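A rough sketch of why multiple tests inflate the chance of a false "significant" result. It assumes the tests are independent, which is a simplification (the three pairwise t tests share data), so the number is only illustrative. The function name is hypothetical.

```python
def prob_at_least_one_false_positive(alpha, k):
    """Chance that at least one of k independent tests rejects when every Ho is true."""
    return 1 - (1 - alpha) ** k

# Three tests at alpha = .05 each (independence assumed)
print(round(prob_at_least_one_false_positive(0.05, 3), 4))  # 0.1426
```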

35
Q

over all test and analysis for more than 1 mean and p-value

A
  1. overall test
    - -test procedure: one-way analysis of variance (ANOVA)
    - -test stat: F ratio of variances
  2. follow up analysis
    –if overall test is significant: comparison of CIs for individual means can shed some light on the general question of difference among Mu's by testing…
    Ho: Mu1 = Mu2 = Mu3 vs. Ha: at least one Mu is diff. from the others
36
Q

ANOVA test of significance

A

conditions:

  1. random: SRS or random allocation
  2. pop. normally distributed or large sample size = no outliers in plots of data or sample sizes > 30
  3. st. dev. of pop. approx. =
    - –so check that (largest s) / (smallest s) < 3

test stat called “F” or “ANOVA F”

  • -calc. F called analysis of variance (ANOVA)
  • -basic idea of F: compare variation among xbars to variation expected due to randomness
  • -formula for F and associated p-value = use one-way ANOVA software

IF

  • -p-value > alpha done!! (can’t reject hypothesis that pop. means are =)
  • -p-value < alpha - only know at least one comparison of means is diff. from 0 - look at the CI or draw box plots to know which one is off
  • -HINT: F is always in box on top right and you never have to solve for it
  • -you will know you have to reject Ho but to see which is off look at box plots - if they overlap then diff. of means is not statistically significant - if do not overlap the means differ significantly
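The card leaves the F formula to software. As an illustration of the basic idea only, here is a sketch of the usual one-way ANOVA F ratio (variation among group means over variation within groups). The function name and the three small groups are made up for the example. Getting a p-value still needs an F table or software.

```python
def anova_f(groups):
    """One-way ANOVA F = (between-group mean square) / (within-group mean square)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # variation among the xbars, weighted by group size
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # variation of observations around their own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

# Hypothetical data for three groups
print(anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]]))
```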
37
Q
  1. 2 way tables and conditional distributions, C - C
A

2 categorical variables in each individual (ex. handedness and birth type [single vs. twins])
–investigate relationship btwn variables using visual displays and numerical summaries

  1. two way table of counts
    - -summarizes C-C relationship
  2. the explanatory variable is usually the row variable (gender) and the response variable is the column (opinion on beards)
  3. 2-way rectangular table of combined categories
  4. count individuals in each combined category
  5. sum across rows and over columns to get marginal totals
  6. roles of row and column variables can be switched

marginal total for females

  • -numerical summary tool: conditional distributions for rows and columns
  • -visual display tool: grouped bar charts, stacked bar charts, others
38
Q
  1. conditional distributions
A
  • -divide cell counts by row total to get conditional distributions
  • -evaluate C-C relationship by comparing
  • -if conditional distributions are diff. there is a potential relationship or association

for visual display: grouped/stacked bar charts

39
Q

C-C summary

A
  • -summarize in 2-way table
  • -calculate conditional distribution of response variable for each value of explanatory variable
  • -if conditional distributions are diff, there is potential connection btw categorical variables
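A sketch of computing conditional distributions of the response for each value of the explanatory variable. The counts below are invented for illustration (birth type as rows, handedness as columns), and the function name is hypothetical.

```python
def row_conditional_distributions(table):
    """Divide each cell count by its row total."""
    result = {}
    for row_label, cells in table.items():
        row_total = sum(cells.values())
        result[row_label] = {col: count / row_total for col, count in cells.items()}
    return result

# Hypothetical two-way table of counts
counts = {
    "single": {"left": 10, "right": 90},
    "twin": {"left": 20, "right": 80},
}
for row_label, dist in row_conditional_distributions(counts).items():
    print(row_label, dist)
```

If the rows printed for "single" and "twin" differ noticeably, that points to a potential association.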
40
Q
  1. two sample z procedures for proportions
A
  1. investigate sampling distribution of phat1 - phat2 for SRS from 2 populations of interest or randomized controlled experiment with 2 treatments
  2. use sampling distribution to develop a CI for p1 - p2
  3. use sampling distribution to develop a test of significance for p1 - p2

diff. btw proportion of doctors taking aspirin who had heart attacks and proportion of doctors receiving placebo who had heart attacks
phat1 - phat2 = .009 - .017 = -.008

41
Q

sampling distribution of phat1 - phat2

A
  1. take SRS of size n1 from pop. 1 - observe categorical variable
  2. take separate SRS of size n2 from pop. 2 and observe categorical variable
  3. compute phat1 - phat2
center = mean is p1 - p2
spread = SD is radical(p1(1 - p1)/n1 + p2(1 - p2)/n2)

shape - approx. normal if n1 and n2 are large
–check by n1p1 >5, n1(1-p1) > 5, n2p2 > 5, n2(1-p2) >5

for “approx.” sampling distribution of phat1 - phat2

center = same (p1 - p2)
SD = same but use phat instead of p under the radical

shape = normal if n1phat1 > 5 and all others (same but use phat instead of p)

42
Q

CI two sample z procedures for proportions

A

CI
estimate +/- margin of error
= phat1 - phat2 +/- z* radical(phat1(1 - phat1)/n1 + phat2(1 - phat2)/n2)

phat1 - phat2 = estimate
z* = table value
radical part = standard error

43
Q

test of significance for two sample z procedures for proportions

A

Ho: p1 = p2, or Ho: p1 - p2 = 0
test statistic = (estimate - hypothesized value of p1 - p2) / SD expected under Ho

z = (phat1 - phat2 - 0) / radical(p1(1 - p1)/n1 + p2(1 - p2)/n2)

problem?? we don’t know p1 and p2
==use phat, the pooled sample proportion (total # in category of interest / (n1 + n2)), to estimate p1 and p2 as we assume Ho: p1 = p2 to be true

standard error for phat1 - phat2

  • -is the whole SD formula under the radical when finding CI (used lots of times in cards)
  • -or use radical(phat(1 - phat)(1/n1 + 1/n2)) when calc. a test statistic assuming the null hypothesis is true
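A sketch contrasting the two standard errors: unpooled for the CI, pooled for the test statistic. The counts (30 of 200 vs. 50 of 200) are hypothetical, as are the function names.

```python
import math

def unpooled_se(phat1, n1, phat2, n2):
    """Standard error used in the CI for p1 - p2."""
    return math.sqrt(phat1 * (1 - phat1) / n1 + phat2 * (1 - phat2) / n2)

def pooled_se(x1, n1, x2, n2):
    """Standard error used in the test, assuming Ho: p1 = p2."""
    phat = (x1 + x2) / (n1 + n2)   # pooled sample proportion
    return math.sqrt(phat * (1 - phat) * (1 / n1 + 1 / n2))

def two_prop_z(x1, n1, x2, n2):
    """z = (phat1 - phat2 - 0) / pooled standard error."""
    return (x1 / n1 - x2 / n2) / pooled_se(x1, n1, x2, n2)

# Hypothetical counts
x1, n1, x2, n2 = 30, 200, 50, 200
z = two_prop_z(x1, n1, x2, n2)
m = 1.96 * unpooled_se(x1 / n1, n1, x2 / n2, n2)   # 95% margin of error
print(round(z, 2), round(m, 3))
```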
44
Q
  1. Chi-square test for independence
A

multi sample inference for proportions: chi-squared for tables of counts
C-C

1 sample inference
–intervals and tests for a proportion (p)

2 sample inference
–intervals and tests for a diff. btwn 2 proportions (p1 - p2)

multi-sample inference
–intervals and tests for comparisons of 3 or more proportions

45
Q

multiple separate SRS.

A
  • -1 from each population, categorical variable or experiment using unblocked units
  • -randomly assigned to several treatments, categorical response variable or
  • -1 SRS, 2 categorical variables for each individual
46
Q

multi sample test of significance proportions

A

Ho: there is NO association btw the 2 categorical variables (they are independent)
Ha: there is an association btw the 2 categorical variables (they are not independent)

conditions:

  1. randomness: 1 SRS with 2 variables or multiple SRSs with 1 variable or randomized experiment with multiple treatments
  2. large sample size = all expected counts > 5
47
Q

chi-squared method

A
o = observed
e = expected = (row total * column total) / grand total

expected refers to values that would be expected if the null hypothesis were true (NO association)

chi-squared method

  1. calculate expected counts assuming Ho is true
  2. calculate a test statistic to measure the difference btw what we observe and what we expect if Ho were true

test statistic = x^2 = sum over all cells of (O - E)^2 / E

use a chi-square table w (r-1)*(c-1) degrees of freedom to get a p-value
–how likely is it to get such a big discrepancy btw observed and expected?
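The two steps above can be sketched as follows. The 2 x 2 table of observed counts is hypothetical and the function name is an assumption. The p-value lookup in a chi-square table is left out.

```python
def chi_square_statistic(observed):
    """Return (x^2, df) for a two-way table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    x2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total  # expected count under Ho
            x2 += (o - e) ** 2 / e
    df = (len(observed) - 1) * (len(observed[0]) - 1)        # (r-1)*(c-1)
    return x2, df

# Hypothetical observed counts
print(chi_square_statistic([[30, 20], [20, 30]]))  # (4.0, 1)
```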

48
Q

chi-squared method example

A
  1. STATE: Is there an association btw type of religion and religious knowledge?
  2. PLAN: use a chi squared test with
    Ho: there is no association
    Ha: there is an association
    alpha = .05
  3. SOLVE: check conditions
    - -random? 4 pop. and 1 categorical variable (religion and answer to JS question)
    - -large? all expected counts > 5

x^2 test = sum ((O - E)^2 / E)
–df = (4-1) * (2-1) = 3

x^2 = 40 and df = 3
--p-value < .0005
  4. CONCLUDE: reject Ho - evidence of association btwn religion and religious knowledge
49
Q

If chi squared answer is SMALL…

A

it supports Ho

50
Q

What does this margin of error account for?

A

sampling variability