new UNIT 7 stuff Flashcards

1
Q

What are the three chi-squared models?

A

goodness of fit, test for homogeneity, test for independence

2
Q

How do you know it is a GOF test?

A

When the data are ONE ROW or ONE COLUMN of counts (one categorical variable from one sample).

The problem then gives you a hypothesized ratio, like 1:2:5, or expected percents.

3
Q

How do you find the expected counts if n = 25 for a 1:3:1 ratio? What test is it?

A
GOODNESS OF FIT
Find the total: 1+3+1 = 5
Divide each part by 5 to get the expected percents:
1/5 : 3/5 : 1/5
.20 : .60 : .20
Now multiply each by n to get the expected counts:
25(.20) : 25(.60) : 25(.20) = 5 : 15 : 5
Expected counts do not have to be whole numbers (with n = 24 they would be 4.8 : 14.4 : 4.8).
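The steps above can be sketched in a few lines of Python. This is a minimal illustration, assuming the ratio is given as a list of its parts; `gof_expected_counts` is a hypothetical helper name, not a library function.

```python
def gof_expected_counts(ratio, n):
    """Expected counts for a goodness-of-fit test from a ratio like 1:3:1."""
    total = sum(ratio)  # 1 + 3 + 1 = 5
    # n * (part / total): expected proportion times sample size.
    # Written as n * part / total so integer inputs give correctly rounded results.
    return [n * part / total for part in ratio]


print(gof_expected_counts([1, 3, 1], 25))  # [5.0, 15.0, 5.0]
print(gof_expected_counts([1, 3, 1], 24))  # [4.8, 14.4, 4.8]
```

The second call shows that expected counts can be decimals; they are never rounded to whole numbers.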
4
Q

How do you find an expected cell count in a two-way table?

A

(ROW TOTAL * COLUMN TOTAL) / OVERALL TOTAL
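The row-times-column-over-total rule can be applied to every cell at once. A minimal Python sketch, assuming the observed counts are stored as a list of rows; the function name and the 2x2 table are hypothetical, made up only for illustration.

```python
def expected_counts(table):
    """Expected count for every cell: (row total * column total) / overall total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    overall_total = sum(row_totals)
    return [[r * c / overall_total for c in col_totals] for r in row_totals]


# Hypothetical observed counts (made-up numbers)
observed = [[10, 20],
            [30, 40]]
print(expected_counts(observed))  # [[12.0, 18.0], [28.0, 42.0]]
```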

5
Q

What is the difference between a test for homogeneity and a test for independence?

A

Homogeneity: two or more samples (or treatment groups), asking about ONE categorical variable. Independence: just ONE sample, with TWO categorical variables.

6
Q

What is df for goodness of fit?

A

number of cells (categories) - 1

7
Q

What is df for chi-squared homogeneity or independence?

A

(rows - 1)(columns - 1) (or: remove one row and one column and count the cells that are left)
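Both df rules are simple enough to check with a tiny sketch. The function names are hypothetical; the counts passed in are numbers of categories only, not the totals row or column.

```python
def df_goodness_of_fit(num_categories):
    """df for a GOF test: number of cells (categories) - 1."""
    return num_categories - 1


def df_two_way(num_rows, num_cols):
    """df for homogeneity or independence: (rows - 1)(columns - 1).

    Count only the category rows/columns, not the totals.
    """
    return (num_rows - 1) * (num_cols - 1)


print(df_goodness_of_fit(3))  # 2
print(df_two_way(3, 4))       # 6
```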

8
Q

What are the conditions for chi-squared?

A

Data are counts; five or more in each EXPECTED cell; independent (random sample or random assignment); n < 10% of the population when sampling without replacement.
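Of these conditions, only the expected-count one can be checked from the numbers; randomness and the 10% condition come from how the data were collected. A minimal sketch of that one check, assuming expected counts are passed either as a flat list (GOF) or as a list of rows (two-way table); the function name is hypothetical.

```python
def large_expected_counts(expected, minimum=5):
    """True if every EXPECTED count (not observed count) is at least `minimum`."""
    cells = []
    for item in expected:
        if isinstance(item, (list, tuple)):
            cells.extend(item)   # one row of a two-way table
        else:
            cells.append(item)   # one category of a GOF test
    return all(e >= minimum for e in cells)


print(large_expected_counts([5.0, 15.0, 5.0]))              # True
print(large_expected_counts([4.8, 14.4, 4.8]))              # False
print(large_expected_counts([[12.0, 18.0], [28.0, 42.0]]))  # True
```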

9
Q

What are the conditions for inference for slope (hypothesis test or confidence interval for slope)?

A

Straight enough (residual plot shows random scatter, no curve); random and independent (< 10% if sampling without replacement); equal spread (residual plot has roughly the same vertical spread all the way across); and the HISTOGRAM OF THE RESIDUALS is unimodal and roughly symmetric.

10
Q

How do you make a confidence interval with computer output?

A

STAT +/- CRIT (SE). The STAT (the slope) and its SE are given side by side in the output.
The t crit is still invT(area in one tail, n-2), ignoring the negative sign (e.g. invT(.025, n-2) for 95%).
Just put +/- t crit times the given std. error after the actual slope. The calculation is simple.

11
Q

With regression computer output, how do you find the p-value for a hypothesis test with null: slope = 0?

A

The p-value is given at the end of the row that the slope is in! It comes from the t statistic in that same row, t = (slope - 0)/SE = SLOPE/SE.

12
Q

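With regression computer output, how are the t-ratio and the p-value calculated?

A

The t-ratio is just SLOPE/SE. The p-value is tcdf(t-ratio, 9999, n-2) for a one-sided (upper-tail) test; for Ha: slope ≠ 0, double it. Most software prints the two-sided value, so check which one your output shows.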
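To see where the t-ratio and p-value come from, here is a minimal Python sketch that stands in for the calculator's tcdf. It is an assumption-laden illustration: instead of a statistics library, it integrates the Student t density numerically with Simpson's rule, and the function names are hypothetical. The numbers plugged in are the slope row (coef 5.82, SE 2.66, n = 75) of the example output used on later cards.

```python
import math


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(t * t / df))


def t_upper_tail(t, df, steps=2000):
    """P(T > t), playing the role of tcdf(t, 9999, df); Simpson's rule on [0, t]."""
    if t < 0:
        return 1.0 - t_upper_tail(-t, df, steps)  # symmetry of the t curve
    h = t / steps  # steps is even, as Simpson's rule requires
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 0.5 - total * h / 3)


def coefficient_t_test(coef, se, n, null=0.0):
    """t-ratio and p-values for Ho: coefficient = null, using df = n - 2."""
    t_ratio = (coef - null) / se
    df = n - 2
    p_upper = t_upper_tail(t_ratio, df)               # Ha: coefficient > null
    p_two_sided = 2 * t_upper_tail(abs(t_ratio), df)  # Ha: coefficient != null
    return t_ratio, p_upper, p_two_sided


t_ratio, p_upper, p_two = coefficient_t_test(5.82, 2.66, 75)
print(round(t_ratio, 3), round(p_upper, 4), round(p_two, 4))
```

For the slope row this should give a t-ratio of about 2.188 and an upper-tail p of about 0.016, in line with the 0.0159 shown in the example output; the two-sided value is twice that.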

13
Q

How do you find an expected count?

A

For GOF: expected % times the total (n).

For independence and homogeneity: (ROW TOTAL * COL TOTAL) / OVERALL TOTAL

14
Q

What are the conditions for chi-squared?

A

Independent, random, < 10%, and 5 or more in every EXPECTED cell (check expected counts, not observed counts).

15
Q

When do you have to check conditions?

A

In all inference procedures:
ANY CONFIDENCE INTERVAL OR HYPOTHESIS TEST
(including chi-squared and inference for slope)

16
Q

For the following output for the association between test score and amount of time studied, what is the equation of the LSRL?
OUTPUT: n = 75, dependent variable: test score
Predictor     Coef    SE Coef    T stat    P
Intercept     45.3    4.3        10.53     0.000
Study time    5.82    2.66       2.189     0.0159
R-sq = 77.5%    S = 7.7

A

Score HAT = 45.3 + 5.82 (hours studied)

17
Q

For the following output for the association between test score and amount of time studied, create and interpret a 95% confidence interval for slope.
OUTPUT: n = 75, dependent variable: test score
Predictor     Coef    SE Coef    T stat    P
Intercept     45.3    4.3        10.53     0.000
Study time    5.82    2.66       2.189     0.0159
R-sq = 77.5%    S = 7.7

A

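STAT +- CRIT (SE)
5.82 +- t*(2.66)
t* is just invT(.025, 73) without the negative sign, about 1.993
df is n-2 for regression!!
5.82 +- 1.993(2.66) = 5.82 +- 5.30, so about (0.52, 11.12)
Interpretation: We are 95% confident that the true slope relating test score to time studied is between about 0.52 and 11.12 points per additional hour studied.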
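The interval above can be reproduced with a short Python sketch. It assumes no statistics library is available, so it finds t* by bisection on a numerically integrated t curve (Simpson's rule) in place of the calculator's invT; all function names are hypothetical.

```python
import math


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(t * t / df))


def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0, via Simpson's rule on [0, t]."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 0.5 - total * h / 3)


def t_critical(confidence, df):
    """t* with (1 - confidence)/2 in the upper tail, like |invT((1 - C)/2, df)|."""
    target = (1 - confidence) / 2
    lo, hi = 0.0, 50.0
    for _ in range(50):  # bisection: the upper-tail area shrinks as t grows
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def slope_interval(slope, se, n, confidence=0.95):
    """STAT +- t* (SE), with df = n - 2 for regression."""
    margin = t_critical(confidence, n - 2) * se
    return slope - margin, slope + margin


low, high = slope_interval(5.82, 2.66, 75)
print(round(low, 2), round(high, 2))
```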

18
Q

For the following output for the association between test score and amount of time studied, test a hypothesis to see if there is a significant association between time studied and test score.
OUTPUT: n = 75, dependent variable: test score
Predictor     Coef    SE Coef    T stat    P
Intercept     45.3    4.3        10.53     0.000
Study time    5.82    2.66       2.189     0.0159
R-sq = 77.5%    S = 7.7

A

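Ho: β1 = 0
Ha: β1 ≠ 0   BE SURE THEY ARE BETAS (Greek β's)
Small b is a sample slope; β is the population slope.
The p-value is given (.0159), so just interpret it! Careful: software usually prints the two-sided p-value, but here .0159 matches the one-sided area tcdf(2.189, 9999, 73), so for Ha: β1 ≠ 0 the p-value is about 2(.0159) = .032. Either way p < .05, so reject Ho at α = .05: there is convincing evidence of a linear association between time studied and test score.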

19
Q

For the following output for the association between test score and amount of time studied, interpret the slope in context.
OUTPUT: n = 75, dependent variable: test score
Predictor     Coef    SE Coef    T stat    P
Intercept     45.3    4.3        10.53     0.000
Study time    5.82    2.66       2.189     0.0159
R-sq = 77.5%    S = 7.7

A

On average, for each additional hour a student studies, the model predicts a test score about 5.82 points higher.

20
Q

For the following output for the association between test score and amount of time studied, interpret the r-squared in context.
OUTPUT: n = 75, dependent variable: test score
Predictor     Coef    SE Coef    T stat    P
Intercept     45.3    4.3        10.53     0.000
Study time    5.82    2.66       2.189     0.0159
R-sq = 77.5%    S = 7.7

A

77.5% of the variability in test score can be explained by the model with hours studied.

21
Q

For the following output for the association between test score and amount of time studied, interpret the S in context.
OUTPUT: n = 75, dependent variable: test score
Predictor     Coef    SE Coef    T stat    P
Intercept     45.3    4.3        10.53     0.000
Study time    5.82    2.66       2.189     0.0159
R-sq = 77.5%    S = 7.7

A

S is the standard deviation of the residuals, or the typical residual: how far off we expect an actual data value to be from the model's predicted value. In context: actual test scores typically differ from the scores predicted by the model (based on the amount of time studied) by about 7.7 points.
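S is computed from the residuals of the least-squares line, dividing by n - 2 (the same df used for slope inference). A minimal sketch, assuming raw (x, y) data are available; the function name and the four data points are hypothetical, made up only to show the arithmetic.

```python
import math


def residual_sd(xs, ys):
    """S = sqrt(sum of squared residuals / (n - 2)) for the least-squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx             # least-squares slope
    b0 = y_bar - b1 * x_bar    # least-squares intercept
    residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]  # actual - predicted
    return math.sqrt(sum(r * r for r in residuals) / (n - 2))


# Hypothetical data: hours studied and test scores (made-up numbers)
hours = [0, 1, 2, 3]
scores = [50, 70, 60, 80]
print(round(residual_sd(hours, scores), 2))  # 9.49
```

Here the fitted line is 53 + 8(hours), the residuals are -3, 9, -9, 3, and S = sqrt(180 / 2), about 9.49 points.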

22
Q

For the following output for the association between test score and amount of time studied, interpret the y intercept in context.
OUTPUT: n = 75, dependent variable: test score
Predictor     Coef    SE Coef    T stat    P
Intercept     45.3    4.3        10.53     0.000
Study time    5.82    2.66       2.189     0.0159
R-sq = 77.5%    S = 7.7

A

The model predicts that a person who doesn’t study at all will score about a 45.3 on the test.

23
Q

For the following output for the association between test score and amount of time studied, what is the "5.82"?
OUTPUT: n = 75, dependent variable: test score
Predictor     Coef    SE Coef    T stat    P
Intercept     45.3    4.3        10.53     0.000
Study time    5.82    2.66       2.189     0.0159
R-sq = 77.5%    S = 7.7

A

That is the slope.

24
Q

For the following output for the association between test score and amount of time studied, where does the “t stat” column info come from?
OUTPUT: n = 75, dependent variable: test score
Predictor     Coef    SE Coef    T stat    P
Intercept     45.3    4.3        10.53     0.000
Study time    5.82    2.66       2.189     0.0159
R-sq = 77.5%    S = 7.7

A

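The t stat is the t score, the test statistic for null = 0:
(STAT - NULL)/SE
Intercept: (45.3 - 0) / 4.3 = 10.53
Slope: (5.82 - 0) / 2.66 = 2.188, which matches the output's 2.189 up to rounding of the displayed coefficient and SE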

25
Q

For the following output for the association between test score and amount of time studied, where does the "p" column come from?
OUTPUT: n = 75, dependent variable: test score
Predictor     Coef    SE Coef    T stat    P
Intercept     45.3    4.3        10.53     0.000
Study time    5.82    2.66       2.189     0.0159
R-sq = 77.5%    S = 7.7

A

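Each p is the p-value from a hypothesis test with Ho: that coefficient = 0 (the intercept row tests intercept = 0; the study time row tests slope = 0). If you put that row's t stat into tcdf with df = n - 2 = 73, you get the p-value in this column (here tcdf(2.189, 9999, 73) is about .0159; double it for a two-sided alternative).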

26
Q

For the following output for the association between test score and amount of time studied, make a 90% confidence interval for the predicted score for someone who studied for 8 hours.
OUTPUT: n = 75, dependent variable: test score
Predictor     Coef    SE Coef    T stat    P
Intercept     45.3    4.3        10.53     0.000
Study time    5.82    2.66       2.189     0.0159
R-sq = 77.5%    S = 7.7

A
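To make an interval for prediction, plug 8 into the equation: 45.3 + 5.82(8) = 91.86
Stand there and go up and down CRIT * S.
Use S because it is the st dev of the residuals (how far individual scores typically fall from the line).
STAT +- CRIT (SE)
91.86 +- t*(7.7)
t* is invT(.05, 73) without the negative sign, about 1.666, so 91.86 +- 12.83, or about (79.03, 104.69).
Note: using S alone is a shortcut. It ignores the uncertainty in the estimated line itself, so a full prediction interval is somewhat wider, and computing it needs the mean and spread of the study times, which this output does not give.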
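The card's shortcut (y-hat +/- t* times S) can be sketched in Python. This reproduces only that shortcut, not a full prediction interval, and it assumes no statistics library: t* comes from bisection on a numerically integrated t curve (Simpson's rule) in place of invT. All function names are hypothetical.

```python
import math


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(t * t / df))


def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0, via Simpson's rule on [0, t]."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 0.5 - total * h / 3)


def t_critical(confidence, df):
    """t* with (1 - confidence)/2 in the upper tail, like |invT((1 - C)/2, df)|."""
    target = (1 - confidence) / 2
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def rough_prediction_interval(x, n, s, intercept, slope, confidence=0.90):
    """Shortcut interval y-hat +- t* S with df = n - 2 (ignores the SE of the line)."""
    y_hat = intercept + slope * x  # plug x into the LSRL
    margin = t_critical(confidence, n - 2) * s
    return y_hat, y_hat - margin, y_hat + margin


y_hat, low, high = rough_prediction_interval(8, n=75, s=7.7, intercept=45.3, slope=5.82)
print(round(y_hat, 2), round(low, 2), round(high, 2))
```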
27
Q

For the following output for the association between test score and amount of time studied, what is the correlation coefficient?
OUTPUT: n = 75, dependent variable: test score
Predictor     Coef    SE Coef    T stat    P
Intercept     45.3    4.3        10.53     0.000
Study time    5.82    2.66       2.189     0.0159
R-sq = 77.5%    S = 7.7

A

If r-squared is .775, take the square root of that.

So r = sqrt(.775) = 0.8803 (positive, because r has the same sign as the slope, and the slope 5.82 is positive).
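The square root alone loses the sign, so a sketch has to take the sign from the slope. The function name is hypothetical.

```python
import math


def r_from_r_squared(r_squared, slope):
    """Correlation from r-squared; r takes the sign of the slope."""
    r = math.sqrt(r_squared)
    return r if slope >= 0 else -r


print(round(r_from_r_squared(0.775, 5.82), 4))  # 0.8803
```

With a negative slope, the same r-squared would give r of about -0.8803.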

28
Q

What is the common misconception about confidence intervals?

A

Students know to stand at their statistic and reach up and down by the margin of error. BUT many picture their statistic sitting in the middle of the pile of p-hats or x-bars (the sampling distribution)!
In reality, it almost definitely is NOT!!
The statistic is most likely out on one of the sides, and the interval reaches up and down trying to catch the center (the true parameter)!!! YOU ARE NOT IN THE MIDDLE OF THE PILE!!
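A quick simulation makes the point concrete. This sketch assumes a made-up normal population (mean 50, SD 10) with a known SD, so a z-interval x-bar +/- 1.96(sigma/sqrt(n)) applies; it uses an x-bar interval rather than p-hats only because that keeps the code short. It reports how often the intervals catch the true mean, how often x-bar lands within 0.1 standard errors of the center, and how far x-bar typically sits from the center.

```python
import random
import statistics


def simulate_z_intervals(mu=50.0, sigma=10.0, n=25, reps=2000, z_star=1.96, seed=1):
    """Build many x-bar +- z* sigma/sqrt(n) intervals from one hypothetical population."""
    rng = random.Random(seed)
    se = sigma / n ** 0.5
    captured = 0
    near_center = 0
    distances = []
    for _ in range(reps):
        xbar = statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - z_star * se <= mu <= xbar + z_star * se:
            captured += 1                  # this interval caught the true center
        distance = abs(xbar - mu) / se     # how far this statistic sits from mu, in SEs
        distances.append(distance)
        if distance < 0.1:
            near_center += 1               # statistic landed very close to the middle
    return captured / reps, near_center / reps, statistics.fmean(distances)


coverage, near_center, typical_distance = simulate_z_intervals()
print(coverage, near_center, round(typical_distance, 2))
```

Expect roughly 95% of intervals to catch the mean, while only a small fraction of the x-bars land near the middle; a typical x-bar sits about 0.8 standard errors away from the center.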