t tests Flashcards
two ways to generate two sets of scores
- repeated measures design
- independent groups design
repeated measures design
- every subject is exposed to each treatment condition and scores measured
- comparisons are between scores for the same individuals under different conditions
- analyze using paired t test
independent groups design
- each subject is exposed to a single treatment condition and scores measured
- comparisons are between scores from different individuals under different conditions
- analyze using the independent t test
what are some sources of variation between scores?
effects of treatment
individual differences
- differences in baseline score
- differences in responsiveness to treatment
effects associated with uncontrolled variables
measurement error
repeated measures eliminates variation due to individual differences!
repeated measures: calculating difference (D) scores
Di = yBi - yAi
difference = (condition B) - (condition A)
this converts two sets of scores (conditions A and B) into a single set of scores (D)
- instead of comparing variation between two sets of raw scores, we can examine variation in a single set of D scores
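a minimal sketch in R (hypothetical study1 data frame with one column per condition, matching the t test examples later on):
study1 <- data.frame(a = c(10, 12, 9, 14),   # scores under condition A
                     b = c(13, 15, 11, 16))  # scores under condition B
D <- study1$b - study1$a   # one D score per subject
mean(D)                    # D-bar, the mean difference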
is there an effect of treatment?
Is there a difference between scores measured under the two conditions?
if there is no effect of treatment, the two sets of scores are identical
Di = yBi - yAi = 0
we wouldn't expect all D scores to be exactly zero, but the mean of all D scores should be approximately 0
null hypothesis: H0: muD=0
how to test null hypothesis
determine the probability that the observed sample mean (D-bar) would have been obtained from a population where muD = 0
- p value is the probability of obtaining observed data, if H0 is true
2 approaches to test H0
- 95% CIs: use the sample data to obtain a sampling distribution (similar to the DSM) with a mean of D-bar, then determine the location of muD = 0 (H0) within this distribution
- hypothesis testing: position the same sampling distribution with a mean of muD = 0 (H0), then determine the location of the observed D-bar within this distribution
GLM for D scores
the GLM is:
Di = D-bar + errori
based on D scores, not raw scores
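one way to see this model in R: an intercept-only lm() fit to the D scores estimates D-bar and tests it against 0 (a sketch with hypothetical D scores):
D <- c(3, 3, 2, 2, 4, 1)  # hypothetical D scores

# intercept-only GLM: Di = b0 + errori, where b0 estimates D-bar
fit <- lm(D ~ 1)
summary(fit)  # the intercept's t and p values match t.test(D)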
bootstrapping 95% CI for muD
- use the observed D scores to generate an infinite hypothetical population of D scores
- randomly sample n = 16 D scores (matching the original sample size) from the population and calculate D-bar
- repeat many times, generating a new random sample and calculating D-bar each time
- plot the resulting distribution of D-bar values
- instead of the distribution of sample means (DSM), this is the distribution of mean differences (DMD)
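a sketch of this resampling loop in R (hypothetical D scores; resampling the observed scores with replacement stands in for sampling from the infinite hypothetical population):
set.seed(1)
D <- rnorm(16, mean = 2, sd = 3)   # hypothetical sample of n = 16 D scores

# resample n D scores with replacement and take the mean, many times
DMD <- replicate(10000, mean(sample(D, size = length(D), replace = TRUE)))

hist(DMD)  # the distribution of mean differences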
bootstrapping 95% CI in R
boot.t.test()
- calculates the standard deviation of the DMD (the standard error)
- gives a precise p value for the probability of obtaining the observed data, if the null hypothesis is true
how to test H0
DMD estimates the distribution of all sample means that would be obtained from a population that matched our sample
we can locate 0 in the distribution and focus on the difference between D-bar and 0
subtract the value of D-bar from all values in the distribution
- this preserves the difference between D-bar and 0, but now 0 is the mean of the DMD
if D-bar - muD is small, the observed data are likely if H0 is true
if D-bar - muD is large, the observed data are unlikely if H0 is true
standardizing D-bar - muD
convert scores to z scores
zi = (yi - y-bar) / Sy
subtract the mean of the DMD (0) then divide by standard deviation of the DMD
the standard deviation of the DMD represents the average value of D-bar - muD expected due to sampling variation
- the standard deviation is, in effect, the average deviation score
standardized values tell us how large the observed difference between D-bar and muD = 0 is, relative to the average difference expected if H0 is true
- e.g., if the standardized value is 2, the observed difference D-bar - muD is twice as large as the average difference that would be expected due to sampling variation, if H0 is true
central limit theorem and t statistics
t = (D-bar - muD) / Sd-bar = (D-bar - muD) / (Sd / sqrt(n)); under H0, muD = 0, so t = D-bar / Sd-bar
if t=2, the observed difference between D-bar and H0 is twice as large as the average difference expected due to sampling variation
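as a sketch in R (hypothetical D scores; muD = 0 under H0):
D <- c(3, 3, 2, 2, 4, 1, 3, 2)   # hypothetical D scores
n <- length(D)

se     <- sd(D) / sqrt(n)        # Sd-bar, the standard error
t_stat <- (mean(D) - 0) / se     # muD = 0 under H0
t_stat                           # matches t.test(D)$statistic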
t distribution
start with a normally-distributed population of D scores with muD = 0
- the population has a normal distribution, an assumption of the CLT
- the population can have any standard deviation, as the t statistic standardizes the value of D-bar - muD based on the observed value of Sd
define the sample size (n)
randomly sample n D scores from the population and calculate t for the sample: t = (D-bar - muD) / Sd-bar
repeat one million times and plot the distribution
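a sketch of this simulation in R (a normal population with muD = 0; the sd of 5 is arbitrary, since t standardizes by the observed Sd):
set.seed(1)
n <- 16

one_t <- function() {
  D <- rnorm(n, mean = 0, sd = 5)     # sample n D scores from the population
  (mean(D) - 0) / (sd(D) / sqrt(n))   # t = (D-bar - muD) / Sd-bar
}

t_vals <- replicate(1e6, one_t())
hist(t_vals, breaks = 200)  # approximates the t distribution with df = n - 1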
what does the t distribution represent ?
all values of t that would be expected, given the sample size, if H0: muD = 0 is true
what does the shape of the t distribution depend on?
sample size!
when calculating t, Sd is being used to estimate the corresponding population parameter
- estimate is less accurate for samples with smaller n
- t statistics will therefore be less precise for smaller n, resulting in some estimates of t that are unusually large or small
- the result is that t distributions for smaller n have wider tails
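this can be seen by overlaying t densities for different df in R:
# t densities: smaller df (smaller n) gives wider tails
curve(dt(x, df = 30), from = -4, to = 4, ylab = "density")
curve(dt(x, df = 5), add = TRUE, lty = 2)
curve(dt(x, df = 2), add = TRUE, lty = 3)
legend("topright", legend = c("df = 30", "df = 5", "df = 2"), lty = 1:3)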
applying the CLT t statistic: 95% CI
for the 95% CI, we use the same t distribution but rescale it so it has a mean of D-bar and a standard deviation of Sd-bar
- 95% CI defined by the boundaries of the central 95% of this distribution
95%CI = D-bar +/- (tcrit x Sd-bar)
using R to find the 95% CI
- find tcrit:
qt(p = 0.025, df = 15)  # lower
qt(p = 0.975, df = 15)  # upper
- insert the values into the equation:
95% CI = D-bar +/- (tcrit x Sd-bar)
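putting it together as a sketch in R (hypothetical D scores with n = 16, so df = 15):
set.seed(1)
D <- rnorm(16, mean = 2, sd = 3)   # hypothetical D scores

D_bar  <- mean(D)
se     <- sd(D) / sqrt(length(D))  # Sd-bar
t_crit <- qt(p = 0.975, df = 15)   # tcrit for the 95% CI

c(D_bar - t_crit * se, D_bar + t_crit * se)  # matches t.test(D)$conf.int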
paired t test in R
t.test(study1$b, study1$a, paired = TRUE)
t.test(study2$b, study2$a, paired = TRUE)
- runs a paired t test for the difference between conditions a and b
results:
t statistic, df, p value, 95% CI, mean difference (D-bar)
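these can be pulled out of the object that t.test() returns (an htest list); a sketch reusing the hypothetical study1 data frame from above:
study1 <- data.frame(a = c(10, 12, 9, 14), b = c(13, 15, 11, 16))  # hypothetical

res <- t.test(study1$b, study1$a, paired = TRUE)
res$statistic  # t
res$parameter  # df
res$p.value    # p value
res$conf.int   # 95% CI
res$estimate   # mean difference (D-bar)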
bootstrapping t test in R
boot.t.test(study1$b, study1$a, paired = TRUE, R = 100000)
boot.t.test(study2$b, study2$a, paired = TRUE, R = 100000)
- from the MKinfer package
results:
p value, standard deviation of the DMD (standard error), 95% CI
independent groups design
compare two sets of scores generated through independent groups design
ex. x = grouping variable (CTRL, DRUG), y = all measured scores
fitting GLM with independent groups designs
recode xi with 0 or 1
- CTRL= 0
- DRUG= 1
use the model that predicts the value of y:
y-hati = b0 + b1xi
GLM is a linear model
- b0 is intercept, b1 is slope
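a sketch of this fit in R (hypothetical data; lm() dummy-codes the grouping factor as 0/1 internally):
# hypothetical independent-groups data, matching the t.test examples below
independent <- data.frame(
  x = rep(c("CTRL", "DRUG"), each = 4),
  y = c(5, 6, 5, 7, 8, 9, 7, 10)
)

# b0 (intercept) estimates the CTRL mean; b1 (slope) estimates DRUG - CTRL
fit <- lm(y ~ x, data = independent)
summary(fit)  # the slope's t test matches t.test(y ~ x, var.equal = TRUE)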
hypothesis testing
CTRL is referred to as the 0 group, DRUG as the 1 group
H0: no difference between groups
- no difference between the means of the populations
H0: mu1 - mu0 = 0
H1: mu1 - mu0 ≠ 0
bootstrapping with independent t test
group 0 is expanded to infinite hypothetical pop
group 1 is expanded to infinite hypothetical pop
n0 scores are sampled at random from hyp pop 0
n1 scores are sampled at random from hyp pop 1
calculate the mean of each sample, then the difference between the sample means
repeat many times, then plot the distribution of the difference between sample means (DDSM)
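a sketch of this loop in R (hypothetical group scores; resampling with replacement stands in for the infinite hypothetical populations):
set.seed(1)
y0 <- rnorm(8, mean = 5, sd = 2)   # hypothetical CTRL scores (n0 = 8)
y1 <- rnorm(8, mean = 7, sd = 2)   # hypothetical DRUG scores (n1 = 8)

# resample each group separately, then take the difference in sample means
DDSM <- replicate(10000, {
  mean(sample(y1, replace = TRUE)) - mean(sample(y0, replace = TRUE))
})

hist(DDSM)  # distribution of the difference between sample means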
CLT and independent t tests
NHST: mean of t distribution is 0
standard deviation of t distribution is the standard error
- the observed value of (y-bar1 - y-bar0) has to be standardized by dividing by the standard error
paired vs independent t statistic equation
paired:
t = (observed difference between D-bar and H0) / (average difference between D-bar and H0 expected due to sampling variation) = (D-bar - muD) / Sd-bar
independent
t = (observed difference between (y-bar1 - y-bar0) and H0) / (average difference between (y-bar1 - y-bar0) and H0 expected due to sampling variation) = ((y-bar1 - y-bar0) - (mu1 - mu0)) / S(y-bar1 - y-bar0)
S(y-bar1 - y-bar0)
with paired t statistic, we had one set of scores (D)
with the independent design, there's variation in (y-bar1 - y-bar0) due to variation in both y1 and y0
variance sum law: if we sum (or subtract) multiple independent values of y, the variance of the result will be the sum of the variances of each value of y
- the variance of (y-bar1 - y-bar0) will be the sum of the variances of y-bar1 and y-bar0
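a quick numerical check of the variance sum law in R (two independent simulated variables):
set.seed(1)
y1 <- rnorm(1e5, sd = 2)   # variance approx. 4
y0 <- rnorm(1e5, sd = 3)   # variance approx. 9

var(y1 - y0)               # approx. 13 = 4 + 9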
independent t statistic
t = ((y-bar1 - y-bar0) - (mu1 - mu0)) / sqrt((Sy1^2 / n1) + (Sy0^2 / n0))
Welch's t statistic
under H0, mu1 - mu0 = 0, so that term drops out:
t = (y-bar1 - y-bar0) / sqrt((Sy1^2 / n1) + (Sy0^2 / n0))
pooled variance Sp^2
according to H0, there’s no difference between our two samples
- both samples came from the same population
if the two samples came from the same population, Sy1^2 and Sy0^2 both estimate the same population variance parameter
so we can generate a single sample statistic to estimate the population variance
pooled variance equation
Sp^2 = (Sum(y1i - y-bar1)^2 + Sum(y0i - y-bar0)^2) / ((n1 - 1) + (n0 - 1))
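as a sketch in R (hypothetical group scores):
y1 <- c(8, 9, 7, 10)   # hypothetical DRUG scores
y0 <- c(5, 6, 5, 7)    # hypothetical CTRL scores

ss1 <- sum((y1 - mean(y1))^2)   # Sum(y1i - y-bar1)^2
ss0 <- sum((y0 - mean(y0))^2)   # Sum(y0i - y-bar0)^2

sp2 <- (ss1 + ss0) / ((length(y1) - 1) + (length(y0) - 1))
sp2  # pooled variance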
two-sample vs Welch's t test
the two-sample t test is appropriate if the samples have similar variances
Welch's t test is appropriate if the samples have very different variances
- when the variances differ, the estimate of the population variance (sigma^2) will be less accurate, so the t statistic will be more prone to error
- to compensate, Welch's t test uses a t distribution with fewer df than the two-sample t test to convert the test statistic to a p value
- this makes it more difficult to achieve statistical significance, counteracting the increased type I error rate that would result from the increased potential for error
independent test in R
t.test(y ~ x, data = independent)
- (outcome ~ predictor) is the formula
- by default, R runs Welch's t test (doesn't assume equal variances)
to run the regular two-sample t test, modify the var.equal argument:
- t.test(y ~ x, data = independent, var.equal = TRUE)
bootstrap
- boot.t.test(y ~ x, data = independent, R = 100000)
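a self-contained run comparing the three tests, with hypothetical simulated data (boot.t.test needs the MKinfer package installed):
library(MKinfer)  # provides boot.t.test(); install.packages("MKinfer") if needed

set.seed(1)
independent <- data.frame(
  x = rep(c("CTRL", "DRUG"), each = 8),
  y = c(rnorm(8, mean = 5, sd = 2), rnorm(8, mean = 7, sd = 2))
)

t.test(y ~ x, data = independent)                    # Welch's (default)
t.test(y ~ x, data = independent, var.equal = TRUE)  # two-sample, pooled variance
boot.t.test(y ~ x, data = independent, R = 100000)   # bootstrap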