Stat Tests Flashcards
What is the use of the one-sample t-test?
Compare mean in sample to known mean
What is the purpose of an independent t-test?
Compare means of two independent samples
What is the purpose of paired sample t-test?
Compare means from a single sample measured at two time points
How is the t-statistic used in hypothesis testing?
- calculate test statistic representing question
- compare sample value to sampling distribution under null
- test statistic = t-statistic under t-distribution
What do the numerator and the denominator of the t-statistic represent?
Numerator = difference in means
Denominator = estimate of variability
What does the t-statistic represent?
the standardised difference in means
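A minimal sketch of the numerator and denominator for the one-sample case, checked against R's built-in `t.test`. The vector `age` and the comparison mean `mu0` are made-up illustrative values, not data from the course.

```r
# Hypothetical sample and known comparison mean (illustrative values only)
age <- c(62, 70, 68, 59, 73, 66, 64, 71, 69, 65)
mu0 <- 65

n   <- length(age)
num <- mean(age) - mu0       # numerator: difference in means
den <- sd(age) / sqrt(n)     # denominator: standard error of the mean
t_by_hand <- num / den       # standardised difference in means

# Cross-check against the built-in one-sample t-test
t_builtin <- unname(t.test(age, mu = mu0)$statistic)
```

The two values should agree, which shows that the t-statistic is the difference in means expressed in standard-error units.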
What are the data requirements for a one sample t-test?
- continuous variable
- known mean to compare to sample mean
- sample of data to calculate sample mean
What is a t-distribution?
continuous probability distribution similar to the normal distribution, with heavier tails
What are the key parameters of t-distribution?
- degree of freedom
- df = function of n
- as n (and so the degrees of freedom) increases, the t-distribution approaches the normal distribution
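One way to see how the t-distribution relates to the normal distribution is to compare critical values as the degrees of freedom grow. This is a small sketch; the particular df values are arbitrary choices.

```r
# Upper 2.5% critical values of t for increasing degrees of freedom
dfs    <- c(2, 5, 10, 30, 100, 1000)
crit_t <- qt(0.975, df = dfs)

# The matching critical value of the standard normal distribution
crit_z <- qnorm(0.975)

# The gap between t and z shrinks as df increases
data.frame(df = dfs,
           crit_t = round(crit_t, 3),
           gap = round(crit_t - crit_z, 3))
```

With larger df the critical value moves down towards the normal value, so the two distributions become practically indistinguishable for large samples.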
What is the code for the critical values?
# two-tailed critical values at alpha = 0.05 with df = 39 (tibble() is from the tibble package)
tibble(
  LowerCrit = round(qt(0.025, 39), 2),
  UpperCrit = round(qt(0.975, 39), 2)
)
What is the code for a one sample t-test?
t.test(dat$Age, mu = 65, alternative = "…..")
What assumption tests are performed for a one-sample t-test?
- Descriptive statistics
- Shapiro-Wilk test
- QQ-plot
What are the requirements for assumption tests to be valid?
- DV is continuous
- Independence: observations are independent of each other
- Normality: data are normally distributed, or the sample is sufficiently large (n ≥ 30)
What are the three descriptive statistics?
- Skew
- Histogram
- Density plot
What are the general guidelines for skew statistics?
- |Skew| < 1 = Generally not problematic
- 1 < |Skew| < 2 = Slight concern
- |Skew| > 2 = Investigate impact
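A sketch of how a skew statistic can be computed and checked against these guidelines. The function below uses the simple moment-based definition; packages may use slightly adjusted formulas, so values can differ a little. The helper names `skew` and `skew_flag` are hypothetical, and the treatment of values exactly at 1 or 2 is an assumption.

```r
# Moment-based skewness: third central moment / (second central moment)^1.5
skew <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

# Map a skew value onto the guideline bands (uses the absolute value)
skew_flag <- function(s) {
  a <- abs(s)
  if (a < 1) {
    "generally not problematic"
  } else if (a < 2) {
    "slight concern"
  } else {
    "investigate impact"
  }
}

symmetric <- c(1, 2, 3, 4, 5)     # made-up symmetric data
lopsided  <- c(1, 1, 1, 1, 10)    # made-up data with one large value

skew(symmetric)
skew_flag(skew(lopsided))
```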
What are the three parts needed in a histogram/density plot?
- ggplot
- geom_histogram or geom_density
- labs
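A minimal sketch putting the three parts together for a density plot. It assumes the ggplot2 package is installed; the data frame `dat` is simulated here and the labels are placeholders.

```r
library(ggplot2)

# Simulated, hypothetical data: 40 ages
set.seed(1)
dat <- data.frame(Age = rnorm(40, mean = 65, sd = 8))

p <- ggplot(dat, aes(x = Age)) +     # 1. ggplot: data and aesthetic mapping
  geom_density() +                   # 2. geom: the density curve
  labs(title = "Density of Age",     # 3. labs: title and axis labels
       x = "Age (years)",
       y = "Density")

p
```

Swapping `geom_density()` for `geom_histogram()` gives the histogram version of the same plot.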
What are QQ-plots?
- plots the sorted quantiles of a data set against the quantiles expected under a theoretical distribution (e.g. normal)
- Distribution vs Distribution
How do we know if data is concerning on a QQ-plot?
- The dots deviate away from the line
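A short base R sketch of a normal QQ-plot. The data are simulated for illustration only.

```r
# Simulated, hypothetical data
set.seed(42)
x <- rnorm(40, mean = 65, sd = 8)

# qqnorm() plots sample quantiles against theoretical normal quantiles
# and returns both sets of values invisibly
qq <- qqnorm(x, main = "Normal QQ-plot")

# qqline() adds the reference line; points far from it are a concern
qqline(x)
```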
What is the purpose of the Shapiro-Wilk test?
- checks properties of observed data against properties normally expected from normally distributed data
What does H0 represent in Shapiro-Wilk?
The sample came from a normally distributed population
How do we decide about H0 in a Shapiro-Wilk test?
p-value < α = reject null, data not normal
p-value ≥ α = fail to reject null, no evidence against normality
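A sketch of the decision rule using base R's `shapiro.test`. To keep the example deterministic, the two vectors are built from idealised quantiles rather than real data: one follows a normal shape, the other a strongly right-skewed (exponential) shape.

```r
alpha <- 0.05

# Idealised, made-up data: evenly spaced quantiles of each distribution
normal_shape <- qnorm(ppoints(40), mean = 65, sd = 8)
skewed_shape <- qexp(ppoints(40), rate = 1 / 10)

sw_normal <- shapiro.test(normal_shape)
sw_skewed <- shapiro.test(skewed_shape)

# TRUE means: reject H0, data not consistent with normality
sw_normal$p.value < alpha
sw_skewed$p.value < alpha
```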
What are the guidelines for Cohen’s d?
Small = 0.20
Medium = 0.50
Large = 0.80
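A sketch of Cohen's d for two independent groups, using the pooled standard deviation as the standardiser. This is one common definition; the function name `cohens_d` and the data are hypothetical.

```r
# Cohen's d: difference in means divided by the pooled SD
cohens_d <- function(x, y) {
  nx <- length(x)
  ny <- length(y)
  sp <- sqrt(((nx - 1) * var(x) + (ny - 1) * var(y)) / (nx + ny - 2))
  (mean(x) - mean(y)) / sp
}

# Made-up groups: means 4 and 3, both with SD 2
d <- cohens_d(c(2, 4, 6), c(1, 3, 5))
d
```

Here the means differ by 1 and the pooled SD is 2, so d lands on the medium guideline.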
What are the steps for calculating the standard error difference in independent t-tests?
- Calculate pooled standard deviation
- Use pooled SD to calculate SE of difference
What are the calculation steps for the independent t-test?
- calculate sample mean in groups x1 and x2
- Calculate pooled SD sp
- Calculate SE
- Check you know your n
- Calculate t
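The steps above can be sketched by hand and compared with `t.test`. The data frame `threat` below is a made-up stand-in with invented group labels and scores.

```r
# Hypothetical data: two groups of six scores
threat <- data.frame(
  Group = rep(c("control", "threat"), each = 6),
  Score = c(12, 15, 11, 14, 13, 16, 10, 9, 13, 8, 11, 12)
)

x1 <- threat$Score[threat$Group == "control"]
x2 <- threat$Score[threat$Group == "threat"]
n1 <- length(x1)
n2 <- length(x2)

# Pooled SD, then SE of the difference in means
sp <- sqrt(((n1 - 1) * var(x1) + (n2 - 1) * var(x2)) / (n1 + n2 - 2))
se <- sp * sqrt(1 / n1 + 1 / n2)

t_by_hand <- (mean(x1) - mean(x2)) / se
df <- n1 + n2 - 2

# Built-in equivalent (equal variances assumed)
res <- t.test(Score ~ Group, data = threat, var.equal = TRUE)
```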
How many degrees of freedom does an independent t-test have?
n - 2, where n = n1 + n2 (total sample size)
What is the code for independent t-test?
res <- t.test(threat$Score ~ threat$Group,
              alternative = "less",
              mu = 0,
              var.equal = TRUE,
              conf.level = 0.95)
What are the assumptions of the independent samples t-test?
- independence of observations within and across groups
- continuous variable, normally distributed within both groups
- homogeneity of variance (equal variances across groups)
What is homogeneity of variance?
- assumption that the two groups have equal population variances
- checked with an F-test comparing the variances of the two groups
What does H0 mean in the homogeneity of variance test?
H0: population variances are equal
What does it mean when ‘p-value < α’ in a variance test?
- reject H0: the variances differ across groups
What is the code for variance test?
var.test(threat$Score ~ threat$Group, ratio = 1)
Why is the ratio one in a variance test?
H0: the variances are equal, so their ratio equals 1
H1: the ratio doesn’t equal 1, meaning the variances are not equal
What is used if homogeneity of variance is violated?
Welch test
What is the difference between the Welch test and the independent t-test?
The independent t-test uses the pooled standard deviation but the Welch test doesn’t
What changes occur in the Welch test?
- estimation of SE of difference
- degree of freedom
What is the difference in the R code for the independent t-test and welch test?
Independent t-test: var.equal = TRUE
Welch test: var.equal = FALSE
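A sketch of what changes under the Welch test: the SE uses each group's own variance, and the degrees of freedom come from the Welch-Satterthwaite approximation. The two vectors are made-up illustrative data.

```r
# Hypothetical groups with unequal sizes and spreads
x1 <- c(12, 15, 11, 14, 13, 16)
x2 <- c(10, 9, 13, 8, 11, 12, 20, 4)
n1 <- length(x1)
n2 <- length(x2)

# Per-group variance of the mean (no pooling)
v1 <- var(x1) / n1
v2 <- var(x2) / n2

se_welch <- sqrt(v1 + v2)
df_welch <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
t_welch  <- (mean(x1) - mean(x2)) / se_welch

# Built-in equivalent
res <- t.test(x1, x2, var.equal = FALSE)
```

The resulting df is usually not a whole number and is smaller than n1 + n2 - 2.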
What are the calculation steps for paired t-test?
- calculate the difference score di for each individual (T1 vs T2)
- calculate the mean of the difference scores dbar
- calculate the SD of the difference scores
- check n is known
- calculate SE of the mean difference SEdbar
- calculate t = dbar / SEdbar
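The paired steps can be sketched by hand and checked against `t.test(..., paired = TRUE)`. The two vectors are made-up scores for the same six people at two time points.

```r
# Hypothetical scores at time 1 and time 2 (same individuals, same order)
t1 <- c(20, 25, 22, 30, 27, 24)
t2 <- c(24, 27, 21, 35, 30, 29)

d       <- t2 - t1           # difference score for each individual
dbar    <- mean(d)           # mean of the difference scores
sd_d    <- sd(d)             # SD of the difference scores
n       <- length(d)
se_dbar <- sd_d / sqrt(n)    # SE of the mean difference

t_by_hand <- dbar / se_dbar
df <- n - 1

# Built-in equivalent; differences are taken as first argument minus second
res <- t.test(t2, t1, paired = TRUE)
```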
What is the R code to make data wide?
# needs tidyr (pivot_wider) and dplyr or magrittr (%>%)
exam_wide <- exam %>%
  pivot_wider(id_cols = ID,
              names_from = time,
              values_from = stress)
head(exam_wide)
What type of data does chi-squared work with?
- categorical data
- count data
What is chi-squared used for?
checks whether data grouped according to expectations
- compares frequencies across categories in data
How is chi-squared test similar to t-tests?
- compute test-statistic
- locate test-statistic on distribution reflecting probability of each test statistic value, given H0 = true
- if the probability associated with the test statistic is small enough, results are significant
How are chi-squared and t-tests different?
the chi-squared distribution's degrees of freedom are not based on sample size; they depend on the number of groups within the data
List 3 characteristics of the chi-squared distribution?
- as the number of comparison groups (and so the degrees of freedom) increases, the distribution curve flattens
- chi-squared distribution begins at 0
- distribution can only take non-negative values
What direction is chi-squared p-value computed in and why?
p-value is computed right-tailed, as the probability of observing a chi-squared statistic as big as or bigger than the one obtained
What is the data requirement for chi-squared tests?
- variables must be measured at ordinal or nominal level
What are the assumptions for a chi-squared test?
- expected counts greater than 5
- observations must be independent and categories mutually exclusive (each observation counted in only one cell)
What are the two types of chi-squared tests?
- goodness of fit
- test of independence
What does the chi-squared goodness of fit test do?
- test whether proportions/ relative frequencies of observed values consistent with expected proportions/ relative frequencies
- looks at distribution of data across the categories of a single variable
What is the null hypothesis of a goodness of fit test?
the population proportions equal the expected proportions (often: all proportions are equal)
What is the alternative hypothesis of a goodness of fit test?
at least one of the population proportions is not as specified in the null
What is the calculation process for a goodness of fit test?
- Calculate expected counts -> Ei = n x pi
- Calculate the difference between observed and expected
- Square each difference
- Divide each squared difference by the expected count
- Sum up all the resulting values to get chi-squared
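The goodness of fit steps can be sketched by hand and compared with `chisq.test`. The observed counts and the equal expected proportions are made-up illustrative values.

```r
# Hypothetical observed counts in three categories
observed   <- c(A = 30, B = 50, C = 20)
p_expected <- c(A = 1 / 3, B = 1 / 3, C = 1 / 3)

n        <- sum(observed)
expected <- n * p_expected                           # Ei = n x pi
chi_sq   <- sum((observed - expected)^2 / expected)  # sum of (O - E)^2 / E
df       <- length(observed) - 1

# Right-tailed p-value
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)

# Built-in equivalent
res <- chisq.test(observed, p = p_expected)
```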
What are residuals and when are they used?
- Pearson’s residuals
- used when results significant and inform which category had the biggest difference
What do residuals mean?
- positive residuals indicate observed frequency higher than expected frequency
- negative residuals indicate observed frequency lower than expected frequency
What is classed as an extreme residual and what does that represent?
Values less than -2 = observed frequency much lower than expected
Values greater than 2 = observed frequency much higher than expected
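A sketch of Pearson residuals, computed by hand as (observed - expected) / sqrt(expected) and compared with the `residuals` element returned by `chisq.test`. The counts are made-up illustrative values.

```r
# Hypothetical observed counts and equal expected proportions
observed   <- c(A = 30, B = 50, C = 20)
p_expected <- rep(1 / 3, 3)
expected   <- sum(observed) * p_expected

# Pearson residual for each category
pearson <- (observed - expected) / sqrt(expected)

# Built-in equivalent
res <- chisq.test(observed, p = p_expected)
res$residuals

# Categories with extreme residuals (beyond -2 or 2)
extreme <- names(pearson)[abs(pearson) > 2]
extreme
```

The squared residuals add up to the chi-squared statistic, so they show how much each category contributes to it.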
What does a chi-squared test of independence test?
- checks whether two categorical variables from single population are independent of each other
- tests whether membership in variable 1 is dependent upon membership in variable 2
What is the calculation process for a test of independence?
- calculate expected counts for each cell -> Eij = (Ri x Cj) / n -> row total x column total / overall total
- calculate difference between observed and expected
- square the difference
- Divide square values by expected
- sum up the values across all cells (all rows and columns)
How are the degrees of freedom calculated in a chi-squared test of independence?
(rows -1)(columns - 1)
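The test of independence can be sketched by hand for a small table and compared with `chisq.test`. The 2 x 3 table of counts and its labels are made up for illustration.

```r
# Hypothetical 2 x 3 contingency table of counts
observed <- matrix(c(20, 15, 15,
                     10, 20, 20),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(Group = c("A", "B"),
                                   Response = c("low", "mid", "high")))

n <- sum(observed)

# Eij = row total x column total / overall total
expected <- outer(rowSums(observed), colSums(observed)) / n

chi_sq  <- sum((observed - expected)^2 / expected)
df      <- (nrow(observed) - 1) * (ncol(observed) - 1)
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)

# Built-in equivalent
res <- chisq.test(observed)
```

For a 2 x 2 table, `chisq.test` applies a continuity correction by default, so `correct = FALSE` would be needed to match a hand calculation like this one.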
List the three ways to measure effect size for chi-squared?
- Phi coefficient
- Cramer’s V
- Odds Ratios
What are the cut off points for phi coefficient?
small effect = 0.1
medium effect = 0.3
large effect = 0.5
How are the degrees of freedom calculated for Cramer’s V?
min(r-1, c-1) -> use the minimum
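A sketch of Cramer's V using the usual formula V = sqrt(chi-squared / (n x min(r-1, c-1))). The function name `cramers_v` and the tables are hypothetical; for a 2 x 2 table V is the same size as the phi coefficient.

```r
# Cramer's V from a contingency table of counts
cramers_v <- function(tab) {
  # correct = FALSE: no continuity correction, so 2 x 2 tables match the formula
  chi_sq <- unname(chisq.test(tab, correct = FALSE)$statistic)
  n <- sum(tab)
  k <- min(nrow(tab) - 1, ncol(tab) - 1)   # use the minimum
  sqrt(chi_sq / (n * k))
}

# Made-up tables: perfect association and no association
perfect <- matrix(c(10, 0, 0, 10), nrow = 2)
none    <- matrix(c(10, 10, 10, 10), nrow = 2)

v_perfect <- cramers_v(perfect)
v_none    <- cramers_v(none)
```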