Week 3 | Matched Z/t-test Flashcards
What are the conditions for the matched Z/t-test?
i. The data are a random sample of independent pairs of observations (the before and after measurements within a pair are not independent of each other)
ii. The variable of interest is quantitative and continuous
iii. The measurement scale is interval or ratio
iv. Either (Z-test) the population standard deviation of the differences is known and the sample mean of the differences is at least approximately normally distributed,
or (t-test) the population standard deviation of the differences is unknown but the population of the differences is normally distributed (at least approximately)
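The t-test version of the matched test works on the within-pair differences. A minimal Python sketch of the test statistic follows; the function name `paired_t_statistic` and the before/after scores are hypothetical, chosen only for illustration.

```python
import math
import statistics

def paired_t_statistic(before, after, mu_d0=0.0):
    """Return (t, df) for the matched t-test of H0: mean difference = mu_d0."""
    if len(before) != len(after):
        raise ValueError("matched samples must have the same length")
    # The test is carried out on the within-pair differences
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    d_bar = statistics.mean(diffs)
    s_d = statistics.stdev(diffs)      # sample SD of the differences (n - 1 divisor)
    se = s_d / math.sqrt(n)            # estimated standard error of the mean difference
    return (d_bar - mu_d0) / se, n - 1

# Hypothetical before/after scores for five subjects
before = [10, 12, 9, 11, 13]
after = [12, 15, 10, 14, 14]
t_stat, df = paired_t_statistic(before, after)
print(round(t_stat, 3), df)
```

The resulting statistic is compared with a t distribution with n - 1 degrees of freedom, where n is the number of pairs.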
What are the non-parametric alternatives to the matched t or Z-test?
The sign test and the Wilcoxon signed ranks test. They are based on similar assumptions to their one-sample versions, except that the first assumption is like (i) above (the data are a random sample of independent pairs) and, for the Wilcoxon signed ranks test, the fourth assumption (iv) is that the distribution of the differences is symmetric
Non-parametric test hypotheses
Null: H0: median of the differences = 0
Alternative: HA: median of the differences < 0 / > 0 / not equal to 0
Reject the null when the p-value is less than the significance level alpha (e.g. 5%)
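As an illustration of the decision rule, here is a rough Python sketch of an exact sign test. Under H0 the number of positive differences follows a Binomial(n, 0.5) distribution. The function name `sign_test_p_value` and the differences are hypothetical.

```python
from math import comb

def sign_test_p_value(diffs, alternative="two-sided"):
    """Exact sign-test p-value for H0: median of the differences = 0."""
    nonzero = [d for d in diffs if d != 0]   # zero differences carry no sign and are dropped
    n = len(nonzero)
    n_pos = sum(1 for d in nonzero if d > 0)

    def upper_tail(k):                        # P(X >= k), X ~ Binomial(n, 0.5)
        return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n

    def lower_tail(k):                        # P(X <= k)
        return sum(comb(n, i) for i in range(0, k + 1)) / 2 ** n

    if alternative == "greater":              # HA: median > 0
        return upper_tail(n_pos)
    if alternative == "less":                 # HA: median < 0
        return lower_tail(n_pos)
    return min(1.0, 2 * min(upper_tail(n_pos), lower_tail(n_pos)))

# Hypothetical within-pair differences
diffs = [2, 3, 1, 3, 1, -1, 2, 4]
p_greater = sign_test_p_value(diffs, "greater")
p_two_sided = sign_test_p_value(diffs)
alpha = 0.05
print(p_greater, p_greater < alpha)           # reject H0 only if p-value < alpha
```

With these made-up numbers the one-sided test rejects H0 at the 5% level, while the two-sided test does not.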
With regard to checking normality, which values in the R output help you assess the normality assumption?
skew.2SE
kurt.2SE
If either of the above is larger than 1 in absolute value, the skewness or excess kurtosis differs significantly from zero, which suggests non-normality
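The two values are the skewness and the excess kurtosis, each divided by twice its standard error. The names match the output of stat.desc(..., norm = TRUE) in R's pastecs package; that this is the function behind the output, and that the formulas below reproduce it exactly, are assumptions. A Python sketch under those assumptions:

```python
import math
import statistics

def skew_kurt_2se(x):
    """Return (skew.2SE, kurt.2SE) using formulas assumed to mirror pastecs::stat.desc."""
    n = len(x)
    m = statistics.mean(x)
    var = statistics.variance(x)              # sample variance (n - 1 divisor)
    skew = sum((v - m) ** 3 for v in x) / (n * var ** 1.5)
    kurt = sum((v - m) ** 4 for v in x) / (n * var ** 2) - 3   # excess kurtosis
    se_skew = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    se_kurt = math.sqrt(24 * n * (n - 1) ** 2 / ((n - 3) * (n - 2) * (n + 3) * (n + 5)))
    return skew / (2 * se_skew), kurt / (2 * se_kurt)

# Hypothetical data: a symmetric sample and a strongly right-skewed one
symmetric = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
skewed = [1, 1, 1, 1, 1, 1, 1, 1, 1, 20]
print(skew_kurt_2se(symmetric))
print(skew_kurt_2se(skewed))
```

For the symmetric sample both values stay below 1 in absolute value; for the skewed sample skew.2SE exceeds 1, which would flag a departure from normality.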
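For the independent-samples version of the Z/t-test, what three cases can arise regarding the population variances, and what are the test's conditions?
i) standard deviation 1 and standard deviation 2 are known:
Z-statistic: Z = ((Xbar1 - Xbar2) - (mu1 - mu2)) / sigma(Xbar1 - Xbar2), where (mu1 - mu2) is the difference hypothesised under H0 and sigma(Xbar1 - Xbar2) = sqrt(sigma1^2/n1 + sigma2^2/n2)
ii) standard deviation 1 and standard deviation 2 are unknown but equal: the common population variance is best estimated by the pooled sample variance, sp^2 = ((n1 - 1)s1^2 + (n2 - 1)s2^2) / (n1 + n2 - 2)
t = ((Xbar1 - Xbar2) - (mu1 - mu2)) / s(Xbar1 - Xbar2), where s(Xbar1 - Xbar2) = sqrt(sp^2 (1/n1 + 1/n2)), t ~ t(df), df = n1 + n2 - 2
iii) standard deviation 1 and standard deviation 2 are unknown but different: the population variances are estimated separately
t = ((Xbar1 - Xbar2) - (mu1 - mu2)) / s(Xbar1 - Xbar2), where s(Xbar1 - Xbar2) = sqrt(s1^2/n1 + s2^2/n2), t ~ t(df), with df given by the Welch-Satterthwaite approximation
s^2(Xbar1) = s1^2/n1, s^2(Xbar2) = s2^2/n2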
Conditions:
i. The data consist of two independent random samples of independent observations.
ii. The variable of interest is quantitative and continuous.
iii. The measurement scale is interval or ratio.
iv. Either (Z-test) the population standard deviations, SD1 and SD2, are known and the sample means are at least approximately normally distributed, or (t-test) SD1 and SD2 are unknown but the sampled populations are normally distributed (at least approximately).
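The three variance cases listed above can be compared side by side with a short Python sketch working from summary statistics. The function name `two_sample_statistics` and all the numbers are hypothetical. In case i) the standard deviations passed in would be the known population values, while in cases ii) and iii) they are sample estimates.

```python
import math

def two_sample_statistics(x1bar, x2bar, n1, n2, sd1, sd2, delta0=0.0):
    """Test statistics for H0: mu1 - mu2 = delta0 under the three variance cases."""
    diff = (x1bar - x2bar) - delta0
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2     # squared standard errors of each sample mean
    se_separate = math.sqrt(v1 + v2)          # used in cases i) and iii)
    sp2 = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)   # pooled variance
    se_pooled = math.sqrt(sp2 * (1 / n1 + 1 / n2))                       # used in case ii)
    # Welch-Satterthwaite approximation to the degrees of freedom, case iii)
    df_welch = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return {
        "z": diff / se_separate,              # case i): sd1, sd2 are known population SDs
        "t_pooled": diff / se_pooled,         # case ii): unknown but equal
        "df_pooled": n1 + n2 - 2,
        "t_welch": diff / se_separate,        # case iii): unknown and different
        "df_welch": df_welch,
    }

# Hypothetical summary statistics, not taken from any real survey
result = two_sample_statistics(x1bar=12.0, x2bar=10.0, n1=100, n2=100, sd1=5.0, sd2=5.0)
print(result)
```

With equal sample sizes and equal standard deviations, as in this made-up example, the pooled and Welch statistics coincide and the Welch degrees of freedom equal n1 + n2 - 2.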
Example question:
Automobile insurance companies take many factors into consideration when setting the rates. These factors include age, marital status, and kilometres driven per year. In order to determine the effect of gender (1: male, 2: female), 100 male and 100 female drivers were surveyed. Each was asked how many kilometres (KMS in thousands of kilometres) he or she drove in the past year.
Which version of the independent-samples t-test do we use (given var1 = var2)?
What would the hypotheses be if we wanted to see whether male drivers drove more than female drivers?
The test would be
ii) the variances are unknown but equal (the pooled-variance t-test)
Hypotheses:
H0: mu1 = mu2 HA: mu1 > mu2 (1: male, 2: female)
What is a non-parametric alternative to the two independent sample Z/t-test for differences between 2 population means?
The Wilcoxon rank-sum test, also known as the Mann-Whitney test, for differences between 2 population medians. Its conditions are:
i. The data consist of 2 independent random samples of independent observations
ii. The variable of interest is quantitative and continuous
iii. The measurement scale is at least ordinal
iv. The 2 sampled populations differ at most with respect to their central locations measured by the medians (they are identical in shape and spread)
H0: median1 = median2
vs HA: median1 > median2, median1 < median2, or median1 not equal to median2
Test statistic: T=T+
What are the differences between the Wilcoxon rank-sum test and the Wilcoxon signed ranks test?
a) The Wilcoxon signed ranks and rank-sum tests are based on the same idea, they only differ in terms of classification. In the Wilcoxon signed ranks test classification is based on the position of each observation relative to the hypothesized median (smaller or larger), while in the Wilcoxon rank-sum test it is based on the values of a grouping variable (like gender in our example).
b) An alternative version of the Wilcoxon rank-sum test is the Mann-Whitney
test. These two tests have different test statistics and sampling distributions,
but they are equivalent and always lead to the same conclusion.
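The equivalence in b) can be seen from the standard relationship between the two statistics: the Mann-Whitney U for sample 1 equals its rank sum minus n1(n1 + 1)/2. A small Python sketch follows; the function name `rank_sum_and_u` and the data are hypothetical.

```python
def rank_sum_and_u(sample1, sample2):
    """Return (T1, U1): the rank sum of sample1 and the matching Mann-Whitney U."""
    pooled = sorted(sample1 + sample2)
    ranks = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        # tied values share the average of ranks i + 1, ..., j
        ranks[pooled[i]] = (i + 1 + j) / 2
        i = j
    t1 = sum(ranks[v] for v in sample1)       # Wilcoxon rank-sum statistic for sample 1
    n1 = len(sample1)
    u1 = t1 - n1 * (n1 + 1) / 2               # Mann-Whitney U for sample 1
    return t1, u1

# Hypothetical samples from two groups
group1 = [15, 22, 18, 30]
group2 = [12, 17, 14, 20]
t1, u1 = rank_sum_and_u(group1, group2)

# Without ties, U1 also counts the pairs in which the group1 value is larger
pairs_won = sum(1 for a in group1 for b in group2 if a > b)
print(t1, u1, pairs_won)
```

Because U1 differs from T1 only by a constant that depends on n1, the two statistics order samples identically, which is why the tests reach the same conclusion.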
Which is better, the repeated measures design or the independent design?
In general the repeated measures design is potentially more efficient
because (everything else held constant) it reduces the standard error of the
estimator of the difference between the population means (or medians).
However, the repeated measures design also reduces the sample size and
the degrees of freedom, making it less likely to reject a false null hypothesis.
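The trade-off can be illustrated with a small Python sketch that computes the standard error and degrees of freedom both ways for the same numbers. The data are hypothetical, and treating paired measurements as independent samples is done here only for comparison.

```python
import math
import statistics

# Hypothetical before/after measurements on the same five subjects
before = [10, 12, 9, 11, 13]
after = [12, 15, 10, 14, 14]
n = len(before)

# Repeated measures design: work with the n within-pair differences
diffs = [a - b for a, b in zip(after, before)]
se_paired = statistics.stdev(diffs) / math.sqrt(n)
df_paired = n - 1

# The same numbers treated as two independent samples (for comparison only)
se_independent = math.sqrt(
    statistics.variance(before) / n + statistics.variance(after) / n
)
df_independent = 2 * n - 2                    # pooled-variance degrees of freedom

print(round(se_paired, 3), df_paired)
print(round(se_independent, 3), df_independent)
```

When the before and after values are positively correlated, as in this made-up example, the paired standard error is smaller, but the test has only n - 1 degrees of freedom instead of 2n - 2.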