Hypothesis Testing And Formulas In Excel Flashcards
One sample t-test
Used for data with a NORMAL DISTRIBUTION
Example: effectiveness of a medicine. The same group of people are tested, one time before treatment and one time after.
The measurements cannot be carried out at the same time and the data are gathered from the same subjects.
Nullhypothesis: there is no significant difference (the medicine is not effective).
Degree of freedom: (n-1)
In this example that would mean the amount of people participating in the study minus one.
P value: T.TEST (array1, array2, 2,1) Array 1: patients before treatment Array 2: patients after treatment Tail: 2 Type: 1
T value: T.INV.2T
Appropriate statistical parameter: T-value (p-value, degree of freedom)
Two sample t-test
Used for data with a NORMAL DISTRIBUTION
Used for two individual groups with one variable. Example: is there a significant difference in body height between men and women?
First you have to use a F-test to find out which t-test to use. The null hypothesis of a f-test is that the variances of the two datasets are equal variance/variance = 1.
The f-test/f-value is variance of one dataset (the variance with the biggest value) divided by the variance of the other dataset.
The p-value of the f-test: t-test function.
If the f-test gives a value over 5% the variances are equal, and if the f-test gives a value under 5% the variances are unequal.
Equal variance: t-test type 2
Unequal variance: t-test type 3
Nullhypothesis of the t-test: there is no significant difference.
Degree of freedom: count both datasets (different groups) and subtract 2.
P value: T.TEST (array1, array2,2,2) Array 1: height of men Array 2: height of women Tails: 2 Type: 2
Appropriate statistical parameter is T value: T.INV.2T (probability, degree of freedom)
Probability:
Degree of freedom: count both groups - 2
What is the f-value and when do you need it?
You only need the f-value when you have two different samples.
F-value: variance/variance. The biggest value is always on top, f-value is never below 0. If above 5%, variances are equal. Below, it’s unequal.
Variance in excel: Var.S
F.test gives p-value
P-value of two-sided f-test: F.dist.2t
How do you use frequency function?
Create intervals for your data.
Create “bins” for your data. The bins are the last, or highest, number in the intervals.
Example:
Intervals: 0-10, 11-20, 21-30, 31-40
Bins: 10, 20, 30, 40
Select empty boxes for the frequencies, remember to add an extra box for at the bottom of the colon.
Press =FREQUENCY(array;bins). Select you data as the array and your bins for bins. Do not directly press enter, but press instead ctrl + shift + enter.
Create a histogram with your frequency, and change the x-axis to the intervals. Under chart design, press “select data” and select the intervals.
Relative frequency
The frequency in that “class” or category divided by total number of elements
Frequency density
Frequency divided by the width of the “class” or interval.
Relative frequency density
Frequency density divided by total number of elements
Cumulative distribution
Sum the frequencies as you go down the colons.
Integral discriminator
Total element number minus cumulative distribution
Percentile
Function: percentile.inc
Array: data you want to find the percentile of
K= how many percent in decimals. (10% = 0,1)
Correlation (Pearsons) test
Used for data with NORMAL DISTRIBUTION
- usually the work correlation is mentioned in question.¨
1 group with 2 variables. Example: is there a correlation between blood potassium conc. and the length of musculus palmaris longus?
Degree of freedom: n - 2
Remember to only count one dataset if the same group is tested.
Correlation coefficient (r): CORREL function. Not important which dataset is in which array. It is always between 1 and 0. If r = 0, there is no correlation. Determination coefficient (r^2): tells how strong the relationship between the data is.
Appropriate statistical parameter: t-value = (r* root square (degree of freedom)) / (1- r^2)
P-value: T.dist.2t (t-value, degree of freedom)
First column: x, second column: y
Slope: SLOPE (x,y)
Intercept: INTERCEPT(x,y) (intercept is where the y axis is intercepted?)
y: a x + b
a: slope
b: intercept
If 5 is given as x;
y: a * x + b
If 5 is y;
x: (y - b) / a
Wilcoxon signed rank test
The dataset has NON-NORMAL DISTRIBUTION
1 sample and 1 variable. Example: weight before and after diet
Degree of freedom: count - 1
Mean: =AVERAGE
Standard deviation: =STDEV.S
Standard error: ST DEV/ square root (count)
T-value: (mean/ standard error)
P-value: T.DIST (t-value, degree of freedom)
Wilcoxon value: wilcoxon function (array 1, array 2, 2, 0)
TYPE 0
Array 1: weight before diet
Array 2: weight after diet
Tail (?): 2
Type: 0
Mann-whitney U- test
Used for dataset that has a NON-NORMAL DISTRIBUTION
When we have two independent samples. Example: change in headache in subjects given aspirin and subjects given a placebo pill.
Function: Mann-whitney function: (array 1, array 2, 2)
TYPE 2
Chi- square test
For datasets that has NON-NORMAL DISTRIBUTION
For independency:
- two of more groups. e.g. ‘frequency of pulmonary cancer amongst smokers and non-smokers’. (No dependency)
For homogenity:
- two or more groups:
Tests:
- frequency data
- probability chart
- observed frequency and expected frequency. e.g, ‘frequency of wearing glasses among girls and boys’.
- one group (known distribution), frequency data, testing how well the observed frequencies fit the expected frequencies. e.g, ‘testing the normality of frog red blood cells’.
Degree of freedom: (number of rows - 1 )* (number of columns - 1)
Chi-square test: CHISQ.TEST (actual range; expected range)
Actual range: INNER four squares of measured.
Expected range: the calculated expected range of the measured.
Appropriate statistical parameter: Chi-square value: CHISQ.INV.RT (probability; degree of freedom)
Probability: chi-square test
ANOVA
Analysis of variance.
Three or more independent group, separated from each other. One variable has normal distribution.