Week 3 Flashcards

1
Q

Descriptive statistics vs Inferential statistics

A

Descriptive statistics are used to describe and summarise the data at hand, through numerical calculations, graphs, or tables.

Inferential statistics makes inferences and predictions about a population based on a sample of data taken from the population in question.

2
Q

Summary Statistics

A

Summary statistics summarise and provide information about your sample data. They tell you about the values in your data set, including where the average lies and whether your data are skewed.

Summary statistics fall into three main categories:

Measures of location (also called central tendency).
Measures of spread.
Graphs/charts.
Your aim therefore is to present sample data using graphs and descriptive measures to summarise points and characteristics in the sample.

3
Q

Inferential Statistics

A

Inferential statistics is used to make inferences about the characteristics of a population based on sample data.

The goal is to go beyond the data at hand and make inferences about population parameters.
In order to use inferential statistics, it is assumed that either random selection or random assignment was carried out (i.e., some form of randomisation is assumed).
There are two things to consider for inferential statistics: hypothesis testing and confidence intervals. These are covered in more detail later in this module.

4
Q

Retain or Reject the Null Hypothesis

A

A hypothesis test is performed under the assumption that the null hypothesis is true; we then try to disprove it based on the available data.
The sample we have taken from the population either has true mean μ = μ0 or it has a different mean.
Consider the sampling distribution of the sample mean under the null hypothesis, i.e., assuming the population mean μ is equal to μ0.

5
Q

Cut off point: 95% area

A

If the sample mean falls within the middle 95% of the area, then we say that it is close to the population mean under the null, and any difference between the sample mean and the null-hypothesised population mean is due to sampling variability or chance.

If the sample mean falls either in the lower 2.5% area or in the upper 2.5% area, then it is so far out that a sample mean this extreme would rarely occur by chance when the null is true. So we conclude that the sample data do not support the null hypothesis, and we go with the alternative hypothesis.

6
Q

For a lower sided alternative hypothesis (one-sided), the lower 5% of the tail area is the rejection region and upper 95% of the area is the retention region.
T
F

A

True

7
Q

Calculating the p-value when the population SD is known

A

The p-value or the tail area of the observed sample mean can be obtained easily by making a transformation on the sample mean assuming that the null hypothesis is true.
Z score: Z = (sample mean − μ0) / SE

where SE = σ/√n.
The normal distribution (Table A1) is then used to find the p-value or tail area.
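As a sketch, this transformation and the resulting two-sided p-value can be computed with Python's standard library (the normal tail area comes from the complementary error function); the numbers in the example call are invented for illustration.

```python
import math

def z_test_p_value(sample_mean, mu0, sigma, n, two_sided=True):
    """z statistic and p-value for a one-sample test (population SD known)."""
    se = sigma / math.sqrt(n)           # standard error of the mean
    z = (sample_mean - mu0) / se        # transformation under H0
    # Upper-tail area of the standard normal: P(Z > z) = 0.5 * erfc(z / sqrt(2))
    tail = 0.5 * math.erfc(abs(z) / math.sqrt(2))
    return z, 2 * tail if two_sided else tail

# Hypothetical study: sample mean 103, mu0 = 100, sigma = 15, n = 100
z, p = z_test_p_value(sample_mean=103, mu0=100, sigma=15, n=100)
```

This replaces the Table A1 lookup with `math.erfc`, which gives the same tail area in closed form.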

8
Q

Calculating the p-value when the population SD is unknown

A

If the population standard deviation is unknown, the recommended transformation for the sample mean is:
T score: T = (sample mean − μ0) / SE

The t-score has (n-1) degrees of freedom.
SE = SD/√n where SD is the sample standard deviation.
The t-distribution (Table A3) is then used to calculate the required tail area or the p-value.
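In practice the Table A3 lookup is usually done by software. A minimal sketch with SciPy's one-sample t-test; the blood-pressure readings and the null value of 120 below are hypothetical.

```python
from scipy import stats

# Hypothetical sample: systolic blood pressure readings (mmHg)
sample = [128, 118, 144, 133, 132, 111, 149, 139, 136, 126]

# One-sample t-test of H0: mu = 120 against a two-sided alternative.
# The t-score has n - 1 = 9 degrees of freedom.
result = stats.ttest_1samp(sample, popmean=120)
```

`result.statistic` is the t-score and `result.pvalue` the two-sided tail area from the t-distribution.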

9
Q

P value interpretation

A

If the p-value is greater than 0.05, we retain the null hypothesis.
If the p-value is less than or equal to 0.05, reject the null hypothesis and go with the alternative hypothesis.

If 0.001 ≤ p-value < 0.01, then the results are highly statistically significant.
If p-value < 0.001, then the results are very highly statistically significant.
If 0.05 ≤ p-value < 0.10, then the results are marginally statistically significant.

A general guideline about wording you may see in scientific journals:

“The results are statistically significant” – when the p-value < significance level
“The results are not statistically significant” – when the p-value ≥ significance level
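The interpretation rules above can be sketched as a small function; the precedence at exactly p = 0.05 (treated as significant, per the rejection rule) is an assumption, since the card's ranges overlap there.

```python
def significance_wording(p, alpha=0.05):
    """Map a p-value to the wording used in this card (default alpha = 0.05)."""
    if p < 0.001:
        return "very highly statistically significant"
    if p < 0.01:
        return "highly statistically significant"
    if p <= alpha:
        return "statistically significant"
    if p < 0.10:
        return "marginally statistically significant"
    return "not statistically significant"
```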

10
Q

Which of the following statements regarding p value equalling 0.001 is/are CORRECT?

This means that if 1000 similar studies were undertaken on the same population, only 1 out of 1000 studies would produce a sample result as extreme as the one obtained, due to sampling variability or chance alone.

This means that if 1000 similar studies were undertaken on the same population, only 999 out of 1000 studies would produce a sample result as extreme as the one obtained, due to sampling variability or chance alone.

The study result is so rare that a chance factor can be ignored for the difference from the hypothesized value.

The study result is so rare that a chance factor can’t be ignored for the difference from the hypothesized value.

A

The study result is so rare that a chance factor can be ignored for the difference from the hypothesized value.

This means that if 1000 similar studies were undertaken on the same population, only 1 out of 1000 studies would produce a sample result as extreme as the one obtained, due to sampling variability or chance alone.

11
Q

Statistical Hypotheses

A

A statistical method which aims to analyse whether the results of a study are due to chance alone.
p‐values and confidence intervals are used to determine whether a result is statistically significant.
A hypothesis may be defined simply as a statement about one or more study populations.
Example: A physician may hypothesise that a new drug will be more effective than a standard drug for reducing pain caused by prostate cancer.

12
Q

Types of Statistical Hypotheses

A

There are two different statistical hypotheses involved in hypothesis testing, namely, null and alternative hypothesis.

Null hypothesis: No difference or association; H0
Alternative hypothesis: There is a difference or association; Ha
The majority of hypotheses in medicine are two-sided hypotheses, i.e.

Null hypothesis: No difference
Alternative hypothesis: There is a difference - in either negative or positive direction
Occasionally a one-sided hypothesis is used. This is when the alternative hypothesis is in one direction only.

Option 1: Negative alternative hypothesis

Null hypothesis: No difference, or positive difference
Alternative hypothesis: Negative difference only
Option 2: Positive alternative hypothesis

Null hypothesis: No difference or negative difference
Alternative hypothesis: Positive difference only

13
Q

Hypothesis testing process

A

Hypothesis testing is the process of deciding whether to retain or reject the null hypothesis based on sample data. Here’s a concise guide:

  1. State the Study:
    • Outline study objectives, importance, and implications.
    • Define null (H0) and alternative (Ha) hypotheses.
  2. Plan the Hypothesis:
    • Clearly state hypotheses and justify the alternative choice.
  3. Check Assumptions:
    • Ensure data follows approximate normality.
    • Confirm random sample selection and independence.
  4. Analyze the Data:
    • Calculate test statistic (e.g., t-score) using appropriate formula.
    • Use tables to find p-value and assess significance.
  5. Discuss Results
    • Interpret summary statistics and statistical significance.
    • Draw conclusions regarding the study population and its implications.
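The analysis step (step 4) can be sketched in Python using only the standard library; the data set and the null value μ0 = 50 are invented for illustration.

```python
import math
import statistics

# Step 4 sketch: compute the test statistic for a one-sample test
# against a hypothetical null value mu0 = 50 (population SD unknown).
data = [52.1, 49.8, 53.4, 51.0, 50.6, 54.2, 48.9, 52.8]
mu0 = 50

n = len(data)
mean = statistics.mean(data)
sd = statistics.stdev(data)       # sample standard deviation
se = sd / math.sqrt(n)            # standard error of the mean
t = (mean - mu0) / se             # t-score with n - 1 degrees of freedom
```

The resulting t-score would then be compared against the t-distribution table (or software) to obtain the p-value.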
14
Q

Evaluating equality of group variances

A

Evaluating equality of group variances is required when performing a hypothesis test for two groups. The degrees of freedom for the t-score (t-test) and the SE depend on the equality of variances.
The three methods of evaluation are:
Method 1: Present the data in each group on parallel histograms or box-plots and compare the dispersion. If the dispersions are similar, assume equal variances, otherwise assume unequal variances. There is no cut off, so use your judgment.
Note: Large data should be presented on a histogram, whereas small data should be presented on a boxplot.

Method 2: Take the ratio of larger to smaller variances,
i.e., RATIO = (larger SD)² / (smaller SD)².
If this ratio ≥ 2, assume unequal variances, otherwise assume equal variances.
Note: Standard deviation is the square root of variance.
Method 3: Use a hypothesis test procedure known as Levene’s test for testing the null hypothesis that the groups have equal variances against the alternative hypothesis that the groups have unequal variances.
If the resulting p-value ≤ 0.05, reject the null hypothesis, i.e., consider unequal variances.
On the other hand, if the p-value > 0.05, retain the null hypothesis, i.e., consider equal variances.

Note: Method 3 is the most appropriate method for comparing equality of variances. There is no strong theoretical backup for Methods 1 and 2.
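Methods 2 and 3 can be sketched as follows; the two groups of measurements are made up for illustration, and Levene's test is taken from SciPy.

```python
import statistics
from scipy import stats

# Hypothetical measurements for two independent groups
group1 = [5.1, 4.8, 6.0, 5.5, 5.2, 4.9, 5.7]
group2 = [6.2, 3.9, 7.1, 4.4, 6.8, 3.5, 7.4]

# Method 2: ratio of larger to smaller sample variance
var1, var2 = statistics.variance(group1), statistics.variance(group2)
ratio = max(var1, var2) / min(var1, var2)
assume_equal_by_ratio = ratio < 2    # ratio >= 2 -> assume unequal variances

# Method 3: Levene's test (H0: the groups have equal variances)
stat, p = stats.levene(group1, group2)
assume_equal_by_levene = p > 0.05    # retain H0 -> assume equal variances
```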

15
Q

There are two types of errors that are possible with hypothesis testing: type I and type II.

A

If you reduce the risk of Type I errors, Type II errors increase and vice versa.

Retain the null hypothesis when in reality the null is true – CORRECT decision
Reject the null hypothesis when in reality the null is false – CORRECT decision
Reject the null hypothesis when in reality the null is true – Type I error
Retain the null hypothesis when in reality the null is false – Type II error

A common practice is to fix the type I error at some threshold value (e.g. at 0.05), called the significance level of the hypothesis test, and then minimise type II error or maximise power.
The power of a hypothesis test is the probability of not committing a Type II error; it is affected by the sample size, the significance level, and the true value of the parameter.

16
Q

What is the confidence interval?

A

A sample statistic is rarely the same as the parameter. A difference between the sample statistic and the parameter may occur purely by chance or sampling variability. So it is sensible to estimate the parameter by an interval centred on the sample statistic. This interval is called the confidence interval.

The key to obtaining the confidence interval is the sampling distribution of the sample statistic. The confidence interval has an associated confidence level, for example 95%, to show how confident we are that this interval contains the parameter.

17
Q

Calculating confidence interval

A

The confidence interval can easily be obtained using the following steps.

Step 1: Calculate the sample statistic for the parameter of interest in the study population.
Step 2: Calculate the margin of error.
(Margin of error = multiplier X SE)
Step 3: Calculate the confidence interval:
Lower boundary = sample statistic – margin of error
OR: Lower boundary = sample statistic - multiplier X SE
Upper boundary = sample statistic + margin of error
OR: Upper boundary = sample statistic + multiplier X SE
The interval in Step 3 is called confidence interval and is likely to capture the parameter in the study population with some level of confidence, say 95%.
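Steps 2 and 3 can be sketched as a small function; the sample mean and SE in the example call are invented, and 1.96 is the large-sample multiplier for 95% confidence.

```python
def confidence_interval(sample_mean, se, multiplier=1.96):
    """95% CI for a mean using the large-sample (normal) multiplier 1.96.

    Step 2: margin of error = multiplier * SE
    Step 3: boundaries = sample statistic -/+ margin of error
    """
    margin = multiplier * se
    return sample_mean - margin, sample_mean + margin

# Hypothetical sample mean and standard error
lo, hi = confidence_interval(sample_mean=131.6, se=3.63)
```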

18
Q

What is Margin of error?

A

The margin of error is the amount added to and subtracted from the sample statistic to form the confidence interval: margin of error = multiplier × SE.
19
Q

The value of “multiplier” as the confidence level increases

A

The value of “multiplier” increases as the confidence level increases, and hence the margin of error increases. If the margin of error increases, the CI becomes wider.

The multiplier values (e.g., 1.96 for a 95% confidence level) are appropriate only for sufficiently large sample sizes and are obtained from the normal distribution table.
For small sample sizes they are obtained from the t-distribution table.
The t-distribution and the normal distribution are effectively the same for large samples (df > 120).

20
Q

The width of a confidence interval can be reduced without reduction of the confidence level by decreasing the sample size. True or false?

A

False
If the sample size increases, the standard error decreases which results in a narrower confidence interval.

21
Q

If the confidence interval includes zero, the difference is

A

If the confidence interval includes zero, the difference is not statistically significant; otherwise, the difference is significant.

22
Q

Types of T test

A

The previous example used “continuous” data with “one” group => one-sample t-test.
The next section of this module looks at “two” groups:

If the two groups are independent (i.e. different people in each group)
We will do an independent t-test (two-sample t-test)
Consider the difference between means
The test differs slightly depending on whether the variance is equal or unequal.
If the two groups are NOT independent (i.e. the same people are in each group)
We will do a paired t-test
Consider the mean of differences

23
Q

T test diagram

A
24
Q

t-Multiplier vs t-Statistic

A

The calculation of 95 % Confidence Intervals and hypothesis tests/p-values go together.

95 % Confidence Interval tells us the magnitude of the difference.
This requires the t-Multiplier.
Hypothesis tests & p-values tell us whether there is statistical significance or not.
This requires the t-Statistic.

25
Q

Formulas for independent t-test (equal SD)

A

t = (x̄1 − x̄2) / SE with df = n1 + n2 − 2,
where SE = Sp × √(1/n1 + 1/n2) and the pooled variance is
Sp² = [(n1 − 1)s1² + (n2 − 1)s2²] / (n1 + n2 − 2).
26
Q

Formulas for independent t-test with unequal SD

A

t = (x̄1 − x̄2) / SE with SE = √(s1²/n1 + s2²/n2).
The degrees of freedom are given by the Welch–Satterthwaite approximation.
27
Q

Paired t-test

A

t = d̄ / SE with df = n − 1,
where d̄ is the mean of the paired differences and SE = SDd/√n (SDd is the standard deviation of the differences).
28
Q

What is ANOVA?

A

The extension of the two-sample t-test to three or more samples is known as the Analysis of Variance, or ANOVA.

29
Q

Types of ANOVA

A

One-way between groups ANOVA: Determining whether there is a difference in means across 3 or more groups.
This module will only focus on this type of ANOVA.
One-way repeated measures ANOVA: Determining whether a difference exists when measuring a single group multiple times.
Two-way between groups ANOVA: More complex as it analyses the interaction of two independent variables on 3 or more groups.
Two-way repeated measures ANOVA: Repeated measures structure which includes an interaction effect (secondary independent variable).

30
Q

Uses of ANOVA

A

As the name suggests, inferences about means are made by analysing variances.
An ANOVA test lets you know whether a difference exists between the groups.

Further statistical analysis (Tukey HSD test) is required to determine which of the group means differ.

31
Q

True or false: ANOVA cannot tell you which specific groups are significantly different from each other.

A

true

A one-way analysis of variance (ANOVA) helps determine whether there is a statistically significant difference between two or more unrelated groups. ANOVA cannot tell you which specific groups are significantly different from each other.

A secondary post hoc analysis is performed to determine which of the groups differ from each other.

32
Q

ANOVA assumptions

A

Before undertaking an ANOVA test, you will need to check the following assumptions.

The dependent variable is a continuous numerical variable and follows a normal distribution.

Each of the independent variables is independent of the others. Also, samples within each group are independent of each other.
The variances in each of the groups are the same, therefore there is homogeneity of variances across the groups.

33
Q

F statistic

A

As we already know, for testing any hypothesis we need a test statistic. For example, for testing the equality of means of two independent populations we use the two-sample t-test.

Similarly, for testing the equality of means of several independent populations, we use a test statistic known as the F statistic.

The F statistic is defined as the ratio of the between-group variability to the within-group variability:
F = (mean square between groups) / (mean square within groups).

A large value of F indicates rejection of the null hypothesis, and we conclude that the samples have not all come from populations with the same mean.

Manual calculation of the F statistic is tedious. Therefore, we mostly rely on statistical packages such as GraphPad or SPSS.
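As a sketch, a one-way between-groups ANOVA can also be run with SciPy's `f_oneway`; the three groups of BMI-style measurements below are hypothetical.

```python
from scipy import stats

# Hypothetical BMI-style measurements in three independent groups
group_a = [22.1, 24.5, 23.3, 25.0, 22.8]
group_b = [26.2, 27.9, 25.4, 28.1, 26.7]
group_c = [22.9, 23.6, 24.2, 22.5, 23.8]

# One-way between-groups ANOVA: H0 is that all group means are equal.
f_stat, p_value = stats.f_oneway(group_a, group_b, group_c)
```

A small p-value here only tells us that some difference exists; a post-hoc test is still needed to locate it.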

34
Q

Post-hoc analysis

A

Once we reject the null hypothesis in an ANOVA test, we want to conduct additional tests to find out where the differences lie.

Many different techniques for conducting multiple comparisons exist; they typically involve testing each pair of means individually.

The most widely used techniques are the Bonferroni t-test and Tukey’s test.
Bonferroni = extremely simple and general but lacks power.

Tukey = the best for all possible pairwise comparisons when sample sizes are unequal.
If we perform a Bonferroni test on the coronary artery surgery and BMI study then we can identify where the difference lies.

35
Q

When is a non-parametric test used?

A

If the type of data is not continuous and the normality assumption is violated, an alternative approach is to use non-parametric tests.
Non-parametric tests are also referred to as distribution-free tests.
These tests have the obvious advantage of not requiring the assumption of normality or the assumption of homogeneity (equality) of variance.
They compare medians rather than means; as a result, if the data has outliers, their influence is negated.
Non-parametric tests can be performed for ordinal as well as discrete data.

36
Q

When to use non-parametric tests and their limitations

A

When to use non-parametric tests
Data is skewed, where median is a better representation of central tendency.
You have a very small sample size.
You have ranked data, ordinal data, or outliers that cannot be removed.

Limitations of non-parametric tests
Nonparametric tests are less powerful than parametric tests, thus you are unlikely to detect a significant effect when one truly exists.
Modification of the hypothesis is required, e.g. you are testing to see if the population medians (cf. means) are the same.

37
Q

Mann-Whitney U Test

A

The test is used when comparing two independent groups called experimental and control, and measurements in each group must be a continuous variable.
Subjects are randomly selected from the population and randomly assigned to two groups.
It is not necessary to have the same number of subjects in the two groups.
This is the non-parametric analogue to the independent samples t-test, and should be used if the distribution of data in both groups or in one of the two groups is non-normal.
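A minimal sketch using SciPy's implementation; the pain scores in the two independent groups below are made up for illustration.

```python
from scipy import stats

# Hypothetical pain scores in two independent groups (non-normal data)
control      = [4, 7, 6, 9, 5, 8, 6, 7]
experimental = [2, 5, 3, 4, 6, 3, 2, 4]

# Mann-Whitney U test: H0 is that the two population medians are equal.
u_stat, p_value = stats.mannwhitneyu(control, experimental,
                                     alternative="two-sided")
```

Note that the groups need not have the same number of subjects.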

38
Q

Hypothesis and Assumptions of Mann-Whitney U Test

A

Hypotheses:

Null hypothesis: the median in each groups in the population are the same
Alternative hypothesis: the population medians are different.
Assumptions:

Groups are independent
Subjects within each group are also independent

39
Q

Wilcoxon Signed-Rank Test for Two Paired Groups

A

The Wilcoxon signed-rank test is used to compare two paired groups.
This is the non-parametric analogue to the paired t-test, and should be used if the distribution of paired differences does not follow the normal distribution.
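A minimal sketch using SciPy's implementation; the before/after pain scores below are hypothetical matched pairs.

```python
from scipy import stats

# Hypothetical paired measurements: pain score before and after treatment
before = [8, 7, 6, 9, 8, 7, 10, 6, 9, 8]
after  = [5, 6, 4, 7, 5, 6,  7, 5, 6, 5]

# Wilcoxon signed-rank test: H0 is that the median paired difference is zero.
stat, p_value = stats.wilcoxon(before, after)
```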

40
Q

Wilcoxon Signed-Rank Test for Two Paired Groups: hypotheses and assumptions

A

Hypothesis

The null hypothesis to be tested is that the median difference between pairs of observations is zero in the population.
The alternative hypothesis is that the median difference is not zero.
Note that this is different from the null hypothesis of paired t-test which is that the mean difference between pairs is zero.
Assumptions

Groups are independent and subjects within each group are also independent.
The independent variable should consist of “matched pairs”.

41
Q

Kruskal-Wallis Test for 3 or More Independent Groups

A

The Kruskal-Wallis test is a non-parametric analogue to the one-way analysis of variance (ANOVA).
This test is used when 3 or more independent groups are compared but data in one or more groups do not follow the normal distribution.
The measurements are expected to be continuous or at least ordinal. Since the samples are independent they can be of different sizes.

Hypothesis
We test the null hypothesis that the samples come from the same population, or from populations with identical medians. The alternative hypothesis is that the samples come from populations with different medians.
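A minimal sketch using SciPy's implementation; the scores in the three independent groups below are invented for illustration.

```python
from scipy import stats

# Hypothetical scores in three independent groups (sizes may differ)
group1 = [4, 5, 6, 5, 4, 7]
group2 = [18, 19, 20, 19, 18, 21]
group3 = [4, 6, 5, 6, 5, 7]

# Kruskal-Wallis H test: H0 is that all samples come from populations
# with identical medians.
h_stat, p_value = stats.kruskal(group1, group2, group3)
```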
