Critical thinking Flashcards
TEACHING BLOCK 1: week 1
Null hypothesis
Which significance level is needed to reject the null hypothesis?
- states that there is no difference / no significant effect
- significance level of 5% (written as p = 0.05) is the usual cut-off for rejecting the null hypothesis
- if the P (probability) value is less than 0.05 = reject the null hypothesis
- if P is greater than or equal to 0.05 = fail to reject the null hypothesis (this does not prove it is true)
- the p value tells you how likely a result at least as extreme as yours would be if the null hypothesis were true
If P>0.05 ?
- P>0.05 = the result lies within the central 95% of values expected under the null hypothesis
- Not significant
- there is no evidence of a difference between the mean of a sample and the population mean
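What do the mean + median show?
Mean = the arithmetic average of the data (sum of values / number of values)
Median = the middle value when the data are ordered; comparing it with the mean reflects the shape (skew) of the distribution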
- Standard deviation
- Statements of Probability + confidence intervals
- SD formula (sample): SD = √[ Σ(xi − x̄)² / (n − 1) ]
- x̄ ± 1.96 x SD ≈ 95%
- x̄ ± 3 x SD ≈ 99.7%
x̄ = mean
When hypothesis testing, do we prove or disprove?
- Rather than PROVE (+) something happens….
- We need to DISPROVE (-) that something DOESN’T (-) happen
- Thus making it LIKELY to have occurred, although that’s only a probability
EG:
- Hypothesis: aggression between men + women = different
- Null hypothesis: aggression between men + women = not different
= We need to disprove that men + women in our sample are not different
Try to disprove the null hypothesis in order to reject it = if u can’t, the null hypothesis is retained (the data are consistent with it, but it is not proven).
standard error of the mean (SEM)?
formula?
- SEM = a measure of how much variation there is likely to be between different samples of a population and the population itself
- SEM = standard deviation / √n (n = number of observations in the sample)
- Data in normal distributions (bell-shaped curves) is represented as mean + the deviations around it.
- the mean of a sample has a standard error
What range do u expect 95% of the sample means to fall in?
- 95% of the sample means will fall within mean +/- 2(SEM)
- That means that 5% will be outside it (2.5% at each end)
EG
- if mean = 50 and SEM = 2, then.. “mean ± 2(SEM)” would be:
- 50 ± 2(2) = 50 ± 4
- = 95% of sample means would fall within the range of 46 - 54
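The SEM example above can be checked with a short sketch. The function names here are made up for illustration; the numbers (mean = 50, SEM = 2) are the ones on the card, and the multiplier 2 is the rounded version of 1.96.

```python
import math


def sem(sd, n):
    """Standard error of the mean: SD divided by the square root of the sample size."""
    return sd / math.sqrt(n)


def approx_95_range(mean, sem_value):
    """Range expected to contain about 95% of sample means: mean ± 2 x SEM."""
    return (mean - 2 * sem_value, mean + 2 * sem_value)


# Values from the card: mean = 50, SEM = 2
low, high = approx_95_range(50, 2)
print(low, high)  # 46 54

# Hypothetical check of the SEM formula: SD = 10, n = 25 gives SEM = 2.0
print(sem(10, 25))  # 2.0
```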
- What’s the z-score that corresponds to the critical value for a 95% confidence level in a normal distribution?
- How does this apply to calculating a confidence interval for a normally distributed data set?
- 1.96
= 95% of the data points in a normally distributed data set lie within ±1.96 standard deviations of the mean
- Z score of 1.96 represents 95% cutoff point of normality
- A sample mean that departs by more than 1.96 x its standard error from the population mean would be expected by chance in only about 5% of the samples
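Where the 1.96 comes from can be sketched with Python's standard library. This assumes a standard normal distribution and a two-sided interval; `critical_z` is a made-up helper name.

```python
from statistics import NormalDist


def critical_z(confidence):
    """Two-sided critical z value for a confidence level such as 0.95."""
    tail = (1 - confidence) / 2            # area left in each tail (2.5% for 95%)
    return NormalDist().inv_cdf(1 - tail)  # z value with that area above it


print(round(critical_z(0.95), 2))  # 1.96
print(round(critical_z(0.99), 2))  # 2.58
```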
If the difference between means for population 1 + population 2 is greater than 1.96 x the SEM (p < 0.05)
do u accept or reject null hypothesis?
- reject null hypothesis
Either an unusual event has occurred or the null hypothesis is incorrect
As difference between the means is greater than 1.96 x SEM = result is statistically significant at p < 0.05 = reject null hypothesis
week 2
Steps of Planning + conducting a study
What’s the scientific method?
- Develop the research question (hypothesis)
- Decide what to measure and how to measure it
- Collect the data
- Analyse the data
- Interpret the results
- Make observations
- Think of interesting questions
- Formulate hypothesis
- Develop testable predictions
- Gather data to test predictions (refine/Accept/ alter/reject hypothesis)
- Develop general theories
Types of data
Quantitative?
Categorical?
conversion of quantitative to categorical data
- Quantitative = How much
EG: age, blood pressure, number of kids in a family, weight, height
- Categorical = What Type
EG: car types, genders, colours, blood group (e.g. AB)
Converting data:
* Height –> Tall/short
* Weight –> Underweight/normal/overweight/obese
* Blood pressure –> Hypertensive/normotensive
However- categorising a continuous variable reduces the amount of information available
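A minimal sketch of converting a quantitative variable into a categorical one. The 175 cm cut-off and the height values are hypothetical, chosen only for illustration.

```python
def categorise_height(height_cm, tall_cutoff_cm=175.0):
    """Convert a quantitative height into a Tall/Short category.

    The default cut-off is an assumed value, not a clinical standard.
    """
    return "Tall" if height_cm >= tall_cutoff_cm else "Short"


heights_cm = [160.5, 174.9, 175.0, 190.2]  # made-up example values
categories = [categorise_height(h) for h in heights_cm]
print(categories)  # ['Short', 'Short', 'Tall', 'Tall']
```

Note that 160.5 and 174.9 both become "Short": the category no longer says how far apart they were, which is the loss of information described above.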
week 3
Sample Standard deviation?
Population standard deviation?
SD = a measure of how spread out the data points are from the mean
Sample SD = √[ Σ(xi − x̄)² / (n − 1) ]
xi: each data value
x̄: The sample mean
n: The number of observations in the sample
[sample SD= use when ur data is a sample taken from a larger population]
Population SD = √[ Σ(xi − μ)² / N ]
xi: each data value
μ: The population mean
N: The total number of observations
[Population SD= use when u have data for the entire population]
standard deviation symbol = σ (population), s (sample)
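The two SD formulas can be compared using Python's `statistics` module, where `stdev` divides by n − 1 (sample) and `pstdev` divides by N (population). The data values below are made up for illustration.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up values; mean = 5, sum of squared deviations = 32

sample_sd = statistics.stdev(data)       # sqrt(32 / 7): divides by n - 1
population_sd = statistics.pstdev(data)  # sqrt(32 / 8): divides by N

print(round(sample_sd, 3))      # 2.138
print(round(population_sd, 3))  # 2.0
```

The sample SD is always a little larger than the population SD for the same numbers, because it divides by a smaller denominator.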
CALCULATING SD with frequencies
Frequency SD = √[ Σf(x − x̄)² / Σf ]
f: Frequency of each data point (how often each value occurs)
x: Each data value
x̄: Mean of the data
Σf: Total frequency (the sum of all frequencies)
EG if the calculated SD was 8,796 and the mean was 5,700 = 5,700 +/- 8,796 [the range around the mean = most data lie within one SD (8,796) above/below the mean]
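A sketch of the frequency formula, dividing by the total frequency Σf as on the card. The function name and the small frequency table are made up for illustration.

```python
import math


def frequency_sd(values, freqs):
    """SD from a frequency table: sqrt( sum of f * (x - mean)^2 / sum of f )."""
    total_f = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / total_f
    sum_sq = sum(f * (x - mean) ** 2 for x, f in zip(values, freqs))
    return math.sqrt(sum_sq / total_f)


values = [1, 2, 3]  # made-up data values
freqs = [1, 2, 1]   # how often each value occurs
# mean = (1 + 4 + 3) / 4 = 2; sum of f * (x - mean)^2 = 1 + 0 + 1 = 2; SD = sqrt(2 / 4)
print(round(frequency_sd(values, freqs), 4))  # 0.7071
```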
Statements of Probability and confidence intervals
x̄ ±1.96 x SD ≈ 95%
= about 95% of data points in a normal distribution fall within ±1.96 SD of the mean (for a 95% confidence interval of the mean, use the SEM in place of the SD)
x̄ ±3 x SD ≈ 99.7%
= refers to the percentage of data points that fall within three standard deviations (±3×SD) of the mean in a normal distribution
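These percentages can be checked against the standard normal curve. A minimal sketch, assuming a perfectly normal distribution; `coverage_within` is a made-up helper name.

```python
from statistics import NormalDist


def coverage_within(k):
    """Proportion of a normal distribution lying within ±k SDs of the mean."""
    std_normal = NormalDist()  # mean 0, SD 1
    return std_normal.cdf(k) - std_normal.cdf(-k)


print(round(coverage_within(1.96), 3))  # 0.95
print(round(coverage_within(3), 3))     # 0.997
```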
week 4 [experimental design]
Accuracy v Precision?
- Accuracy: measure of how close calculated values are to the accepted standard true value (trueness)
- Precision is the closeness of 2 or more measurements to each other
- precision is also the resolution of the representation, typically defined by the number of decimal or binary digits.
Precision
It consists of 3 levels:
Intra-assay precision
- Repeatability
- Describes the precision of within-run replicates
- It expresses the precision under the same operating conditions over a short interval of time.
Inter-assay precision
- Intermediate precision
- Describes the precision of between-run replicates
- Intermediate precision expresses within-laboratory variations: different days, different analysts, different equipment, etc
Reproducibility
- Reproducibility expresses the precision between laboratories (collaborative studies, usually applied to standardization of methodology).
Inter-observer variability?
Intra-observer variability?
Inter-observer variability: Differences in measurements made by different people observing the same thing.
Intra-observer variability: Differences in measurements made by the same person observing the same thing at different times.
- Normal distribution?
- Mean?
- Standard deviation
- Normal distributions (bell-shaped curves) are characterised by centrality (location) and spread (dispersion/scatter)
- Mean – measures centrality (position of curve)
= the sum of measurements / number of measurements
- Standard deviation – measures dispersion/scatter (shape of curve)
SD = √[ Σ(x − x̄)² / (n − 1) ]
x: each data value
x̄: The sample mean
n: The total number of observations
SD versus SEM
- SD is the measure of variability of the observations about the mean
- SEM = measure of the precision of an estimate of a population parameter (here, the mean)
- SEM decreases as the sample size (n) increases, so larger samples give more precise estimates
SEM = standard deviation / √n
week 5
Effect of Outliers
- values which are clearly ‘out’ or ‘wrong’ compared to others in a set.
- Usually arise from gross or systematic error.
- Distort mean value or calibration curve.
- Increase standard deviation
Calculating mean and SD including outlier
Reject if outlier lies outside mean +/- 1.96 x SD
EG, if suspected outlier is 19.0..
Including outlier: Mean = 15.86, SD = 1.57
15.86 + 1.96 x 1.57 = 18.94
15.86 - 1.96 x 1.57 = 12.78
= Can reject 19.0 as an outlier as it lies outside the range 12.78 - 18.94
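The rejection rule above, sketched as code. The mean, SD and suspect value are the ones from the example; the function names are made up for illustration.

```python
def rejection_limits(mean, sd, z=1.96):
    """Lower and upper limits of mean ± z x SD."""
    return (mean - z * sd, mean + z * sd)


def is_outlier(value, mean, sd, z=1.96):
    """True if the value lies outside mean ± z x SD."""
    low, high = rejection_limits(mean, sd, z)
    return value < low or value > high


low, high = rejection_limits(15.86, 1.57)
print(round(low, 2), round(high, 2))  # 12.78 18.94
print(is_outlier(19.0, 15.86, 1.57))  # True
```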
What are Z Scores?
- This is the number of standard deviations distance from the mean
- High Z score = far from the mean
- Low Z score= close to the mean
How to convert data into a z score?
How to calculate where your data might fit into a normal curve?
- Z score = (value − x̄) / standard deviation
z = (x − μ) / σ
x = value, μ = mean, σ = SD
- To calculate where your data might fit into a normal curve…
x̄ ± Z value x standard deviation
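Both directions of the z-score conversion, as a small sketch. The mean of 100 and SD of 15 are hypothetical values, and the function names are made up for illustration.

```python
def z_score(value, mean, sd):
    """Number of SDs the value lies from the mean: z = (x - mean) / SD."""
    return (value - mean) / sd


def value_at_z(z, mean, sd):
    """Reverse conversion: the data value that sits z SDs from the mean."""
    return mean + z * sd


# Hypothetical distribution: mean = 100, SD = 15
print(z_score(130, 100, 15))    # 2.0  (two SDs above the mean)
print(value_at_z(-1, 100, 15))  # 85   (one SD below the mean)
```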
Dixon’s Q-test
Q = |xs − xc| / (xbiggest − xsmallest)
- xs is suspected outlier
- xc is the closest value to xs
- xbiggest - xsmallest = range including outlier
–> Compare Q with value from table for given CL.
–> If Q > table then value is an outlier
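A sketch of the Q calculation using the usual form of Dixon's test, Q = |xs − xc| / range. The data are made up, and the critical value is deliberately left as something to look up in a Q table for the chosen confidence level and sample size; none is hard-coded here.

```python
def dixon_q(values, suspect):
    """Q = |suspect - closest neighbour| / (largest - smallest)."""
    ordered = sorted(values)
    if suspect not in (ordered[0], ordered[-1]):
        raise ValueError("the suspected outlier should be the smallest or largest value")
    # The closest value to an extreme point is its neighbour in sorted order
    closest = ordered[1] if suspect == ordered[0] else ordered[-2]
    return abs(suspect - closest) / (ordered[-1] - ordered[0])


def exceeds_table(q, q_table):
    """True if Q is greater than the table value supplied by the user."""
    return q > q_table


data = [10.0, 11.0, 12.0, 13.0, 20.0]  # made-up measurements
q = dixon_q(data, suspect=20.0)        # (20 - 13) / (20 - 10)
print(q)  # 0.7
```

`exceeds_table(q, q_table)` would then be called with the critical value read from the table.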
Problems with Detecting Outliers
- Removing data – always risky
- Selection of outlier may be altered by subsequent measurements.
- Only valid if single outlier – especially for small data sets
- User has to identify potential outlier.
- Q-test mathematically simpler. More clear cut.
- 1.96 x SD test doesn’t need the look-up table
week 6
Formula to calculate CI (confidence interval) from z-score?
CI = x̄ ±Z × (σ/√n)
x̄: The mean value
Z: The z-score appropriate for the confidence level
σ: The standard deviation
n: The sample size
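The CI formula as a short sketch. The inputs (mean 50, SD 10, n = 25) are hypothetical, and z defaults to 1.96 for a 95% interval; the function name is made up.

```python
import math


def confidence_interval(mean, sd, n, z=1.96):
    """CI = mean ± z x (SD / sqrt(n))."""
    margin = z * sd / math.sqrt(n)
    return (mean - margin, mean + margin)


# Hypothetical sample: mean = 50, SD = 10, n = 25, so SEM = 2 and margin = 3.92
low, high = confidence_interval(50, 10, 25)
print(round(low, 2), round(high, 2))  # 46.08 53.92
```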
week 7
Research (/alternative) hypothesis?
EG
Null hypothesis?
EG
- research hypothesis = states an expectation to be tested aka: alternative hypothesis [H_a]
EG: Tomato plants exhibit a higher rate of growth when planted in compost rather than in soil
- Investigator derives a statement that is the opposite of the research hypothesis = null hypothesis (in notation: H_0) = states there will be no difference
EG: Tomato plants do not exhibit a higher rate of growth when planted in compost rather than soil
If significance tests show that results like these would occur less than 5% (or 1%) of the time if the null hypothesis were true (p < 0.05 or p < 0.01), then..
…. then null hypothesis is rejected, in favour of the alternative.
- You have to prove that something NOT HAPPENING is NOT LIKELY
Falsifiable?
- Falsifiable = something can be logically contradicted by an empirical test.
- A core element of a scientific hypothesis is that it must be capable of being proven false.
- helps improve research - the H0 gets closer to the reality each time, even if it isn’t correct, it is better than the last H0.
Z-score
- If you’re comparing 2 samples’ means with a Z-score?
Difference in means, divided by the combined standard error of the means
Two-sample z-test formula: z = (x̄1 − x̄2) / √[ (σ1²/n1) + (σ2²/n2) ]
When would u use the two-sample z-test?
- The two-sample z-test needs the 2 population standard deviations σ1 + σ2
- If don’t have these data = need a different test that uses sample standard deviations
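A sketch of the two-sample z-test in its standard form, z = (x̄1 − x̄2) / √(σ1²/n1 + σ2²/n2), which assumes both population SDs are known. All input numbers are hypothetical and the function names are made up.

```python
import math
from statistics import NormalDist


def two_sample_z(mean1, mean2, sigma1, sigma2, n1, n2):
    """Difference in means divided by the combined standard error of the means."""
    se_diff = math.sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)
    return (mean1 - mean2) / se_diff


def two_sided_p(z):
    """Two-sided p value for a z score under the standard normal curve."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical inputs: se_diff = sqrt(16/64 + 9/36) = sqrt(0.5)
z = two_sample_z(mean1=52, mean2=50, sigma1=4, sigma2=3, n1=64, n2=36)
print(round(z, 2))            # 2.83
print(two_sided_p(z) < 0.05)  # True, because |z| > 1.96
```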
Z test versus T Test
- If you know the population standard deviation = Use Z tests
- Z-test is a statistical hypothesis test that follows a normal distribution while T-test follows a Student’s T-distribution.
- A T-test is appropriate when you are handling small samples (n < 30) while a Z-test is appropriate when you are handling moderate-large samples (n ≥ 30).
- T-test is more adaptable than Z-test since Z-test will often require certain conditions to be reliable.
- T-test has many methods that will suit any need.
- T-tests are more commonly used than Z-tests
- In T-tests, instead of n you use degrees of freedom (n-1) to look up the critical value
Paired vs Unpaired T test
Paired
- 2 samples
- Come from same source
- Dependent
Unpaired
- 2 samples
- Different sources
- Independent
Assumptions:
- Two samples come from distributions that may vary in mean value but not in standard deviation
- The observations are independent
- The data are quantitative + normally distributed
The t statistic for comparing two unpaired groups is…
- calculated by dividing the difference of the means by the standard error of those differences, taking into account how many subjects were in the test
- T value = (x̄1 − x̄2) / √[ (S1²/n1) + (S2²/n2) ]
x̄1 = mean value for 1st group
x̄2 = mean value for 2nd group
S1 = standard deviation of 1st group
n1 = size of 1st group
S2 = standard deviation of 2nd group
n2 = size of 2nd group
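The T value formula above, sketched directly from its definition with the sample SDs used separately, exactly as the formula is written on the card. The two groups are made-up data and `unpaired_t` is a made-up name.

```python
import math
import statistics


def unpaired_t(group1, group2):
    """t = (mean1 - mean2) / sqrt(S1^2/n1 + S2^2/n2)."""
    mean1, mean2 = statistics.mean(group1), statistics.mean(group2)
    s1, s2 = statistics.stdev(group1), statistics.stdev(group2)  # sample SDs
    n1, n2 = len(group1), len(group2)
    return (mean1 - mean2) / math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)


group1 = [12, 14, 16, 18, 20]  # made-up data: mean 16, variance 10
group2 = [10, 11, 12, 13, 14]  # made-up data: mean 12, variance 2.5
t = unpaired_t(group1, group2)  # 4 / sqrt(10/5 + 2.5/5)
print(round(t, 2))  # 2.53
```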
What degrees of freedom do u use when looking at a table for T test
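- n-1 degrees of freedom for a single sample or a paired test (n = number of pairs)
- n1 + n2 - 2 degrees of freedom for an unpaired test with pooled variance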
what ‘probability’ would u use for 2 tailed v 1 tailed test
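2 tailed t test - p = 0.05 split across both tails (2.5% of area under each end of the bell curve)
1 tailed test - p = 0.05 all in one tail (if the table lists one-tailed probabilities, the 0.025 column gives the 2 tailed 5% critical value)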
using the t test for comparing unpaired means the SE diff is derived by pooling the variances. How?
1: Find the standard deviation in sample 1 (s1) + sample 2 (s2)
2: Multiply the square of the SD of sample 1 (s1²) by its degrees of freedom (n1 - 1)
3: Repeat for sample 2
4: Add the two together then divide by the total degrees of freedom (n1 + n2 - 2) to give a pooled variance
5: SE diff = √[ pooled variance x (1/n1 + 1/n2) ]
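The pooling steps can be sketched as follows. The second function converts the pooled variance into SE diff using the standard formula SE diff = √[pooled variance x (1/n1 + 1/n2)]. The SDs and sample sizes are hypothetical and the function names are made up.

```python
import math


def pooled_variance(s1, n1, s2, n2):
    """Weight each variance by its degrees of freedom, then divide by the total df."""
    weighted_sum = s1 ** 2 * (n1 - 1) + s2 ** 2 * (n2 - 1)
    return weighted_sum / (n1 + n2 - 2)


def se_difference(s1, n1, s2, n2):
    """Standard error of the difference between two means, from the pooled variance."""
    return math.sqrt(pooled_variance(s1, n1, s2, n2) * (1 / n1 + 1 / n2))


# Hypothetical samples: s1 = 3, n1 = 10; s2 = 4, n2 = 10
sp2 = pooled_variance(3, 10, 4, 10)  # (9 x 9 + 16 x 9) / 18
print(sp2)                                    # 12.5
print(round(se_difference(3, 10, 4, 10), 3))  # 1.581
```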
week 9
paired t-test
- If we want to compare two alternative treatments or experiments
- Crossover trials, randomised trials
- Placebo effect
- Simultaneous application
- reduce incidental variation
- compare the size of the difference between two means in relation to the amount of inherent variability (the random error, not related to treatment differences) in the data.
Assumptions:
- Quantitative data
- Differences are independent of each other
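A sketch of the paired t statistic in its usual form: take the difference within each pair, then divide the mean difference by the standard error of the differences, with n − 1 degrees of freedom (n = number of pairs). The before/after values are made up and `paired_t` is a made-up name.

```python
import math
import statistics


def paired_t(before, after):
    """Return (t, degrees of freedom) for paired measurements."""
    diffs = [a - b for b, a in zip(before, after)]  # one difference per pair
    mean_diff = statistics.mean(diffs)
    sem_diff = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return mean_diff / sem_diff, len(diffs) - 1


before = [10, 12, 9, 11, 13]   # made-up measurements on the same 5 subjects
after = [12, 15, 10, 14, 15]   # differences: 2, 3, 1, 3, 2
t, df = paired_t(before, after)
print(round(t, 2), df)  # 5.88 4
```

Because each subject acts as their own control, variation between subjects drops out of the differences, which is how pairing reduces incidental variation.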
week 10
statistical test
- The χ² (chi square) test
- Tests whether the number of individuals in different categories fit a null hypothesis
- Carried out on counts (actual numbers) only, not percentages or proportions
- All χ² tests are 2 sided
which degree of freedom to use?
- different for each table
- Df: (number of rows-1) x (number of columns-1)
don’t include the ‘total’ rows/columns
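A sketch of the χ² calculation for a table of counts. The card does not spell out the formula, so this uses the standard Pearson form, χ² = Σ (observed − expected)² / expected, with expected = row total x column total / grand total. The 2 x 2 table is made up and is entered without its total row/column.

```python
def chi_square(observed):
    """Return (chi-square statistic, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (obs - expected) ** 2 / expected
    df = (len(observed) - 1) * (len(observed[0]) - 1)  # (rows - 1) x (columns - 1)
    return chi2, df


table = [[20, 30],
         [30, 20]]  # made-up counts; every expected count is 25
chi2, df = chi_square(table)
print(chi2, df)  # 4.0 1
```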