Data Analysis: Hypothesis testing and comparing means Flashcards
What is data?
The actual pieces of information you collect in your study
What is a variable?
A measurement which varies between subjects, e.g. height or gender (not constant)
How can data be classified?
Into 2 types:
Categorical or numerical
Categorical: can be sorted into groups or categories, use bar charts and pie charts to represent
Can be further split up into:
- Nominal values: you can count but not order or measure e.g. sex and eye colour
- Ordinal values: you can count and order but not measure e.g. house numbers and swimming level
How do populations and samples relate to one another?
If the sample is chosen correctly, the sample data can represent the whole population and can be used to draw inferences about it
What is point estimation?
Where the sample data is used to estimate the parameters of a population
statistics - calculated using sample data
parameters- characteristics of population data
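A minimal sketch of point estimation, assuming a made-up population of heights (the values, seed and sample size are illustrative only). The population mean is the parameter; the sample mean is the statistic used to estimate it.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 heights in cm (invented for illustration).
population = [random.gauss(170, 10) for _ in range(10_000)]
mu = statistics.mean(population)  # parameter: characteristic of the population

# Draw a random sample and calculate the statistic from it.
sample = random.sample(population, 100)
x_bar = statistics.mean(sample)  # statistic: point estimate of mu

print(f"population mean (parameter): {mu:.2f}")
print(f"sample mean (statistic):     {x_bar:.2f}")
```

The two values will usually be close but not identical; the gap is sampling error.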
How do we choose which average and measure of spread to use?
1 - First look at the type of data you have (numerical or categorical)
2 - If numerical:
- for normally distributed data use mean, spread (standard deviation)
- for skewed data use median, spread (IQR)
If categorical,
- for ordinal use median (IQR)
- for nominal use mode (no measure of spread) - rare
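The choices above can be sketched with Python's standard `statistics` module. All data values here are hypothetical, chosen only to show which summary goes with which kind of data.

```python
import statistics

# Hypothetical values, made up for illustration only.
roughly_normal = [168, 169, 170, 170, 170, 171, 172]
right_skewed = [1, 2, 2, 3, 3, 4, 20]
eye_colour = ["blue", "brown", "brown", "green", "brown"]

# Normally distributed numerical data: mean and standard deviation.
mean = statistics.mean(roughly_normal)
sd = statistics.stdev(roughly_normal)  # sample SD (n - 1 denominator)

# Skewed (or ordinal) data: median and interquartile range (IQR = Q3 - Q1).
median = statistics.median(right_skewed)
q1, _, q3 = statistics.quantiles(right_skewed, n=4)
iqr = q3 - q1

# Nominal data: mode only, with no measure of spread.
mode = statistics.mode(eye_colour)

print(f"mean={mean}, sd={sd:.2f}")
print(f"median={median}, IQR={iqr}")
print(f"mode={mode}")
```

Note that different packages calculate quartiles slightly differently, so the IQR from other software may not match exactly.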
What is hypothesis testing?
A way for you to test the results of a survey or experiment to see if you have meaningful results
You are testing to see if your results are valid or if they are due to chance
If due to chance then your experiment won’t be repeatable and of little use
Objective way of making decisions or inferences from sample data
What are the two hypotheses you can have?
Null - H0
- assume that there is no difference/effect/relationship
Research (alternative) hypothesis - HA
- assume that there is a difference/effect/relationship
What are the types of error?
Type 1 - where there is no real difference but the study reports one (null hypothesis wrongly rejected)
Type 2 - where there is a real difference but the study reports there isn't (null hypothesis wrongly not rejected)
Which one is worse depends on the scenario - consider risks of each error
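A Type 1 error can be illustrated by simulation. This is a sketch that assumes NumPy and SciPy are available; the significance level of 0.05, the group size and the number of simulated studies are all assumptions picked for illustration. Both groups are drawn from the same population, so every "significant" result is a false positive.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)  # fixed seed so the sketch is repeatable
alpha = 0.05                     # assumed significance level
n_studies, n_per_group = 2000, 20

# Both groups come from the SAME population, so the null hypothesis is true.
group_a = rng.normal(loc=0.0, scale=1.0, size=(n_studies, n_per_group))
group_b = rng.normal(loc=0.0, scale=1.0, size=(n_studies, n_per_group))

# One independent samples t-test per simulated study (one per row).
result = stats.ttest_ind(group_a, group_b, axis=1)

# Proportion of studies that wrongly reject the null hypothesis.
type1_rate = float(np.mean(result.pvalue < alpha))
print(f"Type 1 error rate: {type1_rate:.3f}")
```

The rate should land near the chosen significance level, since that level is the Type 1 error rate you agree to accept.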
What test do we use to compare means?
T - tests
What are the types of t-tests and when do we use them?
paired t-test - used for paired data - when we study the same individuals at two different times or under two different conditions
independent samples t-test - data collected from two separate groups
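A sketch of both t-tests, assuming SciPy is available. The measurements are hypothetical and the variable names are invented for illustration.

```python
from scipy import stats

# Paired data (hypothetical): the same 8 individuals before and after treatment.
before = [140, 135, 150, 142, 138, 147, 141, 139]
after = [132, 130, 145, 138, 131, 140, 137, 135]
paired = stats.ttest_rel(before, after)

# Independent data (hypothetical): two separate groups of individuals.
group_a = [5.1, 4.9, 5.6, 5.8, 5.0, 5.3]
group_b = [6.2, 6.0, 5.9, 6.5, 6.1, 6.4]
independent = stats.ttest_ind(group_a, group_b)

print(f"paired:      t={paired.statistic:.2f}, p={paired.pvalue:.4f}")
print(f"independent: t={independent.statistic:.2f}, p={independent.pvalue:.4f}")
```

A small p-value (for example below 0.05) is evidence against the null hypothesis of no difference in means.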
What does a t-test assume?
Assumes the data are normally distributed
How can we check to see if assumptions are met in t-tests and what tests do we carry out if they aren’t?
For the independent samples t-test, check using histograms of the data by group. If the data are not normally distributed, use the Mann-Whitney test (non-parametric)
For the paired t-test, check using a histogram of the paired differences. If they are not normally distributed, use the Wilcoxon signed rank test (non-parametric)
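A sketch of the two non-parametric alternatives, assuming SciPy is available. The data are hypothetical; in practice you would look at the histograms first and only switch tests if normality looks doubtful.

```python
from scipy import stats

# Independent groups (hypothetical): alternative to the independent t-test.
group_a = [5.1, 4.9, 5.6, 5.8, 5.0, 5.3]
group_b = [6.2, 6.0, 5.9, 6.5, 6.1, 6.4]
u_result = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")

# Paired data (hypothetical): alternative to the paired t-test.
before = [140, 135, 150, 142, 138, 147, 141, 139]
after = [132, 130, 144, 138, 131, 144, 132, 137]
w_result = stats.wilcoxon(before, after)  # works on the paired differences

print(f"Mann-Whitney: U={u_result.statistic}, p={u_result.pvalue:.4f}")
print(f"Wilcoxon:     W={w_result.statistic}, p={w_result.pvalue:.4f}")
```

Both tests work on ranks rather than raw values, which is why they do not need the normality assumption.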
What is ANOVA and what are the types of ANOVA?
ANalysis Of Variance
2 MAIN TYPES:
ONE WAY - when you want to test two or more groups (usually three or more) to see if there's a difference between their means
TWO WAY (with or without replication) - Without replication - when you have one group and you're double testing that same group (e.g. one group before and after medication)
With replication - when you have two groups and the members of those groups are doing more than one thing (e.g. two groups of patients from different hospitals trying two different therapies)
What distribution do we use for one way ANOVA?
Used to compare the means of two or more independent groups using the F-distribution
Looks at all the data in the groups together
- looks at all the variance within the groups then looks at overall variation between the groups
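A sketch of how the F statistic is built, using only the standard library: variation between the group means is divided by variation within the groups. The three groups and their values are hypothetical.

```python
import statistics

# Hypothetical scores for three independent groups.
groups = {
    "A": [4, 5, 6],
    "B": [6, 7, 8],
    "C": [9, 10, 11],
}

all_values = [x for values in groups.values() for x in values]
grand_mean = statistics.mean(all_values)

# Variation BETWEEN groups: how far each group mean is from the grand mean.
ss_between = sum(
    len(values) * (statistics.mean(values) - grand_mean) ** 2
    for values in groups.values()
)

# Variation WITHIN groups: how far each value is from its own group mean.
ss_within = sum(
    (x - statistics.mean(values)) ** 2
    for values in groups.values()
    for x in values
)

df_between = len(groups) - 1               # number of groups - 1
df_within = len(all_values) - len(groups)  # total observations - number of groups

# F = mean square between / mean square within.
f_stat = (ss_between / df_between) / (ss_within / df_within)
print(f"F({df_between}, {df_within}) = {f_stat:.2f}")
```

A large F means the group means differ by more than the spread inside the groups would explain; the p-value then comes from the F-distribution with these two degrees of freedom.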
When do we use a one way ANOVA?
When you have a group of individuals split up into smaller groups and completing different tasks
What are the limitations of one way ANOVA?
It tells you that at least two groups are different to each other but doesn’t tell you which groups are different.
For that you need to look at confidence intervals or post hoc tests.
Expand on two way ANOVA?
It’s an extension of one way ANOVA
There are 2 independent variables - called factors in two way ANOVA
Factors can be split into levels
What is the main effect and interaction effect in 2 way ANOVA?
Results from two way ANOVA will calculate a main effect and an interaction effect
The main effect is similar to one way ANOVA - each factor is considered separately
For the interaction effect, all factors are considered at the same time
What are interactions and how do we show them?
Interactions show where the effect of one factor depends on the level of the other factor
To show them we plot a means/line/interaction plot - lines that are not parallel suggest an interaction
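A sketch of the numbers behind a means plot, using only the standard library. The 2 x 2 design (therapy by hospital) and all scores are hypothetical. The plot would show one line per therapy across the hospitals; here the same comparison is done numerically.

```python
import statistics

# Hypothetical scores for a 2 x 2 design: therapy (A, B) by hospital (1, 2).
cells = {
    ("A", 1): [10, 12, 11],
    ("A", 2): [11, 13, 12],
    ("B", 1): [14, 15, 16],
    ("B", 2): [20, 22, 21],
}

# Cell means: these are the points drawn on a means plot.
cell_means = {key: statistics.mean(values) for key, values in cells.items()}

# Effect of therapy (B minus A) within each hospital.
effect_hospital_1 = cell_means[("B", 1)] - cell_means[("A", 1)]
effect_hospital_2 = cell_means[("B", 2)] - cell_means[("A", 2)]

# With no interaction the two effects would be about equal (parallel lines).
# A clear gap between them suggests the therapy effect depends on the hospital.
interaction_contrast = effect_hospital_2 - effect_hospital_1

print(f"therapy effect at hospital 1: {effect_hospital_1}")
print(f"therapy effect at hospital 2: {effect_hospital_2}")
print(f"difference of differences:    {interaction_contrast}")
```

This only describes the pattern in the sample; the two way ANOVA interaction term is what tests whether it is significant.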
What are the assumptions for two way ANOVA?
The population must be close to a normal distribution
Sample must be independent
Population variances must be equal
Groups must have equal sample sizes
How do we check assumptions for two way ANOVA tests and what do we do if they are not met?
Normality - we check using histograms. If not met, then we do a Kruskal-Wallis test (non-parametric - doesn't assume normality)
Homogeneity of variance - check using Levene’s test. If not met then use Welch test and Games-Howell for post hoc
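A sketch of Levene's test and the Kruskal-Wallis test, assuming SciPy is available. The three groups are hypothetical. The Welch test and Games-Howell are not shown here.

```python
from scipy import stats

# Hypothetical scores for three independent groups.
group_1 = [4.1, 5.0, 4.6, 5.2, 4.8]
group_2 = [6.0, 6.4, 5.9, 6.8, 6.3]
group_3 = [8.1, 7.9, 8.6, 8.3, 8.8]

# Levene's test: null hypothesis is that the group variances are equal,
# so a LARGE p-value means the homogeneity assumption looks reasonable.
levene_result = stats.levene(group_1, group_2, group_3)

# Kruskal-Wallis: rank-based comparison of the groups, no normality assumed.
kruskal_result = stats.kruskal(group_1, group_2, group_3)

print(f"Levene:         p={levene_result.pvalue:.3f}")
print(f"Kruskal-Wallis: H={kruskal_result.statistic:.2f}, p={kruskal_result.pvalue:.4f}")
```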
What are post hoc tests?
If there’s a significant ANOVA test (difference is seen) then pairwise comparisons are made
They're t-tests with adjustments to keep type 1 error to a minimum
Most common: Tukey's and Scheffé's tests
Hochberg's GT2 is better where sample sizes for the groups are very different
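A sketch of the idea of "t-tests with adjustments", assuming SciPy is available. It uses a Bonferroni correction, which is a simpler adjustment than the Tukey, Scheffé or Hochberg procedures named above and is swapped in only to illustrate the principle. The groups and the 0.05 level are hypothetical.

```python
from itertools import combinations
from scipy import stats

# Hypothetical scores for three groups after a significant ANOVA.
groups = {
    "A": [4.1, 5.0, 4.6, 5.2, 4.8],
    "B": [6.0, 6.4, 5.9, 6.8, 6.3],
    "C": [8.1, 7.9, 8.6, 8.3, 8.8],
}

alpha = 0.05  # assumed overall significance level
pairs = list(combinations(groups, 2))

# Bonferroni adjustment: split alpha across all the pairwise comparisons
# so the overall chance of a type 1 error stays at or below alpha.
adjusted_alpha = alpha / len(pairs)

results = {}
for first, second in pairs:
    p_value = stats.ttest_ind(groups[first], groups[second]).pvalue
    results[(first, second)] = (p_value, bool(p_value < adjusted_alpha))

for (first, second), (p_value, significant) in results.items():
    print(f"{first} vs {second}: p={p_value:.5f}, significant={significant}")
```

Without the adjustment, running many t-tests inflates the chance that at least one comes out significant by chance alone.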