Week 1: Types of Outcomes + NHST Flashcards
Process of hypothesis testing (6)
- We have a question or hypothesis about a population
- Propose a study to gather data
- Design the study so that it yields the most valuable information about the hypothesis
- Collect data
- Use statistics to test the hypothesis based on a model of the data (informed by the hypothesis)
- Examine and interpret the results.
A study's design is constrained in how much valuable information it can give about the hypothesis, for example by - (2)
duration of the study
how many people you can recruit
What is a sample?
A sample is the specific group that you will collect data from.
What is a population?
A population is the entire group that you want to draw conclusions about.
Example of population vs sample (2)
Population : Advertisements for IT jobs in the UK
Sample: The top 50 search results for advertisements for IT jobs in the UK on 1 May 2020
What do the arrows show in the process of hypothesis testing? (2)
They show the iterative nature of hypothesis testing:
the question, or the design used to answer it, can be updated on the basis of the statistical analysis from one study so that a future study can be devised.
The decision tree above guides you to the appropriate
inferential statistics (statistical approach) to use
What is inferential statistics?
Inferential statistics allow you to test a hypothesis or assess whether your data is generalisable to the broader population.
What we have measured is called the (2)
outcome variable /
dependent variable (DV)
The type of outcome variable influences which
statistical test to use on the data you have gathered
Why do researchers favour parametric tests over non-parametric tests? - (3)
- When their assumptions are met, they are more powerful and sensitive than non-parametric tests for answering your question
- This means that they have a higher chance of detecting a true effect or difference if it exists.
- They also allow you to make generalisations and predictions about the population based on the sample data.
What question has been covered?
We measure outcomes, which
inform us about our question (hypothesis)
We can obtain multiple outcomes from the
same people
We can obtain outcomes under
different conditions, groups or both
We specify what we measure, and under what conditions we measure it, in the
design of the experiment or study
What are the 4 types of outcomes we measure? (4)
- Ratio
- Interval
- Ordinal
- Nominal
What is a continuous variable? - (2)
There is an infinite number of possible values the variable can take on
entities get a distinct score
2 types of continuous variables (2)
- Interval
- Ratio
What is an interval variable?
Equal intervals on the variable represent equal differences in the property being measured
Examples of interval variables - (2)
e.g. the difference between 600ms and 800ms is equivalent to the difference between 1300ms and 1500ms. (reaction time)
temperature (Fahrenheit), temperature (Celsius), pH, SAT score (200-800), credit score (300-850)
What is a ratio variable?
The same as an interval variable, but it also has a meaningful zero point (0.0 means a complete absence of the property).
Examples of ratio variable - (3)
E.g. Participant height or weight
(a height or weight of 0 is a true zero)
temp in Kelvin (0.0 Kelvin really does mean “no heat”)
dose amount, reaction rate, flow rate, concentration
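The interval/ratio distinction can be made concrete with temperature. The minimal sketch below uses two hypothetical temperatures: differences are the same on both scales, but only the Kelvin (ratio) scale gives a meaningful ratio, because only it has a true zero. The helper name celsius_to_kelvin is illustrative.

```python
# Hypothetical temperatures, chosen only for illustration.
def celsius_to_kelvin(c: float) -> float:
    """Convert Celsius (interval scale) to Kelvin (ratio scale)."""
    return c + 273.15

cool_c, warm_c = 10.0, 20.0

# Differences are meaningful on both scales (equal intervals):
diff_c = warm_c - cool_c
diff_k = celsius_to_kelvin(warm_c) - celsius_to_kelvin(cool_c)

# Ratios are only meaningful on a ratio scale (true zero):
ratio_c = warm_c / cool_c  # 2.0, but 20 C is NOT "twice as hot" as 10 C
ratio_k = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)  # about 1.035

print(f"difference: {diff_c} C vs {diff_k:.2f} K")
print(f"ratio: {ratio_c:.3f} (Celsius) vs {ratio_k:.3f} (Kelvin)")
```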
What is a categorical variable? (2)
A variable that cannot take on all values within the limits of the variable
- entities are divided into distinct categories
What are 2 types of categorical variables? (2)
- Nominal
- Ordinal
What is a nominal variable? - (2)
a variable with categories that do not have a natural order or ranking
Has two or more categories
Examples of nominal variables - (2)
genotype, blood type, zip code, gender, race, eye color, political party
e.g. whether someone is an omnivore, vegetarian, vegan, or fruitarian.
What is an ordinal variable?
categories have a logical, incremental order
Examples of ordinal variables - (3)
e.g. whether people got a fail, a pass, a merit or a distinction in their exam
socioeconomic status (“low income”, “middle income”, “high income”),
satisfaction rating [Likert Scale] (“extremely dislike”, “dislike”, “neutral”, “like”, “extremely like”).
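Because ordinal categories have an order but not necessarily equal spacing, order-based summaries such as the median are safe, while the mode is the only average that also works for nominal data. This sketch uses hypothetical exam results and the exam-grade categories from the card above; the names GRADES and RANK are illustrative.

```python
import statistics

# Ordered categories (ordinal): the list order encodes the ranking.
GRADES = ["fail", "pass", "merit", "distinction"]
RANK = {grade: i for i, grade in enumerate(GRADES)}

# Hypothetical exam results for seven students.
results = ["pass", "merit", "fail", "distinction", "merit", "pass", "merit"]

# Ranks allow order-based summaries such as the median. (Averaging the ranks
# would assume equal spacing between grades, i.e. treating them as interval.)
ranks = sorted(RANK[g] for g in results)
median_grade = GRADES[statistics.median_low(ranks)]

# The mode needs no ordering, so it also works for nominal data.
modal_grade = statistics.mode(results)

print(median_grade, modal_grade)
```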
We use the term ‘variables’ for continuous and categorical measures because - (2)
both outcomes and predictors are variables
We will see later on that not only the type of outcome but also type of predictor influences our choice of stats test
A Likert scale is an ordinal variable, but outcomes measured on a Likert scale are sometimes treated as - (3)
continuous, after inspecting the distribution of the data and arguing that the divisions on the scale are equal
(i.e., treated as interval if the distribution is normal)
this allows the greater sensitivity of parametric tests
What is measurement error?
The discrepancy between the actual value we’re trying to measure, and the number we use to represent that value.
Example of measurement error in psych experiments - (2)
Imprecise measurement: a stopwatch is not accurate enough to measure reaction times of about half a second
Systematic problem: broken ruler (affects validity)
To reduce measurement error in outcomes, the
values have to have the same meaning over time and across situations
Validity means that the (2)
instrument measures what it set out to measure
refers to the accuracy of a measure (whether the results really do represent what they are supposed to measure)
Reliability means the
ability of the measure to produce the same results under the same conditions
Test-retest reliability is the ability of a
measure to produce consistent results when the same entities are tested at two different points in time
3 concepts related to variation (3)
- Systematic variation
- Unsystematic variation
- Randomisation (a way of minimising unsystematic variation)
What is systematic variation - (2)
Differences in performance created by a specific experimental manipulation.
This is what we want
What is unsystematic variation? (3)
Differences in performance created by unknown factors
e.g., age, gender, IQ, time of day, measurement error
These differences can be controlled (e.g., inclusion/exclusion criteria for participants, such as an age range of 18-25)
Randomisation (other approaches) minimises - (2)
effects of unsystematic variation
does not remove unsystematic variation
What is the independent variable (Factors)? (3)
- The hypothesised cause
- A predictor variable
- A manipulated variable (in experiments)
What is the dependent variable (measures)? - (3)
- The proposed effect (a change in the DV)
- An outcome variable
- Measured not manipulated (in experiments)
In all experiments we have two hypotheses, which are (2)
- Null hypothesis
- Alternative hypothesis
What is null hypothesis?
that there is no effect of the predictor variable on the outcome variable
What is alternative hypothesis?
that there is an effect of the predictor variable on the outcome variable
Null Hypothesis Significance Testing computes (4)
the probability of obtaining a test statistic at least as extreme as the one observed if the null hypothesis were true (the p-value), by computing a statistic and assessing how likely it is that the statistic would take that value by chance alone
NHST does not compute the probability of the
null hypothesis being true
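The idea that a p-value is the probability of data at least this extreme assuming the null is true can be seen directly in an exact permutation test. This is a swapped-in illustration, not the z test used in the lecture: if the null hypothesis holds, group labels are arbitrary, so we enumerate every relabelling of hypothetical scores and count how often the difference is at least as extreme as the observed one.

```python
from itertools import combinations
from statistics import mean

# Hypothetical scores for two small groups.
group1 = [5, 7, 8, 9]
group2 = [3, 4, 5, 6]
observed = mean(group1) - mean(group2)

pooled = group1 + group2
n1 = len(group1)
extreme = 0
total = 0
# Under the null hypothesis, group labels are arbitrary: try every way of
# splitting the pooled scores into groups of the original sizes.
for idx in combinations(range(len(pooled)), n1):
    chosen = set(idx)
    a = [pooled[i] for i in chosen]
    b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
    diff = mean(a) - mean(b)
    total += 1
    # Two-tailed: count splits at least as extreme in either direction.
    if abs(diff) >= abs(observed):
        extreme += 1

p_value = extreme / total  # P(difference this extreme | null is true)
print(f"observed diff = {observed}, p = {p_value:.4f}")
```

Note that p_value says nothing about the probability that the null hypothesis itself is true.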
Null Hypothesis Significance Testing Example Z Curve - (9)
- From the experiment we have a z statistic calculated from 2 groups
- The curve shows the (standard normal) distribution of the z statistic
- The horizontal axis measures how many standard deviations a value is from the mean
- The vertical axis measures the probability density of z
- On the left, mean of grp 1 < mean of grp 2
- On the right, mean of grp 1 > mean of grp 2
- If the test is two-tailed (non-directional), alpha is split between the two tails –> a more extreme value of the statistic is needed for significance
- If the hypothesis is directional (one-tailed), alpha is not split –> a less extreme value of the statistic can still give p < 0.05
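The effect of splitting alpha can be computed with the standard normal distribution. This sketch uses Python's statistics.NormalDist and a hypothetical observed z of 1.8: it is significant one-tailed but not two-tailed at alpha = 0.05.

```python
from statistics import NormalDist

alpha = 0.05
z = NormalDist()  # standard normal: mean 0, SD 1

# Critical values: a two-tailed test splits alpha across both tails.
crit_two_tailed = z.inv_cdf(1 - alpha / 2)  # about 1.96
crit_one_tailed = z.inv_cdf(1 - alpha)      # about 1.645

# Hypothetical observed z statistic from comparing two groups.
z_obs = 1.8
p_two_tailed = 2 * (1 - z.cdf(abs(z_obs)))
p_one_tailed = 1 - z.cdf(z_obs)  # directional: grp 1 > grp 2 was predicted

print(f"critical z: two-tailed {crit_two_tailed:.3f}, one-tailed {crit_one_tailed:.3f}")
print(f"z = {z_obs}: p(two-tailed) = {p_two_tailed:.3f}, p(one-tailed) = {p_one_tailed:.3f}")
```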
An alternative hypothesis can be either
directional or non-directional
A non-directional alternative hypothesis is…
The alternative hypothesis is that there is an effect of the group on the outcome variable
A directional alternative hypothesis is…
The alternative hypothesis is that the mean of the outcome variable for group 1 is larger than the mean for group 2
Example of directional alternate hypothesis
There would be far greater engagement in stats lectures if they were held at 4 PM rather than 9 AM
For a non-directional hypothesis you will need to divide your alpha value between the
two tails of the normal distribution
The 3 misconceptions of Null Hypothesis Significance Testing (NHST) - (3)
- A significant result means the effect is important
- A non-significant result means the null hypothesis is true
- A significant result means the null hypothesis is false (NHST only gives the probability of the data given the null hypothesis; it does not show that the null hypothesis is categorically false)
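The first misconception can be shown with arithmetic: with a large enough sample, even a trivially small difference gives a tiny p-value. A minimal sketch, assuming a z test with a known SD of 1 and a hypothetical observed difference of 0.01 SD between two groups of one million each.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical: two huge groups whose observed means differ by a tiny
# amount, with a known SD of 1 in both groups.
n_per_group = 1_000_000
mean_diff = 0.01
sd = 1.0

# z test for a difference between two independent means (known SD).
se = sd * sqrt(2 / n_per_group)
z = mean_diff / se
p = 2 * (1 - NormalDist().cdf(abs(z)))

cohens_d = mean_diff / sd  # effect size: difference in SD units

print(f"z = {z:.2f}, p = {p:.2g}, d = {cohens_d}")
```

The result is highly significant, yet d = 0.01 is far too small to be important, which is why effect sizes are reported alongside p-values.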
p-hacking and HARKING are another issue with
NHST
p-hacking and HARKING exploit - (2)
researcher degrees of freedom
changes made after the results are in and some analysis has been done
p-hacking refers to the
selective reporting of significant results
HARKING is
Hypothesising After the Results are Known
P-hacking and HARKING are often used in
combination
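One reason p-hacking inflates false positives: running many tests and reporting only the significant ones. A minimal sketch, assuming 20 independent tests where the null hypothesis is true for all of them (under the null, p-values are uniform on [0, 1]); the analytic familywise error rate is checked against a quick simulation with a fixed seed.

```python
import random

alpha = 0.05
n_tests = 20  # e.g. trying 20 outcome measures or analysis variants

# If the null is true for every test and the tests are independent,
# the chance that at least one comes out "significant" is:
familywise_error = 1 - (1 - alpha) ** n_tests

# Quick simulation: under the null, each p-value is uniform on [0, 1].
rng = random.Random(1)
n_studies = 10_000
hits = sum(
    any(rng.random() < alpha for _ in range(n_tests)) for _ in range(n_studies)
)

print(f"analytic: {familywise_error:.3f}, simulated: {hits / n_studies:.3f}")
```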
What does EMBERS stand for? (5)
- Effect Sizes
- Meta-analysis
- Bayesian Estimation
- Registration
- Sense
EMBERS can reduce issues of
NHST
Uses of Effect sizes and Types of Effect Size (3)
- There are quite a few measures of effect size
- Get used to using them and understanding how studies can be compared on the basis of effect size
- A brief example: Cohen’s d
Meaning of Effect Size (2)
Effect size is a quantitative measure of the magnitude of the experimental effect.
The larger the effect size, the stronger the relationship between two variables.
Formula of Cohen’s d
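A common form of Cohen's d divides the difference between two group means by a standard deviation, often the pooled SD: d = (mean1 - mean2) / s_pooled, where s_pooled = sqrt(((n1 - 1)s1² + (n2 - 1)s2²) / (n1 + n2 - 2)). Some texts use the control group's SD instead, so treat the pooled version below as one choice among several. The scores are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def cohens_d(group1: list[float], group2: list[float]) -> float:
    """Cohen's d using the pooled standard deviation (one common choice)."""
    n1, n2 = len(group1), len(group2)
    s1, s2 = stdev(group1), stdev(group2)  # sample SDs (n - 1 denominator)
    pooled_sd = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return (mean(group1) - mean(group2)) / pooled_sd

# Hypothetical scores.
treatment = [6, 7, 8, 9, 10]
control = [4, 5, 6, 7, 8]
print(f"d = {cohens_d(treatment, control):.2f}")
```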
What is meta-analysis?
Meta-analysis is a study design used to systematically assess previous research studies to derive conclusions about that body of research
Meta-analysis brings together… and assesses… (2)
- Brings together multiple studies to get a more realistic idea of the effect
- Can assess effect sizes that are averaged across studies
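Averaging effect sizes across studies is usually done with weights, so that more precise studies count for more. The sketch below is a fixed-effect, inverse-variance weighted mean, which is one standard method (random-effects models are also common); the four (d, variance) pairs are hypothetical.

```python
from math import sqrt

# Hypothetical (effect size d, sampling variance) pairs from four studies.
studies = [(0.40, 0.04), (0.25, 0.02), (0.60, 0.10), (0.30, 0.01)]

# Fixed-effect meta-analysis: weight each study by 1 / variance, so larger,
# more precise studies count for more.
weights = [1 / var for _, var in studies]
pooled_d = sum(w * d for (d, _), w in zip(studies, weights)) / sum(weights)
pooled_se = sqrt(1 / sum(weights))

print(f"pooled d = {pooled_d:.3f} (SE {pooled_se:.3f})")
```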
Funnel plots in meta-analysis are used for… and plot studies by… (2)
investigating publication bias and other biases in meta-analysis
plotting each study's effect size against its sample size (or precision), so that bias shows up as asymmetry
Bayesian approaches capture
probabilities of the data given the hypothesis and null hypothesis
Bayes factor is now often computed and stated alongside
conventional NHST analysis (and effect sizes)
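A Bayes factor compares how probable the data are under two hypotheses. The sketch below uses the simplest possible case, two point hypotheses about a success probability with hypothetical binomial data; in practice Bayes factors usually average the likelihood over a prior distribution for the effect, so this is an illustration of the idea only.

```python
from math import comb

# Hypothetical data: 7 "successes" out of 10 trials.
k, n = 7, 10

def binomial_likelihood(p: float) -> float:
    """P(data | success probability p) under a binomial model."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Assumption for illustration: H0 says p = 0.5, H1 says p = 0.7.
p_null, p_alt = 0.5, 0.7
bf10 = binomial_likelihood(p_alt) / binomial_likelihood(p_null)

print(f"BF10 = {bf10:.2f}")  # > 1 favours H1, < 1 favours H0
```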
Registration is where (5)
- Telling people what you are doing before you do it
- Tell people how you intend to analyse the data
- Largely limits researcher degrees of freedom (HARKING, p-hacking)
- A peer reviewed registered study can be published whatever the outcome
- The scientific record is therefore less biased to positive findings
Sense is where (4)
- Knowing what you have done in the context of NHST
- Knowing misconceptions of NHST
- Understanding the outcomes
- Adopting measures to reduce researcher degrees of freedom (like preregistration etc..)
Most of the statistical tests in this book rely on
having data measured
at interval level
To say that data are interval, we must be certain that
equal intervals on the scale represent
equal differences in the property being measured.
Example: on www.ratemyprofessors.com students are encouraged to rate their lecturers on
several dimensions (some of the lecturers’ rebuttals of their negative evaluations are worth
a look). Each dimension (e.g. helpfulness, clarity) is evaluated using a 5-point scale.
For this scale to be interval it must be the case that the - (2)
difference between helpfulness ratings of 1 and 2 is the same as the difference between say 3 and 4, or 4 and 5.
Similarly, the
difference in helpfulness between ratings of 1 and 3 should be identical to the difference
between ratings of 3 and 5.
The distinction between continuous and discrete variables can often be blurred - 2 examples - (2)
continuous variables can be measured in discrete terms; when we measure age we rarely use nanoseconds but use years (or possibly years and months).
In doing so we turn a continuous variable into a discrete
one
we treat discrete variables as if they were continuous, e.g., the number of boyfriends/girlfriends
that you have had is a discrete variable. However, you might read a magazine that says ‘the average number of boyfriends that women in their 20s have has increased from 4.6 to 8.9’
A device for measuring sperm motility that actually measures sperm count is not
valid
Criterion validity is whether you can establish that the
instrument is measuring what it claims to measure by comparing it with objective criteria (e.g., does
your lecturers’ helpfulness rating scale actually measure lecturers’ helpfulness?).
The two sources of variation that are always present in both independent and repeated-measures designs are
unsystematic variation and systematic variation
The effect of our experimental manipulation
is likely to be more apparent in a repeated-measures design than in a
between-group design
The effect of an experimental manipulation is more apparent in a repeated-measures design than in an independent design because, in an independent design,
differences between the characteristics of the people allocated to each of the groups are likely to create considerable random variation both within
each condition and between them
This means that, other things being equal, repeated-measures designs have
more power to detect effects than independent designs
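Why repeated measures have more power can be seen by analysing the same hypothetical numbers two ways. In the sketch below, participants differ a lot from each other but each improves by about 1 point; the paired t statistic removes those individual differences, while treating the scores as two independent groups leaves them in the error term. Reusing one dataset for both analyses is purely illustrative, since a real independent design would involve different people.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical scores: the same five participants in conditions A and B.
cond_a = [10.0, 20.0, 30.0, 40.0, 50.0]
cond_b = [11.2, 20.8, 31.1, 40.9, 51.0]
n = len(cond_a)

# Repeated-measures (paired) t: individual differences cancel out.
diffs = [b - a for a, b in zip(cond_a, cond_b)]
t_paired = mean(diffs) / (stdev(diffs) / sqrt(n))

# Same numbers analysed as two independent groups: between-person
# variation stays in the standard error.
se_ind = sqrt(stdev(cond_a) ** 2 / n + stdev(cond_b) ** 2 / n)
t_independent = (mean(cond_b) - mean(cond_a)) / se_ind

print(f"paired t = {t_paired:.2f}, independent t = {t_independent:.2f}")
```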
We can use randomisation in two different ways depending on
whether we have an
independent or repeated-measures design
Two sources of systematic variation in a repeated-measures design - (2)
- Practice effects
- Boredom effects
What is practice effects?
Participants may perform differently in the second condition because
of familiarity with the experimental situation and/or the measures being used.
What is boredom effects?
Participants may perform differently in the second condition because
they are tired or bored from having completed the first condition.
We can ensure that practice and boredom effects produce no systematic variation between conditions in a repeated-measures design by
counterbalancing the order in which a person participates in a condition
Example of counterbalancing
we randomly determine whether a participant
completes condition 1 before condition 2, or condition 2 before condition 1
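A minimal sketch of counterbalancing, assuming hypothetical participant IDs. The card describes randomly deciding the order for each participant; this version shuffles participants and then gives exactly half of them each order, a common balanced variant chosen here so the two orders are equally represented. The function name counterbalanced_orders is illustrative.

```python
import random

def counterbalanced_orders(participants: list[str], seed: int = 0) -> dict[str, tuple[str, str]]:
    """Randomly assign half of the participants to each condition order."""
    rng = random.Random(seed)
    shuffled = participants[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    orders = {}
    for i, person in enumerate(shuffled):
        # First half: condition 1 then 2; second half: condition 2 then 1.
        if i < half:
            orders[person] = ("condition 1", "condition 2")
        else:
            orders[person] = ("condition 2", "condition 1")
    return orders

orders = counterbalanced_orders([f"P{i}" for i in range(1, 9)])
for person, order in sorted(orders.items()):
    print(person, "->", " then ".join(order))
```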
To ensure that confounding variables in an independent design are unlikely to contribute systematically to the variation between experimental conditions, we can - (2)
randomly allocate participants to a particular experimental condition.
This should ensure that these
confounding variables are evenly distributed across conditions.
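Random allocation can be sketched as shuffling the participant list and then dealing participants into conditions in turn, which also keeps group sizes equal. The participant IDs, condition names and the function name randomly_allocate are hypothetical; a fixed seed is used only so the allocation is reproducible.

```python
import random

def randomly_allocate(participants: list[str], conditions: list[str], seed: int = 0) -> dict[str, list[str]]:
    """Shuffle participants, then deal them into conditions in turn."""
    rng = random.Random(seed)
    shuffled = participants[:]
    rng.shuffle(shuffled)
    groups = {c: [] for c in conditions}
    for i, person in enumerate(shuffled):
        groups[conditions[i % len(conditions)]].append(person)
    return groups

# Hypothetical participant IDs and conditions.
people = [f"P{i:02d}" for i in range(1, 13)]
groups = randomly_allocate(people, ["control", "treatment"])
print(groups)
```

Because chance alone decides who goes where, characteristics such as age or IQ should end up spread across conditions rather than piling up in one.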