RIP final Flashcards
How to proceed with answering the question: Is there a difference between the mean resting heart rate of men and women?
The first step is to calculate the difference between the two means. We then transform this distance into a relative distance (the t-statistic), which allows us to compare the difference against a standardized distribution (the t-distribution). We calculate the test statistic using the formula for t. Once we have the value of t, we use the p-value to measure how extreme the difference is.
What is the formula for the t-statistic?
observed difference/standard error for the difference in the two means
(M1 - M2) / SE(M1 - M2)
Once we have the value of t, what do we use to measure how extreme the difference is?
p-value
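As a sketch of these two cards in Python (the heart-rate values below are made up purely for illustration; `scipy.stats.ttest_ind` computes both t and the p-value):

```python
from scipy import stats

# Hypothetical resting heart rates (bpm) for two independent groups
men   = [68, 72, 75, 70, 74, 69, 71, 73]
women = [74, 78, 76, 80, 75, 79, 77, 81]

m1 = sum(men) / len(men)
m2 = sum(women) / len(women)

# Independent-samples t-test: returns the t-statistic and the p-value
t, p = stats.ttest_ind(men, women)
print(f"M1 - M2 = {m1 - m2:.2f}, t = {t:.2f}, p = {p:.4f}")
```

A p-value below the significance level would indicate that a difference this extreme is unlikely when H0 is true.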
conditions of causality
- covariance
- temporal precedence
- internal validity
internal validity
Alternative explanations for the relationship should be ruled out
randomized experiment
A research design where:
▪by randomization, groups can be assumed to be similar
▪one variable is manipulated (varied) by the researcher
▪the researcher measures the effect of this manipulation on another variable (the outcome)
confounding variable
A second variable that happens to vary systematically along with the intended independent variable. This variable is therefore an alternative explanation for the results
internal validity
asks if groups were comparable at the beginning of the experiment, with respect to the dependent variable and other relevant variables (observed and unobserved). If, for some reason, the groups turn out not to be comparable at the start of the experiment, we speak of a selection effect
selection effect
Crucial question: how were the groups created? If, for some reason, the groups turn out not to be comparable at the start of the experiment, we speak of a selection effect. To reduce selection effects, groups must be formed using random assignment.
goal of random assignment
making sure that: the mean and variance in scores, on all variables, measured and unmeasured, are similar for both groups at the onset of the study
randomization issues
contamination
contamination in randomization
▪Participants in the experimental group communicate with participants in the control group
▪Participants do not adhere to the treatment
▪Influence from researcher(s)
PICO
The identifier of an experimental research question
Population
Intervention
Comparison
Outcome
what do researchers use when comparing mean scores of two independent groups?
independent sample t test
standard error for difference in means
contains the group sizes (n1 and n2) and the spread in scores in both groups (SD1 and SD2)
With the t-test we consider the relative difference between the groups, using:
*The mean difference: M1 - M2
*The spread in scores in both groups: SD1 and SD2
*The group sizes: n1 and n2
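A minimal sketch of this relative difference, assuming the pooled-variance version of the standard error (all numbers below are hypothetical):

```python
import math

# Hypothetical summary statistics for two groups
n1, n2 = 30, 30
sd1, sd2 = 8.0, 10.0
m1, m2 = 72.0, 76.0

# Pooled SD: weighted average of the two group variances
sd_pooled = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
# Standard error of the difference in means
se_diff = sd_pooled * math.sqrt(1 / n1 + 1 / n2)
# t = observed difference relative to its standard error
t = (m1 - m2) / se_diff
print(f"SE = {se_diff:.3f}, t = {t:.3f}")
```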
the idea behind the test statistic t
When a lot of samples are drawn from a population in which H0 is true, the difference between the sample means will often be near zero. So, t will often be near zero, too. Values of t that are far from zero will be found less often.
what does the standard error (in the t formula) depend on?
*Group sizes (n1 and n2)
*Variation in scores in both groups (SD1 and SD2)
as standard deviation increases, standard error
also increases
as n increases, standard error
decreases
overall the test statistic is dependent on
- relative difference in means
- standard deviation pooled (weighted average of sd in sample 1 and sd in sample 2)
- and sample size per group
a larger difference in means means what for the t value
larger t
more variation in scores means what for the t value
smaller t
larger samples means what for the t value
larger t
randomization
- key of true experiment
- observed and unobserved factors are equally likely in both groups
- transparent, reproducible
- allows causal claims
between subject design
When participants are divided into different groups and each groups receives different treatment. The data is then compared between groups
within subject design
When all participants receive all different treatments (one after the other, possibly randomized in order). We first compare the data within each person
how does a pretest-posttest design compare to posttest
a pretest can serve as a randomization check, allow correction for pre-existing differences, and track changes over time. In a posttest-only design, we would not know if/how the groups differed at the beginning.
disadvantage of the pretest-posttest design
learning effect
solomon four group design and advantages/disadvantages
combines pretest-posttest and posttest-only groups. It can detect unequal groups at the beginning and check for a learning effect; however, it can be highly costly.
repeated measures design
where the same participants are measured multiple times under different conditions or at different time points. This allows researchers to examine changes within individuals, reducing variability and the need for a large sample size.
counterbalanced measures design
A research design used to control for order effects in repeated measures studies. Participants experience all conditions, but the order of conditions is varied across participants to prevent biases from practice, fatigue, or carryover effects.
quasi-experiment
Research designs that evaluate the effect of an intervention or treatment without random assignment. Instead, groups are naturally formed or pre-existing, making them useful in real-world settings where randomization isn’t feasible.
interrupted time series design
A quasi-experimental design that measures an outcome variable repeatedly over time, both before and after an intervention or event (the “interruption”). It evaluates changes in trends or levels caused by the intervention, making it useful for analyzing the effects of policies, treatments, or external events.
field experiment
An experiment conducted in a natural setting, or under a close simulation of the conditions in which the process under study occurs
threats to internal validity
design confounds
selection effect
design confounds
A second variable that happens to vary SYSTEMATICALLY along with the intended independent variable
▪This variable is therefore an alternative explanation for the results
threats to internal validity in experimental design
▪Design confounds
▪Selection effect
▪Contamination
▪Learning effect
▪Maturation
▪History
▪Regression to the mean
▪Attrition
▪Testing
▪Instrumentation
threats to internal validity in all research
▪Observer bias
▪Demand characteristics
▪Placebo effect
Observer bias
When the researcher has certain expectations and is influenced by this in assessing the participants/ interpreting the result
Demand characteristics
When the participants realize what the study is for and therefore start to behave differently (in the expected direction)
Placebo effect
When participants make progress because they believe they are receiving an effective treatment
Maturation
Is it the manipulation or the development (aging, maturing) that caused the differences?
Observed differences between the pre- and post-measurement could arise from natural developments of the participants, when participants’ characteristics change as part of a natural process.
History threats
Is it the manipulation or external events causing the differences?
Not only natural changes of participants are a source of influence, but external events as well - events that are not necessarily related to the study.
Regression threats
Is it the manipulation or the natural “shifting” that caused the differences?
Regression to the mean can occur when the participants show extreme values (on average) at the start of the experiment. At a later time, values are expected to shift towards the ‘normal’, less extreme, mean value.
Attrition threats
Is it the manipulation or the drop-out of a group of participants that caused the differences?
When participants drop out during a study, the outcome can be affected by this. This is primarily a problem when the people that quit the study are different from the people that do not.
Instrumentation threats
Is it the manipulation or the new instrument that caused the differences?
When the instrument measuring the dependent variable changes during the experiment, the results are affected.
What are possible explanations if no effect is found after an experiment
weak manipulations
power problem (there is an effect, but too few participants to detect it)
no effect (there really is no difference in the population)
how is the null hypothesis protected in NHST?
by making the chance of making a type one error small (the significance level)
complement of the chance of a type II error
power: 1 - β
power
chance of correctly rejecting H0. Measures the chance that an existing difference in the population will be found by the sample data and the statistical test
what happens to power when alpha increases?
power also increases. By increasing alpha (the threshold for rejecting the null hypothesis), it becomes easier to reject the null hypothesis, which increases the likelihood of detecting a true effect, thereby increasing power. However, the chance of making a type I error also increases. Researchers need to find a balance between a small value of alpha and high power
factors power is influenced by
The sample size
The size of the difference in the population
The level of significance
The spread (or variability) in the measured scores
The choice of the statistical technique
type two error
A type II error occurs when the null hypothesis is not rejected even though it is false.
when spread in scores decreases, what happens to power?
power increases
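The trade-offs in the cards above can be illustrated numerically. This sketch approximates the power of a two-sided independent-samples t-test via the noncentral t-distribution; the effect size and group sizes are arbitrary example values:

```python
from scipy import stats

def power_two_sample(d, n_per_group, alpha):
    """Approximate power of a two-sided independent-samples t-test
    for effect size d (Cohen's d) with equal group sizes."""
    df = 2 * n_per_group - 2
    ncp = d * (n_per_group / 2) ** 0.5          # noncentrality parameter
    t_crit = stats.t.ppf(1 - alpha / 2, df)     # rejection threshold
    # Chance that |t| exceeds the threshold when the true effect is d
    return (1 - stats.nct.cdf(t_crit, df, ncp)) + stats.nct.cdf(-t_crit, df, ncp)

# Larger alpha -> larger power (at the cost of more type I errors)
print(power_two_sample(0.5, 50, 0.01))
print(power_two_sample(0.5, 50, 0.05))
```

Increasing the sample size or the effect size raises power in the same way.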
4 principles which are the basis of integrity in research
Reliability, honesty, respect, accountability
major violations of scientific integrity
fabrication - making up data, deliberate
plagiarism - copying other people’s work, deliberate
data falsification - not reporting certain findings, adjusting data, or misrepresenting it, all deliberately
publication bias
absence of non-significant effects leads to bias towards large effects
Causes of questionable research practices (QRP)
Scientific journals would like to publish interesting/innovative results, which attracts more readers AND researchers need to publish enough to make a career
p-hacking
things like:
*Removing outliers to make a difference significant
*Adding a few more participants to make results significant
*Running a different analysis than planned
HARKing
hypothesising after results are known: in hindsight, formulating hypotheses and pretending that they were the main focus of the research all along
Solutions to questionable research practices
post-publication peer review
retraction
pre-registration of aims and intended methods and expectations
replication as a standard part of research
Cohen’s D
Used to describe the size of a difference
Measure of relevance; expresses the difference between two means in the number of standard deviations
(M2-M1)/SDpooled
SD pooled
Weighted average of SD1 and SD2
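A sketch of Cohen's d with the pooled SD, using made-up scores:

```python
import math
import statistics

# Hypothetical scores for two groups
group1 = [4, 5, 6, 5, 7, 6, 5, 4]
group2 = [6, 7, 8, 7, 9, 8, 7, 6]

m1, m2 = statistics.mean(group1), statistics.mean(group2)
sd1, sd2 = statistics.stdev(group1), statistics.stdev(group2)
n1, n2 = len(group1), len(group2)

# Pooled SD: weighted average of the two group variances
sd_pooled = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
# Cohen's d: mean difference in units of (pooled) standard deviations
d = (m2 - m1) / sd_pooled
print(f"Cohen's d = {d:.2f}")
```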
confidence interval
another way to describe the size of the difference between the two groups. a range of plausible values based on the sample data
width of confidence interval depends on
- Sample size (smaller standard error –> narrower interval)
- Spread/variation in scores in the population (greater spread of scores in the sample means more uncertainty –> wider interval)
- Chosen confidence level (95% confidence level widely used - more certainty, wider interval)
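A sketch of a 95% confidence interval for the difference in means, using hypothetical summary statistics and the pooled standard error:

```python
import math
from scipy import stats

# Hypothetical summary statistics
n1 = n2 = 40
m1, m2 = 72.0, 76.0
sd1, sd2 = 8.0, 9.0

df = n1 + n2 - 2
sd_pooled = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df)
se = sd_pooled * math.sqrt(1 / n1 + 1 / n2)

# Critical t-value for a 95% confidence level
t_crit = stats.t.ppf(0.975, df)
lo = (m1 - m2) - t_crit * se
hi = (m1 - m2) + t_crit * se
print(f"95% CI for M1 - M2: [{lo:.2f}, {hi:.2f}]")
```

Larger samples shrink the standard error and therefore the interval; a higher confidence level widens it.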
Four parts to evaluate statistical validity
- significance is determined based on test statistic t and the p-value
- relevance is assessed using a measure of effect size, such as cohen’s d
- accuracy is assessed using a confidence interval
- suitability of statistical test is assessed by checking the assumptions
how is effect size measured for regression analysis
R squared
how is effect size measured for chi squared
Cramer’s V
Three claims
Frequency claim
Association claim (correlation and regression studies)
Causal claim (best made in context of randomized experiments)
Four validities
Construct
Internal
External
Statistical
Assessing statistical validity
Significance (determined by the p-value)
Relevance (assessed using effect size)
Accuracy (assessed using confidence interval)
How is suitability of a statistical test assessed
- check assumptions
- check if hyp match expectations
- check if results match hypotheses
assumptions of t test
- random sample
- dependent variable is of interval or ratio measurement level
- two groups are independent
- scores in both groups are normally distributed
- scores in both groups have equal spread
Violating these assumptions leads to lower statistical validity
How can we check Assumption 1 of t-test
- read methods section of article; how did researchers select participants?
if sample is not random:
- be cautious interpreting results, because a random sample ensures independence of observations
How can we check Assumption 2 of t-test
- methods section
- how are constructs operationally defined? is it plausible we can interpret in interval/ratio level?
- if you have enough levels in an ordinal variable, people usually won't object to treating it as interval (e.g. an aggression scale)
What if DV of a t-test is not interval or ratio (or ordinal)? eg answers to yes/no questions
Solution: use a statistical test for categorical variables (the chi-squared test of homogeneity)
Chi-squared test of homogeneity
Q example: is the distribution of answers of people with treatment the same as the distribution of answers of people without?
- two independent samples (like t-test)
- DV is categorical (unlike t-test)
- Used to determine if the distribution of a categorical variable is the same in two groups, can be used with more than 2 groups
Chi-squared test of homogeneity hypotheses
H0: distribution of answers in control is equal to distribution of answers in treatment
H1: distribution of answers in control is different from the distribution of answers in treatment
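A sketch of the chi-squared test of homogeneity with hypothetical yes/no counts; `scipy.stats.chi2_contingency` does the work (and applies Yates' continuity correction for 2x2 tables by default):

```python
from scipy.stats import chi2_contingency

# Hypothetical yes/no counts in two independent groups
#            yes  no
observed = [[30, 20],   # treatment group
            [18, 32]]   # control group

# H0: the distribution of answers is the same in both groups
chi2, p, df, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.2f}, df = {df}, p = {p:.4f}")
```

A small p-value suggests the distributions differ between the groups.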
How can we check Assumption 3 of t-test
- Read Methods Section of an article
- Are the participants randomly assigned to two separate groups?
- Is there a link between the measurements in the two groups?
What if two groups are not independent (assumption 3 of t-test)
Solution: conduct a t-test for dependent samples
How can we check Assumption 4 of t-test
Independent sample t-test: two histograms, 1 of scores in control group and 1 of scores in experimental group
Paired sample t test: make 1 histogram of difference scores
How can we check Assumption 5 of t-test
Can use side-by-side box plot and observe the spread of the arms
- graphical checking is preferred
Can also use one of the formal tests for equal variances (a significant result means the variances are unequal)
- Levene's test
- Brown-Forsythe test
- F-max test
What to do if the equal variance assumption (assumption 5) is not satisfied?
Use an alternative called Welch’s test
The t-test we use under the assumption of equal variances has more power, so that option is preferred
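A sketch contrasting Levene's test with Welch's test in scipy, on made-up groups with clearly unequal spread:

```python
from scipy import stats

# Hypothetical groups: group2 has a much larger spread than group1
group1 = [10, 12, 11, 13, 12, 11, 10, 12]
group2 = [8, 20, 5, 25, 14, 3, 22, 18]

# Levene's test (scipy's default centers on the median, i.e. the
# Brown-Forsythe variant); a significant result suggests unequal variances
w, p_lev = stats.levene(group1, group2)

# Welch's test: the equal_var=False flag drops the equal-variance assumption
t, p = stats.ttest_ind(group1, group2, equal_var=False)
print(f"Levene p = {p_lev:.4f}, Welch t = {t:.2f}, p = {p:.3f}")
```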
what do we do to compare the distribution of a categorical variable between two (or more) groups
Use the chi-squared test of homogeneity to test if the distributions are homogeneous
the steps to measuring a theoretical concept
theoretical concept –> conceptual definition —> operational definition —> variable
correlation is used for
measuring strength and direction of linear relationship
regression is used for
describing the linear relationship with an equation and making predictions using this equation when only data on the independent variable is available
Least squares regression is a technique used for
finding the equation of the line best fitting to the data
Residuals
Residuals are the difference between the observed value of Y and the predicted value of Y (= point on the line). When a line fits the data well, the residuals will tend to be small. the equation with the smallest sum of squared residuals is the winner!
Root Mean Squared Error, or Standard Error of the Estimate, in linear regression
the standard deviation of the residuals.
roughly, the average error we make when using the regression equation to make predictions
coefficient of determination is
R squared
What does R squared tell us in a regression model
how much of the variation in Y can be explained by the linear relationship with X. percentage variance explained.
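A sketch tying these cards together: a least squares fit, its residuals, the RMSE, and R squared, with made-up data (study hours vs. exam score):

```python
import numpy as np

# Hypothetical data: X = study hours, Y = exam score
x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y = np.array([52, 55, 61, 60, 68, 70, 73, 75], dtype=float)

# Least squares: the slope and intercept minimizing the sum of squared residuals
slope, intercept = np.polyfit(x, y, 1)
predicted = intercept + slope * x
residuals = y - predicted                      # observed Y minus predicted Y

rmse = np.sqrt(np.mean(residuals**2))          # roughly the average prediction error
r_squared = 1 - residuals.var() / y.var()      # proportion of variance in Y explained by X
print(f"Y' = {intercept:.2f} + {slope:.2f}X, RMSE = {rmse:.2f}, R^2 = {r_squared:.2f}")
```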
What are the two tests we can use to find out if the linear relationship is a significant relationship in the regression model
option 1: test for the slope
- we can test if the slope is significantly different from 0, using the t-test
option 2: test for explained variance
- to test if the model explains a significant proportion of the variation, we can test whether the proportion of variation explained by the model is significantly greater than 0, using the F-test.
beta (standardized) coefficient
measures the change in Y with one SD increase in X
the assumptions of least squares regression
- linear relationship (check using scatter plot)
- interval or ratio measurement level
- no outliers (check using residual plot)
- residuals are normally distributed
- homoscedasticity (spread around regression line is independent of the value of X)
adding more independent variables to a regression model always….
- explains more of the variation in the DV (so higher R squared)
- reduces the average prediction error (so SE will decrease as accuracy increases)
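This can be demonstrated: with ordinary least squares, the in-sample R squared never decreases when a predictor is added, even a pure-noise one (all data here are simulated):

```python
import numpy as np

rng = np.random.default_rng(0)

def r_squared(X, y):
    """In-sample R^2 of an OLS fit with an intercept."""
    Xd = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    resid = y - Xd @ beta
    return 1 - resid.var() / y.var()

n = 50
x1 = rng.normal(size=n)
noise = rng.normal(size=(n, 1))        # a predictor unrelated to y
y = 2 * x1 + rng.normal(size=n)

r2_one = r_squared(x1.reshape(-1, 1), y)
r2_two = r_squared(np.column_stack([x1, noise]), y)
print(r2_one, r2_two)   # R^2 cannot go down when a predictor is added
```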
what to be careful of when removing variables from an MLR model
do it one at a time; never remove multiple variables at once based on the t-test!
Principle of p value in null hypothesis significance testing
Given that the null hypothesis is true, what is the chance of observing the data we observed?
Principle of Bayesian testing
Given the data we observed, what is the chance the null hypothesis is true?
what does the bayes factor measure
How much more does the observed data support the null hypothesis as compared to the alternative hypothesis
relative support for null hypothesis, as measured by
support in data for H0/support in data for H1
What does a Bayes factor of 5 mean
the support in the data for H0 is 5 times greater than the support for H1
what does BF01 measure
support in the data for H0/support in the data for H1
what does BF10 measure
support in the data for H1/support in the data for H0
How do we interpret a BF01 of 0.4
the support in the data for H0 is 0.4 times greater than for H1
but this doesn’t really make sense.
so in this case we flip the Bayes factor so that
BF10 = 1/0.4 = 2.5, so the support in the data for the alternative hypothesis is 2.5 times greater than for the null hypothesis
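The flip is just taking the reciprocal:

```python
# BF01 and BF10 are reciprocals of each other
bf01 = 0.4          # support for H0 relative to H1
bf10 = 1 / bf01     # support for H1 relative to H0
print(bf10)
```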
confidence interval in NHST
Interval estimate to give the reader an idea of the size of the effect
interpretation of credible interval in Bayesian testing
Given the evidence in the data, the mean score of condition A has a 95% chance of falling between x and y
Results of the reproducibility project
In almost all original studies the null hypothesis was rejected (had a p-value smaller than .05)
but only a third of the replication studies were able to reject the null
effect sizes were only half as large in the replications compared to original studies
mission of open science research
"increase the openness, integrity, and reproducibility of scientific research"
*everyone should have access to this scientific knowledge
*everyone should be able to use it for the benefit of science/society
in open science, researchers are…
working digitally
*collecting enormous amounts of data
*able to easily share data online
advantages of open science
Increases citations
increases visibility of academic research
increases reusability of academic research results
disadvantages of open science
the range of high-quality, fully open access journals is still limited
the number of available reliable journals and articles varies per discipline
the quality and reliability of open access journals vary
FAIR principles for how data should be stored
Findable
Accessible
Interoperable
Reusable
Following FAIR guidelines leads to
a greater efficiency of the research process, because new research questions do not always require the collection of new data; suitable data are often already available
*better reproducibility and greater reliability of research
A good data management plan leads to
FAIRness of data
adv and disad direct replication
adv: easy to compare
disad: problems with internal validity in the original research will still be present
adv and disad conceptual replication
adv:
- ability to improve design
- increase internal validity
disadvantage:
- not as easy to compare
adv and disad replication plus extension
adv: Possibility to examine additional research question
disad: Not as easy to compare