Exam Revision Flashcards
There are constraints on getting valuable info for a hypothesis from a study's design, such as - (2)
duration of the study
how many people you can recruit
What is a sample?
A sample is the specific group that you will collect data from.
What is a population?
A population is the entire group that you want to draw conclusions about.
Example of population vs sample (2)
Population : Advertisements for IT jobs in the UK
Sample: The top 50 search results for advertisements for IT jobs in the UK on 1 May 2020
What is inferential statistics?
Inferential statistics allow you to test a hypothesis or assess whether your data is generalisable to the broader population.
Why is there a focus on parametric tests over other tests in research? - (3)
- They are more rigorous, powerful and sensitive than non-parametric tests for answering your question
- This means that they have a higher chance of detecting a true effect or difference if it exists.
- They also allow you to make generalizations and predictions about the population based on the sample data.
We can obtain multiple outcomes from the
same people
We can obtain outcomes under
different conditions, groups or both
What are the 4 types of outcomes we measure? (4)
- Ratio
- Interval
- Ordinal
- Nominal
What is a continuous variable? - (2)
There is an infinite number of possible values these variables can take on
Entities get a distinct score
2 examples of continuous variables (2)
- Interval
- Ratio
What is an interval variable?
Equal intervals on the variable represent equal differences in the property being measured
Examples of interval variables - (2)
e.g. the difference between 600ms and 800ms is equivalent to the difference between 1300ms and 1500ms. (reaction time)
temperature (Fahrenheit), temperature (Celsius), pH, SAT score (200-800), credit score (300-850)
What is a ratio variable?
The same as an interval variable and also has a clear definition of 0.0.
Examples of ratio variable - (3)
E.g. Participant height or weight
(can have 0 height or weight)
temp in Kelvin (0.0 Kelvin really does mean “no heat”)
dose amount, reaction rate, flow rate, concentration,
What is a categorical variable? (2)
A variable that cannot take on all values within the limits of the variable
- entities are divided into distinct categories
What are 2 examples of categorical variables? (2)
- Nominal
- Ordinal
What is a nominal variable? - (2)
a variable with categories that do not have a natural order or ranking
Has two or more categories
Examples of nominal variable - (2)
genotype, blood type, zip code, gender, race, eye color, political party
e.g. whether someone is an omnivore, vegetarian, vegan, or fruitarian.
What are ordinal variables?
categories have a logical, incremental order
Examples of ordinal variables - (3)
e.g. whether people got a fail, a pass, a merit or a distinction in their exam
socio-economic status ("low income", "middle income", "high income"),
satisfaction rating [Likert Scale] (“extremely dislike”, “dislike”, “neutral”, “like”, “extremely like”).
We use the term 'variables' for both continuous and categorical variables because - (2)
both the outcome and the predictor are variables
We will see later on that not only the type of outcome but also type of predictor influences our choice of stats test
A Likert scale is an ordinal variable, but sometimes outcomes measured on a Likert scale are treated as - (3)
continuous after inspection of the distribution of the data, and one may argue the divisions on the scale are equal
(i.e., treated as interval if distribution is normal)
gives greater sensitivity in parametric tests
What is measurement error?
The discrepancy between the actual value we’re trying to measure, and the number we use to represent that value.
In reducing measurement error in outcomes, the
values have to have the same meaning over time and across situations
Validity means that the (2)
instrument measures what it set out to measure
refers to the accuracy of a measure (whether the results really do represent what they are supposed to measure)
Reliability means the
ability of the measure to produce the same results under the same conditions
Test-retest reliability is the ability of a
measure to produce consistent results when the same entities are tested at two different points in time
3 types of variation (3)
- Systematic variation
- Unsystematic variation
- Randomisation
What is systematic variation - (2)
Differences in performance created by a specific experimental manipulation.
This is what we want
What is unsystematic variation? (3)
Differences in performance created by unknown factors
e.g., Age, Gender, IQ, Time of day, Measurement error etc.
These differences can be controlled of course (e.g., inclusion/exclusion of pps setting an age range of 18-25)
Randomisation (other approaches) minimises - (2)
effects of unsystematic variation
does not remove unsystematic variation
What is the independent variable (Factors)? (3)
- The hypothesised cause
- A predictor variable
- A manipulated variable (in experiments)
What is the dependent variable (measures)? - (3)
- The proposed effect: a change in the DV
- An outcome variable
- Measured not manipulated (in experiments)
In all experiments we have two hypotheses, which are (2)
- Null hypothesis
- Alternative hypothesis
What is null hypothesis?
that there is no effect of the predictor variable on the outcome variable
What is alternative hypothesis?
that there is an effect of the predictor variable on the outcome variable
Null Hypothesis Significance Testing computes the (2)
probability of obtaining the observed data if the null hypothesis were true, by computing a statistic and how likely it is that the statistic has that value by chance alone
Referred to as the p-value
The NHST does not compute the probability of the
null hypothesis
There can be directional and non-directional versions of
an alternative hypothesis
A non-directional alternative hypothesis is…
that there is an effect of the group on the outcome variable (with no direction specified)
A directional alternative hypothesis is…
that the mean of the outcome variable for group 1 is larger than the mean for group 2
Example of directional alternate hypothesis
There would be far greater engagement in stats lectures if they were held at 4 PM and not 9 AM
For a non-directional hypothesis you will need to divide your alpha value between the
two tails of the normal distribution
The 3 misconceptions of Null Hypothesis Significance Testing (NHST) - (3)
- A significant result means the effect is important
- A non-significant result means the null hypothesis is true
- A significant result means the null hypothesis is false (it just gives the probability of the data occurring given the null hypothesis; it is not strong evidence that the null hypothesis is categorically false)
P-hacking and HARKing are another issue with
NHST
P-hacking and HARKing are the - (2)
researcher degrees of freedom
changed after results are in and some analysis has been done
P-hacking refers to the
selective reporting of significant results
HARKing is
Hypothesising After the Results are Known
P-hacking and HARKING are often used in
combination
What does EMBERS stand for? (5)
- Effect Sizes
- Meta-analysis
- Bayesian Estimation
- Registration
- Sense
EMBERS can reduce issues of
NHST
Uses of Effect sizes and Types of Effect Size (3)
- There are quite a few measures of effect size
- Get used to using them and understanding how studies can be compared on the basis of effect size
- A brief example: Cohen’s d
Meaning of Effect Size (2)
Effect size is a quantitative measure of the magnitude of the experimental effect.
The larger the effect size the stronger the relationship between two variables.
Formula of Cohen’s d
What is meta-analysis?
Meta-analysis is a study design used to systematically assess previous research studies to derive conclusions about that body of research
Meta-analysis brings together.. and assesses (2)
- Bringing together multiple studies to get a more realistic idea of the effect
- Can assess effect sizes that are averaged across studies
Funnel plots in meta-analysis can be made to… (2)
investigate publication bias and other biases in meta-analysis
weight studies by their sample size so bias can be observed
Bayesian approaches capture
probabilities of the data given the hypothesis and null hypothesis
Bayes factor is now often computed and stated alongside
conventional NHST analysis (and effect sizes)
Registration is where (5)
- Telling people what you are doing before you do it
- Tell people how you intend to analyze the data
- Largely limits researcher degrees of freedom (HARKing, p-hacking)
- A peer reviewed registered study can be published whatever the outcome
- The scientific record is therefore less biased to positive findings
Sense is where (4)
- Knowing what you have done in the context of NHST
- Knowing misconceptions of NHST
- Understanding the outcomes
- Adopting measures to reduce researcher degrees of freedom (like preregistration etc..)
most of the statistical tests in this book rely on
having data measured at interval level
To say that data are interval, we must be certain that
equal intervals on the scale represent equal differences in the property being measured.
The distinction between continuous and discrete variables can often be blurred - 2 examples - (2)
continuous variables can be measured in discrete terms; when we measure age we rarely use nanoseconds but use years (or possibly years and months). In doing so we turn a continuous variable into a discrete one
treat discrete variables as if they were continuous, e.g., the number of boyfriends/girlfriends
that you have had is a discrete variable. However, you might read a magazine that says ‘the average number of boyfriends that women in their 20s have has increased from 4.6 to 8.9’
a device for measuring sperm motility that actually measures sperm count is not
valid
Criterion validity is whether the
instrument is measuring what it claims to measure (does
your lecturers’ helpfulness rating scale actually measure lecturers’ helpfulness?).
The two sources of variation that are always present in independent and repeated measures designs are
unsystematic variation and systematic variation
effect of our experimental manipulation
is likely to be more apparent in a repeated-measures design than in a
between-group design,
effect of experimental manipulation is more apparent in repeated-design than independent since in independent design,
differences between the characteristics of the people allocated to each of the groups are likely to create considerable random variation both within each condition and between them
This means that, other things being equal, repeated-measures designs have
more power to detect effects than independent designs
We can use randomization in two different ways depending on
whether we have an independent or repeated measures design
Two sources of systematic variation in a repeated measures design - (2)
- Practice effects
- Boredom effects
What is practice effects?
Participants may perform differently in the second condition because
of familiarity with the experimental situation and/or the measures being used.
What is boredom effects?
: Participants may perform differently in the second condition because
they are tired or bored from having completed the first condition.
We can ensure no systematic variation between conditions in repeated measure is produced by practice and boredom effects by
counterbalancing the order in which a person participates in a condition
Example of counterbalancing
we randomly determine whether a participant
completes condition 1 before condition 2, or condition 2 before condition 1
What distribution is needed for parametric tests?
A normal distribution
The normal distribution curve is also referred to as the
bell curve
Normal distribution is symmetrical meaning
This means that the distribution curve can be divided in the middle to produce two equal halves
The bell curve can be described using two parameters called (2)
- Mean (central tendency)
- Standard deviation (dispersion)
μ is
mean
σ is
standard deviation
Diagram shows:
e.g., if we move 1σ to the right then it contains 34.1% of the values
Many statistical tests (parametric) cannot be used if the data are not
normally distributed
The mean is the sum of
scores divided by the number of scores
Mean is a good measure of
central tendency for roughly symmetric distributions
The mean can be a misleading measure of central tendency in skewed distributions as
it can be greatly influenced by scores in tail e.g., extreme values
Aside from the mean, what are the 2 other measures of central tendency? - (2)
- Median
- Mode
The median is where (2)
the middle score when scores are ordered.
the middle of a distribution: half the scores are above the median and half are below the median.
The median is relatively unaffected by … and can be used with… (2)
- extreme scores or skewed distribution
- can be used with ordinal, interval and ratio data.
The mode is the most
frequently occurring score in a distribution, a score that actually occurred
The mode is the only measure of central tendency that can be used with
with nominal data
The mode is greatly subject to
sample fluctuations and is therefore not recommended to be used as the only measure of central tendency
Many distributions have more than one
mode
The mean, median and mode are identical in
symmetric distributions
For positive skewed distribution, the
mean is greater than the median, which is greater than the mode
For negative skewed distribution
usually the mode is greater than the median, which is greater than the mean
Kurtosis in Greek means
bulge or bend
What is central tendency?
the tendency for the values of a random variable to cluster round its mean, mode, or median.
Diagram of normal kurtosis, positive excess kurtosis (leptokurtic) and negative excess kurtosis (platykurtic)
What does lepto mean?
prefix meaning thin
What is platy
a prefix meaning flat or wide (think Plateau)
Tests of normality (2)
Kolmogorov-Smirnov test
Shapiro-Wilk test
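Both tests are available in scipy, so normality can be checked in code as well as in SPSS; a quick sketch (the data here are randomly generated, and scipy is assumed to be installed):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
scores = rng.normal(loc=50, scale=10, size=40)  # hypothetical sample

# Shapiro-Wilk: H0 = the data come from a normal distribution
w, p_shapiro = stats.shapiro(scores)

# Kolmogorov-Smirnov against a standard normal, after z-scoring the sample
z = (scores - scores.mean()) / scores.std(ddof=1)
d, p_ks = stats.kstest(z, "norm")

print(p_shapiro, p_ks)  # p > .05 -> no evidence of non-normality
```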
Tests of normality are dependent on
sample size
If you have a massive sample size then you will find these normality tests often come out as …. even when your data visually can look - (2)
significant
normally distributed
If you have a small sample size, then the normality tests may come out non-significant, even when the data is not normally distributed, due to
lack of power in the test to detect the deviation from normality
There is no hard or fast rule for
determining whether data is normally distributed or not
Plot your data because this helps inform on what decisions you want to make with respect to
normality
Even if the normality test is significant, if the data looks visually normally distributed then still do
parametric tests
A frequency distribution or a histogram is a plot of how many times
each score occurs
2 main ways a distribution can deviate from the normal - (2)
- Lack of symmetry (called skew)
- Pointyness (called kurtosis)
In a normal distribution the values of skew and kurtosis are 0 meaning…
tails of the distribution are as they should be
Is age nominal or continuous?
Continuous
Is gender continous or nominal?
Nominal
Is height continuous or nominal?
Continuous
Which of the following best describes a confounding variable?
A. A variable that affects the outcome being measured as well as, or instead of, the independent variable
B. A variable that is manipulated by the experimenter
C. A variable that has been measured using an unreliable scale
D. A variable that is made up only of categories
A
If a test is valid , what does it mean?
A. The test measures what it claims to measure.
B. The test will give consistent results. (Reliability)
C.The test has internal consistency (measure for correlations between different items on same test = see if it measures same construct)
D. Test measures a useful construct or variable = test can measure something useful but not valid
A
A variable that measures the effect that manipulating another variable has is known as:
A. DV
B. A confounding variable
C. Predictor variable
D. IV
A
The discrepancy between the numbers used to represent something that we are trying to measure and the actual value of what we are measuring is called:
A. Measurement error
B. Reliability
C. The ‘fit’ of the model
D. Variance
A
A frequency distribution in which low scores are most frequent (i.e. bars on the graph are highest on the left hand side) is said to be:
A. Positively skewed
B. Leptokurtic = distribution with positive kurtosis
C. Platykurtic = negative kurtosis
D. Negatively skewed = frequent scores at the high end
A
Which of the following is designed to compensate for practice effects?
A. Counterbalancing
B. Repeated measures design = practice effects issue in repeated measures
C. Giving participants a break between tasks = this compensates for boredom effects
D. A control condition = provides reference point
A
Variation due to variables that has not been measured is
A. Unsystematic variation
B. Homogeneous variance = assumption that the variance in each population is equal
C. Systematic variation = due to experimental manipulation
D. Residual variance = shows how well the constructed regression line fits the actual data
A
Purpose of control condition is to
A. Allow inferences about cause
B. Control for participants’ characteristics = randomisation
C. Show up relationship between predictor variables
D. Rule out tertium quid
A Allow inferences of cause
If the scores on a test have a mean of 26 and a standard deviation of 4, what is the z-score for a score of 18?
A. -2
B. 11
C. 2
D. -1.41
A: (18-26)/4 = -8/4 = -2
The standard deviation is the square root of the
A. Variance
B. Coefficient of determination = r squared
C. Sum of squares = sum of squared deviances
D. Range = largest - smallest
A
Complete the following sentence: A large standard deviation (relative to the value of the mean itself)
A. Indicates data points are distant from the mean (i.e., poor fit of data)
B. Indicates the data points are close to the mean
C. Indicates that the mean is a good fit of the data
D. Indicates that you should analyse the data with parametric tests
A
The probability is p = 0.80 that a patient with a certain disease will be successfully treated with a new medical treatment. Suppose that the treatment is used on 40 patients. What is the "expected value" of the number of patients who are successfully treated?
A. 32
B. 20
C. 8
D. 40
A = 80% of 40 is 32 (0.80 * 40)
Imagine a test for a certain disease.
Suppose the probability of a positive test result is .95 if someone has the disease, but the probability is only .08 that someone has the disease if his or her test result was positive.
A patient receives a positive test, and the doctor tells him that he is very likely to have the disease. The doctor's response is:
A. Confusion of the inverse
B. Law of small numbers
C. Gambler’s fallacy
D. Correct, because the test is 95% accurate when someone has the disease = incorrect as the doctor based his assumption on the inverse probability
A
Which of these variables would be considered not to have met the assumptions of parametric tests based on the normal distribution?
(Hint many statistical tests rely on data measured on interval level)
A. gender
B. Reaction time
C. Temp
D. Heart rate
A
The test statistics we use to assess a linear model are usually _______ based on the normal distribution
(Hint: These tests are used when all of the assumptions of a normal distribution have been met)
A. Parametric
B. Non-parametric
C. Robust
D. Not
A
Which of the following is not an assumption of the general linear model?
A. Dependence
B. Additivity
C. Linearity
D. Normally distributed residuals
A = independence, not dependence, is an assumption of parametric tests
Looking at the table below, which of the following statements is the most accurate?
Hint: The further the values of skewness and kurtosis are from zero, the more likely it is that the data are not normally distributed
A. For the number of hours spent practising, there is not an issue of kurtosis
B. For level of musical skill, data are heavily negatively skewed
C. For number of hours spent practising there is an issue of kurtosis
D. For the number of hours spent practising, the data is fairly positively skewed
A - correct
B. Incorrect as the value of skewness is –0.079, which suggests that the data are only very slightly negatively skewed because the value is close to zero
C. Incorrect as the value of kurtosis is 0.098, which is fairly close to zero, suggesting that kurtosis was not a problem for these data
D. Incorrect as the value of skewness for the number of hours spent practising is –0.322, suggesting that the data are only slightly negatively skewed
Diagram of skewness
In SPSS, output if value of skewness is between -1 and 1 then
all good
In SPSS, output if value is below -1 or above 1 then
data is skewed
In SPSS, output if value of skewness is below -1 then
negatively skewed
In SPSS, output if value of skewness is above 1 then
positively skewed
Diagram of leptokurtic, platykurtic and mesokurtic (normal)
What does kurtosis tell you?
how much our data lies around the ends/tails of our histogram which helps us to identify when outliers may be present in the data.
A distribution with positive kurtosis, so much of the data is in the tails, will be
pointy or leptokurtic
A distribution with negative kurtosis, so the data lies more in the middle, will be more
be more sloped or platykurtic
Kurtosis is the sharpness of the
peak of a frequency-distribution curve
If our Kurtosis value is 0, then the result is a
normal distribution
If the kurtosis value in SPSS is between -2 and 2 then
all good! = normal distribution
If the kurtosis value in SPSS is less than -2 then
platykurtic
If the kurtosis value is greater than 2 in SPSS then
leptokurtic
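These rules of thumb can be checked outside SPSS too; a small sketch using scipy (assumed installed; scipy's kurtosis returns excess kurtosis, which is 0 for a normal distribution):

```python
import numpy as np
from scipy.stats import kurtosis, skew

rng = np.random.default_rng(1)
data = rng.normal(size=200)  # hypothetical, normally generated sample

s = skew(data)
k = kurtosis(data)  # Fisher's definition: excess kurtosis

# The rules of thumb from the cards above
print("skew ok" if -1 < s < 1 else ("positively skewed" if s > 1 else "negatively skewed"))
print("kurtosis ok" if -2 < k < 2 else ("leptokurtic" if k > 2 else "platykurtic"))
```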
Are we good for skewness and kurtosis in this SPSS output?
Good because the skewness is between -1 and 1 and the kurtosis values are between -2 and 2.
Are we good for skewness and kurtosis in this SPSS output?
Bad because although the skewness is between -1 and 1, we have a problem with kurtosis with a value of 2.68, which falls outside the range of -2 to 2
Correlational research doesn't allow us to rule out the presence of a
third variable = confounding variable
e.g., if we find that drownings and ice cream sales are correlated and we conclude that ice cream sales cause drowning, are we correct? It may be due to the weather
The tertium quid is a variable that you may not have considered that could be
influencing your results, e.g., the weather in the ice cream and drowning example
How to rule out tertium quid? - (2)
Use of RCTs.
Randomized Controlled Trials allow us to even out the confounding variables between the groups
Correlation does not mean
causation
To infer causation,
we need to actively manipulate the variable we are interested in, and control against a group (condition) where this variable was not manipulated.
Correlation does not mean causation as according to Andy
"causality between two variables cannot be assumed because there may be other measured or unmeasured variables affecting the results"
Aside from checking the kurtosis and skewness assumptions, also check if the data has
linearity or, less commonly, additivity
Additivity refers to the combined
effect of many predictors
What does this diagram show in terms of additivity/linearity? - (5)
There is a linear effect when the data increases at a steady rate, like the graph on the left.
Your cost increases steadily as the number of chocolate bars increases.
The graph on the right shows a non-linear effect when there is not this steady increase rather there is a sharp change in your data.
So you might feel ok if you eat a few chocolate bars but after that the risk of you having a stomach ache increases quite rapidly the more chocolates you eat.
This effect is super important to check or your statistical analysis will be wrong even if your other assumptions are correct because a lot of statistical tests are based on linear models.
Discrepancy between measurement and the actual value in the population is .. and not..
measurement error and NOT variance
Measurement error can happen across all psychological experiments from.. to ..
recording instrument failure to human error
What are the 2 types of measurement errors? - (2)
- Systematic
- Random
What is systematic measurement error?
Predictable, typically constant or proportional to the true value, and always affects the results of an experiment in a predictable direction
Example of systematic measurement error
for example if I know I am 5ft2 and when I go to get measured I’m told I’m 6ft this is a systematic error and pretty identifiable - these usually happen when there is a problem with your experiment
What is random measurement error?
measurable values being inconsistent when repeated measures of a constant attribute or quantity are taken.
Example of random measurement error
for example my height is 5ft2 when I measure it in the morning but its 5ft when I measure myself in the evening. This is because my measurements were taken at different times so there would be some variability – for those of you who believe you shrink throughout the day.
What is variance?
Average squared deviation of each number from its mean.
Variability is an inherent part of
things being measured and of the measurement process
Diagram of variance formula: s² = Σ(xᵢ − x̄)² / (N − 1)
The central limit theorem - (2)
states that the sampling distribution of the mean approaches a normal distribution, as the sample size increases. This fact holds especially true for sample sizes over 30.
Therefore, as a sample size increases, the sample mean and standard deviation will be closer in value to the population mean μ and standard deviation σ .
What does histogram look at? - (2)
Frequency of scores
Look at distribution of data, skewness, kurtosis
What does boxplot look at? - (2)
To identify outliers
Shows median rather than mean (good for non-normally distributed data)
What are line graphs?
simply bar charts with lines instead of bars
Bar charts are a good way to
display means (and standard errors)
What does a scatterplot illustrate? - (2)
a relationship between two variables, e.g. correlation or regression
Only use regression lines for regressions!
What are matrix scatterplots? - (2)
Particular kind of scatterplot that can be used instead of the 3-D scatterplot
clearer to read
Using data provided how would you summarise skew?
A. The data has an issue with positive skew
B.The data has an issue with negative skew
C.The data is normally distributed
B
What is the median number of bullets shot at a partner by females?
67.00
What descriptive statistics does the red arrow represents?
A. Inter quartile range
B. Median
C. Mean
D. Range
A
What are the males' mean and the females' SD? - (2)
Males M = 27.29
Females SD = 12.20
What are the respective standard errors of the mean for females and males?
3.26 & 3.42
Answering the question 'does it meet the assumptions of parametric tests?' will determine whether our continuous data can be tested
with parametric or non-parametric tests
A normal distribution is a distribution with the same general shape which is a
bell shape
A normal distribution curve is symmetric around
the mean μ
A normal distribution is defined by two parameters - (2)
the mean (μ) and the standard deviation (σ).
Many statistical tests (parametric) cannot be used if the data is not
normally distributed
What does this diagram show? - (2)
μ = 0 is the peak of the distribution
Blocked areas under the curve give us insight into the way data is distributed and the probability of certain scores occurring if they belong to a normal distribution, e.g., 34.1% of values lie within one SD below the mean
A z score in the standard normal distribution reflects the number of
SDs a particular score is above or below the mean
How to calculate a z score?
Take a participant's value (e.g., 56 years old), subtract the mean of the distribution (e.g., the mean age of the class is 23), and divide by the SD (e.g., 2)
If a person scored a 70 on a test with a mean of 50 and a standard deviation of 10
Converting the test scores to z scores, an X of 70 would be z = (70 − 50)/10 = 2
What the result means…. - (2)
a z score of 2 means the original score was 2 standard deviations above the mean
We can convert our z scores to
percentiles
Example: What is the percentile rank of a person receiving a score of 90 on the test? - (3)
Mean = 80
SD = 5
First calculate the z score: the graph shows that most people scored below 90. Since 90 is 2 standard deviations above the mean, z = (90 - 80)/5 = 2
The z score to percentile conversion can be looked up in a table: a z score of 2 is equivalent to the 97.7th percentile
The proportion of people scoring below 90 is thus .977 and the proportion scoring above 90 is 2.3% (1 - 0.977)
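Instead of a printed z-table, the same lookup can be done with scipy's normal CDF; a sketch reproducing the worked example above:

```python
from scipy.stats import norm

mean, sd, score = 80, 5, 90
z = (score - mean) / sd       # (90 - 80) / 5 = 2
below = norm.cdf(z)           # proportion scoring below 90
print(z, below, 1 - below)    # 2.0, ~0.977 (97.7th percentile), ~0.023 above
```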
What is the sample mean?
an unbiased estimate of the population mean.
How can we know that our sample mean is representative of the population mean?
By computing the standard error of the mean (SEM) - the smaller the SEM the better
Standard deviation is used as a measure of how
representative the mean was of the observed data.
Small standard deviations represent a scenario in which
most data points were close to the mean
A large standard deviation represents a situation in which data points were
widely spread from the mean.
How to calculate the standard error of mean?
computed by dividing the standard deviation of the sample by the square root of the number in the sample
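As a sketch in Python (the scores below are made-up illustration data; scipy.stats.sem gives the same result directly):

```python
import numpy as np

sample = np.array([4, 6, 5, 7, 5, 6, 8, 4, 5, 6])    # hypothetical scores
sem = np.std(sample, ddof=1) / np.sqrt(len(sample))  # SD / sqrt(N)
print(sem)
```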
The larger the sample the smaller the - (2)
standard error of the mean
more confident we can be that the sample mean is representative of the population.
The central limit theorem proposes that
as samples get large (usually defined as greater than 30), the sampling distribution has a normal distribution with a mean equal to the population mean, SD = SEM
The standard deviation of sample means is known as the
SEM (standard error of the mean)
A different approach to assess the accuracy of the sample mean as an estimate of the population mean, aside from the SE, is to - (2)
calculate boundaries and range of values within which we believe the true value of the population mean value will fall.
Such boundaries are called confidence intervals.
Confidence intervals are created by
samples
A 95% confidence interval is constructed such that
95% of these intervals (created from samples) will contain the population mean
A 95% confidence interval for 100 samples (with a CI constructed for each) would mean
for 95 of these samples, the confidence intervals we constructed would contain the true value of the mean in the population.
Diagram shows- (4)
- Dots show the means for each sample
- Lines sticking out represent the CIs for the sample means
- If there was a vertical line down it represents population mean
- If confidence intervals don’t overlap then it shows significant difference between the sample means
In fact, for a specific confidence interval, the probability that it contains the population value is either - (2)
0 (it does not contain it) or 1 (it does contain it).
You have no way of knowing which it is.
if our sample means were normally distributed with a mean of 0 and a standard error of 1, then the limits of our confidence interval would be -1.96 and +1.96
95% of z scores fall between
-1.96 and 1.96
Confidence intervals can be constructed for any estimated parameter, not just
μ - mean
. If the mean represents the true mean well, then the confidence interval of that mean should be
small
if the confidence interval is very
wide then the sample mean could be
very different from the true mean, indicating that it
is a bad representation of the population
Remember that the standard error of the mean gets smaller with the number of observations and thus our confidence interval also gets
smaller - make sense as more we measure more certain sample mean close to population mean
Calculating Confidence Intervals for sample means - rearranging in z formula
LB = Mean - (1.96 * SEM)
UB = Mean + (1.96 * SEM)
The standard deviation of SAT verbal scores in a school system is known to be 100. A researcher wishes to estimate the mean SAT score and compute a 95% confidence interval from a random sample of 10 scores.
The 10 scores are: 320, 380, 400, 420, 500, 520, 600, 660, 720, and 780.
Calculate CI
* M - 530
* N = 10
* SEM = 100/ square root of 10 = 31.62
* Value of z for 95% CI is number of SD one must go from mean (in both directions) to contain 0.95 of the scores
* Value of 1.96 was found in z-table
* Since each tail is to contain 0.025 of the scores, you find the value of z below which 1 - 0.025 = 0.975 of the scores fall
* 95% of z scores lie between -1.96 and +1.96
* Lower limit = 530 - (1.96) (31.62) = 468.02
* Upper limit = 530 + (1.96)(31.62) = 591.98
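The same calculation as a short Python sketch, using the ten scores and the known population SD from the example:

```python
import numpy as np

scores = np.array([320, 380, 400, 420, 500, 520, 600, 660, 720, 780])
sigma = 100                          # population SD, known in this example
mean = scores.mean()                 # 530
sem = sigma / np.sqrt(len(scores))   # 31.62
print(mean - 1.96 * sem, mean + 1.96 * sem)  # ~468.02 and ~591.98
```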
Think of test statistic capturing
signal/noise
A test statistic is a statistic for which the frequency of particular values is known (t, F, chi-square) and thus we can calculate the
probability of obtaining a certain value or p value.
To test whether the model fits the data or whether our hypothesis is a good explanation of the data, we compare
systematic variation against unsystematic
If the probability (p-value) is less than or equal to the significance level, then
the null hypothesis is rejected; When the null hypothesis is rejected, the outcome is said to be “statistically significant”
If the probability (p-value) is greater than the significance level, the
null hypothesis is not rejected.
What is a type 1 error in terms of variance? - (2)
we think the variance accounted for by the model is larger than the variance unaccounted for by the model (i.e. we conclude there is a statistically significant effect but in reality there isn't)
Type 1 is a false
positive
What is a Type II error in terms of variance?
we think there was too much variance unaccounted for by the model (i.e. we conclude there is no statistically significant effect but in reality there is)
Type II error is false
negative
Example of Type I and Type II error
Type I and Type II errors are mistakes we can make when testing the
fit of the model
Type I errors occur when we believe there is a genuine effect in the
population, when in fact there isn’t.
The acceptable level of Type I error is usually
an α-level of 0.05
Type II error occurs when we believe there is no effect in the
population when, in reality, there is.
The acceptable probability of a Type II error is the
β-level (often 0.2)
An effect size is a standardised measure of
the size of an effect
Properties of effect size (3)
Standardized = comparable across studies
Not (as) reliant on the sample size
Allows people to objectively evaluate the size of observed effect.
Effect Size Measures
r = 0.1, d = 0.2 (small effect):
the effect explains 1% of the total variance.
Effect size measures
r = 0.3, d = 0.5 (medium effect) means
the effect accounts for 9% of the total variance.
Effect size measures
r = 0.5, d = 0.8 (large effect)
effect accounts for 25% of the variance
Beware of the ‘canned’ effect sizes (e.g., r = 0.5, d = 0.8 and rest) since the size of
effect should be placed within the research context.
We should aim to achieve a power of
.8, or an 80% chance of detecting
an effect if one genuinely exists.
When we fail to reject the null hypothesis, it is either that there truly are no difference to be found,
OR
it may be because we do not have enough statistical power
Power is the probability of
correctly rejecting a false H0 OR the ability of the test to find an effect assuming there is one in the population,
Power is calculated by
1 - β, where β is the probability of making a Type II error
To increase the statistical power of a study you can increase
your sample size
Factors affecting the power of the test (4):
- Probability of a Type I error or α-level [the level at which we decide an effect is sig - the p-value] -> a bigger [more lenient] alpha means more power
- The true alternative hypothesis H1 [effect size] (degree of overlap; less overlap means more power) - if you find a large effect in the literature then you have a better chance of detecting something
- The sample size [N] -> the bigger the sample, the less the noise and the more power
- The particular tests to be employed - parametric tests have greater power to detect a sig effect since they are more sensitive
How do researchers calculate the number of pps they need for a reasonable chance of correctly rejecting the null hypothesis?
Sample size calculation at a desired level of power (usually power set to 0.8 in formula)
With power, we can do 2 things - (2)
- Calculate power of test
- Calculate the sample size necessary to detect a decent effect size and achieve a certain level of power based on past research
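As an illustration, a hedged sketch using the statsmodels power module (assumed installed) for an independent-samples t-test; the effect size and alpha below are just the conventional defaults:

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Sample size needed per group for a medium effect (d = 0.5), alpha = .05, power = .8
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.8)
print(round(n_per_group))  # roughly 64 per group

# Or, achieved power for a planned sample of 30 per group
print(analysis.solve_power(effect_size=0.5, alpha=0.05, nobs1=30))
```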
Diagram of Type I error, Type II error, power - (4) and making correct decisions
Type 1 error p = alpha
Type II error p = beta
Accepting null hypothesis which is correct - p = 1- alpha
Accepting alternate hypo which is correct - p = 1 - beta
If there is less overlap between the H0 and H1 distributions then
a bigger difference means higher power and a better chance of correctly rejecting the null hypothesis than with distributions that overlap more
If the distributions of H0 and H1 are narrower then
This means that the overlap in distributions is smaller and the power is therefore greater, but this time because of a smaller standard error of our estimate of the means.
Most people want to assess how many participants they need to test to have a reasonable chance of correctly rejecting the null hypothesis (the Power). This formula shows - (2)
us how.
We usually set the power to 0.8.
What are z scores? - (2)
A measure of variability:
The number of standard deviations a particular data point is from the population mean
Z-scores are a standardised measure, hence they ignore measurement units
Why should we care about z scores? - (2)
Z-scores allow researchers to calculate the probability of a score occurring within a standard normal distribution
Enables us to compare two scores that are from different samples (which may have different means and standard deviations)
Diagram of finding percentile of Trish
Trish takes a test and gets 25
Mean of the class is 20
SD = 4
(25 - 20)/4 = 1.25
Z-score = 1.25
Let's say Trish takes a test and scores 25 and the mean is 20. You calculate the z-score to be 1.25. You would use a z-score table to see what percentile she is in (marked in red): to read the table you go down to the value 1.2 and across to 0.05, which totals 1.25, and you can see about 89.4% of other students performed worse.
Diagram of z score and percentile
Josh takes a different test and gets 1150
Mean of the class is 1000
SD = 150
(1150 - 1000)/150 = 1.0
Z score = 1.0
Who performed better Trish or Josh?
Trish had z score of 1.25
We would use our table and look down the column to a z-score of 1 and across to the 0.00 column (in purple) and we can see 84.1% of students performed worse than Josh so Trish performed better than Josh.
Diagram of z scores and normal distribution - (3)
68% of scores are within 1 SD of the mean,
95% are within 2 SDs and
99.7% are within 3 SDs.
What's standard error?
By taking into account the variability and size of our sample we can estimate how far away from the real population mean our mean is!
If we took infinite samples from the population, 95% of the time the population mean will lie within
the 95% confidence interval range
What does narrow CI represent?
high statistical power
Wide CIs represent?
low statistical power
Power being the probability of catching a real effect (as opposed to
missing a real effect – Type II error)
We can never say the null hypothesis is
FALSE (or TRUE).
The p-value or calculated probability is the estimated probability of
us finding an effect when the null hypothesis (H0) is true.
p = probability of observing a test statistic at least as a big as the one we have if the
H0 is true
Hence, a significant p value (p <.05) tells us that there is a less than 5% chance of getting a test statistic that is
larger than the one we have found if there were no effect in the population (e.g. the null hypothesis were true)
Statistical significance does not equal importance - (2)
p = .049, p = .050 are essentially the same thing- the former is ‘statistically significant’.
Importance is dependent upon the experimental design/aims: e.g., A statistically significant weight increase of 0.1Kg between two adults experimental groups may be less important than the same increase between two groups of babies.
'Children can learn a second language faster before the age of 7'. Is this statement:
A. One-tailed
B. A non-scientific statement
C. Two-tailed
D. A null hypothesis
A, as one-tailed is directional and two-tailed is non-directional
Which of the following is true about a 95% confidence interval of the mean:
A. 95 out of 100 CIs will contain the population mean
B. 95 out of 100 sample means will fall within the limits of the confidence interval.
C. 95% of population means will fall within the limits of the confidence interval.
D. There is a 0.05 probability that the population mean falls within the limits of the confidence interval.
A as If we’d collected 100 samples, calculated the mean and then calculated a confidence interval for that mean, then for 95 of these samples the confidence intervals we constructed would contain the true value of the mean in the population
What does a significant test statistic tell us?
A. That the test statistic is larger than we would expect if there were no effect in the population.
B. There is an important effect.
C. The null hypothesis is false.
D. All of the above.
A; just because a test statistic is sig does not mean it's an important effect
Of what is p the probability?
(Hint: NHST relies on fitting a ‘model’ to the data and then evaluating the probability of this ‘model’ given the assumption that no effect exists.)
A.p is the probability of observing a test statistic at least as big as the one we have if there were no effect in the population (i.e., the null hypothesis were true).
B. p is the probability that the results are due to chance, the probability that the null hypothesis (H0) is true.
C. p is the probability that the results are not due to chance, the probability that the null hypothesis (H0) is false
D. p is the probability that the results would be replicated if the experiment was conducted a second time.
A
A Type I error occurs when:
(Hint: When we use test statistics to tell us about the true state of the world, we’re trying to see whether there is an effect in our population.)
A. We conclude that there is an effect in the population when in fact there is not.
B. We conclude that there is not an effect in the population when in fact there is.
C. We conclude that the test statistic is significant when in fact it is not.
D. The data we have typed into SPSS is different from the data collected.
A; if we use the conventional criterion then the probability of this error is .05 (or 5%) when there is no effect in the population
True or false?
a. Power is the ability of a test to detect an effect given that an effect of a certain size exists in a population.
TRUE
True or False?
We can use power to determine how large a sample is required to detect an effect of a certain size.
TRUE
True or False?
c. Power is linked to the probability of making a Type II error.
TRUE
True or False?
d. The power of a test is the probability that a given test is reliable and valid.
FALSE
What is the relationship between sample size and the standard error of the mean?
(Hint: The law of large numbers applies here: the larger the sample is, the better it will reflect that particular population.)
A. The standard error decreases as the sample size increases.
B. The standard error decreases as the sample size decreases.
C. The standard error is unaffected by the sample size.
D. The standard error increases as the sample size increases.
A The standard error (which is the standard deviation of the distribution of sample means), defined as σ_x̄ = σ/√N, decreases as the sample size (N) increases and vice versa
What is the null hypothesis for the following question: Is there a relationship between heart rate and the number of cups of coffee drunk within the last 4 hours?
A. There will be no relationship between heart rate and the number of cups of coffee drunk within the last 4 hours.
B. People who drink more coffee will have significantly higher heart rates.
C. People who drink more cups of coffee will have significantly lower heart rates.
D. There will be a significant relationship between the number of cups of coffee drunk within the last 4 hours and heart rate
A The null hypothesis is the opposite of the alternative hypothesis and so usually states that an effect is absent
A Type II error occurs when :
(Hint: This would occur when we obtain a small test statistic (perhaps because there is a lot of natural variation between our samples.)
A. We conclude that there is not an effect in the population when in fact there is.
B. We conclude that there is an effect in the population when in fact there is not.
C. We conclude that the test statistic is significant when in fact it is not.
D. The data we have typed into SPSS is different from the data collected.
A A Type II error would occur when we obtain a small test statistic (perhaps because there is a lot of natural variation between our samples)
In general, as the sample size (N) increases:
A. The confidence interval gets narrower.
B. The confidence interval gets wider.
C. The confidence interval is unaffected.
D. The confidence interval becomes less accurate
A
Which of the following best describes the relationship between sample size and significance testing?
(Hint: Remember that test statistics are basically a signal-to-noise ratio, so given that large samples have less ‘noise’ they make it easier to find the ‘signal’.)
A. In large samples even small effects can be deemed ‘significant’.
B. In small samples only small effects will be deemed ‘significant’.
C. Large effects tend to be significant only in small samples.
D. Large effects tend to be significant only in large samples.
A
The assumption of homogeneity of variance is met when:
A. The variances in different groups are approximately equal.
B. The variances in different groups are significantly different.
C. The variance across groups is proportional to the means of those groups.
D. The variance is the same as the interquartile range.
A - To make sure our estimates of the parameters that define our model and significance tests are accurate we have to assume homoscedasticity (also known as homogeneity of variance)
Next, the lecturer was interested in seeing whether males and females reacted differently to the different teaching methods.
Produce a clustered bar graph showing the mean scores of teaching method for males and females.
(HINT: place TeachingMethod on the X axis, Exam Score on the Y axis, and Gender in the ‘Cluster on X’ box. Include 95% confidence intervals in the graph).
Which of the following is the most accurate interpretation of the data?
A. Females performed better than males in both the reward and indifferent conditions. Regarding the confidence intervals, there was a large degree of overlap between males and females in all conditions of the teaching method.
B.Males performed better than females in the reward condition, and females performed better than males in the indifferent condition. Regarding the confidence intervals, there was no overlap between males and females across any of the conditions of teaching method.
C.Males performed better than females in all conditions. Regarding the confidence intervals, there was a small degree of overlap between males and females for the reward and indifferent conditions, and a large degree of overlap between males and females for the punish condition.
D.Males performed better than females in the reward condition, and females performed better than males in the indifferent condition. Regarding the confidence intervals, there was a small degree of overlap between males and females for the reward and indifferent conditions, and a large degree of overlap between males and females for the punish condition.
D
Produce a line graph showing the change in mean anxiety scores over the three time points.
NOTE: this is a repeated measures (or within subjects) design, ALL participants took part in the same condition.
Which of the following is the correct interpretation of the data?
A.Mean anxiety increased across the three time points.
B. Mean anxiety scores were reduced across the three time points, and there was a slight acceleration in this reduction between the middle and end of the course.
C. Mean anxiety scores were reduced across the three time points, though this reduction slowed down between the middle and end of the course.
D. Mean anxiety scores did not change across the three time points.
B
A general approach in regression is that our outcomes can be predicted by a model and what remains
is the error
The i in the general model in regression shows
e.g., outcome 1 is equal to model plus error 1 and outcome 2 is equal to model plus error 2 and so on…
For correlation, the outcome is modelled by
scaling (multiplying by a constant) another variable
Equation of correlation model: outcome_i = (b × x_i) + error_i
If you have 1 continuous variable which meets the assumptions of parametric tests then you can conduct a
Pearson correlation or regression
Variance is a feature of outcome measurements we have obtained and we want to predict with a model in correlation/regression that…
captures the effect of the predictor variables we have manipulated or measured
Variance of a single variable represents the
average amount that the data vary from the mean
Variance is the standard deviation
squared (s squared)
Variance formula - (2)
x_i minus the mean of all scores, squared, then divided by the total number of participants minus 1
done for each participant and summed (sigma)
Variance is the SD squared, meaning that it captures the
average of the squared differences of the outcome values from the mean of all outcomes (explaining what the formula of variance does)
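In code, the formula amounts to the following sketch (made-up scores; np.var with ddof=1 is the shortcut):

```python
import numpy as np

x = np.array([2, 4, 4, 4, 5, 5, 7, 9])  # hypothetical scores
variance = np.sum((x - x.mean()) ** 2) / (len(x) - 1)  # sum of squared deviations / (N - 1)
print(variance, np.var(x, ddof=1))  # same value both ways
sd = np.sqrt(variance)              # square root of variance gives the SD
```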
Covariance gathers information on whether
one variable covaries with another
In covariance, if we are interested in whether 2 variables are related, then we are interested in whether changes in one variable are met with changes in the other
therefore.. - (2)
when one variable deviates from its mean we
would expect the other variable to deviate from its mean in a similar way.
So, if one variable increases then the other, related variable, should also increase or even decrease at the same level.
If one variable covaries with another variable then it means these 2 variables are
related
To get SD from variance then you would
square root variance
What would you do in covariance formula in proper words? - (5)
- Calculate the error between the mean and each subject’s score for the first variable (x).
- Calculate the error between the mean and their score for the second variable (y).
- Multiply these error values.
- Add these values and you get the product deviations.
- The covariance is the average of these product deviations
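A sketch of those five steps in Python. The x and y values below are assumed from the deck's adverts/crisps example (they reproduce the covariance of 4.25 stated later):

```python
import numpy as np

x = np.array([5, 4, 4, 6, 8])      # adverts watched (assumed data)
y = np.array([8, 9, 10, 13, 15])   # packets of crisps bought (assumed data)

products = (x - x.mean()) * (y - y.mean())  # multiply the paired deviations
cov = products.sum() / (len(x) - 1)         # average of the product deviations
print(cov, np.cov(x, y)[0, 1])              # 4.25 from both
```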
Example of calculating covariance and what does the answer tell you?
The answer is positive: that tells us the x and y values tend to rise together.
What does each element of covariance formula stand for? - (5)
X = the value of ‘x’ variable
Y = the value of ‘y’ variable
X̄ = mean of 'x' - e.g., green
Ȳ = mean of 'y' - e.g., blue
n = the number of items in the data set
covariance will be large when deviations from
the mean in one variable are matched by deviations from the mean in the other
What does a positive covariance indicate?
as one variable deviates from the mean, the other
variable deviates in the same direction.
What does negative covariance indicate?
a negative covariance indicates that as one variable deviates from the mean (e.g. increases), the other deviates from the mean in the opposite direction (e.g. decreases).
What is the problem with covariance as a measure of the relationship between 2 variables? - (5)
it is dependent upon the units/scales of measurement used
So covariance is not a standardised measure
e.g., if 2 variables are measured in miles and the covariance is 4.25, then if we convert the data to kilometres we have to calculate the covariance again and would see it increase to 11.
Dependence on the scale of measurement is a problem as we cannot compare covariances in an objective way -> we cannot say whether a covariance is large or small relative to another dataset unless both datasets were measured in the same units
So we need to STANDARDISE it.
What is the process of standardisation?
To overcome the problem of dependence on the measurement scale, we need to convert
the covariance into a standard set of units
How to standardise the covariance?
dividing by product of the standard deviations of both variables.
Formula of standardising covariance
The same formula as covariance but divided by the product of the SD of x and the SD of y
Formula of Pearson’s correlation coefficient, r
Example of calculating Pearson’s correlation coefficient, r - (5)
standard deviation for the number of adverts watched (sx)
was 1.67,
SD of number of packets of crisps bought (sy) was 2.92.
If we multiply these together we get 1.67 × 2.92 =
4.88.
Now, all we need to do is take the covariance, which we calculated a few pages ago as being 4.25, and divide by these multiplied standard deviations.
This gives us r = 4.25/
4.88 = .87.
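The same worked example as a Python sketch (data values assumed as above; scipy's pearsonr also returns a p-value):

```python
import numpy as np
from scipy.stats import pearsonr

adverts = np.array([5, 4, 4, 6, 8])     # assumed data
crisps = np.array([8, 9, 10, 13, 15])   # assumed data

cov = np.cov(adverts, crisps)[0, 1]                          # 4.25
r_manual = cov / (adverts.std(ddof=1) * crisps.std(ddof=1))  # 4.25 / 4.88 ≈ .87
r, p = pearsonr(adverts, crisps)
print(r_manual, r, p)
```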
The standardised version of covariance is the
correlational coefficient or Pearson’s r
Pearson’s R is … version of covariance meaning independent of units of measurement
standardised
What does correlation describe? - (2)
Describes a relationship between variables
If one variable increases, what happens to the other variable?
Pearson’s correlation coefficient r was also called the
product-moment correlation
A linear relationship, normally distributed data, and interval/ratio, continuous data are assumed by
Pearson's r correlation coefficient
Pearson Correlation Coefficient varies between
-1 and +1 (direction of relationship)
The larger the Pearson's correlation coefficient R value, the closer the values will
be to each other and the mean
Smaller Pearson's correlation coefficient R values indicate
there is unexplained variance in the data, resulting in the data points being more spread out.
What do these two graphs show? - (2)
- example of high negative correlation. The data points are close together and are close to the mean.
- On the other hand, the graph on the right shows a low positive correlation. The data points are more spread out and deviate more from the mean.
The Pearson Correlation Coefficient measures the strength of a relationship
between one variable and another hence its use in calculating effect size
A Pearson’s correlation coefficient of +1 indicates
two variables are perfectly positively correlated, so as one variable increases, the other increases by a
proportionate amount.
A Pearson’s correlation coefficient of -1 indicates
a perfect negative relationship: if one variable increases, the other decreases by a proportionate amount.
Pearson’s r
+/- 0.1 means
small effect
Pearson’s r
+/- 0.3 means
medium effect
Pearson’s r
+/- 0.5 means
large effect
In Pearson’s correlation, we can test the hypothesis that - (2)
correlation coefficient is different from zero
(i.e., different from ‘no relationship’)
In Pearson’s correlation coefficient, we can test the hypothesis that the correlation is different from 0
If we find our observed coefficient was very unlikely to happen if there was no effect in the population, then we gain confidence that the relationship
we have observed is statistically meaningful.
There are 2 ways to test this hypothesis
- Z scores
- T-statistic
Confidence intervals tell us something about the
likely correlation in the population
We can calculate confidence intervals for Pearson's correlation coefficient by rearranging the CI formula (after transforming r so that it is normally distributed)
As sample size increases, so the value of r at which a significant result occurs
decreases, e.g., with n = 20 p is not < 0.05 but with 200 pps it is p < 0.05
Pearson’s r = 0 means - (2)
indicates no linear relationship at all
so if one variable changes, the other stays the same.
Correlation coefficients give no indication of direction of… + example - (2)
causality
e.g., although we conclude the number of adverts increases the number of toffees bought, we can't say watching adverts caused us to buy toffees
We have to be cautious about causality in terms of Pearson's correlation r as - (2)
- Third variable problem - causality between variables can not be assumed in any correlation
- Direction of causality: correlation coefficients say nothing about which variable causes the other to change.
If you have a weak correlation between 2 variables (= a weak effect) then you need to take a lot of measurements for that relationship to be
significant
R correlation coefficient gives the ratio of
covariance to a measure of variance
Example of correlations getting stronger
R squared is known as the
coefficient of determination
R^2 can be used to explain the
proportion of the variance for a dependent variable (outcome) that's explained by an independent variable (predictor).
Example of R^2 coefficient of determination - (2)
X = exam anxiety
Y = exam performance
If R^2 = 0.194
19.4% of the variability in exam performance can be explained by exam anxiety
(‘the variance in y accounted for by x’)
R^2 calculates the amount of shared
variance
Example of r and R^2
e.g., if r = 0.1 then R^2 = 0.1 × 0.1 = 0.01, i.e., 1% of the variance is shared
R^2 gives you the true strength of…
the correlation, but without an indication of its direction.
What are the three types of correlations? - (3)
- Bivariate correlations
- Partial correlations
- Semi-partial or part correlations
What’s a bivariate correlation?
the relationship between 2 variables
What is a partial correlation?
looks at the relationship between two variables while ‘controlling’ the effect of one or more additional variables.
The partial correlation partials out
the effect of one or more variables on both X and Y
A partial correlation controls for a third variable, which works as follows - (3)
- A correlation calculates each data point’s distance from the line (residuals)
- This is the error relative to the model (unexplained variance)
- A third variable might predict some of that variation in the residuals
The partial correlation compares the unique variation of one variable with the
filtered (residual) variation of the other
The partial correlation holds the
third variable constant (but we don’t manipulate these)
Example of partial correlation- (2)
For example, when studying the effect of a diet, the level of exercise might also influence weight loss
We want to know the unique effect of diet, so we need to partial out the effect of exercise
Example of Venn Diagram of Partial Correlation - (2)
Partial Correlation between IV1 and DV = D / D+C
Unique variance accounted for by the predictor (IV1) in the DV, after accounting for variance shared with other variables.
Example of Partial Correlation - (2)
Partial correlation: Purple / Red + Purple
If we were doing just a partial correlation, we would see how much exam anxiety is influencing both exam performance and revision time.
Example of partial correlation and semi-partial correlation - (2)
The partial correlation that we calculated took
account not only of the effect of revision on exam performance, but also of the effect of revision on anxiety.
If we were to calculate the semi-partial correlation for the same data, then this would control for only the effect of revision on exam performance (the effect of revision
on exam anxiety is ignored).
In partial correlation, the third variable is typically not considered as the primary independent or dependent variable. Instead, it functions as a
control variable—a variable whose influence is statistically removed or controlled for when examining the relationship between the two primary variables (IV and DV).
The partial correlation is
The amount of variance the variable explains
relative to the amount of variance in the outcome that is left to explain after the contribution of other predictors has been removed from both the predictor and the outcome.
These partial correlations can be done when variables are dichotomous (including the third variable), e.g., - (2)
we could look at the relationship between bladder relaxation (did the person wet themselves or not?) and the number of large tarantulas crawling up their leg, controlling for fear of spiders
(the first variable is dichotomous, but the second variable and the ‘controlled for’ variable are continuous).
What does this partial correlation output show?
Revision time = control variable (its effect is partialled out)
Exam performance = DV
Exam anxiety = X - (5)
- First, notice that the partial correlation between exam performance and exam anxiety is −.247, which is considerably less than the correlation when the effect of
revision time is not controlled for (r = −.441).
- Although this correlation is still statistically significant (its p-value is still below .05), the relationship is diminished.
- The value of R^2 for the partial correlation is .06, which means that exam anxiety can now account for only 6% of the variance in exam performance.
- When the effects of revision time were not controlled for, exam anxiety shared 19.4% of the variation in exam scores, so the inclusion of revision time has severely diminished the amount of variation in exam scores shared by anxiety.
- As such, a truer measure of the role of exam anxiety has been obtained.
Partial correlations are most useful for looking at the unique
relationship between two variables when
other variables are ruled out
In a semi-partial correlation we control for the
effect that
the third variable has on only one of the variables in the correlation
The semi-partial (part) correlation partials out the - (2)
- effect of one or more variables on either X or Y
- e.g., the amount revision explains exam performance after the contribution of anxiety has been removed from one variable (usually the predictor, e.g., revision)
The semi-partial correlation compares the
unique variation of one variable with the unfiltered variation of the other.
Diagram of venn diagram of semi-partial correlation - (2)
- Semi-Partial Correlation between IV1 and DV = D / D+C+F+G
Unique variance accounted for by the predictor (IV1) in the DV, after accounting for variance shared with other variables.
Diagram of revision and exam performance and revision time on semi-partial correlation - (2)
- Purple / (Red + Purple + White + Orange)
- When we use semi-partial correlation to look at this relationship, we partial out the variance accounted for by exam anxiety (the orange bit) and look for the variance explained by revision time (the purple bit).
Summary of partial correlation and semi-partial correlation - (2)
A partial correlation quantifies the relationship between two variables while accounting for the effects of a third variable on both variables in the original correlation.
A semi-partial correlation quantifies the relationship between two variables while accounting for the effects of a third variable on only one of the variables in the original correlation.
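A minimal numpy sketch of that distinction (simulated data; z stands in for the third variable, e.g. revision time): the partial correlation correlates the two sets of residuals, while the semi-partial residualises only the predictor:

import numpy as np

rng = np.random.default_rng(0)
z = rng.normal(size=50)              # third variable (e.g. revision time)
x = z + rng.normal(size=50)          # e.g. exam anxiety
y = -x + z + rng.normal(size=50)     # e.g. exam performance

def residuals(a, b):
    # residuals of a after regressing a on b (with an intercept)
    design = np.column_stack([np.ones_like(b), b])
    beta, *_ = np.linalg.lstsq(design, a, rcond=None)
    return a - design @ beta

partial = np.corrcoef(residuals(x, z), residuals(y, z))[0, 1]  # z removed from both
semi_partial = np.corrcoef(residuals(x, z), y)[0, 1]           # z removed from x only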
Pearson’s product-moment correlation coefficient (described earlier) and Spearman’s rho (see section 6.5.3) are examples
of bivariate correlation coefficients.
Non-parametric tests of correlation are… - (2)
- Spearman’s rho
- Kendall’s tau
In Spearman’s rho, the variables are not normally distributed and/or measures are on an
ordinal scale (e.g., grades)
Spearman’s rho works by
first ranking the data (numbers converted into ranks) and then running Pearson’s r on the ranked data
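A minimal scipy sketch (hypothetical data) showing that equivalence:

from scipy import stats

x = [3, 1, 4, 1, 5, 9, 2, 6]
y = [2, 7, 1, 8, 2, 8, 1, 9]

rho, p = stats.spearmanr(x, y)
r_ranks, _ = stats.pearsonr(stats.rankdata(x), stats.rankdata(y))
# rho and r_ranks are the same number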
Spearman’s correlation coefficient, rs, is a non-parametric statistic and so can be used when the
data have violated parametric assumptions, such as non-normally distributed data
Spearman’s correlation coefficient is sometimes called
Spearman’s rho
For Spearman’s rs we can get R squared, but it is interpreted slightly differently: it is the
proportion of variance in the ranks that the two variables share.
Kendall’s tau is used rather than Spearman’s coefficient when - (2)
- you have a small data set with a large number of tied ranks
- this means that if you rank all of the scores and many scores have the same rank, then Kendall’s tau should be used
Kendall’s tau test - (2)
- For small datasets with many tied ranks
- A better estimate of the correlation in the population than Spearman’s ρ
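A minimal scipy sketch (hypothetical ordinal data with many tied ranks):

from scipy import stats

grades = [1, 2, 2, 3, 3, 3, 4, 4]   # many ties
hours = [2, 4, 3, 5, 6, 6, 8, 7]
tau, p = stats.kendalltau(grades, hours)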
Kendall’s tau is not numerically similar to r or rs (Spearman), so tau squared does not tell us the
proportion of variance shared by two variables (or by the ranks of those two variables).
Kendall’s tau is 66–75% smaller than both Spearman’s rs and Pearson’s r, so
tau is not comparable to r and rs
There is a benefit to using Kendall’s statistic rather than Spearman’s - (2)
- Kendall’s statistic is actually a better estimate of the correlation in the population
- we can draw more accurate generalizations from Kendall’s statistic than from Spearman’s
What’s the decision tree for Spearman’s correlation? - (4)
- What type of measurement = continuous
- How many predictor variables = one
- What type of predictor variable = continuous
- Meets assumptions of parametric tests = no
The output of Kendall and Spearman can be interpreted the same way as
Pearson’s correlation coefficient r output box
The biserial and point-biserial correlation coefficients are used when
one of the two variables is dichotomous (e.g., whether or not a woman is pregnant)
What is the difference between biserial and point-biserial correlations?
depends on whether the dichotomous variable is discrete or continuous
The point–biserial correlation coefficient (rpb) is used when
one variable is a
discrete dichotomy (e.g. pregnancy),
biserial correlation coefficient (rb) is used
when - (2)
one variable is a continuous dichotomy (e.g. passing or failing an exam).
e.g. An example is passing or failing a statistics test: some people will only just fail while others will fail by
a large margin; likewise some people will scrape a pass while others will clearly excel.
Example of when the point-biserial correlation is used - (3)
- Imagine we are interested in the relationship between the gender of a cat and how much time it spent away from home
- Time spent away is measured at the interval level → meets the assumptions of parametric data
- Gender is a discrete dichotomous variable coded with 0 for male and 1 for female
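A minimal scipy sketch of that example (hypothetical data; 0 = male, 1 = female):

from scipy import stats

gender = [0, 0, 0, 0, 1, 1, 1, 1]                     # discrete dichotomy
time_away = [5.1, 6.2, 4.8, 5.5, 2.3, 3.1, 2.8, 3.0]  # hours away from home (interval)
r_pb, p = stats.pointbiserialr(gender, time_away)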
Can convert the point-biserial correlation coefficient into the
biserial correlation coefficient
Point-biserial and biserial correlations differ in size, as the
biserial correlation is bigger than the point-biserial
Example of a question conducting Pearson’s r - (4)
The researcher was interested in whether the amount someone gets paid and the amount of holidays they take from work would be related to their productivity at work
- Pay: Annual salary
- Holiday: Number of holiday days taken
- Productivity: Productivity rating out of 10
Example of a Pearson’s r scatterplot:
relationship between pay and productivity
If we have r = 0.313 what effect size is it?
medium effect size
±.1 = small effect
±.3 = medium effect
±.5 = large effect
What does this scatterplot show?
- This indicates very little correlation between the 2 variables
What will a matrix scatterplot show?
the relationship between all possible combinations of your variables
What does this scatterplot matrix show? - (2)
- For pay and holidays, we can see the line is very flat, indicating that the correlation between the two variables is quite low
- For pay and productivity, the line is steeper, suggesting the correlation is fairly substantial between these 2 variables; the same goes for holidays and productivity
What is degrees of freedom for correlational analysis?
N-2
What does this Pearson’s correlation r output show? - (4)
- The relationship between pay and holidays is a very low correlation (r = −0.04)
- Between pay and productivity, there is a medium-sized correlation (r = 0.313)
- Between holidays and productivity, there is a medium-to-large effect size (r = 0.435)
- The relationships between pay and productivity, and between holidays and productivity, are significant, but the correlation between pay and holidays is not
Another example of a Pearson’s correlation r question - (3)
A student was interested in the relationship between the time spent preparing an essay, the interestingness of the essay topic and the essay mark received.
He got 45 of his friends and asked them to rate, using a scale from 1 to 7, how interesting they thought the essay topic was (1 - I’ll kill myself of boredom, 4 - it’s not too bad!, 7 - it’s the most interesting thing in the world!) (interesting).
He then timed how long they spent writing the essay (hours), and got their percentage score on the essay (essay).
Example of the interval/ratio continuous data needed for Pearson’s r for IV and DV - (2)
- Interval scale: the difference between 10°C and 20°C is the same as the difference between 80°C and 90°C, but 0°C does not mean an absence of temperature
- Ratio: height (0 cm really does mean no height), weight, time
Pearson’s correlation r, Spearman’s rho and Kendall’s tau require
one IV and one DV
What does this SPSS output show?
A. There was a non-significant positive correlation between interestingness of topic and the amount of time spent writing. There was a non-significant positive correlation between time spent writing an essay and essay mark
There was a significant positive correlation between interestingness of topic and essay mark, with a medium effect size
B. There was a significant positive correlation between interestingness of topic and the amount of time spent writing, with a small effect size. There was a significant positive correlation between time spent writing an essay and essay mark, with a large effect size. There was a non-significant positive correlation between interestingness of topic and essay mark
C. There was a significant negative correlation between interestingness of topic and the amount of time spent writing, with a medium effect size. There was a non-significant positive correlation between time spent writing an essay and essay mark. There was a non-significant positive correlation between interestingness of topic and essay mark
D. There was a significant positive correlation between interestingness of topic and the amount of time spent writing, with a large effect size. There was a non-significant positive correlation between time spent writing an essay and essay mark. There was a non-significant positive correlation between interestingness of topic and essay mark
Answer: D
r = 0.21 effect size is…
between a small and a medium effect
Effect size is only meaningful if you evaluate it with regard to
your own research area
Biserial correlation is when
one variable is dichotomous, but there is an underlying continuum (e.g., pass/fail on an exam)
Point-biserial correlation is when
one variable is dichotomous, and it is a true dichotomy (e.g., pregnancy)
Example of a dichotomous relationship - (3)
- an example of a true dichotomous relationship
- we can compare the differences in height between males and females
- use the dichotomous predictor of gender
What is the decision tree for multiple regression? - (4)
- What type of measurement = continuous
- How many predictor variables = two or more, continuous
- Meets assumptions of parametric tests = yes
- Multiple regression
Multiple regression is the same as simple linear regression except that for - (2)
every extra predictor you include, you have to add a coefficient;
so each predictor variable has its own coefficient, and the outcome variable is predicted from a combination of all the variables multiplied by their respective coefficients, plus a residual term
Multiple regression equation:
Yi = b0 + b1X1i + b2X2i + … + bnXni + εi
In multiple regression equation, list all the terms - (5)
- Y is the outcome variable,
- b1 is the coefficient of the first predictor (X1),
- b2 is the coefficient of the second predictor (X2),
- bn is the coefficient of the nth predictor (Xn),
- εi is the difference between the predicted and the observed value of Y for the ith participant.
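A minimal statsmodels sketch (simulated data; the coefficients 0.5 and −0.3 are arbitrary) of fitting this equation and reading off the bs:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))                        # two predictors X1, X2
y = 1.0 + 0.5 * X[:, 0] - 0.3 * X[:, 1] + rng.normal(size=100)

model = sm.OLS(y, sm.add_constant(X)).fit()          # add_constant supplies b0
print(model.params)                                  # b0, b1, b2
print(model.rsquared, model.fvalue, model.f_pvalue)  # R^2 and the model F-test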
Multiple regression uses the same principle as linear regression in a way that
we seek to find the linear combination of predictors that correlate maximally with the outcome variable.
Regression is a way of predicting things that you have not measured by predicting
an outcome variable from one or more predictor variables
Can’t plot MR as shown here (a 3D plot)
for more than 2 predictor (X) variables
If you have two predictors that overlap and correlate a lot, then it is a … model
bad model, as the predictors can’t uniquely explain the outcome
In Hierarchical regression, we are seeing whether
one model explains significantly more variance than the other
In hierarchical regression predictors are selected based on
past work and the experimenter
decides in which order to enter the predictors into the model
As a general rule for hierarchical regression, - (3)
known predictors (from other research) should be entered into the model first in order of their importance in predicting the outcome.
After known predictors have been entered, the
experimenter can add any new predictors into the model.
New predictors can be entered either all in one go, in a stepwise manner, or hierarchically (such that the new predictor
suspected to be the most important is entered first).
Example of hierarchical regression in terms of album sales - (2)
The first model allows all the shared variance between Ad budget and Album sales to be accounted for.
The second model then only has the option to explain more variance by the unique contribution from the added predictor Plays on the radio.
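A minimal statsmodels sketch of that comparison (simulated data; the names adverts/airplay/sales are illustrative stand-ins for the album-sales example) using an F-test on the change between nested models:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
adverts = rng.normal(size=100)                 # known predictor, entered first
airplay = rng.normal(size=100)                 # new predictor, entered second
sales = 2.0 + 0.6 * adverts + 0.4 * airplay + rng.normal(size=100)

m1 = sm.OLS(sales, sm.add_constant(adverts)).fit()
X2 = sm.add_constant(np.column_stack([adverts, airplay]))
m2 = sm.OLS(sales, X2).fit()

f, p, df_diff = m2.compare_f_test(m1)          # does model 2 explain significantly more variance?
print(m1.rsquared, m2.rsquared, f, p)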
What is forced entry MR?
a method in which all predictors are forced
into the model simultaneously.
Like HR, forced entry MR relies on
good theoretical reasons for including the chosen predictors,
Different from HR, forced entry MR
makes no decision about the order in which variables are entered.
Some researchers believe about forced entry MR that
this method is the only appropriate method for theory testing because stepwise techniques are influenced by random variation in the data and so rarely give replicable results if the model is retested.
Why select collinearity diagnostics in the statistics box for multiple regression? - (2)
- This option is for obtaining collinearity statistics such as the VIF and tolerance
- Checking the assumption of no multicollinearity
Multicollinearity poses a problem only for multiple regression because
simple regression requires only one predictor.
Perfect collinearity exists in multiple regression when at least
two predictors are perfectly correlated, i.e., have a correlation coefficient of 1
If there is perfect collinearity in multiple regression between predictors it
becomes impossible
to obtain unique estimates of the regression coefficients because there are an infinite number of combinations of coefficients that would work equally well.
The good news is that perfect collinearity in multiple regression is rare in
real-life data
If two predictors are perfectly correlated in multiple regression, then the values of b for each variable are
interchangeable
As collinearity increases in multiple regression, 3 problems arise - (3)
- Untrustworthy bs
- It limits the size of R
- It becomes hard to assess the importance of individual predictors
One way of identifying multicollinearity in multiple regression is to scan
a correlation matrix of all of the predictor variables and see if any correlate very highly (very highly meaning correlations above .80 or .90)
The VIF indicates in multiple regression whether a
predictor has a strong linear relationship with the other predictor(s).
If the VIF statistic is above 10 or approaching 10 in multiple regression, then what you would want to do is - (2)
- look at your variables to see whether all of them need to go in the model
- if there is a high correlation between 2 predictors (measuring the same thing), decide whether it is important to include both variables or to take one out and simplify the regression model
Related to the VIF in multiple regression is the tolerance
statistic, which is its
reciprocal (1/VIF), the inverse of the VIF
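A minimal statsmodels sketch (simulated, deliberately collinear predictors) of computing the VIF and tolerance:

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(2)
x1 = rng.normal(size=100)
x2 = 0.9 * x1 + 0.1 * rng.normal(size=100)    # nearly a copy of x1
X = sm.add_constant(np.column_stack([x1, x2]))

for i in (1, 2):                               # skip the constant at column 0
    vif = variance_inflation_factor(X, i)
    print(f"predictor {i}: VIF = {vif:.1f}, tolerance = {1 / vif:.3f}")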
In Plots in SPSS for multiple regression, you put - (2)
- ZRESID on Y and ZPRED on X
- a plot of residuals against predicted values to assess homoscedasticity
What is ZPRED in MR? - (2)
(the standardized predicted values of the dependent variable based on the model).
These values are standardized forms of the values predicted by the model.
What is ZRESID in MR? - (2)
(the standardized residuals, or errors).
These values are the standardized differences between the observed data and the values that the model predicts.
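A minimal sketch (simulated data) of the equivalent plot outside SPSS, standardized residuals against standardized predicted values:

import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt
from scipy.stats import zscore

rng = np.random.default_rng(3)
X = sm.add_constant(rng.normal(size=(100, 2)))
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=100)
model = sm.OLS(y, X).fit()

plt.scatter(zscore(model.fittedvalues), zscore(model.resid))  # ZPRED vs ZRESID
plt.axhline(0, color="grey")
plt.xlabel("ZPRED"); plt.ylabel("ZRESID")
plt.show()   # an even, patternless spread around 0 suggests homoscedasticity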
SPSS in multiple linear regression gives descriptive outputs, which are - (2)
- basic means and also a table of correlations between variables
- this is a first opportunity to determine whether there is high correlation between predictors, otherwise known as multicollinearity
The model summary in SPSS captures how the model or models explain
variance in terms of R squared and, more importantly, how R squared changes between models and whether those changes are significant.
Diagram of model summary
What is the measure of R^2 in multiple regression?
measure of how much of the variability in the outcome is accounted for
by the predictors
The adjusted R^2 in multiple regression gives us an estimate of the model’s
fit in the general population
The Durbin-Watson statistic, if specified in multiple regression, tells us whether the - (2)
- assumption of independent errors is tenable (values less than 1 or greater than 3 raise alarm bells)
- the closer the value is to 2, the better = assumption met
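A minimal statsmodels sketch (reusing the fitted model from the plotting sketch above):

from statsmodels.stats.stattools import durbin_watson

dw = durbin_watson(model.resid)   # 'model' is the OLS fit from the sketch above
# dw close to 2: independent errors; below 1 or above 3: alarm bells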
SPSS output for MR = ANOVA table which performs
F-tests for each model
SPSS output for MR contains ANOVA that tests whether the model is
significantly better at predicting the outcome than using the mean as a ‘best guess’
The F-ratio represents the ratio of
improvement in prediction that results from fitting the model, relative to the inaccuracy that still exists in the model
We are told the sum of squares for the model (SSM) - the Regression line in the MR output - which represents the
improvement in prediction resulting from fitting a regression line to the data rather than using the mean as an estimate of the outcome
We are told the residual sum of squares (the Residual line) in this MR output, which represents the
total difference between
the model and the observed data
The df for the sum of squares of the model (Regression line) in MR is equal to the
number of predictors (e.g., 1 for the first model, 3 for the second)
The df for the sum of squares residual in MR is - (2)
- the number of observations (N) minus the number of coefficients in the regression model
- (e.g., M1 has 2 coefficients - one for the predictor and one for the constant; M2 has 4 - one for each of the 3 predictors and one for the constant)
The average sum of squares (mean square) in the ANOVA table is
calculated for each term (SSM, SSR) by dividing the SS by its df.
How is the F ratio calculated in this ANOVA table?
F-ratio is calculated by dividing the average improvement in prediction by the model (MSM) by the average
difference between the model and the observed data (MSR)
If the improvement due to fitting the regression model is much greater than the inaccuracy within the model, then the value of F will be
greater than 1, and SPSS calculates the exact probability (p-value) of obtaining that value of F by chance
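A minimal sketch (hypothetical sums of squares and dfs) of that calculation:

from scipy import stats

ss_model, df_model = 120.0, 3     # hypothetical SSM and its df
ss_resid, df_resid = 300.0, 96    # hypothetical SSR and its df

ms_model = ss_model / df_model    # average improvement due to the model (MSM)
ms_resid = ss_resid / df_resid    # average inaccuracy left in the model (MSR)
F = ms_model / ms_resid
p = stats.f.sf(F, df_model, df_resid)   # probability of an F this big by chance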
What happens if b values are positive in multiple regression?
there is a positive relationship between the predictor and the outcome
What happens if the b value is negative in multiple regression?
there is a negative relationship between the predictor and the outcome variable
What do the b values in this table tell us about the relationships between the predictors and the outcome variable in multiple regression? - (3)
- They indicate positive relationships, so as advertising budget increases, record sales (the outcome) increase
- as plays on the radio increase, so do record sales
- as the attractiveness of the band increases, so do record sales