Week 1: The Descriptive Stats of Outcomes - How is the Data Distributed and How can we Assess the Distribution Flashcards
What distribution is needed for parametric tests?
A normal distribution
The normal distribution curve is also referred to as the
bell curve
The normal distribution is symmetrical, meaning
the distribution curve can be divided down the middle to produce two equal (mirror-image) halves
The bell curve can be described using two parameters called (2)
- Mean (central tendency)
- Standard deviation (dispersion)
μ is
mean
σ is
standard deviation
Diagram shows:
e.g., the area from the mean to 1σ above it contains 34.1% of the values
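The 34.1% figure can be checked by finding the area under the standard normal curve. This is a minimal sketch that uses Python's `math.erf` to get the cumulative area; the helper name `normal_cdf` is our own.

```python
import math

def normal_cdf(z: float) -> float:
    """Proportion of a standard normal distribution lying below z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Area between the mean (z = 0) and one standard deviation above it (z = 1)
mean_to_1sd = normal_cdf(1) - normal_cdf(0)
# Area within one standard deviation either side of the mean
within_1sd = normal_cdf(1) - normal_cdf(-1)

print(f"mean to +1σ: {mean_to_1sd:.1%}")   # mean to +1σ: 34.1%
print(f"within ±1σ: {within_1sd:.1%}")     # within ±1σ: 68.3%
```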
Many statistical tests (parametric) cannot be used if the data are not
normally distributed
The mean is the sum of
scores divided by the number of scores
Mean is a good measure of
central tendency for roughly symmetric distributions
The mean can be a misleading measure of central tendency in skewed distributions as
it can be greatly influenced by scores in tail e.g., extreme values
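A small illustration of why an extreme score pulls the mean but barely moves the median. The scores are hypothetical; Python's standard `statistics` module does the calculation.

```python
from statistics import mean, median

scores = [2, 3, 3, 4, 5]
with_extreme = scores + [50]   # hypothetical extreme score added to the tail

print(mean(scores), median(scores))                          # 3.4 3
print(round(mean(with_extreme), 2), median(with_extreme))    # 11.17 3.5
```

One extreme value roughly triples the mean, while the median only shifts from 3 to 3.5.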
Aside from the mean, what are the 2 other measures of central tendency? - (2)
- Median
- Mode
The median is (2)
the middle score when scores are ordered.
the middle of a distribution: half the scores are above the median and half are below the median.
The median is relatively unaffected by … and can be used with… (2)
- extreme scores or skewed distributions
- can be used with ordinal, interval and ratio data.
The mode is the most
frequently occurring score in a distribution, a score that actually occurred
The mode is the only measure of central tendency that can be used with
nominal data
The mode is greatly subject to
sample fluctuations and is therefore not recommended to be used as the only measure of central tendency
Many distributions have more than one
mode
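A minimal sketch of the mode with Python's `statistics` module: `mode` works on nominal (category) data, and `multimode` returns every value tied for most frequent, which is how a distribution with more than one mode shows up. The data are hypothetical.

```python
from statistics import mode, multimode

# Nominal data: only the mode is meaningful here
eye_colour = ["brown", "blue", "brown", "green", "brown"]
print(mode(eye_colour))        # brown

# A bimodal set of scores: two values tie for most frequent
scores = [1, 2, 2, 3, 4, 4, 5]
print(multimode(scores))       # [2, 4]
```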
The mean, median and mode are identical in
symmetric, unimodal distributions (such as the normal distribution)
For a positively skewed distribution, the
mean is usually greater than the median, which is greater than the mode
For a negatively skewed distribution,
usually the mode is greater than the median, which is greater than the mean
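The ordering of the three averages can be checked on a small hypothetical data set with a long right tail. Negating the scores mirrors the distribution, which flips the order for negative skew.

```python
from statistics import mean, median, mode

# Hypothetical positively skewed scores: a few high values form a long right tail
scores = [1, 2, 2, 2, 3, 3, 4, 5, 9, 15]
print(mean(scores), median(scores), mode(scores))   # 4.6 3.0 2

# Mirror image: the same scores negated give a negatively skewed set
neg = [-x for x in scores]
print(mean(neg), median(neg), mode(neg))
```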
Kurtosis in Greek means
bulge or bend
What is central tendency?
the tendency for the values of a random variable to cluster round its mean, mode, or median.
Diagram of normal kurtosis, positive excess kurtosis (leptokurtic) and negative excess kurtosis (platykurtic)
What does lepto mean?
prefix meaning thin
What does platy mean?
a prefix meaning flat or wide (think Plateau)
Tests of normality (2)
Kolmogorov-Smirnov test
Shapiro-Wilk test
Tests of normality are dependent on
sample size
If you have a very large sample, you will find these normality tests often come out as … even when your data look visually - (2)
significant
normally distributed
If you have a small sample, the normality tests may come out non-significant even when the data are not normally distributed, due to
lack of power in the test to detect a significant effect
There is no hard and fast rule for
determining whether data is normally distributed or not
Plot your data because this helps inform on what decisions you want to make with respect to
normality
Even if a normality test is significant, if the data look normally distributed you can still do
parametric tests
A frequency distribution or a histogram is a plot of how many times
each score occurs
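A frequency distribution is just a count of how often each score occurs. This sketch uses `collections.Counter` on hypothetical scores and prints a rough text histogram.

```python
from collections import Counter

scores = [3, 5, 4, 3, 2, 5, 3, 4, 3, 1]   # hypothetical scores
freq = Counter(scores)

# One row per score, one '*' per occurrence
for score in sorted(freq):
    print(f"{score}: {'*' * freq[score]}")
```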
2 main ways a distribution can deviate from the normal - (2)
- Lack of symmetry (called skew)
- Pointiness (called kurtosis)
In a normal distribution the values of skew and kurtosis are 0, meaning…
tails of the distribution are as they should be
Is age nominal or continuous?
Continuous
Is gender continuous or nominal?
Nominal
Is height continuous or nominal?
Continuous
Which of the following best describes a confounding variable?
A. A variable that affects the outcome being measured as well as, or instead of, the independent variable
B. A variable that is manipulated by the experimenter
C. A variable that has been measured using an unreliable scale
D. A variable that is made up only of categories
A
If a test is valid, what does it mean?
A. The test measures what it claims to measure.
B. The test will give consistent results. (Reliability)
C. The test has internal consistency (measures correlations between different items on the same test to see if they measure the same construct)
D. The test measures a useful construct or variable = a test can measure something useful but still not be valid
A
A variable that measures the effect that manipulating another variable has is known as:
A. DV
B. A confounding variable
C. Predictor variable
D. IV
A
The discrepancy between the numbers used to represent something that we are trying to measure and the actual value of what we are measuring is called:
A. Measurement error
B. Reliability
C. The ‘fit’ of the model
D. Variance
A
A frequency distribution in which low scores are most frequent (i.e. bars on the graph are highest on the left hand side) is said to be:
A. Positively skewed
B. Leptokurtic = distribution with positive kurtosis
C. Platykurtic = negative kurtosis
D. Negatively skewed = high scores are most frequent (bars highest on the right hand side)
A
Which of the following is designed to compensate for practice effects?
A. Counterbalancing
B. Repeated measures design = practice effects issue in repeated measures
C. Giving participants a break between tasks = this compensates for boredom effects
D. A control condition = provides reference point
A
Variation due to variables that have not been measured is
A. Unsystematic variation
B. Homogeneous variance = assumption that the variance in each population is equal
C. Systematic variation = due to experimental manipulation
D. Residual variance = reflects how well the constructed regression line fits the actual data
A
Purpose of control condition is to
A. Allow inferences about cause
B. Control for participants’ characteristics = randomisation
C. Show up relationship between predictor variables
D. Rule out tertium quid
A. Allow inferences about cause
If the scores on a test have a mean of 26 and a standard deviation of 4, what is the z-score for a score of 18?
A. -2
B. 11
C. 2
D. -1.41
A: z = (18 − 26) / 4 = −8 / 4 = −2
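The same z-score calculation as a tiny Python function; the name `z_score` is our own.

```python
def z_score(x: float, mean: float, sd: float) -> float:
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd

print(z_score(18, mean=26, sd=4))   # -2.0
```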
The standard deviation is the square root of the
A. Variance
B. Coefficient of determination = r squared
C. Sum of squares = sum of squared deviances
D. Range = largest − smallest
A
Complete the following sentence: A large standard deviation (relative to the value of the mean itself)
A. Indicates the data points are distant from the mean (i.e., the mean is a poor fit of the data)
B. Indicates the data points are close to the mean
C. Indicates that the mean is a good fit of the data
D. Indicates that you should analyse the data with parametric tests
A
The probability is p = 0.80 that a patient with a certain disease will be successfully treated with a new medical treatment. Suppose that the treatment is used on 40 patients. What is the “expected value” of the number of patients who are successfully treated?
A. 32
B. 20
C. 8
D. 40
A: 80% of 40 is 32 (0.80 × 40 = 32)
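The expected value of a binomial count is n × p. This sketch computes it directly and then checks it with a seeded simulation of many hypothetical groups of 40 patients; the simulated average should land close to 32.

```python
import random

def expected_successes(n: int, p: float) -> float:
    """Expected value of a binomial count: n trials, success probability p."""
    return n * p

print(expected_successes(40, 0.80))   # 32.0

# Simulation check: treat 40 patients, 10,000 times over
random.seed(1)
trials = [sum(random.random() < 0.80 for _ in range(40)) for _ in range(10_000)]
avg = sum(trials) / len(trials)
print(avg)   # close to 32
```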
Imagine a test for a certain disease.
Suppose the probability of a positive test result is .95 if someone has the disease, but the probability is only .08 that someone has the disease if his or her test result was positive.
A patient receives a positive test, and the doctor tells him that he is very likely to have the disease. The doctor’s response is:
A. Confusion of the inverse
B. Law of small numbers
C. Gambler’s fallacy
D. Correct, because the test is 95% accurate when someone has the disease = incorrect, as the doctor confused P(positive | disease) with the inverse, P(disease | positive)
A
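Bayes' theorem shows how P(positive | disease) = .95 can coexist with P(disease | positive) ≈ .08. The question does not give the prevalence or false-positive rate, so the values below are hypothetical, chosen only to show that a rare disease plus a modest false-positive rate produces this pattern.

```python
def p_disease_given_positive(sensitivity: float, prevalence: float,
                             false_positive_rate: float) -> float:
    """Bayes' theorem: P(disease | positive test)."""
    true_pos = sensitivity * prevalence
    false_pos = false_positive_rate * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# Sensitivity .95 is from the question; prevalence and false-positive rate
# are hypothetical values used for illustration only.
p = p_disease_given_positive(sensitivity=0.95, prevalence=0.01,
                             false_positive_rate=0.11)
print(round(p, 2))   # 0.08
```

Most positive results come from the large group of healthy people, so a positive test alone says little about an individual.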
Which of these variables would be considered not to have met the assumptions of parametric tests based on the normal distribution?
(Hint: many statistical tests rely on data measured at the interval level)
A. gender
B. Reaction time
C. Temp
D. Heart rate
A
The test statistics we use to assess a linear model are usually _______ based on the normal distribution
(Hint: these tests are used when all of the assumptions of a normal distribution have been met)
A. Parametric
B. Non-parametric
C. Robust
D. Not
A
Which of the following is not an assumption of the general linear model?
A. Dependence
B. Additivity
C. Linearity
D. Normally distributed residuals
A: independence, not dependence, is an assumption of parametric tests
Looking at the table below, which of the following statements is the most accurate?
Hint: The further the values of skewness and kurtosis are from zero, the more likely it is that the data are not normally distributed
A. For the number of hours spent practising, there is not an issue of kurtosis
B. For level of musical skill, data are heavily negatively skewed
C. For the number of hours spent practising, there is an issue of kurtosis
D. For the number of hours spent practising, the data are fairly positively skewed
A - correct
B. Incorrect, as the value of skewness is –0.079, which suggests that the data are only very slightly negatively skewed because the value is close to zero
C. Incorrect, as the value of kurtosis is 0.098, which is fairly close to zero, suggesting that kurtosis was not a problem for these data
D. Incorrect, as the value of skewness for the number of hours spent practising is –0.322, suggesting that the data are only slightly negatively skewed
Diagram of skewness
In SPSS output, if the value of skewness is between -1 and 1, then
all good
In SPSS output, if the value of skewness is below -1 or above 1, then
the data are skewed
In SPSS output, if the value of skewness is below -1, then
the data are negatively skewed
In SPSS output, if the value of skewness is above 1, then
the data are positively skewed
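A sketch of where a skewness value comes from, plus the ±1 rule of thumb above. It assumes SPSS reports the sample-adjusted Fisher-Pearson coefficient (often written G1); that is our understanding, so compare against your own output before relying on it. The function names are our own.

```python
import math

def skewness(data: list[float]) -> float:
    """Sample skewness, adjusted Fisher-Pearson formula (G1)."""
    n = len(data)
    m = sum(data) / n
    m2 = sum((x - m) ** 2 for x in data) / n   # second central moment
    m3 = sum((x - m) ** 3 for x in data) / n   # third central moment
    g1 = m3 / m2 ** 1.5
    return math.sqrt(n * (n - 1)) / (n - 2) * g1

def describe_skew(value: float) -> str:
    """Rule of thumb from these notes: cut-offs at -1 and 1."""
    if value < -1:
        return "negatively skewed"
    if value > 1:
        return "positively skewed"
    return "all good"

print(describe_skew(skewness([1, 2, 3, 4, 5])))     # all good
print(describe_skew(skewness([1, 1, 1, 2, 10])))    # positively skewed
```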
Diagram of leptokurtic, platykurtic and mesokurtic (normal) distributions
What does kurtosis tell you?
how much our data lies around the ends/tails of our histogram which helps us to identify when outliers may be present in the data.
A distribution with positive kurtosis, so much of the data is in the tails, will be
pointy or leptokurtic
A distribution with negative kurtosis, so the data lies more in the middle, will be more
sloped or platykurtic
Kurtosis is the sharpness of the
peak of a frequency-distribution curve
If our kurtosis value is 0, then the result is consistent with a
normal (mesokurtic) distribution
If the kurtosis value in SPSS is between -2 and 2, then
all good! = normal distribution
If the kurtosis value in SPSS is less than -2, then
platykurtic
If the kurtosis value in SPSS is greater than 2, then
leptokurtic
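A matching sketch for kurtosis with the ±2 rule of thumb. It assumes SPSS reports sample excess kurtosis (often written G2, where 0 corresponds to a normal distribution); treat that as an assumption and check against your own output. Function names and data are our own.

```python
def excess_kurtosis(data: list[float]) -> float:
    """Sample excess kurtosis (G2); 0 corresponds to a normal distribution."""
    n = len(data)
    m = sum(data) / n
    m2 = sum((x - m) ** 2 for x in data) / n   # second central moment
    m4 = sum((x - m) ** 4 for x in data) / n   # fourth central moment
    g2 = m4 / m2 ** 2 - 3
    return ((n + 1) * g2 + 6) * (n - 1) / ((n - 2) * (n - 3))

def describe_kurtosis(value: float) -> str:
    """Rule of thumb from these notes: cut-offs at -2 and 2."""
    if value < -2:
        return "platykurtic"
    if value > 2:
        return "leptokurtic"
    return "all good"

print(round(excess_kurtosis([1, 2, 3, 4, 5]), 2))             # -1.2
print(describe_kurtosis(excess_kurtosis([-10] + [0] * 8 + [10])))   # leptokurtic
```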
Are we good for skewness and kurtosis in this SPSS output?
Good, because the skewness is between -1 and 1 and the kurtosis values are between -2 and 2.
Are we good for skewness and kurtosis in this SPSS output?
Bad, because although the skewness is between -1 and 1, we have a problem with kurtosis: its value of 2.68 is larger than 2, outside the -2 to 2 range
Correlational research doesn’t allow us to rule out the presence of a
third variable = confounding variable
e.g., if we find that drownings and ice cream sales are correlated and conclude that ice cream sales cause drowning, are we correct? Not necessarily: both may be driven by the weather
The tertium quid is a variable that you may not have considered that could be
influencing your results, e.g., the weather in the ice cream and drowning example
How to rule out tertium quid? - (2)
Use of RCTs.
Randomized Controlled Trials allow us to even out the confounding variables between the groups
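A seeded sketch of why random allocation evens out confounders. Each hypothetical participant has an unmeasured characteristic (e.g. baseline fitness). Shuffling before splitting gives groups with near-identical averages; letting the characteristic decide group membership (as self-selection might) produces a large imbalance.

```python
import random
from statistics import mean

random.seed(7)
# Hypothetical unmeasured characteristic for 2000 participants
participants = [random.gauss(50, 10) for _ in range(2000)]

# Random allocation
shuffled = participants[:]
random.shuffle(shuffled)
treatment, control = shuffled[:1000], shuffled[1000:]
print(f"random: treatment {mean(treatment):.1f} vs control {mean(control):.1f}")

# Non-random allocation: the characteristic itself decides the groups
ordered = sorted(participants)
print(f"self-selected gap: {mean(ordered[1000:]) - mean(ordered[:1000]):.1f}")
```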
Correlation does not mean
causation
To infer causation,
we need to actively manipulate the variable we are interested in, and control against a group (condition) where this variable was not manipulated.
Correlation does not mean causation because, according to Andy,
“causality between two variables cannot be assumed because there may be other measured or unmeasured variables affecting the results”
Aside from checking the kurtosis and skewness of the data, also check whether it has
linearity and, less commonly, additivity
Additivity refers to the combined
effect of many predictors
What does this diagram show in terms of additivity/linearity? - (5)
There is a linear effect when the data increase at a steady rate, like the graph on the left.
Your cost increases steadily as the number of chocolate bars increases.
The graph on the right shows a non-linear effect when there is not this steady increase rather there is a sharp change in your data.
So you might feel ok if you eat a few chocolate bars but after that the risk of you having a stomach ache increases quite rapidly the more chocolates you eat.
This effect is super important to check or your statistical analysis will be wrong even if your other assumptions are correct because a lot of statistical tests are based on linear models.
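One informal way to see the difference is to look at the step-by-step changes. The numbers here are hypothetical, loosely echoing the chocolate example: a linear relationship changes by the same amount each step, while a non-linear one changes by a growing amount.

```python
# Hypothetical values echoing the chocolate bar example
price_per_bar = 0.80
bars = [1, 2, 3, 4, 5, 6]

cost = [price_per_bar * b for b in bars]     # linear: same rise each step
discomfort = [0.02 * b ** 3 for b in bars]   # non-linear: rise accelerates

def step_changes(values):
    """Difference between each value and the one before it."""
    return [round(b - a, 2) for a, b in zip(values, values[1:])]

print(step_changes(cost))         # [0.8, 0.8, 0.8, 0.8, 0.8]
print(step_changes(discomfort))
```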
Discrepancy between measurement and actual value in population is .. and not..
measurement error and NOT variance
Measurement error can happen across all psychological experiments from.. to ..
recording instrument failure to human error
What are the 2 types of measurement errors? - (2)
- Systematic
- Random
What is systematic measurement error?
Error that is predictable, typically constant or proportional to the true value, and always affects the results of an experiment in the same direction
Example of systematic measurement error
For example, if I know I am 5ft2 but when I get measured I’m told I’m 6ft, this is a systematic error and fairly easy to identify; these usually happen when there is a problem with your experiment
What is random measurement error?
measurable values being inconsistent when repeated measures of a constant attribute or quantity are taken.
Example of random measurement error
For example, my height is 5ft2 when I measure it in the morning but it’s 5ft when I measure myself in the evening. Because my measurements were taken at different times, there is some variability (for those of you who believe you shrink throughout the day).
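A seeded sketch contrasting the two error types, using hypothetical numbers based on the height examples (5ft2 ≈ 157.5 cm; a faulty instrument reading 6ft adds about 25.4 cm). Averaging many readings cancels random error but does nothing to remove systematic error.

```python
import random
from statistics import mean

random.seed(3)
true_height = 157.5   # hypothetical true height in cm (about 5ft2)

# Systematic error: a faulty instrument that always adds 25.4 cm
systematic = [true_height + 25.4 for _ in range(100)]
# Random error: readings scatter unpredictably around the true value
random_err = [true_height + random.gauss(0, 1.5) for _ in range(100)]

print(f"systematic: mean off by {mean(systematic) - true_height:.1f} cm")
print(f"random:     mean off by {mean(random_err) - true_height:.1f} cm")
```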
What is variance?
Average squared deviation of each number from its mean.
Variability is an inherent part of
things being measured and of the measurement process
Diagram of variance formula
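A worked sketch of the variance on hypothetical scores. The sum of squared deviations divided by N gives the average squared deviation described above; dividing by N − 1 gives the usual sample estimate, and its square root is the standard deviation. Python's `statistics` module gives the same answers.

```python
import math
from statistics import pvariance, variance, stdev

scores = [2, 4, 4, 4, 5, 5, 7, 9]          # hypothetical scores
n = len(scores)
m = sum(scores) / n
ss = sum((x - m) ** 2 for x in scores)     # sum of squared deviations

pop_var = ss / n              # average squared deviation from the mean
sample_var = ss / (n - 1)     # sample estimate of the population variance
sd = math.sqrt(sample_var)    # standard deviation = square root of variance

print(ss, pop_var)                          # 32.0 4.0
print(round(sample_var, 3), round(sd, 3))   # 4.571 2.138
print(pvariance(scores), variance(scores), stdev(scores))
```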
The central limit theorem - (2)
states that the sampling distribution of the mean approaches a normal distribution as the sample size increases, whatever the shape of the population distribution. As a rule of thumb, this holds well for sample sizes over 30.
Relatedly, as sample size increases, the sample mean and standard deviation tend to be closer in value to the population mean μ and standard deviation σ.
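A seeded simulation of the central limit theorem. The population is an exponential distribution with mean 1 and SD 1, chosen only because it is clearly skewed. Drawing many samples and taking their means shows the means centring on 1 and their spread (the standard error) shrinking roughly as 1/√n.

```python
import random
from statistics import mean, stdev

random.seed(0)

def sample_means(sample_size: int, n_samples: int = 2000) -> list[float]:
    """Means of repeated samples from a skewed (exponential) population."""
    means = []
    for _ in range(n_samples):
        sample = [random.expovariate(1.0) for _ in range(sample_size)]
        means.append(sum(sample) / sample_size)
    return means

for size in (5, 30, 100):
    ms = sample_means(size)
    print(f"n = {size:3d}: mean of means = {mean(ms):.2f}, SD of means = {stdev(ms):.3f}")
```

Plotting a histogram of `sample_means(30)` would show a far more symmetric shape than the population itself.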
What does a histogram show? - (2)
Frequency of scores
The distribution of the data, including skewness and kurtosis
What does a boxplot show? - (2)
To identify outliers
Shows median rather than mean (good for non-normally distributed data)
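A sketch of the numbers behind a boxplot: median, quartiles, interquartile range and the common 1.5 × IQR rule for flagging outliers. The data are hypothetical, and `statistics.quantiles` uses its default interpolation method; SPSS may compute quartiles slightly differently, so values can differ a little.

```python
from statistics import median, quantiles

scores = [12, 15, 17, 18, 19, 20, 21, 22, 24, 60]   # hypothetical data
q1, q2, q3 = quantiles(scores, n=4)                  # quartiles
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr           # 1.5 x IQR fences
outliers = [x for x in scores if x < low or x > high]

print(median(scores), iqr, outliers)   # 19.5 6.0 [60]
```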
What are line graphs?
simply bar charts with lines instead of bars
Bar charts are a good way to display
means (and standard errors)
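The standard error of the mean is the sample standard deviation divided by √N. A minimal sketch on hypothetical scores; the helper name `standard_error` is our own.

```python
import math
from statistics import mean, stdev

def standard_error(data: list[float]) -> float:
    """Standard error of the mean: sample SD divided by the square root of N."""
    return stdev(data) / math.sqrt(len(data))

group = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical scores
print(f"M = {mean(group):.2f}, SE = {standard_error(group):.2f}")   # M = 5.00, SE = 0.76
```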
What does a scatterplot illustrate? - (2)
a relationship between two variables, e.g. correlation or regression
Only use regression lines for regressions!
What are matrix scatterplots? - (2)
Particular kind of scatterplot that can be used instead of the 3-D scatterplot
clearer to read
Using the data provided, how would you summarise the skew?
A. The data has an issue with positive skew
B. The data has an issue with negative skew
C. The data is normally distributed
B
What is the median number of bullets shot at a partner by females?
67.00
What descriptive statistic does the red arrow represent?
A. Inter quartile range
B. Median
C. Mean
D. Range
A
What are the mean for males and the SD for females? - (2)
Males M = 27.29
Females SD = 12.20
What are the respective standard errors of the mean for females and males?
3.26 & 3.42