Week 4 Flashcards
What’s the importance of visuals in statistics?
-two data sets may have a similar mean yet a different spread, which we can’t see by looking at the numbers alone
-on a scatterplot, dots closer to the line of best fit show a stronger relationship between the variables
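A minimal Python sketch of the first point, using made-up numbers (`group_a` and `group_b` are hypothetical). Both groups have the same mean. Only the standard deviation, or a plot, reveals the difference in spread.

```python
from statistics import mean, stdev

# Hypothetical example data: both groups have the same mean (50)
group_a = [48, 49, 50, 51, 52]   # tightly clustered around the mean
group_b = [30, 40, 50, 60, 70]   # widely spread around the same mean

for name, data in [("A", group_a), ("B", group_b)]:
    # The means match, but the standard deviations differ a lot
    print(f"Group {name}: mean = {mean(data)}, SD = {stdev(data):.2f}")
```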
What are histograms good at?
showing the distribution of data where you can have:
-symmetric data
-skewed right (positive skew: long tail on the right, like the toes on a right foot)
-skewed left (negative skew: long tail on the left, like the toes on a left foot)
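To show the shape a histogram reveals, here is a rough text-only histogram in Python. The helper `text_histogram` and the scores are hypothetical. It assumes whole-number data and equal-width bins. A real histogram would come from SPSS or a plotting library.

```python
from collections import Counter

def text_histogram(data, bin_width):
    """Group whole numbers into equal-width bins and draw one row of '*' per bin."""
    # Key each value by the lower edge of its bin
    counts = Counter((x // bin_width) * bin_width for x in data)
    # Include empty bins between the smallest and largest bin
    lows = range(min(counts), max(counts) + bin_width, bin_width)
    rows = []
    for low in lows:
        rows.append(f"{low:>3}-{low + bin_width - 1:<3} {'*' * counts[low]}")
    return rows

# Hypothetical right-skewed data: most values are low, a few trail off to the right
scores = [2, 3, 3, 4, 4, 4, 5, 5, 6, 8, 11, 15]
for row in text_histogram(scores, bin_width=3):
    print(row)
```

The tallest bar sits near the low end, with short bars trailing off to the right. This is the right (positive) skew pattern.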
How does a stem and leaf plot work?
put the tens digits (stems) in a column, then list the units/end digits (leaves) in order beside their stem
5 | 8
6 | 2 6 7 7 8
7 | 1 4 5 5 5   (key: 7 | 5 = 75)
We can compare two data sets using one (back-to-back) stem and leaf plot.
We can read off the mode (e.g., 75), the median and the mean.
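The plot above can be rebuilt in Python as a check. `stem_and_leaf` is a hypothetical helper that assumes two-digit whole numbers. The data list is simply the values read off the plot.

```python
from collections import defaultdict
from statistics import mean, median, multimode

def stem_and_leaf(data):
    """Return rows like '6 | 2 6 7 7 8' for two-digit whole numbers."""
    leaves = defaultdict(list)
    for x in sorted(data):
        leaves[x // 10].append(x % 10)   # tens digit = stem, units digit = leaf
    # Only stems that actually occur are listed
    return [f"{stem} | {' '.join(map(str, leaves[stem]))}" for stem in sorted(leaves)]

# The values read off the plot (5|8, 6|26778, 7|14555)
data = [58, 62, 66, 67, 67, 68, 71, 74, 75, 75, 75]
for row in stem_and_leaf(data):
    print(row)

print("mode:", multimode(data))       # 75 appears three times
print("median:", median(data))        # 6th of 11 ordered values
print("mean:", round(mean(data), 2))  # sum 758 / 11 values
```

With 11 values, the median is the 6th value in order (68). The mean is 758 / 11 ≈ 68.91.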
What are boxplots useful for?
-Boxplots are useful for showing medians, ranges, interquartile ranges (IQRs), skewness etc.
-We can also compare two data sets using side-by-side boxplots
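A boxplot is drawn from five numbers. This Python sketch (hypothetical helper `five_number_summary`) computes them using the "median of each half" rule for quartiles. SPSS may use a different quartile method, so its values can differ slightly. It assumes at least two values.

```python
from statistics import median

def five_number_summary(data):
    """Min, Q1, median, Q3, max: the values a boxplot is drawn from.

    Quartiles use the 'median of each half' rule (the overall median is
    left out of both halves when n is odd); other software may differ.
    """
    xs = sorted(data)
    n = len(xs)
    lower = xs[: n // 2]          # values below the median
    upper = xs[(n + 1) // 2 :]    # values above the median
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

# Hypothetical scores
scores = [58, 62, 66, 67, 67, 68, 71, 74, 75, 75, 75]
lo, q1, med, q3, hi = five_number_summary(scores)
print("range:", hi - lo, "IQR:", q3 - q1)
```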
How can you tell if boxplots are skewed?
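right/positively skewed = most data bunched at the lower end of the scale, with a long whisker trailing towards the upper end (median nearer the bottom of the box)
left/negatively skewed = most data bunched at the upper end of the scale, with a long whisker trailing towards the lower end (median nearer the top of the box)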
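A rough Python sketch of the rule above. Given a boxplot's five-number summary (minimum, Q1, median, Q3, maximum), it compares how far the plot stretches above and below the median. The function name and the simple "longer side" rule are illustrative assumptions. It is not a formal skewness test.

```python
def boxplot_skew_direction(summary):
    """Guess skew direction from (min, Q1, median, Q3, max) by comparing the two sides."""
    minimum, _, med, _, maximum = summary
    upper_side = maximum - med   # upper half of box + upper whisker
    lower_side = med - minimum   # lower half of box + lower whisker
    if upper_side > lower_side:
        return "right/positive skew (tail towards the upper end)"
    if lower_side > upper_side:
        return "left/negative skew (tail towards the lower end)"
    return "roughly symmetric"

# Hypothetical summaries
print(boxplot_skew_direction((1, 2, 3, 5, 20)))
print(boxplot_skew_direction((58, 66, 68, 75, 75)))
```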
What is a normal distribution like?
normally distributed data fits nicely under a symmetric, bell-shaped curve,
allowing us to use more powerful and accurate statistical tests that assume normality (parametric tests).
Name the two ways in which a distribution can deviate from normality
– Lack of symmetry (skewness)
– Pointiness (kurtosis)
What is an example of frequency distribution?
■ Histograms
■ They’re made up of individual frequency bars
■ Each bar gives the frequency of a given value e.g., we can count how many people have a healthy heart rate
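A frequency distribution is just a count per value. This Python sketch uses `collections.Counter` on hypothetical heart-rate data. The 60-100 bpm "healthy" range is an illustrative assumption, not a definition from the lecture.

```python
from collections import Counter

# Hypothetical resting heart rates (beats per minute)
heart_rates = [62, 70, 70, 72, 75, 75, 75, 80, 88, 102]

freq = Counter(heart_rates)          # value -> how many people had it
for value in sorted(freq):
    print(value, "#" * freq[value])  # one bar per value, length = frequency

# Count people in a range, assuming 60-100 bpm counts as "healthy" (illustrative)
healthy = sum(1 for hr in heart_rates if 60 <= hr <= 100)
print("healthy:", healthy, "of", len(heart_rates))
```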
What is an example of probability distribution?
■ Bell curves
■ They’re smooth curves, but can be divided into segments by SDs
■ The area under the curve between two values is the probability that a value falls in that range
■ E.g., we can work out the probability of a person having a healthy heart rate
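Python's `statistics.NormalDist` can compute areas under a normal curve. The mean (72 bpm), SD (10 bpm) and 60-100 bpm range below are hypothetical values chosen only for illustration.

```python
from statistics import NormalDist

# Hypothetical population: mean 72 bpm, SD 10 bpm (illustrative values only)
heart_rate = NormalDist(mu=72, sigma=10)

# Area under the curve between 60 and 100 = probability of a value in that range
p_healthy = heart_rate.cdf(100) - heart_rate.cdf(60)
print(f"P(60 <= HR <= 100) = {p_healthy:.3f}")

# Areas within 1 and 2 SDs of the mean hold for any normal distribution
std = NormalDist()  # mean 0, SD 1
print(f"within 1 SD: {std.cdf(1) - std.cdf(-1):.3f}")   # about 0.68
print(f"within 2 SD: {std.cdf(2) - std.cdf(-2):.3f}")   # about 0.95
```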
True or false: outliers have a bigger impact on smaller sized samples
True: one extreme value shifts the mean of a small sample much more than the mean of a large one
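A small Python sketch of why this is true, using hypothetical samples. The same outlier (70) is added to a sample of 5 scores and to a sample of 100 scores, and we compare how far the mean moves.

```python
from statistics import mean

def shift_from_outlier(sample, outlier):
    """How far the mean moves when one extreme value is added to the sample."""
    return mean(sample + [outlier]) - mean(sample)

small = [10] * 5      # hypothetical sample of 5 identical scores
large = [10] * 100    # hypothetical sample of 100 identical scores

print(shift_from_outlier(small, 70))   # mean moves from 10 to 20
print(shift_from_outlier(large, 70))   # mean moves from 10 to about 10.59
```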
Define skewness
■ Skewness is deviation from symmetry.
■ A big difference between the mean, median and mode (e.g., visible on a histogram) indicates skewed data.
■ Skewness means some extreme scores in the tail are pulling the mean towards that tail.
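A quick Python check with hypothetical scores. Two extreme values in the right tail pull the mean well above the median, which is the pattern described above.

```python
from statistics import mean, median

# Hypothetical right-skewed scores: two extreme values in the upper tail
scores = [20, 22, 23, 25, 25, 26, 28, 30, 95, 150]

print("mean:", mean(scores))      # pulled up by 95 and 150
print("median:", median(scores))  # stays with the bulk of the data
```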
Define kurtosis (i.e. pointiness)
■ Kurtosis is a measure of the tailedness of a distribution
■ Tailedness = How often outliers occur
■ Three types = Mesokurtic (AKA zero, AKA normal); Leptokurtic (AKA positive/thin); Platykurtic (AKA negative/flat)
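Kurtosis can be computed from the data's moments. This Python sketch uses the simple population formula for excess kurtosis, which is 0 for a normal distribution. SPSS reports a small-sample-adjusted version, so its numbers will differ somewhat. The `tolerance` cut-off in the hypothetical `kurtosis_type` helper is an illustrative choice.

```python
from statistics import fmean

def excess_kurtosis(data):
    """Population excess kurtosis: m4 / m2**2 - 3 (0 for a normal distribution)."""
    mu = fmean(data)
    m2 = fmean([(x - mu) ** 2 for x in data])   # average squared deviation
    m4 = fmean([(x - mu) ** 4 for x in data])   # average fourth-power deviation
    return m4 / m2 ** 2 - 3

def kurtosis_type(k, tolerance=0.5):
    """Label the shape; the tolerance band around 0 is an illustrative assumption."""
    if k > tolerance:
        return "leptokurtic"
    if k < -tolerance:
        return "platykurtic"
    return "mesokurtic"

flat = [1, 2, 3, 4, 5]                    # evenly spread, no tails
peaked = [0] * 8 + [10, -10]              # tall middle plus two extreme values
print(kurtosis_type(excess_kurtosis(flat)), kurtosis_type(excess_kurtosis(peaked)))
```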
What is Kurtosis: Leptokurtic
■ Positive kurtosis
■ High peak
■ Lepto = skinny (in the middle)
■ Fat tails (the tails stay higher above the x-axis): signifies either lots of outliers or occasional outliers that are very extreme
What is Kurtosis: Platykurtic
■ Negative kurtosis
■ Flatter distribution
■ Platy = Broad (in the middle)
■ Skinny tails: signifies few outliers, or outliers that are not very extreme
Why is the distribution so important?
■ Tells us which measures of central tendency/dispersion best represent our sample: normally distributed data = mean and standard deviation; skewed data = median and ranges (e.g., the interquartile range).
■ Also tells us which inferential statistics we should use.
How can you assess the distribution of data using SPSS/histograms?
-Perfectly normally distributed data has a skewness of 0.
-If the skewness statistic is more than twice its standard error (ignoring the sign), the data are likely skewed.
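The rule of thumb above can be sketched in Python. The formulas are the adjusted sample skewness (G1) and its standard error. They are assumed here to match what SPSS reports, so compare against real SPSS output before relying on them. The helper names are hypothetical.

```python
import math
from statistics import fmean

def skewness_and_se(data):
    """Adjusted sample skewness (G1) and its standard error; needs n >= 3."""
    n = len(data)
    mu = fmean(data)
    m2 = fmean([(x - mu) ** 2 for x in data])
    m3 = fmean([(x - mu) ** 3 for x in data])
    g1 = m3 / m2 ** 1.5                              # population skewness
    skew = math.sqrt(n * (n - 1)) / (n - 2) * g1      # small-sample adjustment
    se = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    return skew, se

def likely_skewed(data):
    """Rule of thumb: |skewness| more than twice its standard error."""
    skew, se = skewness_and_se(data)
    return abs(skew) > 2 * se

print(skewness_and_se([1, 2, 3, 4, 5]))            # symmetric: skewness 0
print(likely_skewed([1, 1, 1, 1, 2, 2, 3, 10, 20]))  # long right tail
```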
Define standard error
The standard deviation of the sampling distribution of the mean, estimated as the sample SD divided by √n. I.e., how far the mean of our sample is likely to be from the population mean.
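A short Python sketch of the definition (hypothetical helper `standard_error`): the sample SD divided by √n. A larger sample therefore gives a smaller standard error for the same spread.

```python
import math
from statistics import stdev

def standard_error(sample):
    """Standard error of the mean: sample SD divided by the square root of n."""
    return stdev(sample) / math.sqrt(len(sample))

small = [1, 3]          # hypothetical sample, n = 2
bigger = [1, 3] * 10    # same values repeated, n = 20
print(standard_error(small), standard_error(bigger))
```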
What are the rules in producing a table?
– Labelled and titled.
– Placed at the top of the most appropriate page.
– Font and size should be the same as the main text.
– Logical and easy to understand.
Define figures
all other visuals that are not tables, e.g., bar charts, scatter plots
What 2 types of bar charts can you have?
1. Simple (one bar per category, bars separate)
2. Clustered (bars for sub-groups placed together within each category)
Explain what error bars are
-a visual representation of the variability within your data (on bar charts, the lines extending from the top of each bar)
-Error bars hint at statistical significance
-two confidence intervals do not overlap = difference between the two parameters will be significant.
-two confidence intervals do overlap = difference between the two parameters can be significant or non-significant.
■ But – we have our p-values to tell us about statistical significance.
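A Python sketch of the overlap check for two groups. `ci95` and `overlap` are hypothetical helpers. `ci95` uses the normal critical value (about 1.96) for simplicity; for small samples a t critical value would give wider, more accurate intervals. The group data are made up.

```python
import math
from statistics import NormalDist, fmean, stdev

def ci95(sample):
    """Approximate 95% CI for the mean using the normal critical value (a simplification)."""
    z = NormalDist().inv_cdf(0.975)                  # about 1.96
    se = stdev(sample) / math.sqrt(len(sample))      # standard error of the mean
    m = fmean(sample)
    return m - z * se, m + z * se

def overlap(ci_a, ci_b):
    """True if the two intervals share any values."""
    return ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]

# Hypothetical groups
group_a = [5, 6, 7, 5, 6, 7, 6]
group_b = [9, 10, 11, 10, 9, 11, 10]
print(ci95(group_a), ci95(group_b), overlap(ci95(group_a), ci95(group_b)))
```

Remember the card's caveat: overlapping intervals do not settle the question either way, so the p-value is still needed.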
What’s the most common confidence interval used in error bars?
-95% confidence intervals.
-The 95% describes the method: if we repeated the study many times and calculated an interval each time, about 95% of those intervals would contain the true population value (e.g., the population mean).
■ E.g., out of 100 repeated samples, we would expect about 95 of the confidence intervals to contain the true value.
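The repeated-sampling meaning of "95%" can be checked by simulation. This Python sketch draws many hypothetical samples from a known normal population (mean 100, SD 15, n = 30, all assumed values). It builds a 95% interval around each sample mean using the known SD, then counts how often the interval contains the true mean.

```python
import math
import random
from statistics import NormalDist, fmean

def ci_coverage(true_mean=100, sigma=15, n=30, reps=2000, seed=1):
    """Fraction of 95% CIs (known-SD z-interval) that contain the true mean."""
    rng = random.Random(seed)                  # seeded for repeatable results
    z = NormalDist().inv_cdf(0.975)            # about 1.96
    half_width = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        m = fmean(sample)
        if m - half_width <= true_mean <= m + half_width:
            hits += 1
    return hits / reps

print(ci_coverage())
```

The result should be close to 0.95, though it varies a little with the random seed.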
When do the figures go in the appendix/results section?
Appendix = figures used to assess data (for you), e.g., stem and leaf plots, boxplots and histograms, which are used to assess the distribution (e.g., skewness).
Results = figures used to visually present the data analysis (for others), e.g., bar graphs and scatterplots, which are not used to assess data.
Summarise this lecture
■ Visual aids are useful tools to assess/explain some aspects of the data.
■ When assessing data, they can show measures of central tendency/measures of spread (put in the appendix of reports).
■ When explaining data, they can show differences or relationships in the data (in the results section of reports).