[L2] Descriptive Statistics Flashcards
Used with ordinal data or with non-normal distributions.
INTER-QUARTILE-RANGE
A ___ is drawn for each category, where the height of the
bar represents the frequency or number of members of
that category.
bar
indicates the
percentage of scores that fall below the upper limit of
each interval
Cumulative Percentage Distribution
How to compute the Median
- Scores should be placed in ascending order of size, from
the smallest to the largest score.
- When there is an odd number of scores in the
distribution, halve the number and take the next whole
number up; the score at that position is the median.
- If there is an even number of scores, the median is the mean of the two middle scores.
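The steps above can be sketched in Python. This is a minimal illustration rather than a definitive implementation; the function name `median` and the example scores are made up for the sketch.

```python
def median(scores):
    """Median computed with the odd/even rule described on the card."""
    ordered = sorted(scores)  # ascending order, smallest to largest
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty list is undefined")
    if n % 2 == 1:
        # Odd n: halve n and take the next whole number up.
        # That gives the 1-based position of the middle score.
        position = n // 2 + 1
        return ordered[position - 1]
    # Even n: mean of the two middle scores.
    upper = n // 2
    return (ordered[upper - 1] + ordered[upper]) / 2


print(median([7, 2, 17, 5, 9]))  # 7
print(median([2, 5, 7, 17]))     # 6.0
```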
Instead of using bars, a ____ is plotted over the midpoint
of each interval at a height corresponding to the
frequency of the interval. Points are joined by a ___.
point; straight line
To find the range we find the ____ value (2) and the
____ value (17).
lowest; highest
The computer defines that point as an ____.
outlier
- Called the ___ mean.
- Calculated by adding up all the scores and dividing by the
number of individual scores.
arithmetic; MEAN
If outliers cannot be eliminated and you are convinced
that you have a genuine measurement, then you have a
___.
dilemma
When the distribution is ____ (i.e. has one mode) and
____, then the mode, median, and mean will have
very similar values.
unimodal, symmetrical
It is the distance between the highest score and the lowest
score.
range
_____ most commonly occurs when we
ask questions to measure the range of some
variable, and the questions are all too easy, or too low
down the scale.
Ceiling Effect
(e.g., histograms, bar charts, etc.) – used to present
the pattern in the data.
Charts
small sample size, not normally distributed
nonparametric test
Loosely known as the average. In statistical description,
though, we have to be more precise about just ___ we mean.
what sort of average
____ should not overwhelm the reader who is
trying to see what is going on.
Data presentation
But as long as the assumptions are __ to any
great extent, we will be OK.
not violated
The downfall of the mean is that it is affected by ___
skew and outliers.
The median is ____ than the mean to extreme
scores.
less sensitive
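A small illustration of this difference, as a sketch only. The scores are made-up values (not from the source), and Python's standard `statistics` module is assumed for the mean and median.

```python
import statistics

# Hypothetical scores, invented for illustration.
scores = [2, 5, 7, 9, 17]
# The same scores with the highest one replaced by an extreme value.
with_extreme = [2, 5, 7, 9, 170]

print(statistics.mean(scores), statistics.median(scores))
print(statistics.mean(with_extreme), statistics.median(with_extreme))
```

In this example the mean moves from 8 to 38.6, while the median stays at 7.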
___ occurs when only a few of the subjects are
strong enough to get off the floor.
Floor Effect
when data are measured on an ____, it is tricky
and difficult to decide whether to use the mean, median,
or even the mode
ordinal scale
The ___ then extend from the box to the highest
and lowest points – unless this would mean that the
length of the whisker would be more than 1.5 times the
length of the box.
whiskers
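The whisker rule can be sketched as follows. This is an illustration under stated assumptions: the function name `whisker_ends` and the scores are hypothetical, and the quartiles come from `statistics.quantiles` with `method="inclusive"`, which is only one of several quartile conventions.

```python
import statistics


def whisker_ends(scores):
    """Return (lower whisker end, upper whisker end, points beyond the whiskers)."""
    # Quartile convention is an assumption; other methods give other values.
    q1, _, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    box_length = q3 - q1
    # A whisker may not be longer than 1.5 times the length of the box.
    lower_limit = q1 - 1.5 * box_length
    upper_limit = q3 + 1.5 * box_length
    inside = [s for s in scores if lower_limit <= s <= upper_limit]
    beyond = [s for s in scores if s < lower_limit or s > upper_limit]
    # Each whisker stops at the furthest point that is still within the limit.
    return min(inside), max(inside), beyond


# Hypothetical scores: 17 lies more than 1.5 box lengths above the box.
print(whisker_ends([2, 5, 7, 9, 17]))  # (2, 9, [17])
```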
There are ____ that can be used to
draw a normal distribution. These equations can be used
in statistical tests.
mathematical equations
Small number of data points that lie outside the
distribution when the distribution is approximately
normal.
OUTLIERS
The first thing to describe is the ___, to
show the kinds of numbers that we have.
distribution of the data
When the distribution is skewed, the ___ have
the effect of pulling the mean away from the true value.
skewed values
The Normal Distribution
* Also known as the ___
Gaussian Distribution
The central tendency does not mean a lot without a
____
measure of dispersion or spread.
It is very hard to interpret a measure of central tendency
without also having a ___
measure of dispersion
indicates the
number of scores that fall below the upper limit of each
interval.
Cumulative Frequency Distribution –
If median is used as a measure of central tendency, the
IQR is probably used as a ___
measure of dispersion.
Some statisticians would argue that things like ___ scales can only be considered to be
ordinal data.
personality measures and attitude
It is the distance between the upper and lower quartiles.
INTER-QUARTILE-RANGE
___ has some serious implications for some types of
data analysis.
Skewness
When deciding which to use, take into account the
___
distribution of the scores.
___ Effects are common in many measures in
Psychology.
Floor
As long as the distribution is ____ distribution, it will not matter too much.
close to a normal
frequency distributions of Nominal or
Ordinal Data are customarily plotted using a bar graph.
Bar Graph –
– when there are too many people, too
far away, in the tails of the distribution.
Negative Kurtosis
Σ - Greek letter called “____”, meaning “summation of”,
“add up”, or “take the sum of”.
Sigma
Under most circumstances, of the measures used for
central tendency, the mean is ___
least subject to sampling variation.
- A non-symmetrical distribution is said to be skewed.
SKEW
indicates the
proportion of the total number of scores in each interval.
Relative Frequency Distribution –
The shape is a pattern that forms when a histogram is
plotted and is known as the ___.
distribution
In a skewed distribution the mean, median, and mode are
____
not the same.
sample standard deviation
population standard deviation
- s; σ
First, because it is not symmetrical – this is called __
SKEW
____ helps a researcher
understand the data that he has, while ____ help him explain to other people what is
happening to his data.
Exploratory data analysis (EDA); descriptive statistics
In which case it extends to the furthest point which
means it does not exceed ____
1.5 times the length of the box.
The range suffers from one huge problem, in that it is
massively ___ that occur
affected by any outliers
In some way refers to the most central value of a data set
with different interpretations of the sense of “___”.
central
___ differ between statisticians. There is a very fuzzy line between what could definitely
be called ____
Opinions; ordinal and interval.
The symbol for the mean, x̄, is pronounced ___.
x-bar
If our assumptions are ____, then some of the
things we say (results of analysis) will be wrong.
wrong (violated)
The ___ is equal to the sum of the mean of each
group times the number of scores in the group, divided by
the sum of the number of scores in each group.
overall mean
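The overall-mean rule can be written as a short function. This is a minimal sketch; the name `overall_mean` and the example group means and sizes are hypothetical.

```python
def overall_mean(group_means, group_sizes):
    """Overall mean: sum of (group mean x group size) divided by total size."""
    if len(group_means) != len(group_sizes):
        raise ValueError("need one size per group mean")
    total_n = sum(group_sizes)
    if total_n == 0:
        raise ValueError("there must be at least one score")
    weighted_sum = sum(m * n for m, n in zip(group_means, group_sizes))
    return weighted_sum / total_n


# Hypothetical groups: a mean of 10 from 3 scores and a mean of 20 from 1 score.
print(overall_mean([10.0, 20.0], [3, 1]))  # 12.5
```

Note that the result (12.5) differs from the simple average of the two group means (15), because the larger group carries more weight.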
When data are grouped –___
some information is lost
The ___ is often the best average, for a couple of
reasons.
* Unlike the median, it uses all of the information available.
Every number in the data set has an influence on the
mean.
The mean also has useful distributional properties which
the median does not have.
mean
___ refers to the number of people in the sample.
N
Ability to test population parameter
parametric tests
- Second most common measure of central tendency.
MEDIAN
Different ways of Describing the Distribution
- Frequency Table
- Charts
The mean is sensitive to the ___ of all the scores
in the distribution.
exact value
Generally, the ___ we present our data, the
____ them, and the more space they take
up.
more accurately; less we summarize
____ is just a “posh” way of saying average
Central tendency
Unlike the range, the IQR does not go to the ends of the
scales, and is therefore not affected by ____.
outliers
If our assumptions about data are wrong, but not too
wrong, we need to be aware that our statistics will not be
____
perfectly correct.
That is why we use the unbiased standard deviation, or
the ___
population standard deviation.
A ___ is symmetrical and ____. It
curves outwards at the top and then inwards nearer the
bottom, the tails getting thinner and thinner.
normal distribution; bell shaped
– when there are insufficient people in
the tail (ends) of the scores to make the distribution
normal.
Positive Kurtosis
Histograms are very important in data analysis, because
they allow us to examine the ___ of the distribution of a
variable.
shape
The ____ of a set of scores is just the square of the
Standard Deviation.
variance
The sum of the deviations about the mean equals
____.
(Mean is the balance point of the distribution)
zero
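A quick check of this property, using made-up scores chosen so that the arithmetic is exact. With other data, floating-point rounding may leave a tiny non-zero remainder.

```python
# Hypothetical scores, invented for illustration.
scores = [2, 5, 7, 9, 17]
mean = sum(scores) / len(scores)  # 40 / 5 = 8.0

# Deviation of each score from the mean.
deviations = [x - mean for x in scores]

print(deviations)       # [-6.0, -3.0, -1.0, 1.0, 9.0]
print(sum(deviations))  # 0.0
```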
A lot of
____ depend on the data being from a normal
distribution.
tests
The mean of all the means does match the
population mean – hence the mean is an ____
unbiased estimator.
Best measure of central tendency for CATEGORICAL
data (although it is not even very useful for that). Rarely used in research.
mode
Data will ___ form a perfect normal distribution.
never
___ presents the score values and
their frequency of occurrence.
Frequency distribution –
___, or _____ – one of the most
useful graphical techniques for presenting and summarizing
data.
Boxplot; box and whisker plot
The 1st quartile happens one quarter of the way up the
data, which is also the ___; 2nd =___; 3rd =__
25th centile. 50th centile. 75th centile
Assumptions about data when calculating and
Interpreting the Mean
- The distribution is symmetrical. This means there is not
much SKEW, and no OUTLIERS on one side.
- The data are measured at the INTERVAL or RATIO
level.
Skew often happens because of ___
a Floor Effect or a Ceiling Effect
The ___ is the simplest measure of dispersion
range
The range is only rarely used in ___
Psychological Research.
A large number of ____make the assumption
that the data form a normal distribution
statistical tests
– also used to represent interval or
ratio data.
Frequency polygon
Cumulative Percentage –
(cum f / N) × 100
Range can be expressed as a ___ number, or sometimes it is
expressed as the ____
single; highest and lowest scores.
The separation of the mean, median and mode in the
direction of the skew is a _____ in a skewed
distribution.
consistent effect
property of median
Under usual circumstances, the median is more subject
to sampling variability than the mean but less subject to
sampling variability than the mode.
In grouping data – ___
how wide should the interval be?
- Much trickier than Skew but is usually less of a problem.
- Occurs when there are either too many people at the
extremes of the scale, or not enough people at the
extremes
KURTOSIS
It is also not affected by SKEW and KURTOSIS to any
great extent.
IQR
Classes of Kurtosis
Leptokurtic (thin)
Mesokurtic
Platykurtic (flat)
Cumulative Frequency –
frequency of interval +
frequencies of all class intervals below it.
When the distribution is skewed, the ___ is a more
representative value of central tendency.
median
describe/summarize the data a
researcher has.
Descriptive Statistics –
The mean is very sensitive to ____ scores.
extreme
The ___ the interval, the
___ information is lost.
wider, more
Second, because it is not the characteristic bell shape –
this is called ___.
KURTOSIS
– IQR divided by 2.
Semi-inter-quartile range
Constructing a frequency distribution of grouped scores
- Find the range of the scores.
- Determine the width of each class interval (i).
- List the limits of each class interval, placing the interval
containing the lowest score value at the bottom.
- Tally the raw scores into the appropriate class intervals.
- Add the tallies for each interval to obtain the interval
frequency.
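The construction steps can be sketched in Python, together with the relative frequency (f/N), cumulative frequency, and cumulative percentage columns described on other cards. This is an illustration under assumptions: the function name, the whole-number scores, and the choice to start the lowest interval at a multiple of the width are not from the source.

```python
def grouped_frequency_table(scores, width):
    """Grouped frequency table for whole-number scores, lowest interval first."""
    low, high = min(scores), max(scores)  # the range runs from low to high
    n = len(scores)
    # Assumption: the lowest interval starts at a multiple of the width.
    lower = (low // width) * width
    rows = []
    cum_f = 0
    while lower <= high:
        upper = lower + width - 1  # interval limits for whole-number scores
        f = sum(1 for s in scores if lower <= s <= upper)  # tally
        cum_f += f  # this interval plus all intervals below it
        rows.append({
            "interval": (lower, upper),
            "f": f,
            "rel_f": f / n,
            "cum_f": cum_f,
            "cum_pct": 100 * cum_f / n,
        })
        lower += width
    return rows


# Hypothetical scores, invented for illustration.
scores = [2, 3, 5, 7, 8, 8, 11, 12, 14, 17]
table = grouped_frequency_table(scores, width=5)
# Print with the interval containing the lowest score at the bottom.
for row in reversed(table):
    print(row)
```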
Relative Frequency –
f/N
The sample standard deviations would, on average, be a
___
bit too low.
For statistics to be correct, we need to make some
___.
assumptions
It is also like the mean in that it needs to make some
assumptions about the shape of the distribution.
* To calculate the SD, we must assume that we have a
_____
normal distribution.
Don’t draw a bar chart for ___
continuous measures.
- The most frequent score in the distribution or the most common observation among a group of scores.
MODE
Properties of the Standard Deviation
- The SD gives us a measure of dispersion relative to the
mean.
- The SD is sensitive to each score in the distribution.
- Like the Mean, the SD is stable with regard to
sampling fluctuations.
When presented in a table, the score values are listed in
____ with the lowest score value usually at the
bottom of the table
rank order
Distributions can be of wrong shape for two reasons.
- First, because it is not symmetrical
- Second, because it is not the characteristic bell shape
___ causes negative skew and is much less
common in Psychology.
Ceiling Effect
There are ___in a variable – they are the three
values that divide the variable into four groups.
three quartiles
___ – the curve rises slowly and then
decreases rapidly.
Negative Skew
___ – the curve rises rapidly and then drops off
slowly.
Positive Skew
What to do when there are outliers.
- See if you have made an error.
- Check if any measurement that you took was carried
out correctly.
Usually easy to spot in histograms, but deciding what to do with
them can be much trickier.
OUTLIERS
How to find the IQR
- Scores are placed in rank order and counted.
- The half-way point is the median.
- The IQR is the distance between the quarter and three-quarters
distance points.
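The steps above can be sketched with Python's standard `statistics` module. Textbooks and software locate the quarter and three-quarter points in slightly different ways; `method="inclusive"` is an assumption made for this sketch, and the function name and scores are hypothetical.

```python
import statistics


def inter_quartile_range(scores):
    """Return (lower quartile, median, upper quartile, IQR)."""
    # quantiles() sorts the scores and returns the three quartile cut points.
    q1, q2, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    return q1, q2, q3, q3 - q1


# Hypothetical scores, invented for illustration.
q1, q2, q3, iqr = inter_quartile_range([17, 2, 9, 5, 7])
print(q1, q2, q3)  # 5.0 7.0 9.0
print(iqr)         # 4.0
print(iqr / 2)     # 2.0, the semi-inter-quartile range
```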
In a frequency distribution mode is very easy to see because it
is the ___of the distribution
highest point
___ is like the mean, in that it
takes all of the values in the dataset into account when it
is calculated.
The Standard Deviation
____ are plotted on the horizontal axis such that
each class bar begins and terminates at the real limits of
the interval.
Class intervals
The majority would argue that these can be considered to be
___ and therefore it is OK to use the ___.
interval data; mean
It is the middle score in a set of scores.
* Used when the mean is not valid, which might be because
the data are not symmetrically or normally distributed, or
because the data are measured at an ordinal level.
MEDIAN
The median in a boxplot is represented with a ___
thick line.
The variance is not used much in descriptive statistics
because it gives us squared units of measurement.
However, it is used quite frequently in ___
inferential statistics.
Don’t refer to the Normal Distribution as any of the
following: ____
usual, regular, standard, or even distribution.
– therefore means “add up all the scores in x”.
Σx
Options
- Eliminate the point and carry on with the analysis.
- If you keep the data point then it may well have a large
effect on your analysis and you will analyze your data
badly.
The sample standard deviation suffers from a problem –
it is a biased estimator of the _____
population standard deviation.
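A minimal sketch of the two calculations, assuming the usual correction of dividing by N − 1 instead of N. The function name `standard_deviation`, its `unbiased` flag, and the scores are hypothetical. In the cards' terms, the divide-by-N result is the sample standard deviation (s) and the divide-by-(N − 1) result is the estimate of the population standard deviation (σ).

```python
import math


def standard_deviation(scores, unbiased=True):
    """SD using N - 1 as the divisor when unbiased=True, otherwise N."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two scores")
    mean = sum(scores) / n
    sum_of_squares = sum((x - mean) ** 2 for x in scores)
    divisor = n - 1 if unbiased else n
    return math.sqrt(sum_of_squares / divisor)


# Hypothetical scores, invented for illustration.
scores = [2, 5, 7, 9, 17]
print(standard_deviation(scores, unbiased=False))  # divisor N, the smaller value
print(standard_deviation(scores, unbiased=True))   # divisor N - 1, the larger value
```

Dividing by N always gives the smaller of the two values, which matches the point that the sample standard deviation tends to be a bit too low.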
The sum of the squared deviations of all the scores
about their mean is a ___.
minimum
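A small check of this property, as a sketch with made-up scores. The helper name `sum_squared_deviations` is hypothetical. The sum of squared deviations is computed about the mean and about a few other values for comparison.

```python
def sum_squared_deviations(scores, centre):
    """Sum of squared deviations of the scores about a chosen centre."""
    return sum((x - centre) ** 2 for x in scores)


# Hypothetical scores, invented for illustration.
scores = [2, 5, 7, 9, 17]
mean = sum(scores) / len(scores)  # 8.0

about_mean = sum_squared_deviations(scores, mean)
print(about_mean)  # 128.0
for other in (6, 7, 9, 10):
    # Every other centre gives a larger sum than the mean does.
    print(other, sum_squared_deviations(scores, other))
```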
A very large number of ____ are
normally distributed.
naturally occurring variables
While our assumptions will usually be broadly correct,
they will never be ___ correct.
exactly
__ – used to represent frequency distributions
composed of interval or ratio data. Bar is drawn for each
class interval.
Histogram
It is most commonly used to describe some aspect of a
sample which ___
does not need to be summarized with any
degree of accuracy.