Descriptive Statistics Flashcards
describe/summarize the data a researcher has
descriptive statistics
helps a researcher understand the data that he has, while descriptive statistics help him explain to other people what is happening to his data
Exploratory data analysis (EDA)
The first thing to describe is the distribution of the data,
to show the kinds of numbers that we have.
describing data
Different ways of describing the distribution; used to present the pattern in the data.
- Frequency table
- Charts (e.g., histograms, bar charts, etc.)
frequency distributions of nominal or ordinal data are customarily plotted using a ______
bar graph
A ____ is drawn for each category, where the height of
each bar represents the frequency or number of members of
that category.
Bar
used to represent frequency distributions
composed of interval or ratio data. A bar is drawn for each
class interval.
- Class intervals are plotted on the horizontal axis such
that each class bar begins and terminates at the real
limits of the interval.
histogram
also used to represent interval or
ratio data.
Instead of using bars, a point is plotted over the midpoint
of each interval at a height corresponding to the
frequency of the interval. Points are joined by a straight
line.
frequency polygon
Don’t draw a bar chart for ___
Continuous measures
presents the score values and
their frequency of occurrence.
When presented in a table, the score values are listed in
rank order, with the lowest score value usually at the
bottom of the table.
Frequency distribution
in grouping data
how wide should interval be?
When data are grouped
some information is lost
The wider the interval,
the more information is lost.
Constructing a frequency distribution of grouped scores
- Find the range of the scores.
- Determine the width of each class interval (i).
- List the limits of each class interval, placing the interval
containing the lowest score value at the bottom.
- Tally the raw scores into the appropriate class intervals.
- Add the tallies for each interval to obtain the interval
frequency.
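The steps above can be sketched in a few lines of Python. This is only an illustration: the scores are made up, integer scores are assumed, and the helper name `grouped_frequencies` and the interval width of 5 are hypothetical choices rather than anything the cards prescribe.

```python
from collections import Counter

def grouped_frequencies(scores, width):
    """Tally integer scores into class intervals of the given width."""
    # Start the first interval at a multiple of the width at or below the lowest score.
    start = (min(scores) // width) * width
    # Tally: map each score to the index of the interval it falls in.
    counts = Counter((s - start) // width for s in scores)
    n_intervals = (max(scores) - start) // width + 1
    # One (lower limit, upper limit, frequency) row per interval, lowest interval first.
    return [
        (start + k * width, start + (k + 1) * width - 1, counts.get(k, 0))
        for k in range(n_intervals)
    ]

scores = [2, 3, 5, 7, 8, 8, 9, 11, 12, 14, 15, 17]  # hypothetical integer scores
table = grouped_frequencies(scores, width=5)

# Print with the lowest interval at the bottom, as the cards describe.
for low, high, f in reversed(table):
    print(f"{low:>2}-{high:<2}  f = {f}")
```

Choosing a wider interval gives fewer rows but hides more of the detail in the raw scores.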
indicates the
proportion of the total number of scores in each interval.
Relative Frequency Distribution
indicates the
number of scores that fall below the upper limit of each
interval.
Cumulative Frequency Distribution
indicates the
percentage of scores that fall below the upper limit of
each interval.
Cumulative Percentage Distribution
what is this formula?
f/N
Relative Frequency
frequency of interval + frequencies of all class intervals below it.
Cumulative Frequency
what is this formula?
(cum f / N) × 100
cumulative percentage
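A small sketch of the three formulas on these cards, assuming a made-up list of interval frequencies ordered from the lowest interval upward. The variable names are illustrative only.

```python
from itertools import accumulate

# Hypothetical interval frequencies, listed from the lowest interval upward.
freqs = [2, 5, 3, 2]
N = sum(freqs)  # total number of scores

relative = [f / N for f in freqs]        # relative frequency: f / N
cum_f = list(accumulate(freqs))          # cumulative frequency: this interval plus all below it
cum_pct = [c / N * 100 for c in cum_f]   # cumulative percentage: (cum f / N) x 100

for f, rel, cf, cp in zip(freqs, relative, cum_f, cum_pct):
    print(f"f={f}  rel f={rel:.2f}  cum f={cf}  cum %={cp:.1f}")
```

The relative frequencies sum to 1 and the top interval always has a cumulative percentage of 100.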
_____ are very important in data analysis, because
they allow us to examine the shape of the distribution of
a variable.
The shape is a pattern that forms when a _____ is
plotted and is known as the distribution.
histogram
the normal distribution also known as the
Gaussian Distribution
The _____ is symmetrical and bell-shaped. It
curves outwards at the top and then inwards nearer the
bottom, the tails getting thinner and thinner.
normal distribution
do the data ever form a perfect normal distribution?
never, but as long as the distribution is close to a normal
distribution, it will not matter too much.
A very ___ of naturally occurring variables are
normally distributed.
A _____ of statistical tests make the assumption
that the data form a normal distribution.
large number
don’t refer to the Normal Distribution as any of the
following:
usual, regular, standard, or even distribution.
Wrong Shape
Distributions can be the wrong shape for two reasons.
First, because the distribution is not symmetrical.
Second, because it is not the characteristic bell shape.
- SKEW
- KURTOSIS
A non-symmetrical distribution is said to be _____.
SKEW
the curve rises rapidly and then drops off
slowly.
positive skew
the curve rises slowly and then
decreases rapidly.
negative skew
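The direction of skew can be checked numerically. The sketch below uses the common moment-based skewness coefficient (third central moment divided by the second central moment raised to the power 1.5). That formula is an assumption brought in for illustration and is not given on the cards. The data are made up.

```python
def skewness(scores):
    """Moment-based skewness: m3 / m2 ** 1.5 (positive = long right tail)."""
    n = len(scores)
    m = sum(scores) / n
    m2 = sum((x - m) ** 2 for x in scores) / n  # second central moment
    m3 = sum((x - m) ** 3 for x in scores) / n  # third central moment
    return m3 / m2 ** 1.5

right_tailed = [1, 1, 2, 2, 3, 4, 9]          # hypothetical: rises quickly, drops off slowly
left_tailed = [10 - x for x in right_tailed]  # mirror image: rises slowly, drops off quickly

print(skewness(right_tailed))  # positive value: positive skew
print(skewness(left_tailed))   # negative value: negative skew
```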
Skewness has some serious implications for some types
of data analysis.
Skew often happens because of ____ or _____
floor effect or ceiling effect
occurs when only a few of the subjects are
strong enough to get off the floor.
floor effect
causes negative skew and is much less
common in Psychology.
Occurs most commonly when we
are trying to ask questions to measure the range of some
variable, and the questions are all too easy, or too low
down the scale.
ceiling effect
Much trickier than Skew but is usually less of a problem.
Occurs when there are either too many people at the
extremes of the scale, or not enough people at the
extremes.
kurtosis
when there are insufficient people in
the tail (ends) of the scores to make the distribution
normal.
negative kurtosis
when there are too many people,
too far away, in the tails of the distribution.
positive kurtosis
_____ is just a “posh” way of saying average.
In some way refers to the most central value of a data
set with different interpretations of the sense of
“central”.
Loosely known as the average. In statistical description,
though, we have to be more precise about just what sort
of average we mean.
central tendency
Small number of data points that lie outside the
distribution when the distribution is approximately
normal.
Usually easily spotted in histograms.
______ are easy to spot but deciding what to do with
them can be much trickier.
outliers
The mean is very sensitive to _____
extreme scores
Called the arithmetic mean.
Calculated by adding up all the scores and dividing by the
number of individual scores.
Equation: x̄ = ∑x / N
Mean
Under most circumstances, of the measures used for
central tendency, the mean is least subject to ______
sampling variation
For statistics to be correct, we need to make some _____
assumptions
The sum of the squared deviations of all the scores
about their mean is a ______
minimum
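This property can be checked numerically. The sketch below compares the sum of squared deviations about the mean with the same sum taken about a few nearby values. The scores and the helper name `sum_sq_dev` are made up for illustration.

```python
from statistics import mean

def sum_sq_dev(scores, center):
    """Sum of squared deviations of the scores about an arbitrary center."""
    return sum((x - center) ** 2 for x in scores)

scores = [2, 4, 4, 5, 10]  # hypothetical scores
m = mean(scores)

at_mean = sum_sq_dev(scores, m)
# Any center other than the mean should give a larger sum.
elsewhere = [sum_sq_dev(scores, m + shift) for shift in (-1, -0.5, 0.5, 1)]

print(at_mean, elsewhere)
```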
the _____ is equal to the sum of the mean of each
group times the number of scores in the group, divided
by the sum of the number of scores in each group.
overall mean
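A minimal sketch of that calculation, assuming two made-up groups. The function name `overall_mean` is hypothetical.

```python
def overall_mean(group_means, group_sizes):
    """Weight each group mean by its number of scores, then divide by the total N."""
    weighted_sum = sum(m * n for m, n in zip(group_means, group_sizes))
    return weighted_sum / sum(group_sizes)

# Hypothetical groups: 30 scores with mean 10, and 10 scores with mean 20.
print(overall_mean([10, 20], [30, 10]))
```

The result sits closer to the mean of the larger group, which is why simply averaging the two group means would be wrong when the group sizes differ.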
Second most common measure of central tendency.
It is the middle score in a set of scores.
Used when the mean is not valid, which might be
because the data are not symmetrically or normally
distributed, or because the data are measured at an
ordinal level.
Median
The median is _____ than the mean to extreme
scores.
less sensitive
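A quick way to see the difference is to add one extreme score to a small data set and compare how far each measure moves. The scores below are made up; `mean` and `median` come from Python's standard `statistics` module.

```python
from statistics import mean, median

scores = [4, 5, 5, 6, 7]      # hypothetical scores
with_outlier = scores + [60]  # the same scores plus one extreme score

print("without outlier:", mean(scores), median(scores))
print("with outlier:   ", mean(with_outlier), median(with_outlier))
```

The mean is pulled well above every ordinary score, while the median barely shifts.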
The most frequent score in the distribution or the most
common observation among a group of scores.
Best measure of central tendency for CATEGORICAL data
(although it is not even very useful for that)
Rarely used in research.
mode
In a frequency distribution it is very easy to see because
it is the _______ of the distribution.
The problem with it is it does not tell us very much.
highest point
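For categorical data the mode can be read straight off a frequency count. A minimal sketch with made-up responses:

```python
from collections import Counter

answers = ["cat", "dog", "dog", "bird", "dog", "cat"]  # hypothetical categorical responses

counts = Counter(answers)
# most_common(1) returns the single most frequent category with its frequency.
mode, freq = counts.most_common(1)[0]

print(mode, freq)
```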
The _____ is the simplest measure of dispersion.
It is the distance between the highest score and the
lowest score.
It can be expressed as a single number, or sometimes it is
expressed as the highest and lowest scores.
range
To find the range we find the lowest value (2) and the
highest value (17). Sometimes the range is expressed as
a single figure, calculated as:
Range = Highest Value – Lowest Value
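Using the lowest value (2) and highest value (17) from the card, with the scores in between made up for illustration:

```python
scores = [2, 5, 8, 11, 17]  # hypothetical scores; only the 2 and 17 come from the card

lowest, highest = min(scores), max(scores)
score_range = highest - lowest  # Range = Highest Value - Lowest Value

print(f"Range = {highest} - {lowest} = {score_range}")
print(f"or, as a pair of scores: {lowest} to {highest}")
```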
Used with ordinal data or with non-normal distributions.
If median is used as a measure of central tendency, the ___ is probably used as a measure of dispersion.
It is the distance between the upper and lower quartiles.
inter-quartile range
There are ____ quartiles in a variable; they are the ____ values that divide the variable into four groups.
three
The ____ quartile happens one quarter of the way up the
data, which is also the 25th centile.
1st quartile
The _____ quartile is the half-way point, which is the
median, and is also the 50th centile.
2nd quartile
The ____ quartile is the three-quarter-way point or the 75th
centile.
third quartile
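The three quartiles and the inter-quartile range can be sketched with Python's standard `statistics.quantiles`. The scores are made up. Different textbooks and packages use slightly different conventions for locating quartiles, so the first and third quartile may differ a little from a hand calculation.

```python
from statistics import median, quantiles

scores = [2, 3, 5, 7, 8, 9, 11, 12, 14, 15, 17]  # hypothetical scores

# n=4 asks for the three cut points that divide the data into four groups.
q1, q2, q3 = quantiles(scores, n=4)
iqr = q3 - q1  # distance between the upper and lower quartiles

print("Q1 (25th centile):", q1)
print("Q2 (50th centile, the median):", q2, "| median():", median(scores))
print("Q3 (75th centile):", q3)
print("IQR:", iqr)
```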
symbol
s
sample standard deviation
______ is like the mean, in that it
takes all of the values in the dataset into account when
it is calculated.
It is also like the mean in that it needs to make some
assumptions about the shape of the distribution.
To calculate the _____, we must assume that we have a
normal distribution.
Standard Deviation
symbol
σ
population standard deviation
the _____ of a set of scores is just the square of the standard deviation
variance
the variance is not used much in descriptive statistics because it gives us squared units of measurement. However, it is used quite frequently in ___________
inferential statistics
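The relationship between the standard deviation and the variance can be checked with Python's standard `statistics` module, where `pstdev` and `pvariance` treat the scores as a whole population (σ) and `stdev` and `variance` treat them as a sample (s). The scores are made up.

```python
from statistics import pstdev, pvariance, stdev, variance

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores

sigma = pstdev(scores)        # population standard deviation
sigma_sq = pvariance(scores)  # population variance
s = stdev(scores)             # sample standard deviation
s_sq = variance(scores)       # sample variance

print("population:", sigma, sigma_sq)
print("sample:    ", s, s_sq)
```

In both cases the variance is the square of the standard deviation, so it is expressed in squared units of the original measurement.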
- the SD gives us a measure of dispersion relative to the mean
- the SD is sensitive to each score in the distribution
- like the mean, the SD is stable with regard to sampling fluctuations
properties of the standard deviation
population standard deviation
boxplot or box and whisker plot