[L2] Descriptive Statistics Flashcards

1
Q

Used with ordinal data or with non-normal distributions.

A

INTER-QUARTILE-RANGE

2
Q

A ___ is drawn for each category, where the height of the
bar represents the frequency or number of members of
that category.

A

bar

3
Q

indicates the
percentage of scores that fall below the upper limit of
each interval

A

Cumulative Percentage Distribution

4
Q

How to compute the Median

A
  1. Scores should be placed in ascending order of size, from
    the smallest to the largest score.
  2. When there is an odd number of scores in the
    distribution, halve the number and take the next whole
    number up; the score at that position is the median.
  3. If there is an even number of scores, the median is the mean of the two middle scores.
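
The steps above can be sketched in Python as follows. This is only a minimal illustration; the function name `median_by_steps` and the example scores are assumptions, not part of the deck.

```python
def median_by_steps(scores):
    """Median following the card's procedure (hypothetical helper)."""
    ordered = sorted(scores)              # step 1: ascending order
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty list is undefined")
    if n % 2 == 1:
        # step 2: halve n and round up to get the 1-based position
        position = n // 2 + 1
        return ordered[position - 1]
    # step 3: mean of the two middle scores
    upper = n // 2
    return (ordered[upper - 1] + ordered[upper]) / 2


print(median_by_steps([7, 2, 9, 4, 5]))   # 5
print(median_by_steps([7, 2, 9, 4]))      # 5.5
```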
5
Q

Instead of using bars, a ____ is plotted over the midpoint
of each interval at a height corresponding to the
frequency of the interval. Points are joined by a ___.

A

point; straight line

6
Q

To find the range we find the ____ value (2) and the
____ value (17).

A

lowest; highest

7
Q

The computer defines that point as an ____.

A

outlier

8
Q
  • Called the ___ mean.
  • Calculated by adding up all the scores and dividing by the
    number of individual scores.
A

arithmetic; MEAN

9
Q

If outliers cannot be eliminated and you are convinced
that you have a genuine measurement, then you have a
___.

A

dilemma

10
Q

When the distribution is ____ (i.e. has one mode) and
____, then the mode, median, and mean will have
very similar values.

A

unimodal, symmetrical

11
Q

It is the distance between the highest score and the lowest
score.

A

range

12
Q

A _____ occurs most commonly when we
are trying to ask questions to measure the range of some
variable, and the questions are all too easy, or too low
down the scale.

A

Ceiling Effect

13
Q

___ (e.g., histograms, bar charts) are used to present
the pattern in the data.

A

Charts

14
Q

small sample size, not normally distributed

A

nonparametric test

15
Q

Loosely known as the average. In statistical description,
though, we have to be more precise about just ___ we mean.

A

what sort
of average

16
Q

____ should not overwhelm the reader who is
trying to see what is going on.

A

Data presentation

17
Q

But as long as the assumptions are __ to any
great extent, we will be OK.

A

not violated

18
Q

The downfall of the mean is that it is affected by ___

A

skew and
outliers.

19
Q

The median is ____ than the mean to extreme
scores.

A

less sensitive
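
A small demonstration may make this concrete. The scores below are made up for illustration (an assumption, not data from the deck); the sketch adds one extreme score and compares how far the mean and the median move.

```python
from statistics import mean, median

scores = [4, 5, 5, 6, 7]            # hypothetical scores
with_extreme = scores + [60]        # same scores plus one extreme value

# The mean jumps from 5.4 to 14.5; the median only moves from 5 to 5.5.
print(mean(scores), median(scores))
print(mean(with_extreme), median(with_extreme))
```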

20
Q

___ occurs when only a few of the subjects are
strong enough to get off the floor.

A

Floor Effect

21
Q

when data are measured on an ____, it is tricky
and difficult to decide whether to use the mean, median,
or even the mode

A

ordinal scale

22
Q

The ___ then extend from the box to the highest
and lowest points – unless this would mean that the
length of the whisker would be more than 1.5 times the
length of the box.

A

whiskers
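
The whisker rule on this card can be sketched roughly as below. This is an illustration under stated assumptions, not the exact algorithm of any particular package: the function name `whisker_ends`, the example scores, and the choice of the "inclusive" quartile method are all assumptions. The length of the box is the distance between the lower and upper quartiles.

```python
from statistics import quantiles


def whisker_ends(scores):
    """Return (low whisker end, high whisker end, outliers). Hypothetical helper."""
    q1, _, q3 = quantiles(scores, n=4, method="inclusive")
    box = q3 - q1                          # length of the box
    low_fence = q1 - 1.5 * box             # whiskers may not pass these fences
    high_fence = q3 + 1.5 * box
    inside = [x for x in scores if low_fence <= x <= high_fence]
    outliers = [x for x in scores if x < low_fence or x > high_fence]
    # Each whisker stops at the furthest point that is still inside its fence.
    return min(inside), max(inside), outliers


print(whisker_ends([2, 4, 5, 6, 7, 8, 9, 30]))   # (2, 9, [30])
```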

23
Q

There are ____ that can be used to
draw a normal distribution. These equations can be used
in statistical tests.

A

mathematical equations

24
Q

Small number of data points that lie outside the
distribution when the distribution is approximately
normal.

A

OUTLIERS

25
Q

The first thing to describe is the ___, to
show the kinds of numbers that we have.

A

distribution of the data

26
Q

When the distribution is skewed, the ___ have
the effect of pulling the mean away from the true value.

A

skewed values

27
Q

The Normal Distribution
* Also known as the ___

A

Gaussian Distribution

28
Q

The central tendency does not mean a lot without a
____

A

measure of dispersion or spread.

29
Q

It is very hard to interpret a measure of central tendency
without also having a ___

A

measure of dispersion

30
Q

indicates the
number of scores that fall below the upper limit of each
interval.

A

Cumulative Frequency Distribution

31
Q

If median is used as a measure of central tendency, the
IQR is probably used as a ___

A

measure of dispersion.

32
Q

Some statisticians would argue that things like ___ scales can only be considered to be
ordinal data.

A

personality
measures and attitude

33
Q

It is the distance between the upper and lower quartiles.

A

INTER-QUARTILE-RANGE

34
Q

___ has some serious implications for some types of
data analysis.

A

Skewness

35
Q

When deciding which to use, take into account the
___

A

distribution of the scores.

36
Q

___ effects are common in many measures in
psychology.

A

Floor

37
Q

As long as the distribution is ____ distribution, it will not matter too much.

A

close to a normal

38
Q

Frequency distributions of nominal or
ordinal data are customarily plotted using a ___.

A

Bar graph

39
Q

___ – when there are too many people, too
far away, in the tails of the distribution.

A

Positive Kurtosis

40
Q

Σ – Greek letter called “____”, meaning “summation of”,
“add up”, or “take the sum of”.

A

Sigma

41
Q

Under most circumstances, of the measures used for
central tendency, the mean is ___

A

least subject to sampling
variation.

42
Q
  • A non-symmetrical distribution is said to be skewed.
A

SKEW

43
Q

indicates the
proportion of the total number of scores in each interval.

A

Relative Frequency Distribution

44
Q

The shape is a pattern that forms when a histogram is
plotted and is known as the ___.

A

distribution

45
Q

In a skewed distribution the mean, median, and mode are
____

A

not the same.

46
Q

Symbols for the sample standard deviation and the
population standard deviation

A
  • s; σ
47
Q

First, because it is not symmetrical – this is called __

A

SKEW

48
Q

____ helps a researcher
understand the data that he has, while ____ help him explain to other people what is
happening to his data.

A

Exploratory data analysis (EDA); descriptive
statistics

49
Q

In which case it extends to the furthest point which
means it does not exceed ____

A

1.5 times the length of the box.

50
Q

The range suffers from one huge problem, in that it is
massively ___ that occur

A

affected by any outliers

51
Q

In some way refers to the most central value of a data set
with different interpretations of the sense of “___”.

A

central

52
Q

___ differ between statisticians. There is a very fuzzy line between what could definitely
be called ____

A

Opinions; ordinal and interval.

53
Q

The symbol for the mean, x̄, is pronounced ___.

A

x-bar

54
Q

If our assumptions are ____, then some of the
things we say (results of analysis) will be wrong.

A

wrong (violated)

55
Q

The ___ is equal to the sum of the mean of each
group times the number of scores in the group, divided by
the sum of the number of scores in each group.

A

overall mean
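
As a rough sketch of this rule, the snippet below weights each group mean by its group size. The function name `overall_mean` and the example numbers are assumptions chosen for illustration.

```python
def overall_mean(group_means, group_sizes):
    """Sum of (group mean x group size), divided by the total number of scores."""
    weighted_total = sum(m * n for m, n in zip(group_means, group_sizes))
    return weighted_total / sum(group_sizes)


# Two hypothetical groups: mean 10 with 3 scores, mean 20 with 1 score.
print(overall_mean([10, 20], [3, 1]))   # 12.5
```

Note that the result (12.5) differs from the simple average of the two group means (15), because the larger group counts for more.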

56
Q

When data are grouped –___

A

some information is lost

57
Q

The ___ is often the best average, for a couple of
reasons.
* Unlike the median, it uses all of the information available.
Every number in the data set has an influence on the
mean.
The mean also has useful distributional properties which
the median does not have.

A

mean

58
Q

___ refers to the number of people in the sample.

A

N

59
Q

Ability to test population parameter

A

parametric tests

60
Q
  • Second most common measure of central tendency.
A

MEDIAN

61
Q

Different ways of Describing the Distribution

A
  • Frequency Table
  • Charts
62
Q

The mean is sensitive to the ___ of all the scores
in the distribution.

A

exact value

63
Q

Generally, the ___ we present our data, the
____ them, and the more space they take
up.

A

more accurately; less we summarize

64
Q

____ is just a “posh” way of saying average

A

Central tendency

65
Q

Unlike the range, the IQR does not go to the ends of the
scales, and is therefore not affected by ____.

A

outliers

66
Q

If our assumptions about data are wrong, but not too
wrong, we need to be aware that our statistics will not be

____

A

perfectly correct.

67
Q

That is why we use the unbiased standard deviation, or
the ___

A

population standard deviation.

68
Q

A ___ is symmetrical and ____. It
curves outwards at the top and then inwards nearer the
bottom, the tails getting thinner and thinner.

A

normal distribution; bell shaped

69
Q

___ – when there are insufficient people in
the tails (ends) of the scores to make the distribution
normal.

A

Negative Kurtosis

70
Q

Histograms are very important in data analysis, because
they allow us to examine the ___ of the distribution of a
variable.

A

shape

71
Q

The ____ of a set of scores is just the square of the
Standard Deviation.

A

variance
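
A quick check of this relationship, as a minimal sketch with made-up scores (an assumption). It uses `pstdev` and `pvariance` from Python's `statistics` module, both of which divide by N.

```python
import math
from statistics import pstdev, pvariance

scores = [2, 4, 4, 4, 5, 5, 7, 9]    # hypothetical scores

sd = pstdev(scores)                  # standard deviation, in the original units
variance = pvariance(scores)         # variance, in squared units

# Squaring the standard deviation recovers the variance.
print(sd, variance, math.isclose(sd ** 2, variance))
```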

72
Q

The sum of the deviations about the mean equals
____.
(Mean is the balance point of the distribution)

A

zero
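
This follows because the deviations sum to Σx minus N times the mean, and the mean is Σx / N. A minimal numerical check, using made-up scores (an assumption):

```python
scores = [3, 7, 8, 10, 12]                  # hypothetical scores
mean = sum(scores) / len(scores)            # 8.0
deviations = [x - mean for x in scores]     # negative below the mean, positive above

# Positive and negative deviations cancel exactly.
total = sum(deviations)
print(deviations, total)
```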

73
Q

A lot of
____ depend on the data being from a normal
distribution.

A

tests

74
Q

The mean of all the sample means does match the
population mean – hence the mean is an ____

A

unbiased
estimator.

75
Q

Best measure of central tendency for CATEGORICAL
data (although it is not even very useful for that). Rarely used in research.

A

mode

76
Q

Data will
___ form a perfect normal distribution.

A

never

77
Q

___ presents the score values and
their frequency of occurrence.

A

Frequency distribution

78
Q

___, or _____ – one of the most
useful graphical techniques for presenting and summarizing
data.

A

Boxplot; box and whisker plot

79
Q

The 1st quartile happens one quarter of the way up the
data, which is also the ___; 2nd =___; 3rd =__

A

25th centile. 50th centile. 75th centile

80
Q

Assumptions about data when calculating and
Interpreting the Mean

A
  1. The distribution is symmetrical. This means there is not
    much SKEW, and no OUTLIERS on one side.
  2. The data are measured at the INTERVAL or RATIO
    level.
81
Q

Skew often happens because of ___

A

Floor Effect or a Ceiling
Effect

82
Q

The ___ is the simplest measure of dispersion

A

range

83
Q

The range is only rarely used in ___.

A

Psychological Research.

84
Q

A large number of ____ make the assumption
that the data form a normal distribution

A

statistical tests

85
Q

___ – also used to represent interval or
ratio data.

A

Frequency polygon

86
Q

Cumulative Percentage –

A

(cum f / N) × 100

87
Q

Range can be expressed as a ___ number, or sometimes it is
expressed as the ____

A

single; highest and lowest scores.

88
Q

The separation of the mean, median and mode in the
direction of the skew is a _____ in a skewed
distribution.

A

consistent effect

89
Q

property of median

A

Under usual circumstances, the median is more subject
to sampling variability than the mean but less subject to
sampling variability than the mode.

90
Q

in grouping data – ___

A

how wide should interval be?

91
Q
  • Much trickier than Skew but is usually less of a problem.
  • Occurs when there are either too many people at the
    extremes of the scale, or not enough people at the
    extremes
A

KURTOSIS

92
Q

It is also not affected by SKEW and KURTOSIS to any
great extent.

A

IQR

93
Q

Classes of Kurtosis

A

Leptokurtic (thin)
Mesokurtic
Platykurtic (flat)

94
Q

Cumulative Frequency –

A

frequency of interval +
frequencies of all class intervals below it.
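
A minimal sketch of this running total, with the cumulative percentage, (cum f / N) × 100, computed alongside it. The interval frequencies are made up for illustration (an assumption) and are listed from the lowest interval upward.

```python
from itertools import accumulate

frequencies = [2, 5, 8, 4, 1]              # hypothetical f for each class interval
n = sum(frequencies)                       # N = 20

# Each cumulative frequency adds the interval's own f to all the f below it.
cum_f = list(accumulate(frequencies))      # [2, 7, 15, 19, 20]
cum_pct = [100 * c / n for c in cum_f]     # [10.0, 35.0, 75.0, 95.0, 100.0]

for f, c, p in zip(frequencies, cum_f, cum_pct):
    print(f, c, p)
```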

95
Q

When the distribution is skewed, the ___ is a more
representative measure of central tendency.

A

median

96
Q

___ describe or summarize the data a
researcher has.

A

Descriptive Statistics

97
Q

The mean is very sensitive to ____ scores.

A

extreme

98
Q

The ___ the interval, the
___ information is lost.

A

wider, more

99
Q

Second, because it is not the characteristic bell shape –
this is called ___.

A

KURTOSIS

100
Q

___ – IQR divided by 2.

A

Semi-inter-quartile range

101
Q

Constructing a frequency distribution of grouped scores

A
  1. Find the range of the scores.
  2. Determine the width of each class interval (i).
  3. List the limits of each class interval, placing the interval
    containing the lowest score value at the bottom.
  4. Tally the raw scores into the appropriate class intervals.
  5. Add the tallies for each interval to obtain the interval
    frequency
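
The steps above could be sketched as follows. This is one possible reading, not a prescribed method: the function name `grouped_frequencies`, the whole-number scores, and the rule of starting the lowest interval at a multiple of the width are all assumptions.

```python
from collections import Counter


def grouped_frequencies(scores, width):
    """Return (lower limit, upper limit, frequency) rows, lowest interval first."""
    lowest, highest = min(scores), max(scores)      # step 1: range is highest - lowest
    start = (lowest // width) * width               # assumed starting point
    # Steps 4-5: tally each score into its interval and count the tallies.
    tallies = Counter((s - start) // width for s in scores)
    n_intervals = (highest - start) // width + 1
    table = []
    for k in range(n_intervals):                    # step 3: list the interval limits
        lower = start + k * width
        table.append((lower, lower + width - 1, tallies[k]))
    return table


scores = [52, 57, 61, 63, 64, 68, 70, 75, 79, 83]   # hypothetical whole-number scores
table = grouped_frequencies(scores, width=10)       # step 2: width chosen as 10

# Print with the lowest interval at the bottom, as the card describes.
for lower, upper, f in reversed(table):
    print(f"{lower}-{upper}: {f}")
```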
102
Q

Relative Frequency –

A

f/N
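
For instance, with made-up interval frequencies (an assumption), each relative frequency is the interval's f divided by the total N, and the proportions add up to 1:

```python
import math

# Hypothetical frequencies per class interval.
frequencies = {"40-49": 2, "50-59": 5, "60-69": 8, "70-79": 4, "80-89": 1}
n = sum(frequencies.values())                                  # N = 20

relative = {interval: f / n for interval, f in frequencies.items()}

print(relative["60-69"])                                       # 0.4
print(math.isclose(sum(relative.values()), 1.0))               # True
```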

103
Q

The sample standard deviations would, on average, be a
___

A

bit too low.

104
Q

For statistics to be correct, we need to make some
___.

A

assumptions

105
Q

It is also like the mean in that it needs to make some
assumptions about the shape of the distribution.
* To calculate the SD, we must assume that we have a
_____

A

normal distribution.

106
Q

Don’t draw a bar chart for ___

A

continuous measures.

107
Q
  • The most frequent score in the distribution or the most common observation among a group of scores.
A

MODE

108
Q

Properties of the Standard Deviation

A
  1. The SD gives us a measure of dispersion relative to the
    mean.
  2. The SD is sensitive to each score in the distribution.
  3. Like the Mean, the SD is stable with regard to
    sampling fluctuations.
109
Q

When presented in a table, the score values are listed in
____ with the lowest score value usually at the
bottom of the table

A

rank order

110
Q

Distributions can be of wrong shape for two reasons.

A
  • First, because it is not symmetrical
  • Second, because it is not the characteristic bell shape
111
Q

A ___ causes negative skew and is much less
common in psychology.

A

Ceiling Effect

112
Q

There are ___ in a variable – they are the three
values that divide the variable into four groups.

A

three quartiles

113
Q

___ – the curve rises slowly and then
decreases rapidly.

A

Negative Skew

114
Q

___ – the curve rises rapidly and then drops off
slowly.

A

Positive Skew

115
Q

In which case it extends to the furthest point which
means it does not exceed ____

A

1.5 times the length of the box.

116
Q

What to do when there are outliers.

A
  1. See if you have made an error.
  2. Check if any measurement that you took was carried
    out correctly.
117
Q

Usually easy to spot in histograms, but deciding what to do with
them can be much trickier.

A

OUTLIERS

118
Q

How to find the IQR

A
  1. Scores are placed in rank order and counted.
  2. The half-way point is the median.
  3. The IQR is the distance between the one-quarter and
    three-quarter points.
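
A minimal sketch of this procedure is shown below. Textbooks and software differ on exactly where the quarter points fall; the "inclusive" method of `statistics.quantiles` used here, the function name `interquartile_range`, and the example scores are assumptions.

```python
from statistics import quantiles


def interquartile_range(scores):
    """Distance between the one-quarter and three-quarter points."""
    # quantiles() sorts the scores and returns the three quartiles;
    # the middle one is the median.
    q1, _median, q3 = quantiles(scores, n=4, method="inclusive")
    return q3 - q1


scores = [1, 3, 4, 6, 7, 9, 11, 12, 15]      # hypothetical scores
iqr = interquartile_range(scores)
print(iqr)          # 7.0
print(iqr / 2)      # semi-inter-quartile range: 3.5
```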
119
Q

In a frequency distribution, the mode is very easy to see because it
is the ___ of the distribution

A

highest point

120
Q

___ is like the mean, in that it
takes all of the values in the dataset into account when it
is calculated.

A

The Standard Deviation

121
Q

____ are plotted on the horizontal axis such that
each class bar begins and terminates at the real limits of
the interval.

A

Class intervals

122
Q

The majority would argue that these can be considered to be
___ and therefore it is OK to use the ___.

A

interval data; mean

123
Q

It is the middle score in a set of scores.
* Used when the mean is not valid, which might be because
the data are not symmetrically or normally distributed, or
because the data are measured in an ordinal level.

A

MEDIAN

124
Q

The median in a boxplot is represented with a ___

A

thick line.

125
Q

The variance is not used much in descriptive statistics
because it gives us squared units of measurement.
However, it is used quite frequently in ___

A

inferential
statistics.

126
Q

Don’t refer to the normal distribution as any of the
following: ____

A

usual, regular, standard, or even distribution.

127
Q

___ – therefore means “add up all the scores in x”.

A

Σx

128
Q

Options

A
  1. Eliminate the point and carry on with the analysis.
  2. If you keep the data point then it may well have a large
    effect on your analysis and you will analyze your data
    badly.
129
Q

The sample standard deviation suffers from a problem –
it is a biased estimator of the _____

A

population standard
deviation.
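
The bias can be illustrated by computing the standard deviation two ways: dividing the sum of squared deviations by N, and dividing by N − 1 (the usual correction). This is a sketch only; the function name `standard_deviation`, its `unbiased` flag, and the scores are assumptions.

```python
import math


def standard_deviation(scores, unbiased):
    """Standard deviation dividing by N - 1 if `unbiased`, otherwise by N."""
    n = len(scores)
    mean = sum(scores) / n
    sum_sq = sum((x - mean) ** 2 for x in scores)   # sum of squared deviations
    divisor = n - 1 if unbiased else n
    return math.sqrt(sum_sq / divisor)


sample = [2, 4, 4, 4, 5, 5, 7, 9]                         # hypothetical sample
biased = standard_deviation(sample, unbiased=False)       # divides by N
corrected = standard_deviation(sample, unbiased=True)     # divides by N - 1

# The N - 1 version is always slightly larger, offsetting the tendency
# of the N version to come out too low.
print(biased, corrected)
```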

130
Q

The sum of the squared deviations of all the scores
about their mean is a ___.

A

minimum
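
One way to see this property is to compute the sum of squared deviations about the mean and about a few other values, and confirm that every other choice gives a larger total. The scores, the alternative values, and the helper name `sum_squared_deviations` are assumptions for illustration.

```python
def sum_squared_deviations(scores, centre):
    """Sum of squared deviations of the scores about `centre`."""
    return sum((x - centre) ** 2 for x in scores)


scores = [3, 7, 8, 10, 12]                       # hypothetical scores
mean = sum(scores) / len(scores)                 # 8.0

at_mean = sum_squared_deviations(scores, mean)   # 46.0
others = {c: sum_squared_deviations(scores, c) for c in (6, 7, 7.5, 8.5, 9, 10)}

print(at_mean)
print(all(value > at_mean for value in others.values()))   # True
```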

131
Q

A very large number of ____ are
normally distributed.

A

naturally occurring variables

132
Q

While our assumptions will usually be broadly correct,
they will never be ___ correct.

A

exactly

133
Q

__ – used to represent frequency distributions
composed of interval or ratio data. Bar is drawn for each
class interval.

A

Histogram

134
Q

It is most commonly used to describe some aspect of a
sample which___

A

does not need to be summarized with any
degree of accuracy.