[L2] Descriptive Statistics Flashcards

1
Q

Used with ordinal data or with non-normal distributions.

A

INTER-QUARTILE-RANGE

2
Q

A ___ is drawn for each category, where the height of the
bar represents the frequency or number of members of
that category.

A

bar

3
Q

indicates the
percentage of scores that fall below the upper limit of
each interval

A

Cumulative Percentage Distribution

4
Q

How to compute the Median

A
  1. Scores should be placed in ascending order of size, from
    the smallest to the largest score.
  2. When there is an odd number of scores in the
    distribution, halve the number and take the next whole
    number up; the score at that position is the median.
  3. If there is an even number of scores, the median is the mean of the two middle scores.
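
The steps above can be sketched in Python. This is a minimal illustration, not part of the original card; the function name `median` and the example scores are assumptions.

```python
def median(scores):
    """Return the median of a non-empty list of numbers."""
    ordered = sorted(scores)              # step 1: ascending order
    n = len(ordered)
    if n % 2 == 1:
        # step 2: odd n, e.g. n = 5 -> 5 / 2 = 2.5 -> position 3 (counting from 1)
        position = n // 2 + 1
        return ordered[position - 1]      # convert to a 0-based index
    # step 3: even n -> mean of the two middle scores
    upper = n // 2
    return (ordered[upper - 1] + ordered[upper]) / 2


print(median([7, 2, 9, 4, 5]))  # odd number of scores
print(median([7, 2, 9, 4]))     # even number of scores
```

With these hypothetical scores the odd case picks the third ordered score (5) and the even case averages 4 and 7.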
5
Q

Instead of using bars, a ____ is plotted over the midpoint
of each interval at a height corresponding to the
frequency of the interval. Points are joined by a ___.

A

point; straight
line.

6
Q

To find the range we find the ____ value (2) and the
____ value (17).

A

lowest; highest

7
Q

The computer defines that point as an ____.

A

outlier

8
Q
  • Called the ___ mean.
  • Calculated by adding up all the scores and dividing by the
    number of individual scores.
A

arithmetic; MEAN

9
Q

If outliers cannot be eliminated and you are convinced
that you have a genuine measurement, then you have a

___.

A

dilemma

10
Q

When the distribution is ____ (i.e. has one mode) and
____, then the mode, median, and mean will have
very similar values.

A

unimodal, symmetrical

11
Q

It is the distance between the highest score and the lowest
score.

A

range

12
Q

_____ occurs most commonly when we
are trying to ask questions to measure the range of some
variable, and the questions are all too easy, or too low
down the scale

A

Ceiling Effect

13
Q

___ (e.g., histograms, bar charts, etc.) are used to present
the pattern in the data.

A

Charts

14
Q

small sample size, not normally distributed

A

nonparametric test

15
Q

Loosely known as the average. In statistical description,
though, we have to be more precise about just ___ we mean.

A

what sort
of average

16
Q

____ should not overwhelm the reader who is
trying to see what is going on.

A

Data presentation

17
Q

But as long as the assumptions are __ to any
great extent, we will be OK.

A

not violated

18
Q

The downfall of the mean is that it is affected by ___

A

skew and
outliers.

19
Q

The median is ____ than the mean to extreme
scores.

A

less sensitive
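
A small Python sketch of this point, using the standard `statistics` module. The two score lists are hypothetical and differ only in one extreme value.

```python
import statistics

# Hypothetical scores; the second list swaps the top score for an extreme one.
scores = [3, 4, 5, 6, 7]
with_extreme = [3, 4, 5, 6, 70]

mean_before = statistics.mean(scores)
mean_after = statistics.mean(with_extreme)
median_before = statistics.median(scores)
median_after = statistics.median(with_extreme)

print("mean:  ", mean_before, "->", mean_after)
print("median:", median_before, "->", median_after)
```

The mean moves from 5 to 17.6 because every score contributes to it, while the median stays at 5 because only the middle position matters.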

20
Q

___ occurs when only a few of the subjects are
strong enough to get off the floor.

A

Floor Effect

21
Q

when data are measured on an ____, it is tricky
and difficult to decide whether to use the mean, median,
or even the mode

A

ordinal scale

22
Q

The ___ then extend from the box to the highest
and lowest points – unless this would mean that the
length of the whisker would be more than 1.5 times the
length of the box.

A

whiskers
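
The whisker rule can be sketched as follows. This is an illustration under assumptions: the scores are hypothetical, and the quartiles use the "inclusive" method of Python's `statistics.quantiles`, which is only one of several conventions; plotting software may use another.

```python
import statistics

# Hypothetical scores containing one extreme value.
scores = [2, 4, 5, 6, 7, 8, 9, 10, 30]

# The quartiles mark the ends of the box.
q1, q2, q3 = statistics.quantiles(scores, n=4, method="inclusive")
box_length = q3 - q1
limit = 1.5 * box_length

# Points further than 1.5 box lengths beyond the box are treated as outliers.
outliers = [s for s in scores if s < q1 - limit or s > q3 + limit]
inside = [s for s in scores if s not in outliers]

# Each whisker stops at the most extreme point that is not an outlier.
low_whisker, high_whisker = min(inside), max(inside)

print("box:", q1, "to", q3, "median:", q2)
print("whiskers:", low_whisker, "to", high_whisker)
print("outliers:", outliers)
```

Here the box runs from 5 to 9, so a whisker may be at most 6 units long; the score 30 lies beyond that and is plotted separately instead of stretching the upper whisker.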

23
Q

There are ____ that can be used to
draw a normal distribution. These equations can be used
in statistical tests.

A

mathematical equations

24
Q

Small number of data points that lie outside the
distribution when the distribution is approximately
normal.

A

OUTLIERS

25
The first thing to describe is the ___, to show the kinds of numbers that we have.
distribution of the data
26
When the distribution is skewed, the ___ have the effect of pulling the mean away from the true value.
skewed values
27
The Normal Distribution is also known as the ___
Gaussian Distribution
28
The central tendency does not mean a lot without a ____
measure of dispersion or spread.
29
It is very hard to interpret a measure of central tendency without also having a ___
measure of dispersion
30
indicates the number of scores that fall below the upper limit of each interval.
Cumulative Frequency Distribution
31
If median is used as a measure of central tendency, the IQR is probably used as a ___
measure of dispersion.
32
Some statisticians would argue that things like ___ scales can only be considered to be ordinal data.
personality measures and attitude
33
It is the distance between the upper and lower quartiles.
INTER-QUARTILE-RANGE
34
___ has some serious implications for some types of data analysis.
Skewness
35
When deciding which to use, take into account the ___
distribution of the scores.
36
___ effects are common in many measures in Psychology.
Floor
37
As long as the distribution is ____distribution, it will not matter too much.
close to a normal
38
frequency distributions of Nominal or Ordinal Data are customarily plotted using a bar graph.
Bar Graph
39
Occurs when there are too many people, too far away, in the tails of the distribution.
Positive Kurtosis
40
Σ: Greek letter called “____”, read as “summation of”, “add up”, or “take the sum of”.
Sigma
41
Under most circumstances, of the measures used for central tendency, the mean is ___
least subject to sampling variation.
42
A non-symmetrical distribution is said to be skewed.
SKEW
43
indicates the proportion of the total number of scores in each interval.
Relative Frequency Distribution
44
The shape is a pattern that forms when a histogram is plotted and is known as the ___.
distribution
45
In a skewed distribution the mean, median, and mode are ____
not the same.
46
Symbols for the sample standard deviation and the population standard deviation
s; σ
47
First, because it is not symmetrical – this is called __
SKEW
48
____ helps a researcher understand the data that he has, while ____ help him explain to other people what is happening to his data.
Exploratory data analysis (EDA); descriptive statistics
49
In which case the whisker extends to the furthest point that does not exceed ____
1.5 times the length of the box.
50
The range suffers from one huge problem, in that it is massively ___ that occur
affected by any outliers
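
A short Python sketch of why this matters. The helper name `score_range` and the scores are hypothetical; the second list simply adds one extreme value.

```python
def score_range(scores):
    """Distance between the highest and the lowest score."""
    return max(scores) - min(scores)


# Hypothetical scores running from 2 to 17.
scores = [2, 5, 8, 11, 14, 17]
with_outlier = scores + [95]

print(score_range(scores))        # 17 - 2
print(score_range(with_outlier))  # 95 - 2
```

A single extreme score changes the range from 15 to 93, even though the other six scores are untouched.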
51
In some way refers to the most central value of a data set with different interpretations of the sense of “___”.
central
52
___ differ between statisticians. There is a very fuzzy line between what could definitely be called ____
Opinions; ordinal and interval.
53
The symbol for the mean, x̄, is pronounced as ___.
x-bar
54
If our assumptions are ____, then some of the things we say (results of analysis) will be wrong.
wrong (violated)
55
The ___ is equal to the sum, over all groups, of each group's mean times the number of scores in that group, divided by the total number of scores across all groups.
overall mean
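
The rule on this card can be written as a short Python function. The name `overall_mean` and the group values are illustrative assumptions.

```python
def overall_mean(group_means, group_sizes):
    """Combine group means, weighting each by its number of scores."""
    weighted_sum = sum(m * n for m, n in zip(group_means, group_sizes))
    return weighted_sum / sum(group_sizes)


# Hypothetical groups: 30 scores with mean 10 and 10 scores with mean 20.
print(overall_mean([10, 20], [30, 10]))
```

The result is (10 × 30 + 20 × 10) / 40 = 12.5, which sits closer to the mean of the larger group than the simple average of the two means (15) would.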
56
When data are grouped –___
some information is lost
57
The ___ is often the best average, for a couple of reasons. Unlike the median, it uses all of the information available. Every number in the data set has an influence on the mean. The mean also has useful distributional properties which the median does not have.
mean
58
___ refers to the number of people in the sample.
N
59
Ability to test population parameter
parametric tests
60
Second most common measure of central tendency.
MEDIAN
61
Different ways of Describing the Distribution
Frequency Table; Charts
62
The mean is sensitive to the ___of all the scores in the distribution.
exact value
63
Generally, the ___ we present our data, the ____ them, and the more space they take up.
more accurately; less we summarize
64
____ is just a “posh” way of saying average
Central tendency
65
Unlike the range, the IQR does not go to the ends of the scales, and is therefore not affected by ____.
outliers
66
If our assumptions about data are wrong, but not too wrong, we need to be aware that our statistics will not be ____
perfectly correct.
67
That is why we use the unbiased standard deviation, or the ___
population standard deviation.
68
A ___ is symmetrical and ____. It curves outwards at the top and then inwards nearer the bottom, the tails getting thinner and thinner.
normal distribution; bell shaped
69
Occurs when there are insufficient people in the tails (ends) of the scores to make the distribution normal.
Negative Kurtosis
70
Histograms are very important in data analysis, because they allow us to examine the ___ of the distribution of a variable.
shape
71
The ____ of a set of scores is just the square of the Standard Deviation.
variance
72
The sum of the deviations about the mean equals ____. (Mean is the balance point of the distribution)
zero
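
A quick check of this property in Python, using hypothetical scores. With whole-number data like this the sum is exactly zero; with fractional data floating-point rounding can leave a tiny remainder.

```python
scores = [2, 4, 6, 8, 10]  # hypothetical scores

mean = sum(scores) / len(scores)
deviations = [x - mean for x in scores]
total_deviation = sum(deviations)

print("mean:", mean)
print("deviations:", deviations)
print("sum of deviations:", total_deviation)
```

The negative deviations (-4 and -2) cancel the positive ones (2 and 4), which is what makes the mean the balance point.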
73
A lot of ____ depend on the data being from a normal distribution.
tests
74
The mean of all the sample means does match the population mean; hence the mean is an ____
unbiased estimator.
75
Best measure of central tendency for CATEGORICAL data (although it is not even very useful for that). Rarely used in research.
mode
76
Data will ___ form a perfect normal distribution.
never
77
___presents the score values and their frequency of occurrence.
Frequency distribution
78
___, or _____: one of the most useful graphical techniques for presenting and summarizing data.
Boxplot; box and whisker plot
79
The 1st quartile happens one quarter of the way up the data, which is also the ___; 2nd =___; 3rd =__
25th centile. 50th centile. 75th centile
80
Assumptions about data when calculating and Interpreting the Mean
1. The distribution is symmetrical. This means there is not much SKEW, and no OUTLIERS on one side. 2. The data are measured at the INTERVAL or RATIO level.
81
Skew often happens because of ___
Floor Effect or a Ceiling Effect
82
The ___ is the simplest measure of dispersion
range
83
The range is only rarely used in ___
Psychological Research.
84
A large number of ____make the assumption that the data form a normal distribution
statistical tests
85
– also used to represent interval or ratio data.
Frequency polygon
86
Cumulative Percentage –
(cum f / N) × 100
87
Range can be expressed as a ___ number, or sometimes it is expressed as the ____
single; highest and lowest scores.
88
The separation of the mean, median and mode in the direction of the skew is a _____in a skewed distribution.
consistent effect
89
property of median
Under usual circumstances, the median is more subject to sampling variability than the mean but less subject to sampling variability than the mode.
90
in grouping data – ___
how wide should interval be?
91
Much trickier than skew, but usually less of a problem. Occurs when there are either too many people at the extremes of the scale, or not enough people at the extremes.
KURTOSIS
92
It is also not affected by SKEW and KURTOSIS to any great extent.
IQR
93
Classes of Kurtosis
Leptokurtic (thin); Mesokurtic; Platykurtic (flat)
94
Cumulative Frequency –
frequency of interval + frequencies of all class intervals below it.
95
When the distribution is skewed, the ___ is a more representative value of central tendency.
median
96
describe/summarize the data a researcher has.
Descriptive Statistics
97
The mean is very sensitive to ____ scores.
extreme
98
The ___ the interval, the ___ information is lost.
wider, more
99
Second, because it is not the characteristic bell shape – this is called ___.
KURTOSIS
100
– IQR divided by 2.
Semi-inter-quartile range
101
Constructing a frequency distribution of grouped scores
  1. Find the range of the scores.
  2. Determine the width of each class interval (i).
  3. List the limits of each class interval, placing the interval containing the lowest score value at the bottom.
  4. Tally the raw scores into the appropriate class intervals.
  5. Add the tallies for each interval to obtain the interval frequency.
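
The steps above can be sketched in Python, with the relative and cumulative columns from the other cards added. Everything specific here is an assumption for illustration: the scores are hypothetical whole numbers, and the interval width of 10 is chosen by hand.

```python
# Hypothetical whole-number scores and a hand-picked interval width.
scores = [12, 15, 17, 21, 22, 22, 25, 28, 31, 34, 35, 39]
width = 10

low, high = min(scores), max(scores)
score_range = high - low                  # step 1: range of the scores
start = (low // width) * width            # lower limit of the lowest interval

# Steps 3-5: list each interval and count the scores that fall inside it.
table = []
lower = start
while lower <= high:
    upper = lower + width - 1
    f = sum(1 for s in scores if lower <= s <= upper)
    table.append((lower, upper, f))
    lower += width

# Add relative frequency (f / N), cumulative frequency, and cumulative percentage.
n = len(scores)
cum_f = 0
rows = []
for lower, upper, f in table:
    cum_f += f
    rows.append((lower, upper, f, f / n, cum_f, cum_f / n * 100))

# Print with the lowest interval at the bottom of the table.
for lower, upper, f, rel_f, cum, cum_pct in reversed(rows):
    print(f"{lower}-{upper}  f={f}  rel f={rel_f:.2f}  cum f={cum}  cum %={cum_pct:.0f}")
```

For these scores the intervals 10-19, 20-29, and 30-39 hold 3, 5, and 4 scores, so the cumulative frequencies are 3, 8, and 12, and the top interval reaches 100%.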
102
Relative Frequency –
f/N
103
The sample standard deviations would, on average, be a ___
bit too low.
104
For statistics to be correct, we need to make some ___.
assumptions
105
It is also like the mean in that it needs to make some assumptions about the shape of the distribution. * To calculate the SD, we must assume that we have a _____
normal distribution.
106
Don’t draw a bar chart for ___
continuous measures.
107
* The most frequent score in the distribution or the most common observation among a group of scores.
MODE
108
Properties of the Standard Deviation
1. The SD gives us a measure of dispersion relative to the mean. 2. The SD is sensitive to each score in the distribution. 3. Like the Mean, the SD is stable with regard to sampling fluctuations.
109
When presented in a table, the score values are listed in ____ with the lowest score value usually at the bottom of the table
rank order
110
Distributions can be of wrong shape for two reasons.
* First, because it is not symmetrical * Second, because it is not the characteristic bell shape
111
___ causes negative skew and is much less common in Psychology.
Ceiling Effect
112
There are ___in a variable – they are the three values that divide the variable into four groups.
three quartiles
113
___– the curve rises slowly and then decreases rapidly.
Negative Skew
114
___ – the curve rises rapidly and then drops off slowly.
Positive Skew
116
What to do when there are outliers.
1. See if you have made an error. 2. Check if any measurement that you took was carried out correctly.
117
Usually easily spotted in histograms. They are easy to spot, but deciding what to do with them can be much trickier.
OUTLIERS
118
How to find the IQR
  1. Scores are placed in rank order and counted.
  2. The half-way point is the median.
  3. The IQR is the distance between the quarter and three-quarters distance points.
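
A Python sketch of these steps, with hypothetical scores. The exact quarter and three-quarter points depend on the quartile convention; this example assumes the "inclusive" method of `statistics.quantiles`, and other software may give slightly different values.

```python
import statistics

scores = [12, 1, 7, 15, 4, 10, 3, 8, 6]  # hypothetical scores

ordered = sorted(scores)                  # step 1: rank order
# Steps 2-3: the three cut points are the quarter point, the median,
# and the three-quarter point.
q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")

iqr = q3 - q1
semi_iqr = iqr / 2                        # semi-inter-quartile range

print("quartiles:", q1, median, q3)
print("IQR:", iqr, "semi-IQR:", semi_iqr)
```

With nine ordered scores the cut points land on the 3rd, 5th, and 7th scores (4, 7, and 10), giving an IQR of 6 and a semi-inter-quartile range of 3.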
119
In a frequency distribution mode is very easy to see because it is the ___of the distribution
highest point
120
___ is like the mean, in that it takes all of the values in the dataset into account when it is calculated.
The Standard Deviation
121
____ are plotted on the horizontal axis such that each class bar begins and terminates at the real limits of the interval.
Class intervals
122
The majority would argue that these can be considered to be ___ and therefore it is OK to use the ___.
interval data; mean
123
It is the middle score in a set of scores. Used when the mean is not valid, which might be because the data are not symmetrically or normally distributed, or because the data are measured at an ordinal level.
MEDIAN
124
The median in a boxplot is represented with a ___
thick line.
125
The variance is not used much in descriptive statistics because it gives us squared units of measurement. However, it is used quite frequently in___
inferential statistics.
126
Don’t refer to the Normal Distribution as any of the following: ____
usual, regular, standard, or even distribution.
127
___ therefore means “add up all the scores in x”.
Σx
128
Options
1. Eliminate the point and carry on with the analysis. 2. If you keep the data point then it may well have a large effect on your analysis and you will analyze your data badly.
129
The sample standard deviation suffers from a problem – it is a biased estimator of the _____
population standard deviation.
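
A sketch of the two calculations in Python. It assumes, based on the related cards, that the "sample standard deviation" divides the sum of squared deviations by N while the unbiased version divides by N − 1; the function names and scores are hypothetical.

```python
import math


def sum_of_squares(scores):
    """Sum of squared deviations of the scores about their mean."""
    mean = sum(scores) / len(scores)
    return sum((x - mean) ** 2 for x in scores)


def sd_divide_by_n(scores):
    """Standard deviation with N in the denominator."""
    return math.sqrt(sum_of_squares(scores) / len(scores))


def sd_divide_by_n_minus_1(scores):
    """Standard deviation with N - 1 in the denominator."""
    return math.sqrt(sum_of_squares(scores) / (len(scores) - 1))


scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample
sd_n = sd_divide_by_n(scores)
sd_n_minus_1 = sd_divide_by_n_minus_1(scores)

print(sd_n, sd_n_minus_1)
print("variance (N):", sd_n ** 2)
```

Dividing by N gives 2.0 here, and dividing by N − 1 gives a slightly larger value (about 2.14), which compensates for the N version tending to come out a bit too low. Squaring either standard deviation gives the corresponding variance.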
130
The sum of the squared deviations of all the scores about their mean is a ___.
minimum
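
This property can be checked numerically. The sketch below uses hypothetical scores and compares the sum of squared deviations about the mean with the sum about a few other candidate values; it illustrates the property for one data set and is not a proof.

```python
def sum_of_squared_deviations(scores, centre):
    """Sum of squared distances of the scores from a chosen centre."""
    return sum((x - centre) ** 2 for x in scores)


scores = [2, 4, 6, 8, 10]  # hypothetical scores
mean = sum(scores) / len(scores)

for centre in (4, 5, mean, 7, 8):
    print(centre, sum_of_squared_deviations(scores, centre))
```

The total is 40 about the mean (6) and rises to 45 and 60 as the centre moves away in either direction.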
131
A very large number of ____ are normally distributed.
naturally occurring variables
132
While our assumptions will usually be broadly correct, they will never be ___ correct.
exactly
133
___: used to represent frequency distributions composed of interval or ratio data. A bar is drawn for each class interval.
Histogram
134
It is most commonly used to describe some aspect of a sample which___
does not need to be summarized with any degree of accuracy.