Lecture 2 (DESCRIPTIVE STATISTICS II) Flashcards

1
Q

MEASURES OF CENTRAL TENDENCY

A

Yield information about “particular places or locations in a group of numbers”.

2
Q

MODE

A

The most frequently occurring value in a data set.
Applicable to all levels of data measurement (nominal, ordinal, interval, and ratio).
Can be used to determine what categories occur most frequently.
BIMODAL: In a tie for the most frequently occurring value, two modes are listed.
MULTIMODAL: Data sets that contain more than two modes.
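
A minimal Python sketch of the definition above. The function name `modes` and the sample data are illustrative, not from the lecture. It returns every value tied for the highest count, so one result means a single mode, two results mean bimodal, and more mean multimodal.

```python
from collections import Counter

def modes(values):
    """Return every value tied for the highest frequency, in first-seen order."""
    counts = Counter(values)
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]

print(modes(["red", "blue", "red", "green"]))  # nominal data, one mode
print(modes([1, 2, 2, 3, 3]))                  # tie between two values: bimodal
```

Because it only counts occurrences, the same function works on category labels as well as on numbers.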

3
Q

MEDIAN

A

Middle value in an ordered array of numbers.
For an array with an odd number of terms, the median is the middle number.
For an array with an even number of terms the median is the average of the middle two numbers.
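
The odd and even cases can be sketched in Python as follows. The function name `median` and the sample values are assumptions made for illustration.

```python
def median(values):
    """Middle value of the ordered data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd n: the single middle term
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: average of the middle two

print(median([3, 1, 2]))     # odd number of terms
print(median([4, 1, 3, 2]))  # even number of terms
```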

4
Q

ARITHMETIC MEAN

A

Mean is the average of a group of numbers.
Applicable for interval and ratio data.
Not applicable for nominal or ordinal data.
Affected by each value in the data set, including extreme values.
Computed by summing all values in the data set and dividing the sum by the number of values in the data set.
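
A short Python sketch of the computation, using made-up numbers to show how a single extreme value pulls the mean. The name `arithmetic_mean` is illustrative.

```python
def arithmetic_mean(values):
    """Sum of all values divided by the number of values."""
    return sum(values) / len(values)

print(arithmetic_mean([10, 12, 14]))        # no extreme values
print(arithmetic_mean([10, 12, 14, 1000]))  # one extreme value shifts the mean sharply
```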

5
Q

Population mean

A

μ

6
Q

Sample mean

A

x̄ (x-bar)

7
Q

PERCENTILES

A

Measures of central tendency that divide a group of data into 100 parts.
At least n% of the data lie below the nth percentile, and at most (100-n)% of the data lie above the nth percentile.

8
Q

How to calculate percentiles

A

Organise data into ascending ordered array.
Calculate the percentile location: i = (P/100) * n, where P is the percentile of interest and n is the number of values.
Determine the percentile’s location and its value.
If i is a whole number, the percentile is the average of the values at the i and (i+1) positions.
If i is not a whole number, the percentile is the value at the position given by the whole-number part of i, plus 1.
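
The steps above can be sketched in Python. This is one reading of the rule, not a definitive implementation: it treats the non-whole case as "whole-number part of i, plus 1", assumes P is an integer from 1 to 99, and uses made-up data. The function name `percentile` is illustrative. Statistical libraries often use other interpolation rules and may give slightly different answers.

```python
def percentile(values, p):
    """Value at the p-th percentile (integer p, 1 to 99) using the location rule above."""
    ordered = sorted(values)
    n = len(ordered)
    # i = (p / 100) * n, kept in integer arithmetic to avoid float rounding
    whole, remainder = divmod(p * n, 100)
    if remainder == 0:
        # i is a whole number: average the values at 1-based positions i and i + 1
        return (ordered[whole - 1] + ordered[whole]) / 2
    # i is not whole: take the value at 1-based position (whole part of i) + 1
    return ordered[whole]

data = [14, 12, 19, 23, 5, 13, 28, 17]  # hypothetical sample, n = 8
print(percentile(data, 30))  # i = 2.4 -> value at the 3rd position
print(percentile(data, 25))  # i = 2   -> average of the 2nd and 3rd positions
```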

9
Q

QUARTILES

A

Measures of central tendency that divide a group of data into four subgroups.
Q1: 25% of the data set is below the first quartile.
Q2: 50% of the data set is below the second quartile.
Q3: 75% of the data set is below the third quartile.

10
Q

MEASURES OF VARIABILITY

A

Tools that describe the spread or the dispersion of a set of data.

11
Q

RANGE

A

The difference between the largest and the smallest values in a set of data.
ADVANTAGE: Easy to compute
DISADVANTAGE: Affected by extreme values

12
Q

INTERQUARTILE RANGE

A

Range of values between the first and third quartiles.
Range of the middle half; middle 50%
Useful when researchers are interested in the middle 50% and not the extremes.
Used in the construction of box-and-whisker plots.
IQR = Q3 - Q1

13
Q

Mean Absolute Deviation, Variance, and Standard Deviation

A

These measures are not meaningful unless the data are at least interval-level data.
One way for researchers to look at the spread of the data is to subtract the mean from each data value.
Subtracting the mean from each data value gives the deviation from the mean (X - μ).
An examination of the deviations from the mean can reveal information about the variability of the data.
The sum of the deviations from the arithmetic mean is always zero.

14
Q

ABSOLUTE DEVIATION

A

An obvious way to force the sum of deviations to have a nonzero total is to take the absolute value of each deviation around the mean.
Allows one to compute the Mean Absolute Deviation.

15
Q

MEAN ABSOLUTE DEVIATION

A

Average of the absolute deviations from the mean.

MAD = Σ|X - μ| / N
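
A small Python sketch of the calculation, on made-up data. It also prints the sum of the raw deviations to show why absolute values are needed: that sum is zero. The name `mean_absolute_deviation` is illustrative.

```python
def mean_absolute_deviation(values):
    """Average of the absolute deviations from the mean (population form, divides by N)."""
    mu = sum(values) / len(values)
    return sum(abs(x - mu) for x in values) / len(values)

data = [5, 9, 16, 17, 18]  # hypothetical population, mean = 13
mu = sum(data) / len(data)
print(sum(x - mu for x in data))      # raw deviations cancel out
print(mean_absolute_deviation(data))  # absolute deviations do not
```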

16
Q

POPULATION VARIANCE

A

Average of the squared deviations from the arithmetic mean.
σ^2 = Σ(X - μ)^2 / N

17
Q

SUM OF SQUARED DEVIATIONS

A

The sum of the squared deviations about the mean of a set of values: SSD = Σ(X - μ)^2

18
Q

SAMPLE VARIANCE

A

Sum of the squared deviations from the sample mean, divided by n - 1 rather than n.

S^2 = Σ(X - Xbar)^2 / (n - 1)

19
Q

SAMPLE STANDARD DEVIATION

A

The square root of the sample variance: S = √(S^2)
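
To tie the last few cards together, here is a minimal Python sketch, on made-up data, of the sum of squared deviations, the population variance (divide by N), the sample variance (divide by n - 1), and the sample standard deviation. All function names are illustrative.

```python
import math

def sum_squared_deviations(values):
    """Sum of (x - mean)^2 over all values."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values)

def population_variance(values):
    return sum_squared_deviations(values) / len(values)

def sample_variance(values):
    return sum_squared_deviations(values) / (len(values) - 1)

def sample_std_dev(values):
    return math.sqrt(sample_variance(values))

data = [5, 9, 16, 17, 18]  # hypothetical values
print(sum_squared_deviations(data))
print(population_variance(data))
print(sample_variance(data))
print(sample_std_dev(data))
```

The only difference between the two variances is the divisor, so the sample variance is always the larger of the two for the same data.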

20
Q

EMPIRICAL RULE

A

A guideline that states the approximate % of values that fall within a given number of standard deviations of a mean of a set of data that are normally distributed.
Distance from the mean: percentage of values falling within that distance
μ ± 1σ: 68%
μ ± 2σ: 95%
μ ± 3σ: 99.7%
Applies when data are approximately normally distributed.
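
As a sketch, the rule can be turned into concrete intervals for a given mean and standard deviation. The function name `empirical_intervals` and the example values (mean 100, standard deviation 15) are assumptions for illustration, and the percentages are approximations that hold only for roughly normal data.

```python
def empirical_intervals(mu, sigma):
    """Map each approximate percentage to its (low, high) interval around the mean."""
    return {
        68: (mu - 1 * sigma, mu + 1 * sigma),
        95: (mu - 2 * sigma, mu + 2 * sigma),
        99.7: (mu - 3 * sigma, mu + 3 * sigma),
    }

for pct, (low, high) in empirical_intervals(100, 15).items():
    print(f"about {pct}% of values lie between {low} and {high}")
```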

21
Q

CHEBYSHEV’S THEOREM

A

Applies to all distributions, and can be used whenever the shape of the data distribution is unknown or non-normal.
At least 1 - 1/k^2 of the values fall within ±k standard deviations of the mean, regardless of the shape of the distribution.
k is the number of standard deviations (k > 1).
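
A minimal Python sketch of the bound. The function name `chebyshev_min_proportion` is illustrative. The check for k > 1 reflects the fact that the formula gives zero or a negative number otherwise, which says nothing useful.

```python
def chebyshev_min_proportion(k):
    """Minimum proportion of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k**2

print(chebyshev_min_proportion(2))    # at least 75% within 2 standard deviations
print(chebyshev_min_proportion(2.5))
print(chebyshev_min_proportion(3))
```

For comparison, the bound at k = 2 is 75%, lower than the roughly 95% the empirical rule gives for normal data, because Chebyshev's theorem makes no assumption about shape.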

22
Q

Z-SCORES

A

Represents the number of standard deviations a value (x) is above or below the mean of a set of numbers when the data are normally distributed.
Allows the translation of a value’s raw distance from the mean into units of standard deviations.
z = (x - μ)/σ
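
A brief Python sketch of the formula with made-up numbers (mean 50, standard deviation 10). The name `z_score` is illustrative. A positive result means the value is above the mean, a negative result means it is below.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

print(z_score(70, 50, 10))  # above the mean
print(z_score(35, 50, 10))  # below the mean
```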

23
Q

COEFFICIENT OF VARIATION

A

Ratio of the standard deviation to the mean, expressed as a percentage.
Measurement of relative dispersion
CV = (σ/μ) * 100
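
A short Python sketch, with hypothetical figures, of why a relative measure is useful: data set B has the larger standard deviation, but relative to its mean it is the less dispersed of the two. The name `coefficient_of_variation` is illustrative.

```python
def coefficient_of_variation(sigma, mu):
    """Standard deviation as a percentage of the mean."""
    return sigma * 100 / mu

print(coefficient_of_variation(5, 50))    # data set A: sigma = 5, mean = 50
print(coefficient_of_variation(10, 200))  # data set B: sigma = 10, mean = 200
```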

24
Q

SYMMETRICAL

A

The right half is a mirror image of the left half

25
Q

SKEWNESS

A

Shows that the distribution lacks symmetry; used to denote that the data are sparse at one end and piled up at the other end.

26
Q

COEFFICIENT OF SKEWNESS

A

Compares the mean and median in light of the magnitude of the standard deviation; Md is the median and σ is the standard deviation.

Sk = 3(μ - Md) / σ
If Sk < 0, the distribution is negatively skewed (left).
If Sk = 0, the distribution is symmetric (not skewed).
If Sk > 0, the distribution is positively skewed (right).
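
The formula and its three cases can be sketched in Python. The function names and the input values are made up for illustration.

```python
def skewness_coefficient(mean, median, std_dev):
    """Sk = 3 * (mean - median) / standard deviation."""
    return 3 * (mean - median) / std_dev

def describe_skew(sk):
    if sk < 0:
        return "negatively skewed (left)"
    if sk > 0:
        return "positively skewed (right)"
    return "symmetric (not skewed)"

sk = skewness_coefficient(mean=10, median=12, std_dev=3)
print(sk, describe_skew(sk))  # mean below median -> negative Sk
```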

27
Q

Describe the distribution of the mean, median and mode when data is negatively skewed

A

Mean is lowest value, median is middle value, mode is highest value.

28
Q

Describe the distribution of the mean, median and mode when data is symmetric.

A

Mean, median and mode all have the same value.

29
Q

Describe the distribution of the mean, median and mode when data are positively skewed.

A

Mode is lowest, median is middle, mean is highest.

30
Q

Kurtosis

A

Describes the peakedness of a distribution.
LEPTOKURTIC: high and thin
MESOKURTIC: normal in shape
PLATYKURTIC: flat and spread out

31
Q

BOX AND WHISKER PLOT

A
Five specific values are used:
Median, Q2
First Quartile, Q1
Third Quartile, Q3
Minimum value in data set
Maximum value in data set.

INNER FENCES:
IQR = Q3 - Q1
Lower inner fence = Q1 - 1.5 IQR
Upper inner fence = Q3 + 1.5 IQR

OUTER FENCES:
Lower outer fence = Q1 - 3.0 IQR
Upper outer fence = Q3 + 3.0 IQR
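
A minimal Python sketch of the fence calculations, taking Q1 and Q3 as given. Lower fences subtract the multiple of the IQR from Q1 and upper fences add it to Q3. The function name `fences` and the quartile values are made up. The labels "mild" and "extreme" for values beyond the inner and outer fences are a common convention assumed here, not something stated on the card.

```python
def fences(q1, q3):
    """Return the (lower, upper) inner and outer fences for a box-and-whisker plot."""
    iqr = q3 - q1
    return {
        "inner": (q1 - 1.5 * iqr, q3 + 1.5 * iqr),
        "outer": (q1 - 3.0 * iqr, q3 + 3.0 * iqr),
    }

def classify(x, q1, q3):
    """Label a value by where it falls relative to the fences (assumed convention)."""
    f = fences(q1, q3)
    if x < f["outer"][0] or x > f["outer"][1]:
        return "extreme outlier"
    if x < f["inner"][0] or x > f["inner"][1]:
        return "mild outlier"
    return "not an outlier"

print(fences(12.5, 21.0))        # hypothetical Q1 and Q3, IQR = 8.5
print(classify(40, 12.5, 21.0))  # beyond the inner fence, inside the outer fence
```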