Lecture 2 (DESCRIPTIVE STATISTICS II) Flashcards

Question 1

Q

MEASURES OF CENTRAL TENDENCY

Answer

A

Yield information about “particular places or locations in a group of numbers”.

Question 2

Q

MODE

Answer

A

The most frequently occurring value in a data set.
Applicable to all levels of data measurement (nominal, ordinal, interval, and ratio).
Can be used to determine what categories occur most frequently.
BIMODAL: In a tie for the most frequently occurring value, two modes are listed.
MULTIMODAL: Data sets that contain more than two modes.
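
A minimal Python sketch of finding the mode, assuming small hypothetical data sets. It uses `statistics.multimode` from the standard library, which returns every value tied for the highest frequency, so bimodal data yield two modes. The second call shows that the mode also works for nominal categories.

```python
from statistics import multimode

# Hypothetical numeric data with a tie for the most frequent value (bimodal).
scores = [2, 3, 3, 5, 7, 7, 9]
score_modes = multimode(scores)

# Hypothetical nominal data: the mode is the most frequent category.
colours = ["red", "blue", "red", "green"]
colour_modes = multimode(colours)

print(score_modes, colour_modes)
```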

Question 3

Q

MEDIAN

Answer

A

Middle value in an ordered array of numbers.
For an array with an odd number of terms, the median is the middle number.
For an array with an even number of terms, the median is the average of the middle two numbers.
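
The odd and even cases above can be sketched in Python as follows. The function name `median_of` and the sample values are illustrative assumptions, not part of the lecture.

```python
def median_of(values):
    """Return the median of a non-empty list of numbers."""
    ordered = sorted(values)      # the median is defined on the ordered array
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]       # odd count: the single middle value
    # even count: average of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median_of([5, 1, 3]))
print(median_of([4, 1, 3, 2]))
```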

Question 4

Q

ARITHMETIC MEAN

Answer

A

Mean is the average of a group of numbers.
Applicable for interval and ratio data.
Not applicable for nominal or ordinal data.
Affected by each value in the data set, including extreme values.
Computed by summing all values in the data set and dividing the sum by the number of values in the data set.
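
A short Python sketch of the computation, assuming hypothetical values. It also illustrates how a single extreme value pulls the mean upward.

```python
def arithmetic_mean(values):
    """Sum all values and divide by the number of values."""
    return sum(values) / len(values)

typical = [10, 12, 14]            # hypothetical data
with_extreme = typical + [100]    # same data plus one extreme value

print(arithmetic_mean(typical))
print(arithmetic_mean(with_extreme))
```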

Question 5

Q

Population mean

Answer

A

μ = (ΣX) / N, where N is the number of values in the population.

Question 6

Q

Sample mean

Answer

A

Xbar = (ΣX) / n, where n is the number of values in the sample.

Question 7

Q

PERCENTILES

Answer

A

Measures of central tendency that divide a group of data into 100 parts.
At least n% of the data lie below the nth percentile, and at most (100-n)% of the data lie above the nth percentile.

Question 8

Q

How to calculate percentiles

Answer

A

Organise data into ascending ordered array.
Calculate the percentile location i= (P/100)*n
Determine the percentile’s location and its value.
If i is a whole number, the percentile is the average of the values at the i and (i+1) positions.
If i is not a whole number, the percentile is the value at the position given by the whole-number part of i, plus 1, in the ordered array.
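
The steps above can be sketched in Python as follows. This follows the location rule on this card only. Statistical libraries often use other interpolation methods and may return slightly different values. The function name `percentile` and the data are illustrative assumptions.

```python
import math

def percentile(values, p):
    """Return the p-th percentile using the location rule i = (P/100) * n."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(values)      # step 1: ascending ordered array
    n = len(ordered)
    i = p * n / 100               # step 2: percentile location
    if i == math.floor(i):
        i = int(i)
        # whole number: average the values at positions i and i + 1 (1-based)
        return (ordered[i - 1] + ordered[i]) / 2
    # not whole: value at position (whole part of i) + 1, i.e. 0-based index floor(i)
    return ordered[math.floor(i)]

data = [14, 12, 19, 23, 5, 13, 28, 17]   # hypothetical data, n = 8
p30 = percentile(data, 30)   # i = 2.4, so the 3rd ordered value is used
p25 = percentile(data, 25)   # i = 2, so the 2nd and 3rd ordered values are averaged
```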

Question 9

Q

QUARTILES

Answer

A

Measures of central tendency that divide a group of data into four subgroups.
Q1: 25% of the data set is below the first quartile.
Q2: 50% of the data set is below the second quartile.
Q3: 75% of the data set is below the third quartile.

Question 10

Q

MEASURES OF VARIABILITY

Answer

A

Tools that describe the spread or the dispersion of a set of data.

Question 11

Q

RANGE

Answer

A

The difference between the largest and the smallest values in a set of data.
ADVANTAGE: Easy to compute.
DISADVANTAGE: Affected by extreme values.

Question 12

Q

INTERQUARTILE RANGE

Answer

A

Range of values between the first and third quartiles.
Range of the middle half; middle 50%
Useful when researchers are interested in the middle 50% and not the extremes.
Used in the construction of box-and-whisker plots.
IQR = Q3 - Q1
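
A hedged Python sketch of the interquartile range, assuming hypothetical data. It uses `statistics.quantiles` with its default method, which interpolates between ordered values. The quartiles it returns can therefore differ slightly from those found with a percentile-location rule.

```python
from statistics import quantiles

data = [14, 12, 19, 23, 5, 13, 28, 17]   # hypothetical data

# n=4 splits the data into four groups and returns the three cut points.
q1, q2, q3 = quantiles(data, n=4)
iqr = q3 - q1                             # range of the middle 50%

print(q1, q2, q3, iqr)
```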

Question 13

Q

Mean Absolute Deviation, variance, and Standard Deviation

Answer

A

These measures are not meaningful unless the data are at least interval-level data.
One way for researchers to look at the spread of the data is to subtract the mean from each data value.
Subtracting the mean from each data value gives the deviation from the mean (X - μ).
An examination of deviations from the mean can reveal information about the variability of data.
The sum of the deviations from the arithmetic mean is always zero.

Question 14

Q

ABSOLUTE DEVIATION

Answer

A

An obvious way to force the sum of deviations to have a nonzero total is to take the absolute value of each deviation around the mean.
Allows one to solve for the Mean Absolute Deviation.

Question 15

Q

MEAN ABSOLUTE DEVIATION

Answer

A

Average of the absolute deviations from the mean.

MAD = (Σ|X - μ|) / N
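
A small Python sketch, assuming a hypothetical population of five values. It shows that the raw deviations sum to zero while the absolute deviations do not. The function name is an illustrative choice.

```python
def mean_absolute_deviation(values):
    """Average of the absolute deviations from the mean of a population."""
    n = len(values)
    mu = sum(values) / n
    return sum(abs(x - mu) for x in values) / n

data = [5, 9, 16, 17, 18]                  # hypothetical population, mean 13
mu = sum(data) / len(data)
deviations = [x - mu for x in data]        # -8, -4, 3, 4, 5
deviation_sum = sum(deviations)            # raw deviations cancel out
mad = mean_absolute_deviation(data)        # absolute deviations total 24
```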

Question 16

Q

POPULATION VARIANCE

Answer

A

Average of the squared deviations from the arithmetic mean.

σ^2 = (Σ(X - μ)^2) / N

Question 17

Q

SUM OF SQUARED DEVIATIONS

Answer

A

Sum of the squared deviations (SSD) about the mean of a set of values.

SSD = Σ(X - μ)^2

Question 18

Q

SAMPLE VARIANCE

Answer

A

Sum of the squared deviations from the sample mean, divided by n - 1.

S^2 = (Σ(X - Xbar)^2) / (n - 1)

Question 19

Q

SAMPLE STANDARD DEVIATION

Answer

A

Is the square root of the sample variance.
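
The population variance, sample variance, and sample standard deviation can be compared in a short Python sketch. The data and function names are illustrative assumptions. The only difference between the two variances is the divisor: N for a population and n - 1 for a sample.

```python
import math

def sum_of_squared_deviations(values):
    """Sum of squared deviations about the mean."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values)

def population_variance(values):
    return sum_of_squared_deviations(values) / len(values)

def sample_variance(values):
    return sum_of_squared_deviations(values) / (len(values) - 1)

def sample_std_dev(values):
    return math.sqrt(sample_variance(values))

data = [5, 9, 16, 17, 18]   # hypothetical values, mean 13, squared deviations total 130
print(population_variance(data), sample_variance(data), sample_std_dev(data))
```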

Question 20

Q

EMPIRICAL RULE

Answer

A

A guideline that states the approximate % of values that fall within a given number of standard deviations of a mean of a set of data that are normally distributed.
Distance from the mean: percentage of values falling within that distance
μ +/- 1σ: 68%
μ +/- 2σ: 95%
μ +/- 3σ: 99.7%
Applies when data are approximately normally distributed.
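
As a rough check of these percentages, the sketch below computes the share of a normal distribution lying within k standard deviations of the mean. It uses `statistics.NormalDist` from the Python standard library. The helper name `share_within` is an assumption for illustration.

```python
from statistics import NormalDist

def share_within(k, mu=0.0, sigma=1.0):
    """Proportion of a normal distribution within mu +/- k * sigma."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

for k in (1, 2, 3):
    print(k, round(share_within(k) * 100, 1))
```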

Question 21

Q

CHEBYSHEV’S THEOREM

Answer

A

Applies to all distributions and can be used whenever the shape of the data distribution is unknown or non-normal.
At least 1 - 1/k^2 of the values fall within +/- k standard deviations of the mean, regardless of the shape of the distribution (for k > 1).
k is the number of standard deviations.
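
A minimal Python sketch of the bound, with an illustrative function name. It rejects k <= 1 because the bound is then zero or negative and carries no information.

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k ** 2

bound_2 = chebyshev_lower_bound(2)   # at least 75% within 2 standard deviations
bound_3 = chebyshev_lower_bound(3)   # at least about 88.9% within 3
```

Compared with the empirical rule (95% within two standard deviations for normal data), the Chebyshev bound is looser because it assumes nothing about the shape of the distribution.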

Question 22

Q

Z-SCORES

Answer

A

Represents the number of standard deviations a value (x) is above or below the mean of a set of numbers when the data are normally distributed.
Allows the translation of a value’s raw distance from the mean into units of standard deviation.
z = (x - μ)/σ
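
A short Python sketch of the formula, assuming a hypothetical population with mean 50 and standard deviation 10.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

above = z_score(70, mu=50, sigma=10)   # 20 units above the mean
below = z_score(35, mu=50, sigma=10)   # 15 units below the mean
```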

Question 23

Q

COEFFICIENT OF VARIATION

Answer

A

Ratio of the standard deviation to the mean, expressed as a percentage.
Measurement of relative dispersion.
CV = (σ/μ) * 100
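
A Python sketch with two hypothetical data sets, described only by their means and standard deviations. It shows why the coefficient of variation is useful: the second set has the larger standard deviation but the smaller relative dispersion.

```python
def coefficient_of_variation(mu, sigma):
    """Standard deviation as a percentage of the mean."""
    return sigma / mu * 100

cv_a = coefficient_of_variation(mu=50, sigma=5)     # hypothetical set A
cv_b = coefficient_of_variation(mu=200, sigma=10)   # hypothetical set B

print(cv_a, cv_b)
```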

Question 24

Q

SYMMETRICAL

Answer

A

The right half is a mirror image of the left half

Question 25

Q

SKEWNESS

Answer

A

Shows that the distribution lacks symmetry; used to denote that the data are sparse at one end and piled up at the other end.

Question 26

Q

COEFFICIENT OF SKEWNESS

Answer

A

Compares the mean and median in light of the magnitude of the standard deviation; Md is the median; σ is the standard deviation.

Sk = (3(μ - Md)) / σ
If Sk < 0, the distribution is negatively skewed (left).
If Sk = 0, the distribution is symmetric (not skewed).
If Sk > 0, the distribution is positively skewed (right).
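
The formula and its three cases can be sketched in Python as follows. The function names and summary statistics are hypothetical.

```python
def coefficient_of_skewness(mu, median, sigma):
    """Sk = 3 * (mean - median) / standard deviation."""
    return 3 * (mu - median) / sigma

def describe_skew(sk):
    if sk < 0:
        return "negatively skewed (left)"
    if sk > 0:
        return "positively skewed (right)"
    return "symmetric"

sk_right = coefficient_of_skewness(mu=30, median=27, sigma=9)   # mean above median
sk_left = coefficient_of_skewness(mu=24, median=27, sigma=9)    # mean below median

print(describe_skew(sk_right), describe_skew(sk_left))
```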

Question 27

Q

Describe the distribution of the mean, median and mode when data is negatively skewed

Answer

A

Mean is lowest value, median is middle value, mode is highest value.

Question 28

Q

Describe the distribution of the mean, median and mode when data is symmetric.

Answer

A

Mean, mode and median all have the same value.

Question 29

Q

Describe the distribution of the mean, median and mode when data are positively skewed.

Answer

A

Mode is lowest, median is middle, mean is highest.

Question 30

Q

Kurtosis

Answer

A

Describes the amount of peakedness of a distribution.
LEPTOKURTIC: high and thin
MESOKURTIC: normal in shape
PLATYKURTIC: flat and spread out

Question 31

Q

BOX AND WHISKER PLOT

Answer

A

Five specific values are used:
Median, Q2
First Quartile, Q1
Third Quartile, Q3
Minimum value in data set
Maximum value in data set.

INNER FENCES:
IQR = Q3 - Q1
Lower inner fence = Q1 - 1.5 IQR
Upper inner fence = Q3 + 1.5 IQR

OUTER FENCES:
Lower outer fence = Q1 - 3.0 IQR
Upper outer fence = Q3 + 3.0 IQR
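
A minimal Python sketch of the fence calculations, assuming hypothetical quartiles Q1 = 10 and Q3 = 20. Lower fences subtract a multiple of the IQR from Q1, and upper fences add the same multiple to Q3. The function name and dictionary keys are illustrative.

```python
def box_plot_fences(q1, q3):
    """Return the inner and outer fences for a box-and-whisker plot."""
    iqr = q3 - q1
    return {
        "lower_inner": q1 - 1.5 * iqr,
        "upper_inner": q3 + 1.5 * iqr,
        "lower_outer": q1 - 3.0 * iqr,
        "upper_outer": q3 + 3.0 * iqr,
    }

fences = box_plot_fences(q1=10, q3=20)   # IQR = 10
print(fences)
```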