Lecture 2 (DESCRIPTIVE STATISTICS II) Flashcards by

MEASURES OF CENTRAL TENDENCY

Yield information about “particular places or locations in a group of numbers”.

MODE

The most frequently occurring value in a data set.
Applicable to all levels of data measurement (nominal, ordinal, interval, and ratio)
Can be used to determine what categories occur most frequently.
BIMODAL : In a tie for the most frequently occurring value, two modes are listed.
MULTIMODAL: Data sets that contain more than two modes.
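
A minimal Python sketch of finding the mode(s), assuming a tie is reported by listing every tied value. The helper name `modes` and the sample data are illustrative and not part of the lecture.

```python
from collections import Counter

def modes(values):
    """Return every value tied for the highest frequency, in first-seen order."""
    counts = Counter(values)
    top = max(counts.values())  # assumes values is not empty
    return [v for v, c in counts.items() if c == top]

# Works for nominal data (categories) as well as numbers.
print(modes(["red", "blue", "red", "green"]))  # ['red']
print(modes([1, 2, 2, 3, 3]))                  # [2, 3] (bimodal)
```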

MEDIAN

Middle value in an ordered array of numbers.
For an array with an odd number of terms, the median is the middle number.
For an array with an even number of terms, the median is the average of the middle two numbers.
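
The odd and even cases can be sketched in Python as follows. The function name `median` and the sample values are illustrative assumptions.

```python
def median(values):
    """Middle value of the ordered array; average of the middle two if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([7, 3, 9]))     # 7   (odd number of terms)
print(median([7, 3, 9, 1]))  # 5.0 (even: average of 3 and 7)
```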

ARITHMETIC MEAN

Mean is the average of a group of numbers.
Applicable for interval and ratio data.
Not applicable for nominal or ordinal data.
Affected by each value in the data set, including extreme values.
Computed by summing all values in the data set and dividing the sum by the number of values in the data set.
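
A small Python sketch of the computation, with made-up numbers chosen to show how a single extreme value pulls the mean. The names `arithmetic_mean`, `salaries` and `with_outlier` are illustrative.

```python
def arithmetic_mean(values):
    """Sum of all values divided by the number of values."""
    return sum(values) / len(values)

salaries = [30, 32, 35, 38, 40]   # hypothetical data
with_outlier = salaries + [400]   # same data plus one extreme value

print(arithmetic_mean(salaries))      # 35.0
print(arithmetic_mean(with_outlier))  # about 95.83
```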

Population mean

μ = (ΣX)/N

Sample mean

x̄ (x bar) = (Σx)/n

PERCENTILES

Measures of central tendency that divide a group of data into 100 parts.
At least n% of the data lie below the nth percentile, and at most (100-n)% of the data lie above the nth percentile.

How to calculate percentiles

Organise the data into an ascending ordered array.
Calculate the percentile location: i = (P/100) * n, where P is the percentile of interest and n is the number of values.
Determine the percentile’s location and its value:
If i is a whole number, the percentile is the average of the values at the i and (i+1) positions.
If i is not a whole number, the percentile is at the position given by the whole-number part of (i+1) in the ordered array.
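
The steps above can be sketched in Python. This is one reading of the rule, assuming P is an integer from 1 to 99 and that "the (i+1) position" means the whole-number part of i+1. The function name and the data are illustrative.

```python
def percentile(values, p):
    """Percentile by the location rule i = (p/100) * n, with p an integer in 1..99."""
    ordered = sorted(values)
    n = len(ordered)
    # Integer arithmetic avoids rounding error: i = whole + remainder/100.
    whole, remainder = divmod(p * n, 100)
    if remainder == 0:
        # i is a whole number: average the values at 1-based positions i and i+1.
        return (ordered[whole - 1] + ordered[whole]) / 2
    # i is not whole: take the 1-based position floor(i) + 1, i.e. index floor(i).
    return ordered[whole]

data = [14, 12, 19, 23, 5, 13, 28, 17]  # hypothetical data, n = 8
print(percentile(data, 30))  # 13   (i = 2.4, so the 3rd ordered value)
print(percentile(data, 50))  # 15.5 (i = 4, average of the 4th and 5th values)
```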

QUARTILES

Measures of central tendency that divide a group of data into four subgroups.
Q1: 25% of the data set is below the first quartile.
Q2: 50% of the data set is below the second quartile.
Q3: 75% of the data set is below the third quartile.

MEASURES OF VARIABILITY

Tools that describe the spread or the dispersion of a set of data.

RANGE

The difference between the largest and the smallest values in a set of data.
ADVANTAGE: Easy to compute.
DISADVANTAGE: Affected by extreme values.

INTERQUARTILE RANGE

Range of values between the first and third quartiles.
Range of the middle half; middle 50%
Useful when researchers are interested in the middle 50% and not the extremes.
Used in the construction of box-and-whisker plots.
IQR = Q3 - Q1
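
A hedged Python sketch of the interquartile range, assuming Q1 and Q3 are taken as the 25th and 75th percentiles under the percentile location rule (other quartile conventions give slightly different values). All names and data are illustrative.

```python
def percentile(values, p):
    """Percentile by the location rule i = (p/100) * n, with p an integer in 1..99."""
    ordered = sorted(values)
    whole, remainder = divmod(p * len(ordered), 100)
    if remainder == 0:
        return (ordered[whole - 1] + ordered[whole]) / 2
    return ordered[whole]

def interquartile_range(values):
    """Return (Q1, Q3, IQR), where IQR = Q3 - Q1."""
    q1 = percentile(values, 25)
    q3 = percentile(values, 75)
    return q1, q3, q3 - q1

data = [14, 12, 19, 23, 5, 13, 28, 17]  # hypothetical data
q1, q3, iqr = interquartile_range(data)
print(q1, q3, iqr)  # 12.5 21.0 8.5
```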

Mean Absolute Deviation, Variance, and Standard Deviation

These measures are not meaningful unless the data are at least interval-level data.
One way for researchers to look at the spread of the data is to subtract the mean from each data value.
Subtracting the mean from each data value gives the deviation from the mean (X - μ).
An examination of the deviations from the mean can reveal information about the variability of the data.
The sum of the deviations from the arithmetic mean is always zero.
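
A quick Python check of the zero-sum property on made-up data. The helper name `deviations` is illustrative. With floating-point data the sum may be only approximately zero.

```python
def deviations(values):
    """Deviation of each value from the arithmetic mean (X - mu)."""
    mean = sum(values) / len(values)
    return [x - mean for x in values]

data = [2, 4, 6, 8, 10]       # hypothetical data, mean = 6
print(deviations(data))       # [-4.0, -2.0, 0.0, 2.0, 4.0]
print(sum(deviations(data)))  # 0.0
```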

ABSOLUTE DEVIATION

An obvious way to force the sum of deviations to have a nonzero total is to take the absolute value of each deviation around the mean.
Allows one to compute the Mean Absolute Deviation.

MEAN ABSOLUTE DEVIATION

Average of the absolute deviations from the mean.

MAD = (Σ|X - μ|)/N
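
A short Python sketch of the Mean Absolute Deviation, treating the data as a whole population (divide by N). The function name and the data are illustrative.

```python
def mean_absolute_deviation(values):
    """Average of the absolute deviations from the mean."""
    n = len(values)
    mean = sum(values) / n
    return sum(abs(x - mean) for x in values) / n

data = [2, 4, 6, 8, 10]  # hypothetical data; absolute deviations are 4, 2, 0, 2, 4
print(mean_absolute_deviation(data))  # 2.4
```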

POPULATION VARIANCE

Average of the squared deviations from the arithmetic mean.
σ^2 = (Σ(X - μ)^2)/N

SUM OF SQUARED DEVIATIONS

Sum of the squared deviations (SSD) about the mean of a set of values: Σ(X - μ)^2

SAMPLE VARIANCE

Sum of the squared deviations from the sample mean, divided by n - 1 rather than n.

s^2 = (Σ(x - x̄)^2) / (n - 1)

SAMPLE STANDARD DEVIATION

The square root of the sample variance: s = √(s^2)
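
The variance and standard deviation cards can be compared side by side in a small Python sketch. The function names and the data are illustrative; the only difference between the population and sample versions is the divisor (N versus n - 1).

```python
import math

def sum_of_squared_deviations(values):
    """SSD: sum of (x - mean)^2 over all values."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values)

def population_variance(values):
    """SSD divided by N."""
    return sum_of_squared_deviations(values) / len(values)

def sample_variance(values):
    """SSD divided by n - 1 (needs at least two values)."""
    return sum_of_squared_deviations(values) / (len(values) - 1)

def sample_standard_deviation(values):
    """Square root of the sample variance."""
    return math.sqrt(sample_variance(values))

data = [2, 4, 6, 8, 10]  # hypothetical data
print(sum_of_squared_deviations(data))  # 40.0
print(population_variance(data))        # 8.0
print(sample_variance(data))            # 10.0
print(sample_standard_deviation(data))  # about 3.162
```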

EMPIRICAL RULE

A guideline that states the approximate percentage of values that fall within a given number of standard deviations of the mean of a set of data that are normally distributed.
Distance from the mean: μ ± 1σ. Percentage of values falling within distance: 68
Distance from the mean: μ ± 2σ. Percentage of values falling within distance: 95
Distance from the mean: μ ± 3σ. Percentage of values falling within distance: 99.7
Applies when data are approximately normally distributed.
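
As an illustration, the three intervals can be computed in Python for a given mean and standard deviation. The function name and the values μ = 100, σ = 15 are made up for the example, and the percentages hold only approximately and only for roughly normal data.

```python
def empirical_rule_intervals(mu, sigma):
    """Map each approximate percentage to its (low, high) interval around mu."""
    percentages = {1: 68, 2: 95, 3: 99.7}  # k standard deviations -> approx. %
    return {pct: (mu - k * sigma, mu + k * sigma) for k, pct in percentages.items()}

for pct, (low, high) in empirical_rule_intervals(100, 15).items():
    print(f"about {pct}% of values lie between {low} and {high}")
# about 68% of values lie between 85 and 115
# about 95% of values lie between 70 and 130
# about 99.7% of values lie between 55 and 145
```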

CHEBYSHEV’S THEOREM

Applies to all distributions, so it can be used whenever the shape of the data distribution is unknown or non-normal.
At least 1 - 1/k^2 of the values fall within ±k standard deviations of the mean, regardless of the shape of the distribution.
k is the number of standard deviations (k > 1).
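
A minimal Python sketch of the bound, assuming k > 1 (for k ≤ 1 the formula gives no useful information). The function name is illustrative.

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem is informative only for k > 1")
    return 1 - 1 / k ** 2

print(chebyshev_lower_bound(2))  # 0.75, so at least 75% within 2 standard deviations
print(chebyshev_lower_bound(3))  # about 0.889
```

Compared with the empirical rule's 95% within two standard deviations, the 75% here is a weaker guarantee, which is the price of making no assumption about the distribution's shape.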

Z-SCORES

Represents the number of standard deviations a value (x) is above or below the mean of a set of numbers when the data are normally distributed.
Allows the translation of a value’s raw distance from the mean into units of standard deviations.
z = (x - μ)/σ
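
A short Python sketch of the z-score formula, with made-up values for the mean and standard deviation. The function name is illustrative.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

print(z_score(70, 50, 10))  # 2.0  (two standard deviations above the mean)
print(z_score(35, 50, 10))  # -1.5 (one and a half below)
```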

COEFFICIENT OF VARIATION

Ratio of the standard deviation to the mean, expressed as a percentage.
Measurement of relative dispersion
CV = (σ/μ) * 100
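
A brief Python sketch with made-up figures, showing why the coefficient of variation is a relative measure: the same standard deviation gives a different CV when the mean differs. The function name is illustrative.

```python
def coefficient_of_variation(sigma, mu):
    """Standard deviation as a percentage of the mean (mu must be nonzero)."""
    return 100 * sigma / mu

print(coefficient_of_variation(5, 50))   # 10.0
print(coefficient_of_variation(5, 200))  # 2.5
```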

SYMMETRICAL

The right half is a mirror image of the left half

SKEWNESS

Shows that the distribution lacks symmetry; used to denote that the data are sparse at one end and piled up at the other end.

COEFFICIENT OF SKEWNESS

Compares the mean and median in light of the magnitude of the standard deviation.
Sk = 3(μ - Md) / σ
Md is the median; σ is the standard deviation.
If Sk < 0, the distribution is negatively skewed (left).
If Sk = 0, the distribution is symmetric (not skewed).
If Sk > 0, the distribution is positively skewed (right).
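
The coefficient and its three cases can be sketched in Python. The function names and the input values are illustrative assumptions.

```python
def coefficient_of_skewness(mean, median, std_dev):
    """Sk = 3 * (mean - median) / standard deviation."""
    return 3 * (mean - median) / std_dev

def describe_skew(sk):
    """Translate the sign of Sk into the direction of skew."""
    if sk < 0:
        return "negatively skewed (left)"
    if sk > 0:
        return "positively skewed (right)"
    return "symmetric (not skewed)"

sk = coefficient_of_skewness(50, 45, 10)  # hypothetical mean, median, std dev
print(sk, describe_skew(sk))              # 1.5 positively skewed (right)
```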

Describe the distribution of the mean, median and mode when data are negatively skewed.

Mean is lowest value, median is middle value, mode is highest value.

Describe the distribution of the mean, median and mode when data are symmetric.

Mean, mode and median all have the same value.

Describe the distribution of the mean, median and mode when data are positively skewed.

Mode is lowest, median is middle, mean is highest.

Kurtosis

Peakedness of a distribution.
LEPTOKURTIC: high and thin
MESOKURTIC: normal in shape
PLATYKURTIC: flat and spread out

BOX AND WHISKER PLOT

Five specific values are used:
Median, Q2
First Quartile, Q1
Third Quartile, Q3
Minimum value in data set
Maximum value in data set
INNER FENCES:
IQR = Q3 - Q1
Lower inner fence = Q1 - 1.5 IQR
Upper inner fence = Q3 + 1.5 IQR
OUTER FENCES:
Lower outer fence = Q1 - 3.0 IQR
Upper outer fence = Q3 + 3.0 IQR
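
A minimal Python sketch of the fence calculations, taking Q1 and Q3 as already known. The function name `fences` and the quartile values are illustrative assumptions.

```python
def fences(q1, q3):
    """Inner fences at 1.5 * IQR and outer fences at 3.0 * IQR beyond the quartiles."""
    iqr = q3 - q1
    return {
        "lower_inner": q1 - 1.5 * iqr,
        "upper_inner": q3 + 1.5 * iqr,
        "lower_outer": q1 - 3.0 * iqr,
        "upper_outer": q3 + 3.0 * iqr,
    }

# Hypothetical quartiles: Q1 = 12.5, Q3 = 21.0, so IQR = 8.5
for name, value in fences(12.5, 21.0).items():
    print(name, value)
# lower_inner -0.25
# upper_inner 33.75
# lower_outer -13.0
# upper_outer 46.5
```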
