Ch4: Central Tendency and Variables Flashcards by Ava Cervas

4.1: Central Tendency

Central tendency

…the descriptive statistic that best represents the center of a data set, the particular value that all the other data seem to be gathering around; the “typical” score
Usually at (or near) the highest point of the histogram or frequency polygon
Expressed in three different ways: mean, median, mode

The mean expressed by symbolic notation

Mean

We need to understand only a handful of symbols to express the ideas necessary to understanding stats

Several symbols can represent the mean:
* M: the mean of a sample (the symbol on the left side of the formula)
* X: a single score
* μ: the mean of a population
* Σ: "sum of"; ΣX is the sum of all the single scores (X)
* N: total number of scores

The numbers based on samples taken from a population are called…

Mean

Statistics
E.g., M is a statistic

The numbers based on whole populations are called…

Mean

Parameters
E.g., μ is a parameter

Steps to calculating the mean:

Step 1: add up all the scores in the sample. In statistical notation, this is ΣX
Step 2: divide the total of all the scores by the number of scores
– The total number of scores in a sample is typically represented by N
– The full equation would be: M = ΣX / N
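
A minimal Python sketch of the two steps, assuming the scores are a plain list of numbers. The function name `mean` and the example scores are illustrative, not from the text.

```python
def mean(scores):
    """Return the mean M = ΣX / N of a list of scores."""
    if not scores:
        raise ValueError("need at least one score")
    sum_x = sum(scores)   # Step 1: ΣX, add up all the scores
    n = len(scores)       # N, the total number of scores
    return sum_x / n      # Step 2: divide ΣX by N


scores = [4, 8, 6, 2, 5]  # hypothetical sample
print(mean(scores))       # 25 / 5 = 5.0
```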

Median (Mdn)

the middle score of all the scores in a sample when the scores are arranged in ascending order; AKA 50th percentile

How to determine median

Step 1: line up all the scores in ascending order
Step 2: find the middle score.
With an odd number of scores, there will be an actual middle score.
With an even number of scores, there will be no actual middle score
In this case, calculate the mean of the two middle scores
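
The same steps as a short Python sketch (the function name `median` and the example scores are illustrative). It handles both the odd case (an actual middle score) and the even case (mean of the two middle scores).

```python
def median(scores):
    """Return the median (50th percentile) of a list of scores."""
    if not scores:
        raise ValueError("need at least one score")
    ordered = sorted(scores)  # Step 1: line up the scores in ascending order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]   # odd N: there is an actual middle score
    # even N: no actual middle score, so average the two middle scores
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([7, 1, 3]))      # ordered 1, 3, 7 -> 3
print(median([7, 1, 3, 10]))  # ordered 1, 3, 7, 10 -> (3 + 7) / 2 = 5.0
```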

Mode

The most common score of all the scores in a sample
Has neither a symbol nor an abbreviation
Mode can be used with scale data, but is more commonly used with nominal data
EX: maps based on census data that showed how residents of England and Wales typically commute to work

WHAT DO WE CALL EACH CASE?

When there is more than one mode, whether a single score or an interval, we report both, or all, of the most common scores

1 mode
2 modes
> 2 modes

When a distribution of scores has one mode, we refer to it as: unimodal
When a distribution has two modes, we call it: bimodal
When a distribution has more than two modes, we call it: multimodal
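
A sketch of finding every mode and naming the case, using Python's `collections.Counter`. The helper names `modes` and `modality` and the commute answers are hypothetical. Because it only counts values, it works for nominal data as well as scale data.

```python
from collections import Counter


def modes(scores):
    """Return every most common score (there can be more than one)."""
    if not scores:
        raise ValueError("need at least one score")
    counts = Counter(scores)
    top = max(counts.values())
    return [score for score, count in counts.items() if count == top]


def modality(scores):
    """Label a distribution by how many modes it has."""
    k = len(modes(scores))
    if k == 1:
        return "unimodal"
    if k == 2:
        return "bimodal"
    return "multimodal"


commute = ["car", "bus", "car", "walk", "car", "bus"]  # hypothetical nominal data
print(modes(commute), modality(commute))  # ['car'] unimodal
print(modality([1, 1, 2, 2, 3]))          # bimodal
print(modality([1, 1, 2, 2, 3, 3, 4]))    # multimodal
```

One design caveat: if every score occurs equally often (for example, each score appears once), this sketch reports all of them as modes, even though such a mode tells you little about the data.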

How outliers affect measures of central tendency:

Mean - greatly affected by outliers: extreme scores that are either very high or very low in comparison with the other scores
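
A small demonstration of this effect using Python's standard `statistics` module. The scores are made up for illustration.

```python
from statistics import mean, median

scores = [10, 11, 12, 13, 14]      # hypothetical scores
with_outlier = scores + [100]      # add one extreme high score

print(mean(scores), median(scores))              # both are 12
print(mean(with_outlier), median(with_outlier))  # mean jumps to about 26.67; median only moves to 12.5
```

The median barely moves because it depends only on the middle of the ordered scores, while the mean uses every score's value.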

Different measures of central tendency can lead to different conclusions, but when a decision needs to be made, the choice is usually between the mean and the median

Mode is generally used in three situations:

When one particular score dominates a distribution
When the distribution is bimodal or multimodal
When the data are nominal

4.2: Measures of Variability

Variability:

a numerical way of describing how much spread there is in a distribution

2 ways to describe/compute variability:

Compute the range
Compute the variance and its square root, known as the standard deviation

Range

a measure of variability calculated by subtracting the lowest score (the minimum) from the highest score (the maximum).

Range equation:

range = X(highest) - X(lowest)
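
The range equation as a one-line Python function. It is named `score_range` (a hypothetical name) to avoid shadowing Python's built-in `range`. The scores are illustrative.

```python
def score_range(scores):
    """range = X(highest) - X(lowest)"""
    if not scores:
        raise ValueError("need at least one score")
    return max(scores) - min(scores)


print(score_range([3, 9, 4, 12, 7]))  # 12 - 3 = 9
```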

Range is a useful first indicator of variability, but… (downsides to range)

…is influenced only by the highest and lowest scores
* All the other scores in between could be clustered near the highest score, huddled near the center, spread out evenly, or follow some unexpected pattern; WE CAN’T KNOW SOLELY BASED ON THE RANGE!

Whenever there are outliers, the range will be an exaggerated measure of the variability - WHAT’S THE SOLUTION?

Interquartile range

What does interquartile range indicate?

A measure of the distance between the first and third quartiles
Like the median marks the 50th percentile of a data set, the first quartile marks the 25th percentile of a data set, and the third quartile marks the 75th percentile of a data set
Essentially, the first quartile (25th percentile) and the third quartile (75th percentile) are the medians of the TWO HALVES of the data: the half below the median and the half above it

WHY use interquartile range?

Because it’s based on values from the middle 50% of the distribution (between the 25th and 75th percentiles), it’s unlikely to be influenced by outliers (scores below the 25th percentile or above the 75th are not considered)

Steps to determining interquartile range

1: calculate the median
2: look at all of the scores BELOW the median. Then, the median of these scores, the lower half, is the first quartile, often called Q1 for short
3: look at all of the scores ABOVE the median. Then, the median of these scores, the upper half of the scores, is the third quartile, often called Q3 for short
4: subtract Q1 from Q3
The interquartile range, often abbreviated as IQR, is the difference between the first and third quartiles
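
A Python sketch of these steps, assuming the "median of each half" method described above. When N is odd, the middle score itself is left out of both halves. The names `median` and `iqr` and the data are illustrative. Statistical software often uses other quartile rules (different interpolation), so its Q1 and Q3 can differ slightly from this method.

```python
def median(values):
    """Median of a list (mean of the two middle values when the count is even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def iqr(scores):
    """Interquartile range Q3 - Q1 using the median-of-each-half method."""
    if len(scores) < 2:
        raise ValueError("need at least two scores")
    ordered = sorted(scores)
    n = len(ordered)
    half = n // 2                      # splitting at the middle plays the role of Step 1
    lower = ordered[:half]             # scores below the median
    upper = ordered[half + n % 2:]     # scores above the median (odd N: skip the middle score)
    q1 = median(lower)                 # Step 2: first quartile
    q3 = median(upper)                 # Step 3: third quartile
    return q3 - q1                     # Step 4: subtract Q1 from Q3


data = [1, 3, 5, 7, 9, 11, 13, 15]        # hypothetical scores
outlier = [1, 3, 5, 7, 9, 11, 13, 150]    # same, but the top score is extreme
print(iqr(data), iqr(outlier))            # 8.0 8.0: the IQR ignores the extreme score
print(max(data) - min(data), max(outlier) - min(outlier))  # 14 149: the range does not
```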

Variance

the average of the squared deviations from the mean
When something varies, it must vary from (or be different from) some standard - standard as in the mean
Thus, when we compute variance, that number describes how far a distribution varies around the mean

Variance - why can’t we just take the average of the deviations from the mean?

If we do, we get 0

Remember, the mean is the point at which all scores are perfectly balanced; mathematically, the positive and negative deviations cancel out and always sum to 0, yet we know that there is variability among these scores
To eliminate the negative signs, statisticians SQUARE ALL THE DEVIATIONS
Once we square the deviations, we can take their average and get a measure of variability
Later, we will “unsquare” those deviations to calculate the SD
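
A quick Python check of why plain deviations fail and squared deviations work. The scores are hypothetical.

```python
scores = [2, 4, 4, 4, 5, 5, 7, 9]          # hypothetical scores
m = sum(scores) / len(scores)              # M = 40 / 8 = 5.0

deviations = [x - m for x in scores]       # X - M for every score
print(sum(deviations))                     # 0.0: positives and negatives cancel

squared = [d ** 2 for d in deviations]     # squaring removes the negative signs
print(sum(squared))                        # 32.0: no longer cancels
```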

4 STEPS TO CALCULATE VARIANCE:

1: subtract the mean from every score (X-M)
* AKA deviations from the mean

2: square every deviation from the mean
* AKA squared deviations

3: sum all the squared deviations
* AKA sum of squared deviations, or sum of squares for short

4: divide the sum of squares (the sum of each score’s squared deviation from the mean) by the total number in the sample
* EX: sum of squares = 48.80
* Total # of scores: 5
* 48.80/5 = 9.76
* Thus, variance = 9.76
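
The four steps as a Python sketch, dividing by N as these cards do. The function name and scores are illustrative, not the 48.80 example above, whose raw scores are not given. For comparison, Python's `statistics.pvariance` also divides by N, while `statistics.variance` divides by N − 1.

```python
def variance(scores):
    """Variance as the average squared deviation from the mean (divides by N)."""
    n = len(scores)
    if n == 0:
        raise ValueError("need at least one score")
    m = sum(scores) / n
    deviations = [x - m for x in scores]      # 1: deviations from the mean
    squared = [d ** 2 for d in deviations]    # 2: squared deviations
    ss = sum(squared)                         # 3: sum of squares (SS)
    return ss / n                             # 4: divide SS by N


print(variance([2, 4, 4, 4, 5, 5, 7, 9]))  # 32.0 / 8 = 4.0
```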

Symbols that represent the variance of a sample include:

SD² (standard deviation squared)
s² (standard deviation squared)
MS (comes from “mean square”, referring to the average of the squared deviations)

Most basic formula for SD

SD = √SD²

Full formula for SD

SD = √[Σ(X − M)² / N]
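
The full SD formula as a Python sketch, again dividing by N. The function name and scores are illustrative.

```python
import math


def standard_deviation(scores):
    """SD = √[Σ(X − M)² / N], the square root of the variance."""
    n = len(scores)
    if n == 0:
        raise ValueError("need at least one score")
    m = sum(scores) / n                       # M
    ss = sum((x - m) ** 2 for x in scores)    # Σ(X − M)²
    return math.sqrt(ss / n)                  # "unsquare" the variance


print(standard_deviation([2, 4, 4, 4, 5, 5, 7, 9]))  # √4.0 = 2.0
```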

LECTURE

How can we tell what's based on a sample vs. population (equation)?

* Sample: uses M, not μ

First step to calculating the median

list all scores in ascending order

We can use central tendency as a clue to distribution shape: symmetrical, positive skew, negative skew

* In a symmetrical "bell-shaped" curve: mean = median = mode
* Positive skew: mean > median > mode (mean gets pulled by the upper tail)
* Negative skew: mean < median < mode (mean gets pulled by the lower tail)
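
A rough Python sketch of using the mean/median comparison as a shape clue. The helper name `skew_clue` and the data are hypothetical. This is only a heuristic: real data do not always follow the mean/median/mode ordering, and exact equality is rare outside tidy examples.

```python
from statistics import mean, median


def skew_clue(scores):
    """Rough shape hint from comparing the mean with the median."""
    m, mdn = mean(scores), median(scores)
    if m > mdn:
        return "positive skew (mean pulled by the upper tail)"
    if m < mdn:
        return "negative skew (mean pulled by the lower tail)"
    return "symmetrical (mean = median)"


print(skew_clue([1, 2, 2, 3, 3, 3, 4, 4, 5]))   # symmetrical
print(skew_clue([1, 2, 2, 2, 3, 3, 4, 10]))     # positive skew
print(skew_clue([0, 7, 8, 8, 8, 9, 9, 10]))     # negative skew
```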

What measure (mean, median, mode) is the best to describe central tendency - 4 key points:

1. Usually the mean
2. Small dataset: harder to interpret
3. If there are extreme outliers, consider calculating with and without them
4. If unsure, report all three; note that the mode is the only option for nominal data

(32 cards)