Ch4: Central Tendency and Variables Flashcards

1
Q

Central tendency

4.1: Central Tendency

A
  • …the descriptive statistic that best represents the center of a data set, the particular value that all the other data seem to be gathering around; the “typical” score
  • Usually, at (or near) the highest point in the histogram or the polygon
  • Expressed in three different ways: mean, median, mode
2
Q

The mean expressed by symbolic notation

Mean

4.1: Central Tendency

A

We need to understand only a handful of symbols to express the ideas necessary for understanding statistics

Symbols used in the formula for the mean:
* M: the mean of a sample; appears on the left side of the formula
* X: a single score
* μ: the mean of a population
* Σ: sum of the single scores (X)
* N: total number of scores

3
Q

The numbers based on samples taken from a population are called…

Mean

4.1: Central Tendency

A
  • Statistics
  • E.g., M is a statistic
4
Q

The numbers based on whole populations are called

Mean

4.1: Central Tendency

A
  • Parameters
  • E.g., μ is a parameter
5
Q

Steps to calculating the mean:

4.1: Central Tendency

A
  • Step 1: add up all the scores in the sample. In statistical notation, this is ΣX
  • Step 2: divide the total of all the scores by the number of scores
    – The total number of scores in a sample is typically represented by N
    – The full equation would be: M = ΣX / N
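
The two steps above can be sketched in Python. This is a minimal illustration, not part of the original cards; the function name `mean` and the sample scores are made up for the example.

```python
def mean(scores):
    """Return M = ΣX / N for a list of scores."""
    total = sum(scores)   # Step 1: add up all the scores (ΣX)
    n = len(scores)       # N: the total number of scores
    return total / n      # Step 2: divide the total by N


scores = [2, 4, 6, 8, 10]  # hypothetical sample
m = mean(scores)
print(m)
```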
6
Q

Median (Mdn)

4.1: Central Tendency

A

the middle score of all the scores in a sample when the scores are arranged in ascending order; AKA 50th percentile

7
Q

How to determine median

4.1: Central Tendency

A
  • Step 1: line up all the scores in ascending order
  • Step 2: find the middle score.
  • With an odd number of scores, there will be an actual middle score.
  • With an even number of scores, there will be no actual middle score
    In this case, calculate the mean of the two middle scores
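
A minimal Python sketch of these steps, covering both the odd and the even case. The function name `median` and the example scores are illustrative assumptions, not from the cards.

```python
def median(scores):
    """Return the middle score, or the mean of the two middle scores."""
    ordered = sorted(scores)   # Step 1: line up the scores in ascending order
    n = len(ordered)
    mid = n // 2               # Step 2: locate the middle position
    if n % 2 == 1:
        return ordered[mid]    # odd N: there is an actual middle score
    # even N: no actual middle score, so average the two middle scores
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_result = median([7, 1, 5])       # ordered: 1, 5, 7
even_result = median([7, 1, 5, 3])   # ordered: 1, 3, 5, 7
print(odd_result, even_result)
```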
8
Q

Mode

4.1: Central Tendency

A
  • The most common score of all the scores in a sample
  • Has no symbol or abbreviation
  • Mode can be used with scale data, but is more commonly used with nominal data
  • EX: maps based on census data that showed how residents of England and Wales typically commute to work
9
Q

WHAT DO WE CALL EACH CASE?

When there is more than one mode, whether a single score or an interval, we report both, or all, of the most common scores

  • 1 mode
  • 2 modes
  • > 2 modes

4.1: Central Tendency

A
  • When a distribution of scores has one mode, we refer to it as: unimodal
  • When a distribution has two modes, we call it: bimodal
  • When a distribution has more than two modes, we call it: multimodal
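
As a rough sketch, the mode(s) can be found by counting how often each score occurs, and the count of modes gives the label. The helper names `modes` and `describe` and the data are hypothetical. One limitation of this simple version is that if every score occurs exactly once, it reports every score as a mode.

```python
from collections import Counter


def modes(scores):
    """Return every score tied for the highest frequency."""
    counts = Counter(scores)
    highest = max(counts.values())
    return sorted(s for s, c in counts.items() if c == highest)


def describe(scores):
    """Label a distribution by how many modes it has."""
    k = len(modes(scores))
    if k == 1:
        return "unimodal"
    if k == 2:
        return "bimodal"
    return "multimodal"


commute = ["bus", "car", "car", "train"]   # nominal data also works
print(modes(commute), describe(commute))
print(modes([1, 1, 2, 2, 3]), describe([1, 1, 2, 2, 3]))
```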
10
Q

How outliers affect measures of central tendency:

4.1: Central Tendency

A

Mean - greatly affected by outliers: extreme scores that are either very high or very low in comparison with the other scores
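
A small illustration of this effect, using Python's standard `statistics` module. The scores are made up for the example; the median is included only for comparison, since it typically moves far less than the mean when one extreme score is added.

```python
import statistics

scores = [10, 12, 13, 14, 16]      # hypothetical scores, no outlier
with_outlier = scores + [95]       # same scores plus one extreme high score

mean_before = statistics.mean(scores)
mean_after = statistics.mean(with_outlier)
median_before = statistics.median(scores)
median_after = statistics.median(with_outlier)

# The mean jumps by more than 13 points; the median moves by half a point.
print(mean_before, mean_after)
print(median_before, median_after)
```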

11
Q

Different measures of central tendency can lead to different conclusions, but when a decision needs to be made, the choice is usually between the mean and the median

Mode is generally used in three situations:

4.1: Central Tendency

A
  • When one particular score dominates a distribution
  • When the distribution is bimodal or multimodal
  • When the data are nominal
12
Q

Variability:

4.2: Measures of Variability

A

a numerical way of describing how much spread there is in a distribution

13
Q

2 ways to describe/compute variability:

4.2: Measures of Variability

A
  1. Compute the range
  2. Compute the variance and its square root, known as the standard deviation
14
Q

Range

4.2: Measures of Variability

A

a measure of variability calculated by subtracting the lowest score (the minimum) from the highest score (the maximum).

15
Q

Range equation:

4.2: Measures of Variability

A

range = X(highest) - X(lowest)
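
The equation translates directly into Python. The function name `score_range` and the two example data sets are assumptions made for illustration; the second data set shows that very differently arranged scores can share the same range.

```python
def score_range(scores):
    """Return range = X(highest) - X(lowest)."""
    return max(scores) - min(scores)


clustered_high = [3, 14, 14, 15, 15]   # most scores near the maximum
spread_evenly = [3, 6, 9, 12, 15]      # scores spaced evenly

r1 = score_range(clustered_high)
r2 = score_range(spread_evenly)
print(r1, r2)
```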

16
Q

Range is a useful first indicator of variability, but… (downsides to range)

4.2: Measures of Variability

A

…it is influenced only by the highest and lowest scores
* All the other scores in between could be clustered near the highest score, huddled near the center, spread out evenly, or follow some unexpected pattern; WE CAN’T KNOW SOLELY BASED ON THE RANGE!

17
Q

Whenever there are outliers, the range will be an exaggerated measure of the variability - WHAT’S THE SOLUTION?

4.2: Measures of Variability

A

Interquartile range

18
Q

What does interquartile range indicate?

4.2: Measures of Variability

A
  • A measure of the distance between the first and third quartiles
  • Just as the median marks the 50th percentile of a data set, the first quartile marks the 25th percentile and the third quartile marks the 75th percentile
  • Essentially, the first quartile (25th percentile) and the third quartile (75th percentile) are the medians of the TWO HALVES of the data: the half below the median and the half above it
19
Q

WHY use interquartile range?

4.2: Measures of Variability

A

Because it’s based on the middle 50% of the distribution (between the 25th and 75th percentiles), it’s unlikely to be influenced by outliers; scores below the 25th percentile or above the 75th percentile are not considered

20
Q

Steps to determining interquartile range

4.2: Measures of Variability

A
  • 1: calculate the median
  • 2: look at all of the scores BELOW the median. Then, the median of these scores, the lower half, is the first quartile, often called Q1 for short
  • 3: look at all of the scores ABOVE the median. Then, the median of these scores, the upper half of the scores, is the third quartile, often called Q3 for short
  • 4: subtract Q1 from Q3
  • The interquartile range, often abbreviated as IQR, is the difference between the first and third quartiles
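
A sketch of the four steps in Python. It assumes the halves are split by position, so with an odd number of scores the middle score belongs to neither half. Other quartile conventions exist and can give slightly different answers. The function name `iqr` and the data are hypothetical.

```python
from statistics import median


def iqr(scores):
    """Return Q3 - Q1 using the median-of-halves approach."""
    if len(scores) < 2:
        raise ValueError("need at least two scores")
    ordered = sorted(scores)
    half = len(ordered) // 2   # Step 1: the median splits the data here
    lower = ordered[:half]     # Step 2: scores below the median
    upper = ordered[-half:]    # Step 3: scores above the median
    q1 = median(lower)         # first quartile (Q1)
    q3 = median(upper)         # third quartile (Q3)
    return q3 - q1             # Step 4: subtract Q1 from Q3


odd_iqr = iqr([1, 3, 5, 7, 9, 11, 13])   # Q1 = 3, Q3 = 11
even_iqr = iqr([2, 4, 6, 8])             # Q1 = 3.0, Q3 = 7.0
print(odd_iqr, even_iqr)
```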
21
Q

Variance

4.2: Measures of Variability

A
  • the average of the squared deviations from the mean
  • When something varies, it must vary from (or be different from) some standard - standard as in the mean
  • Thus, when we compute variance, that number describes how far a distribution varies around the mean
22
Q

Variance - why can’t we just take the average of the deviations from the mean (without squaring them)?

4.2: Measures of Variability

A

If we do, we get 0

  • Remember, the mean is the point at which all scores are perfectly balanced; mathematically, the scores have to balance out - yet we know that there is variability among these scores
  • To eliminate the negative signs, SQUARING ALL THE DEVIATIONS is what statisticians do to solve this problem
  • Once we square the deviations, we can take their average and get a measure of variability
  • Later, we will “unsquare” those deviations to calculate the SD
23
Q

4 STEPS TO CALCULATE VARIANCE:

4.2: Measures of Variability

A

1: subtract the mean from every score (X-M)
* AKA deviations from the mean

2: square every deviation from the mean
* AKA squared deviations

3: sum all the squared deviations
* AKA sum of squared deviations, or sum of squares for short

4: divide the sum of squares (the sum of each score’s squared deviation from the mean) by the total number of scores in the sample
* EX: sum of squared deviations = 48.80
* Total # of scores: 5
* 48.80/5 = 9.76
* Thus, variance = 9.76
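
The four steps can be followed line by line in Python. The scores below are made up (they are not the data behind the 48.80 example), and the function name `variance` is an assumption. The sketch divides by N, as in the steps above, and also shows that the raw deviations sum to 0, which is why they are squared first.

```python
def variance(scores):
    """Return the average of the squared deviations from the mean."""
    n = len(scores)
    m = sum(scores) / n
    deviations = [x - m for x in scores]    # Step 1: X - M
    squared = [d ** 2 for d in deviations]  # Step 2: squared deviations
    sum_of_squares = sum(squared)           # Step 3: sum of squares
    return sum_of_squares / n               # Step 4: divide by N


scores = [2, 4, 6, 8, 10]                   # hypothetical sample, M = 6
raw_deviation_total = sum(x - 6 for x in scores)
var = variance(scores)
print(raw_deviation_total, var)
```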

24
Q

Symbols that represent the variance of a sample include:

4.2: Measures of Variability

A
  • SD^2 (standard deviation squared)
  • s^2 (standard deviation squared)
  • MS (comes from “mean square”, referring to average of the squared deviation)
25
Q

Most basic formula for SD

A

SD = square root of SD^2

26
Q

Full formula for SD

A

SD = square root of [Σ(X - M)^2 / N]
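
A minimal Python sketch of this formula, dividing by N as the cards do. The function name `sd` and the scores are illustrative assumptions. Python's `statistics.pstdev` also divides by N and is used here only as a cross-check, whereas `statistics.stdev` divides by N - 1 and would give a different value.

```python
import math
import statistics


def sd(scores):
    """Return the square root of the average squared deviation from the mean."""
    n = len(scores)
    m = sum(scores) / n
    sum_of_squares = sum((x - m) ** 2 for x in scores)
    return math.sqrt(sum_of_squares / n)


scores = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical sample: M = 5, variance = 4
result = sd(scores)
print(result, statistics.pstdev(scores))
```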

27
Q

LECTURE

28
Q

How can we tell what’s based on a sample vs. population (equation)?

A
  • Sample: the equation uses M (a statistic), not μ (a parameter)
29
Q

First step to calculating the median

A

list all scores in ascending order

30
Q

We can use central tendency as a clue to distribution shape: symmetrical, positive skew, negative skew

A
  • In a symmetrical “bell shaped” curve: mean = median = mode
  • Positive skew: mean > median > mode (mean gets pulled by upper tail)
  • Negative skew: mean < median < mode (mean gets pulled by lower tail)
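
These orderings can be checked on small made-up data sets with Python's `statistics` module. The two lists below are hypothetical and were chosen so the pattern is clear. The ordering is a clue to the shape rather than a guarantee, and real data will not always follow it.

```python
import statistics

positive_skew = [1, 2, 2, 2, 3, 3, 4, 5, 14]          # long upper tail
negative_skew = [1, 10, 11, 12, 12, 13, 13, 13, 14]   # long lower tail


def summary(scores):
    """Return (mean, median, mode) for a list of scores."""
    return (
        statistics.mean(scores),
        statistics.median(scores),
        statistics.mode(scores),
    )


pos = summary(positive_skew)   # mean > median > mode
neg = summary(negative_skew)   # mean < median < mode
print(pos, neg)
```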
31
Q

What measure (mean, median, mode) is the best to describe central tendency - 4 key points:

A
  1. Usually the mean
  2. With a small data set, central tendency is harder to interpret
  3. If there are extreme outliers, consider calculating the mean with and without them
  4. If unsure, report all three; note, however, that the mode is the only option for nominal data