Exam 3 Lecture 1 Flashcards
What is the variable in a dataset?
Each person’s data value
Central Tendency
What best represents most datapoints in a dataset
What are the three forms of central tendency?
Mean, Median, and Mode
Mean
List statistical term, concept, definition, and when it is relevant
Statistical term: Mean
Concept: Average/mediocre/expected
Definition: Add all the data up and divide it by the number of data points
When relevant: A lot of the time except when you have extreme outliers
Median
List statistical term, concept, definition, and when it is relevant
Statistical term: Central/right in the middle of the pack
Definition: Line all the data up in order and pick the middle one (or mean of the 2 middle ones if there are an even number of data points)
When it is relevant: You’ve got crazy numbers that are all over the place
Mode
List statistical term, concept, definition, and when it is relevant
Statistical term: Mode
Concept: Typical/most common/popular
Definition: Most typical or common number in the dataset
When it is relevant: You’ve got a large cluster of just one number
When reviewing data steps
Step 1: Visual inspection to get a ‘feel’ for the numbers before analyzing them
Step 2: Clean your data. Look for numbers that look weird and make a plan for what to do: keep, fix, remove.
Step 3: Pick a central tendency. Rule out:
- Too much spread of values
- No single cluster of one number
Class Attendence Context
144
141
125
129
98
150
129
126
129
127
133
8
150
129
146
Mean tells me that, ON AVERAGE, about 124 students out of 150 come to class. SO, if I made cookies for everyone, I should bring about 124. But does this seem correct? There were actually NO DAYS when there were 124 students in class. And there are only 2 days with less than 125.
NEED TO CLEAN DATA
Snowstorm, prof didn’t cancel, only 8 students showed up. Remove this value.
Mean of the CLEAN data tells me that, on average, about 133 students out of 150 come to class.
Raw data may be accurate, but they are NOT REPRESENTATIVE of attendance on most days. Removing the 8 makes the CENTRAL TENDENCY match the data values better.
Mean tells me that, on average, about 133 students out of 150 come to class.
If I brought in 133 cookies, wouldn’t be as likely to run short.
BUT, is there another way to see what the typical number of students in class is?
Median is the middle of the pack. Line them up and find the middle one.
No, that’s not right.
Mode tells me that there are MOST OFTEN 129 students in the room. Having 129 students in the room is most likely based on past attendance.
Here, all ‘central tendencies’ are pretty similar, but WHICH IS MOST INFORMATIVE/helpful to plan/predict the future?
MODE
Raw data may be accurate, but they are often __________________ of a dataset. Removing values makes the ____________ match the data values better.
Not Representative
Central Tendency
Only __________ requires math. The others are about sorting.
Mean
Average equation
Sum of all values/# of data points
Real Estate Context
Mean tells me that, on average, houses sell for >800k in my area. BUT look at how different the numbers are- $100! $6.7 mill! So, $800K is average, but is it typical? The mean doesn’t help me figure out if I can afford a house.
Median tells me that $312,000 buys me a mid-level house in this town. MUCH MORE USEFUL! It doesn’t sum and divide to make an ‘estimate’, it is an actual number in the list.
Mode isn’t super useful. Yes, 2 of 15 houses have the same value, but that doesn’t help us decide if we can afford to live in this town.
Realtors talk in MEDIANS.
Stress Rating Context
0
2
3
4
2
2
5
2
0
1
0
5
2
2
2
Mean says that the sample, ON AVERAGE, has a stress of 2.29.
Median says that 2 is the midpoint of the sample.
Mode tells me that MOST people report a stress of 2.
USE MODE
Mean, mode, or median?
You ask a sample of college students how many times they have had a virus in the past semester.
Median, Mode
Mean, mode, or median?
You ask a sample of sedentary, normal weight young adult females to track calories consumed per day.
Median- typical intake.