Exam 3 Lecture 1 Flashcards

1
Q

What is the variable in a dataset?

A

Each person’s data value

2
Q

Central Tendency

A

What best represents most datapoints in a dataset

3
Q

What are the three forms of central tendency?

A

Mean, Median, and Mode

4
Q

Mean
List statistical term, concept, definition, and when it is relevant

A

Statistical term: Mean
Concept: Average/mediocre/expected
Definition: Add all the data up and divide it by the number of data points
When it is relevant: Most of the time, except when you have extreme outliers

5
Q

Median
List statistical term, concept, definition, and when it is relevant

A

Statistical term: Median
Concept: Central/right in the middle of the pack
Definition: Line all the data up in order and pick the middle one (or the mean of the 2 middle ones if there is an even number of data points)
When it is relevant: You’ve got crazy numbers that are all over the place
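
A minimal Python sketch of the definition above, using only the standard library. The helper name `median_by_sorting` and the sample lists are illustrative assumptions, not part of the lecture.

```python
def median_by_sorting(values):
    """Sort the values, then take the middle one (or the mean of the two middle ones)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty dataset is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd count: exactly one value sits in the middle.
        return ordered[mid]
    # Even count: average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_example = median_by_sorting([5, 1, 3])      # hypothetical values
even_example = median_by_sorting([4, 1, 3, 2])  # hypothetical values
print(odd_example, even_example)
```

Python's built-in `statistics.median` follows the same rule, so the helper is only there to make the sorting step visible.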

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

Mode
List statistical term, concept, definition, and when it is relevant

A

Statistical term: Mode
Concept: Typical/most common/popular
Definition: Most typical or common number in the dataset
When it is relevant: You’ve got a large cluster of just one number
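
As a rough sketch, the mode can be found by counting how often each value appears. The helper name `modes` and both sample lists are assumptions made for illustration. It returns every value tied for the top count, which makes it easy to see when there is no single cluster.

```python
from collections import Counter


def modes(values):
    """Return every value tied for the highest count, in first-seen order."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]


single_cluster = modes([2, 2, 2, 5, 7])  # hypothetical values with one clear cluster
no_cluster = modes([1, 2, 3])            # hypothetical values where every value ties
print(single_cluster, no_cluster)
```

When the result holds more than one value, the data has no single most common number and the mode is less informative.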

7
Q

Steps when reviewing data

A

Step 1: Visual inspection to get a ‘feel’ for the numbers before analyzing them
Step 2: Clean your data. Look for numbers that look weird and make a plan for what to do: keep, fix, remove.
Step 3: Pick a central tendency. Rule out:
- The mean, if there is too much spread of values
- The mode, if there is no single cluster of one number

8
Q

Class Attendance Context
144
141
125
129
98
150
129
126
129
127
133
8
150
129
146

A

Mean tells me that, ON AVERAGE, about 124 students out of 150 come to class. SO, if I made cookies for everyone, I should bring about 124. But does this seem correct? There were actually NO DAYS when there were 124 students in class. And there are only 2 days with fewer than 125.
NEED TO CLEAN DATA
Snowstorm, prof didn’t cancel, only 8 students showed up. Remove this value.
Mean of the CLEAN data tells me that, on average, about 133 students out of 150 come to class.
Raw data may be accurate, but they are NOT REPRESENTATIVE of attendance on most days. Removing the 8 makes the CENTRAL TENDENCY match the data values better.

Mean tells me that, on average, about 133 students out of 150 come to class.
If I brought in 133 cookies, wouldn’t be as likely to run short.
BUT, is there another way to see what the typical number of students in class is?
Median is the middle of the pack. Line them up and find the middle one.
No, that’s not right.
Mode tells me that there are MOST OFTEN 129 students in the room. Having 129 students in the room is most likely based on past attendance.
Here, all ‘central tendencies’ are pretty similar, but WHICH IS MOST INFORMATIVE/helpful to plan/predict the future?

MODE
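
The numbers in this card can be checked with a short Python sketch. It assumes the 15 attendance counts listed in the question and treats the snowstorm day (8 students) as the only value to remove. The variable names are illustrative.

```python
import statistics

# Attendance counts copied from the card.
raw = [144, 141, 125, 129, 98, 150, 129, 126, 129, 127, 133, 8, 150, 129, 146]

# Cleaning step from the card: drop the snowstorm day (8 students).
clean = [count for count in raw if count != 8]

raw_mean = statistics.mean(raw)
clean_mean = statistics.mean(clean)
clean_median = statistics.median(clean)
clean_mode = statistics.mode(clean)

print(f"raw mean:     {raw_mean:.1f}")
print(f"clean mean:   {clean_mean:.1f}")
print(f"clean median: {clean_median}")
print(f"clean mode:   {clean_mode}")
```

Under these assumptions the raw mean comes out near 124, the clean mean near 133, and the clean median and mode both land on 129.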

9
Q

Raw data may be accurate, but they are often __________________ of a dataset. Removing values makes the ____________ match the data values better.

A

Not Representative
Central Tendency

10
Q

Only __________ requires math. The others are about sorting.

A

Mean

11
Q

Average equation

A

Sum of all values/# of data points
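
Written as a formula, with $x_i$ for the individual data values and $n$ for the number of data points (these symbols are notation chosen here, not taken from the card):

```latex
\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i
```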

12
Q

Real Estate Context

A

Mean tells me that, on average, houses sell for more than $800K in my area. BUT look at how different the numbers are: $100! $6.7 mill! So, $800K is average, but is it typical? The mean doesn’t help me figure out if I can afford a house.

Median tells me that $312,000 buys me a mid-level house in this town. MUCH MORE USEFUL! It doesn’t sum and divide to make an ‘estimate’, it is an actual number in the list.

Mode isn’t super useful. Yes, 2 of 15 houses have the same value, but that doesn’t help us decide if we can afford to live in this town.

Realtors talk in MEDIANS.

13
Q

Stress Rating Context
0
2
3
4
2
2
5
2
0
1
0
5
2
2
2

A

Mean says that the sample, ON AVERAGE, has a stress of about 2.13 (the 15 ratings listed sum to 32, and 32/15 ≈ 2.13).

Median says that 2 is the midpoint of the sample.

Mode tells me that the MOST COMMON stress rating is 2 (7 of the 15 people report it).

USE MODE
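
A short Python sketch for checking the three values, assuming the 15 ratings exactly as listed in the question. The variable names are illustrative.

```python
import statistics

# Stress ratings copied from the card, in the order listed.
ratings = [0, 2, 3, 4, 2, 2, 5, 2, 0, 1, 0, 5, 2, 2, 2]

mean_rating = statistics.mean(ratings)
median_rating = statistics.median(ratings)
mode_rating = statistics.mode(ratings)

# How many people actually gave the most common rating.
count_at_mode = ratings.count(mode_rating)

print(f"mean:   {mean_rating:.2f}")
print(f"median: {median_rating}")
print(f"mode:   {mode_rating} ({count_at_mode} of {len(ratings)} ratings)")
```

For the ratings as listed, the mean works out to 32/15, or about 2.13, while the median and mode are both 2. The mode is a rating someone actually gave, which is part of why it suits a 0 to 5 scale like this one.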

14
Q

Mean, mode, or median?
You ask a sample of college students how many times they have had a virus in the past semester.

A

Median, Mode

15
Q

Mean, mode, or median?
You ask a sample of sedentary, normal weight young adult females to track calories consumed per day.

A

Median: typical intake.

16
Q

Mean, mode, or median?
You ask a sample of college-aged adult drinkers whether they prefer beer, wine, or hard liquor.

A

Mode: the mode is well-suited for categorical data.
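
To see why, here is a small Python sketch with made-up survey responses (the list is hypothetical, invented purely for illustration). Labels like "beer" cannot be added up or ranked, so there is no mean or median to compute, but they can be counted.

```python
from collections import Counter

# Hypothetical survey responses, invented for illustration only.
responses = ["beer", "wine", "beer", "hard liquor", "beer", "wine"]

# Count each category, then take the one with the most votes.
counts = Counter(responses)
most_common_drink, votes = counts.most_common(1)[0]

print(f"mode: {most_common_drink} ({votes} of {len(responses)} responses)")
```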

17
Q

Mean, mode, or median?
You ask a sample of college-aged adult drinkers how much beer, wine, or hard liquor they consume per week.

A

Mean or median

18
Q

Mean, mode, or median?
You ask a random sample of Americans in their 50s to report their annual gross income.

A

Median

19
Q

Let’s design a study. How addictive is nicotine? Compare e-cigs and combustible cigs.

  1. Sample(s)?
  2. What’s the study design?
    - Observational/Experimental
    - Cross-sectional/Longitudinal
  3. What’s the data?
    - Objective/Subjective
    - Categorical/Continuous

So… what kind of central tendency would we expect to use?

A

Mean: If the data is normally distributed, the mean could be used to compare average levels between smokers and vapers.

Median: If the data is skewed, the median might be better to represent the “typical” level for each group.

Mode: The mode could be useful for categorical data, such as identifying the most common reason people report for using e-cigs or combustible cigs.

20
Q

Descriptive Statistics

A

The story of conforming and non-conforming numbers.