Exam 3 Lecture 3 Flashcards

1
Q

If your data follow a __________, then your data have a ‘normal distribution’.

A

Bell curve. If your data have a normal distribution, then you can use common statistical analysis tools.

2
Q

Extreme values are ___________. Central values are _____________.

A

Extreme values are rare. Central values are common.

3
Q

Explain what happens when you get further from the central tendency.

A

The further you get from the central tendency, the fewer the data points. Most values in the data set are close to the middle/central tendency.

4
Q

Normal Distributions

A

The data values in a dataset follow a pattern
The distribution of data values is as expected (no weird numbers, no clusters of numbers that are not near the central tendency)
This is a naturally occurring distribution in many situations
- IQ, SAT, GPA, BP, height
But some data are NOT NORMAL

5
Q

When the distribution is normal, the following are true:

A

Mean = median = mode
It is symmetrical. Half of the data are on the left, half are on the right.
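
A minimal sketch of the two properties above, using a small made-up set of scores (the numbers are an assumption chosen to be perfectly symmetric, not data from the lecture):

```python
import statistics

# Hypothetical, perfectly symmetric scores (made up for illustration).
scores = [3, 4, 4, 5, 5, 5, 6, 6, 7]

mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)

# Count how many values fall on each side of the mean.
below = sum(1 for s in scores if s < mean)
above = sum(1 for s in scores if s > mean)

print(mean, median, mode)  # all three land on the same value
print(below, above)        # equal counts on each side
```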

6
Q

Normal Distributions

A

The shape of the curve allows us to predict how people will score.
The distribution is proportional
- 68% will score within 1 standard deviation from the mean
  - 34% will be a little better than average
  - 34% will be a little worse than average
  - The slope isn’t that steep
- 95% will score within 2 standard deviations from the mean (27% more)
  - Slope is really steep!
- 99.7% will score within 3 standard deviations from the mean (4.7% more)
  - Slope starts to flatten out
  - These #s are rare.
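
The percentages above can be checked with a short sketch. It assumes an ideal normal curve and uses the standard error function from Python's math module; the helper name `share_within` is made up for this example.

```python
import math

def share_within(k):
    """Share of an ideal normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {100 * share_within(k):.1f}%")

# How much each extra standard deviation adds.
added_by_second = 100 * (share_within(2) - share_within(1))
added_by_third = 100 * (share_within(3) - share_within(2))
print(f"2nd SD adds about {added_by_second:.0f}%, 3rd SD adds about {added_by_third:.0f}%")
```

The exact shares are roughly 68.3%, 95.4%, and 99.7%, so the 68/95/99.7 figures on the card are rounded.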
7
Q

What is a histogram?

A

A bar graph that is specifically designed to tell you how frequent a data value is in a dataset.
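
As a rough illustration, a histogram can be sketched in plain text by counting how often each value appears. The data and the helper name `text_histogram` are made up for this example.

```python
from collections import Counter

def text_histogram(values):
    """Return one line per distinct value, with one '#' per occurrence."""
    counts = Counter(values)
    return [f"{v}: {'#' * counts[v]}" for v in sorted(counts)]

# Hypothetical quiz scores (made up for illustration).
quiz_scores = [2, 3, 3, 4, 4, 4, 5, 5, 6]

for line in text_histogram(quiz_scores):
    print(line)
```

The tallest bar marks the most frequent value (the mode), and the bars get shorter toward the extremes.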

8
Q

Normal Distributions
LOTS of statistical tests, which are how we compare or associate variables, ASSUME a normal distribution.
If the data aren’t normal, then what?

A

You fail to meet assumptions of the statistical test and the results are not reliable.

9
Q

Typical statistical tests are comparing MEANS. What is the caveat to this?

A

That comparison might not be fair if the data aren’t normal.

10
Q

Normal distributions mean the data are ____________.
You can predict more because there is a __________ to responses.

A

Normal distributions mean the data are predictable. You can predict more because there is a pattern to responses. Helps compare means & variance.

11
Q

To know if the DISTRIBUTION is NORMAL, you need to find the _______________

A

Mean, mode, and median

12
Q

Once you know the distribution is normal, to understand the DISTRIBUTION, you need to know _________________

A

The mean and the standard deviation.

13
Q

The axis of a bell curve is labeled: 3.5 4.0 4.5 5 5.5 6.0 6.5
What are the mean and the SD?

A

The mean is 5 (the center value) and the SD is 0.5 (the distance between neighboring values).
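
A small sketch of how that answer is read off, assuming the seven values are the labels on a bell curve's axis running from 3 SDs below the mean to 3 SDs above it (that reading is an assumption; the card does not say so explicitly):

```python
# Axis labels from the card, assumed to mark -3 SD ... +3 SD.
labels = [3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5]

middle = len(labels) // 2
mean = labels[middle]           # the center label is the mean
sd = labels[middle + 1] - mean  # one step to the right is +1 SD

print(mean, sd)
```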

14
Q

If the mean is 311 and the SD is 5, how should the values be arranged on a bell curve?

A

296 301 306 311 316 321 326
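
The arrangement can be generated by stepping out from the mean one SD at a time. This is a minimal sketch; the function name `bell_curve_labels` is made up for this example.

```python
def bell_curve_labels(mean, sd, spread=3):
    """Axis labels from `spread` SDs below the mean to `spread` SDs above it."""
    return [mean + k * sd for k in range(-spread, spread + 1)]

labels = bell_curve_labels(311, 5)
print(labels)
```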

15
Q

Data that fit a bell curve are predictable. When we know what the data ‘look like’, they are easier to analyze. Data that fit a bell curve increase __________ that rare values are rare!

A

Increase reliability that rare values are rare. (And common/average values are common).

16
Q

What is it called if the distribution is too tall or too flat? What are some examples of this?

A

If the distribution is too tall or too flat, this is KURTOSIS. The distribution is KURTOTIC.
- Too many people clumped near average (this makes the distribution tall)
- Too many people far from average in tails (this makes the distribution flat)
Tall = leptokurtic
Flat = platykurtic
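
One common way to put a number on this is excess kurtosis: roughly 0 for a normal curve, positive for a sharply peaked (leptokurtic) shape, negative for a flat (platykurtic) one. The sketch below uses that measure as an assumption, since the card does not name a formula, and both datasets are made up.

```python
import statistics

def excess_kurtosis(values):
    """Average fourth power of the z-scores, minus 3 (the value for a normal curve)."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((x - m) / sd) ** 4 for x in values) / len(values) - 3

tall = [0, 5, 5, 5, 5, 5, 5, 5, 10]  # most values clumped at the center
flat = [1, 2, 3, 4, 5, 6, 7, 8, 9]   # values spread evenly

print(excess_kurtosis(tall))  # positive: leptokurtic
print(excess_kurtosis(flat))  # negative: platykurtic
```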

17
Q

What is it called when the distribution is tipped to the left or right?

A

SKEW (aka, off-center)
- A long tail of unusually high values, stretching to the right (positively skewed)
- A long tail of unusually low values, stretching to the left (negatively skewed)
It’s a measure of asymmetry
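
Skew can also be expressed as a number. A minimal sketch, assuming the usual moment-based skewness (the average cubed z-score), which is positive when the long tail points right and negative when it points left; both datasets are made up.

```python
import statistics

def skewness(values):
    """Average cubed z-score: > 0 for a right tail, < 0 for a left tail, 0 if symmetric."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((x - m) / sd) ** 3 for x in values) / len(values)

right_tail = [1, 1, 2, 2, 3, 10]   # one unusually high value
left_tail = [1, 8, 9, 9, 10, 10]   # one unusually low value

print(skewness(right_tail))  # positive
print(skewness(left_tail))   # negative
```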

18
Q

What happens to the mean, median, and mode when the graph is skewed?

A

The mean, median, and mode are no longer equal to one another.
- The mean is pulled toward the long tail, so it no longer sits at the peak (mode).
- The mean is less meaningful. It doesn’t tell you what is most likely in a sample. It overvalues the tail (those far away from typical).
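
A quick sketch of that separation, using made-up incomes (in thousands) with one very high earner:

```python
import statistics

# Hypothetical incomes in thousands (made up for illustration).
incomes = [30, 30, 30, 35, 40, 45, 250]

mean = statistics.mean(incomes)
median = statistics.median(incomes)
mode = statistics.mode(incomes)

# The single extreme value drags the mean far above the typical income.
print(round(mean, 1), median, mode)
```

Here the mode and median still describe a typical person, while the mean is higher than all but one value in the set.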

19
Q

How can one tail be too long?

A

One boundary, one open end (wait time, weight, income)
Zero-inflation (alcohol -> lots of non-drinkers)

20
Q

What is zero-inflation in a skewed distribution?

A

Lots of zeros!
Alcohol -> lots of non-drinkers.
This skew tells us that it is more common to be a non-drinker or a light drinker than a heavy drinker.
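
A sketch of what zero-inflation looks like in numbers, with made-up drinks-per-week data:

```python
import statistics

# Hypothetical drinks per week (made up for illustration).
drinks = [0, 0, 0, 0, 0, 0, 1, 1, 2, 3, 5, 12]

zero_share = drinks.count(0) / len(drinks)
mean = statistics.mean(drinks)
median = statistics.median(drinks)

print(zero_share)    # half of the sample are non-drinkers
print(mean, median)  # the few heavy drinkers pull the mean above the median
```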

21
Q

What does it mean to have only 1 boundary in a skewed distribution?

A

The values are limited on one side and open-ended on the other.
For example, flight delays. There is only one boundary (on time to forever). This skew tells us that the flight most often takes off on time or close to it, but sometimes the delay can be HOURS.

22
Q

Only 1 boundary in a skewed distribution example with doctors

A

This skew tells us that it is uncommon for doctors to run on time; most of them make us wait a long time!

23
Q

Not everyone conforms…
Overall, your dataset can look ‘normal’ but there can still be a few data points that don’t fit expectations. What do you do with unexpected values?
What are unexpected values?

A

(If there are too many unexpected values, you have a problem with your data/study)
- An extremely low value
- An extremely high value
- A value that is very far from the mean

24
Q

Outliers versus errors

A

Is the outlier an ERROR?
- Errors add BIAS to your results.
- Errors can alter the MEAN and the VARIANCE

Is the outlier REAL?
- Sometimes a number is unlikely but not impossible
- These data can alter the MEAN and the VARIANCE, but is it BIAS?
- There is no single rule about what an outlier is. All data ‘behave’ differently (HR has a known range, brain activity does not)

Keep or chuck?
HR= 1000 (Error! This is impossible).
Drinks per day= 60 (Unlikely but not impossible!)
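
The keep-or-chuck decision above can be sketched as two range checks: one for what is physically possible and one for what is typical. All of the ranges below are illustrative assumptions, not clinical limits, and the function name `classify_value` is made up for this example.

```python
def classify_value(value, possible_range, typical_range):
    """Label a value as an 'error', an 'outlier', or 'typical'."""
    lowest_possible, highest_possible = possible_range
    lowest_typical, highest_typical = typical_range
    if not lowest_possible <= value <= highest_possible:
        return "error"    # impossible, so not real data
    if not lowest_typical <= value <= highest_typical:
        return "outlier"  # unlikely, but could be real
    return "typical"

# Assumed ranges, for illustration only.
print(classify_value(1000, possible_range=(20, 300), typical_range=(40, 120)))  # heart rate
print(classify_value(60, possible_range=(0, 100), typical_range=(0, 10)))       # drinks per day
```

Because there is no single rule for what counts as an outlier, the ranges have to be set for each kind of data before looking at the results.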

25
Q

Data literacy= asking: Did they have a plan?
When to keep vs. discard data
Errors vs. outliers

A

With all data, the goal is always to use as much as possible! Some people think: if a data point looks weird, chuck it! But it’s not that simple!
- Yes, errors add BIAS
- But chucking data can also introduce BIAS
- To avoid BIAS, you need a REALLY CLEAR plan about what you are chucking, and why.

Errors = not real data -> always discard
Outlier = real data -> test, then discard or transform
- When a data point is so extreme that it throws off the central tendency, the variance, or the distribution, it is called an influential outlier
- Statisticians have rules about discarding a data point. They also have multiple ways to transform a data point to make it more ‘manageable’ but still ‘different’.
A good statistician tries to use all available data whenever possible.
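
A rough sketch of why one extreme value matters and what a transformation does to it. The data are made up, and the log transform is only one of several possible transformations, chosen here as an assumption (it requires values above zero).

```python
import math
import statistics

# Hypothetical drinks per day, with one extreme but possible value.
values = [2, 3, 3, 4, 4, 5, 60]
without_extreme = values[:-1]

# One data point shifts both the mean and the variance.
print(statistics.mean(without_extreme), statistics.pvariance(without_extreme))
print(statistics.mean(values), statistics.pvariance(values))

# A log transform keeps the point in the data but shrinks its distance from the rest.
logged = [math.log(v) for v in values]
gap_raw = max(values) - statistics.mean(values)
gap_logged = max(logged) - statistics.mean(logged)
print(gap_raw, gap_logged)
```

After the transform the extreme value is still the largest in the set, so it stays ‘different’ while pulling far less on the mean.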

26
Q

When a data point is so extreme that it throws off the central tendency, the variance, or the distribution, it is called an __________.

A

Influential outlier