Chapter 24-25 Flashcards

1
Q

The Gaussian Distribution Is an Unreachable Ideal

A
• It is a symmetrical distribution.
• It extends infinitely in both directions.
• You may know that one or both of these traits is impossible for your data.
2
Q

However, tests based on the Gaussian distribution are fairly robust to violations of that assumption if

A

the sample size is large.

• That is, they perform well on data from a variety of distributions.

3
Q

When the sample size is small, it is hard to

A

tell what kind of distribution the sample came from.

4
Q

Larger samples more closely approximate

A

the source population, but they still don’t look perfectly Gaussian.

5
Q

What a Gaussian Distribution Really Looks Like

A
• Small samples shown with scatter plots.
• Some may be more likely than others, but all came from a Gaussian population.
6
Q

When the sample size is small, it is hard to tell

A

what kind of distribution the sample came from.

7
Q

There are several types of normality tests that ask

A
• What is the skewness?
• Negative skew?
• Positive skew?
• How much kurtosis?
• How peaked is it?
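
A minimal sketch of the two quantities these questions refer to, assuming the simple moment-based definitions. Statistics packages often apply small-sample corrections, so their numbers can differ slightly. The function names and the data are illustrative only.

```python
import statistics

def central_moment(values, order):
    """Average of (x - mean) ** order over the sample."""
    mean = statistics.fmean(values)
    return sum((x - mean) ** order for x in values) / len(values)

def skewness(values):
    """Moment-based skewness: negative = longer left tail, positive = longer right tail."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    """Moment-based kurtosis minus 3, so a Gaussian population scores about 0."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3.0

symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 2, 2, 3, 20]        # hypothetical data with one large value
left_tailed = [-x for x in right_tailed]  # mirror image of the same data

print(skewness(symmetric))      # 0.0 for a perfectly symmetric sample
print(skewness(right_tailed))   # positive skew
print(skewness(left_tailed))    # negative skew
print(excess_kurtosis(symmetric))
```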
8
Q

Test: if you randomly sample from a Gaussian population, what is

A

the probability of obtaining a sample that deviates from the Gaussian ideal as much as or more than this one?
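
One way to make that probability concrete is simulation: draw many Gaussian samples of the same size and count how often they deviate at least as much as the observed one. The sketch below assumes absolute skewness as the measure of deviation, a simplification chosen for illustration; real normality tests use other statistics. The function names and the data are hypothetical.

```python
import random
import statistics

def abs_skewness(values):
    """Absolute moment-based skewness, used here as the 'deviation' score."""
    mean = statistics.fmean(values)
    n = len(values)
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return abs(m3 / m2 ** 1.5)

def simulated_normality_p_value(sample, n_sim=2000, seed=0):
    """Fraction of simulated Gaussian samples that deviate as much or more."""
    rng = random.Random(seed)
    observed = abs_skewness(sample)
    n = len(sample)
    hits = 0
    for _ in range(n_sim):
        # Skewness ignores location and scale, so a standard Gaussian suffices.
        gaussian_sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if abs_skewness(gaussian_sample) >= observed:
            hits += 1
    return hits / n_sim

skewed_sample = [1, 1, 1, 2, 2, 3, 4, 8, 20, 60]   # made-up, strongly right-tailed
print(simulated_normality_p_value(skewed_sample))  # small: rarely seen under a Gaussian
```

A small result means few Gaussian samples look this extreme, which is the sense in which the test "rejects" normality.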

9
Q

With a large sample size, you are very likely to

A
reject the hypothesis that the sample came from a normal distribution, because most samples don’t come from one.
10
Q

With a small sample size, you are unlikely to

A

reject that hypothesis, even if the distribution is very different from normal.

11
Q

Comparing ranks

A
• Converts observation values to ranks only.
• Has the effect of downweighting outliers.
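
A minimal sketch of the conversion, assuming the common convention that tied values share the average of their ranks. The helper name `to_ranks` and the data are made up for illustration.

```python
def to_ranks(values):
    """Rank 1 = smallest value; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks

values = [3.1, 2.7, 950.0, 3.4]   # 950.0 is an extreme outlier
print(to_ranks(values))           # [2.0, 1.0, 4.0, 3.0]
```

The outlier 950.0 becomes rank 4, only one step above its neighbor, so it no longer dominates the comparison. The same property means `[1, 2, 3]` and `[1, 2, 300]` produce identical ranks.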
12
Q

Note that the median is the same middle value in both. But the mean depends on

A
the distribution (it could be higher or lower than the median).
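
A small illustration with made-up numbers: three data sets share the same middle value, so their medians agree, while the mean moves with the shape of the distribution.

```python
import statistics

symmetric = [1, 2, 3, 4, 5]
high_tail = [1, 2, 3, 4, 50]    # one large value pulls the mean above the median
low_tail = [-42, 2, 3, 4, 5]    # one small value pulls the mean below the median

for name, data in [("symmetric", symmetric),
                   ("high tail", high_tail),
                   ("low tail", low_tail)]:
    print(name, "median:", statistics.median(data), "mean:", statistics.fmean(data))
```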
13
Q

Problem with comparing ranks:

A
You are forced to ignore one aspect of your data (how much the values differ from each other). You are only looking at what order they fall into.
14
Q

Without removing values.

• Resampling: bootstrap

A
• Randomizing: permutations
• E.g., when comparing a control group to a treatment group, randomly reassign the values to each category.
• Look to see if the real set of observations is very unusual when compared to many different randomized versions.
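
Both ideas can be sketched in a few lines. The version below assumes the difference in group means as the quantity of interest and a simple percentile interval for the bootstrap; the function names, defaults, and data are hypothetical and are not taken from the chapter.

```python
import random
import statistics

def permutation_p_value(control, treatment, n_perm=5000, seed=0):
    """Share of random reassignments whose mean difference is at least as large as the real one."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(treatment) - statistics.fmean(control))
    pooled = list(control) + list(treatment)
    n_control = len(control)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign values to the two groups
        diff = abs(statistics.fmean(pooled[n_control:])
                   - statistics.fmean(pooled[:n_control]))
        if diff >= observed:
            extreme += 1
    # Count the real arrangement as one of the possible arrangements.
    return (extreme + 1) / (n_perm + 1)

def bootstrap_mean_interval(values, n_boot=5000, seed=0):
    """Rough 95% percentile interval for the mean, resampling with replacement."""
    rng = random.Random(seed)
    n = len(values)
    means = sorted(statistics.fmean(rng.choices(values, k=n)) for _ in range(n_boot))
    cut = n_boot // 40  # 2.5% of the resamples in each tail
    return means[cut], means[n_boot - cut - 1]

control = [1, 2, 3, 4, 5]          # made-up measurements
treatment = [11, 12, 13, 14, 15]
print(permutation_p_value(control, treatment))
print(bootstrap_mean_interval(treatment))
```

A small permutation result means the real grouping is unusual compared with the randomized versions. Neither function removes or ranks any value.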
15
Q

Problems With Nonparametric Options

A

• Rank-based methods discard data.
• Like throwing out outliers.
• Randomization and resampling are computer-intensive (not much of a problem anymore).
• Nonparametric methods have less power than parametric methods (those based on a Gaussian distribution).
• I.e., you’ll need a larger sample size.
• On the other hand, maybe assuming a Gaussian distribution is akin to “making up data.”
• The problems with parametric methods decrease as sample size increases anyway.

16
Q

Both methods are bad when

A

sample size is low.

17
Q

Parametric methods are

A

not robust to non-Gaussian distributions.

18
Q

Nonparametric methods do not have

A

enough power to reject the null hypothesis when it is false.

19
Q

Both methods are pretty good

A

when sample size is high.