Data Analysis week 3 Flashcards

1
Q

What is the purpose of taking samples?

A

To estimate a population value by calculating the corresponding value in a sample.

2
Q

When is a sample random?

A

If every element of the population has an equal chance of being included in the sample.

3
Q

How do you judge whether an estimate is close to the population value?

A

By its precision and its bias.

4
Q

What does high precision mean?

A

High precision means the estimate of a population value differs little from one sample to the next.

5
Q

What does high bias mean?

A

High bias means the estimates are, on average over many samples, far from the population value.

6
Q

What does low precision mean?

A

Low precision means the estimate of the population value differs a lot from one sample to the next.

7
Q

What does low bias mean?

A

Low bias means the estimates are, on average over many samples, close to the population value.

8
Q

In the following R call, which draws a sample from a dataset, what does n stand for?

s <- slice_sample(diamonds, n = 50)

A

The number of observations (rows) drawn from the dataset into one sample.
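
As a rough illustration of what n controls, here is a base-R sketch of the same idea. It assumes the built-in mtcars data in place of diamonds (which ships with ggplot2) and uses sample() rather than dplyr's slice_sample(); the variable names are made up for this example.

```r
# Base-R sketch: draw one random sample of n rows from a dataset.
# Assumption: mtcars (built in) stands in for the diamonds data.
set.seed(1)                              # only makes the draw repeatable

n <- 10                                  # number of observations in one sample
rows <- sample(nrow(mtcars), size = n)   # n distinct row numbers, without replacement
s <- mtcars[rows, ]                      # the sample: n rows, all columns

nrow(s)                                  # number of rows in the sample, equal to n
```

Changing n changes only how many rows end up in s, not which columns it has.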

9
Q

What is a sampling distribution?

A

The distribution of the estimates obtained from a (large) number of repeated independent samples of the same size.
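
A sampling distribution can be approximated by simulation. The sketch below assumes a made-up, skewed population of 100,000 values and uses the sample mean as the estimate; the population, the sample size, and the number of repetitions are all illustrative choices.

```r
# Sketch: approximate the sampling distribution of the sample mean.
# Assumption: a hypothetical population drawn once from an exponential distribution.
set.seed(42)
population <- rexp(100000, rate = 1 / 10)   # skewed population with mean near 10

n <- 50                                     # size of each sample
# Repeat: draw a random sample of size n and store its mean.
estimates <- replicate(2000, mean(sample(population, size = n)))

summary(estimates)                          # spread of the 2000 estimates
mean(population)                            # the population value they estimate
```

The vector estimates is the (simulated) sampling distribution: one estimate per repeated sample.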

10
Q

What can you say about a sampling distribution that is biased?

A

The mean of the sampling distribution differs from the population value.

11
Q

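What is the standard error and what does it tell you?

A

The standard error is the standard deviation of the sampling distribution. It summarizes the precision of an estimate in one number: the typical distance between the estimate from a single sample and the mean of the sampling distribution. If the estimate is unbiased, that is also its typical distance from the population value.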
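
To make the definition concrete, the sketch below simulates a sampling distribution of the mean and takes its standard deviation. It assumes a hypothetical normal population; for the sample mean, the result should land close to the textbook value sd(population) / sqrt(n).

```r
# Sketch: standard error = standard deviation of the sampling distribution.
# Assumption: a hypothetical population of 100,000 normally distributed values.
set.seed(7)
population <- rnorm(100000, mean = 170, sd = 10)

n <- 25
estimates <- replicate(5000, mean(sample(population, size = n)))

se_simulated <- sd(estimates)               # sd of the simulated sampling distribution
se_formula <- sd(population) / sqrt(n)      # known result for the sample mean

c(simulated = se_simulated, formula = se_formula)
```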

12
Q

What influence does the sample size (generally) have on bias?

A

As the sample size increases, the bias of the estimate (generally) decreases.

12
Q

What influence does the sample size have on precision (and on the variation between samples)?

A

As the sample size increases, the precision increases, because the variation between samples decreases.

13
Q

What influence does the sample size have on the standard error (in factors)?

A

If the sample size increases by a factor of k^2, the standard error decreases by a factor of k. For example, a sample 4 times as large halves the standard error.
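
The k^2 rule can be checked by simulation. The sketch below assumes a hypothetical normal population and the sample mean as the estimate; se_of_mean() is a helper invented for this example, and k = 3 is an arbitrary choice.

```r
# Sketch: sample size times k^2 gives a standard error roughly k times smaller.
# Assumption: hypothetical population; se_of_mean() is a made-up helper.
set.seed(3)
population <- rnorm(100000, mean = 0, sd = 5)

# Standard error of the mean for sample size n, estimated by simulation.
se_of_mean <- function(n, reps = 2000) {
  sd(replicate(reps, mean(sample(population, size = n))))
}

k <- 3
se_small <- se_of_mean(20)          # sample size 20
se_large <- se_of_mean(20 * k^2)    # sample size 180, k^2 times larger

ratio <- se_small / se_large        # should be roughly k
ratio
```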

14
Q

When is an estimate asymptotically unbiased?

A

If the bias of the estimate gets smaller and smaller, approaching zero, as the sample size increases.
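
A standard example of bias that shrinks with sample size is the variance computed with divisor n instead of n - 1, whose bias is -sigma^2 / n. The sketch below assumes samples drawn directly from a normal distribution with variance 4; var_div_n() and mean_estimate() are helpers made up for this illustration.

```r
# Sketch: the divide-by-n variance is biased, but less so as n grows.
# Assumption: samples come straight from a normal distribution with variance 4.
set.seed(11)
true_var <- 4

# Variance with divisor n (R's var() divides by n - 1 instead).
var_div_n <- function(x) mean((x - mean(x))^2)

# Average of the estimator over many samples of size n.
mean_estimate <- function(n, reps = 20000) {
  mean(replicate(reps, var_div_n(rnorm(n, mean = 0, sd = sqrt(true_var)))))
}

bias_small <- mean_estimate(5) - true_var     # theory: -true_var / 5
bias_large <- mean_estimate(100) - true_var   # theory: -true_var / 100

c(n5 = bias_small, n100 = bias_large)
```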

15
Q

What is bootstrapping a method for?

A

Bootstrapping is a method to estimate the sampling distribution.

16
Q

What does bootstrapping result in, and what does that result represent?

A

Bootstrapping results in a bootstrap distribution, which is an estimate of the sampling distribution.

16
Q

What method of sampling do you use in bootstrapping and how does it work?

A

Sampling with replacement. You draw one sample from the population, and from this sample you repeatedly draw new samples, with replacement, of the same size as the original sample. The estimate is calculated in each new sample.
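
The resampling step can be sketched in base R as follows. The original sample here is simulated (an assumption, since no dataset is given), the estimate is the mean, and 2000 resamples is an arbitrary choice.

```r
# Sketch: bootstrap distribution of the sample mean.
# Assumption: a simulated "observed" sample stands in for real data.
set.seed(5)
original_sample <- rnorm(50, mean = 100, sd = 15)
n <- length(original_sample)

# Each bootstrap sample: n values drawn from the original sample with replacement.
boot_means <- replicate(
  2000,
  mean(sample(original_sample, size = n, replace = TRUE))
)

sd(boot_means)    # bootstrap estimate of the standard error of the mean
```

Because values are drawn with replacement, a bootstrap sample can contain the same observation several times and miss others entirely.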

17
Q

What is the 95% confidence interval, how can you interpret it, and what does it tell us about?

A

The 95% confidence interval is the interval between the 0.025 and 0.975 quantiles (the 2.5th and 97.5th percentiles) of the bootstrap distribution. You can interpret it as the range that contains the middle 95% of the bootstrap estimates, and so as a range of plausible values for the population value. Its width tells us something about the precision of the estimate: the narrower the interval, the higher the precision.
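
The quantile step can be sketched with quantile() in base R. As before, the original sample is simulated for illustration and the estimate is the mean; the names ci and inside are made up for this example.

```r
# Sketch: 95% percentile interval from a bootstrap distribution.
# Assumption: a simulated "observed" sample stands in for real data.
set.seed(9)
original_sample <- rnorm(50, mean = 100, sd = 15)

boot_means <- replicate(
  5000,
  mean(sample(original_sample, size = length(original_sample), replace = TRUE))
)

# Cut off 2.5% of the bootstrap estimates on each side.
ci <- quantile(boot_means, probs = c(0.025, 0.975))
ci

# Share of bootstrap estimates that fall inside the interval (about 0.95).
inside <- mean(boot_means >= ci[1] & boot_means <= ci[2])
inside
```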