INTRODUCTION TO INFERENCE: THE SAMPLING DISTRIBUTION Flashcards

1
Q

What is statistical inference?

A

Statistical inference is the process of using data from a sample to make conclusions or predictions about a larger population.

2
Q

What are the two main components of inferential statistics?

A

1) Probability: the likelihood of getting a sample mean close to (or the same as) the population mean.
2) Representative Sample: The sample must accurately reflect the population to make valid conclusions about the whole group.

3
Q

Is the probability of getting one score the same as the probability of getting an average?

A

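No. The probability of getting an extreme sample mean (one far from the population mean) is smaller than the probability of getting an equally extreme single score. This is because the sample mean takes into account all the other scores in the sample: for the mean to be extreme, most or all of the scores must be extreme at the same time.

Example: probability of getting heads on one flip = 1/2
Probability of getting heads twice (both of two flips) = 1/2 × 1/2 = 1/4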
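To make the coin example concrete, here is a minimal Python sketch that enumerates every equally likely sequence of n fair coin flips (heads scored as 1, tails as 0) and computes the exact probability that the sample mean equals a given value. The helper name `prob_mean_equals` is hypothetical.

```python
from fractions import Fraction
from itertools import product

def prob_mean_equals(n_flips, target):
    """Exact probability that the mean of n fair coin flips (heads = 1, tails = 0)
    equals `target`, found by enumerating all 2**n equally likely sequences."""
    outcomes = list(product([0, 1], repeat=n_flips))
    hits = sum(1 for flips in outcomes if Fraction(sum(flips), n_flips) == target)
    return Fraction(hits, len(outcomes))

print(prob_mean_equals(1, 1))               # 1/2  (one head)
print(prob_mean_equals(2, 1))               # 1/4  (two heads: an extreme mean)
print(prob_mean_equals(5, 1))               # 1/32 (five heads: even rarer)
print(prob_mean_equals(2, Fraction(1, 2)))  # 1/2  (a middle mean is not rarer)
```

Only the extreme mean gets rarer as the number of flips grows; a mean in the middle of the range can be just as likely as a single score.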

4
Q

What are the two reasons why samples are studied instead of populations?

A

1) It is usually more practical
2) The goal of science is to make generalizations or predictions about events beyond our reach

5
Q

What is the general strategy in science when studying a sample?

A

Study a group of individuals who are believed to be REPRESENTATIVE of the general population (or some particular population of interest)

  • When researchers conduct a study, they want to make sure the people they include in the study (the sample) are similar to the general population in important ways that could affect the research topic.

Ex: when studying the impact of exercise on health, don't let your sample be only young people.

We try to avoid a biased sample.

6
Q

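What are the symbols for population parameters and sample statistics for the mean, variance, and standard deviation?

A

Population parameters (Greek letters): mean = μ, variance = σ², standard deviation = σ
Sample statistics (Roman letters): mean = X̄ (M in some textbooks), variance = s² (SD² in some textbooks), standard deviation = s (SD in some textbooks)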
7
Q

What is sampling variation?

A

The extent to which a statistic (e.g., mean, standard deviation) varies across samples taken from the population.

ANOTHER DEF:
Sampling variation refers to the natural differences that occur between samples taken from the same population.

EX: for the same population, the mean of one sample could be 25, another 33, another 30, but the population mean remains the same (30).
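A small simulation can show sampling variation directly. This sketch assumes a made-up population of 10,000 roughly normal scores centered near 30 (the spread of 8 and the sample size of 20 are arbitrary choices) and draws five random samples from it; each sample mean comes out different while the population mean stays fixed. Exact printed values depend on the random seed.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is reproducible

# Hypothetical population: 10,000 scores drawn from a normal curve centered at 30
population = [random.gauss(30, 8) for _ in range(10_000)]
pop_mean = statistics.mean(population)

# Five independent random samples of 20 scores each; record each sample's mean
sample_means = [statistics.mean(random.sample(population, 20)) for _ in range(5)]

print(f"population mean: {pop_mean:.2f}")
for i, m in enumerate(sample_means, start=1):
    print(f"sample {i} mean: {m:.2f}")
```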

8
Q

What does the central limit theorem say?

A

A mathematical principle stating that the distribution of the means of samples taken at random from any population of individual scores (even a skewed one) will tend to form a normal curve. The sampling distribution of sample means can be approximated by a normal distribution as the sample size becomes large.
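The theorem can be checked by simulation. The sketch below assumes a right-skewed exponential population (a stand-in chosen for illustration), draws 5,000 samples of each size, and measures the skewness of the resulting sample means; skewness near 0 indicates a symmetric, more normal-looking shape. The helper names `skewness` and `sample_means` are hypothetical, and exact values depend on the seed.

```python
import random
import statistics

def skewness(values):
    """Skewness as the average cubed z score: about 0 for a symmetric shape."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - m) / sd) ** 3 for v in values) / len(values)

def sample_means(n, reps, rng):
    """Means of `reps` random samples of size n from a right-skewed
    exponential population (an assumed stand-in for a skewed distribution)."""
    return [statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
            for _ in range(reps)]

rng = random.Random(42)  # fixed seed so the run is reproducible
for n in (1, 5, 30):
    # Skewness shrinks toward 0 as n grows: the means look more and more normal
    print(f"n = {n:2d}: skewness of sample means = {skewness(sample_means(n, 5_000, rng)):.2f}")
```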

9
Q

What is a random sampling distribution?

A

When you repeatedly take samples from a population and calculate the mean for each sample, those means (called sample means) will form a distribution.

  • It is a distribution of all possible means of samples of some fixed size (allows us to compare apples with apples)
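A distribution of sample means can be built by brute force when the population is tiny. This sketch assumes a hypothetical population of five scores {1, 2, 3, 4, 5} and a fixed sample size of 2, lists every possible sample (drawn with replacement), and counts how often each mean occurs.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

population = [1, 2, 3, 4, 5]  # hypothetical population of raw scores
n = 2                          # fixed sample size

# Every possible ordered sample of size n, drawn with replacement
samples = list(product(population, repeat=n))
means = [Fraction(sum(s), n) for s in samples]

# The distribution of sample means: how often each mean occurs
distribution = Counter(means)
for mean, count in sorted(distribution.items()):
    print(f"{float(mean):.1f}: {count}")
```

The 25 possible samples produce means that pile up in the middle: 3.0 occurs 5 times, while 1.0 and 5.0 occur only once each.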
10
Q

the larger the sample size…

A

the closer the sampling distribution of sample means will be to a normal distribution (when the sampling process is repeated thousands and thousands of times).

11
Q

A researcher is conducting a study where they take random samples from a population of exam scores. The population has a standard deviation of 15.

(a) Explain what happens to the sampling distribution of sample means as the sample size increases. In particular, describe how the shape of the normal curve changes and what this means for the variability of sample means.

(b) If the researcher increases the sample size from 25 to 100, what will happen to the standard error of the sample mean, and how does this affect the reliability of the sample means as estimates of the population mean?

A
(a)

As the sample size increases, the sampling distribution of sample means becomes narrower (slimmer): the variance of the sample means is lower.

This happens because the standard error (the standard deviation of the sample means) decreases as the sample size increases.
This means that with a larger sample size, the sample means are less variable and are more likely to be close to the true population mean, making them more reliable as estimates.

(b) As the sample size increases from 25 to 100, the standard error will decrease, from 15/√25 = 3 to 15/√100 = 1.5 (quadrupling the sample size halves the standard error). The smaller standard error means the sampling distribution becomes narrower, and the sample means are more tightly centered around the population mean. This leads to more accurate and reliable estimates of the population mean.
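The effect in part (b) can be computed with the usual formula for the standard error of the mean, σ/√n. A minimal Python sketch (the helper name `standard_error` is hypothetical):

```python
import math

def standard_error(sigma, n):
    """Standard error of the mean: population SD divided by the square root of n."""
    return sigma / math.sqrt(n)

sigma = 15  # population standard deviation given in the question
se_25 = standard_error(sigma, 25)
se_100 = standard_error(sigma, 100)
print(se_25, se_100)  # 3.0 1.5
```

Because n sits under a square root, four times as many scores only halves the standard error.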

12
Q

A researcher wants to compare the average test scores of two groups of students. Group A has 10 students, and Group B has 50 students. The researcher calculates the average score for each group.
Why might it be problematic to directly compare the average scores of these two groups without considering the sample sizes? How would the “distribution of sample means” help in making a fair comparison between the two groups?

A

When comparing groups with different sample sizes (like Group A with 10 students and Group B with 50), the larger group will tend to have a more reliable average, while the smaller group could be more influenced by random variations. The distribution of sample means helps us understand how averages behave when we take samples of a fixed size from the population, ensuring that when we compare groups, we're considering how stable those averages are for different sizes. If the sample sizes differ, we need to account for this, for example by judging each group's mean against the distribution of means for its own sample size (whose spread is its standard error), because larger groups generally give more consistent results.
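Because the standard error is σ/√n, the difference can be quantified without knowing σ, assuming both groups are sampled from populations with the same standard deviation: the ratio of the two standard errors then depends only on the sample sizes. A hedged sketch (`se_ratio` is a hypothetical helper):

```python
import math

def se_ratio(n_small, n_large):
    """How many times larger the standard error is for the smaller sample,
    assuming both groups share the same population SD (so sigma cancels out)."""
    return math.sqrt(n_large / n_small)

ratio = se_ratio(10, 50)
print(round(ratio, 2))  # 2.24
```

Under that assumption, the mean of a 10-student group is expected to wander about 2.24 times as far from the population mean as the mean of a 50-student group.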

13
Q

Meaning of: “sampling is done one at a time, with replacement”

A

1) Sample one at a time: for each sample, you pick one raw score from the population, record it, and then pick another raw score from the population (so, with a sample size of 2, two raw scores total for each sample).

2) With replacement: After you pick the first raw score, you put it back into the population before picking the second one. This means the first score is still part of the population when you pick the second score, and the same score can be picked more than once.

EXAMPLE: For Sample 2, you could pick 3 again and then 3 (your sample mean would be (3 + 3)/2 = 3), and this is totally possible because you put the raw score back into the population.
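The difference between the two sampling schemes can be seen by listing every possible ordered sample of 2 scores. This sketch assumes a hypothetical population {1, 2, 3, 4, 5}; `itertools.product` generates draws with replacement and `itertools.permutations` generates draws without replacement.

```python
from itertools import permutations, product

population = [1, 2, 3, 4, 5]  # hypothetical population of raw scores

# With replacement: a score goes back before the next pick, so (3, 3) is possible
with_replacement = list(product(population, repeat=2))
# Without replacement: a score that was picked cannot be picked again
without_replacement = list(permutations(population, 2))

print(len(with_replacement), (3, 3) in with_replacement)        # 25 True
print(len(without_replacement), (3, 3) in without_replacement)  # 20 False
```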

14
Q

Instead of manually creating a distribution of sample means (which is a lot of work), we can use math to figure out its mean, variance, and shape. We do this using two things:

A

1) The population’s characteristics (like its mean and variance).

2) The sample size (how many scores are in each sample).
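Written out, the two ingredients (the population's mean and variance, plus the sample size n) give the mean, variance, and standard deviation of the distribution of sample means directly. The subscript X̄ notation is an assumed convention (some textbooks write M instead), and the worked numbers are illustrative; the shape is covered by the central limit theorem rather than by a formula.

```latex
\mu_{\bar{X}} = \mu
\qquad
\sigma^2_{\bar{X}} = \frac{\sigma^2}{n}
\qquad
\sigma_{\bar{X}} = \sqrt{\sigma^2_{\bar{X}}} = \frac{\sigma}{\sqrt{n}}

\text{e.g. } \sigma^2 = 2.00,\ n = 2:\quad
\sigma^2_{\bar{X}} = \frac{2.00}{2} = 1.00,\quad
\sigma_{\bar{X}} = \sqrt{1.00} = 1.00
```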

15
Q

What are the three characteristics of a distribution of sample means?

** ALL BASED ON THE CENTRAL LIMIT THEOREM**

SEE SLIDES 26, 27 AND 29 FOR A SUMMARY

A

1) The mean of the distribution of the sample means is the same as the mean of the original population of individual scores (exactly in theory; APPROXIMATELY for any finite set of samples actually drawn).

Equation: refer to slide 17

2) The spread of the distribution of sample means is less than the spread of the distribution of the population
(e.g., variance: 1.00 < 2.00; standard deviation: 1.00 < 1.41)

Rule 2a: the distribution of sample means will be less spread out than the population of individual cases from which the samples are taken (its variance equals the population variance divided by the number of scores in each sample)

Rule 2b: the standard deviation of a distribution of sample means is the square root of the variance of the distribution of means

3) the shape of the distribution of sample means is approximately normal if either

a) each sample has 30 or more individuals (even if the population of individual scores is not normal). Once your sample size is 30 or more people, the distribution of sample means will generally look very close to a normal curve.

Why? Because of the Central Limit Theorem (our statistical bestie). It’s like a magic spell that smooths out the drama.

b) the distribution of the population of individuals is normal

  1. tends to be unimodal (ONE MODE / ONE PEAK)
    WHY? Extremes balance each other out, giving rise to more middle values and fewer extreme values
  2. tends to be symmetrical
    WHY? Since skew is caused by extreme scores, if there are fewer extreme scores, there is less skew
  3. larger sample sizes better approximate a normal distribution
    The more individuals in each sample, the closer the distribution of means will be to a normal curve
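The first two characteristics can be checked exactly on a small case. This sketch assumes a hypothetical population {1, 2, 3, 4, 5} (chosen because its variance is 2.00 and its standard deviation is about 1.41, matching the numbers in characteristic 2) and samples of 2 scores drawn with replacement; it uses exact fractions so the equalities hold without rounding. The helper names are hypothetical.

```python
import math
from fractions import Fraction
from itertools import product

population = [1, 2, 3, 4, 5]  # hypothetical population; its variance is 2.00
n = 2                          # scores per sample

def mean(xs):
    return Fraction(sum(xs), len(xs))

def population_variance(xs):
    """Average squared deviation from the mean (divides by the count, not count - 1)."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

# All 25 equally likely means of samples of size 2 drawn with replacement
sample_means = [mean(s) for s in product(population, repeat=n)]

print(mean(population), mean(sample_means))                                 # 3 3 (characteristic 1)
print(population_variance(population), population_variance(sample_means))  # 2 1 (rule 2a: 2 / 2 = 1)
print(round(math.sqrt(population_variance(population)), 2),
      math.sqrt(population_variance(sample_means)))                         # 1.41 1.0 (rule 2b)
```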
16
Q

CHARACTERISTIC 1:
The expected value (E) of the sample mean equals …

A

the population mean:
E(X̄) = μ.

17
Q

What does the “standard error of the mean” tell us?

A

The standard error tells you roughly how far, on average, sample means fall from the real deal (the population mean). It is the standard deviation of the sampling distribution of the means.
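A simulation can confirm that the spread of many sample means matches σ/√n. This sketch assumes a hypothetical normal population with mean 100 and standard deviation 15, and samples of 25 scores; the printed empirical value varies slightly with the seed but should sit close to 3.0.

```python
import math
import random
import statistics

rng = random.Random(7)       # fixed seed so the run is reproducible
mu, sigma, n = 100, 15, 25   # hypothetical population mean, SD, and sample size

# Draw many samples of size n and keep each sample's mean
means = [statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)]) for _ in range(10_000)]

empirical_se = statistics.pstdev(means)  # SD of the sample means
theoretical_se = sigma / math.sqrt(n)    # 15 / 5 = 3.0
print(round(empirical_se, 2), theoretical_se)
```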

18
Q

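Are the standard deviation of a distribution of sample means and the standard error of the mean synonyms?

A

Yes. The standard error of the mean is the standard deviation of the distribution of sample means.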

19
Q

Why do we calculate the Z score of a sample mean on a distribution of sample means instead of the Z score of a single subject on the distribution of individual subjects?

A

We calculate the Z score of a sample mean on the distribution of sample means because we’re comparing the average of a sample to the population mean, not a single individual to the population. This is a core step toward the goal of inferential statistics.
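The difference shows up clearly in numbers. This sketch uses hypothetical values (population mean 70, standard deviation 15, sample size 25, observed value 73) and computes Z once against the distribution of individuals (dividing by σ) and once against the distribution of sample means (dividing by the standard error σ/√n).

```python
import math

mu, sigma, n = 70, 15, 25  # hypothetical population mean, SD, and sample size
value = 73                  # hypothetical: one student's score, or the mean of 25 students

z_individual = (value - mu) / sigma                   # one score vs. the distribution of individuals
z_sample_mean = (value - mu) / (sigma / math.sqrt(n)) # a mean vs. the distribution of sample means

print(round(z_individual, 2), round(z_sample_mean, 2))  # 0.2 1.0
```

The same 3-point difference is unremarkable for one person (Z = 0.2) but much more notable for the mean of 25 people (Z = 1.0), because sample means vary much less than individual scores.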