Week 6 Chapter 7 Flashcards
sampling error
natural discrepancy, or amount of error, between a statistic and its corresponding population parameter
distribution of sample means
all of these:
- collection built from all the possible random samples (data sets) of a particular size
- the collection of sample means for all the possible random samples of a particular size (n) that can be obtained from a population.
sampling distribution
distribution of statistics obtained by selecting all possible samples of a specific size from a population
central limit theorem
all of these:
- mathematical proposition which serves as a cornerstone for much of inferential statistics
- For any population with mean μ and standard deviation σ, the distribution of sample means for sample size n will have a mean of μ and a standard deviation of σ/√n, and will approach a normal distribution as n approaches infinity.
expected value of M
the mean of the distribution of sample means, which is always equal to the population mean, μ
standard error of M
all of these:
- measure of how much distance is expected, on average, between a sample mean (M) and the population mean (μ)
- symbol σM (lowercase sigma with subscript M)
- is equal to σ/√n
- is the standard deviation for the distribution of sample means
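A minimal Python sketch of the σM = σ/√n formula. The function name standard_error is hypothetical; the values σ = 15 and n = 25 come from the IQ example in these cards.

```python
import math

def standard_error(sigma: float, n: int) -> float:
    """Standard error of M: sigma_M = sigma / sqrt(n)."""
    if n < 1:
        raise ValueError("sample size n must be at least 1")
    return sigma / math.sqrt(n)

# IQ population (sigma = 15), samples of n = 25 scores
print(standard_error(15, 25))  # 3.0
```

With n = 1 the function returns σ itself, which matches the rule that a single-score sample has a standard error equal to the population standard deviation.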
law of large numbers
rule that a larger sample size increases the probability that the sample mean will be close to the population mean. In other words, as the sample size increases, the error between the sample mean and the population mean should decrease.
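A small simulation sketch of the law of large numbers, assuming a normal population with μ = 100 and σ = 15 (the IQ values used elsewhere in these cards) and using Python's random module instead of real data. The helper mean_abs_error is hypothetical. It draws many random samples of each size and averages the distance |M − μ|; that average should shrink as n grows, roughly in proportion to 1/√n.

```python
import random
import statistics

def mean_abs_error(pop_mean, pop_sd, n, reps=2000, seed=0):
    """Average |M - mu| over `reps` random samples of size n from a normal population."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    errors = []
    for _ in range(reps):
        sample = [rng.gauss(pop_mean, pop_sd) for _ in range(n)]
        errors.append(abs(statistics.fmean(sample) - pop_mean))
    return statistics.fmean(errors)

for n in (4, 25, 100):
    print(n, round(mean_abs_error(100, 15, n), 2))
```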
Whenever a score is selected from a population, you
should be able to compute a z-score that describes exactly where the score is located in the distribution.
If the population is normal, you also should be able to determine the probability value for obtaining any individual score. In a normal distribution, for example, any score located in the tail of the distribution beyond z = +2.00 is an
extreme value, and a score this large has a probability of only p = 0.0228.
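A minimal sketch of locating one score and finding its tail probability, assuming the IQ population (μ = 100, σ = 15) and a hypothetical score of X = 130. Instead of looking the value up in a unit normal table, it computes the standard normal tail area with Python's math.erfc; the helper names z_score and upper_tail_probability are hypothetical.

```python
import math

def z_score(x, mu, sigma):
    """Location of a single score X, in standard-deviation units from mu."""
    return (x - mu) / sigma

def upper_tail_probability(z):
    """P(Z > z) for the standard normal distribution, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))

z = z_score(130, 100, 15)                    # hypothetical IQ score of 130
print(z)                                     # 2.0
print(round(upper_tail_probability(z), 4))   # 0.0228
```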
samples are variable; they are not
all the same. If you take two separate samples from the same population, the samples will be different. They will contain different individuals, they will have different scores, and they will have different sample means.
To find the probability for any specific sample mean, you first must know
all the possible sample means. Therefore, we begin by defining and describing the set of all possible sample means that can be obtained from a particular population. Once we have specified the complete set of all possible sample means (i.e., the distribution of sample means), we will be able to find the probability of selecting any specific sample mean.
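A brute-force sketch of building a complete distribution of sample means, assuming a tiny hypothetical population of four scores (2, 4, 6, 8) and random sampling with replacement, so that every ordered sample of n = 2 is equally likely. It lists every possible sample mean with its probability, then reads off the probability of one specific outcome, M > 7.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Hypothetical population (not from the text), sampled WITH replacement
population = [2, 4, 6, 8]
n = 2

# Every possible ordered random sample of size n, and its mean
means = [Fraction(sum(s), n) for s in product(population, repeat=n)]
distribution = Counter(means)   # sample mean -> number of samples producing it
total = len(means)              # 4**2 = 16 possible samples

for m in sorted(distribution):
    print(m, f"{distribution[m]}/{total}")

# Probability of obtaining a sample mean greater than 7
p_greater_than_7 = Fraction(sum(c for m, c in distribution.items() if m > 7), total)
print(p_greater_than_7)  # 1/16
```

Because all 16 samples are equally likely, each probability is a simple count divided by 16. The same table also averages out to exactly μ = 5.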
distribution of sample means is an example of a
sampling distribution. In fact, it often is called the sampling distribution of M.
The sample means should pile up around the population mean. Samples are not expected to be perfect, but they should be representative of the population. As a result, most of the sample means should be relatively
close to the population mean.
The pile of sample means should tend to form a normal-shaped distribution. Logically, most of the samples should have means close to μ, and it should be relatively rare to find sample means that are substantially different from μ.
As a result, the sample means should pile up in the center of the distribution (around μ) and the frequencies should taper off as the distance between M and μ increases. This describes a normal-shaped distribution.
In general, the larger the sample size, the closer the sample means should be to the population mean, μ. Logically, a large sample should be a better representative than a small sample.
Thus, the sample means obtained with a large sample size should cluster relatively close to the population mean; the means obtained from small samples should be more widely scattered.
The value of the central limit theorem comes from two simple facts.
First, it describes the distribution of sample means for any population, no matter what shape, mean, or standard deviation. Second, the distribution of sample means “approaches” a normal distribution very rapidly. By the time the sample size reaches n = 30, the distribution is almost perfectly normal.
the central limit theorem describes the distribution of sample means by identifying the three basic characteristics that describe any distribution:
shape, central tendency, and variability
the distribution of sample means tends to be a
normal distribution
the distribution of sample means is almost perfectly normal if either of the following two conditions is satisfied:
- The population from which the samples are selected is a normal distribution.
- The number of scores (n) in each sample is relatively large, around 30 or more. As n gets larger, the distribution of sample means will closely approximate a normal distribution. When n > 30, the distribution is almost normal regardless of the shape of the original population.
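A simulation sketch of the central limit theorem, assuming a strongly skewed exponential population with μ = 1 and σ = 1 (an illustrative choice, not from the text). The helpers simulate_sample_means and skewness are hypothetical. As n grows, the mean of the sample means should stay near μ = 1, their standard deviation should track σ/√n, and their skewness should fall from about 2 (the exponential population's own skewness) toward 0, the value for a symmetric, normal-shaped distribution.

```python
import random
import statistics

def skewness(values):
    """Sample skewness: the average cubed z-score (0 for a symmetric distribution)."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)

def simulate_sample_means(n, reps=10000, seed=1):
    """Means of `reps` random samples of size n from an exponential population (mu = sigma = 1)."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

for n in (1, 5, 30):
    means = simulate_sample_means(n)
    print(n, round(statistics.fmean(means), 3),
          round(statistics.pstdev(means), 3), round(skewness(means), 2))
```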
the average value of all the sample means is exactly equal to
the value of the population mean
Occasionally, the symbol μM (lowercase mu with subscript M) is used to represent the mean of the distribution of sample means. However, μM = μ, and we will use the symbol μ to refer to
the mean of the distribution of sample means.
The sample mean is an example of an
unbiased statistic, which means that on average the sample statistic produces a value that is exactly equal to the corresponding population parameter. In this case, the average value of all the sample means is exactly equal to μ.
When the standard deviation was first introduced in Chapter 4, we noted that this measure of variability serves two general purposes. First, the standard deviation describes the distribution by telling whether the individual scores are clustered close together or scattered over a wide range. Second, the standard deviation measures how well any individual score represents the population by providing a measure of how much distance is reasonable to expect between a score and the population mean. The standard error serves the same two purposes for the distribution of sample means.
- The standard error describes the distribution of sample means. It provides a measure of how much difference is expected from one sample to another. When the standard error is small, all the sample means are close together and have similar values. If the standard error is large, the sample means are scattered over a wide range and there are big differences from one sample to another.
- Standard error measures how well an individual sample mean represents the entire distribution. Specifically, it provides a measure of how much distance is reasonable to expect between a sample mean and the overall mean for the distribution of sample means. However, because the overall mean is equal to μ, the standard error also provides a measure of how much distance to expect between a sample mean (M) and the population mean (μ).
the magnitude of the standard error is determined by two factors:
- the size of the sample
- the standard deviation of the population from which the sample is selected.
At the extreme, the smallest possible sample (and the largest standard error) occurs when the sample consists of
n = 1 score
When n = 1,
σM = σ (standard error = standard deviation)
increasing sample size beyond n = 30 produces
relatively small improvement in how well the sample represents the population.
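A short sketch that tabulates σM = σ/√n for σ = 15 (the IQ example in these cards) across a handful of sample sizes chosen only for illustration. The helper standard_errors is hypothetical. The drop from n = 1 to n = 4 is large, while going from n = 30 all the way to n = 100 trims the standard error by only about 1.2 points.

```python
import math

def standard_errors(sigma, sizes):
    """Return (n, sigma / sqrt(n)) pairs for each sample size in `sizes`."""
    return [(n, sigma / math.sqrt(n)) for n in sizes]

rows = standard_errors(15, (1, 4, 16, 30, 64, 100))
# Pair each row with the previous one to show how much the standard error drops
for (n, se), (_, prev_se) in zip(rows, [rows[0]] + rows[:-1]):
    print(f"n = {n:>3}: standard error = {se:5.2f}   (drop from previous n: {prev_se - se:.2f})")
```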
First type of distribution
First, we have the original population of scores. This population contains the scores for thousands or millions of individual people, and it has its own shape, mean, and standard deviation. For example, the population of IQ scores consists of millions of individual IQ scores that form a normal distribution with a mean of μ = 100 and a standard deviation of σ = 15.
The second type of distribution
we have a sample that is selected from the population. The sample consists of a small set of scores for a few people who have been selected to represent the entire population. For example, we could select a sample of n = 25 people and measure each individual’s IQ score. The 25 scores could be organized in a frequency distribution and we could calculate the sample mean and the sample standard deviation. Note that the sample also has its own shape, mean, and standard deviation.
The third type of distribution
the distribution of sample means. This is a theoretical distribution consisting of the sample means obtained from all the possible random samples of a specific size. For example, the distribution of sample means for samples of n = 25 IQ scores would be normal with a mean (expected value) of μ = 100 and a standard deviation (standard error) of σM = 15/√25 = 3. This distribution also has its own shape, mean, and standard deviation.
the three distributions are all
connected, but they are all distinct.
The primary use for the distribution of sample means is to
find the probability of selecting a sample with a specific mean.
the z-score formula for locating a sample mean
z = (M - μ)/σM. Caution: When computing z for a single score, use the standard deviation, σ. When computing z for a sample mean, you must use the standard error,
σM.
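A minimal sketch of locating a sample mean, assuming the IQ population (μ = 100, σ = 15) and a hypothetical sample of n = 25 with M = 106. It divides by the standard error rather than by σ, and computes the tail area with math.erfc in place of a z-table. The helper names are hypothetical.

```python
import math

def z_for_sample_mean(m, mu, sigma, n):
    """z = (M - mu) / sigma_M, where sigma_M = sigma / sqrt(n) is the standard error."""
    standard_error = sigma / math.sqrt(n)
    return (m - mu) / standard_error

def upper_tail_probability(z):
    """P(Z > z) under the standard normal curve."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# Hypothetical sample: n = 25 IQ scores with M = 106
z = z_for_sample_mean(106, 100, 15, 25)
print(z)                                     # 2.0
print(round(upper_tail_probability(z), 4))   # 0.0228

# The mistake the caution warns about: dividing by sigma instead of sigma_M
print((106 - 100) / 15)                      # 0.4
```

Using σ instead of σM would make a fairly unusual sample mean (z = 2.00) look ordinary (z = 0.40).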
the standard error provides a method for defining and measuring sampling error. Knowing the standard error gives researchers a good indication of how accurately their sample data represent the populations they are studying. In most research situations, for example, the population mean is unknown, and the researcher selects a sample to help obtain information about the unknown population. Specifically, the sample mean provides information about the value of the unknown population mean. The sample mean is not expected to
give a perfectly accurate representation of the population mean; there will be some error, and the standard error tells exactly how much error, on average, should exist between the sample mean and the unknown population mean.
the standard error plays a very important role in inferential statistics. Because of its crucial role, the standard error for a sample mean, rather than the sample standard deviation, is often reported in scientific papers. Scientific journals vary in how they refer to the standard error, but frequently the symbols SE and SEM (for standard error of the mean) are used. The standard error is reported in two ways. Much like the standard deviation, it may be reported in a table along with the sample means. Alternatively, the standard error may be reported in
graphs
Which of the following would cause the standard error of M to get smaller?
Increasing the sample size and decreasing the standard deviation