2: Displaying and Exploring Data; Sampling Methods and Central Limit Theorem Flashcards
A ___ summarizes the distribution of one variable by stacking dots at points on a number line that shows all values of the variable; identical observations are stacked. It is most useful for smaller data sets.
Dot plot
A ___ displays the distribution of a variable using every value; each value is split into its leading digit(s) and its trailing digit, and values are grouped by leading digit.
Stem-and-leaf display
Stem: the leading digit or digits
Leaf: the trailing digits
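The stem/leaf split can be sketched in a few lines of Python. This is only an illustration: the function name `stem_and_leaf` and the data values are made up, and it assumes non-negative integers where the last digit is the leaf.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group integers by stem (all digits but the last); the last digit is the leaf."""
    groups = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)  # e.g. 117 -> stem 11, leaf 7
        groups[stem].append(leaf)
    return dict(groups)

# Hypothetical data set, for illustration only.
data = [96, 93, 88, 117, 127, 95, 113, 96, 108, 94]
display = stem_and_leaf(data)

for stem in sorted(display):
    leaves = " ".join(str(leaf) for leaf in display[stem])
    print(f"{stem:>2} | {leaves}")
```

Every observation stays visible in the display, which is the main difference from a histogram.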
___ are values of an ordered data set (small to large) that divide the data into four equal parts.
Quartiles
___ are values of an ordered data set (small to large) that divide the data into 10 equal parts.
Deciles
___ are values of an ordered data set (small to large) that divide the data into 100 equal parts.
Percentiles
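Quartiles, deciles, and percentiles differ only in how many equal parts the ordered data is cut into. A minimal sketch using Python's standard `statistics.quantiles`, with made-up data; note that different textbooks and software use slightly different interpolation rules, so results on small data sets may not match a hand calculation exactly.

```python
import statistics

# Hypothetical ordered data set, for illustration only.
data = [10, 20, 30, 40, 50, 60, 70]

# n is the number of equal parts; the function returns the n - 1 cut points.
quartiles = statistics.quantiles(data, n=4)      # 3 cut points: Q1, median, Q3
deciles = statistics.quantiles(data, n=10)       # 9 cut points
percentiles = statistics.quantiles(data, n=100)  # 99 cut points

print("Quartiles:", quartiles)
print("Number of deciles:", len(deciles))
print("Number of percentiles:", len(percentiles))
```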
___ is the most widely used measure of dispersion.
Standard deviation
A ___ is a graphical display that shows the general shape of a variable’s distribution; it is based on five statistics: the minimum value, the first quartile, the median, the third quartile, and the maximum value.
Box plot
An ___ is a value on a box plot that is inconsistent with the rest of the data. It is defined as a value that is more than 1.5 times the interquartile range smaller than Q1 or larger than Q3.
Outlier
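The 1.5 × IQR rule for outliers can be sketched as follows. The helper name `find_outliers` and the data are hypothetical, and the quartiles come from `statistics.quantiles`, whose interpolation may differ slightly from a given textbook's method.

```python
import statistics

def find_outliers(values):
    """Return values more than 1.5 * IQR below Q1 or above Q3."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1                # interquartile range
    lower = q1 - 1.5 * iqr       # lower outlier boundary
    upper = q3 + 1.5 * iqr       # upper outlier boundary
    return [v for v in values if v < lower or v > upper]

# Hypothetical data with one unusually large value.
data = [10, 20, 30, 40, 50, 60, 70, 200]
print(find_outliers(data))
```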
There are four shapes commonly observed:
- ___
- ___
- ___
- ___
- Symmetric
- Positively skewed
- Negatively skewed
- Bimodal
In a ___ distribution the mean and median are equal and the data values are evenly spread around these values. The shape of the distribution below the mean and median is a mirror image of the distribution above the mean and median.
Symmetric
A distribution of values is ___ if there is a single peak, but the values extend much farther to the right of the peak than to the left of the peak; the mean is larger than the median.
Positively skewed or skewed to the right
In a ___ distribution there is a single peak, but the observations extend farther to the left, in the negative direction, than to the right; the mean is smaller than the median.
Negatively skewed
A ___ distribution will have two or more peaks; this is often the case when the values are from two or more populations.
Bimodal
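The mean-versus-median comparison on the skewness cards can be checked with a short sketch. The function `describe_shape` and the data sets are invented for illustration; comparing the mean and median is only a rule of thumb and cannot detect a bimodal shape.

```python
import statistics

def describe_shape(values, tolerance=0.0):
    """Rule-of-thumb shape label based on the mean and median alone."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    if mean - median > tolerance:
        return "positively skewed"   # long tail to the right pulls the mean up
    if median - mean > tolerance:
        return "negatively skewed"   # long tail to the left pulls the mean down
    return "symmetric"

# Hypothetical data sets, for illustration only.
right_tail = [1, 2, 2, 3, 3, 3, 4, 20]
left_tail = [1, 17, 18, 18, 19, 19, 19, 20]
balanced = [1, 2, 3, 4, 5]

for name, values in [("right_tail", right_tail), ("left_tail", left_tail), ("balanced", balanced)]:
    print(name, "->", describe_shape(values))
```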
A ___ is a graph in which the values of two variables (X and Y) are plotted along two axes; the pattern of the resulting points reveals any correlation present. Both variables must be at least interval scale.
Scatter diagram
A ___ is a table used to classify observations according to two identifiable characteristics.
Contingency table
What are the 5 reasons to sample?
- ___
- ___
- ___
- ___
- ___
- To contact the whole population would be time-consuming
- The cost of studying all the items in a population may be prohibitive
- The physical impossibility of checking all items in the population
- The destructive nature of some tests
- The sample results are adequate
A ___ is a sample selected so that each item or person in the population has the same chance of being included.
E.g., drawing names from a hat OR using a table of random numbers (not always effective)
Simple random sample
___ is when a random starting point is selected, and then every kth member of the population is selected (can be biased).
Systematic random sampling
A ___ occurs when a population is divided into subgroups, called strata, and a sample is randomly selected from each stratum.
E.g., college students can be grouped as full time or part time; male or female; freshman, sophomore, junior, or senior.
Stratified random sample
When a population can be divided into groups based on some characteristic, the groups are called ___.
Strata
___ occurs when a population is divided into groups (clusters) using naturally occurring geographic or other boundaries. Then clusters are randomly selected and a sample is collected from each selected cluster; often employed to reduce the cost of sampling a population scattered over a large geographic area.
Cluster sampling
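The four sampling methods above can be compared side by side in a short sketch. All function names, the population of ID numbers, and the group labels are hypothetical; this is one simple way to express each method with Python's standard `random` module, not a survey-grade implementation.

```python
import random

def simple_random_sample(population, n, rng):
    """Every member has the same chance of being included."""
    return rng.sample(population, n)

def systematic_sample(population, k, rng):
    """Random starting point among the first k members, then every kth member."""
    start = rng.randrange(k)
    return population[start::k]

def stratified_sample(strata, n_per_stratum, rng):
    """Draw a random sample from every stratum."""
    return {name: rng.sample(members, n_per_stratum) for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, n_per_cluster, rng):
    """Randomly pick some clusters, then sample only within the chosen clusters."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: rng.sample(clusters[name], n_per_cluster) for name in chosen}

rng = random.Random(42)            # fixed seed so the sketch is repeatable
population = list(range(1, 101))   # hypothetical ID numbers 1..100

# Hypothetical strata and clusters built from the same IDs.
strata = {"full_time": population[:60], "part_time": population[60:]}
clusters = {f"region_{i}": population[i * 20:(i + 1) * 20] for i in range(5)}

srs = simple_random_sample(population, 10, rng)
systematic = systematic_sample(population, 10, rng)
stratified = stratified_sample(strata, 5, rng)
clustered = cluster_sample(clusters, 2, 5, rng)

print("Simple random:", srs)
print("Systematic:   ", systematic)
print("Stratified:   ", stratified)
print("Cluster:      ", clustered)
```

Note the difference between the last two: stratified sampling draws from every group, while cluster sampling draws only from the randomly chosen groups.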
___ is the difference between a sample statistic and its corresponding population parameter.
Sampling error
The ___ is a probability distribution of all possible sample means of a given sample size.
Sampling distribution of the sample mean
The ___ theorem states that, for large random samples, the shape of the sampling distribution of the sample mean is close to the normal probability distribution; this theorem is true for all population distributions.
Central limit theorem
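A small simulation can make the central limit theorem more concrete. This sketch assumes a strongly right-skewed population (exponential with mean 1) and arbitrary choices for the sample size (40) and number of samples (2000); the `skewness` helper is written here for illustration. The sample means should show far less skew than the individual values.

```python
import random
import statistics

def skewness(values):
    """Population skewness: the average cubed z-score (0 for a symmetric shape)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean(((v - mean) / sd) ** 3 for v in values)

rng = random.Random(0)   # fixed seed so the sketch is repeatable
sample_size = 40         # assumed "large" sample size
num_samples = 2000       # number of repeated samples

# Draw repeated samples from a right-skewed (exponential) population with mean 1.
samples = [[rng.expovariate(1.0) for _ in range(sample_size)] for _ in range(num_samples)]
all_values = [v for sample in samples for v in sample]
sample_means = [statistics.fmean(sample) for sample in samples]

pop_skew = skewness(all_values)
means_skew = skewness(sample_means)

print(f"Skewness of individual values: {pop_skew:.2f}")
print(f"Skewness of sample means:      {means_skew:.2f}")
print(f"Mean of sample means:          {statistics.fmean(sample_means):.3f}")
```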
The sampling distribution of the sample mean will be normal, or approximately normal, under two conditions:
- ___
- ___
- When the samples are taken from populations known to follow the normal distribution.
- When the shape of the population distribution is not known or is not normal, but the sample size is large (a common rule of thumb is at least 30 observations).
The ___ will be exactly equal to the population mean if we are able to select all possible samples of the same size from a given population.
Mean of the distribution of sample means
μ = μ_x̄
There will be less dispersion in the sampling distribution of the sample mean than in the ___.
Population
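Both of the last two facts can be verified exactly on a tiny population by listing every possible sample. The population values and the sample size of 2 are made up for illustration; samples are drawn without replacement using `itertools.combinations`.

```python
import itertools
import math
import statistics

# Hypothetical population of five values, for illustration only.
population = [1, 3, 5, 7, 14]
sample_size = 2

# Every possible sample of the given size (without replacement) and its mean.
all_samples = list(itertools.combinations(population, sample_size))
sample_means = [statistics.mean(sample) for sample in all_samples]

population_mean = statistics.mean(population)
mean_of_sample_means = statistics.mean(sample_means)

population_sd = statistics.pstdev(population)
sd_of_sample_means = statistics.pstdev(sample_means)

print("Number of possible samples:", len(all_samples))
print("Population mean:           ", population_mean)
print("Mean of sample means:      ", mean_of_sample_means)
print("Population std. deviation: ", round(population_sd, 3))
print("Std. dev. of sample means: ", round(sd_of_sample_means, 3))
```

The two means agree exactly, and the sample means are less spread out than the population values.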
___ destroys a sample in the course of measuring it.
E.g., breaking a table to determine its weight capacity
Destructive testing