exam 1 review Flashcards
An investigation will typically focus on a well-defined collection of _________ constituting a ____________ of interest.
An investigation will typically focus on a well-defined collection of objects constituting a population of interest.
In one study, the population might consist of all gelatin capsules of a particular type produced during a specified period. Another investigation might involve the population consisting of all individuals who received a B.S. in engineering during the most recent academic year. When desired information is available for all objects in the population, we have what is called a ______. Constraints on time, money, and other scarce resources usually make a _______ impractical or infeasible.
In one study, the population might consist of all gelatin capsules of a particular type produced during a specified period. Another investigation might involve the population consisting of all individuals who received a B.S. in engineering during the most recent academic year. When desired information is available for all objects in the population, we have what is called a census. Constraints on time, money, and other scarce resources usually make a census impractical or infeasible.
A subset of the population, called a ______, is selected in some prescribed manner.
A subset of the population, called a sample, is selected in some prescribed manner.
Variable
A variable is any characteristic whose value may change from one object to another in the population.
Univariate
A univariate data set consists of observations on a single variable.
Bivariate
We have bivariate data when observations are made on each of two variables.
Multivariate
Multivariate data arises when observations are made on more than one variable (so bivariate is a special case of multivariate).
Descriptive Statistics
- Some descriptive methods are graphical: histograms, boxplots, and scatter plots are primary examples.
- Other descriptive methods involve calculation of numerical summary measures, such as means, standard deviations, and correlation coefficients.
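The numerical summary measures named above can be sketched in a few lines of Python. This is only an illustration: the function names and the data in `x` and `y` are made up for the example, and the standard deviation shown is the sample version (divisor n - 1).

```python
import math

def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)

def sample_std(values):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def correlation(xs, ys):
    """Sample correlation coefficient for paired (bivariate) data."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical bivariate data, invented for this sketch.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0]

print(mean(x))            # mean of x
print(sample_std(x))      # sample standard deviation of x
print(correlation(x, y))  # y is an exact linear function of x
```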
Inferential Statistics
Techniques for generalizing from a sample to a population are gathered within the branch of our discipline called inferential statistics.
In a statistics problem, characteristics of a sample are available to the _____________, and this information enables the experimenter to draw conclusions about the ______________. The relationship between the two disciplines can be summarized by saying that probability reasons from the population to the ________ (deductive reasoning), whereas inferential statistics reasons from the sample to the ________ (_________ reasoning).
In a statistics problem, characteristics of a sample are available to the experimenter, and this information enables the experimenter to draw conclusions about the population. The relationship between the two disciplines can be summarized by saying that probability reasons from the population to the sample (deductive reasoning), whereas inferential statistics reasons from the sample to the population (inductive reasoning).
“How Many People Do You Know? Efficiently Estimating Personal Network Size” (JASA, 2010: 59–70): How many of the N individuals at your college do you know? You could select a random ________ of students from the population and use an estimate based on the fraction of people in this ________ that you know. Unfortunately, this is very inefficient for large populations because the fraction of the ________ someone knows is typically very small. A “_____ ________ _____” was proposed that the authors asserted remedied ________ in previously used techniques. A simulation study of the method’s effectiveness based on groups consisting of first names (“How many people named Michael do you know?”) was included as well as an application of the method to actual survey data. The article concluded with some practical guidelines for the construction of future surveys designed to estimate social network size.
“How Many People Do You Know? Efficiently Estimating Personal Network Size” (JASA, 2010: 59–70): How many of the N individuals at your college do you know? You could select a random sample of students from the population and use an estimate based on the fraction of people in this sample that you know. Unfortunately, this is very inefficient for large populations because the fraction of the population someone knows is typically very small. A “latent mixing model” was proposed that the authors asserted remedied deficiencies in previously used techniques. A simulation study of the method’s effectiveness based on groups consisting of first names (“How many people named Michael do you know?”) was included as well as an application of the method to actual survey data. The article concluded with some practical guidelines for the construction of future surveys designed to estimate social network size.
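The simple sample-fraction estimate described in this card (not the latent mixing model from the article) might look like the sketch below. The function name, the population of 10,000 ID numbers, the set of 50 "known" IDs, and the sample size of 200 are all assumptions made up for illustration.

```python
import random

def estimate_number_known(population_size, sample, known):
    """Scale the fraction of the sample that is known up to the population."""
    hits = sum(1 for person in sample if person in known)
    return population_size * hits / len(sample)

# Hypothetical setup: students are ID numbers 0..N-1, and you know 50 of them.
N = 10_000
known = set(range(50))

rng = random.Random(0)               # fixed seed so the run is repeatable
sample = rng.sample(range(N), 200)   # simple random sample without replacement

print(estimate_number_known(N, sample, known))
```

Because only 0.5% of this hypothetical population is known, most samples of 200 contain zero, one, or two known people, so the estimate jumps in steps of 50. That is the inefficiency the card mentions.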
The number of observations in a single ______, that is, the ______ size, will often be denoted by n, so that n = 4 for the ______ of universities {Stanford, Iowa State, Wyoming, Rochester} and also for the ______ of pH measurements {6.3, 6.2, 5.9, 6.5}. If two ______ are simultaneously under consideration, either m and n or n1 and n2 can be used to ______ the numbers of observations. An experiment to compare thermal efficiencies for two different types of diesel engines might result in ______ {29.7, 31.6, 30.9} and {28.7, 29.5, 29.4, 30.3}, in which case m = 3 and n = 4.
The number of observations in a single sample, that is, the sample size, will often be denoted by n, so that n = 4 for the sample of universities {Stanford, Iowa State, Wyoming, Rochester} and also for the sample of pH measurements {6.3, 6.2, 5.9, 6.5}. If two samples are simultaneously under consideration, either m and n or n1 and n2 can be used to denote the numbers of observations. An experiment to compare thermal efficiencies for two different types of diesel engines might result in samples {29.7, 31.6, 30.9} and {28.7, 29.5, 29.4, 30.3}, in which case m = 3 and n = 4.
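As a quick check of the notation, the sample sizes in this card are just the lengths of the samples. The variable names below (`engine_a`, `engine_b`, and so on) are labels chosen for this sketch, not names from the source.

```python
# Samples copied from the card above.
universities = ["Stanford", "Iowa State", "Wyoming", "Rochester"]
ph_measurements = [6.3, 6.2, 5.9, 6.5]

# Single-sample case: the sample size n is the number of observations.
n_universities = len(universities)
n_ph = len(ph_measurements)

# Two-sample case: thermal efficiencies for two types of diesel engines.
engine_a = [29.7, 31.6, 30.9]
engine_b = [28.7, 29.5, 29.4, 30.3]
m, n = len(engine_a), len(engine_b)

print(n_universities, n_ph, m, n)  # prints: 4 4 3 4
```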
Constructing a Stem-and-Leaf Display
- Select one or more leading digits for the stem values. The trailing digits become the leaves.
- List possible stem values in a vertical column.
- Record the leaf for each observation beside the corresponding stem value.
- Indicate the units for stems and leaves someplace in the display.
A stem-and-leaf display conveys information about the following aspects of the data:
● identification of a typical or representative value
● extent of spread about the typical value
● presence of any gaps in the data
● extent of symmetry in the distribution of values
● number and locations of peaks
● presence of any outliers —values far from the rest of the data
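The construction steps above can be sketched in Python. This is a minimal sketch under stated assumptions: the function name `stem_and_leaf` and its `leaf_unit` parameter are invented here, values are assumed non-negative, each value is rounded to the leaf unit, and the leaf is always a single digit.

```python
from collections import defaultdict

def stem_and_leaf(values, leaf_unit=1):
    """Return a text stem-and-leaf display.

    leaf_unit is the place value of the leaf digit (e.g. 0.1 for tenths).
    Assumes non-negative values; each value is rounded to the leaf unit.
    """
    scaled = sorted(round(v / leaf_unit) for v in values)

    # Step 1: leading digits form the stem, the last digit is the leaf.
    leaves = defaultdict(list)
    for s in scaled:
        stem, leaf = divmod(s, 10)
        leaves[stem].append(leaf)

    # Steps 2 and 3: list every possible stem (so gaps stay visible)
    # and record the leaves beside the matching stem.
    lines = []
    for stem in range(min(leaves), max(leaves) + 1):
        digits = "".join(str(d) for d in leaves.get(stem, []))
        lines.append(f"{stem:>3} | {digits}")

    # Step 4: indicate the units for stems and leaves.
    lines.append(f"Stem unit: {10 * leaf_unit:g}, leaf unit: {leaf_unit:g}")
    return "\n".join(lines)

# pH measurements from an earlier card, with leaves in tenths.
print(stem_and_leaf([6.3, 6.2, 5.9, 6.5], leaf_unit=0.1))
```

For the four pH measurements, the 5 stem carries the single leaf 9 and the 6 stem carries the leaves 2, 3, and 5.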
What is a dotplot?
A dotplot is an attractive summary of numerical data when the data set is reasonably small or there are relatively few distinct data values. Each observation is represented by a dot above the corresponding location on a horizontal measurement scale. When a value occurs more than once, there is a dot for each occurrence, and these dots are stacked vertically. As with a stem-and-leaf display, a dotplot gives information about location, spread, extremes, and gaps.
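A text-only dotplot can be sketched as follows. This is an illustration under assumptions: the function name `dotplot` is invented, the data are assumed to be integers, and the axis shows every integer from the minimum to the maximum so that gaps remain visible.

```python
from collections import Counter

def dotplot(values):
    """Return a text dotplot for integer data, one dot per observation."""
    counts = Counter(values)
    axis = list(range(min(values), max(values) + 1))  # full scale, gaps included
    width = max(len(str(a)) for a in axis)

    rows = []
    # Build from the tallest stack down so repeated values stack vertically.
    for level in range(max(counts.values()), 0, -1):
        cells = [("." if counts[a] >= level else " ").center(width) for a in axis]
        rows.append(" ".join(cells).rstrip())

    # Horizontal measurement scale under the dots.
    rows.append(" ".join(str(a).center(width) for a in axis))
    return "\n".join(rows)

# Hypothetical small data set, invented for this sketch.
print(dotplot([2, 3, 3, 5]))
```

In the printed result, the value 3 has two stacked dots and the empty column above 4 shows a gap in the data.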