exam 1 review Flashcards

1
Q

An investigation will typically focus on a well-defined collection of _________ constituting a ____________ of interest.

A

An investigation will typically focus on a well-defined collection of objects constituting a population of interest.

2
Q

In one study, the population might consist of all gelatin capsules of a particular type produced during a specified period. Another investigation might involve the population consisting of all individuals who received a B.S. in engineering during the most recent academic year. When desired information is available for all objects in the population, we have what is called a ______ . Constraints on time, money, and other scarce resources usually make a _______ impractical or infeasible.

A

In one study, the population might consist of all gelatin capsules of a particular type produced during a specified period. Another investigation might involve the population consisting of all individuals who received a B.S. in engineering during the most recent academic year. When desired information is available for all objects in the population, we have what is called a census. Constraints on time, money, and other scarce resources usually make a census impractical or infeasible.

3
Q

A subset of the population, a ______, is selected in some prescribed manner.

A

A subset of the population, a sample, is selected in some prescribed manner.

4
Q

Variable

A

A variable is any characteristic whose value may change from one object to another in the population.

5
Q

Univariate

A

A univariate data set consists of observations on a single variable.

6
Q

Bivariate

A

We have bivariate data when observations are made on each of two variables.

7
Q

Multivariate

A

Multivariate data arises when observations are made on more than one variable (so bivariate is a special case of multivariate).

8
Q

Descriptive Statistics

A
  • Some descriptive methods are graphical in nature: histograms, boxplots, and scatter plots are primary examples.
  • Other descriptive methods involve the calculation of numerical summary measures, such as means, standard deviations, and correlation coefficients.
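The numerical summary measures named above can be computed in a few lines of Python. This is a minimal sketch: the paired data are made up for illustration, `pearson_r` is a hypothetical helper written out from the usual definition of the sample correlation coefficient, and `statistics.stdev` uses the n - 1 divisor of the sample standard deviation.

```python
import math
import statistics


def pearson_r(xs, ys):
    """Sample correlation coefficient r for paired observations (hypothetical helper)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    x_bar = statistics.mean(xs)
    y_bar = statistics.mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical paired data, made up purely for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

print("mean of x:", statistics.mean(x))
print("sample standard deviation of x:", statistics.stdev(x))
print("correlation r:", pearson_r(x, y))
```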
9
Q

Inferential Statistics

A

Techniques for generalizing from a sample to a population are gathered within the branch of our discipline called inferential statistics.

10
Q

In a statistics problem, characteristics of a sample are available to the _____________, and this information enables the experimenter to draw conclusions about the ______________. The relationship between the two disciplines can be summarized by saying that probability reasons from the population to the ________ (deductive reasoning), whereas inferential statistics reasons from the sample to the ________ (_________ reasoning).

A

In a statistics problem, characteristics of a sample are available to the experimenter, and this information enables the experimenter to draw conclusions about the population. The relationship between the two disciplines can be summarized by saying that probability reasons from the population to the sample (deductive reasoning), whereas inferential statistics reasons from the sample to the population (inductive reasoning).

11
Q

“How Many People Do You Know? Efficiently Estimating Personal Network Size” (JASA, 2010: 59–70): How many of the N individuals at your college do you know? You could select a random ________ of students from the population and use an estimate based on the fraction of people in this ________ that you know. Unfortunately, this is very inefficient for large populations because the fraction of the ________ someone knows is typically very small. A “______ ______ ______” was proposed that the authors asserted remedied ________ in previously used techniques. A simulation study of the method’s effectiveness based on groups consisting of first names (“How many people named Michael do you know?”) was included as well as an application of the method to actual survey data. The article concluded with some practical guidelines for the construction of future surveys designed to estimate social network size.

A

“How Many People Do You Know? Efficiently Estimating Personal Network Size” (JASA, 2010: 59–70): How many of the N individuals at your college do you know? You could select a random sample of students from the population and use an estimate based on the fraction of people in this sample that you know. Unfortunately, this is very inefficient for large populations because the fraction of the population someone knows is typically very small. A “latent mixing model” was proposed that the authors asserted remedied deficiencies in previously used techniques. A simulation study of the method’s effectiveness based on groups consisting of first names (“How many people named Michael do you know?”) was included as well as an application of the method to actual survey data. The article concluded with some practical guidelines for the construction of future surveys designed to estimate social network size.
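The naive approach described on this card (sample n of the N students, then scale up the fraction you know) can be simulated to see why it is inefficient. This is a hedged sketch with entirely hypothetical numbers; it is not the latent mixing model from the article, and `estimate_network_size` is an illustrative name.

```python
import random


def estimate_network_size(population, known, n, rng):
    """Naive estimate of how many people in `population` you know.

    Draws a simple random sample of size n and scales the fraction known in
    the sample up to the population size N: estimate = N * (known in sample) / n.
    """
    sample = rng.sample(population, n)  # simple random sample without replacement
    known_in_sample = sum(1 for person in sample if person in known)
    return len(population) * known_in_sample / n


# Hypothetical setup: N = 20,000 students, of whom you know 200 (1%).
rng = random.Random(42)  # fixed seed so the run is reproducible
population = list(range(20_000))
known = set(range(200))

estimates = [estimate_network_size(population, known, n=100, rng=rng) for _ in range(1_000)]
print("average estimate:", sum(estimates) / len(estimates))
print("share of samples with nobody you know:", sum(e == 0 for e in estimates) / len(estimates))
```

With only about 1% of the population known, a sample of 100 often contains nobody you know, so individual estimates jump between 0, 200, 400, and so on, even though their average should land near the true value of 200.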

12
Q

The number of observations in a single ______, that is, the ______ size, will often be denoted by n, so that n = 4 for the ______ of universities {Stanford, Iowa State, Wyoming, Rochester} and also for the ______ of pH measurements {6.3, 6.2, 5.9, 6.5}. If two ______ are simultaneously under consideration, either m and n or n1 and n2 can be used to ______ the numbers of observations. An experiment to compare thermal efficiencies for two different types of diesel engines might result in ______ {29.7, 31.6, 30.9} and {28.7, 29.5, 29.4, 30.3}, in which case m = 3 and n = 4.

A

The number of observations in a single sample, that is, the sample size, will often be denoted by n, so that n = 4 for the sample of universities {Stanford, Iowa State, Wyoming, Rochester} and also for the sample of pH measurements {6.3, 6.2, 5.9, 6.5}. If two samples are simultaneously under consideration, either m and n or n1 and n2 can be used to denote the numbers of observations. An experiment to compare thermal efficiencies for two different types of diesel engines might result in samples {29.7, 31.6, 30.9} and {28.7, 29.5, 29.4, 30.3}, in which case m = 3 and n = 4.
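A small sketch of the m and n notation, using the two diesel-engine samples from this card. The variable names `engine_a` and `engine_b` are hypothetical labels for the two engine types; the sample means are included only to show that each sample is summarized over its own size.

```python
import statistics

# The two samples of thermal efficiencies quoted on this card.
engine_a = [29.7, 31.6, 30.9]
engine_b = [28.7, 29.5, 29.4, 30.3]

m = len(engine_a)  # first sample size
n = len(engine_b)  # second sample size
print(f"m = {m}, n = {n}")  # m = 3, n = 4
print("sample means:", statistics.mean(engine_a), statistics.mean(engine_b))
```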

13
Q

Constructing a Stem-and-Leaf Display

A

Constructing a Stem-and-Leaf Display

  1. Select one or more leading digits for the stem values. The trailing digits become the leaves.
  2. List possible stem values in a vertical column.
  3. Record the leaf for each observation beside the corresponding stem value.
  4. Indicate the units for stems and leaves somewhere in the display.
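The four steps above can be sketched in Python for non-negative whole-number data, where the tens digit is the stem and the ones digit is the leaf. This is an illustrative sketch, not a general tool: the function name, the one-digit-leaf choice, and the sample data are all assumptions.

```python
def stem_and_leaf(values):
    """Build a stem-and-leaf display for non-negative integer data.

    Step 1: the tens (and higher) digits form the stem, the ones digit is the leaf.
    Step 2: every possible stem from the smallest to the largest is listed.
    Step 3: each observation's leaf is recorded beside its stem.
    Step 4: the units of stems and leaves are stated at the bottom.
    """
    if not values:
        raise ValueError("need at least one observation")
    leaves_by_stem = {}
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        leaves_by_stem.setdefault(stem, []).append(leaf)
    first, last = min(leaves_by_stem), max(leaves_by_stem)
    lines = []
    for stem in range(first, last + 1):  # include empty stems so gaps stay visible
        leaves = "".join(str(leaf) for leaf in leaves_by_stem.get(stem, []))
        lines.append(f"{stem:>3} | {leaves}")
    lines.append("Stem: tens digit   Leaf: ones digit")
    return lines


# Hypothetical data, made up for illustration.
data = [24, 12, 47, 21, 15, 24, 41]
for line in stem_and_leaf(data):
    print(line)
```

Listing the empty stem 3 keeps the gap between the 20s and the 40s visible, which is one of the aspects a stem-and-leaf display is meant to reveal.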
14
Q

A stem-and-leaf display conveys information about the following aspects of the data:

A

● identification of a typical or representative value
● extent of spread about the typical value
● presence of any gaps in the data
● extent of symmetry in the distribution of values
● number and locations of peaks
● presence of any outliers (values far from the rest of the data)

15
Q

What is a dot plot?

A

A dotplot is an attractive summary of numerical data when the data set is reasonably small or there are relatively few distinct data values. Each observation is represented by a dot above the corresponding location on a horizontal measurement scale. When a value occurs more than once, there is a dot for each occurrence, and these dots are stacked vertically. As with a stem-and-leaf display, a dotplot gives information about location, spread, extremes, and gaps.
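A rough text version of a dotplot, assuming whole-number data so that every integer between the smallest and largest value can get its own position on the horizontal scale. The function name and data are hypothetical.

```python
from collections import Counter


def text_dotplot(values):
    """Return the rows of a text dotplot for whole-number data.

    Every integer from min(values) to max(values) gets a column on the
    horizontal scale, so gaps in the data show up as empty columns. Each
    occurrence of a value adds one '*', and repeats are stacked vertically.
    """
    counts = Counter(values)
    scale = range(min(values), max(values) + 1)
    width = max(len(str(x)) for x in scale)  # column width that fits every label
    rows = []
    for level in range(max(counts.values()), 0, -1):  # top of the tallest stack first
        cells = [("*" if counts[x] >= level else "").rjust(width) for x in scale]
        rows.append(" ".join(cells).rstrip())
    rows.append(" ".join(str(x).rjust(width) for x in scale))
    return rows


# Hypothetical data, made up for illustration.
data = [5, 3, 4, 5, 8, 4, 5]
for row in text_dotplot(data):
    print(row)
```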

16
Q

A numerical variable is ________ if its set of possible values either is finite or else can be listed in an infinite sequence (one in which there is a first number, a second number, and so on). A numerical variable is __________ if its possible values consist of an entire interval on the number line.

A

A numerical variable is discrete if its set of possible values either is finite or else can be listed in an infinite sequence (one in which there is a first number, a second number, and so on). A numerical variable is continuous if its possible values consist of an entire interval on the number line.

17
Q

Histograms

A

Some numerical data is obtained by counting to determine the value of a variable (the number of traffic citations a person received during the last year, the number of customers arriving for service during a particular period), whereas other data is obtained by taking measurements (weight of an individual, reaction time to a particular stimulus). The prescription for drawing a histogram is generally different for these two cases.

18
Q

A ______ variable x almost always results from counting, in which case possible values are 0, 1, 2, 3, … or some _____ of these integers. Continuous variables arise from making measurements. For example, if x is the pH of a chemical substance, then in theory x could be any number between 0 and 14: 7.0, 7.03, 7.032, and so on. Of course, in practice there are limitations on the degree of accuracy of any measuring instrument, so we may not be able to determine pH, reaction time, height, and concentration to an arbitrarily large number of decimal places. However, from the point of view of creating mathematical models for distributions of data, it is helpful to imagine an entire continuum of possible values.

A

A discrete variable x almost always results from counting, in which case possible values are 0, 1, 2, 3, … or some subset of these integers. Continuous variables arise from making measurements. For example, if x is the pH of a chemical substance, then in theory x could be any number between 0 and 14: 7.0, 7.03, 7.032, and so on. Of course, in practice there are limitations on the degree of accuracy of any measuring instrument, so we may not be able to determine pH, reaction time, height, and concentration to an arbitrarily large number of decimal places. However, from the point of view of creating mathematical models for distributions of data, it is helpful to imagine an entire continuum of possible values.

19
Q

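The frequency of any particular x value is the number of times that value occurs in the data set. The relative frequency of a value is the fraction or proportion of times the value occurs:

relative frequency of a value = (number of times the value occurs) / (number of observations in the data set)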

A
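The frequency and relative frequency definitions translate directly into a count divided by the number of observations. A minimal sketch using Python's `collections.Counter`; the citation counts are made-up illustrative data and `frequency_table` is a hypothetical helper name.

```python
from collections import Counter


def frequency_table(values):
    """Map each distinct value to (frequency, relative frequency)."""
    n = len(values)  # number of observations in the data set
    counts = Counter(values)  # number of times each value occurs
    return {x: (counts[x], counts[x] / n) for x in sorted(counts)}


# Hypothetical numbers of traffic citations for 10 drivers.
citations = [0, 1, 0, 2, 0, 1, 3, 0, 1, 0]
for x, (freq, rel) in frequency_table(citations).items():
    print(f"x = {x}: frequency {freq}, relative frequency {rel:.2f}")
```

The relative frequencies always add up to 1 (up to rounding), since every observation is counted exactly once.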
20
Q

Constructing a Histogram for Discrete Data

A

First, determine the frequency and relative frequency of each x value. Then mark possible x values on a horizontal scale. Above each value, draw a rectangle whose height is the relative frequency (or alternatively, the frequency) of that value; the rectangles should have equal widths.
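A text-only sketch of these steps. To keep it to plain text the histogram is drawn sideways: each possible x value gets one row of equal width, and the bar length is proportional to its relative frequency. The data, the `scale` parameter, and the function name are assumptions for illustration.

```python
from collections import Counter


def discrete_histogram(values, scale=40):
    """Return the rows of a sideways text histogram for whole-number data.

    Each possible x value from min to max gets one row, so all rectangles have
    the same width; a bar of '#' characters, proportional to the relative
    frequency, stands in for the rectangle's height.
    """
    n = len(values)
    counts = Counter(values)
    rows = []
    for x in range(min(values), max(values) + 1):  # zero-frequency values still get a row
        rel = counts[x] / n
        rows.append(f"{x:>3} | {'#' * round(rel * scale)} {rel:.2f}")
    return rows


# Hypothetical numbers of customers arriving in ten periods.
arrivals = [2, 3, 3, 1, 4, 3, 2, 6, 3, 2]
for row in discrete_histogram(arrivals):
    print(row)
```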

21
Q

Constructing a Histogram for Continuous Data: Equal Class Widths

A

Determine the frequency and relative frequency for each class. Mark the class boundaries on a horizontal measurement axis. Above each class interval, draw a rectangle whose height is the corresponding relative frequency (or frequency).
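A minimal sketch of the class-frequency step for equal-width classes, followed by text bars whose lengths are proportional to the relative frequencies. It assumes each class includes its left boundary but not its right one, and the reaction-time data and function name are hypothetical.

```python
def class_frequencies(values, start, width, num_classes):
    """Return (lower, upper, frequency, relative frequency) for each class.

    Classes are [start, start + width), [start + width, start + 2*width), ...
    Each class includes its left boundary but not its right one (an assumed
    convention). Every value must fall in [start, start + num_classes * width).
    """
    n = len(values)
    freqs = [0] * num_classes
    for v in values:
        k = int((v - start) // width)  # index of the class that contains v
        if not 0 <= k < num_classes:
            raise ValueError(f"{v} lies outside the chosen classes")
        freqs[k] += 1
    return [(start + k * width, start + (k + 1) * width, f, f / n)
            for k, f in enumerate(freqs)]


# Hypothetical reaction times (seconds), made up for illustration.
times = [5.2, 6.1, 6.8, 7.4, 7.9, 8.3, 9.0, 9.6]
for lower, upper, freq, rel in class_frequencies(times, start=5, width=1, num_classes=5):
    bar = "#" * round(rel * 40)  # rectangle height drawn as a sideways bar
    print(f"[{lower}, {upper})  freq {freq}  rel {rel:.3f}  {bar}")
```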

22
Q

Define: Descriptive Statistics

A

Descriptive statistics are numbers that are used to summarize and describe data. The word “data” refers to the information that has been collected from an experiment, a survey, a historical record, etc. (By the way, “data” is plural. One piece of information is called a “datum.”)

23
Q

Descriptive statistics are just ________. They do not involve ________ beyond the data at hand. ________ from our data to another set of cases is the business of inferential statistics, which you’ll be studying in another section. Here we focus on (mere) ________ statistics.

A

Descriptive statistics are just descriptive. They do not involve generalizing beyond the data at hand. Generalizing from our data to another set of cases is the business of inferential statistics, which you’ll be studying in another section. Here we focus on (mere) descriptive statistics.

24
Q

Question 1 out of 3.
What are the two basic divisions of statistics?

  • inferential and descriptive.
  • population and sample.
  • sampling and scaling.
  • mean and median.
A

The two divisions of statistics are inferential and descriptive.

25
Q

Question 2 out of 3.
Check all that apply. Descriptive statistics

  • allow random assignment to experimental conditions.
  • use data from a sample to answer questions about a population.
  • summarize and describe data.
  • allow you to generalize beyond the data at hand.
A

Descriptive statistics summarize and describe data. Inferential statistics use data from a sample to answer questions about a population. Inferential statistics involves generalizing beyond the data at hand.

26
Q

Question 3 out of 3.
Which of the following are descriptive statistics?

  • The mean age of people in Detroit.
  • The number of people who watched the Super Bowl in the year 2002.
  • A prediction of next month’s unemployment rate.
  • The median price of new homes sold in Miami.
  • The height of the tallest woman in the world.

A

Descriptive statistics are numbers that are used to summarize and describe data. All of the choices except the prediction are descriptive statistics. Predicting next month’s unemployment rate involves predicting future data, not describing the data at hand.

27
Q
A