statistics exam 2 Flashcards

1
Q

association

A

values of one variable tend to occur with certain values of another variable; detected when the conditional distributions differ from the marginal distribution and from each other.

2
Q

bias

A

a condition where the mean of the statistic values differs from the parameter that the statistic estimates

3
Q

bivariate data

A

data collected on two variables for each individual in a study.

4
Q

central limit theorem

A

the theorem stating that the sampling distribution of x bar is approximately normal whenever the sample is large and random, regardless of the shape of the population distribution.
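
A small simulation can make this card concrete. The sketch below is only an illustration: the right-skewed exponential population, the sample size of 50, and the helper name `sample_mean` are assumptions chosen for the example, not part of the original card.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

def sample_mean(n):
    """Mean of one random sample of size n from a right-skewed population."""
    return statistics.fmean(random.expovariate(1.0) for _ in range(n))

# 2000 sample means, each from a fairly large sample (n = 50)
means = [sample_mean(50) for _ in range(2000)]

center = statistics.fmean(means)  # should be near the population mean, 1
spread = statistics.stdev(means)  # should be near sigma/sqrt(n) = 1/sqrt(50)

# For a normal curve, roughly 95% of values lie within 2 standard deviations.
within_2sd = sum(abs(m - center) <= 2 * spread for m in means) / len(means)
print(round(center, 2), round(spread, 2), round(within_2sd, 2))
```

Even though the population is strongly skewed, the proportion of sample means within two standard deviations of the center should come out close to the normal-curve value.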

5
Q

conditional distribution

A

the distribution of the values in a single row (or a single column) of a two-way table.

6
Q

control chart

A

a statistical tool for monitoring the input or output of a process

7
Q

control limits

A

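μ - 3σ/√n and μ + 3σ/√n, where μ (mu) is the population mean, σ (sigma) is the population standard deviation, and n is the sample size; used to detect out-of-control signals in a control chart.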
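
As a worked example of the formula on this card, the sketch below computes both limits. The function name `control_limits` and the process values (mean 100, sigma 4, samples of size 16) are hypothetical.

```python
import math

def control_limits(mu, sigma, n):
    """Return (lower, upper) three-sigma control limits for an x bar chart."""
    margin = 3 * sigma / math.sqrt(n)
    return mu - margin, mu + margin

# Hypothetical process: mean 100, sigma 4, samples of size 16.
lower, upper = control_limits(100, 4, 16)
print(lower, upper)  # 97.0 103.0
```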

8
Q

correlation coefficient

A

a measure of the strength and direction of the linear relationship between two quantitative variables.
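
One way to see what this measure does is to compute it by hand. The sketch below uses the usual formula for r; the function name `correlation` and the five data points are made up for illustration.

```python
import math

def correlation(xs, ys):
    """Correlation coefficient r for paired quantitative data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up data with a moderately strong positive linear relationship.
r = correlation([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(r, 4))  # 0.7746

# Points that fall exactly on an upward-sloping line give r = 1.
print(correlation([1, 2, 3], [2, 4, 6]))  # 1.0
```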

9
Q

disjoint events

A

events that cannot occur simultaneously

10
Q

distribution of a variable

A

a list of the possible values of a variable together with the frequency of each value (probabilities can be given instead of frequencies)

11
Q

event

A

a single outcome or a combination of outcomes from a random phenomenon

12
Q

extrapolation

A

predicting a Y value using a value of X that is outside of the range of X values used to obtain the regression equation. This prediction could be very far off.

13
Q

inference

A

using the value of a statistic computed from a sample to draw conclusions about the population parameter.

14
Q

influential observation

A

an observation that substantially alters the values of slope and y intercept in the regression equation when it is included in the computations.

15
Q

law of large numbers

A

the fact that the average (x bar) of observed values in a sample gets closer and closer to the population mean, μ (mu), as the sample size increases.
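
This card can be checked with a quick simulation. The sketch below assumes a fair six-sided die, whose population mean is 3.5; the helper name `mean_of_rolls` and the chosen sample sizes are illustrative.

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable

def mean_of_rolls(n):
    """x bar for n simulated rolls of a fair six-sided die."""
    return sum(random.randint(1, 6) for _ in range(n)) / n

# As n grows, x bar should settle near the population mean of 3.5.
for n in (10, 1000, 100000):
    print(n, mean_of_rolls(n))
```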

16
Q

laws of probability

A

the basis for hypothesis testing and confidence interval estimation

17
Q

least squares

A

a method for finding the equation of a line that minimizes the sum of squared residuals.

18
Q

least squares regression line:

A

the line with the smallest sum of squared residuals
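
To illustrate the least squares idea, the sketch below fits a line with the standard slope and intercept formulas and then checks that nudging the line increases the sum of squared residuals. The function names `least_squares` and `sse` and the data are assumptions made for the example.

```python
def least_squares(xs, ys):
    """Slope and intercept of the least squares regression line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

def sse(xs, ys, slope, intercept):
    """Sum of squared residuals (observed y minus predicted y) for a line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

xs = [1, 2, 3, 4, 5]  # made-up explanatory values
ys = [2, 4, 5, 4, 5]  # made-up response values

slope, intercept = least_squares(xs, ys)
best = sse(xs, ys, slope, intercept)
print(round(slope, 2), round(intercept, 2), round(best, 2))  # 0.6 2.2 2.4

# Any other line has a larger sum of squared residuals.
print(sse(xs, ys, slope + 0.1, intercept) > best)  # True
```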

19
Q

lurking variable

A

a variable that is not measured but explains association between two variables that are measured.

20
Q

marginal distribution

A

the distribution of the values in the “total” row (or the “total” column) of a two-way table
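
The difference between conditional and marginal distributions is easier to see with numbers. The two-way table below is hypothetical, and the helper name `distribution` is an assumption; the sketch converts each row, and the "total" row, to proportions.

```python
# Hypothetical two-way table: rows are groups, columns are answers.
table = {
    "men": {"yes": 30, "no": 20},
    "women": {"yes": 45, "no": 5},
}

def distribution(counts):
    """Convert a dict of counts to a dict of proportions."""
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}

# Conditional distributions: one per row of the table.
conditional = {row: distribution(cols) for row, cols in table.items()}

# Marginal distribution: built from the "total" row (the column totals).
column_totals = {}
for cols in table.values():
    for value, count in cols.items():
        column_totals[value] = column_totals.get(value, 0) + count
marginal = distribution(column_totals)

print(conditional["men"])    # {'yes': 0.6, 'no': 0.4}
print(conditional["women"])  # {'yes': 0.9, 'no': 0.1}
print(marginal)              # {'yes': 0.75, 'no': 0.25}
```

Here the conditional distributions differ from the marginal distribution and from each other, which is the signal of an association between the two variables.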

21
Q

mean of the sampling distribution of x bar

A

the mean of all the sample means (x bars) from all possible samples of size n from a population; equals μ (mu), the population mean

22
Q

μ (mu)

A

the mean of the population

23
Q

no association

A

a condition where values of one variable occur independently of values of another variable; detected when the conditional distributions of a two-way table equal the marginal distribution (and each other)

24
Q

out-of-control process

A

a process signaled by one sample mean more than three standard deviations of x bar from the center line, or by 9 sample means in a row on the same side (above or below) of the center line.
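
The two signals on this card can be checked mechanically. The sketch below is one possible way to do it; the function name `out_of_control_signals`, its parameters, and the sample means are all hypothetical.

```python
def out_of_control_signals(means, center, lower, upper, run_length=9):
    """Return (index, reason) pairs for the two out-of-control signals."""
    signals = []
    run_side, run_count = 0, 0  # side: +1 above center, -1 below, 0 on it
    for i, m in enumerate(means):
        # Signal 1: a sample mean outside the control limits.
        if m < lower or m > upper:
            signals.append((i, "outside control limits"))
        # Signal 2: run_length sample means in a row on one side of center.
        side = (m > center) - (m < center)
        if side != 0 and side == run_side:
            run_count += 1
        else:
            run_side, run_count = side, (1 if side != 0 else 0)
        if run_count == run_length:  # flagged once, when the run is reached
            signals.append((i, "run of 9 on one side"))
    return signals

# Hypothetical chart: center line 100, control limits 97 and 103.
print(out_of_control_signals([100, 104, 99], 100, 97, 103))
# [(1, 'outside control limits')]
print(out_of_control_signals([100.5] * 9, 100, 97, 103))
# [(8, 'run of 9 on one side')]
```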

25
outlier
an observation that falls outside the overall pattern of the data set
26
parameter
a characteristic of a population that is usually unknown; this could be mean, median, proportion, standard deviation computed on all the data from the population; a parameter does not have variability
27
parameter symbols
μ (mu), σ (sigma), and p (mean of a population, standard deviation of a population, proportion of a population)
28
positive association
high values of one variable tend to associate with high values of another variable.
29
probability of an outcome
a measure of the proportion of times an outcome occurs in a very long series of repetitions that gives us an indication of the likelihood of the outcome.
30
process
sequence of operations used in production, manufacturing, etc.
31
process in statistical control
a process whose inputs and outputs exhibit only natural variation when observed over time
32
quality control chart
a chart plotting the means, x bar, of regular samples of size n against time; this chart is used to assess whether the process is in control.
33
quantitative bivariate:
the type of data required for regression analysis
34
r
the symbol for correlation coefficient
35
r squared
the percentage of total variation in the response variable, y, that is explained by the regression equation; in other words, the percentage of total variation in the response variable, y, that is explained by the explanatory variable, X.
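
A short computation shows how this percentage comes out of the sums of squares. The sketch below assumes the usual identity r squared = 1 - SSE / (total variation in Y); the data are made up.

```python
xs = [1, 2, 3, 4, 5]  # made-up explanatory values
ys = [2, 4, 5, 4, 5]  # made-up response values

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Least squares slope and intercept.
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
intercept = y_bar - slope * x_bar

predicted = [intercept + slope * x for x in xs]
sse = sum((y - p) ** 2 for y, p in zip(ys, predicted))  # unexplained variation
total_variation = sum((y - y_bar) ** 2 for y in ys)     # variation about y bar

r_squared = 1 - sse / total_variation
print(round(r_squared, 2))  # 0.6, so 60% of the variation in y is explained
```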
36
random
describes a phenomenon whose individual outcomes are uncertain but whose outcomes follow a regular distribution in the long run.
37
regression equation
a formula for a line that models a linear relationship between two quantitative variables
38
residual
the observed y minus the predicted y; denoted y-yhat
39
residual plot
a diagnostic plot of the residuals against the explanatory variable, used to assess how well the regression line fits the data; complete scatter in a shoebox pattern is good, whereas a megaphone pattern denotes unequal variance in the Y's across the levels of X, and curvature in the form of a smile or a frown denotes that the linear model is not the best model for the data.
40
sample mean, x bar
the random variable of the sampling distribution of x bar
41
sample space
the list of all possible outcomes of a random phenomenon
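
Sample spaces, events, and disjoint events map naturally onto sets. The sketch below assumes one roll of a fair die, so every outcome is equally likely; the event names and the helper `probability` are illustrative.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}  # all outcomes of one roll of a die
even = {2, 4, 6}                   # an event: a combination of outcomes
odd = {1, 3, 5}
at_least_five = {5, 6}

def probability(event):
    """P(event), assuming every outcome in the sample space is equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))

print(even.isdisjoint(odd))            # True: cannot occur simultaneously
print(even.isdisjoint(at_least_five))  # False: both occur when a 6 is rolled
print(probability(even))               # 1/2
```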
42
sampling distribution
a distribution of a statistic; a list of all the possible values of a statistic together with the frequency (or probability) of each value
43
sampling distribution of x bar
a list of all the possible values for x bar together with the frequency (or probability) of each value; in other words, the distribution of all x bar's from all possible samples
44
sampling variability
the variability of sample results from one sample to the next; something we must measure in order to effectively do inference
45
scatterplot
a two dimensional plot used to examine strength of relationship between two variables as well as direction and type of relationship.
46
Simpson's paradox
a condition where the percentages reverse when a third (lurking) variable is ignored; in other words, a condition leading to misinterpretation of the direction of association between two variables caused by ignoring a third variable that is associated with both of the reported variables.
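
The reversal is easiest to believe after seeing it in numbers. The counts below are invented for illustration: treatment A has the higher success rate within each group, yet B looks better once the groups are combined and the grouping variable is ignored.

```python
# Hypothetical (successes, trials) for treatments A and B in two groups.
data = {
    "small": {"A": (9, 10), "B": (80, 100)},
    "large": {"A": (30, 100), "B": (2, 10)},
}

def rate(successes, trials):
    """Proportion of successes."""
    return successes / trials

# Within each group, A has the higher success rate.
for group, arms in data.items():
    print(group, rate(*arms["A"]), rate(*arms["B"]))

# Ignoring the group (the lurking variable) reverses the comparison.
overall = {}
for arm in ("A", "B"):
    successes = sum(data[group][arm][0] for group in data)
    trials = sum(data[group][arm][1] for group in data)
    overall[arm] = rate(successes, trials)
print(overall["A"] < overall["B"])  # True
```

The reversal happens because group membership is associated with both the treatment received and the chance of success.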
47
simulation
using random numbers to imitate chance behavior
48
slope
a measure of the average change in the response variable for every one unit increase in the explanatory or independent variable
49
standard deviation (s):
a measure of the variability of data in a sample about x bar.
50
standard deviation of x bar, also called the standard deviation of the sampling distribution of x bar
a measure of the variability of the values of the statistic x bar about μ (mu), the population mean; a measure of the variability of the sampling distribution of x bar; in other words, the "average" amount that the statistic, x bar, deviates from its associated parameter. Computed as σ/√n, where σ (sigma) is the population standard deviation and n is the sample size.
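
The formula on this card implies that larger samples give less variable sample means. The sketch below evaluates it for two sample sizes; the function name `sd_of_x_bar` and the numbers are hypothetical.

```python
import math

def sd_of_x_bar(sigma, n):
    """Standard deviation of the sampling distribution of x bar."""
    return sigma / math.sqrt(n)

# Hypothetical population standard deviation of 12.
print(sd_of_x_bar(12, 36))   # 2.0
print(sd_of_x_bar(12, 144))  # 1.0, quadrupling n halves the variability
```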
51
statistic
a number computed from sample data (without any knowledge of the value of a parameter) used to estimate the value of the parameter.
52
statistic symbols:
x bar, s, p hat (mean of sample, standard deviation of sample, proportion of sample)
53
statistical process control
a procedure used to check a process at regular intervals to detect problems and correct them before they become serious.
54
sum of squared residuals (or error)
the residuals are squared and added; denoted SSE.
55
total variation in Y:
the sum of the squared deviations of the Y observations about their mean, y bar
56
two-way table
a table containing counts for two categorical variables. It has r rows and c columns
57
unbiased
a condition where the mean of the statistic values equals the parameter that the statistic estimates
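
A simulation can contrast an unbiased statistic with a biased one. The sketch below assumes a population of fair die faces (mean 3.5, maximum 6) and samples of size 3: x bar is unbiased for the population mean, while the sample maximum, used here purely as an illustrative biased statistic, tends to underestimate the population maximum.

```python
import random
import statistics

random.seed(2)  # fixed seed so the sketch is repeatable
faces = [1, 2, 3, 4, 5, 6]  # population: mean 3.5, maximum 6

sample_means, sample_maxima = [], []
for _ in range(20000):
    sample = random.choices(faces, k=3)  # random sample of size 3
    sample_means.append(statistics.fmean(sample))
    sample_maxima.append(max(sample))

avg_of_means = statistics.fmean(sample_means)    # near 3.5: unbiased
avg_of_maxima = statistics.fmean(sample_maxima)  # below 6: biased
print(round(avg_of_means, 2), round(avg_of_maxima, 2))
```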
58
unexplained variation
the sum of squared residuals
59
X:
the symbol for explanatory variable
60
x bar-chart
a plot of sample means over time used to assess whether a process is in control
61
Y:
the symbol for response variable
62
y hat:
the symbol for predicted y
63
z-score
a measure of the number of standard deviations of a value or observation from the mean.
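
As a worked example, the sketch below computes z-scores with the usual formula, (value - mean) / standard deviation. The function name `z_score` and the numbers (mean 100, standard deviation 15) are hypothetical.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / sd

print(z_score(130, 100, 15))  # 2.0, two standard deviations above the mean
print(z_score(85, 100, 15))   # -1.0, one standard deviation below the mean
```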