statistics exam 2 Flashcards
association
values of one variable tend to occur with certain values of another variable; detected when the conditional distributions differ from the marginal distribution and from each other.
bias
a condition where the mean of the statistic values differs from the parameter that the statistic estimates
bivariate data
data collected on two variables for each individual in a study.
central limit theorem
the theorem stating that the sampling distribution of x bar is approximately normal whenever the sample is large and random.
conditional distribution
the distribution of the values in a single row (or a single column) of a two-way table.
control chart
a statistical tool for monitoring the input or output of a process
control limits
u - 3*sigma/sqrt(n) and u + 3*sigma/sqrt(n); used to detect out-of-control signals in a control chart.
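The control limits above can be sketched in a few lines. This is a minimal illustration, not a definitive implementation: the function name `control_limits` and the example values (u = 50, sigma = 4, n = 16) are made up for this sketch.

```python
import math

def control_limits(mu, sigma, n):
    """Return (lower, upper) three-sigma limits for means of samples of size n.

    mu is the population mean (written u in these cards) and sigma is the
    population standard deviation.
    """
    margin = 3 * sigma / math.sqrt(n)
    return mu - margin, mu + margin

# Hypothetical process values, chosen only for illustration.
lower, upper = control_limits(mu=50.0, sigma=4.0, n=16)
print(lower, upper)
```

A sample mean that falls outside the interval from `lower` to `upper` would count as an out-of-control signal.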
correlation coefficient
a measure of the strength of the linear relationship between two quantitative variables.
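One common way to compute the correlation coefficient is from the deviations of each variable about its own mean. The sketch below assumes two equal-length lists of numbers; the function name `correlation` is hypothetical.

```python
import math

def correlation(xs, ys):
    """Correlation coefficient r for paired quantitative data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of cross products and of squared deviations about the means.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Points that lie exactly on an increasing line give r very close to 1.
print(correlation([1, 2, 3, 4], [2, 4, 6, 8]))
```

Values of r near 1 or -1 indicate a strong linear relationship; values near 0 indicate a weak one.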
disjoint events
events that cannot occur simultaneously
distribution of a variable
a list of the possible values of a variable together with the frequency of each value (probabilities can be given instead of frequencies)
event
a single outcome or a combination of outcomes from a random phenomenon
extrapolation
predicting a Y value using a value of X that is outside of the range of X values used to obtain the regression equation. This prediction could be very far off.
inference
using results from a sample statistic value to draw conclusions about the population parameter.
influential observation
an observation that substantially alters the values of slope and y intercept in the regression equation when it is included in the computations.
law of large numbers
the fact that the average (x bar) of observed values in a sample gets closer and closer to u as the sample size increases.
laws of probability
the basis for hypothesis testing and confidence interval estimation
least squares
a method for finding the equation of a line that minimizes the sum of squared residuals.
least squares regression line
the line with the smallest sum of squared residuals
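As a sketch of the least squares idea, the code below fits a line to a small made-up data set using the usual closed-form slope and intercept, then compares its sum of squared residuals with that of another line. The function names and data are assumptions for illustration only.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least squares regression line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

def sum_squared_residuals(xs, ys, slope, intercept):
    """Add up (observed y - predicted y) squared over all points."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data, invented for this example.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
slope, intercept = least_squares_line(xs, ys)
print(slope, intercept)
print(sum_squared_residuals(xs, ys, slope, intercept))
# Any other line, such as this one, has a sum of squared residuals at least as large.
print(sum_squared_residuals(xs, ys, 0.5, 2.5))
```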
lurking variable
a variable that is not measured but explains association between two variables that are measured.
marginal distribution
the distribution of the values in the “total” row (or the “total” column) of a two-way table
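To make the marginal and conditional distributions concrete, here is a small sketch using a made-up two-way table. The counts, row and column labels, and function names are all hypothetical.

```python
# Hypothetical counts: rows are class year, columns are a yes/no answer.
table = {
    "freshman": {"yes": 30, "no": 70},
    "senior":   {"yes": 60, "no": 40},
}

def marginal_distribution(table):
    """Proportions from the "total" row: column totals over the grand total."""
    columns = list(next(iter(table.values())).keys())
    totals = {c: sum(row[c] for row in table.values()) for c in columns}
    grand_total = sum(totals.values())
    return {c: totals[c] / grand_total for c in columns}

def conditional_distribution(table, row_name):
    """Proportions within a single row: each count over that row's total."""
    row = table[row_name]
    row_total = sum(row.values())
    return {c: count / row_total for c, count in row.items()}

print(marginal_distribution(table))
print(conditional_distribution(table, "freshman"))
print(conditional_distribution(table, "senior"))
```

In this made-up table the two conditional distributions differ from the marginal distribution and from each other, which is the pattern that signals an association.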
mean of the sampling distribution of x bar
the mean of all the sample means (x bars) from all possible samples of size n from a population; equals u
u
the mean of the population
no association
a condition where values of one variable occur independently of values of another variable; detected when the conditional distributions of a two-way table equal the marginal distribution (and each other)
out-of-control process
a process signaled by one sample mean falling more than three standard deviations of x bar from the center line, or by 9 sample means in a row all above or all below the center line.
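The two signals described in this card can be checked mechanically. The sketch below is one possible way to do it, assuming the center line and control limits are already known; the function name `out_of_control_signals` and the example numbers are hypothetical.

```python
def out_of_control_signals(sample_means, center, lower, upper, run_length=9):
    """Return a list of (index, reason) pairs for the two signals in the card."""
    signals = []
    run_side = 0   # +1 if the current run is above the center line, -1 if below
    run_count = 0  # how many consecutive means are on that side
    for i, xbar in enumerate(sample_means):
        # Signal 1: a single sample mean outside the control limits.
        if xbar < lower or xbar > upper:
            signals.append((i, "beyond control limits"))
        # Signal 2: run_length means in a row on the same side of the center.
        side = (xbar > center) - (xbar < center)
        if side != 0 and side == run_side:
            run_count += 1
        else:
            run_side = side
            run_count = 1 if side != 0 else 0
        if run_count >= run_length:
            signals.append((i, "run on one side of center line"))
    return signals

# Hypothetical sample means with center 50 and limits 47 and 53.
print(out_of_control_signals([50.2, 54.0, 49.5], 50, 47, 53))
print(out_of_control_signals([50.5] * 9, 50, 47, 53))
```

Note that once a run reaches nine points, this sketch keeps flagging each further point that extends the run.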
outlier
an observation that falls outside the overall pattern of the data set
parameter
a characteristic of a population that is usually unknown; this could be mean, median, proportion, standard deviation computed on all the data from the population; a parameter does not have variability
parameter symbols
u, sigma, and p (mean of population, standard deviation of population, proportion of a population)
positive association
high values of one variable tend to associate with high values of another variable.
probability of an outcome
a measure of the proportion of times an outcome occurs in a very long series of repetitions that gives us an indication of the likelihood of the outcome.
process
sequence of operations used in production, manufacturing, etc.
process in statistical control
a process whose inputs and outputs exhibit only natural variation when observed over time
quality control chart
a chart plotting the means, x bar, of regular samples of size n against time; this chart is used to assess whether the process is in control.
quantitative bivariate
the type of data required for regression analysis
r
the symbol for correlation coefficient
r squared
the percentage of total variation in the response variable, y, that is explained by the regression equation; in other words, the percentage of total variation in the response variable, y, that is explained by the explanatory variable, X.
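The definition above can be turned into a short calculation: fit the least squares line, then compare the sum of squared residuals with the total variation in y. This is a sketch under those assumptions; the function name `r_squared` and the data are made up.

```python
def r_squared(xs, ys):
    """Proportion of total variation in y explained by the regression line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Least squares slope and intercept.
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    # Unexplained variation: the sum of squared residuals (SSE).
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    # Total variation: squared deviations of y about its mean, y bar.
    total_variation = sum((y - y_bar) ** 2 for y in ys)
    return 1 - sse / total_variation

# Hypothetical data; multiply the result by 100 to read it as a percentage.
print(r_squared([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]))
```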
random
describes a phenomenon whose individual outcomes are uncertain but which has a regular distribution of outcomes in the long run.
regression equation
a formula for a line that models a linear relationship between two quantitative variables
residual
the observed y minus the predicted y; denoted y-yhat
residual plot
a diagnostic plot of the explanatory variable versus the residuals, used to assess how well the regression line fits the data; complete scatter in a shoebox pattern is good, whereas a megaphone pattern denotes unequal variance in Y’s across all levels of X, and curvature in the form of a smile or a frown denotes that the linear model is not best for that data.
sample mean, x bar
the random variable of the sampling distribution of x bar
sample space
the list of all possible outcomes of a random phenomenon
sampling distribution
a distribution of a statistic; a list of all the possible values of a statistic together with the frequency (or probability) of each value
sampling distribution of x bar
a list of all the possible values for x bar together with the frequency (or probability) of each value; in other words, the distribution of all x bar’s from all possible samples
sampling variability
the variability of sample results from one sample to the next; something we must measure in order to effectively do inference
scatterplot
a two dimensional plot used to examine strength of relationship between two variables as well as direction and type of relationship.
Simpson’s paradox
a condition where the percentages reverse when a third (lurking) variable is ignored; in other words, a condition leading to misinterpretation of the direction of association between two variables caused by ignoring a third variable that is associated with both of the reported variables.
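A small numeric example can show the reversal. The counts below are invented purely for illustration (they are not from any study), and the group names, treatment labels, and function names are hypothetical.

```python
# Hypothetical counts, invented for illustration: (successes, total).
counts = {
    "mild cases":   {"A": (9, 10),   "B": (80, 100)},
    "severe cases": {"A": (30, 100), "B": (2, 10)},
}

def rate_within_group(group, treatment):
    """Success proportion for one treatment inside one group."""
    successes, total = counts[group][treatment]
    return successes / total

def overall_rate(treatment):
    """Success proportion for one treatment when the groups are ignored."""
    successes = sum(counts[g][treatment][0] for g in counts)
    total = sum(counts[g][treatment][1] for g in counts)
    return successes / total

for group in counts:
    print(group, rate_within_group(group, "A"), rate_within_group(group, "B"))
print("overall", overall_rate("A"), overall_rate("B"))
```

Within each group treatment A has the higher success rate, but when the third variable (case severity) is ignored, treatment B appears better, because A was used mostly on severe cases.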
simulation
using random numbers to imitate chance behavior
slope
a measure of the average change in the response variable for every one-unit increase in the explanatory (independent) variable
standard deviation (s)
a measure of the variability of data in a sample about x bar.
standard deviation of x bar, also called the standard deviation of the sampling distribution of x bar
a measure of the variability of the values of the statistic x bar about u; a measure of the variability of the sampling distribution of x bar; in other words, the “average” amount that the statistic, x bar, deviates from its associated parameter; computed as sigma/sqrt(n)
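For a tiny population, the sampling distribution of x bar can be listed in full, which lets both facts (mean equals u, standard deviation equals sigma/sqrt(n)) be checked directly. The sketch below assumes samples are drawn with replacement and treats each ordered sample as equally likely; the population values are made up.

```python
import itertools
import math
import statistics

population = [1, 2, 3, 4]  # hypothetical population
n = 2                      # sample size

mu = statistics.fmean(population)      # population mean (u in these cards)
sigma = statistics.pstdev(population)  # population standard deviation

# Every possible ordered sample of size n, drawn with replacement.
sample_means = [statistics.fmean(sample)
                for sample in itertools.product(population, repeat=n)]

mean_of_xbar = statistics.fmean(sample_means)
sd_of_xbar = statistics.pstdev(sample_means)

print(mean_of_xbar, mu)                   # the two means agree
print(sd_of_xbar, sigma / math.sqrt(n))   # the two standard deviations agree
```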
statistic
a number computed from sample data (without any knowledge of the value of a parameter) used to estimate the value of the parameter.
statistic symbols
x bar, s, p hat (mean of sample, standard deviation of sample, proportion of sample)
statistical process control
a procedure used to check a process at regular intervals to detect problems and correct them before they become serious.
sum of squared residuals (or error)
the residuals are squared and added; denoted SSE.
total variation in Y
the sum of the squared deviations of the Y observations about their mean, y bar
two-way table
a table containing counts for two categorical variables. It has r rows and c columns
unbiased
a condition where the mean of the statistic values equals the parameter that the statistic estimates
unexplained variation
the sum of squared residuals
X
the symbol for explanatory variable
x bar-chart
a plot of sample means over time used to assess whether a process is in control
Y
the symbol for response variable
y hat
the symbol for predicted y
z-score
a measure of the number of standard deviations of a value or observation from the mean.
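As a quick illustration, a z-score is the distance of a value from the mean divided by the standard deviation. The function name `z_score` and the example numbers are hypothetical.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / sd

# Hypothetical example: a score of 85 when the mean is 70 and the sd is 10.
print(z_score(85, 70, 10))
```

A positive z-score means the value is above the mean; a negative one means it is below.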