Final: Glossary Flashcards
alternative hypothesis
alternative hypothesis: A statement about the value of a parameter that is either “less than,” “greater than,” or “not equal to” a hypothesized number or another parameter; the hypothesis that the researcher usually wants to prove or verify.
Analysis of Variance (ANOVA)
Analysis of Variance (ANOVA): A procedure used to test equality of three or more means.
association
association: For quantitative data, large values of one variable tend to occur with large (or small) values of another variable. For categorical data, certain responses for one variable tend to occur with certain responses of the other variable.
association vs. causation
association vs. causation: Causation can be argued from a significant association only when the results come from an experiment.
bar graph
bar graph: A graphical representation of categorical data. Names of each category are listed on the x axis and a bar that has height representing the frequency (or percentage) in that category is placed over each category name.
bias (sampling)
bias (sampling): A condition that occurs when the design of a study systematically favors certain outcomes.
bivariate data
bivariate data: Two measurements are made on each unit.
block
block: A group of experimental units sharing some common characteristic. In a randomized complete block design, random allocation of treatments is carried out separately within each group.
boxplot
boxplot: A plot of data that incorporates the maximum observation, the minimum observation, the first quartile, the second quartile (median), and the third quartile.
categorical (or qualitative) variable
categorical (or qualitative) variable: A variable that can be classified into groups or categories such as gender and religion.
causation
causation: Changes in the explanatory variable directly affect the response variable. Experiments are needed to verify causation.
census
census: The enumeration of every unit in a population.
center
center: A summary number about which observations tend to cluster. Measures of center include the mean and the median.
center line
center line: The middle line on a control chart. Its value is the target value of the mean when the process is in control.
Central Limit Theorem (CLT)
Central Limit Theorem (CLT): The name of the theorem stating that the sampling distribution of a statistic (e.g. x̄) is approximately normal whenever the sample is large and random.
Chi-square distribution
Chi-square distribution: The theoretical distribution that models the test statistic for doing Chi-square tests.
Chi-square test statistic
Chi-square test statistic: A test statistic computed from data that has an approximate Chi-square distribution.
claimed parameter value
claimed parameter value: The value of the parameter as given in the null hypothesis.
comparison study
comparison study: A study that compares only active treatments to determine which works best.
conditions
conditions: The basic premises that must be checked before using a statistical procedure.
conditional distribution
conditional distribution: The distribution of one variable restricted to a single row (or column) of another variable in a two way table. A conditional distribution is found by dividing the values in the row (or column) by the row (or column) total.
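As a minimal sketch of the row calculation described above, the following Python snippet divides each count in a row by that row's total. The table, its labels, and the function name are hypothetical and chosen only for illustration.

```python
# Hypothetical two-way table of counts: class standing (rows) by response (columns).
table = {
    "freshman": {"yes": 30, "no": 20},
    "senior": {"yes": 10, "no": 40},
}

def conditional_distribution(row):
    """Divide each count in a row by the row total."""
    total = sum(row.values())
    return {category: count / total for category, count in row.items()}

freshman_dist = conditional_distribution(table["freshman"])
senior_dist = conditional_distribution(table["senior"])
print(freshman_dist)
print(senior_dist)
```

Each resulting distribution sums to 1, and comparing the rows shows whether the response pattern changes from one row to the next.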
conditional percentage
conditional percentage: In a contingency table, the percentage of a category in a row (or column) found by dividing the appropriate cell count by the row (or column) total.
confidence interval
confidence interval: An estimate of the value of a parameter in interval form with an associated level of confidence; it gives a list of plausible values for the parameter based on the value of the statistic.
confidence level
confidence level: The percentage of all possible samples for which the confidence intervals will contain the parameter being estimated; selected subjectively by the researcher.
confounding
confounding: A situation where the effect of one variable on the response variable cannot be separated from the effect of another variable on the response variable.
control treatment
control treatment: A treatment where no experimental condition is applied to the units in order to determine whether the active treatments affect the response. This enables the researcher to “control” for lurking variables.
control chart
control chart: A chart plotting the means (x̄’s) of regular samples of size n against time. It has a center line and upper and lower control limits to determine whether a process is in control or out of control.
control limits
control limits: Lines on either side of the center line computed using μ − 3σ/sqrt(n) and μ + 3σ/sqrt(n). A sample mean outside of these bounds signals that the process is out of control.
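The control limit formulas can be sketched in Python as follows. The function names and the process values (μ = 100, σ = 4, n = 16) are hypothetical and serve only to illustrate the arithmetic.

```python
import math

def control_limits(mu, sigma, n):
    """Return (lower, upper) limits: mu -/+ 3*sigma/sqrt(n)."""
    half_width = 3 * sigma / math.sqrt(n)
    return mu - half_width, mu + half_width

def out_of_control(sample_mean, mu, sigma, n):
    """True when a sample mean falls outside the control limits."""
    lower, upper = control_limits(mu, sigma, n)
    return sample_mean < lower or sample_mean > upper

# Hypothetical process: target mean 100, standard deviation 4, samples of size 16.
lower, upper = control_limits(mu=100.0, sigma=4.0, n=16)
print(lower, upper)  # 97.0 103.0
print(out_of_control(103.5, mu=100.0, sigma=4.0, n=16))  # True
```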
convenience sample
convenience sample: A sample type where the researcher contacts those subjects who are readily available and does not use any random selection. The results are almost always biased.
correlation coefficient
correlation coefficient: A measure of the strength of the linear relationship between two quantitative variables, symbolized with the letter r.
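The glossary entry does not give a formula for r. One standard form divides the sum of cross-deviations by the square root of the product of the two sums of squared deviations. The sketch below assumes that form; the data values are hypothetical.

```python
import math

def correlation(x, y):
    """Correlation coefficient r from paired quantitative data."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = correlation(x, y)
print(round(r, 4))
```

The value always falls between −1 and 1, with the sign giving the direction of the linear relationship.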
data
data: Information collected on individuals.
degrees of freedom
degrees of freedom: A characteristic of the t-distribution (and other distributions like F and χ²) indicating the amount of information available in the data. A complete definition of “degrees of freedom” is beyond the scope of an introductory statistics course.
density curve
density curve: A mathematical model used to describe the overall pattern of the distribution of a random variable.
deviation
deviation: The difference (distance) between an observation and the mean of all the observations in a data set, or the difference between an observation and the corresponding regression model estimate.
direction of relationship
direction of relationship: A characteristic of data in a scatterplot that is identified as either a positive or negative association.
distribution
distribution: A list of all possible values of a variable together with the frequency (or probability) of each value.
dotplot
dotplot: A one dimensional plot of a quantitative data set where each value in the data set is represented by a dot above its corresponding location on the x axis.
double blind study
double blind study: An experiment where neither the subjects nor the diagnosticians (e.g. doctor or nurse) know which treatment is administered to whom.
equal variance (or equal standard deviation)
equal variance (or equal standard deviation): Variances (or standard deviations) for each of the treatment groups (or samples) in ANOVA are all equal. In regression, the variances of the y’s at each x are all assumed to be equal.
estimate of a parameter
estimate of a parameter: A single value or a range of values used to estimate a parameter.
expected count
expected count: An estimate of how many observations should be in a cell of a two way table if there were no association between the row and column variables.
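The entry above does not state a formula. A common way to compute an expected count is (row total × column total) / table total, and the Python sketch below assumes that formula. The observed counts are hypothetical.

```python
# Hypothetical observed counts in a 2 x 2 two way table.
observed = [
    [30, 20],
    [10, 40],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
table_total = sum(row_totals)

# Expected count for each cell under no association:
# (row total * column total) / table total.
expected = [
    [r * c / table_total for c in col_totals]
    for r in row_totals
]
print(expected)  # [[20.0, 30.0], [20.0, 30.0]]
```

The expected counts keep the same row and column totals as the observed table, but every row has the same conditional distribution.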
experiment
experiment: A study where treatments are deliberately imposed on the individuals in the study before data is gathered in order to observe their responses to the treatment.
explained variation
explained variation: The amount of total variation in the y’s that is accounted for by a regression model; it is equal to ∑(ŷ − ȳ)².
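As an illustrative sketch, the snippet below fits a least squares line to hypothetical data and sums the squared differences between each fitted value and the mean of the y’s. The variable names and data are assumptions made for the example.

```python
# Hypothetical paired observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least squares slope and intercept.
slope = (
    sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    / sum((xi - x_bar) ** 2 for xi in x)
)
intercept = y_bar - slope * x_bar

# Fitted values from the regression model.
y_hat = [intercept + slope * xi for xi in x]

# Explained variation: sum of squared (fitted value - mean of y).
explained = sum((yh - y_bar) ** 2 for yh in y_hat)
# Total variation in the y's, for comparison.
total = sum((yi - y_bar) ** 2 for yi in y)
print(round(explained, 4), round(total, 4))  # 3.6 6.0
```

Here the model accounts for 3.6 of the 6.0 units of total variation in the y’s.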
explanatory variable
explanatory variable: A variable that may or may not explain the outcomes (responses) of a study, also called independent or predictor variable.
extrapolation
extrapolation: Using a model to predict a y value for an x value that is outside the range of observed x’s. Extrapolation is dangerous and strongly discouraged because the relationship between x and y may be different outside the range of observed x’s.
factor
factor: A term synonymous with explanatory variable in an experiment.
fail to reject Ho
fail to reject Ho: The appropriate statistical conclusion in hypothesis testing when the P-value is greater than α; equivalently, conclude that “There is not enough evidence to believe Ha.”
failure
failure: Any category that is not of primary interest in a categorical data set.
F distribution
F distribution: The distribution that models the ratio of two variance estimates; used in ANOVA for obtaining the P-value for testing equality of three or more means.
five number summary
five number summary: These five values: minimum, Q1, median, Q3, maximum; preferred numerical summary when data are very skewed or outliers are present.
follow-up analysis
follow-up analysis: The analysis performed on data after an overall test on the equality of multiple means or the equality of multiple proportions is found to be significant. It determines which means or which proportions differ from which.
form of relationship
form of relationship: A description of data in a scatterplot indicating whether the data have a linear relationship, a curved relationship or no relationship.
F test statistic
F test statistic: A test statistic that has an F distribution.
histogram
histogram: A graphical display of a quantitative data set; data are grouped into intervals (usually of equal width) and a bar is drawn over each interval having height proportional to the frequency (or percentage) of values in the interval. Values of the variable are given on the x axis and frequencies (or percentages) are given on the y axis. Histograms are examined to determine shape, center and spread.
in control
in control: A process functioning within acceptable limits.
independent samples
independent samples: SRSs collected separately from each of two (or more) disjoint populations; matched pairs data are considered to be dependent samples.
individual
individual: Each object or unit described or examined in a data set.
inference
inference: Using results from a sample statistic value to draw conclusions about the population parameter.
influential point
influential point: An observation that substantially alters the fitted regression equation.
interquartile range (IQR)
interquartile range (IQR): The difference between Q3 and Q1 (i.e. Q3 − Q1); the length of the box in a boxplot; contains the middle 50% of the data.
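A short Python sketch of the five number summary and the IQR follows. Quartile conventions differ between textbooks and software, so the `method="inclusive"` setting of `statistics.quantiles` is an assumption here and may not match the rule used in a particular course. The data are hypothetical.

```python
import statistics

# Hypothetical observations.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Quartiles under the "inclusive" convention (an assumption; other
# conventions can give slightly different Q1 and Q3).
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")

five_number_summary = (min(data), q1, median, q3, max(data))
iqr = q3 - q1
print(five_number_summary)  # (1, 3.0, 5.0, 7.0, 9)
print(iqr)  # 4.0
```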
interviewer bias
interviewer bias: Bias introduced into survey results by body language, voice intonation, gender, race, etc. of an interviewer.
lack of realism
lack of realism: A weakness in experiments where the setting of the experiment does not realistically duplicate the conditions we really want to study.
law of large numbers
law of large numbers: The fact that the average of observed values in a sample (x̄) will tend to get closer and closer to μ as the sample size increases.
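The behavior described above can be illustrated with a small simulation. This sketch assumes rolls of a fair six-sided die, for which μ = 3.5. The function name and the seed are arbitrary choices for the example.

```python
import random

def running_means(n_rolls, seed=0):
    """Simulate fair die rolls and return the sample mean after each roll."""
    rng = random.Random(seed)
    total = 0
    means = []
    for i in range(1, n_rolls + 1):
        total += rng.randint(1, 6)
        means.append(total / i)
    return means

mu = 3.5  # mean of a single fair die roll
means = running_means(100_000)
for n in (10, 1_000, 100_000):
    # The sample mean typically lands closer to mu as n grows.
    print(n, means[n - 1], abs(means[n - 1] - mu))
```

Individual runs vary, so the distance from μ does not have to shrink at every step; the tendency shows up as the sample size becomes large.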
least squares regression line
least squares regression line: The line that minimizes the sum of squared residuals.
left skewed
left skewed: A density curve where the left side of the distribution extends in a long tail. (Mean < median.)
left-tailed alternative hypothesis
left-tailed alternative hypothesis: An alternative hypothesis that states the parameter value is less than some number or the parameter from another treatment or population (e.g. Ha: μ < 85).
lower tailed alternative hypothesis
lower tailed alternative hypothesis: Another name for a left-tailed alternative hypothesis.
lurking variable
lurking variable: A variable that the researcher is not necessarily interested in studying but which affects the relationship between the explanatory variable and the response variable.
mall-intercept sample
mall-intercept sample: A sample where respondents are contacted in a shopping mall or similar location. Often the method of selection is haphazard although occasionally systematic.
margin of error for 95% confidence
margin of error for 95% confidence: The maximum amount that a statistic value will differ from the parameter value for the middle 95% of the statistics. (Note: Changing the level of confidence changes the percentage of interest, e.g. 95%.)
marginal distribution
marginal distribution: The distribution of only one variable in a two way table using counts found by summing over the categories of the other variable.
marginal percentage
marginal percentage: The percentage for a row (or column) total in a two way table found by dividing the row (or column) total by the table total.
matched pairs
matched pairs: An experimental design that combines matching of subjects or measurements with randomization. Either two measurements are taken on each unit (such as pre and post) OR measurements are taken on two individuals matched by some characteristic different from the explanatory variable and the response variable.
matched pairs t procedure for mean
matched pairs t procedure for mean: The hypothesis testing method for matched pairs data. The standard null hypothesis is H0: μd = 0 where μd is the mean difference between treatments.
maximum
maximum: The largest value in a data set.
mean
mean: A measure of the center of the data; a value that “balances” the data; found by summing all the data and dividing by the number of data points.