Chapter 1 Flashcards
What is “big data”?
explosion in secondary data typified by increases in the volume, variety, and velocity of the data being made available from a myriad of sources
What is “bivariate partial correlation”?
simple (two-variable) correlation between 2 sets of residuals (unexplained variance) that remain after the association of other independent variables is removed
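A minimal numpy sketch of the idea (data and variable names are illustrative): regress x and y each on a control variable z, then take the simple correlation of the two sets of residuals.

```python
import numpy as np

# Illustrative data: z drives both x and y, and x also drives y
rng = np.random.default_rng(0)
z = rng.normal(size=200)
x = 0.6 * z + rng.normal(size=200)
y = 0.4 * z + 0.5 * x + rng.normal(size=200)

def residuals(v, z):
    """Residuals of v after a simple regression on z (with intercept)."""
    Z = np.column_stack([np.ones_like(z), z])
    beta, *_ = np.linalg.lstsq(Z, v, rcond=None)
    return v - Z @ beta

# Partial correlation of x and y, controlling for z: the simple
# correlation between the two sets of residuals
r_partial = np.corrcoef(residuals(x, z), residuals(y, z))[0, 1]
print(round(r_partial, 2))
```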
What is “bootstrapping”?
approach to validating a multivariate model by drawing a large number of subsamples and estimating models for each subsample
● Doesn't rely on statistical assumptions about the population to assess statistical significance; instead, the assessment is based solely on the sample data
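A minimal sketch assuming only numpy: resample the rows with replacement many times, re-estimate a regression slope on each subsample, and build a confidence interval from the sample alone.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=100)
y = 2.0 * x + rng.normal(size=100)

def slope(x, y):
    """OLS slope of y on x (with intercept)."""
    X = np.column_stack([np.ones_like(x), x])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

# Draw a large number of subsamples and estimate the model for each
boot = []
for _ in range(2000):
    idx = rng.integers(0, len(x), len(x))  # sample rows with replacement
    boot.append(slope(x[idx], y[idx]))

# Percentile confidence interval based solely on the sample data
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"95% bootstrap CI for slope: [{lo:.2f}, {hi:.2f}]")
```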
What is “causal inference”?
methods that move beyond statistical inference to the stronger statement of “cause and effect” in non-experimental situations
What is “cross validation”?
original sample is divided into a number of smaller subsamples (validation samples); the validation fit is the “average” fit across all subsamples
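A short sketch assuming scikit-learn is available: 5-fold cross-validation of a linear regression, averaging fit across the validation subsamples.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Simulated data with a known linear structure
rng = np.random.default_rng(2)
X = rng.normal(size=(120, 3))
y = X @ np.array([1.0, -0.5, 0.25]) + rng.normal(size=120)

# Each fold serves once as the held-out validation sample
scores = cross_val_score(LinearRegression(), X, y, cv=5, scoring="r2")
print(scores.mean())  # the "average" fit across all 5 subsamples
```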
What are “data mining models”?
based on algorithms that are widely used in big data applications
● Emphasis on predictive accuracy rather than statistical inference and explanation, as seen in statistical/data models such as multiple regression
What is “dependence technique”?
classification of statistical techniques distinguished by having a variable or set of variables identified as the dependent variable(s) and the remaining variables as independent
● Objective = prediction of the DV(s) by IV(s)
● Dependent variable → presumed effect of, or response to, a change in the IV(s)
● Independent variable → presumed cause of any change in the DV
What is “dimensional reduction”?
reduction of multicollinearity among variables by forming composite measures of multicollinear variables through such methods as exploratory factor analysis
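A hedged sketch: the definition names exploratory factor analysis; here principal components stands in as one common way to collapse several collinear items into a single composite (data are simulated).

```python
import numpy as np
from sklearn.decomposition import PCA

# Four items driven by one latent factor, so they are highly collinear
rng = np.random.default_rng(3)
f = rng.normal(size=(200, 1))
X = f @ np.ones((1, 4)) + 0.3 * rng.normal(size=(200, 4))

# One composite measure replaces the four multicollinear variables
composite = PCA(n_components=1).fit_transform(X)
print(composite.shape)  # (200, 1)
```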
What is “directed acyclic graph (DAG)”?
Graphical portrayal of causal relationships used in causal inference analysis to identify all “threats” to causal inference. Similar in some ways to path diagrams used in structural equation modeling.
What is a “dummy variable”?
nonmetrically measured variable transformed into a metric variable
○ Assigning a 1 or 0 to a subject
○ Always have one dummy variable less than the number of levels for the nonmetric variable
■ The omitted category is the reference category
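A minimal pandas sketch: a three-level nonmetric variable becomes 3 − 1 = 2 dummy (0/1) columns, with the dropped level serving as the reference category.

```python
import pandas as pd

df = pd.DataFrame({"occupation": ["physician", "attorney", "professor",
                                  "attorney", "physician"]})

# Three levels -> two 0/1 dummy columns; the dropped level
# ("attorney", first alphabetically) is the reference category
dummies = pd.get_dummies(df["occupation"], drop_first=True, dtype=int)
print(dummies)
```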
Effect size
estimate of the degree to which the phenomenon being studied (e.g. correlation or difference in means) exists in the population
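A small worked example of one common effect size, Cohen's d for a difference in means (data are illustrative).

```python
import numpy as np

# Two illustrative groups of scores
a = np.array([5.1, 4.8, 5.5, 5.0, 4.9])
b = np.array([4.2, 4.0, 4.5, 4.1, 4.3])

# Pooled standard deviation, then Cohen's d = mean difference / pooled SD
sp = np.sqrt(((len(a) - 1) * a.var(ddof=1) + (len(b) - 1) * b.var(ddof=1))
             / (len(a) + len(b) - 2))
d = (a.mean() - b.mean()) / sp
print(round(d, 2))
```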
Estimation sample
portion of original sample used for model estimation in conjunction with validation sample
Validation sample
portion of the sample “held out” from estimation and then used for an independent assessment of model fit on data that wasn’t used in estimation (holdout sample)
General linear model (GLM)
Fundamental linear dependence model which can be used to estimate many model types (e.g., multiple regression, ANOVA/MANOVA, discriminant analysis) with the assumption of a normally distributed dependent measure.
Generalized linear model (GLZ or GLIM)
similar in form to GLM, but able to accommodate non-normal dependent measures such as binary variables
● Logistic regression model
● Uses maximum likelihood estimation rather than ordinary least squares
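A short sketch assuming statsmodels is available: the same linear predictor as a GLM, but with a Binomial family fit by maximum likelihood, i.e., logistic regression as a GLZ.

```python
import numpy as np
import statsmodels.api as sm

# Simulated binary dependent measure from a logistic model
rng = np.random.default_rng(4)
X = sm.add_constant(rng.normal(size=(300, 2)))
p = 1 / (1 + np.exp(-(X @ np.array([0.2, 1.0, -0.7]))))
y = rng.binomial(1, p)

# Binomial family = logistic regression, fit by maximum likelihood
model = sm.GLM(y, X, family=sm.families.Binomial()).fit()
print(model.params)
```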
Indicator
single variable used in conjunction with one or more other variables to form a composite measure
● Composite measure → combination of two or more indicators
Measurement error
inaccuracies of measuring the “true” variable values due to the fallibility of the measurement instrument, data entry errors, or respondent errors
Metric data
Also called quantitative data, interval data, or ratio data, these measurements identify or describe subjects (or objects) not only on the possession of an attribute but also by the amount or degree to which the subject may be characterized by the attribute. For example, a person’s age and weight are metric data.
Non-metric Data
Also called qualitative data, these are attributes, characteristics, or categorical properties that identify or describe a subject or object. They differ from metric data by indicating the presence of an attribute, but not the amount. Examples are occupation (physician, attorney, professor) or buyer status (buyer, non-buyer). Also called nominal data or ordinal data.
● Difference from metric → these indicate the presence of an attribute, but not the amount
Multicollinearity
Extent to which a variable can be explained by the other variables in the analysis.
- As multicollinearity increases, it complicates the interpretation of the variate because it is more difficult to ascertain the effect of any single variable, owing to their interrelationships.
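A minimal sketch of one standard diagnostic, the variance inflation factor (VIF), assuming statsmodels is available; x2 is built to be nearly redundant with x1.

```python
import numpy as np
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(5)
x1 = rng.normal(size=150)
x2 = 0.9 * x1 + 0.1 * rng.normal(size=150)  # nearly redundant with x1
x3 = rng.normal(size=150)
X = np.column_stack([x1, x2, x3])

# High VIF = the variable is largely explained by the other variables
for i in range(X.shape[1]):
    print(f"VIF x{i + 1}: {variance_inflation_factor(X, i):.1f}")
```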
Multivariate analysis
Analysis of multiple variables in a single relationship or set of relationships.
Multivariate measurement
the use of two or more variables as indicators of a single composite measure
- For example, a personality test may provide the answers to a series of individual questions (indicators), which are then combined to form a single score (summated scale) representing the personality trait.
Overfitting
estimation of model parameters that over-represent the characteristics of the sample at the expense of generalizability to the population
Practical significance
assessing multivariate analysis results based on the substantive findings rather than their statistical significance
● E.g., assesses whether the result is useful in achieving research objectives, not just whether the result is attributable to chance
Reliability
extent to which a (set of) variable(s) is consistent in what it’s intended to measure
● If multiple measurements are taken, reliable measures will all be consistent in their values
- It differs from validity in that it relates not to what should be measured, but instead to how it is measured.
● Consistency of the measure
Validity
extent to which a (set of) measure(s) correctly represents the concept of study
● Degree to which it’s free from any systematic or nonrandom error
● Concerned with how well the concept is defined by the measure(s) (vs. the consistency of measures, as with reliability)
Specification error
omitting a key variable from the analysis, affecting the estimated effects of included variables
Statistical model
specific model is proposed, then estimated and a statistical inference is made as to its generalizability to the population through statistical tests
Summated scales
method of combining several variables that measure the same concept into a single variable in an attempt to increase the reliability of the measurement through multivariate measurement
- In most instances, the separate variables are summed and then their total or average score is used in the analysis.
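A minimal pandas sketch (column names are hypothetical): three items measuring the same concept averaged into one summated-scale score.

```python
import pandas as pd

# Three indicators of the same concept (hypothetical survey items)
items = pd.DataFrame({
    "q1": [4, 5, 3, 4],
    "q2": [5, 5, 2, 4],
    "q3": [4, 4, 3, 5],
})

# Summated scale: average the items into a single score
items["trait_score"] = items[["q1", "q2", "q3"]].mean(axis=1)
print(items)
```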
Treatment
Independent variable the researcher manipulates to see the effect (if any) on the dependent variable(s), such as in an experiment (e.g., testing the appeal of color versus black-and-white advertisements).
Type I error
Type I error → probability of incorrectly rejecting H0
● Saying an effect exists when it actually doesn’t
● = Alpha (α)
Type II error
Type II error → probability of incorrectly failing to reject H0
● Chance of not finding an effect when it does exist
● = Beta (β)
● 1 - β = power
Power
probability of correctly rejecting H0 (null hypothesis) when it’s false → correctly finding a hypothesized relationship when it exists
● Function of
1. Statistical significance level set by the researcher for a Type I error (α)
2. Sample size used
3. Effect size being examined
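A short sketch with statsmodels’ power tools (an assumed dependency), showing power as a function of exactly those three inputs for a two-sample t test.

```python
from statsmodels.stats.power import TTestIndPower

# Power of a two-sample t test given alpha, per-group n, and effect size
power = TTestIndPower().power(effect_size=0.5, nobs1=64, alpha=0.05)
print(round(power, 2))  # about 0.80 for a medium effect, n = 64 per group
```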
Univariate analysis of variance (ANOVA)
statistical technique used to determine, on the basis of one DV, whether samples are from populations with equal means
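A minimal sketch assuming scipy is available: one-way ANOVA on three illustrative samples, testing H0 of equal population means.

```python
from scipy.stats import f_oneway

# Three illustrative samples; H0: all population means are equal
g1 = [5.1, 4.8, 5.5, 5.0]
g2 = [4.2, 4.0, 4.5, 4.1]
g3 = [5.0, 5.2, 4.9, 5.1]

F, p = f_oneway(g1, g2, g3)
print(F, p)
```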
Variate
linear combination of variables formed in the multivariate technique by deriving empirical weights applied to a set of variables specified by the researcher
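A minimal sketch: the variate as a weighted linear combination of the researcher’s variables; the weights here are illustrative, not empirically derived.

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.normal(size=(10, 3))    # three researcher-specified variables
w = np.array([0.5, -0.2, 0.8])  # illustrative empirical weights

# The variate: one weighted linear combination score per observation
variate = X @ w
print(variate[:3])
```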