Chapter 1 Flashcards

1
Q

What is “big data”?

A

explosion in secondary data, typified by increases in the volume, variety, and velocity of the data being made available from a myriad of sources

2
Q

What is “bivariate partial correlation”?

A

simple (two-variable) correlation between 2 sets of residuals (unexplained variance) that remain after the association of other independent variables is removed
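
A minimal sketch of this definition, assuming a single control variable z and simple least-squares regression to produce the residuals. The function names and the data are hypothetical and serve only as illustration.

```python
from statistics import mean

def pearson(x, y):
    """Simple (two-variable) Pearson correlation."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def residuals(y, z):
    """Residuals of y after a least-squares regression of y on z."""
    mz, my = mean(z), mean(y)
    slope = (sum((a - mz) * (b - my) for a, b in zip(z, y))
             / sum((a - mz) ** 2 for a in z))
    intercept = my - slope * mz
    return [b - (intercept + slope * a) for a, b in zip(z, y)]

def partial_corr(x, y, z):
    """Correlation between x and y once the association with z is removed."""
    return pearson(residuals(x, z), residuals(y, z))

# Hypothetical data in which x and y both rise with z
z = [1, 2, 3, 4, 5, 6, 7, 8]
x = [2, 5, 5, 9, 9, 13, 13, 17]
y = [1, 1, 4, 3, 6, 5, 8, 7]
print("simple r:", pearson(x, y))
print("partial r (controlling for z):", partial_corr(x, y, z))
```

With more than one control variable, the residuals would come from a multiple regression instead of the one-predictor fit used here.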

3
Q

What is “bootstrapping”?

A

approach to validating a multivariate model by drawing a large number of subsamples and estimating models for each subsample
● Doesn't rely on statistical assumptions about the population to assess statistical significance; instead makes the assessment based solely on the sample data
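
A minimal sketch of the idea, with two assumptions: the "model" estimated on each subsample is simply the sample mean, and each subsample is drawn with replacement at the original sample size (the usual bootstrap). The function name `bootstrap_ci`, its parameters, and the data are hypothetical.

```python
import random
from statistics import mean

def bootstrap_ci(sample, stat=mean, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for a statistic, using only the sample data."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(sample)
    # Re-estimate the statistic on each resample drawn with replacement
    estimates = sorted(stat(rng.choices(sample, k=n)) for _ in range(n_boot))
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

data = [12, 15, 9, 20, 14, 17, 11, 16, 13, 18]  # hypothetical observations
lower, upper = bootstrap_ci(data)
print("95% bootstrap interval for the mean:", lower, upper)
```

No distributional assumption about the population enters the calculation; the spread of the re-estimated statistic across resamples is what supports the assessment.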

4
Q

What is “causal inference”?

A

methods that move beyond statistical inference to the stronger statement of “cause and effect” in non-experimental situations

5
Q

What is “cross validation”?

A

original sample is divided into a number of smaller subsamples (validation samples); the validation fit is the “average” fit across all subsamples
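
A minimal sketch of the averaging step. It assumes a k-fold scheme in which each subsample is held out once while the model is estimated on the remaining cases, and it assumes a one-predictor least-squares line with mean squared error as the fit measure. All names and data are hypothetical.

```python
from statistics import mean

def fit_line(xs, ys):
    """Least-squares intercept and slope for one predictor."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return my - slope * mx, slope

def k_fold_mse(xs, ys, k=4):
    """Average validation error (MSE) across k held-out subsamples."""
    n = len(xs)
    fold_errors = []
    for fold in range(k):
        held_out = set(range(fold, n, k))  # every k-th case forms one validation sample
        train = [i for i in range(n) if i not in held_out]
        intercept, slope = fit_line([xs[i] for i in train], [ys[i] for i in train])
        errors = [(ys[i] - (intercept + slope * xs[i])) ** 2 for i in held_out]
        fold_errors.append(mean(errors))
    return mean(fold_errors)  # the "average" fit across all subsamples

xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [1.2, 2.9, 5.1, 7.2, 8.8, 11.1, 12.9, 15.2]  # hypothetical, roughly y = 2x + 1
print("cross-validated MSE:", k_fold_mse(xs, ys))
```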

6
Q

What are “data mining models”?

A

based on algorithms that are widely used in big data applications
● Emphasis on predictive accuracy rather than the statistical inference and explanation seen in statistical/data models such as multiple regression

7
Q

What is “dependence technique”?

A

classification of statistical techniques distinguished by having a variable or set of variables identified as the dependent variable(s) and the remaining variables as independent
● Objective = prediction of the DV(s) by IV(s)
● Dependent variable → presumed effect of, or response to, a change in the IV(s)
● Independent variable → presumed cause of any change in the DV

8
Q

What is “dimensional reduction”?

A

reduction of multicollinearity among variables by forming composite measures of multicollinear variables through such methods as exploratory factor analysis

9
Q

What is “directed acyclic graph (DAG)”?

A

Graphical portrayal of causal relationships used in causal inference analysis to identify all “threats” to causal inference. Similar in some ways to path diagrams used in structural equation modeling.

10
Q

What is a “dummy variable”?

A

nonmetrically measured variable transformed into a metric variable
○ Assigning a 1 or 0 to a subject
○ Always have one dummy variable less than the number of levels for the nonmetric variable
■ The omitted category is the reference category
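
A minimal sketch of this coding scheme. The helper `dummy_code` and the choice of "professor" as the reference category are hypothetical; the occupation levels are used only as an illustration.

```python
def dummy_code(values, reference):
    """Return k-1 columns of 0/1 codes for a nonmetric variable with k levels."""
    levels = sorted(set(values) - {reference})  # the reference category is omitted
    return {level: [1 if v == level else 0 for v in values] for level in levels}

occupation = ["physician", "attorney", "professor", "attorney"]
dummies = dummy_code(occupation, reference="professor")
print(dummies)
# {'attorney': [0, 1, 0, 1], 'physician': [1, 0, 0, 0]}
```

A subject in the reference category scores 0 on every dummy variable, which is why three levels need only two columns.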

11
Q

Effect size

A

estimate of the degree to which the phenomenon being studied (e.g. correlation or difference in means) exists in the population

12
Q

Estimation sample

A

portion of original sample used for model estimation in conjunction with validation sample

13
Q

Validation sample

A

portion of the sample “held out” from estimation and then used for an independent assessment of model fit on data that wasn’t used in estimation (holdout sample)

14
Q

General linear model (GLM)

A

Fundamental linear dependence model which can be used to estimate many model types (e.g., multiple regression, ANOVA/MANOVA, discriminant analysis) with the assumption of a normally distributed dependent measure.

15
Q

Generalized linear model (GLZ or GLIM)

A

similar in form to GLM, but able to accommodate non-normal dependent measures such as binary variables
● Logistic regression model
● Uses maximum likelihood estimation rather than ordinary least squares

16
Q

Indicator

A

single variable used in conjunction with one or more other variables to form a composite measure
● Composite measure → combination of two or more indicators

17
Q

Measurement error

A

inaccuracies of measuring the “true” variable values due to the fallibility of the measurement instrument, data entry errors, or respondent errors

18
Q

Metric data

A

Also called quantitative data, interval data, or ratio data, these measurements identify or describe subjects (or objects) not only on the possession of an attribute but also by the amount or degree to which the subject may be characterized by the attribute. For example, a person’s age and weight are metric data.
● = Quantitative data, interval data, or ratio data

19
Q

Non-metric Data

A

Also called qualitative data, these are attributes, characteristics, or categorical properties that identify or describe a subject or object. They differ from metric data by indicating the presence of an attribute, but not the amount.
Examples are occupation (physician, attorney, professor) or buyer status (buyer, non-buyer). Also called nominal data or ordinal data.
● Difference from metric → these indicate the presence of an attribute, but not the amount

20
Q

Multicollinearity

A

Extent to which a variable can be explained by the other variables in the analysis.
- As multicollinearity increases, it complicates the interpretation of the variate because it is more difficult to ascertain the effect of any single variable, owing to their interrelationships.
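
A minimal sketch of "explained by the other variables", assuming exactly two other variables so that the R² from regressing x1 on x2 and x3 can be computed from the pairwise correlations with a standard identity. The function names and data are hypothetical.

```python
from statistics import mean

def pearson(x, y):
    """Simple (two-variable) Pearson correlation."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def r2_on_two_others(x1, x2, x3):
    """R^2 of x1 regressed on x2 and x3; values near 1 signal high multicollinearity."""
    r12, r13, r23 = pearson(x1, x2), pearson(x1, x3), pearson(x2, x3)
    return (r12 ** 2 + r13 ** 2 - 2 * r12 * r13 * r23) / (1 - r23 ** 2)

x2 = [1, 2, 3, 4, 5, 6]
x3 = [3, 1, 4, 1, 5, 9]
x1 = [a + b for a, b in zip(x2, x3)]  # x1 is fully determined by x2 and x3
print("R^2 of x1 on the others:", r2_on_two_others(x1, x2, x3))
```

Here x1 is an exact sum of the other two variables, so its separate effect could not be distinguished from theirs in a variate.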

21
Q

Multivariate analysis

A

Analysis of multiple variables in a single relationship or set of relationships.

22
Q

Multivariate measurement

A

the use of two or more variables as indicators of a single composite measure
- For example, a personality test may provide the answers to a series of individual questions (indicators), which are then combined to form a single score (summated scale) representing the personality trait.

23
Q

Overfitting

A

estimation of model parameters that over-represent the characteristics of the sample at the expense of generalizability to the population

24
Q

Practical significance

A

assessing multivariate analysis results based on the substantive findings rather than their statistical significance
● E.g. assesses whether the result is useful in achieving research objectives vs just finding whether the result is attributable to chance

25
Q

Reliability

A

extent to which a (set of) variable(s) is consistent in what it’s intended to measure
● If multiple measurements are taken, reliable measures will all be consistent in their values
- It differs from validity in that it relates not to what should be measured, but instead to how it is measured.
● Consistency of the measure

26
Q

Validity

A

extent to which a (set of) measure(s) correctly represents the concept of study
● Degree to which it’s free from any systematic or nonrandom error
● Concerned with how well the concept is defined by the measure(s) (vs the consistency of measures, as with reliability)

27
Q

Specification error

A

omitting a key variable from the analysis, affecting the estimated effects of included variables

28
Q

Statistical model

A

a specific model is proposed and estimated, then a statistical inference is made as to its generalizability to the population through statistical tests

29
Q

Summated scales

A

method of combining several variables that measure the same concept into a single variable in an attempt to increase the reliability of the measurement through multivariate measurement
- In most instances, the separate variables are summed and then their total or average score is used in the analysis.
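
A minimal sketch of the summing step, assuming four items scored on the same 1 to 5 scale and all worded in the same direction. The respondent IDs, scores, and the helper `summated_scale` are hypothetical.

```python
def summated_scale(item_scores, use_average=True):
    """Combine several indicators of one concept into a single score."""
    total = sum(item_scores)
    return total / len(item_scores) if use_average else total

# Hypothetical responses to four items measuring the same concept
respondents = {"r1": [4, 5, 4, 3], "r2": [2, 1, 2, 2]}
scores = {rid: summated_scale(items) for rid, items in respondents.items()}
print(scores)
# {'r1': 4.0, 'r2': 1.75}
```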

30
Q

Treatment

A

Independent variable the researcher manipulates to see the effect (if any) on the dependent variable(s), such as in an experiment (e.g., testing the appeal of color versus black-and-white advertisements).

31
Q

Type I error

A

Type I error → probability of incorrectly rejecting H0
● Saying an effect exists when it actually doesn’t
● = Alpha (α)

32
Q

Type II error

A

Type II error → probability of incorrectly failing to reject H0
● Chance of not finding an effect when it does exist
● = Beta (β)
● 1 - β = power

33
Q

Power

A

probability of correctly rejecting H0 (null hypothesis) when it’s false → correctly finding a hypothesized relationship when it exists
● Function of
1. Statistical significance level (α) set by the researcher for a Type I error
2. Sample size used
3. Effect size being examined
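
A minimal sketch of how the three factors above combine, assuming a two-sided one-sample z-test with a known standard deviation and an effect size expressed in standard-deviation units. The function `z_test_power` and the input values are hypothetical.

```python
from statistics import NormalDist

def z_test_power(effect_size, n, alpha=0.05):
    """Power of a two-sided one-sample z-test (known standard deviation)."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)  # critical value implied by alpha
    shift = effect_size * n ** 0.5              # standardized effect scaled by sample size
    # Probability that the test statistic falls in either rejection region
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)

print("baseline:          ", z_test_power(0.5, 32))
print("larger sample:     ", z_test_power(0.5, 64))
print("larger effect size:", z_test_power(0.8, 32))
print("stricter alpha:    ", z_test_power(0.5, 32, alpha=0.01))
```

Under these assumptions, power rises with a larger sample or a larger effect size and falls when α is made stricter; with an effect size of zero it equals α.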

34
Q

Univariate analysis of variance (ANOVA)

A

statistical technique used to determine, on the basis of one DV, whether samples are from populations with equal means

35
Q

Variate

A

linear combination of variables formed in the multivariate technique by deriving empirical weights applied to a set of variables specified by the researcher