Chapter 2 Flashcards

1
Q

Binning

A

Categorizing a metric variable into a smaller number of categories/bins and thus converting the variable into a nonmetric form
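
A minimal sketch of binning, using only the Python standard library. The function name `bin_values`, the cut points, and the labels are illustrative assumptions, not part of the card.

```python
from bisect import bisect_right

def bin_values(values, edges, labels):
    """Assign each metric value to a labeled bin.

    edges are the interior cut points in ascending order, so
    len(labels) must equal len(edges) + 1. A value equal to a
    cut point falls into the higher bin (an assumed convention).
    """
    if len(labels) != len(edges) + 1:
        raise ValueError("need exactly one more label than cut points")
    return [labels[bisect_right(edges, v)] for v in values]

# Hypothetical ages converted to four nonmetric age groups
ages = [19, 34, 52, 71]
age_groups = bin_values(ages, edges=[30, 50, 65],
                        labels=["<30", "30-49", "50-64", "65+"])
```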

2
Q

Boxplot

A

the box represents the major portion of the distribution, and the extensions (whiskers) reach to the extreme points of the distribution
- useful in making comparisons of one or more metric variables across groups formed by a nonmetric variable
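
The numbers behind a basic boxplot can be sketched as a five-number summary per group. This assumes whiskers that reach the minimum and maximum, as the card describes; many plotting tools instead cap whiskers at 1.5 times the interquartile range. The group names and data are hypothetical.

```python
from statistics import quantiles

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), the values a basic boxplot draws."""
    # quantiles() with n=4 returns the three quartile cut points
    q1, q2, q3 = quantiles(values, n=4)
    return min(values), q1, q2, q3, max(values)

# One metric variable compared across groups of a nonmetric variable
groups = {
    "A": [2, 4, 4, 5, 7, 9, 12],
    "B": [10, 11, 13, 14, 18, 21, 30],
}
summaries = {name: five_number_summary(vals) for name, vals in groups.items()}
```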

3
Q

Cardinality

A

the number of distinct values for a variable
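
As a small illustration, cardinality can be counted with a set. Treating `None` as a missing value that does not count is an assumption made for this sketch.

```python
def cardinality(values):
    """Count distinct non-missing values (None is treated as missing)."""
    return len({v for v in values if v is not None})

# Hypothetical nonmetric variable with one missing entry
regions = ["north", "south", "north", None, "east"]
n_distinct = cardinality(regions)
```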

4
Q

Censored data

A

observations that are incomplete in a systematic and known way
- censored data are an example of “ignorable missing data”

5
Q

curse of dimensionality

A

problems associated with including a very large number of variables in the analysis
- distance measures becoming less useful
- higher potential for irrelevant variables
- differing scales of measurement for the variables
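
The first point can be illustrated with a rough simulation: as dimensions are added, the nearest and farthest points from a query end up at nearly the same distance. This is a sketch under assumed settings (uniform random data, Euclidean distance, 200 points), and `relative_contrast` is a hypothetical helper name.

```python
import math
import random

def relative_contrast(n_points, n_dims, rng):
    """(farthest - nearest) / nearest distance from one random query point."""
    query = [rng.random() for _ in range(n_dims)]
    dists = [
        math.dist(query, [rng.random() for _ in range(n_dims)])
        for _ in range(n_points)
    ]
    return (max(dists) - min(dists)) / min(dists)

rng = random.Random(0)  # fixed seed so the run is repeatable
low_dim = relative_contrast(200, 2, rng)     # few variables
high_dim = relative_contrast(200, 500, rng)  # many variables
```

A much smaller contrast in the high-dimensional case means distance does less to separate near neighbors from far ones.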

6
Q

Data management

A

all activities associated with assembling a dataset for analysis

7
Q

Data quality

A

accuracy of the information in a dataset
- 8 dimensions: completeness / availability and accessibility / currency / accuracy / validity / usability and interpretability / reliability and credibility / consistency

8
Q

dCor

A

newer measure of association that is distance-based and more sensitive to nonlinear patterns in the data
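
A minimal sketch of the sample distance correlation for two one-dimensional variables, written from the standard definition (double-centered pairwise distance matrices). It is meant for small illustrative samples, not as an efficient implementation; the function names and data are assumptions.

```python
import math

def _centered_distances(x):
    """Pairwise |xi - xj| matrix with row, column, and grand means removed."""
    n = len(x)
    d = [[abs(a - b) for b in x] for a in x]
    row = [sum(r) / n for r in d]  # the matrix is symmetric, so column means match
    grand = sum(row) / n
    return [[d[i][j] - row[i] - row[j] + grand for j in range(n)]
            for i in range(n)]

def distance_correlation(x, y):
    """Sample distance correlation, between 0 and 1."""
    n = len(x)
    a, b = _centered_distances(x), _centered_distances(y)
    pairs = [(i, j) for i in range(n) for j in range(n)]
    dcov2 = sum(a[i][j] * b[i][j] for i, j in pairs) / n ** 2
    dvar2_x = sum(a[i][j] ** 2 for i, j in pairs) / n ** 2
    dvar2_y = sum(b[i][j] ** 2 for i, j in pairs) / n ** 2
    denom = math.sqrt(dvar2_x * dvar2_y)
    return math.sqrt(max(dcov2, 0.0) / denom) if denom > 0 else 0.0

# A purely nonlinear relationship: y = x^2 on a symmetric range
x = [-2, -1, 0, 1, 2]
y = [v ** 2 for v in x]
mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
linear_cov = sum((p - mean_x) * (q - mean_y) for p, q in zip(x, y)) / len(x)
dcor_xy = distance_correlation(x, y)
```

Here the ordinary covariance (and so the Pearson correlation) is zero, while the distance correlation is positive, which is the nonlinear sensitivity the card refers to.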

9
Q

dichotomization

A

dividing cases into classes based on being above or below a specific value
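
A short sketch of dichotomization. Defaulting to a median split and placing values equal to the cutoff in the lower class are both assumptions for illustration.

```python
from statistics import median

def dichotomize(values, cutoff=None):
    """Return 1 for values above the cutoff and 0 otherwise.

    With no cutoff given, the median is used (a median split).
    """
    if cutoff is None:
        cutoff = median(values)
    return [1 if v > cutoff else 0 for v in values]

# Hypothetical test scores
scores = [12, 45, 33, 78, 51]
median_split = dichotomize(scores)
fixed_split = dichotomize(scores, cutoff=40)
```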

10
Q

elasticity

A

measure of the ratio of the percentage change in Y to a percentage change in X.
Obtained by using a log-log transformation of both the dependent variables (DVs) and independent variables (IVs)
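
To make the log-log idea concrete, the sketch below fits a simple least-squares line to ln(Y) against ln(X); the slope is the elasticity. The single-predictor setup, the helper name `log_log_slope`, and the price and demand figures are assumptions for illustration.

```python
import math

def log_log_slope(x, y):
    """Slope of the least-squares line of ln(y) on ln(x)."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    sxx = sum((a - mx) ** 2 for a in lx)
    return sxy / sxx

# Hypothetical data generated with a constant elasticity of -1.5
price = [1.0, 2.0, 4.0, 8.0]
demand = [100.0 * p ** -1.5 for p in price]
elasticity = log_log_slope(price, demand)
```

A slope of about -1.5 reads as: a 1% increase in price goes with roughly a 1.5% decrease in demand.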

11
Q

Heat map

A

form of a scatterplot of nonmetric variables where the frequency within each cell is color-coded to depict relationships
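
The cell frequencies behind a heat map can be sketched with a counter over category pairs. Text characters stand in for colors here, and the variable names, categories, and shade scale are all hypothetical.

```python
from collections import Counter

def cell_counts(rows, cols):
    """Count how often each (row category, column category) pair occurs."""
    return Counter(zip(rows, cols))

def text_heat_map(rows, cols, shades=" .:*#"):
    """Render cell frequencies as characters, darkest for the fullest cell."""
    counts = cell_counts(rows, cols)
    peak = max(counts.values())
    lines = []
    for r in sorted(set(rows)):
        cells = ""
        for c in sorted(set(cols)):
            # Scale the count to a shade index: 0 for empty, last for the peak
            cells += shades[counts[(r, c)] * (len(shades) - 1) // peak]
        lines.append(f"{r:>8} {cells}")
    return lines

region = ["north", "north", "south", "south", "south", "east"]
plan = ["basic", "basic", "basic", "premium", "premium", "premium"]
counts = cell_counts(region, plan)
heat = text_heat_map(region, plan)
```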

12
Q

Assumptions of equal variance of the population error E

A

E (the population error) is estimated from e (the sample residuals)

13
Q

Homoscedasticity

A

variance of the error terms (e) appears constant over a range of predictor variables

14
Q

Heteroscedasticity

A

error terms have increasing or modulating variance
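
A crude way to see the difference between constant and changing error variance is to compare the residual variance at high versus low values of a predictor. This is only an informal sketch, not a formal statistical test; `variance_ratio` and the residual values are hypothetical.

```python
from statistics import pvariance

def variance_ratio(x, residuals):
    """Residual variance in the upper half of x divided by the lower half."""
    ordered = [r for _, r in sorted(zip(x, residuals))]
    half = len(ordered) // 2
    return pvariance(ordered[-half:]) / pvariance(ordered[:half])

x = [1, 2, 3, 4, 5, 6, 7, 8]
fanning = [0.1, -0.1, 0.2, -0.2, 0.8, -0.8, 1.0, -1.0]  # spread grows with x
steady = [0.5, -0.5, 0.5, -0.5, 0.5, -0.5, 0.5, -0.5]   # spread stays constant

hetero_ratio = variance_ratio(x, fanning)
homo_ratio = variance_ratio(x, steady)
```

A ratio near 1 is consistent with homoscedasticity; a ratio far from 1 suggests heteroscedasticity.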

15
Q

Histogram

A

graphical display of the distribution of a single variable

16
Q

Hoeffding’s D

A

new measure of association/correlation, based on distance measures between the variables and thus more likely to incorporate nonlinear components