Predictive Analytics Flashcards

1
Q

Marital status and eye color are examples of what sort of data?

A

Nominal scales - categorical data divided into distinct categories with no inherent order

2
Q

Survey responses (e.g. strongly agree, agree, neutral…) are examples of what sort of data?

A

Ordinal scales - categorical data whose categories have a meaningful rank order

3
Q

What are the three patterns of missing data?

A

MCAR - Missing completely at random
MAR - Missing at random
MNAR - Missing not at random

4
Q

Data missing due to test sensors losing connectivity is an example of which missing data pattern?

A

MCAR - Missing completely at random

5
Q

If women are more likely than men to answer a survey about skin-care routines, the resulting missing responses are an example of which missing data pattern?

A

MAR - Missing at random

6
Q

Which missing data pattern has a pattern to the missing data, but one that does not depend on the primary dependent variable?

A

MAR - Missing at random

7
Q

Data missing due to patients refusing to disclose info on sensitive topics is an example of which missing data pattern?

A

MNAR - Missing not at random
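
The three missing data patterns can be told apart by what the chance of a value going missing depends on. The sketch below simulates each one on made-up survey data; the record fields (`group`, `income`), the probabilities, and the helper names are all assumptions chosen for illustration.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical survey: each record has an observed group and a numeric answer.
records = [
    {"group": random.choice(["F", "M"]), "income": random.gauss(50_000, 15_000)}
    for _ in range(10_000)
]

def mask(records, p_missing):
    """Return the incomes with None wherever the value went missing.

    p_missing(record) is the probability that this record's income is lost.
    """
    return [None if random.random() < p_missing(r) else r["income"] for r in records]

# MCAR: every record has the same chance of being lost (e.g. a sensor dropout).
mcar = mask(records, lambda r: 0.2)

# MAR: missingness depends on another observed variable (group), not on income.
mar = mask(records, lambda r: 0.4 if r["group"] == "M" else 0.1)

# MNAR: missingness depends on the unobserved value itself (high incomes withheld).
mnar = mask(records, lambda r: 0.6 if r["income"] > 60_000 else 0.1)

def observed_mean(values):
    """Mean of the values that were actually recorded."""
    seen = [v for v in values if v is not None]
    return sum(seen) / len(seen)

def missing_rate(values, records, group):
    """Share of missing values within one group."""
    flags = [v is None for v, r in zip(values, records) if r["group"] == group]
    return sum(flags) / len(flags)

true_mean = sum(r["income"] for r in records) / len(records)
print("true mean:", round(true_mean))
print("MCAR observed mean:", round(observed_mean(mcar)))
print("MNAR observed mean:", round(observed_mean(mnar)))
print("MAR missing rate F/M:", missing_rate(mar, records, "F"), missing_rate(mar, records, "M"))
```

In this simulation the MCAR mean stays close to the true mean, while the MNAR mean is pulled downward because high values are the ones withheld.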

8
Q

A __________ is a supposition or observation regarding the results of sampling data.

A

Hypothesis

9
Q

Rejecting the null hypothesis when it is actually true is which type of error?

A

Type I error

10
Q

Failing to reject the null hypothesis when it is actually false is which type of error?

A

Type II error

11
Q

This statistic is used for normally distributed data and a known population standard deviation.

A

z-statistic

12
Q

This statistic is used for normally distributed data and an unknown population standard deviation.

A

t-statistic
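
The z-statistic and t-statistic share the same form and differ only in which standard deviation goes in the denominator. A minimal sketch for a one-sample test of the mean, where the measurements, the hypothesised mean `mu0`, and the "known" `sigma_known` are made-up values:

```python
import math
import statistics

sample = [5.1, 4.9, 5.6, 5.8, 5.2, 5.4, 5.0, 5.5]  # made-up measurements
mu0 = 5.0           # hypothesised population mean (assumed)
sigma_known = 0.3   # population standard deviation, assumed known for the z case

n = len(sample)
mean = statistics.mean(sample)

# z-statistic: uses the known population standard deviation.
z = (mean - mu0) / (sigma_known / math.sqrt(n))

# t-statistic: the population value is unknown, so the sample standard
# deviation (n - 1 denominator) stands in for it, with n - 1 degrees of freedom.
s = statistics.stdev(sample)
t = (mean - mu0) / (s / math.sqrt(n))

print(f"z = {z:.3f}, t = {t:.3f}")
```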

13
Q

T/F: P-values measure the probability that the null hypothesis is true.

A

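False - a p-value is the probability of seeing data at least as extreme as the observed sample, assuming the null hypothesis is true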
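
One way to see what a p-value does measure is to run many tests in which the null hypothesis is true by construction. The sketch below does this with a two-sided z-test; the sample size, number of trials, and the helper `two_sided_p` are assumptions for illustration. Small p-values still turn up at roughly the rate `alpha`, and each of those rejections is a Type I error.

```python
import math
import random
from statistics import NormalDist

random.seed(1)  # fixed seed so the sketch is repeatable
std_normal = NormalDist()

def two_sided_p(sample, mu0, sigma):
    """Two-sided p-value for a z-test with a known population sigma."""
    z = (sum(sample) / len(sample) - mu0) / (sigma / math.sqrt(len(sample)))
    return 2 * (1 - std_normal.cdf(abs(z)))

alpha = 0.05
trials = 5_000

# The null hypothesis (mean 0, sigma 1) is true in every trial,
# so every rejection counted here is a Type I error.
rejections = sum(
    two_sided_p([random.gauss(0, 1) for _ in range(30)], mu0=0, sigma=1) < alpha
    for _ in range(trials)
)
type_i_rate = rejections / trials
print("share of true nulls rejected:", type_i_rate)
```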

14
Q

Which statistical summary value provides the measure of location, or central tendency?

A

Arithmetic mean

15
Q

The _______ is the middle number in a set of observations that are in order.

A

Median

16
Q

The _______ is the number in a set of observations that occurs most often.

A

Mode
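
The mean, median, and mode can be compared on one small data set. A quick sketch using Python's standard `statistics` module on made-up observations; it also shows the median staying put when an outlier grows, while the mean moves.

```python
import statistics

observations = [2, 3, 3, 5, 7, 10, 40]  # made-up data with one large outlier

mean = statistics.mean(observations)      # sum divided by count
median = statistics.median(observations)  # middle value of the sorted data
mode = statistics.mode(observations)      # most frequent value

# Make the outlier ten times larger and recompute.
stretched = observations[:-1] + [400]
stretched_mean = statistics.mean(stretched)
stretched_median = statistics.median(stretched)

print(mean, median, mode)
print(stretched_mean, stretched_median)
```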

17
Q

The _______ measures the spread of the distribution of a set of observations.

A

Standard deviation

18
Q

Most often in predictive modeling, _______ is considered in the context of normal distributions and provides a measure of statistical dispersion.

A

Standard deviation

19
Q

Which summary statistic measures the balance (symmetry) of a distribution?

A

Skewness

20
Q

_______ measures the shape (how fat or thin compared to a normal distribution) of the distribution.

A

Kurtosis
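
Skewness and kurtosis can both be computed from central moments. The sketch below uses the simple moment-based (population) formulas, which is an assumption: statistical packages often apply small-sample corrections, so their numbers may differ slightly. Kurtosis is reported as excess kurtosis, so a normal distribution scores 0.

```python
def central_moment(data, k):
    """k-th central moment: the average of (x - mean) ** k."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** k for x in data) / len(data)

def skewness(data):
    """Third standardized moment; 0 for a perfectly symmetric sample."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5

def excess_kurtosis(data):
    """Fourth standardized moment minus 3, so a normal distribution gives 0."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3

symmetric = [1, 2, 3, 4, 5]         # balanced around its mean
right_skewed = [1, 1, 2, 2, 3, 20]  # long tail to the right

print(skewness(symmetric), skewness(right_skewed))
print(excess_kurtosis(symmetric))
```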

21
Q

_______ is a measure of the numerical relationship of one variable to another.

A

Correlation
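
A common way to put a number on correlation is the Pearson coefficient, which ranges from -1 to 1. A minimal sketch on made-up data; the function name `pearson_r` and the variables `hours` and `score` are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: co-variation divided by the product of the spreads."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return co_variation / (spread_x * spread_y)

hours = [1, 2, 3, 4, 5]  # made-up input variable
score = [2, 4, 5, 4, 5]  # made-up target variable

r = pearson_r(hours, score)
print(round(r, 4))
```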

22
Q

What does PCA stand for?

A

Principal component analysis

23
Q

From chapter 3, what are the three key measures of central tendency?

A

Mean, median, mode

24
Q

From chapter 3, what are the five key measures of variability?

A

Standard deviation, variance, range, kurtosis, and skewness

25
Q

As a measure of central tendency the _______ is not influenced by outliers.

A

Median

26
Q

What is the average of the squared deviations of each observation from the mean?

A

Variance

27
Q

What is the square root of the variance?

A

Standard deviation
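
Variance and standard deviation follow directly from their definitions. The sketch below divides by n, matching the "average of the squared deviations" wording; a sample variance would divide by n - 1 instead. The data are made up.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up observations

mean = sum(data) / len(data)

# Variance: average of the squared deviations from the mean.
variance = sum((x - mean) ** 2 for x in data) / len(data)

# Standard deviation: square root of the variance.
std_dev = math.sqrt(variance)

# The standard library's population versions give the same results.
print(variance, statistics.pvariance(data))
print(std_dev, statistics.pstdev(data))
```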

28
Q

The _______ is the most widely used transformation method to deal with skewed data.

A

Logarithmic function
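
The effect of a log transformation on skewed data can be checked by measuring skewness before and after. A small sketch on made-up, strongly right-skewed values, using the simple moment-based skewness formula (an assumption; packages may apply small-sample corrections). The transformation only applies to positive values.

```python
import math

def skewness(data):
    """Moment-based skewness: third central moment over variance ** 1.5."""
    mean = sum(data) / len(data)
    m2 = sum((x - mean) ** 2 for x in data) / len(data)
    m3 = sum((x - mean) ** 3 for x in data) / len(data)
    return m3 / m2 ** 1.5

raw = [1, 2, 4, 8, 16, 32, 64, 128, 256]  # made-up right-skewed data
logged = [math.log(x) for x in raw]       # natural log of each value

print("skewness before:", skewness(raw))
print("skewness after: ", skewness(logged))
```

Because these made-up values double at each step, their logs are evenly spaced and the skewness drops to essentially zero; real data will rarely become this symmetric.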

29
Q

_______ data causes a loss of information and power and should be avoided unless necessary.

A

Binning

30
Q

An r-squared value of 0.25 indicates what?

A

That 25% of the variation in the target variable (y) is explained by the input variable (x)
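
R-squared can be computed as one minus the ratio of unexplained to total variation. A minimal sketch that fits a least-squares line to made-up data and then measures the share of variation in y that the line explains; all variable names are illustrative.

```python
x = [1, 2, 3, 4, 5]  # made-up input variable
y = [2, 4, 5, 4, 5]  # made-up target variable

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

# Least-squares slope and intercept for the line y = intercept + slope * x.
slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sum(
    (a - mean_x) ** 2 for a in x
)
intercept = mean_y - slope * mean_x
predicted = [intercept + slope * a for a in x]

# Unexplained variation (residuals) versus total variation around the mean.
ss_res = sum((b - p) ** 2 for b, p in zip(y, predicted))
ss_tot = sum((b - mean_y) ** 2 for b in y)

r_squared = 1 - ss_res / ss_tot
print(round(r_squared, 3))
```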

31
Q

_______ identifies the correlations and covariances between the input variables and creates groups or clusters of similar variables.

A

Variable clustering

32
Q

When using variable clustering, the clusters are determined by calculating the _______ distance.

A

Euclidean distance

33
Q

When using variable clustering, the _______ is the average of the points in the cluster; a representative point that lies at the center.

A

Centroid
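
Euclidean distance and the centroid are both short calculations. A sketch on made-up two-dimensional points; the function names are hypothetical, and the same code works for any number of dimensions.

```python
import math

def euclidean(p, q):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def centroid(points):
    """Average of the points, taken coordinate by coordinate."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

cluster = [(1.0, 2.0), (3.0, 4.0), (5.0, 0.0)]  # made-up cluster members
center = centroid(cluster)

print("centroid:", center)
for point in cluster:
    print(point, "distance to centroid:", round(euclidean(point, center), 3))
```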

34
Q

The _______ for a given cluster measures the variance in all the variables.

A

Eigenvalue

35
Q

_______ is a variable reduction strategy that is used when there are several redundant variables or variables that are correlated with one another and may be measuring the same construct.

A

Principal component analysis (PCA)
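
For two correlated variables, the core of PCA can be worked out by hand: build the covariance matrix and take its eigenvalues, each of which is the variance captured by one component. The sketch below uses the closed-form eigenvalues of a 2x2 symmetric matrix on made-up data; a real analysis would normally use a linear algebra library and more variables.

```python
import math

# Made-up measurements of two strongly correlated variables.
x = [2.5, 0.5, 2.2, 1.9, 3.1, 2.3, 2.0, 1.0, 1.5, 1.1]
y = [2.4, 0.7, 2.9, 2.2, 3.0, 2.7, 1.6, 1.1, 1.6, 0.9]

def cov(a, b):
    """Sample covariance (n - 1 denominator); cov(a, a) is the variance."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b)) / (len(a) - 1)

# Entries of the 2x2 covariance matrix [[sxx, sxy], [sxy, syy]].
sxx, syy, sxy = cov(x, x), cov(y, y), cov(x, y)

# Closed-form eigenvalues of a 2x2 symmetric matrix.
trace = sxx + syy
det = sxx * syy - sxy ** 2
root = math.sqrt(trace ** 2 - 4 * det)
lam1 = (trace + root) / 2  # variance along the first principal component
lam2 = (trace - root) / 2  # variance along the second principal component

explained = lam1 / (lam1 + lam2)
print(f"eigenvalues: {lam1:.4f}, {lam2:.4f}")
print(f"share of variance in the first component: {explained:.1%}")
```

Since the first component carries nearly all of the variance here, the two redundant variables could be replaced by that single component with little loss.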

36
Q

The _______ defines the condition that the researchers need to discredit before suggesting an effect exists.

A

Null hypothesis

37
Q
A