Stats SA3 Flashcards

1
Q

What does the data say will happen?

A

Predictive Analytics

2
Q

What has happened or what is happening now?

A

Descriptive Analytics

3
Q

Why did it happen?

A

Diagnostic Analytics

4
Q

What will likely happen?

A

Predictive Analytics

5
Q

Predictive Analytics Process Order

A

Project Design, Data Sampling, Data Exploration, Data Modification, Model Development, Model Validation

6
Q

Kickoff meeting
Understand modeling objective
Define acceptance criteria
Document data and deployment requirements

A

Project Design

7
Q

Data extraction
Apply filters and exclusions
Identify external data sources

A

Data Sampling

8
Q

Exploratory data analysis
Identify data dependencies and correlations
Identify trends or anomalies in the data

A

Data Exploration

9
Q

Data Cleaning
Data augmentation and transformation
Feature selection

A

Data Modification

10
Q

Model performance review
Feedback based on business knowledge and inputs from subject matter experts (SMEs)

A

Model Validation

11
Q

Apply different modeling techniques and select final methodology

A

Model Development

12
Q

Linear Regression Analysis Formula

A

y = βx + α + ε

13
Q

Dependent Variable (Value to be predicted)

A

y

14
Q

Beta coefficient (Rate multiplied to X)

A

β

15
Q

Independent variable (Value driving prediction)

A

x

16
Q

Alpha intercept (Baseline figure for y)

A

α

17
Q

Error term (Balancing figure)

A

ε
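
Worked example (not part of the original deck): a minimal sketch of how the slope (beta), intercept (alpha), and error term fit together in a simple linear regression. It assumes ordinary least squares as the fitting method, which the cards do not state; the function name `fit_simple_ols` and the sample data are made up for illustration.

```python
def fit_simple_ols(xs, ys):
    """Least-squares estimates for y = beta*x + alpha + error (hypothetical helper)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope: co-variation of x and y divided by the variation of x.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    beta = sxy / sxx
    # Intercept: baseline value of y when x is zero.
    alpha = y_bar - beta * x_bar
    # Residuals: the estimated error term, one per observation.
    residuals = [y - (alpha + beta * x) for x, y in zip(xs, ys)]
    return alpha, beta, residuals


# Illustrative data only.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
alpha, beta, residuals = fit_simple_ols(xs, ys)
print(f"alpha = {alpha:.2f}, beta = {beta:.2f}")
```

With an intercept in the model, the least-squares residuals sum to zero, which is one way to read the "balancing figure" description of the error term.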

18
Q

Reason for including the error term (1):

A

To account for unexplained variability in the dependent variable caused by other relevant independent variables that may not have been included in the model

19
Q

Reason for including the error term (2):

A

To capture measurement error in both the dependent and independent variables

20
Q

You can have more than one predictor variable (x1 to xn)

A

Multiple Linear Regression
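
Worked example (not part of the original deck): a rough sketch of fitting a regression with more than one predictor. It assumes ordinary least squares solved through the normal equations with Gauss-Jordan elimination; the helper name `fit_multiple_ols` and the data are hypothetical, and the code assumes the predictors are not perfectly collinear.

```python
def fit_multiple_ols(rows, ys):
    """Return [intercept, b1, ..., bk] for y = intercept + b1*x1 + ... + bk*xk."""
    # Prepend a column of ones so the first coefficient is the intercept.
    X = [[1.0, *row] for row in rows]
    p = len(X[0])
    # Normal equations: (X^T X) b = X^T y
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    b = [sum(r[i] * y for r, y in zip(X, ys)) for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * c for a, c in zip(A[r], A[col])]
                b[r] -= factor * b[col]
    return [b[i] / A[i][i] for i in range(p)]


# Illustrative data generated exactly from y = 1 + 2*x1 + 3*x2 (no noise).
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6)]
ys = [9, 8, 19, 18, 29]
coefficients = fit_multiple_ols(rows, ys)
print([round(c, 6) for c in coefficients])
```

In practice a statistics library would normally handle this step; the hand-rolled solver is only meant to show that each predictor gets its own coefficient.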

21
Q

You still need to investigate the model’s _______

A

goodness-of-fit

22
Q

You need to test whether your predictors are _______

A

significant

23
Q

The _________, R^2, is a goodness-of-fit measure

A

coefficient of multiple determination

24
Q

_____ is a figure of merit; the higher the ____, the more successful the model is in explaining the variation in the response using the set of predictors

A

R^2

25
Q

R^2 is normally expressed as a percentage and is interpreted as the amount of _____ in the response explained by the independent variables

A

variability
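
Worked example (not part of the original deck): a small sketch of computing R^2 as one minus the ratio of unexplained to total variation. The function name `r_squared` is hypothetical, and the fitted values come from an assumed least-squares line (intercept 0.05, slope 1.99) on made-up data.

```python
def r_squared(observed, fitted):
    """Share of the variability in the response explained by the model."""
    y_bar = sum(observed) / len(observed)
    # Total variation of the response around its mean.
    ss_total = sum((y - y_bar) ** 2 for y in observed)
    # Unexplained variation left in the residuals.
    ss_residual = sum((y - f) ** 2 for y, f in zip(observed, fitted))
    return 1 - ss_residual / ss_total


# Illustrative data and an assumed fitted line y = 0.05 + 1.99*x.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
fitted = [0.05 + 1.99 * x for x in xs]
r2 = r_squared(ys, fitted)
print(f"R^2 = {r2:.1%}")
```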

26
Q

_____ is a decomposition of the total variation in the response into explained (pattern) and unexplained (error) parts

A

Analysis of Variance (ANOVA)

27
Q

The ______ variability is the amount of variation in the response variable that may be attributed to the predictors explicitly stated in the model

A

explained

28
Q

The ______ variability is the amount of variation attributed to random error

A

unexplained

29
Q

SS refers to _____

A

Sum of Squares

30
Q

The df column refers to the _____

A

degrees of freedom

31
Q

The df for _____ is always the number of regression parameters minus one

A

Regression

32
Q

The df for ______ is the sample size minus the number of regression parameters

A

Residual

33
Q

The total df is the ___ of those two degrees of freedom

A

sum

34
Q

MS refers to _____

A

Mean Squares

35
Q

The values in this column are the ratio of each sum of squares to its respective degrees of freedom

A

Mean Squares

36
Q

_______ have no physical meaning but are instrumental in computing the F-statistic

A

Mean squares

37
Q

The _____ determines if regression is meaningful for the data at hand

A

F-test
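
Worked example (not part of the original deck): a sketch of how the SS, df, and MS columns of an ANOVA table lead to the F-statistic. The function name `anova_table` and the data are hypothetical; the fitted values are assumed to come from a least-squares fit with an intercept, so that the regression and residual sums of squares add up to the total.

```python
def anova_table(observed, fitted, n_params):
    """Build the SS, df, MS columns and the F-statistic (hypothetical helper).

    n_params counts every regression parameter, including the intercept.
    """
    n = len(observed)
    y_bar = sum(observed) / n
    ss_reg = sum((f - y_bar) ** 2 for f in fitted)                # explained
    ss_res = sum((y - f) ** 2 for y, f in zip(observed, fitted))  # unexplained
    df_reg = n_params - 1   # regression parameters minus one
    df_res = n - n_params   # sample size minus regression parameters
    ms_reg = ss_reg / df_reg
    ms_res = ss_res / df_res
    return {
        # Total SS equals the sum of the parts for a least-squares fit with intercept.
        "SS": {"regression": ss_reg, "residual": ss_res, "total": ss_reg + ss_res},
        "df": {"regression": df_reg, "residual": df_res, "total": df_reg + df_res},
        "MS": {"regression": ms_reg, "residual": ms_res},
        "F": ms_reg / ms_res,
    }


# Illustrative data and an assumed fitted line y = 0.05 + 1.99*x (two parameters).
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
fitted = [0.05 + 1.99 * x for x in xs]
table = anova_table(ys, fitted, n_params=2)
print(table["df"], round(table["F"], 1))
```

Turning the F-statistic into a p-value requires the F distribution with the regression and residual degrees of freedom, which is left to a table or statistics library here.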

38
Q

When the ____ is small, it means that there is at least one significant predictor in the analysis

A

p-value

39
Q

When _ is low, __ must go

A

p, H0

40
Q

The p-value is low if it is less than the ____

A

α significance level

41
Q

The _____ helps in assessing if an individual predictor is significant

A

t-test

42
Q

In a t-test, if p < 0.05, ____

A

Significant Predictor

43
Q

In a t-test, if p > 0.05, ____

A

Insignificant Predictor
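
Worked example (not part of the original deck): a sketch of the t-statistic for the slope in a simple linear regression, followed by the decision rule from the cards. The helper names `slope_t_statistic` and `is_significant` are hypothetical, the data are made up, and a least-squares fit is assumed. The p-value itself is not computed; it would come from a t distribution with n - 2 degrees of freedom via a table or statistics library.

```python
import math


def slope_t_statistic(xs, ys):
    """t = estimated slope / its standard error, for a one-predictor model."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    beta = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    alpha = y_bar - beta * x_bar
    # Residual mean square: unexplained SS over its degrees of freedom (n - 2).
    ss_res = sum((y - (alpha + beta * x)) ** 2 for x, y in zip(xs, ys))
    ms_res = ss_res / (n - 2)
    se_beta = math.sqrt(ms_res / sxx)
    return beta / se_beta


def is_significant(p_value, significance_level=0.05):
    """Decision rule from the cards: a low p-value means reject H0.

    Assumption: p exactly equal to the level is treated as not significant.
    """
    return p_value < significance_level


# Illustrative data only.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
t_stat = slope_t_statistic(xs, ys)
print(f"t = {t_stat:.2f}")
print(is_significant(0.01), is_significant(0.20))
```

With a single predictor, the square of this t-statistic equals the F-statistic of the overall regression, so the two tests agree in that case.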