Study Guide - questions B Flashcards

1
Q

What type of sampling has been shown to lead to significant bias?

A

Convenience Sampling - asking subjects easy to identify

2
Q

In rare event analysis, it may be advantageous to bias the sampling toward sampling those individuals most likely to have….

A

Experience the event of interest. Known as stratified random sampling.

3
Q

This sampling method ensures that each subgroup of a given population is adequately represented within the whole sample population of a research study?

A

Stratified random sampling

4
Q

It is common to use __________ (and especially regression) to specify the value of interest as a function of the covariates (characteristics).

A

Response surface modeling

5
Q

When the variable is ratio scale, _________ are often used to achieve normality.

A

Box-Cox Transformations
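As a rough illustration, the Box-Cox transform of a positive value x with parameter lambda is (x^lambda - 1) / lambda, or log(x) when lambda is 0. The sketch below assumes hand-picked lambda values; in practice lambda is usually estimated from the data. The function name `box_cox` is illustrative only.

```python
import math

def box_cox(x, lam):
    """Box-Cox transform of one strictly positive value x for parameter lam."""
    if x <= 0:
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        # The limiting case of (x**lam - 1) / lam as lam approaches 0.
        return math.log(x)
    return (x ** lam - 1) / lam

values = [1.0, 2.0, 4.0, 8.0]
log_scale = [box_cox(v, 0) for v in values]      # log transform
sqrt_scale = [box_cox(v, 0.5) for v in values]   # square-root-like transform
```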

6
Q

When the dependent variable is categorical, the regression model is typically ______?

A

logistic

7
Q

When the dependent variable is ordinal, the regression model is typically ordered ______?

A

logit

8
Q

When the dependent variable is ratio, ________ is often used?

A

standard regression

9
Q

If Y is the dependent variable and X1…Xn represent the independent variables, then the typical regression model has the form ________?

A

Y = E[Y] + e

where e is a normally distributed error term and E[Y], the expected value of Y, is a parameterized function of X1…Xn.

10
Q

Time series analysis typically corrects for _____?

A

Seasonal patterns, and provides a natural way of identifying trends

11
Q

Sampling plan - A simple rule of thumb is that _______ the number of individuals sampled reduces the uncertainty in half.

A

Quadrupling
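The rule of thumb follows from the standard error of a sample mean, sigma / sqrt(n): multiplying n by 4 divides the standard error by 2. A minimal sketch, assuming independent observations and an illustrative standard deviation of 10:

```python
import math

def standard_error(sigma, n):
    """Standard error of the mean for n independent observations."""
    return sigma / math.sqrt(n)

se_100 = standard_error(10.0, 100)  # baseline sample size
se_400 = standard_error(10.0, 400)  # four times as many individuals
ratio = se_400 / se_100             # half the uncertainty
```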

12
Q

Sampling plan - _______ is a common way to measure uncertainty

A

Standard deviation

13
Q

Sampling plan - If standard deviation does not exist, then the difference between the ____________, is more appropriate.

A

third and first fractile of the uncertainty distribution

14
Q

Sampling plan - If our uncertainty is described by an exponential family distribution, how many parameters will it have?

A

two

15
Q

Determining questions to be asked - A key issue in designing the experiment is determining what?

A

The nature of the variable being assessed - i.e. categorical

16
Q

What type of scale is used for YES/NO or multiple-choice questions?

A

Nominal scales

17
Q

For ordinal scales, it is possible to define the normalized quantity for each response x by the fraction of responses __________?

A

Less than or equal to x
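A small sketch of this normalization, assuming the ordinal levels are supplied in order from lowest to highest. The function name and the sample responses are illustrative only.

```python
def normalized_scores(responses, ordered_levels):
    """Map each ordinal level x to the fraction of responses <= x."""
    rank = {level: i for i, level in enumerate(ordered_levels)}
    n = len(responses)
    return {
        level: sum(1 for r in responses if rank[r] <= rank[level]) / n
        for level in ordered_levels
    }

levels = ["low", "medium", "high"]
responses = ["low", "medium", "medium", "high"]
scores = normalized_scores(responses, levels)
```

The result is an empirical cumulative fraction, so the highest level always maps to 1.0.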

18
Q

Semantic differential survey responses with a form “very hard, somewhat hard, OKAY, somewhat easy, very easy”, where two ends of the scale represent opposites, the response is _______?

A

Ordinal

19
Q

What survey approach asks individuals to rate various factors in order of importance?

A

Rank-order

20
Q

Determining a control group - measurements are typically only meaningful if there is reference to some kind of _____________?

A

Underlying standard

21
Q

When the item is an uncertain quantity, the score of an item is the probability of the item outranking a randomly chosen item from the __________?

A

Benchmark group
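One way to estimate such a score, sketched under the assumption that the uncertain item is represented by a few sampled values and that ties do not count as outranking. The names and numbers are hypothetical.

```python
def outrank_probability(item_samples, benchmark_values):
    """Fraction of (item sample, benchmark item) pairs where the item is larger."""
    wins = sum(1 for s in item_samples for b in benchmark_values if s > b)
    return wins / (len(item_samples) * len(benchmark_values))

# Two equally likely values for the item, compared against four benchmark items.
score = outrank_probability([5, 7], [4, 6, 8, 10])
```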

22
Q

The benchmark group is commonly referred to as a _____ with the item’s score being called its _______?

A

control; effect size

23
Q

The purpose of extraction is to collect all this data from the many sources so that it can eventually be loaded into a common ________.

A

database

24
Q

In extracting data, it is critical to know the _______ from which each data element was taken.

A

data source

25
Q

What is it called if there is a change in the client's analysis, and it's important to transition the database to reflect the data sources that the new clients consider important?

A

traceability - and typically requires careful documentation

26
Q

What are three reasons why survey quality may be deficient?

A
  1. Respondents get fatigued and put in any value
  2. Respondents may be offended by questions and deliberately fill in false answers
  3. Respondents refuse to fill out the survey
27
Q

Data cleaning involves the following 6 items:

A
  1. Identifying the range of valid responses
  2. Identifying invalid data responses
  3. Identifying inconsistent data encodings
  4. Identifying suspicious data responses
  5. Identifying suspicious distribution of values
  6. Identifying suspicious interrelationships between fields.
28
Q

A key part of data cleaning is determining whether the data makes sense, and also involves handling _______.

A

Null or missing values

29
Q

What are four possible solutions to missing values?

A
  1. Deletion
  2. Deletion when necessary
  3. Imputing a value
  4. Randomly imputing a value
30
Q

What are the 10 “Cs” checks on quality of the data?

A
  1. Completeness
  2. Correctness
  3. Consistency (is data under a given field consistent with definition of that field?)
  4. Currency (is data obsolete?)
  5. Collaborative (is data based on one opinion or a consensus of experts?)
  6. Confidential
  7. Clarity (is data legible and comprehensible)
  8. Common format
  9. Convenient (can data be conveniently and quickly accessed)
  10. Cost-effective (is cost of collecting data commensurate with its value).
31
Q

A data warehouse is generally used to describe these three things:

A
  1. A staging area
  2. Data integration in centralized source
  3. Access layers in OLAP data marts
32
Q

Data marts are organized along a single point of view for efficient data retrieval. It allows analysts to do these 5 things:

A
  1. Slice data (filtering)
  2. Dice data (grouping)
  3. Drill down
  4. Roll-up
  5. pivot
33
Q

What are three examples of fact tables?

A
  1. Transaction fact tables
  2. Snapshot fact tables (at point in time)
  3. Accumulating fact tables (aggregate facts)
34
Q

Do dimension tables have a larger or smaller number of records compared to fact tables?

A

smaller

35
Q

What are 5 examples of dimension tables?

A
  1. time
  2. geography
  3. product
  4. employee
  5. range
36
Q

Discovering relationships in data - what are 5 methods to reduce dimensions in the data?

A
  1. PCA or factor analysis (can determine if there is correlation across different dimensions)
  2. Term frequency-inverse document frequency (TF-IDF)
  3. Feature hashing (creating fixed number of features)
  4. Sensitivity analysis and wrapper methods
  5. Self-organizing maps and Bayes nets
37
Q

When data has a variable number of features, _________ is an efficient method of creating a fixed number of features which form the indices of an array.

A

Feature hashing
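A minimal sketch of the idea: each token is hashed to one of a fixed number of array indices, so inputs with any number of features produce vectors of the same length. The bucket count of 8 and the use of MD5 are assumptions made for illustration; distinct tokens can collide in the same bucket.

```python
import hashlib

def hash_features(tokens, n_buckets=8):
    """Count tokens into a fixed-length vector indexed by hash value."""
    vec = [0] * n_buckets
    for tok in tokens:
        digest = hashlib.md5(tok.encode("utf-8")).hexdigest()
        # The hash value, reduced modulo n_buckets, is the array index.
        vec[int(digest, 16) % n_buckets] += 1
    return vec

short = hash_features(["red", "car"])
longer = hash_features(["red", "car", "fast", "red", "cheap"])
```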

38
Q

For unstructured text data, __________ identifies the importance of a word in some document in a collection by comparing the frequency with which the word appears in the document…

A

Term frequency-inverse document frequency (TF-IDF)
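A rough sketch of the calculation, assuming documents are already tokenized into lists of words and using the plain log(N / document count) form of the inverse document frequency. Libraries often apply smoothed variants, so exact values may differ.

```python
import math

def tf_idf(term, doc, corpus):
    """Term frequency in doc times inverse document frequency across corpus."""
    tf = doc.count(term) / len(doc)
    docs_with_term = sum(1 for d in corpus if term in d)
    if docs_with_term == 0:
        return 0.0
    idf = math.log(len(corpus) / docs_with_term)
    return tf * idf

corpus = [
    ["the", "cat", "sat"],
    ["the", "dog", "ran"],
    ["the", "cat", "ran"],
]
common = tf_idf("the", corpus[0], corpus)  # appears in every document
rare = tf_idf("sat", corpus[0], corpus)    # appears in only one document
```

A word that appears in every document scores zero, while a word concentrated in one document scores highest.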

39
Q

_______ and _______ are typically essential when you don’t know which features of your data are important.

A

Sensitivity analysis and wrapper methods

40
Q

Wrapper methods, unlike sensitivity analysis, typically involve identifying a set of features on a small sample and then testing that set on a ________.

A

holdout sample

41
Q

________ and _______ are helpful in understanding the probability distribution of the data.

A

Self-organizing maps and Bayes nets

42
Q

Extracting features - ________ is required to ensure your data stays within common ranges.

A

Normalization

43
Q

Format conversion is typically required when data is in __________?

A

binary format

44
Q

Fast Fourier transforms and discrete wavelet transforms are used for _________?

A

frequency data

45
Q

Coordinate transformations are used for geometric data defined over ________?

A

Euclidean space

46
Q

Collecting and summarizing data - These three plots provide compact representations of how data is distributed?

A
  1. Box plots
  2. Scatter plots
  3. box and whisker plots
47
Q

Collecting and summarizing data - when the data can be reasonably described in parametric distributions, ___________ are even more efficient ways of summarizing data.

A

distribution fitting

48
Q

Collecting and summarizing data - ___________ aggregation is an effective way of summarizing all the information available on an entity

A

Baseball card

49
Q

Adding new information to the data - ________ is recommended for tracking source information and other user-defined parameters.

A

Annotation

50
Q

Adding new info to the data - ____________ and _______ can be helpful in processing certain data fields together or in using one field to compute the value of another.

A

Relational algebra rename and feature addition

51
Q

What are the 6 methods for segmenting data to find natural groupings?

A
  1. Connectivity-based methods (hierarchical clustering)
  2. Centroid-based methods
  3. Distribution-based methods
  4. Density-based method
  5. Graph-based methods
  6. Topic modeling (text data)
52
Q

segmentation - A connectivity-based method called _________ generates an ordered set of clusters with variable precision.

A

Hierarchical clustering

53
Q

segmentation - A centroid-based method with a known number of clusters

A

K-means clustering

54
Q

segmentation - A centroid-based method with an unknown number of clusters.

A

x-means clustering

55
Q

segmentation - A centroid-based method that is an alternate way of enhancing k-means when the number of clusters is unknown

A

canopy clustering

56
Q

segmentation - A distribution-based method that typically uses the expectation-maximization (EM) algorithm and is appropriate if you want any data elements’ membership in a segment to be ‘soft’

A

Gaussian mixture models

57
Q

segmentation - Two density-based methods used for non-elliptical clusters are _________?

A

Fractal clustering and DBSCAN

58
Q

segmentation - _________ methods are often based on constructing cliques and semi-cliques, and are useful when you only have knowledge of how one item is connected to another.

A

Graph-based models

59
Q

segmentation - For text data, this method allows for segmentation of the data.

A

topic modeling

60
Q

variable importance - When the structure of the data is unknown, these methods are helpful.

A

tree-based methods

61
Q

variable importance - If statistical measures of importance are needed, these models are appropriate.

A

Generalized linear models

62
Q

variable importance - if statistical measures of importance are NOT needed, these two methods are useful.

A
  1. regression with shrinkage (e.g. Lasso or elastic net)
  2. stepwise regression

63
Q

classifying data into groups - These two methods are helpful if you’re unsure of feature importance.

A
  1. neural nets
  2. random forests

64
Q

classifying data into groups - If you require a highly transparent model, this type of model can be preferable.

A

decision trees (e.g. CART, CHAID)

65
Q

classifying data into groups - What method should you use if the number of data dimensions is less than 20?

A

k nearest neighbor methods

66
Q

classifying data into groups - If you have a large dataset with an unknown classification, what method should you use?

A

Naive Bayes

67
Q

classifying data into groups - These models are useful in estimating an unobservable state based on observable values.

A

Hidden Markov models

68
Q

Refining BP and AP statements- You may find at this point that the true _______ of the system isn’t what you thought it was, and that therefore the analytics problem needs to be reframed around the newly surfaced constraint.

A

constraint

69
Q

APF - When reformulating the “what” of the business problem into the “how” of the analytics problem, what are the four questions you need to ask?

A
  1. What result do we want?
  2. Who will act?
  3. What will they do?
  4. What will change in the organization as a result of the new information generated?
70
Q

APF - This formal method of decomposition is a rigorous process that maps the translation of requirements from one level to the next (i.e. from business level to the first analytics level)

A

quality function deployment

71
Q

APF - If you are formally decomposing and parsing a complex business statement, or less formally brainstorming with a project sponsor, it is critical to account for these two types of requirements.

A

Tacit and Formal

72
Q

APF - This is the best known model for decomposing and parsing requirements.

A

Kano’s requirements model

73
Q

APF - Kano’s requirements model distinguishes between unexpected customer delights, known customer requirements, and customer ________ that are not explicitly stated.

A

must-haves

74
Q

APF - When you ask business stakeholders for a list of requirements, they will tend to focus on the “normal” requirements not the _______ requirements.

A

expected

75
Q

APF - Your _________ functions are strongly related to your assumptions about what is important about this problem as well as the key metrics by which you’ll measure the organizational response to the problem.

A

input/output functions

76
Q

APF - Once you have inputs and general sense of their predicted effects, what is the next step?

A

communicate them to the team

77
Q

APF - What are two simple approaches for communicating back to the team?

A
  1. Input table
  2. Black box sketch

78
Q

APF - Key business metrics need to be negotiated, published, committed to, and _______

A

tracked

79
Q

APF - the output of the stakeholder agreement will vary by organization, but should include the following 5 items:

A
  1. budget
  2. timeline
  3. interim milestones
  4. goals
  5. any known effort that is excluded as out of scope
80
Q

APF - translation of problems from business domain to analytics domain requires that all parties agree to __________

A

definitions and terms

81
Q

APF - Requirements should be these three things:

A
  1. unitary (no conjunctions such as and, but, or)
  2. positive
  3. testable
82
Q

APF - __________ is the act of breaking down a higher-level requirement to multiple lower-level requirements.

A

decomposition

83
Q

Methodology - Almost all analytical methods can be classified into one of these three categories

A
  1. Descriptive
  2. Predictive
  3. Prescriptive
84
Q

Methodology - Generally speaking, this type of model answers the question “what is the best action or outcome?”

A

prescriptive

85
Q

Methodology - three types of prescriptive techniques are:

A
  1. Optimization
  2. Simulation-Optimization
  3. Stochastic Optimization
86
Q

Methodology - 7 types of Optimization techniques:

A
  1. Linear programming
  2. Integer programming
  3. non-linear programming
  4. Mixed integer programming
  5. Network optimization
  6. Dynamic programming
  7. Metaheuristics
87
Q

Methodology - These types of methodologies include any forecasting models such as time-series models, moving averages, and auto-regression models. Answers the question “What could happen?”

A

predictive models

88
Q

Methodology - List 7 types of predictive models:

A
  1. Simulation
  2. Regression
  3. Statistical inferences
  4. Classification
  5. Clustering
  6. Artificial Intelligence
  7. Game Theory
89
Q

Methodology - List three types of simulation techniques:

A
  1. Discrete event
  2. Monte Carlo
  3. Agent-based modeling
90
Q

Methodology - List 4 types of statistical inference techniques:

A
  1. Confidence intervals
  2. Hypothesis testing
  3. Analysis of variance
  4. Design of experiments
91
Q

Methodology - Descriptive methodologies can be conveyed through these 2 methods:

A
  1. Charts and graphs
  2. Numerical presentations (mean, median, mode, etc.)

92
Q

Methodology - These techniques answer the question “What happened?”

A

Descriptive

93
Q

Methodology - Prescriptive analytics evaluates and determines new ways to operate, targets business objectives, and balances __________.

A

constraints

94
Q

Methodology - What are the 7 primary factors that an analyst generally considers to select an appropriate methodology?

A
  1. Time
  2. Accuracy of the model
  3. Relevance of the methodology and scope of project
  4. Accuracy of the data
  5. Data availability and readiness
  6. Staff and resource availability
  7. Methodology popularity
95
Q

Methodology - ________ methods are most helpful when there is a need to pinpoint certain decisions to the level of quantifying the variables that enhance the performance under study.

A

Prescriptive

96
Q

Methodology - common methods - This type of method is often used to understand bottlenecks in systems, handles cases that cannot be handled in queueing theory, and is often used for multistage processes modeling with variations in their arrivals and service times and utilizing shared resources to perform multiple operations.

A

Discrete event simulation

97
Q

Methodology - common methods - This method is designed to identify the most efficient pathway to a solution. For example, it might identify the number of tellers needed to satisfy customers in a particular time frame, such as no more than 10 minutes of waiting.

A

Queuing model

98
Q

Methodology - common methods - This method is used primarily to estimate dependent variable randomness out of a set of independent variable randomness. This is necessary when distributions of the input variables are not normally distributed and the relationship to estimate the dependent variable is not simple (i.e. additive). Use when Queuing model is not needed.

A

Monte Carlo simulation

99
Q

Methodology - common methods - This method is simulated as a collection of autonomous decision making entities that are used to discover emergent behavior that is hard to predict without simulation.

A

Agent-based modeling (ABM)

100
Q

Methodology - common methods - This is a simulation approach used to understand the interaction of a complex system over time.

A

System dynamics (SD)

101
Q

Methodology - common methods - This is the study of strategic decision-making processes through competition and collaboration

A

Game theory

102
Q

Methodology - common method (econ) - Discounted rate used in capital budgeting to compare returns on investment opportunities.

A

IRR - Internal rate of return

103
Q

Methodology - common method (econ) - Difference between present value of income vs. outgo

A

NPV - Net present value
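A minimal sketch of the calculation: each cash flow is discounted back to the present and the results are summed. The cash flows and the 10% rate below are illustrative, with the first flow (period 0) being the initial outlay.

```python
def npv(rate, cash_flows):
    """Net present value of cash_flows, where cash_flows[t] occurs at period t."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))

flows = [-100.0, 60.0, 60.0]    # outgo now, income in periods 1 and 2
value = npv(0.10, flows)        # positive: discounted income exceeds outgo
undiscounted = npv(0.0, flows)  # with a zero rate this is just the plain sum
```

The internal rate of return is the discount rate at which this function returns zero.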

104
Q

Methodology - value of a future event or item based on current value that is adjusted by some standard

A

FV - Future value

105
Q

Methodology - period of time after which an expenditure is fully amortized and income begins to accrue in excess of expense

A

Payback period

106
Q

Methodology - common methods - A class of statistical methods used to map dependent variables with independent variables and understand the significance between the variables and their correlations.

A

Regression

107
Q

Methodology - common methods - method of model building that successively adds or deletes variables based on performance

A

Stepwise regression

108
Q

Methodology - common methods - a regression analysis often used to predict the outcome of categorical variables

A

logistic regression

109
Q

Methodology - common methods - What are two types of statistical inferences:

A
  1. Confidence intervals
  2. Hypothesis testing

110
Q

Methodology - what are three types of AI models?

A
  1. Artificial neural networks
  2. Fuzzy logic
  3. Expert systems
111
Q

Methodology - common methods - What is a stochastic model describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event.

A

Markov chains
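To make the "depends only on the previous state" property concrete, here is a small sketch with a hypothetical two-state transition matrix. Each row holds the probabilities of moving from one state to every state, and the next state distribution is computed from the current one alone.

```python
def step(distribution, transition):
    """Advance a state distribution by one step of the chain."""
    n = len(transition)
    return [
        sum(distribution[i] * transition[i][j] for i in range(n))
        for j in range(n)
    ]

# Hypothetical transition matrix: rows are current states, columns are next states.
P = [
    [0.9, 0.1],
    [0.5, 0.5],
]
start = [1.0, 0.0]             # certain to begin in state 0
after_one = step(start, P)
after_two = step(after_one, P)
```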

112
Q

Methodology - ________ mapping requires more aggregate data compared to a discrete-event simulation model.

A

Value-stream

113
Q

Methodology - Lower level aggregation is more accurate and descriptive, but is harder to ______ and will certainly lead to more mistakes.

A

validate

114
Q

Methodology - Higher level aggregation usually provides _______ results that are easier to understand.

A

faster

115
Q

Methodology - The general rule of thumb is to model at the highest level of aggregation possible that will ensure a satisfactory level of _______

A

accuracy

116
Q

Methodology - It is often advisable to run scenarios on the “back of the envelope” often referred to as QnD

A

quick and dirty

117
Q

Methodology - products that specialize in visualization, optimization, simulation, data mining, and statistics are ________

A

software tools

118
Q

Methodology - After the model is developed, the _______ step refers to making certain that the model is built the way it was designed and meant to be.

A

verification

119
Q

Methodology - After the model is developed, the _______ step refers to making certain that the model is representing real-life to a certain level of accuracy.

A

validation

120
Q

Methodology - To help the testing process, it is advisable to divide data into these three portions:

A
  1. Building
  2. Testing
  3. Validation
121
Q

Model building - A logistic regression, a decision tree, and a neural network can all predict a _____ target.

A

binary - (this is NOT typically done a priori. Instead you might identify several types of models, fit them all, and select a champion)

122
Q

Model building - It should be possible to perform ______ in a real-time production environment where specialized analytics software might not be available.

A

Scoring

123
Q

Model building - A predictive model should always be selected using an honest assessment of the model on ______ data.

A

holdout

124
Q

Model building - A model that will be used to select the “top x%” from a sample should be assessed using a metric that evaluates the rank order of predicted values such as these three things:

A
  1. concordance
  2. discordance
  3. ROC/c-statistic
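These rank-order metrics can be sketched by comparing every event (label 1) with every non-event (label 0). The sketch assumes both classes are present and counts tied scores as half-concordant for the c-statistic, which is one common convention. The scores and labels are made up.

```python
def concordance(scores, labels):
    """Return (concordance, discordance, c-statistic) over event/non-event pairs."""
    pairs = concordant = discordant = ties = 0
    for s1, y1 in zip(scores, labels):
        if y1 != 1:
            continue
        for s0, y0 in zip(scores, labels):
            if y0 != 0:
                continue
            pairs += 1
            if s1 > s0:
                concordant += 1   # event ranked above non-event
            elif s1 < s0:
                discordant += 1   # event ranked below non-event
            else:
                ties += 1
    c_stat = (concordant + 0.5 * ties) / pairs
    return concordant / pairs, discordant / pairs, c_stat

scores = [0.9, 0.8, 0.4, 0.3]
labels = [1, 0, 1, 0]
conc, disc, c_stat = concordance(scores, labels)
```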
125
Q

Model building - validation assessment techniques can vary and include the following 3 types:

A
  1. data splitting
  2. k-fold cross validation
  3. leave-one-out cross validation
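A sketch of how k-fold splits can be generated, assuming observations are identified by index and assigned to folds round-robin rather than at random. Leave-one-out is the special case where k equals the number of observations.

```python
def k_fold_indices(n, k):
    """Yield (training, validation) index lists for each of k folds."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation

splits = list(k_fold_indices(6, 3))           # 3-fold cross validation
leave_one_out = list(k_fold_indices(4, 4))    # each observation held out once
```

Every observation is used for validation exactly once and never appears in the training set of its own fold.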
126
Q

Model building - Honest validation assessment - It is critical that the observations used to fit the model and estimate parameters are not observations that are ________ in the assessment.

A

scored

127
Q

Model building - honest assessment with data splitting on binary target - You must select a large sample of data for modeling. For a binary target, a good practice is to ensure that you have at least _______ observations in the smaller of the two classes.

A

2000

128
Q

Model building - honest assessment - use stratified random sampling without replacement to create two data sets with approximately the same proportion of ______ target levels.

A

0 and 1

129
Q

Model building - Fit models and estimate parameters using the _______ data.

A

training

130
Q

Model building - using assessment statistic, score observations in the _______ data set.

A

validation

131
Q

Model building - If the model uses stopped training, pruning, or model selection without stopping rules, then those selections should always be based on the ________ data performance

A

validation

132
Q

Model building - selection - you might select the champion based on a combination of model performance and ________

A

interpretability - (models like neural networks might not be selected because they are difficult to interpret, but might be used as a benchmark against which other models are compared)

133
Q

Model building - Segmentation through clustering, rule generation through market basket/association analysis, deriving links among nodes through social network analysis, measurement of latent variables through common factor analysis are all examples of:

A

unsupervised techniques

134
Q

Model building - Techniques for validating unsupervised analyses are not as straightforward and typically rely on the analyst's _________

A

best judgement

135
Q

BPF - Popular way to frame business opportunity or problem is to obtain reliable info on the 5 Ws. What are they?

A
  1. Who are the stakeholders
  2. What problem are we trying to solve
  3. Where does the problem occur
  4. When does the problem occur
  5. Why does the problem occur
136
Q

BPF - Of the 5 Ws, which is the most critical to the long term success of the project

A

Who - stakeholders

137
Q

BPF - In determining if problem is amenable to analytics solution, most important question is can the organization accept and ______ the answer.

A

deploy

138
Q

BPF - If there is no feasible way forward, the ethical analyst will notify who?

A

stakeholders

139
Q

BPF - After initial analysis, it may be necessary to refine the problem statement to make it more accurate, more appropriate to the stakeholders, or _______

A

more amenable to available analytic tools/methods

140
Q

BPF - It will be necessary to define constraints. These constraints could be any of the three:

A
  1. analytical
  2. financial
  3. political
141
Q

BPF - If an optimization problem has a large number of constraints, it may need to be restated with fewer constraints and/or a less complex _________.

A

objective function

142
Q

BPF - List 4 types of potential constraints:

A
  1. Desired accuracy and repeatability
  2. Program cost
  3. Timeframe
  4. Number of stakeholders impacted
143
Q

BPF - After problem statement is set, you define business benefits which can be quantitative or qualitative. This is also known as the _______

A

business case

144
Q

overall - What are the 5 E’s that are the pillars of the Certified Analytics Professional?

A
  1. Ethics
  2. Education
  3. Experience
  4. Examination
  5. Effectiveness
145
Q

Model Building - what are the 4 overall objectives in the Model Building phase?

A
  1. Identify and build effective model structures
  2. Run and evaluate
  3. Calibrate models and data
  4. Integrate the models
146
Q

Deployment - what are the two methods used for deployment?

A
  1. CRISP-DM (cross industry standard process for data mining)
  2. DMAIC (6 sigma - define, measure, analyze, improve, control)
147
Q

Deployment - What are the four steps for CRISP-DM deployment?

A
  1. Planning deployment - your methods for integrating data mining discoveries into use
  2. Planning monitoring and maintenance
  3. Reporting final results
  4. Reviewing final results
148
Q

Deployment - After deployment, it is necessary to ensure your answer is still tied to the original question. However, discrepancies can creep in. It is common for business context to have changed, which can invalidate key ________?

A

Assumptions

149
Q

Deployment - For organizations to accept the results of the process, those results must be integral and acknowledged as having _______?

A

Integrity (not just what senior mgmt wants to hear)

150
Q

Deployment - What are the 2 key items to consider as a model becomes the basis for an organization taking action?

A
  1. Plan the deployment
  2. Plan monitoring and maintenance

151
Q

Deployment - When surveying key stakeholders' use of the model, pay attention to functional areas where the model is being ignored. This will tell you where key assumptions have been invalidated, and you can use that as a way to ____________ the model.

A

Strengthen and update

152
Q

Life cycle mgmt - A good lifecycle process helps with the following 3 items:

A
  1. Keeps the process orderly
  2. Minimizes cost and effort
  3. Provides business users with clear roles
153
Q

Life cycle mgmt - An effective process requires defining the roles of the various departments involved and the _______ process that will be used to iron out differences and make decisions.

A

Governance

154
Q

Life cycle mgmt - For the model to be trusted it has to be _______.

A

Repeatable

155
Q

Life cycle mgmt - Documentation should include the following 6 items:

A
  1. Key assumptions made about the business context and analytics problem
  2. Data sources and schema
  3. Methods used to clean and harmonize the data
  4. Model approach and model review artifacts
  5. Documentation for any software code written
  6. Recommendations for future improvements to the model
156
Q

Life cycle mgmt - Evaluation criteria should be created up front both in terms of the business results expected and the ______ and ______ expected from the model.

A

Accuracy and Confidence

157
Q

Life cycle mgmt - What are 5 useful model evaluation criteria?

A
  1. Value of the model in terms of the business
  2. Does the model discover/predict something that is new and useful?
  3. Is the model reliable across a wide range of data?
  4. Can a “lift” or “gain” graph be constructed to show how well the model is predicting?
  5. Check the model’s predictions on unknown data vs. train/test data
158
Q

Life cycle mgmt - When the model quality starts to decay, it is time for the next step of _______ the model and rechecking its _______.

A

Recalibrating, assumptions

159
Q

Life cycle mgmt - The results of the model should be tracked over the long term because a model may degrade if either of these 2 things happens:

A
  1. input data changes
  2. user requirements change

160
Q

Life cycle mgmt - If there has been a fundamental change in a key assumption or two, then the project needs to be….

A

revalidated against the business problem (to see if the overall approach is still valid)

161
Q

Life cycle mgmt - One of the keys to a successful analytics project or engagement is appropriate _______ for the users of the model and its results

A

training

162
Q

Life cycle mgmt - One way to demonstrate business benefits of a model is to compare how your organization is doing against industry _______ during the time period in question.

A

benchmarks