Booz Selection Flashcards

Question 1

Q

What is the key question for 1. Describe?

Answer

A

How do I develop an understanding of the content of my data?

Question 2

Q

What is the key question for 1. Describe | Processing?

Answer

A

How do I clean and separate my data?

Question 3

Q

What is the key question for 1. Describe | Processing | Filtering?

Answer

A

How do I identify data based on its absolute or relative values?

Question 4

Q

What is the key question for 1. Describe | Processing | Imputation?

Answer

A

How do I fill in missing values in my data?

Question 5

Q

What is the key question for 1. Describe | Processing | Dimensionality Reduction?

Answer

A

How do I reduce the number of dimensions in my data?

Question 6

Q

What is the key question for 1. Describe | Processing | Normalization & Transformation?

Answer

A

How do I reconcile duplicate representations in the data?

Question 7

Q

What is the key question for 1. Describe | Processing | Feature Extraction?

Answer

A

It depends on the domain of the information; a variety of methods apply.

Question 8

Q

For 1. Describe | Processing | Filtering, If you want to add or remove data based on its value, start with:

Answer

A

Relational algebra projection and selection
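
As a rough illustration of the two operators named in this answer, the sketch below treats a table as a list of dicts. The `rows` data and the helper names `select` and `project` are hypothetical and chosen only for this example; selection keeps rows by value, projection keeps columns by name.

```python
# Hypothetical toy table: each row is a dict (not taken from the deck).
rows = [
    {"id": 1, "city": "Austin", "temp_c": 31.0},
    {"id": 2, "city": "Boston", "temp_c": 12.5},
    {"id": 3, "city": "Austin", "temp_c": 28.0},
]

def select(table, predicate):
    """Selection: keep only the rows for which predicate(row) is true."""
    return [row for row in table if predicate(row)]

def project(table, columns):
    """Projection: keep only the named columns of every row."""
    return [{col: row[col] for col in columns} for row in table]

# Filter on a value first, then drop the columns that are no longer needed.
warm = select(rows, lambda r: r["temp_c"] > 20)
warm_ids = project(warm, ["id", "city"])
print(warm_ids)
```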

Question 9

Q

For 1. Describe | Processing | Filtering, If early results are uninformative and duplicative, start with:

Answer

A

Outlier removal, Exponential smoothing, Gaussian filter, Median filter
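
Of the filters listed in this answer, the median filter is probably the simplest to sketch. The version below is a minimal illustration, assuming a one-dimensional list of numbers and a centered window that is truncated at the edges; the function name and the edge handling are choices made for this example, not something the deck specifies.

```python
from statistics import median

def median_filter(values, window=3):
    """Replace each value with the median of a centered window.

    Assumption for this sketch: the window is truncated at the ends
    of the list rather than padded.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    out = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        out.append(median(values[lo:hi]))
    return out

signal = [1, 2, 100, 3, 4]        # 100 is an isolated spike
smoothed = median_filter(signal)  # the spike no longer dominates its neighborhood
print(smoothed)
```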

Question 10

Q

For 1. Describe | Processing | Imputation, If you want to generate values from other observations in your dataset, start with:

Answer

A

Random sampling, Markov Chain Monte Carlo (MCMC)

Question 11

Q

For 1. Describe | Processing | Imputation, If you want to generate values without using other observations in your dataset, start with:

Answer

A

Mean, Statistical distributions, Regression models
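
Mean imputation, the first option in this answer, could look roughly like the sketch below. It assumes missing entries are represented as `None` in a plain list; the function name `impute_mean` is hypothetical.

```python
from statistics import mean

def impute_mean(values):
    """Fill missing entries (None) with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to average")
    fill = mean(observed)
    return [fill if v is None else v for v in values]

ages = [30, None, 40, None, 50]
filled = impute_mean(ages)
print(filled)
```

Mean imputation keeps the column average unchanged but shrinks its variance, which is one reason the answer also lists distribution-based and regression-based alternatives.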

Question 12

Q

For 1. Describe | Processing | Dimensionality Reduction, If you need to determine whether there is multi-dimensional correlation, start with:

Answer

A

PCA and other factor analysis

Question 13

Q

For 1. Describe | Processing | Dimensionality Reduction, If you can represent individual observations by membership in a group, start with:

Answer

A

K-means clustering, Canopy clustering

Question 14

Q

For 1. Describe | Processing | Dimensionality Reduction, If you have unstructured text data, start with:

Answer

A

Term Frequency/Inverse Document Frequency (TF-IDF)
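
Several TF-IDF variants are in use. The sketch below assumes one common form: term frequency is the count divided by document length, and inverse document frequency is the unsmoothed `log(N / df)`. The input is assumed to be pre-tokenized, and the function name is hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(docs)
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

docs = [["data", "science", "data"], ["data", "mining"]]
weights = tf_idf(docs)
print(weights)
```

A term that occurs in every document ("data" here) gets weight zero, which is the property that makes the weights useful for picking out distinctive terms.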

Question 15

Q

For 1. Describe | Processing | Dimensionality Reduction, If you have a variable number of features but your algorithm requires a fixed number, start with:

Answer

A

Feature hashing
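
A minimal sketch of feature hashing, assuming token features and a fixed number of buckets. The choice of SHA-256, the bucket count, and the function name are illustrative assumptions; production implementations often also use a signed hash to reduce collision bias.

```python
import hashlib

def hash_features(tokens, n_buckets=8):
    """Map any number of tokens to a fixed-length count vector."""
    vec = [0] * n_buckets
    for tok in tokens:
        digest = hashlib.sha256(tok.encode("utf-8")).digest()
        # Use the first 4 bytes of the digest as a stable integer hash.
        index = int.from_bytes(digest[:4], "big") % n_buckets
        vec[index] += 1
    return vec

short = hash_features(["red", "small"])
long = hash_features(["red", "small", "round", "shiny", "heavy"])
print(short, long)  # both vectors have the same length
```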

Question 16

Q

For 1. Describe | Processing | Dimensionality Reduction, If you are not sure which features are the most important, start with:

Answer

A

Wrapper methods, Sensitivity analysis

Question 17

Q

For 1. Describe | Processing | Dimensionality Reduction, If you need to facilitate understanding of the probability distribution of the space, start with:

Answer

A

Self-organizing maps

Question 18

Q

For 1. Describe | Processing | Normalization & Transformation, If you suspect duplicate data elements, start with:

Answer

A

Deduplication

Question 19

Q

For 1. Describe | Processing | Normalization & Transformation, If you want your data to fall within a specified range, start with:

Answer

A

Normalization
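
One common way to force data into a specified range is min-max scaling, sketched below. The function name, the default range of 0 to 1, and the handling of a constant column are assumptions made for this example.

```python
def min_max_scale(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values so they span [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant column maps to the lower bound.
        return [new_min for _ in values]
    span = new_max - new_min
    return [new_min + (v - lo) * span / (hi - lo) for v in values]

scaled = min_max_scale([10, 20, 30])
print(scaled)
```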

Question 20

Q

For 1. Describe | Processing | Normalization & Transformation, If your data is stored in a binary format, start with:

Answer

A

Format Conversion

Question 21

Q

For 1. Describe | Processing | Normalization & Transformation, If you are operating in frequency space, start with:

Answer

A

Fast Fourier Transform (FFT), Discrete wavelet transform

Question 22

Q

For 1. Describe | Processing | Normalization & Transformation, If you are operating in Euclidean space, start with:

Answer

A

Coordinate transform

Question 23

Q

What is the key question for 1. Describe | Aggregation?

Answer

A

How do I collect and summarize my data?

Question 24

Q

For 1. Describe | Aggregation, If you are unfamiliar with the dataset, start with:

Answer

A

Basic statistics: Count, Mean, Standard deviation, Range, Scatter plots, Box plots

Question 25

Q

For 1. Describe | Aggregation, If your approach assumes the data follows a distribution, start with:

Answer

A

Distribution fitting

Question 26

Q

For 1. Describe | Aggregation, If you want to understand all the information available on an entity, start with:

Answer

A

“Baseball card” aggregation

Question 27

Q

What is the key question for 1. Describe | Enrichment?

Answer

A

How do I add new information to my data?

Question 28

Q

For 1. Describe | Enrichment, If you need to keep track of source information or other user-defined parameters, start with:

Answer

A

Annotation

Question 29

Q

For 1. Describe | Enrichment, If you often process certain data fields together or use one field to compute the value of another, start with:

Answer

A

Relational algebra rename, Feature addition (e.g., Geography, Technology, Weather)

Question 30

Q

What is the key question for 2. Discover?

Answer

A

What are the key relationships in the data?

Question 31

Q

What is the key question for 2. Discover | Clustering?

Answer

A

How do I segment the data to find natural groupings?

Question 32

Q

For 2. Discover | Clustering, If you want an ordered set of clusters with variable precision, start with:

Answer

A

Hierarchical

Question 33

Q

For 2. Discover | Clustering, If you have an unknown number of clusters, start with:

Answer

A

X-means, Canopy, Apriori

Question 34

Q

For 2. Discover | Clustering, If you have text data, start with:

Answer

A

Topic modeling

Question 35

Q

For 2. Discover | Clustering, If you have non-elliptical clusters, start with:

Answer

A

Fractal, DBSCAN

Question 36

Q

For 2. Discover | Clustering, If you want soft membership in the clusters, start with:

Answer

A

Gaussian mixture models

Question 37

Q

For 2. Discover | Clustering, If you have a known number of clusters, start with:

Question 38

Q

What is the key question for 2. Discover | Regression?

Answer

A

How do I determine which variables may be important?

Question 39

Q

For 2. Discover | Regression, If your data has unknown structure, start with:

Answer

A

Tree-based methods

Question 40

Q

For 2. Discover | Regression, If statistical measures of importance are needed, start with:

Answer

A

Generalized linear models

Question 41

Q

For 2. Discover | Regression, If statistical measures of importance are not needed, start with:

Answer

A

Regression with shrinkage (e.g., LASSO, Elastic net), Stepwise regression

Question 42

Q

What is the key question for 2. Discover | Hypothesis Testing?

Answer

A

How do I test ideas?

Question 43

Q

For 2. Discover | Hypothesis Testing, If you want to compare two groups, start with:

Question 44

Q

For 2. Discover | Hypothesis Testing, If you want to compare multiple groups, start with:

Question 45

Q

What is the key question for 3. Predict?

Answer

A

What are the likely future outcomes?

Question 46

Q

What is the key question for 3. Predict | Classification?

Answer

A

How do I predict group membership?

Question 47

Q

For 3. Predict | Classification, If you have known dependent relationships between variables, start with:

Answer

A

Bayesian network

Question 48

Q

For 3. Predict | Classification, If you are unsure of feature importance, start with:

Answer

A

Neural nets, Random forests, Deep learning

Question 49

Q

For 3. Predict | Classification, If you require a highly transparent model, start with:

Answer

A

Decision trees

Question 50

Q

For 3. Predict | Classification, If you have fewer than 20 data dimensions, start with:

Answer

A

K-nearest neighbors
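
A minimal k-nearest-neighbors classifier, assuming numeric feature tuples, Euclidean distance, and a simple majority vote. The training points and the function name are made up for this sketch. Distances become less informative as dimensions are added, which is consistent with the card's suggestion to use the method on low-dimensional data.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """train: list of (features, label) pairs. Returns the majority label
    among the k training points closest to query (Euclidean distance)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical two-class training set in two dimensions.
train = [
    ((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
    ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b"),
]
print(knn_predict(train, (0.5, 0.5)), knn_predict(train, (5.5, 5.5)))
```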

Question 51

Q

For 3. Predict | Classification, If you have a large dataset with an unknown classification signal, start with:

Answer

A

Naive Bayes

Question 52

Q

For 3. Predict | Classification, If you want to estimate an unobservable state based on observable variables, start with:

Answer

A

Hidden Markov model

Question 53

Q

For 3. Predict | Classification, If you don’t know where else to begin, start with:

Answer

A

Support vector machines (SVM), Random forests

Question 54

Q

What is the key question for 3. Predict | Regression?

Answer

A

How do I predict a future value?

Question 55

Q

For 3. Predict | Regression, If the data structure is unknown, start with:

Answer

A

Tree-based methods

Question 56

Q

For 3. Predict | Regression, If you require a highly transparent model, start with:

Answer

A

Generalized linear models

Question 57

Q

For 3. Predict | Regression, If you have fewer than 20 data dimensions, start with:

Answer

A

K-nearest neighbors

Question 58

Q

What is the key question for 3. Predict | Recommendation?

Answer

A

How do I predict relevant conditions?

Question 59

Q

For 3. Predict | Recommendation, If you only have knowledge of how people interact with items, start with:

Answer

A

Collaborative filtering

Question 60

Q

For 3. Predict | Recommendation, If you have a feature vector of item characteristics, start with:

Answer

A

Content-based methods

Question 61

Q

For 3. Predict | Recommendation, If you only have knowledge of how items are connected to one another, start with:

Answer

A

Graph-based methods

Question 62

Q

What is the key question for 4. Advise?

Answer

A

What course of action should I take?

Question 63

Q

What is the key question for 4. Advise | Logical Reasoning?

Answer

A

How do I sort through different evidence?

Question 64

Q

For 4. Advise | Logical Reasoning, If you have expert knowledge to capture, start with:

Answer

A

Expert systems

Answer 62

A

Logical reasoning

Answer 63

A

How do I identify the best course of action when my objective can be expressed as a utility function?

Answer 64

A

Stochastic search

Answer 65

A

Genetic algorithms, Simulated annealing, Gradient search

Answer 66

A

Linear programming, Integer programming, Non-linear programming

Answer 67

A

Active learning

Answer 68

A

Ensemble learning

Answer 69

A

How do I characterize a system that does not have a closed-form representation?

Answer 70

A

Discrete event simulation (DES)

Answer 71

A

Markov models

Answer 72

A

Agent-based simulation

Answer 73

A

Monte Carlo simulation

Answer 74

A

Systems dynamics

Answer 75

A

Activity-based simulation

Answer 76

A

ODEs, PDEs

Answer 77

A

Fuzzy logic